This is work in progress.
BUG WARNING: according to the XML specs, <foo bar=">"/> is a well-formed XML document, while <foo bar="<"/> is not. These are not parsed correctly by the current version of PiXPull.Parser. Will be fixed soon.
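To illustrate the well-formedness rule mentioned in the warning, here is a minimal sketch in Python (not Pike, and not using PiXPull). It relies on the standard library's expat-based parser, and `is_well_formed` is a hypothetical helper name chosen for this example.

```python
import xml.etree.ElementTree as ET


def is_well_formed(doc: str) -> bool:
    """Return True if the expat-based parser accepts the document."""
    try:
        ET.fromstring(doc)
        return True
    except ET.ParseError:
        return False


# A literal '>' is allowed inside an attribute value...
print(is_well_formed('<foo bar=">"/>'))
# ...but a literal '<' is not; it would have to be written as &lt;
print(is_well_formed('<foo bar="<"/>'))
```

A conforming parser should accept the first document and reject the second.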
I am making available a couple of Pike modules for dealing with XML. Currently they are neither complete in any sense, nor well-tested; but hopefully already somewhat useful. Any bugfixes, suggestions for improvement, even flames about my coding style, are most welcome.
The modules contain lots of AutoDoc comments, but I haven't yet figured out how to generate pretty HTML doc pages for standalone Pike modules. Any help on this will be appreciated.
In case you want to use this code in your own apps, consider it free software, provided under the terms of the GNU LGPL.
These modules and scripts were coded against Pike 7.4. Please let me know
should they fail when used with 7.5.
They do not currently work with Pike 7.2; let me know if there's
demand for fixing this.
Being added: sample scripts demonstrating usage of these modules.
PiXPull.pmod aims to be a Pike implementation of an API not unlike
the one described and advocated at http://xmlpull.org/
(which was designed for Java). In brief, it is a streaming XML parser, based on
the "pull" parsing model rather than the more usual event-callback
scheme (SAX, etc.). Currently it neither validates XML nor parses DOCTYPE
declarations (it merely skips them), and it is not fully compliant with the XML
spec in detecting violations of well-formedness (many checks are missing
that are somewhat expensive to do in Pike). XML Namespaces are not implemented
(this is optional in the XMLPull API), but shouldn't be too hard to add (and are on the
TODO list).
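The difference between "pull" and callback parsing can be sketched as follows. This is an illustration in Python using the standard library's `xml.dom.pulldom`, not PiXPull itself; the sample document and the `first_title` helper are made up for the example. The point is that the calling code asks for each event and may stop whenever it has what it needs, instead of registering handlers that the parser invokes.

```python
from xml.dom import pulldom

# Made-up sample input for this sketch.
DOC = "<play><title>Much Ado</title><act>I</act><act>II</act></play>"


def first_title(xml_text):
    """Pull events one at a time and stop at the first <title> element."""
    events = pulldom.parseString(xml_text)
    for event, node in events:
        if event == pulldom.START_ELEMENT and node.tagName == "title":
            # Ask the stream to read the rest of this element into the node.
            events.expandNode(node)
            return "".join(
                child.data
                for child in node.childNodes
                if child.nodeType == child.TEXT_NODE
            )
    return None


print(first_title(DOC))
```

With a callback scheme the same task would need a handler object that remembers, across calls, whether it is currently inside a title element; with the pull model that state is simply the position in the loop.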
Other bugs/omissions:
It does however seem to work OK for about all XML input files I tried ;-)
Being written entirely in Pike, PiXPull is not lightning-fast... but it does not seem to be significantly slower than pure Java implementations (and is of course no match for compiled native code...)
In addition to the main Parser class, the module includes a simple
XmlSerializer for XML output streaming. Both classes are mostly
compliant with (a sizeable subset of) the XmlPull
API (version 1), to the extent that the API ports from Java to Pike.
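For readers unfamiliar with streaming XML output, the idea can be sketched with Python's standard `xml.sax.saxutils.XMLGenerator`. This is only an analogue: it is not the XmlSerializer class of this module, and the element name, attribute, and `write_story` helper are hypothetical. Markup is written to the output as each call is made, and special characters in text are escaped along the way.

```python
import io
from xml.sax.saxutils import XMLGenerator


def write_story(out, story_id, title):
    """Write one <story> element to 'out' as a sequence of streaming calls."""
    gen = XMLGenerator(out, encoding="utf-8")
    gen.startElement("story", {"id": story_id})
    gen.characters(title)  # '&', '<' and '>' are escaped automatically
    gen.endElement("story")


buf = io.StringIO()
write_story(buf, "1", "Pike & XML")
print(buf.getvalue())
```

Nothing is held in memory beyond the current call, which is what makes this style suitable for large outputs.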
PiXPull.pmod can be downloaded here.
The sample scripts expect the PiXTools modules in the same
directory as the script; if you have read this far, you will obviously know
to edit the import statement(s) according to your liking.
testXPull: a trivial script that simply tallies the events observed by
a PiXPull.Parser and prints some summary output.
Usage:
If given the option -n, the script will use the next() API;
otherwise parsing will proceed by calls to nextToken().
A single non-option argument may be given: the name of the XML file to parse.
If none is given, standard input will be read.
Sample output:
$ ./testXPull much_ado.xml
Input encoding used: utf-8
Parsing time: 0.3267s
Characters read: 195263
PARSE_ERROR : 0
END_DOCUMENT : 1
START_DOCUMENT : 1
START_TAG : 4727
END_TAG : 4727
TEXT : 9421
ENTITY_REF : 3
CDSECT : 0
COMMENT : 0
DOCDECL : 1
IGNORABLE_WHITESPACE : 3
PROCESSING_INSTRUCTION : 0
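The kind of tally shown above can be sketched in a few lines. This is a Python illustration using the standard `xml.dom.pulldom`, not the testXPull script itself; its event names (START_ELEMENT, END_ELEMENT, CHARACTERS, ...) differ from the XmlPull ones, and both the `tally_events` helper and the tiny input document are assumptions made for the example.

```python
from collections import Counter
from xml.dom import pulldom


def tally_events(xml_text):
    """Count how many times each event type is pulled from the parser."""
    counts = Counter()
    for event, _node in pulldom.parseString(xml_text):
        counts[event] += 1
    return counts


# Made-up input: one root element with two children.
counts = tally_events("<a><b>x</b><b/></a>")
for name in sorted(counts):
    print(f"{name:24}: {counts[name]}")
```

In any well-formed document the start and end counts must match, as they do in the sample output above (4727 each).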
slashnews: get your hourly fix of slashdot.org headlines on your terminal or text
console -- no web browser required ;-) Note that it works best with a huge terminal window
(such as I use: a full-screen KDE Konsole or xterm).
Caveat: this script fetches slashdot.xml every time it is run, and Slashdot
says not to fetch it too often or your IP will be blocked
(but AFAIK they never actually do this, well, maybe if you are really obnoxious...)
PiXTree.pmod is a rough sketch of an interface for generating XML
output. If PiXPull is pre-alpha, this is just an experimental prototype.
Your feedback (if any ;-) will tell me whether this is at all a good idea.
The general idea is to provide a simple API for building an XML tree in memory, with liberal use of operator overloading to streamline the syntax of most common operations.
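To give a feel for what "operator overloading to streamline the syntax" can look like, here is a hypothetical sketch in Python. It does not reproduce PiXTree's actual interface, which is not shown on this page: the `Node` class and the choice of `<<` for appending children and `[]` for attributes are assumptions made purely for illustration.

```python
from xml.sax.saxutils import escape, quoteattr


class Node:
    """Hypothetical in-memory XML element (not the PiXTree API)."""

    def __init__(self, name, **attrs):
        self.name = name
        self.attrs = dict(attrs)
        self.children = []

    def __lshift__(self, child):
        # node << child appends a child element or text, and returns
        # the node itself so that appends can be chained.
        self.children.append(child)
        return self

    def __setitem__(self, key, value):
        # node["key"] = value sets an attribute.
        self.attrs[key] = value

    def __getitem__(self, key):
        return self.attrs[key]

    def __str__(self):
        attrs = "".join(
            f" {k}={quoteattr(str(v))}" for k, v in self.attrs.items()
        )
        if not self.children:
            return f"<{self.name}{attrs}/>"
        body = "".join(
            str(c) if isinstance(c, Node) else escape(str(c))
            for c in self.children
        )
        return f"<{self.name}{attrs}>{body}</{self.name}>"


doc = Node("stories") << (Node("story", id="1") << "Pike & XML") << Node("story", id="2")
doc["source"] = "example"
print(doc)
```

The attraction of this style is that the shape of the expression mirrors the shape of the tree; the cost is that the operators' meaning has to be learned.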
Download PiXTree.pmod.