This is a very raw XML parser for ngn/k. More proof of concept than viable code, it may still be usable for simple extraction.
In its current state, possibly the only useful entry point is xml.parse, which takes a string and returns a pair of lists. The first list is the parent vector and the second is a list of dictionaries representing the nodes. (I'll probably revisit this representation at a later date.)
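A minimal sketch of calling it, assuming xml.k sits in the current directory and loads with \l. The input string and the names r, p and n are made up for illustration, and no output is shown since the exact node layout may change:

```k
/ load the parser (assumes xml.k is in the current directory)
\l xml.k
/ made-up input string, not taken from the repository
r:xml.parse "<a><b>hi</b></a>"
p:r 0      / first list: the parent vector
n:r 1      / second list: the node dictionaries
#n         / number of nodes
&p=0       / indices of nodes whose parent entry is 0
```

The last line relies only on the usual reading of a parent vector, where each entry gives the index of that node's parent; how the root is encoded is not specified here, so treat that line as a guess.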
Nodes are either text nodes which look like this:
`tp`loc`len`cnt!(`txt;1520;29;,(`txt;"Fri, 30 May 2003 11:06:42 GMT"))
Or tag nodes which look like this:
`tp`nm`loc`len`attrs!(`tag;"/pubDate";1135;10;())
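Since every node carries a tp field, the node list can be split by type with plain indexing. The sketch below is self-contained: it rebuilds the two example nodes shown above by hand rather than calling the parser, and the names a, b, n, t, txt and tag are only illustrative:

```k
/ the two example nodes from above, collected into a list
a:`tp`loc`len`cnt!(`txt;1520;29;,(`txt;"Fri, 30 May 2003 11:06:42 GMT"))
b:`tp`nm`loc`len`attrs!(`tag;"/pubDate";1135;10;())
n:(a;b)
t:{x[`tp]}'n        / type of each node
txt:n@&`txt=t       / keep only the text nodes
tag:n@&`tag=t       / keep only the tag nodes
{x[`nm]}'tag        / names of the tag nodes
{x[`loc`len]}'n     / loc and len fields of every node
```

The same expressions should apply unchanged to the second list returned by xml.parse, as long as the node dictionaries keep these keys.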
As I say, this is very raw. Currently there is no attempt at substituting entities such as &lt;, nor, for that matter, at parsing a DTD at all. Some of this code could probably be used for such a project, though.
This does handle CDATA, though, as well as <!-- --> style comments. The contents of both are taken as raw text and not parsed in any way.
I hope to add running examples of this shortly.
Parsing takes place in several phases:
As an experiment, I've added a script walking through bits of the code which can be used with the tutorial script.
I hope to get back to this but it's been on the shelf for a while. Some ideas: