Will 33137706c2 pattern matcher | 4 months ago | |
---|---|---|
.. | ||
README.org | 4 months ago | |
json.k | 4 months ago |
ngn/k has a built in json parser which is many times faster than this. So why have one written in K?
First, K is interpretted and so it's fundamentally easier to modify. Also, since K objects don't match 1-1 with json objects (no symbols in json, no distinct boolean types in K, etc.) there are necessarily choices made in any implementation of a parser.
All functions are created under the json
namespace. The primary API points are tree
and
prs
. prs
takes a string and returns a K "object". That is to say if the toplevel json
structure is a list prs
returns a list and if its an "object" prs
returns a dict.
tree
returns a list of three items: the nodes, the parent vector and the left vector. This
amounts to an array representation of a tree, which I've taken to calling Apter trees though I
haven't done any scholarship as to the actual origins. There's likely more information available
in Aaron Hsu's thesis, but I haven't been through it.
Most people will probably make use of the prs
interface, but having the tree
available opens
up possibilities for alternative traversals.
From the tree representation, the nodes of interest will be "{"
, "["
and ":"
, which serve
as root nodes for objects, lists and pairs where an object is essentially just a list of
pairs. From any object node, you can use the parent vector to find an array of pairs within it.
You can use the left vector to discover the default ordering of such pairs.
As mentioned above it's quite a bit slower than the built-in json parser. When I last checked it was a couple of magnitudes slower for reasonable sized json files and quite a bit slower for much larger files. This last is because it doesn't handle memory terribly efficiently. If possible, you should break up very large json files in to smaller chunks for processing.
Actually, it should be possible to turn this into a version which operates on the file in chunks, but I haven't done any work in this direction. Another possibility is to generate the tree without doing any split at all.
The version of pr
here was offered by mlochbaum as discussed on
the J mailing list. It was quite a bit faster than the naive version I had originally. ngn also
offered a fast version which has the potential to be more useful for a chunked version, since it
uses a fold which can be resumed. From that thread you can see contributions by @ovs and
@chrispsn to make other speed enhancements over the orginal version.