Channel: python: how do I parse a stream of json arrays with ijson library - Stack Overflow

Answer by bbc for python: how do I parse a stream of json arrays with ijson library


You can use the raw_decode method of a json.JSONDecoder instance to walk through the string one value at a time. Its documentation says:

This can be used to decode a JSON document from a string that may have extraneous data at the end.
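The snippet below uses only the standard json module to show what raw_decode returns. It gives back a 2-tuple: the decoded value and the index just past it. It does not skip leading whitespace, so a loop over several values has to step over the whitespace between them itself. The sample string is only an illustration.

```python
import json

decoder = json.JSONDecoder()
text = '{"a": 1} [2, 3]'

# raw_decode returns the value and the index just past it
first, end = decoder.raw_decode(text)
print(first, end)  # {'a': 1} 8

# It does not skip whitespace: decoding from index 8 hits the space and fails
try:
    decoder.raw_decode(text, end)
except json.JSONDecodeError as exc:
    print(exc.msg)  # Expecting value

# Starting at the next non-whitespace character works
second, end = decoder.raw_decode(text, end + 1)
print(second, end)  # [2, 3] 15
```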

The following code sample assumes all the JSON values are in one big string:

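import json

def json_elements(string):
    decoder = json.JSONDecoder()
    # raw_decode does not skip leading whitespace, so strip it first
    string = string.lstrip()
    while string:
        # raw_decode returns the decoded value and the index where it ended
        (element, position) = decoder.raw_decode(string)
        yield element
        string = string[position:].lstrip()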
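Slicing string[position:] copies the rest of the string on every iteration. The variant below avoids those copies. It is a sketch that assumes the input is a str, and it passes raw_decode's optional idx argument while skipping JSON whitespace (space, tab, newline, carriage return) by hand. The name json_elements_indexed is made up for this example.

```python
import json
from typing import Any, Iterator

JSON_WHITESPACE = " \t\n\r"


def json_elements_indexed(text: str) -> Iterator[Any]:
    """Yield each JSON value in `text` without re-slicing the string."""
    decoder = json.JSONDecoder()
    pos = 0
    while True:
        # skip the whitespace that raw_decode refuses to skip itself
        while pos < len(text) and text[pos] in JSON_WHITESPACE:
            pos += 1
        if pos == len(text):
            return
        # decode starting at pos; the returned index is where the value ended
        value, pos = decoder.raw_decode(text, pos)
        yield value
```

For example, list(json_elements_indexed('1 [2] {"k": null}')) gives [1, [2], {'k': None}]. Malformed input raises json.JSONDecodeError instead of silently stopping.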

To avoid dealing with raw_decode yourself and to be able to parse a stream chunk by chunk, I would recommend a library I made for this exact purpose: streamcat.

import json
import streamcat

def json_elements(stream):
    decoder = json.JSONDecoder()
    yield from streamcat.stream_to_iterator(stream, decoder)

This works for any concatenation of JSON values, regardless of how much whitespace is used within or between them.
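If you would rather not add a dependency, you can sketch the chunk-by-chunk idea with the standard library alone. This is not how streamcat is implemented. It is a minimal illustration that assumes a text-mode stream whose read(n) returns at most n characters and "" at end of input. The name iter_json_stream and the chunk_size default are arbitrary. The function appends chunks to a buffer and hands the buffer to raw_decode. Until the end of the stream is reached, it treats a decode error as a value cut off at the chunk boundary.

```python
import io
import json
from typing import IO, Any, Iterator

JSON_WHITESPACE = " \t\n\r"


def iter_json_stream(stream: IO[str], chunk_size: int = 4096) -> Iterator[Any]:
    """Yield JSON values from a text stream of concatenated values, read in chunks."""
    decoder = json.JSONDecoder()
    buffer = ""
    eof = False
    while not eof:
        chunk = stream.read(chunk_size)
        if chunk:
            buffer += chunk
        else:
            eof = True
        pos = 0
        while True:
            # raw_decode does not skip whitespace, so do it here
            while pos < len(buffer) and buffer[pos] in JSON_WHITESPACE:
                pos += 1
            if pos == len(buffer):
                break
            try:
                value, end = decoder.raw_decode(buffer, pos)
            except json.JSONDecodeError:
                if eof:
                    raise  # no more data is coming: the input is malformed
                break  # most likely a value cut off by the chunk boundary
            # A value ending exactly at the end of the buffer may be a number
            # that continues in the next chunk ("12" then "3"); one followed by
            # ".", "e" or "E" may be a truncated float ("1." then "5"). Wait.
            if not eof and (end == len(buffer) or buffer[end] in ".eE"):
                break
            yield value
            pos = end
        # keep only the undecoded tail of the buffer
        buffer = buffer[pos:]


if __name__ == "__main__":
    sample = io.StringIO('[1, 2] {"a": 3}\n 12 "x" 1.5e3 [true]')
    print(list(iter_json_stream(sample, chunk_size=3)))
    # [[1, 2], {'a': 3}, 12, 'x', 1500.0, [True]]
```

The function holds back a value that ends at the very end of the buffer until more data arrives. The trade-off is that a value may be yielded slightly later than it could be, and a malformed stream is only reported once the end of input is reached.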

If you control how your input stream is encoded, consider using line-delimited JSON (one JSON value per line). It makes parsing easier, because each line can be decoded on its own with json.loads.
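With line-delimited JSON (also known as JSON Lines), the reader only needs to iterate over lines and call json.loads on each one. The sketch below assumes the writer uses json.dumps without indent, which never emits a raw newline because newlines inside strings are escaped. The function names are illustrative.

```python
import io
import json
from typing import IO, Any, Iterable, Iterator


def write_json_lines(values: Iterable[Any], stream: IO[str]) -> None:
    """Write one compact JSON value per line."""
    for value in values:
        # json.dumps without indent keeps each value on a single line
        stream.write(json.dumps(value) + "\n")


def iter_json_lines(stream: IO[str]) -> Iterator[Any]:
    """Yield one decoded value per non-blank line."""
    for line in stream:
        if line.strip():  # tolerate blank lines, e.g. at the end of the file
            yield json.loads(line)
```

For example, writing [{"a": 1}, "multi\nline"] to an io.StringIO and reading it back with iter_json_lines returns the same two values.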


Viewing all articles
Browse latest Browse all 3

Latest Images

Trending Articles



Latest Images

<script src="https://jsc.adskeeper.com/r/s/rssing.com.1596347.js" async> </script>