Anthony Liguori <[email protected]> writes:

> Markus Armbruster <[email protected]> writes:
>
>> Anthony Liguori <[email protected]> writes:
>>
>>> Markus Armbruster <[email protected]> writes:
>>>
>>>> If you think I'm exaggerating, check out the list of issues in PATCH
>>>> 3/9.
>>>
>>> You are not.
>>>
>>> However, I think we can drop the whole thing and just use the JSON
>>> module in Python.  The bit below seems to work:
>>>
>>> import json.decoder, re
>>> from ordereddict import OrderedDict
>>>
>>> WHITESPACE = re.compile(r'(#.*\n|[ \r\t\n]*)*', re.MULTILINE)
>>>
>>> def make_object(pairs):
>>>     return OrderedDict(pairs)
>>>
>>> def qapi_parse(data):
>>>     _w = WHITESPACE.match
>>>     idx = 0
>>>     while idx < len(data):
>>>         idx = _w(data, idx).end()
>>>         if idx == len(data):
>>>             break
>>>         decoder = json.decoder.JSONDecoder(object_pairs_hook=make_object)
>>>         obj, idx = decoder.raw_decode(data, idx)
>>>         yield obj
>>>
>>> if __name__ == '__main__':
>>>     with open('qapi-schema.json', 'r') as fp:
>>>         data = fp.read().replace("'", '"')
>>>
>>>     exprs = list(qapi_parse(data))
>>>     print exprs
>>
>> I tried to find a way to use JSONDecoder, but not hard enough,
>> apparently.
>>
>> The fp.read().replace("'", '"') is no good, because it blindly replaces
>> within strings, such as 'the cat\'s meow'.
>>
>> Can your code handle comments between arbitrary tokens?  I suspect they
>> work only between top-level expressions, but I could be wrong; Python
>> isn't my strongest language, and I didn't test this.
>
> It cannot.  The python JSON module has an optimized C implementation
> which is less flexible than the python one.  Unfortunately it looks like
> at least in my copy of python, the python version has bitrotted.
>
> OTOH, the warnings are very clear when you attempt to do this.

Outlawing comments within expressions feels very wrong to me.  An example of such a comment is in my [PATCH v2 3/3] qapi: Rename ChardevBackend member "memory" to "ringbuf".

Since JSON's lexical structure is so simple, you could strip comments with a simple state machine, or maybe with a (not-so-simple) regexp: basically a stripped-down JSON lexer.  Using JSONDecoder then still saves maintaining a parser, but not a lexer.  You'd trade maintaining a pretty trivial parser for maintaining not-so-trivial JSONDecoder glue.
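To make the "simple state machine" idea concrete, here is a rough sketch of such glue.  It is not code from any posted patch.  The names strip_comments, _copy_string, parse_schema and SCHEMA are hypothetical.  It assumes the schema uses '#' comments running to end of line, and single- or double-quoted strings with backslash escapes.  Because the lexer has to track string literals anyway to avoid treating '#' inside a string as a comment, the same pass rewrites 'single-quoted' strings as JSON "double-quoted" ones, which avoids the blind replace("'", '"') problem with 'the cat\'s meow'.

```python
import json
from collections import OrderedDict


def _copy_string(text, i, out):
    """Copy the string literal starting at text[i] into out as a JSON
    double-quoted string; return the index just past its closing quote.
    (Hypothetical helper; escapes other than \\' are kept verbatim.)"""
    quote = text[i]
    out.append('"')
    i += 1
    while i < len(text):
        c = text[i]
        if c == '\\' and i + 1 < len(text):
            esc = text[i + 1]
            if quote == "'" and esc == "'":
                out.append("'")        # \' needs no escape inside "..."
            else:
                out.append(c + esc)    # keep \\, \", \n, \u... as they are
            i += 2
        elif c == quote:
            out.append('"')            # closing quote
            return i + 1
        elif c == '"':
            out.append('\\"')          # bare " inside '...' must be escaped
            i += 1
        else:
            out.append(c)
            i += 1
    raise ValueError('unterminated string literal')


def strip_comments(text):
    """Stripped-down lexer: drop '#' comments outside strings, normalize
    string quoting, pass every other character through unchanged."""
    out = []
    i = 0
    while i < len(text):
        c = text[i]
        if c == '#':
            # Skip to end of line; the newline itself is kept.
            while i < len(text) and text[i] != '\n':
                i += 1
        elif c in '\'"':
            i = _copy_string(text, i, out)
        else:
            out.append(c)
            i += 1
    return ''.join(out)


def parse_schema(text):
    """Yield each top-level expression, keeping member order."""
    data = strip_comments(text)
    decoder = json.JSONDecoder(object_pairs_hook=OrderedDict)
    idx = 0
    while True:
        # raw_decode() does not skip leading whitespace, so do it here.
        while idx < len(data) and data[idx] in ' \t\r\n':
            idx += 1
        if idx == len(data):
            break
        obj, idx = decoder.raw_decode(data, idx)
        yield obj


# Hypothetical sample input: comments inside and between expressions.
SCHEMA = r"""
# A comment between top-level expressions
{ 'command': 'ringbuf-write',   # a comment inside an expression
  'data': { 'device': 'str', 'note': 'the cat\'s meow' } }
{ 'enum': 'Color', 'data': [ 'red', 'blue' ] }
"""

if __name__ == '__main__':
    for expr in parse_schema(SCHEMA):
        print(expr)
```

JSONDecoder still does all the real parsing and error reporting here; the glue only has to understand strings and comments, which is the trade-off described above.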
