FWIW, I ended up writing some code that does a best-effort conversion of the named list into a dict representation. Where it can't (because keys repeat), it keeps the entries as a list of Python tuples:
    def every_other_zipped(lst):
        """Pair up adjacent items: [k1, v1, k2, v2] -> (k1, v1), (k2, v2)."""
        return zip(lst[0::2], lst[1::2])


    def dictify(nl_tups):
        """Return a dict if all keys are unique; otherwise return the list of
        (key, value) tuples unchanged, so repeated keys are not lost."""
        as_dict = dict(nl_tups)
        if len(as_dict) == len(nl_tups):
            return as_dict
        return nl_tups


    def parse_named_list(lst):
        """Recursively convert a flat Solr NamedList into dicts/tuples."""
        nl_as_tups = []
        for key, value in every_other_zipped(lst):
            # Heuristic: any nested list is assumed to be another NamedList
            if isinstance(value, list):
                value = parse_named_list(value)
            nl_as_tups.append((key, value))
        return dictify(nl_as_tups)


    if __name__ == "__main__":
        solr_nl = [
            "D100000", [
                "uniqueKey", "D100000",
                "body", [
                    "1", [
                        "positions", [
                            "position", 92,
                            "position", 113
                        ]
                    ],
                    "2", [
                        "positions", [
                            "position", 22,
                            "position", 413
                        ]
                    ]
                ]
            ]
        ]
        print(repr(parse_named_list(solr_nl)))

Outputs (pretty-printed):

    {
      'D100000': {
        'uniqueKey': 'D100000',
        'body': {
          '1': {'positions': [('position', 92), ('position', 113)]},
          '2': {'positions': [('position', 22), ('position', 413)]}
        }
      }
    }

On Thu, Feb 6, 2020 at 12:59 PM Edward Ribeiro <edward.ribe...@gmail.com> wrote:

> Python's json lib converts text such as '{"id": 1, "id": 2}' to a dict,
> which doesn't allow duplicate keys.
> The solution in this case is to inject your own parsing logic, as explained
> here:
>
> https://stackoverflow.com/questions/29321677/python-json-parser-allow-duplicate-keys
>
> One possible solution (below) is to turn the duplicate keys into key/list
> pairs:
>
> from json import JSONDecoder
>
> jsonStr = '{"positions": {"position": 155,"position": 844,"position": 1726}}'
>
> def dict_treat_duplicates(ordered_pairs):
>     d = {}
>     for k, v in ordered_pairs:
>         if k in d:
>             # duplicate key: collect every value in a list
>             prev_v = d[k]
>             if isinstance(prev_v, list):
>                 # already a list: append to it
>                 # (caveat: a first value that was itself a JSON array lands here too)
>                 prev_v.append(v)
>             else:
>                 # second occurrence: turn into a list
>                 d[k] = [prev_v, v]
>         else:
>             d[k] = v
>     return d
>
> decoder = JSONDecoder(object_pairs_hook=dict_treat_duplicates)
> decoder.decode(jsonStr)
>
> will give you {'positions': {'position': [155, 844, 1726]}}, while using
>
> def keep_pairs(ordered_pairs):
>     # skip building a dict and return the raw (key, value) pairs
>     return ordered_pairs
>
> as the object_pairs_hook will give you
> [('positions', [('position', 155), ('position', 844), ('position', 1726)])]
>
> Best,
> Edward
>
> On Thu, Feb 6, 2020 at 1:57 PM Doug Turnbull <dturnb...@opensourceconnections.com> wrote:
>
> > Well, that is interesting, I did not know that! Thanks Walter...
> >
> > https://stackoverflow.com/questions/21832701/does-json-syntax-allow-duplicate-keys-in-an-object
> >
> > I gave it a go in Python (what I'm using) to see what would happen, and
> > indeed it gives some odd behavior:
> >
> > In [4]: jsonStr = ' {"test": 1, "test": 2} '
> >
> > In [5]: json.loads(jsonStr)
> >
> > Out[5]: {'test': 2}
> >
> > On Thu, Feb 6, 2020 at 11:49 AM Walter Underwood <wun...@wunderwood.org>
> > wrote:
> >
> > > Repeated keys are quite legal in JSON, but many libraries don’t support
> > > that.
> > >
> > > It does look like that data layout could be redesigned to be more
> > > portable.
> > >
> > > wunder
> > > Walter Underwood
> > > wun...@wunderwood.org
> > > http://observer.wunderwood.org/  (my blog)
> > >
> > > > On Feb 6, 2020, at 8:38 AM, Doug Turnbull <dturnb...@opensourceconnections.com> wrote:
> > > >
> > > > Thanks for the tip,
> > > >
> > > > The issue is that json.nl=map produces non-standard JSON with duplicate
> > > > keys. Solr generates the following, which JSON lint tools reject because
> > > > of the repeated keys:
> > > >
> > > > {
> > > >   "positions": {
> > > >     "position": 155,
> > > >     "position": 844,
> > > >     "position": 1726
> > > >   }
> > > > }
> > > >
> > > > On Thu, Feb 6, 2020 at 11:36 AM Munendra S N <sn.munendr...@gmail.com>
> > > > wrote:
> > > >
> > > >>>
> > > >>> Notice the lists within lists within lists, where the keys are
> > > >>> adjacent items in the list. Is there a reason this isn't a JSON
> > > >>> dictionary?
> > > >>>
> > > >> I think this is because of NamedList. Have you tried using json.nl=map
> > > >> as a query parameter for this case?
> > > >>
> > > >> Regards,
> > > >> Munendra S N
> > > >>
> > > >> On Thu, Feb 6, 2020 at 10:01 PM Doug Turnbull <dturnb...@opensourceconnections.com> wrote:
> > > >>
> > > >>> Hi all,
> > > >>>
> > > >>> I was curious whether anyone had any tips on parsing the JSON response
> > > >>> of the term vectors component, or any way to force it to be more
> > > >>> standard JSON. It appears to be very heavily nested, idiosyncratic
> > > >>> JSON, such as below.
> > > >>>
> > > >>> Notice the lists within lists within lists, where the keys are
> > > >>> adjacent items in the list. Is there a reason this isn't a JSON
> > > >>> dictionary? Instead you have to build a stateful list parser that
> > > >>> just seems prone to errors...
> > > >>>
> > > >>> Any thoughts or ideas are very welcome. I probably just need to do
> > > >>> something rather simple here...
> > > >>>
> > > >>> "termVectors": [
> > > >>>   "D100000", [
> > > >>>     "uniqueKey", "D100000",
> > > >>>     "body", [
> > > >>>       "1", [
> > > >>>         "positions", [
> > > >>>           "position", 92,
> > > >>>           "position", 113
> > > >>>         ]
> > > >>>       ],
> > > >>>       "10", [ ...
> > > >>>
> > > >>> --
> > > >>> Doug Turnbull | CTO | OpenSource Connections
> > > >>> <http://opensourceconnections.com>, LLC | 240.476.9983
> > > >>> Author: Relevant Search <http://manning.com/turnbull>
> > > >>> This e-mail and all contents, including attachments, is considered to
> > > >>> be Company Confidential unless explicitly stated otherwise, regardless
> > > >>> of whether attachments are marked as such.
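Munendra's json.nl=map suggestion is what produces the duplicate "position" keys Doug ran into. Solr's JSON response writer also accepts json.nl=arrarr, which renders each NamedList as an array of [name, value] pairs. With that format, repeated names survive any standard JSON parser and no custom hook is needed.

Here is a minimal sketch under stated assumptions:

- The host, collection name and /tvrh handler are hypothetical.
- The sample is an assumed arrarr rendering of the termVectors snippet, not captured Solr output.
- pairs_to_python is a hypothetical helper that applies the same unique-keys-or-tuples rule as parse_named_list.

```python
import json
from urllib.parse import urlencode

# Hypothetical Solr location: adjust host, collection and handler to your setup.
SOLR_TVRH_URL = "http://localhost:8983/solr/mycollection/tvrh"

params = {
    "q": "id:D100000",
    "tv.positions": "true",
    "wt": "json",
    "json.nl": "arrarr",  # render NamedLists as [[name, value], ...]
}
request_url = SOLR_TVRH_URL + "?" + urlencode(params)

# Assumed arrarr rendering of the termVectors snippet (not real Solr output).
SAMPLE_TERM_VECTORS = """
[["D100000",
  [["uniqueKey", "D100000"],
   ["body",
    [["1",
      [["positions",
        [["position", 92], ["position", 113]]]]]]]]]]
"""


def pairs_to_python(value):
    """Hypothetical helper: convert [[name, value], ...] lists to dicts.

    Heuristic: a non-empty list whose items are all 2-element lists with a
    string first element is treated as a NamedList. If names repeat (such
    as "position"), the entries stay a list of (name, value) tuples.
    """
    if not isinstance(value, list):
        return value
    is_named_list = bool(value) and all(
        isinstance(item, list) and len(item) == 2 and isinstance(item[0], str)
        for item in value
    )
    if not is_named_list:
        # Plain array: just convert its elements
        return [pairs_to_python(item) for item in value]
    pairs = [(name, pairs_to_python(val)) for name, val in value]
    as_dict = dict(pairs)
    # Same rule as dictify(): only collapse to a dict when no names repeat
    return as_dict if len(as_dict) == len(pairs) else pairs


if __name__ == "__main__":
    print(request_url)
    print(pairs_to_python(json.loads(SAMPLE_TERM_VECTORS)))
```

With arrarr, the repeated "position" entries arrive as ordered pairs, so json.loads' last-key-wins behavior never comes into play. The trade-off is that NamedLists with unique names also arrive as nested arrays, which is why the helper still has to guess which lists are NamedLists.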