FWIW, I ended up writing some code that makes a best-effort attempt at turning
the named list into a dict representation. Where it can't (because keys repeat
at some level), it keeps that level as a list of Python tuples.

def every_other_zipped(lst):
    """Pair up adjacent items: [k1, v1, k2, v2] -> (k1, v1), (k2, v2)."""
    return zip(lst[0::2], lst[1::2])

def dictify(nl_tups):
    """Return a dict if all keys are unique; otherwise
    return the list of (key, value) tuples unmodified."""
    as_dict = dict(nl_tups)
    if len(as_dict) == len(nl_tups):
        return as_dict
    return nl_tups

def parse_named_list(lst):
    nl_as_tups = []
    for key, value in every_other_zipped(lst):
        # Nested lists are assumed to be named lists themselves
        if isinstance(value, list):
            value = parse_named_list(value)
        nl_as_tups.append((key, value))
    return dictify(nl_as_tups)



if __name__ == "__main__":
    solr_nl = [
        "D100000", [
            "uniqueKey", "D100000",
            "body", [
                "1", [
                    "positions", [
                        "position", 92,
                        "position", 113
                    ]
                ],
                "2", [
                    "positions", [
                        "position", 22,
                        "position", 413
                    ]
                ]
            ]
        ]
    ]
    print(repr(parse_named_list(solr_nl)))



Outputs (reformatted here for readability):

{
'D100000': {
'uniqueKey': 'D100000',
'body': {
'1': {
'positions': [('position', 92), ('position', 113)]
},
'2': {
'positions': [('position', 22), ('position', 413)]
}
}
}
}
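
A possible variation, sketched here rather than tested against a real Solr
response: instead of falling back to tuples when keys repeat, collect the
repeated keys into a list, in the spirit of the dict_treat_duplicates hook
quoted below. The names pairs and named_list_to_dict are made up for this
illustration, and like the code above it assumes every nested list is itself
a named list with an even number of items.

```python
def pairs(flat):
    """Pair up adjacent items of a flat [k1, v1, k2, v2, ...] list."""
    return zip(flat[0::2], flat[1::2])


def named_list_to_dict(flat):
    """Recursively convert a flat named list into a dict.

    A key that occurs more than once maps to a list of its values,
    in the order they appeared.
    """
    result = {}
    repeated = set()  # keys whose value has already been promoted to a list
    for key, value in pairs(flat):
        # Assumption: any list value is a nested named list
        if isinstance(value, list):
            value = named_list_to_dict(value)
        if key not in result:
            result[key] = value
        elif key in repeated:
            result[key].append(value)
        else:
            result[key] = [result[key], value]
            repeated.add(key)
    return result


term_vectors = [
    "D100000", [
        "uniqueKey", "D100000",
        "body", [
            "1", ["positions", ["position", 92, "position", 113]],
            "2", ["positions", ["position", 22, "position", 413]],
        ],
    ],
]

converted = named_list_to_dict(term_vectors)
print(converted["D100000"]["body"]["1"]["positions"])
# prints {'position': [92, 113]}
```

The trade-off is that a key seen once maps to a bare value while a key seen
several times maps to a list, so callers have to handle both shapes.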


On Thu, Feb 6, 2020 at 12:59 PM Edward Ribeiro <edward.ribe...@gmail.com>
wrote:

> Python's json lib will convert text such as '{"id": 1, "id": 2}' to a dict,
> which doesn't allow duplicate keys. The solution in this case is to inject
> your own parsing logic, as explained here:
>
> https://stackoverflow.com/questions/29321677/python-json-parser-allow-duplicate-keys
>
> One possible solution (below) is to turn the duplicate keys into key-list
> pairs:
>
> from json import JSONDecoder
>
> jsonStr = '{"positions": {"position": 155, "position": 844, "position": 1726}}'
>
> def dict_treat_duplicates(ordered_pairs):
>     d = {}
>     for k, v in ordered_pairs:
>         if k in d:
>             # duplicate key
>             prev_v = d[k]
>             if isinstance(prev_v, list):
>                 # append to the existing list
>                 prev_v.append(v)
>             else:
>                 # turn into a list
>                 d[k] = [prev_v, v]
>         else:
>             d[k] = v
>     return d
>
> decoder = JSONDecoder(object_pairs_hook=dict_treat_duplicates)
> decoder.decode(jsonStr)
>
> will give you {'positions': {'position': [155, 844, 1726]}}, while
>
> def dict_raise_on_duplicates(ordered_pairs):
>       return ordered_pairs
>
> will give you [('positions', [('position', 155), ('position', 844),
> ('position', 1726)])]
>
> Best,
> Edward
>
> On Thu, Feb 6, 2020 at 1:57 PM Doug Turnbull <
> dturnb...@opensourceconnections.com> wrote:
> >
> > Well that is interesting, I did not know that! Thanks Walter...
> >
> > https://stackoverflow.com/questions/21832701/does-json-syntax-allow-duplicate-keys-in-an-object
> >
> > I gave it a go in Python (what I'm using) to see what would happen; indeed,
> > it gives some odd behavior:
> >
> > In [4]: jsonStr = ' {"test": 1, "test": 2} '
> >
> >
> > In [5]: json.loads(jsonStr)
> >
> > Out[5]: {'test': 2}
> >
> > On Thu, Feb 6, 2020 at 11:49 AM Walter Underwood <wun...@wunderwood.org>
> > wrote:
> >
> > > Repeated keys are quite legal in JSON, but many libraries don’t support
> > > that.
> > >
> > > It does look like that data layout could be redesigned to be more
> > > portable.
> > >
> > > wunder
> > > Walter Underwood
> > > wun...@wunderwood.org
> > > http://observer.wunderwood.org/  (my blog)
> > >
> > > > On Feb 6, 2020, at 8:38 AM, Doug Turnbull <
> > > > dturnb...@opensourceconnections.com> wrote:
> > > >
> > > > Thanks for the tip,
> > > >
> > > > The issue is that json.nl=map produces non-standard JSON with duplicate
> > > > keys. Solr generates the following, which JSON lint rejects because of the
> > > > repeated keys:
> > > >
> > > > {
> > > > "positions": {
> > > > "position": 155,
> > > > "position": 844,
> > > > "position": 1726
> > > > }
> > > > }
> > > >
> > > > On Thu, Feb 6, 2020 at 11:36 AM Munendra S N <sn.munendr...@gmail.com>
> > > > wrote:
> > > >
> > > > >>>
> > > > >>> Notice the lists, within lists, within lists. Where the keys are adjacent
> > > > >>> items in the list. Is there a reason this isn't a JSON dictionary?
> > > > >>>
> > > > >> I think this is because of NamedList. Have you tried using json.nl=map as
> > > > >> a query parameter for this case?
> > > >>
> > > >> Regards,
> > > >> Munendra S N
> > > >>
> > > >>
> > > >>
> > > >> On Thu, Feb 6, 2020 at 10:01 PM Doug Turnbull <
> > > >> dturnb...@opensourceconnections.com> wrote:
> > > >>
> > > > >>> Hi all,
> > > > >>>
> > > > >>> I was curious if anyone had any tips on parsing the JSON response of the
> > > > >>> term vectors component? Or any way to force it to be more standard JSON?
> > > > >>> It appears to be very heavily nested and idiosyncratic JSON, such as below.
> > > > >>>
> > > > >>> Notice the lists, within lists, within lists. Where the keys are adjacent
> > > > >>> items in the list. Is there a reason this isn't a JSON dictionary? Instead
> > > > >>> you have to build a stateful list parser that just seems prone to errors...
> > > > >>>
> > > > >>> Any thoughts or ideas are very welcome, I probably just need to do
> > > > >>> something rather simple here...
> > > >>>
> > > >>> "termVectors": [
> > > >>> "D100000", [
> > > >>> "uniqueKey", "D100000",
> > > >>> "body", [
> > > >>> "1", [
> > > >>> "positions", [
> > > >>> "position", 92,
> > > >>> "position", 113
> > > >>> ]
> > > >>> ],
> > > >>> "10", [ ...
> > > >>>
> > > >>> --
> > > >>> *Doug Turnbull **| CTO* | OpenSource Connections
> > > >>> <http://opensourceconnections.com>, LLC | 240.476.9983
> > > >>> Author: Relevant Search <http://manning.com/turnbull>
> > > > >>> This e-mail and all contents, including attachments, is considered to be
> > > > >>> Company Confidential unless explicitly stated otherwise, regardless
> > > > >>> of whether attachments are marked as such.
> > > >>>
> > > >>
> > > >
> > > >
> > >
> > >
> >
>


-- 
*Doug Turnbull **| CTO* | OpenSource Connections
<http://opensourceconnections.com>, LLC | 240.476.9983
Author: Relevant Search <http://manning.com/turnbull>
This e-mail and all contents, including attachments, is considered to be
Company Confidential unless explicitly stated otherwise, regardless
of whether attachments are marked as such.
