The primary recommendation is that you flatten nested documents. That means one Solr document per CPC entry, rather than one patent document with multivalued CPC fields.
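
To make the suggestion concrete, here is a minimal sketch in plain Python (no Solr client involved) of what flattening could look like for the sample data quoted below, and why it avoids the false matches. The patent "id" value, the "patent_id" field, the "cpc_*" field names, and all three helper functions are hypothetical and only for illustration. The multivalued matcher is a simplified model of how independent multivalued fields behave, not Solr's actual query code.

```python
def flatten_patent(patent):
    """Build one flat document per CPC entry (hypothetical field names)."""
    docs = []
    for cpc in patent["cpc"]:
        doc = {
            # Assumed unique key: patent id plus the CPC sequence number.
            "id": f"{patent['id']}_{cpc['sequence']}",
            "patent_id": patent["id"],
        }
        for key, value in cpc.items():
            doc["cpc_" + key.replace("-", "_")] = value
        docs.append(doc)
    return docs


def multivalued_match(patent, query):
    """Simplified model of multivalued fields: each clause may be
    satisfied by a different CPC entry of the same patent."""
    return all(
        any(cpc[field] == value for cpc in patent["cpc"])
        for field, value in query.items()
    )


def flattened_match(docs, query):
    """With flattened documents, all clauses must hold on the same document."""
    return any(
        all(doc["cpc_" + field.replace("-", "_")] == value
            for field, value in query.items())
        for doc in docs
    )


# Sample data from the quoted message; the "id" value is made up.
patent = {
    "id": "EXAMPLE-1",
    "cpc": [
        {"class": "61", "section": "A", "sequence": "1", "subclass": "K",
         "subgroup": "06", "main-group": "45", "classification-value": "I"},
        {"class": "61", "section": "A", "sequence": "2", "subclass": "K",
         "subgroup": "506", "main-group": "31", "classification-value": "I"},
    ],
}

# main-group 45 belongs to the first entry, subgroup 506 to the second,
# so no single CPC code on this patent has this combination.
query = {"main-group": "45", "subgroup": "506"}

docs = flatten_patent(patent)
print(multivalued_match(patent, query))  # True: a false positive
print(flattened_match(docs, query))      # False: no single entry matches
```

If you need patents rather than CPC entries in the result list, one option is to collapse or group the flattened hits on the "patent_id" field assumed above.
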
As always, queries should drive your data model, so please describe a typical query in plain English.

-- Jack Krupansky

On Tue, Nov 24, 2015 at 4:39 AM, István <lecc...@gmail.com> wrote:

> Hi all,
>
> I would like to find documents in a key-value store (Riak) with Solr and I
> am running into a challenge. I have nested JSON documents with patent
> information. Patents have one or many CPC (
> http://www.cooperativepatentclassification.org/index.html) codes, something
> like these:
>
> {
>
>   // more data
>
>   "cpc": [
>     {
>       "class": "61",
>       "section": "A",
>       "sequence": "1",
>       "subclass": "K",
>       "subgroup": "06",
>       "main-group": "45",
>       "classification-value": "I"
>     },
>     {
>       "class": "61",
>       "section": "A",
>       "sequence": "2",
>       "subclass": "K",
>       "subgroup": "506",
>       "main-group": "31",
>       "classification-value": "I"
>     }
>   ]
>
> }
>
> I would like to find the documents that match a certain CPC code,
> sometimes with a partial code, sometimes with the full code. I used the
> following schema to index the documents:
>
> <field name="cpc.class" type="int" indexed="true"
>        stored="true" multiValued="true" />
> <field name="cpc.section" type="string" indexed="true"
>        stored="true" multiValued="true" />
> <field name="cpc.sequence" type="int" indexed="true"
>        stored="true" multiValued="true" />
> <field name="cpc.subclass" type="string" indexed="true"
>        stored="true" multiValued="true" />
> <field name="cpc.subgroup" type="int" indexed="true"
>        stored="true" multiValued="true" />
> <field name="cpc.main-group" type="int" indexed="true"
>        stored="true" multiValued="true" />
> <field name="cpc.classification-value" type="string" indexed="true"
>        stored="true" multiValued="true" />
>
> The problem with this approach is that when we query a certain combination
> of partial CPC codes, it returns documents that don't actually match that
> combination.
>
> This behavior is described in this blog post:
>
> http://blog.griddynamics.com/2011/06/solr-experience-search-parent-child.html
>
> My understanding is that I need to apply termPositions="true" to the field
> definition, and then Solr maintains the position information and will
> return only the documents that actually match the combination of the
> partial CPC codes. Am I on the right track with this, or is there a better
> solution to query nested documents with partial codes?
>
> Thank you in advance,
> Istvan
>
> PS: I also posted this on Stackoverflow:
>
> http://stackoverflow.com/questions/33724556/how-to-index-an-array-of-hashes-with-solr
>
> --
> the sun shines for all
>