Hello Istvan,

- When you flatten subdocuments, you can concatenate the fields that are needed for retrieval into a single value, e.g. "K-06-45". That solves retrieval, but it isn't really flexible.
- Term positions are not easier to implement. If you really prefer this way, I'd suggest looking at http://siren.solutions/siren/overview/ I haven't tried it, but it sounds like they implemented this approach.
- If you follow the recent blog post, you'll see our favorite approach: http://blog.griddynamics.com/2013/09/solr-block-join-support.html
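To make the first point concrete, here is a minimal Python sketch (plain Python, no Solr involved) of why parallel multiValued fields cross-match and how a concatenated value avoids it. The function names and the "A-61-K-45-06" token layout (section to subgroup, joined with "-") are my assumptions for illustration, not something Solr or Riak prescribes.

```python
# Two CPC entries taken from the example patent in the quoted message.
cpc = [
    {"section": "A", "class": "61", "subclass": "K", "main-group": "45", "subgroup": "06"},
    {"section": "A", "class": "61", "subclass": "K", "main-group": "31", "subgroup": "506"},
]

# Assumed ordering: most general part first, so partial codes become prefixes.
ORDER = ("section", "class", "subclass", "main-group", "subgroup")


def flatten_separately(entries):
    """Mimic parallel multiValued fields: one value set per field name.

    The link between values that came from the same entry is lost.
    """
    fields = {}
    for entry in entries:
        for name, value in entry.items():
            fields.setdefault("cpc." + name, set()).add(value)
    return fields


def concatenate(entries):
    """Build one combined token per entry (hypothetical layout)."""
    return ["-".join(entry[key] for key in ORDER) for entry in entries]


separate = flatten_separately(cpc)
codes = concatenate(cpc)

# main-group 45 and subgroup 506 never occur in the same entry, yet the
# per-field check matches because each value exists somewhere in the document.
false_hit = "45" in separate["cpc.main-group"] and "506" in separate["cpc.subgroup"]

# The concatenated tokens keep each entry's values together.
true_hit = any(code.endswith("-45-506") for code in codes)

# A partial code is a prefix check; the trailing "-" stops "45" matching "450".
prefix_hit = any(code.startswith("A-61-K-45-") for code in codes)

print(codes)       # ['A-61-K-45-06', 'A-61-K-31-506']
print(false_hit)   # True  (the unwanted cross-match)
print(true_hit)    # False
print(prefix_hit)  # True
```

The inflexibility shows up as soon as a query skips a part in the middle of the token, for example subclass K with subgroup 506 regardless of main group: that no longer maps to a simple prefix.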
Query-time join ({!join}) and field collapsing are also alternatives to consider.

On Tue, Nov 24, 2015 at 12:39 PM, István <lecc...@gmail.com> wrote:

> Hi all,
>
> I would like to find documents in a key-value store (Riak) with Solr and I
> am running into a challenge. I have nested JSON documents with patent
> information. Patents have one or more CPC (
> http://www.cooperativepatentclassification.org/index.html) codes, something
> like these:
>
> {
>
>   // more data
>
>   "cpc": [
>     {
>       "class": "61",
>       "section": "A",
>       "sequence": "1",
>       "subclass": "K",
>       "subgroup": "06",
>       "main-group": "45",
>       "classification-value": "I"
>     },
>     {
>       "class": "61",
>       "section": "A",
>       "sequence": "2",
>       "subclass": "K",
>       "subgroup": "506",
>       "main-group": "31",
>       "classification-value": "I"
>     }
>   ]
>
> }
>
> I would like to find the documents that match a certain CPC code,
> sometimes with a partial code, sometimes with the full code. I used the
> following schema to index the documents:
>
> <field name="cpc.class" type="int" indexed="true"
>        stored="true" multiValued="true" />
> <field name="cpc.section" type="string" indexed="true"
>        stored="true" multiValued="true" />
> <field name="cpc.sequence" type="int" indexed="true"
>        stored="true" multiValued="true" />
> <field name="cpc.subclass" type="string" indexed="true"
>        stored="true" multiValued="true" />
> <field name="cpc.subgroup" type="int" indexed="true"
>        stored="true" multiValued="true" />
> <field name="cpc.main-group" type="int" indexed="true"
>        stored="true" multiValued="true" />
> <field name="cpc.classification-value" type="string" indexed="true"
>        stored="true" multiValued="true" />
>
>
> The problem with this approach is that when we query a certain combination
> of partial CPC codes, it returns documents that don't actually match that
> combination.
>
> This behavior is described in this blog post:
>
> http://blog.griddynamics.com/2011/06/solr-experience-search-parent-child.html
>
> My understanding is that I need to apply termPositions="true" to the field
> definition, and then Solr maintains the position information and will
> return only the documents that actually match the combination of the
> partial CPC codes. Am I on the right track with this, or is there a better
> solution to query nested documents with partial codes?
>
> Thank you in advance,
> Istvan
>
> PS: I also posted this on Stack Overflow:
>
> http://stackoverflow.com/questions/33724556/how-to-index-an-array-of-hashes-with-solr
>
> --
> the sun shines for all
>

--
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics

<http://www.griddynamics.com>
<mkhlud...@griddynamics.com>