The primary recommendation is that you flatten nested documents.

That means one Solr document per CPC entry, with single-valued fields rather
than multivalued ones.
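
A minimal sketch of that flattening in Python, using the two CPC entries from
the message quoted below. The "id" value, the "patent_id" field, and the
"cpc_" field prefix are assumptions made for illustration; they are not part
of the original schema. The two match helpers only imitate, in plain Python,
how independent multivalued fields lose the pairing between values, while one
document per CPC entry keeps it.

```python
def flatten_cpc(patent):
    """Return one flat dict per CPC entry of a nested patent record."""
    docs = []
    for entry in patent.get("cpc", []):
        doc = {
            # Hypothetical unique key: patent id plus the CPC sequence number.
            "id": f'{patent["id"]}_cpc{entry["sequence"]}',
            # Hypothetical back-reference to the owning patent.
            "patent_id": patent["id"],
        }
        for key, value in entry.items():
            # "main-group" becomes "cpc_main_group", and so on.
            doc["cpc_" + key.replace("-", "_")] = value
        docs.append(doc)
    return docs


def multivalued_match(patent, wanted):
    """Each condition may be satisfied by a different CPC entry."""
    return all(
        any(entry[key] == value for entry in patent["cpc"])
        for key, value in wanted.items()
    )


def flattened_match(docs, wanted):
    """All conditions must be satisfied by the same flattened document."""
    return any(
        all(doc["cpc_" + key.replace("-", "_")] == value
            for key, value in wanted.items())
        for doc in docs
    )


patent = {
    "id": "P1",  # assumed identifier, not present in the original JSON
    "cpc": [
        {"class": "61", "section": "A", "sequence": "1", "subclass": "K",
         "subgroup": "06", "main-group": "45", "classification-value": "I"},
        {"class": "61", "section": "A", "sequence": "2", "subclass": "K",
         "subgroup": "506", "main-group": "31", "classification-value": "I"},
    ],
}

docs = flatten_cpc(patent)

# main-group 45 belongs to entry 1, subgroup 506 to entry 2:
# no single CPC code has both.
wanted = {"main-group": "45", "subgroup": "506"}
print(multivalued_match(patent, wanted))  # True  (false positive)
print(flattened_match(docs, wanted))      # False (correct)
```

Each dict in docs would then be indexed as its own Solr document.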

As always, queries should drive your data model, so please specify what a
typical query might be like, in plain English.
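
For instance, if a typical query turned out to be "find patents classified
under section A, class 61, subclass K", a partial-code filter against
flattened documents could be sketched as below. This is only an illustration:
the "cpc_"-prefixed field names are hypothetical, and the helper assumes the
values contain no double quotes.

```python
def cpc_filter(parts):
    """Build an AND query over flattened CPC fields from a partial code."""
    clauses = [
        # Hypothetical field naming: "main-group" -> cpc_main_group.
        f'cpc_{name.replace("-", "_")}:"{value}"'
        for name, value in parts.items()
        if value is not None  # skip the parts the user left out
    ]
    return " AND ".join(clauses)


query = cpc_filter({"section": "A", "class": "61", "subclass": "K"})
print(query)  # cpc_section:"A" AND cpc_class:"61" AND cpc_subclass:"K"
```

Because every clause is evaluated against a single flattened document, the
parts cannot be satisfied by different CPC entries. The hits are CPC-level
documents, so they would still need to be collapsed per patent, for example
with Solr result grouping on a patent id field.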

-- Jack Krupansky

On Tue, Nov 24, 2015 at 4:39 AM, István <lecc...@gmail.com> wrote:

> Hi all,
>
> I would like to find documents in a key-value store (Riak) with Solr and I
> am running into a challenge. I have nested JSON documents with patent
> information. Patents have one or more CPC (
> http://www.cooperativepatentclassification.org/index.html) codes something
> like these:
>
> {
>
> // more data
>
> "cpc": [
>     {
>       "class": "61",
>       "section": "A",
>       "sequence": "1",
>       "subclass": "K",
>       "subgroup": "06",
>       "main-group": "45",
>       "classification-value": "I"
>     },
>     {
>       "class": "61",
>       "section": "A",
>       "sequence": "2",
>       "subclass": "K",
>       "subgroup": "506",
>       "main-group": "31",
>       "classification-value": "I"
>     }
> ]
>
> }
>
> I would like to find the documents that match a certain CPC code,
> sometimes with a partial code, sometimes with the full code. I used the
> following schema to index the documents:
>
> <field name="cpc.class"                 type="int"    indexed="true"
> stored="true" multiValued="true" />
> <field name="cpc.section"               type="string" indexed="true"
> stored="true" multiValued="true" />
> <field name="cpc.sequence"              type="int"    indexed="true"
> stored="true" multiValued="true" />
> <field name="cpc.subclass"              type="string" indexed="true"
> stored="true" multiValued="true" />
> <field name="cpc.subgroup"              type="int"    indexed="true"
> stored="true" multiValued="true" />
> <field name="cpc.main-group"            type="int"    indexed="true"
> stored="true" multiValued="true" />
> <field name="cpc.classification-value"  type="string" indexed="true"
> stored="true" multiValued="true" />
>
>
> The problem with this approach is that when we query a certain combination
> of partial CPC codes, it returns documents that don't actually match that
> combination.
>
> This behavior is described in this blog post:
>
>
> http://blog.griddynamics.com/2011/06/solr-experience-search-parent-child.html
>
> My understanding is that I need to apply termPositions="true" to the field
> definition, and then Solr maintains the position information and will
> return only the documents that actually match the combination of the
> partial CPC codes. Am I on the right track with this, or is there a better
> solution for querying nested documents with partial codes?
>
> Thank you in advance,
> Istvan
>
> PS: I also posted this on Stackoverflow:
>
> http://stackoverflow.com/questions/33724556/how-to-index-an-array-of-hashes-with-solr
>
> --
> the sun shines for all
>
