Hi,

This is a sample doc:

<doc>
  <field name="doc_type">parent</field>
  <field name="item">shirt</field>
  <doc>
    <field name="doc_type">child</field>
    <field name="c_COLOR">Red</field>
    <field name="c_SIZE">XL</field>
    <field name="c_PRICE">6</field>
  </doc>
  <field name="p_COLOR">Red</field>
  <field name="p_SIZE">XL</field>
  <field name="p_PRICE">6</field>
</doc>
The parent doc represents an item/object, and the nested doc contains extended properties of that object. When searching, nested docs are filtered out so that the result count is correct. Because of that, every nested doc field had to be duplicated in the parent doc (the p_* fields above mirror the c_* fields).

This duplication has made the Solr index huge, so I am planning to drop the duplicated fields and use block join for the nested doc fields instead. That causes another serious problem: if the value I am searching for exists only in a nested doc, no results are found, because nested docs are always filtered out. This used to work because, even though the nested doc was filtered out, the parent doc still matched through its duplicated fields and was returned.

I have come up with 2 approaches to solve this.

1. Include a global field while indexing: for each field in the nested doc, add its value to a global field in the parent doc.

   <doc>
     <field name="doc_type">parent</field>
     <doc>
       <field name="doc_type">child</field>
       <field name="c_COLOR">Red</field>
       <field name="c_SIZE">XL</field>
       <field name="c_PRICE">6</field>
     </doc>
     <field name="global">Red</field>
     <field name="global">XL</field>
     <field name="global">6</field>
   </doc>

2. Use a new copy field: the nested doc field names follow a pattern (c_*) that no other fields use, so I can easily create another copy field, c_global, that contains only the nested doc field values. Add this to the schema:

   <copyField source="c_*" dest="c_global"/>

   Then, at query time, I use block join on this copy field along with the existing global field, like so:

   global:(red) OR {!parent which=doc_type:parent}c_global:(red)

3. I also came across another approach (a hack) by accident. I had modified the schema to remove the duplicated parent fields, but the data I used for reindexing still contained them. As a result, the global field contains values from both the parent (p_*) and the nested (c_*) fields, while the indexed doc itself drops the p_* fields because the schema no longer defines them.
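For approach 1, the global values have to be added by whatever code builds the update XML before it is sent to Solr. Below is a minimal Python sketch of that step, assuming the nested doc fields are exactly the ones named c_* and that updates are sent in the nested <doc> XML format shown above. The function name add_global_fields is hypothetical, not part of Solr.

```python
import xml.etree.ElementTree as ET

# Assumption: nested (child) doc fields are exactly those named c_*.
CHILD_PREFIX = "c_"
GLOBAL_FIELD = "global"

# The approach 1 parent doc, before the global fields are added.
SAMPLE_DOC = """<doc>
  <field name="doc_type">parent</field>
  <doc>
    <field name="doc_type">child</field>
    <field name="c_COLOR">Red</field>
    <field name="c_SIZE">XL</field>
    <field name="c_PRICE">6</field>
  </doc>
</doc>"""


def add_global_fields(parent_xml: str) -> str:
    """Copy every nested-doc c_* value into a multi-valued global field on the parent."""
    parent = ET.fromstring(parent_xml)
    values = []
    # Direct <doc> children of the parent are its nested (child) docs.
    for child in parent.findall("doc"):
        for field in child.findall("field"):
            if field.get("name", "").startswith(CHILD_PREFIX):
                values.append(field.text or "")
    # Append one <field name="global"> per value, after the nested docs.
    for value in values:
        ET.SubElement(parent, "field", name=GLOBAL_FIELD).text = value
    return ET.tostring(parent, encoding="unicode")


if __name__ == "__main__":
    print(add_global_fields(SAMPLE_DOC))
```

Run on the approach 1 doc, this appends one global field per c_* value (Red, XL, 6) after the nested doc, matching the example; the nested doc itself is left unchanged.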
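For approach 2, indexing only needs the copyField; the work moves to the query. A hedged Python sketch that builds the combined query from approach 2 for an arbitrary search term and URL-encodes it as the q parameter of a select request. build_query, escape_term and select_params are hypothetical helpers, and the escaped character set is the usual Lucene query-syntax set, which is an assumption worth checking against your Solr version.

```python
from urllib.parse import urlencode

# Characters with special meaning in Lucene/Solr query syntax (assumed set).
# Whitespace is deliberately NOT escaped, so "red shirt" stays two terms.
SPECIAL_CHARS = set('\\+-!():^[]"{}~*?|&/')


def escape_term(term: str) -> str:
    """Backslash-escape query-syntax characters in a user-supplied term."""
    return "".join("\\" + ch if ch in SPECIAL_CHARS else ch for ch in term)


def build_query(term: str) -> str:
    """Approach 2: match the parent's global field OR any child's c_global field.

    The {!parent which=doc_type:parent} block join maps matching child docs
    back to their parent docs.
    """
    t = escape_term(term)
    return f"global:({t}) OR {{!parent which=doc_type:parent}}c_global:({t})"


def select_params(term: str) -> str:
    """URL-encode the combined query as the q parameter."""
    return urlencode({"q": build_query(term)})


if __name__ == "__main__":
    print(build_query("red"))
    print(select_params("red"))
```

Building both clauses from one escaped term keeps the global and c_global sides of the OR searching the same value.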
With approach 3, I was able to search on nested doc field values, and the total index size was smaller than with either of the other two approaches.

Can someone please suggest which is the better option, and why?

Thanks!
Soham

--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html