Dear Solr gurus,

I'm having a hard time using block-join queries on nested documents with multi-select facets. We currently index products with variations as nested documents, as follows:

  Product-1: t-shirt => brand:Nike doc_type:0 ...
    - SKU-A => size:S color:blue doc_type:1 text-fields:"small t-shirt ..."
    - SKU-B => size:M color:blue doc_type:1 text-fields:"medium t-shirt ..."
    - SKU-C => size:L color:red doc_type:1 text-fields:"large t-shirt ..."

We would like to apply filters and facets on the child-level fields, return search results at the parent level, and calculate facet counts at the parent level as well (i.e. return products and count the number of matching products).

We initially implemented our query as separate block-join queries:

Q1
  q={!parent which="doc_type:0"} ({!dismax v="t-shirts"} AND doc_type:1)
  &fq={!parent which="doc_type:0"} color:blue
  &fq={!parent which="doc_type:0"} size:L

but it didn't work for us: Product-1 is returned by this query when it should NOT be (i.e. it has no single SKU with both color:blue and size:L). Each block-join filter can be satisfied by a different child of the same parent. So we rewrote our block-join query as follows:

Q2
  q={!parent which="doc_type:0"} ({!dismax v="t-shirts"} AND (filter(doc_type:1) AND filter(color:blue) AND filter(size:L)))

We used the filter(...) syntax to get filter-query caching on the child-level queries.

Then we ran into issues with multi-select faceting. After testing, we found out that only top-level queries can be excluded by facet queries. As a workaround, we had to execute a separate query for facets at the child level and use blockParent with json.facet:

Q3
  q=*:*
  &rows=0
  &fq={!dismax cache=false}t-shirts
  &fq=doc_type:1
  &fq={!tag=color}color:blue
  &fq={!tag=size}size:L
  &json.facet= {
    size:{
      type:terms,
      field:size,
      domain: { blockParent: "doc_type:0", excludeTags: "size" }
    }
  }

Ideally, we would like to handle facets within the same query as document retrieval.
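
For anyone reproducing this, the difference between Q1 and Q2 can be sketched with a small pure-Python model of the example block. This is only an illustration of the matching semantics as I understand them, not Solr code: the names PRODUCTS, q1_style and q2_style are made up for the sketch, and the dismax text match is left out because every SKU in the example matches it.

```python
# Toy model of the example block: one parent (product) and its child SKUs.
# Pure Python; it only mimics the matching semantics and does not talk to Solr.
PRODUCTS = {
    "Product-1": [
        {"sku": "SKU-A", "size": "S", "color": "blue"},
        {"sku": "SKU-B", "size": "M", "color": "blue"},
        {"sku": "SKU-C", "size": "L", "color": "red"},
    ],
}


def q1_style(products, color, size):
    """Separate {!parent} filters: each filter may be satisfied by a different child."""
    return [
        name
        for name, skus in products.items()
        if any(s["color"] == color for s in skus)
        and any(s["size"] == size for s in skus)
    ]


def q2_style(products, color, size):
    """One {!parent} around a conjunction: a single child must satisfy every clause."""
    return [
        name
        for name, skus in products.items()
        if any(s["color"] == color and s["size"] == size for s in skus)
    ]


# Q1 semantics: blue comes from SKU-A/SKU-B and L from SKU-C, so the parent matches.
print(q1_style(PRODUCTS, "blue", "L"))
# Q2 semantics: no single SKU is both blue and L, so nothing matches.
print(q2_style(PRODUCTS, "blue", "L"))
```

The first call returns Product-1 and the second returns nothing, which is the behaviour described for Q1 and Q2 above.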
We found https://issues.apache.org/jira/browse/SOLR-9510 and, based on its specs, we could handle both cases with the following single query:

Q4
  q={!parent which="doc_type:0" filters=$child.fq v=$childquery}
  &childquery={!dismax v="t-shirts"}
  &child.fq=doc_type:1
  &child.fq={!tag=color}color:blue
  &child.fq={!tag=size}size:L
  &json.facet= {
    size:{
      type:terms,
      field:size,
      domain: {
        blockChild: "doc_type:0",
        excludeTags: "size", // this is applied on the first pass, which computes root-domain documents
        filter: {
          queryParam: [ "child.fq", "childquery" ],
          excludeTags: "size" // this is applied on the second pass, which filters children of root-domain documents
        }
      },
      facet: { product-count: "unique(_root_)" }
    }
  }

This query would work, but I find it a bit too complicated. It executes too many queries: it first executes queries at the child level, rolls up to the parent level using block-join ({!parent}), rolls back down to the child level for faceting using 'blockChild' in json.facet, filters at the child level again with 'filter', and finally rolls back up to the parent level using 'unique(_root_)'. The query could have been simpler if we could leave the root domain at the child level and apply the parent block-join to the query and the facets individually, but I believe the current Solr syntax does not support that.

Though we've managed to make it work, I would like to know if we can make these queries more efficient:

- Is there any better way to cache filter queries and write block-join queries?
- Without SOLR-9510, can we handle both document retrieval and multi-select faceting with a single query? Maybe with field collapsing?
- Do we have an ETA on SOLR-9510?

Another problem we had with the block-join parser is using the dismax parser. Before we started using nested documents, we used the dismax parser on multiple text fields, i.e. &qf="title^10 description^2". With nested documents, we initially indexed some text fields at the parent level and some at the child level.
We could not find a way to use the dismax parser across parent and child documents, so we had to index all text fields at the child level, wasting quite a bit of space. Is there a way to achieve the same (or similar) functionality while indexing text fields across the levels?

My last question: is there a way to use the dismax query parser on fields across multiple levels?

Would appreciate your help!

Thanks,
Hyun Goo