Dear Solr gurus,
I'm having a hard time using block-join queries on nested documents with
multi-select facets. We currently index products with variations as
nested documents, as follows:
Product-1: t-shirt => brand:Nike doc_type:0 ...
- SKU-A => size:S color:blue doc_type:1 text-fields:"small t-shirt ..."
- SKU-B => size:M color:blue doc_type:1 text-fields:"medium t-shirt ..."
- SKU-C => size:L color:red doc_type:1 text-fields:"large t-shirt ..."
We would like to apply filters and facets on the child-level fields,
return search results on parent-level and calculate facet counts also
on parent-level (i.e. return products and count the number of matching
products).
We initially implemented our query as separate block-join queries:
Q1
q={!parent which="doc_type:0"} ({!dismax v="t-shirts"} AND doc_type:1)
&fq={!parent which="doc_type:0"} color:blue
&fq={!parent which="doc_type:0"} size:L
but it didn't work for us - Product-1 would be returned by this query
while it should NOT be returned (i.e. it has no single SKU that is both
color:blue and size:L). So we re-wrote our block-join query as follows:
Q2
q={!parent which="doc_type:0"} ({!dismax v="t-shirts"} AND
(filter(doc_type:1) AND filter(color:blue) AND filter(size:L)))
We used the filter(...) syntax so that the child-level clauses can use the filter cache.
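To make the difference between Q1 and Q2 concrete, here is a rough sketch in plain Python (not Solr) of the matching semantics as we understand them. The Sku class and the matches_q1 / matches_q2 helpers are made up for illustration only; the data is Product-1 from the example above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sku:
    sku_id: str
    size: str
    color: str


# Product-1 from the example above, with its three child SKUs.
PRODUCT_1 = [
    Sku("SKU-A", "S", "blue"),
    Sku("SKU-B", "M", "blue"),
    Sku("SKU-C", "L", "red"),
]

# The two child-level filters: color:blue and size:L.
FILTERS = [lambda s: s.color == "blue", lambda s: s.size == "L"]


def matches_q1(children, filters):
    """Q1 style: every filter is its own block join, so each filter
    may be satisfied by a different child of the same parent."""
    return all(any(f(c) for c in children) for f in filters)


def matches_q2(children, filters):
    """Q2 style: all filters are ANDed inside one block join, so a
    single child has to satisfy every one of them."""
    return any(all(f(c) for f in filters) for c in children)


print(matches_q1(PRODUCT_1, FILTERS))  # True: SKU-A is blue, SKU-C is L
print(matches_q2(PRODUCT_1, FILTERS))  # False: no SKU is both blue and L
```

This is why Product-1 shows up for Q1 but not for Q2.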
Then, we ran into issues with multi-select faceting. After testing, we
found out that only top-level filter queries can be excluded from
facets by tag. As a workaround, we had to execute a separate query for
facets on child-level and use blockParent with json.facet:
Q3
q=*:*
&rows=0
&fq={!dismax cache=false}t-shirts
&fq=doc_type:1
&fq={!tag=color}color:blue
&fq={!tag=size}size:L
&json.facet=
{
size:{
type:terms,
field:size,
domain: { blockParent: "doc_type:0", excludeTags: "size" }
}
}
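For clarity, these are the counts we are after for the size facet, sketched in plain Python rather than Solr: keep the color filter, ignore the size filter (the effect of excluding its tag), and count distinct parents per size value. "Product-2" and its SKUs are invented here just to make the numbers interesting, and the text query is left out for brevity.

```python
from collections import defaultdict

# (parent id, sku id, size, color). Product-2 is a made-up second product.
CHILDREN = [
    ("Product-1", "SKU-A", "S", "blue"),
    ("Product-1", "SKU-B", "M", "blue"),
    ("Product-1", "SKU-C", "L", "red"),
    ("Product-2", "SKU-D", "L", "blue"),
    ("Product-2", "SKU-E", "M", "blue"),
]


def size_facet(children, color):
    """Count distinct parents per size value, applying the color filter
    but not the size filter."""
    parents_by_size = defaultdict(set)
    for parent, _sku, size, sku_color in children:
        if sku_color == color:
            parents_by_size[size].add(parent)
    return {size: len(parents)
            for size, parents in sorted(parents_by_size.items())}


print(size_facet(CHILDREN, "blue"))  # {'L': 1, 'M': 2, 'S': 1}
```

Note that size:M counts two products even though size:L is currently selected, which is the multi-select behaviour we want.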
Ideally, we would like to handle facets within the same query as
document retrieval. We found
https://issues.apache.org/jira/browse/SOLR-9510 and, based on its
specs, we could handle both cases with the following single query:
Q4
q={!parent which="doc_type:0" filters=$child.fq v=$childquery}
&childquery={!dismax v="t-shirts"}
&child.fq=doc_type:1
&child.fq={!tag=color}color:blue
&child.fq={!tag=size}size:L
&json.facet=
{
size:{
type:terms,
field:size,
domain: {
blockChild: "doc_type:0",
excludeTags: "size", // this is applied on the first pass, which computes root-domain documents
filter: {
queryParam: [ "child.fq", "childquery" ],
excludeTags: "size" // this is applied on the second pass, which filters children of root-domain documents
}
},
facet: { product-count: "unique(_root_)" }
}
}
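In case it helps anyone reproduce this, below is a minimal Python sketch of how the Q4 parameters could be assembled into a request URL. It only builds and decodes the URL and does not talk to Solr. The host and collection name are placeholders, and the facet structure simply mirrors the syntax proposed above, so it is an assumption that a given Solr version accepts it.

```python
import json
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical endpoint: host and collection name are placeholders.
SOLR_SELECT = "http://localhost:8983/solr/products/select"

# Same structure as the json.facet above, written as strict JSON.
facet = {
    "size": {
        "type": "terms",
        "field": "size",
        "domain": {
            "blockChild": "doc_type:0",
            "excludeTags": "size",
            "filter": {
                "queryParam": ["child.fq", "childquery"],
                "excludeTags": "size",
            },
        },
        "facet": {"product-count": "unique(_root_)"},
    }
}

# A list of pairs (not a dict) so that child.fq can be repeated.
params = [
    ("q", '{!parent which="doc_type:0" filters=$child.fq v=$childquery}'),
    ("childquery", '{!dismax v="t-shirts"}'),
    ("child.fq", "doc_type:1"),
    ("child.fq", "{!tag=color}color:blue"),
    ("child.fq", "{!tag=size}size:L"),
    ("json.facet", json.dumps(facet)),
]

url = SOLR_SELECT + "?" + urlencode(params)

# Decode again to check that all three child.fq values survived encoding.
parsed = parse_qs(urlsplit(url).query)
print(parsed["child.fq"])
```

Using json.dumps avoids the unquoted keys and inline comments of the hand-written version, at the cost of losing the comments.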
This query would work, but I find it a bit too complicated. It
makes too many passes - it first executes queries on child-level,
rolls up to parent-level using block-join ({!parent}), rolls back down
to child-level for faceting using 'blockChild' in json.facet, filters
on child-level again with 'filter' and finally rolls back up to
parent-level using 'unique(_root_)'. The query could have been simpler
if we could leave the root domain on child-level and apply the
parent block-join to the query and to the facets individually, but I
believe the current Solr syntax does not support it.
Though we've managed to make it work, I would like to know if we can
make these queries more efficient:
- Is there any better way to cache filter queries and write block-join queries?
- Without SOLR-9510, can we handle both document retrieval and
multi-select faceting with a single query? Maybe with
field-collapsing?
- Do we have an ETA on SOLR-9510?
Another problem we had with the block-join parser is using it together
with the dismax parser. Before we started using nested documents, we
used the dismax parser on multiple text fields - i.e.
&qf="title^10 description^2". With nested documents, we initially
indexed some text fields on parent-level and some on child-level. We
could not find a way to use the dismax parser across parent and child
documents, so we had to index all text fields on child-level, wasting
quite a bit of space. Is there a way to achieve the same (or similar)
functionality while indexing text fields across the levels?
My last question!
- Is there a way to use dismax query parser on fields across multiple levels?
Would appreciate your help!
Thanks,
Hyun Goo