Block-join, JSON faceting and dismax parser on nested documents

Hyungoo Kang Wed, 25 Jan 2017 23:31:04 -0800

Dear Solr gurus,

I'm having a hard time using block-join queries on nested documents with
multi-select facets. We currently index products with variations as
nested documents, as follows:


Product-1: t-shirt => brand:Nike doc_type:0 ...
- SKU-A => size:S color:blue doc_type:1 text-fields:"small t-shirt ..."
- SKU-B => size:M color:blue doc_type:1 text-fields:"medium t-shirt ..."
- SKU-C => size:L color:red  doc_type:1 text-fields:"large t-shirt ..."

We would like to apply filters and facets on child-level fields,
return search results at parent level, and also calculate facet counts
at parent level (i.e. return products and count the number of matching
products).
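For reference, one block from the example above can be written in Solr's JSON update format, where children are nested under the `_childDocuments_` key. This is a minimal Python sketch that only builds the payload. The `id` values are hypothetical and the text fields are left out:

```python
import json

# One parent (doc_type:0) with its SKU children (doc_type:1).
# The id values are hypothetical; the other fields come from the example above.
product = {
    "id": "product-1",
    "doc_type": 0,
    "brand": "Nike",
    "_childDocuments_": [
        {"id": "sku-a", "doc_type": 1, "size": "S", "color": "blue"},
        {"id": "sku-b", "doc_type": 1, "size": "M", "color": "blue"},
        {"id": "sku-c", "doc_type": 1, "size": "L", "color": "red"},
    ],
}

# Solr's JSON update handler accepts an array of documents.
payload = json.dumps([product], indent=2)
print(payload)
```

The resulting array would then be posted to the collection's /update handler (not shown here).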

We initially implemented our query as separate block-join queries:

Q1
q={!parent which="doc_type:0"} ({!dismax v="t-shirts"} AND doc_type:1)
&fq={!parent which="doc_type:0"} color:blue
&fq={!parent which="doc_type:0"} size:L

but it didn't work for us: Product-1 was returned by this query even
though it should NOT be (it has no single SKU with both color:blue and
size:L). So we rewrote our block-join query as follows:
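To illustrate why Q1 matches, here is a small in-memory model of a single block (plain Python, not Solr). It assumes that a {!parent} join matches a parent whenever at least one of its children matches the inner query, so each fq is satisfied independently. `parent_join` is a hypothetical stand-in for {!parent}:

```python
# In-memory model of Product-1's block: the child SKUs only.
skus = [
    {"size": "S", "color": "blue"},
    {"size": "M", "color": "blue"},
    {"size": "L", "color": "red"},
]

def parent_join(children, pred):
    """Model of {!parent}: the parent matches if ANY child satisfies pred."""
    return any(pred(c) for c in children)

# Q1: each fq is its own block join, so each can be satisfied by a different SKU.
q1_match = (
    parent_join(skus, lambda c: c["color"] == "blue")  # SKU-A or SKU-B
    and parent_join(skus, lambda c: c["size"] == "L")  # SKU-C
)
print(q1_match)  # True
```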

Q2
q={!parent which="doc_type:0"} ({!dismax v="t-shirts"} AND
(filter(doc_type:1) AND filter(color:blue) AND filter(size:L)))

We used the filter(...) syntax so that the child-level clauses would use the filter cache.
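In the same simplified model (the dismax clause is left out, and `parent_join` is again a hypothetical stand-in for {!parent}), Q2 behaves differently because all child clauses sit inside one join, so a single SKU has to satisfy them together:

```python
# Same Product-1 block as before, with doc_type on each child.
skus = [
    {"doc_type": 1, "size": "S", "color": "blue"},
    {"doc_type": 1, "size": "M", "color": "blue"},
    {"doc_type": 1, "size": "L", "color": "red"},
]

def parent_join(children, pred):
    """Model of {!parent}: the parent matches if ANY child satisfies pred."""
    return any(pred(c) for c in children)

# Q2: one join around the conjunction, so one SKU must match every clause.
q2_match = parent_join(
    skus,
    lambda c: c["doc_type"] == 1 and c["color"] == "blue" and c["size"] == "L",
)
print(q2_match)  # False
```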

Then we ran into issues with multi-select faceting. After testing, we
found that tag exclusion in facets only works on top-level queries, so
the child-level clauses nested inside Q2 could not be excluded. As a
workaround, we had to execute a separate query for child-level facets,
using blockParent with json.facet:

Q3
q=*:*
&rows=0
&fq={!dismax cache=false}t-shirts
&fq=doc_type:1
&fq={!tag=color}color:blue
&fq={!tag=size}size:L
&json.facet=
{
  size:{
    type:terms,
    field:size,
    domain: { blockParent: "doc_type:0", excludeTags: "size" }
  }
}
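Q3 carries repeated fq parameters plus a JSON value, so it has to be URL-encoded carefully when sent over HTTP. A minimal Python sketch using only the standard library. It builds the query string but does not send it, and the target core and /select handler are assumptions:

```python
import json
from urllib.parse import urlencode, parse_qs

# The json.facet value from Q3, built as a dict and serialized.
json_facet = {
    "size": {
        "type": "terms",
        "field": "size",
        "domain": {"blockParent": "doc_type:0", "excludeTags": "size"},
    }
}

# A list of tuples keeps the repeated fq parameters and their order.
params = [
    ("q", "*:*"),
    ("rows", "0"),
    ("fq", "{!dismax cache=false}t-shirts"),
    ("fq", "doc_type:1"),
    ("fq", "{!tag=color}color:blue"),
    ("fq", "{!tag=size}size:L"),
    ("json.facet", json.dumps(json_facet)),
]

query_string = urlencode(params)
print(query_string)

# Round-trip check: every fq value survives encoding and decoding.
decoded = parse_qs(query_string)
print(decoded["fq"])
```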

Ideally, we would like to compute facets in the same query as document
retrieval. We found https://issues.apache.org/jira/browse/SOLR-9510 and,
based on its specs, we could handle both cases with the following single
query:

Q4
q={!parent which="doc_type:0" filters=$child.fq v=$childquery}
&childquery={!dismax v="t-shirts"}
&child.fq=doc_type:1
&child.fq={!tag=color}color:blue
&child.fq={!tag=size}size:L
&json.facet=
{
  size:{
    type:terms,
    field:size,
    domain: {
      blockChild: "doc_type:0",
      excludeTags: "size",  // this is applied on the first pass, which computes root-domain documents
      filter: {
        queryParam: [ "child.fq", "childquery" ],
        excludeTags: "size" // this is applied on the second pass, which filters children of root-domain documents
      }
    },
    facet: { product-count: "unique(_root_)" }
  }
}

This query would work, but I find it a bit too complicated. It
executes too many queries: it first runs queries at child level,
rolls up to parent level using block join ({!parent}), rolls back down
to child level for faceting using 'blockChild' in json.facet, filters
at child level again with 'filter', and finally rolls back up to
parent level using 'unique(_root_)'. The query could have been simpler
if we could keep the root domain at child level and apply the parent
block join to the query and the facets individually, but I believe the
current Solr syntax does not support this.
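The result Q4 is meant to produce can be stated without the four passes. The sketch below is a plain-Python model of that intended result, not of Solr's execution: a product is a hit if one SKU passes every child filter, and the size facet drops the size filter and counts distinct products per size, as unique(_root_) does. The second product and its SKUs are hypothetical, and the dismax clause is ignored:

```python
from collections import defaultdict

# Hypothetical catalog: product id -> child SKUs.
catalog = {
    "product-1": [
        {"size": "S", "color": "blue"},
        {"size": "M", "color": "blue"},
        {"size": "L", "color": "red"},
    ],
    "product-2": [
        {"size": "L", "color": "blue"},
        {"size": "M", "color": "red"},
    ],
}

# Tagged child filters, mirroring Q4's child.fq parameters.
child_filters = {
    "color": lambda c: c["color"] == "blue",
    "size": lambda c: c["size"] == "L",
}

def matches(child, exclude=None):
    """True if the child passes every filter except the excluded tag."""
    return all(f(child) for tag, f in child_filters.items() if tag != exclude)

# Products returned: at least one SKU passes ALL child filters.
hits = sorted(pid for pid, skus in catalog.items() if any(matches(c) for c in skus))

# Multi-select size facet: ignore the size filter, then count distinct
# products per size value among the remaining matching SKUs.
size_counts = defaultdict(set)
for pid, skus in catalog.items():
    for c in skus:
        if matches(c, exclude="size"):
            size_counts[c["size"]].add(pid)

print(hits)  # ['product-2']
print({size: len(pids) for size, pids in sorted(size_counts.items())})
# {'L': 1, 'M': 1, 'S': 1}
```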

Though we've managed to make it work, I would like to know if we can
make these queries more efficient:
- Is there a better way to cache filter queries and write block-join queries?
- Without SOLR-9510, can we handle both document retrieval and
multi-select faceting with a single query? Maybe with
field collapsing?
- Is there an ETA for SOLR-9510?


Another problem we had was using the dismax parser together with the
block-join parser. Before we started using nested documents, we used the
dismax parser on multiple text fields, e.g. &qf=title^10 description^2.
With nested documents, we initially indexed some text fields at parent
level and some at child level. We could not find a way to run a dismax
query across parent and child documents, so we had to index all text
fields at child level, wasting quite a bit of space. Is there a way to
achieve the same (or similar) functionality while keeping text fields
at different levels?
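The workaround we ended up with amounts to copying parent-level text fields onto every child before indexing, which is where the extra space goes: each parent field is stored once per SKU. A sketch of that step, where `denormalize` is a hypothetical helper (not a Solr API) and the field values are made up:

```python
import copy

def denormalize(parent, text_fields):
    """Hypothetical helper: copy parent-level text fields onto every child
    so that a child-level dismax query (qf over those fields) can match them."""
    doc = copy.deepcopy(parent)  # leave the caller's document untouched
    for child in doc.get("_childDocuments_", []):
        for field in text_fields:
            if field in doc:
                child[field] = doc[field]
    return doc

parent = {
    "id": "product-1",
    "doc_type": 0,
    "title": "Nike t-shirt",
    "_childDocuments_": [
        {"id": "sku-a", "doc_type": 1, "size": "S", "description": "small t-shirt"},
        {"id": "sku-c", "doc_type": 1, "size": "L", "description": "large t-shirt"},
    ],
}

indexed = denormalize(parent, ["title"])
print([c["title"] for c in indexed["_childDocuments_"]])
# ['Nike t-shirt', 'Nike t-shirt']
```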

My last question!
- Is there a way to use dismax query parser on fields across multiple levels?

Would appreciate your help!

Thanks,
Hyun Goo
