Hi,
I have a schema with a descendent_path field type, configured as shown in the
PathHierarchyTokenizerFactory docs:
<fieldType name="descendent_path" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="solr.PathHierarchyTokenizerFactory" delimiter="/" />
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory" />
  </analyzer>
</fieldType>
To quote the example in the docs: *For example, in the configuration below a
query for Books/NonFic will match documents indexed with values like
Books/NonFic, Books/NonFic/Law, Books/NonFic/Science/Physics, etc. But it
will not match documents indexed with values like Books, or Books/Fic.* This
works well and covers our primary use case.
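To make that behaviour concrete, here is a rough Python sketch (not Solr code)
that approximates what this analyzer pair does: at index time a path is split
into every prefix, at query time the whole string stays a single token, and a
document matches when the query token equals one of its prefixes. The helper
names path_tokens and matches are hypothetical, and the sketch assumes values
have no leading or repeated delimiters.

```python
from __future__ import annotations


def path_tokens(path: str, delimiter: str = "/") -> list[str]:
    # Approximates PathHierarchyTokenizerFactory with default settings:
    # "Books/NonFic/Law" -> ["Books", "Books/NonFic", "Books/NonFic/Law"].
    parts = path.split(delimiter)
    return [delimiter.join(parts[: i + 1]) for i in range(len(parts))]


def matches(indexed_value: str, query: str) -> bool:
    # KeywordTokenizerFactory keeps the whole query as one token, so a
    # document matches when that token equals one of its indexed prefixes.
    return query in path_tokens(indexed_value)


docs = ["Books", "Books/Fic", "Books/NonFic",
        "Books/NonFic/Law", "Books/NonFic/Science/Physics"]
print([d for d in docs if matches(d, "Books/NonFic")])
# ['Books/NonFic', 'Books/NonFic/Law', 'Books/NonFic/Science/Physics']
```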
However, we have a secondary use case where we need only the documents
exactly one level below a given path. For example, say I want all of the
categories directly under Books/NonFic, such as Books/NonFic/Science,
Books/NonFic/Art, Books/NonFic/Math, etc. I can query for Books/NonFic, but
that also returns Books/NonFic itself and every deeper descendant (e.g.
Books/NonFic/Science/Physics). One solution is to query for:
category:Books/NonFic/* -category:Books/NonFic/*/*
which seems to work, but feels a little clunky.
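Here is why that pair of clauses selects exactly one level, modelled as a
rough Python simulation rather than real Solr: a wildcard clause matches a
document if any of its indexed prefixes match, the first clause matches
everything strictly below Books/NonFic, and the negated clause removes
anything with at least one more level. The helper names are hypothetical, and
the sketch ignores query-parser details (in practice the slashes may need
escaping, since the standard query parser treats / as a regular-expression
delimiter).

```python
from fnmatch import fnmatchcase


def path_tokens(path, delimiter="/"):
    # Every prefix of the path, as the index-time analyzer would emit it.
    parts = path.split(delimiter)
    return [delimiter.join(parts[: i + 1]) for i in range(len(parts))]


def any_token_matches(value, pattern):
    # A wildcard clause matches a document if ANY of its indexed terms match.
    # fnmatch's "*", like Lucene's, also matches "/" characters.
    return any(fnmatchcase(tok, pattern) for tok in path_tokens(value))


def one_level_below(value, parent):
    # Models: category:<parent>/* -category:<parent>/*/*
    return (any_token_matches(value, parent + "/*")
            and not any_token_matches(value, parent + "/*/*"))


docs = ["Books/NonFic", "Books/NonFic/Art", "Books/NonFic/Science",
        "Books/NonFic/Science/Physics", "Books/Fic/Drama"]
print([d for d in docs if one_level_below(d, "Books/NonFic")])
# ['Books/NonFic/Art', 'Books/NonFic/Science']
```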
The other solution I can think of is to add a separate field to each document
at index time, something like parentCategory, which would be non-tokenized
and indexed (not stored) and would hold the parent path, e.g. Books/NonFic
for each of the Books/NonFic/[Science, Art, Math] documents.
However, with this solution I'm duplicating information and increasing my
index size. That is not the worst thing, I know, but this field is already by
far the largest contributor to the index size, and doubling the information
there will have a noticeable impact on the disk footprint.
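For the parentCategory approach, the parent path could be computed client-side
before sending each document. A minimal Python sketch, assuming a
hypothetical "id" unique-key field and a parentCategory field declared as an
exact-match string type (indexed, not stored); to_solr_doc and
parent_category are illustrative names only.

```python
from __future__ import annotations


def parent_category(path: str, delimiter: str = "/") -> str | None:
    # "Books/NonFic/Science" -> "Books/NonFic"; a top-level value has no parent.
    head, sep, _ = path.rpartition(delimiter)
    return head if sep else None


def to_solr_doc(doc_id: str, category: str) -> dict:
    # Hypothetical document shape: "id" is an assumed unique-key field.
    doc = {"id": doc_id, "category": category}
    parent = parent_category(category)
    if parent is not None:
        doc["parentCategory"] = parent  # exact-match (string) field
    return doc


print(to_solr_doc("1", "Books/NonFic/Science"))
# {'id': '1', 'category': 'Books/NonFic/Science', 'parentCategory': 'Books/NonFic'}
```

A one-level query would then be a single exact-match clause such as
parentCategory:"Books/NonFic", with no wildcards involved.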
So my question: with a projected index size in the billions of documents,
would you take either one of those two approaches? Or a third that I
haven't thought of?
Thanks,
Kyle