PathHierarchyTokenizerFactory single level match

lstusr 5u93n4 Fri, 23 Nov 2018 06:24:38 -0800

Hi,

I have a schema that has a descendent_path field type, configured as in the
PathHierarchyTokenizerFactory docs:


 <fieldType name="descendent_path" class="solr.TextField">
   <analyzer type="index">
     <tokenizer class="solr.PathHierarchyTokenizerFactory" delimiter="/" />
   </analyzer>
   <analyzer type="query">
     <tokenizer class="solr.KeywordTokenizerFactory" />
   </analyzer>
 </fieldType>


Using the example in the docs:  *For example, in the configuration below a
query for Books/NonFic will match documents indexed with values like
Books/NonFic, Books/NonFic/Law, Books/NonFic/Science/Physics, etc. But it
will not match documents indexed with values like Books, or Books/Fic.* This
works great and solves a primary use case.

However, we have a secondary use case where we need to get all documents
that match a single level. For example, let's say I wanted all of the
categories directly under Books/NonFic/, like Books/NonFic/Science,
Books/NonFic/Art, Books/NonFic/Math, etc. I can query for Books/NonFic, but
this also gives me every deeper descendant record. One solution is to query
for:

category:Books/NonFic/* -category:Books/NonFic/*/*

which seems like it works, but feels a little clunky.

The other solution I can think of is to add a separate, non-tokenized field
to each document at index time, something like parentCategory, indexed but
not stored, holding Books/NonFic for each of the
Books/NonFic/[Science, Art, Math] documents. However, this duplicates
information and increases my index size. That is not the worst thing, I
know, but this field is already by far the largest contributor to the index
size, and doubling the information there will have a noticeable impact on
the disk footprint.
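
To make that concrete, here is a rough sketch of what I have in mind. The
"string" type (solr.StrField), the id field, and the category field name are
assumptions about the rest of the schema, and the sample document is only
illustrative:

```xml
<!-- Schema fragment (hypothetical field): exact, untokenized parent path,
     indexed for lookups but not stored -->
<field name="parentCategory" type="string" indexed="true" stored="false" />

<!-- Separate fragment: an illustrative document in Solr's XML update format -->
<add>
  <doc>
    <field name="id">example-1</field>
    <field name="category">Books/NonFic/Science</field>
    <field name="parentCategory">Books/NonFic</field>
  </doc>
</add>
```

The single-level lookup would then be an exact term match, for example
parentCategory:"Books/NonFic", with no wildcards involved.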

So my question: with a projected index size in the billions of documents,
would you take either one of those two approaches? Or a third that I
haven't thought of?

Thanks,

Kyle
