Hello Everyone,

I'm using suggesters with Solr 6.4 to get suggestions for a field that has
a fairly large number of distinct values across a large number of
documents. The suggester is configured like this:

<lst name="suggester">
    <str name="name">vendorSuggester</str>
    <str name="lookupImpl">BlendedInfixLookupFactory</str>
    <int name="numFactor">600</int>
    <bool name="highlight">false</bool>
    <str name="dictionaryImpl">DocumentDictionaryFactory</str>
    <str name="field">attrib_vendor</str>
    <str name="contextField">group</str>
    <str name="contextFilterQueryTokenizer">keyword</str>
    <str name="buildOnStartup">false</str>
    <str name="suggestAnalyzerFieldType">text_result_autocomplete</str>
    <str name="indexPath">vendorSuggester</str>
</lst>

The field type is configured like this, although I'm pretty sure that it's
not the culprit because I tried multiple field types with no improvement:

<fieldType class="solr.TextField" name="text_result_autocomplete_de">
    <analyzer type="index">
        <tokenizer class="solr.WhitespaceTokenizerFactory" rule="unicode"/>
        <filter class="solr.WordDelimiterFilterFactory"
                preserveOriginal="1" splitOnCaseChange="0"
                catenateWords="1"
                catenateNumbers="1" catenateAll="1" splitOnNumerics="0"
                stemEnglishPossessive="0"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.GermanNormalizationFilterFactory"/>
        <filter class="solr.ASCIIFoldingFilterFactory"/>
        <filter class="custom.DictionaryCompoundWordTokenFilterFactory"
                dictionary="lang/german-common-nouns.txt"/>
    </analyzer>
    <analyzer type="query">
        <tokenizer class="solr.WhitespaceTokenizerFactory" rule="unicode"/>
        <filter class="solr.WordDelimiterFilterFactory"
                splitOnCaseChange="0" splitOnNumerics="0"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.GermanNormalizationFilterFactory"/>
        <filter class="solr.ASCIIFoldingFilterFactory"/>
    </analyzer>
</fieldType>

It works very well for small data sets, but for larger ones, fewer
suggestions are returned than requested with count, although I know that
the data set contains more values suitable for completion. Even with a
query that exactly matches the expected suggestion, I don't get that
suggestion. In particular, it doesn't seem to suggest shorter values with
the same prefix, only the longest, but that might just be the cases I
tested.
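
For illustration, a request of the kind described above could be built
like this. It is only a sketch: the base URL, the /suggest handler path,
and the example query and group values are placeholders and not taken from
the actual setup. The suggest.* names are the standard Solr suggester
request parameters.

```python
from urllib.parse import urlencode


def build_suggest_url(base_url, query, group, count=10):
    """Build a Solr suggest request URL for the vendorSuggester dictionary.

    base_url is assumed to point at the suggest request handler,
    e.g. http://localhost:8983/solr/<core>/suggest (placeholder).
    """
    params = {
        "suggest": "true",
        "suggest.dictionary": "vendorSuggester",  # name from the config above
        "suggest.q": query,                       # text typed by the user
        "suggest.count": count,                   # number of suggestions wanted
        "suggest.cfq": group,                     # context filter on contextField
        "wt": "json",
    }
    return base_url + "?" + urlencode(params)


if __name__ == "__main__":
    # Placeholder host, core, query, and group values.
    print(build_suggest_url("http://localhost:8983/solr/mycore/suggest",
                            "acme", "groupA"))
```

Comparing the number of entries returned for such a request against
suggest.count is how the shortfall shows up.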

I already mitigated the problem by cranking up numFactor, but that only
makes users less likely to run into it, and increasing numFactor further
would make performance unacceptable.
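
To make the cost of this workaround concrete: as far as I understand the
blended infix lookup, it fetches count times numFactor candidates from the
underlying infix search and then re-ranks them, returning only the top
count. The sketch below only shows that arithmetic. The function name is
hypothetical, and the request count of 10 is an assumed example value.

```python
def candidate_pool_size(count, num_factor):
    """Number of candidates fetched before re-ranking (assumed behaviour:
    requested count multiplied by numFactor)."""
    return count * num_factor


if __name__ == "__main__":
    # numFactor=600 is the value from the suggester config above;
    # count=10 is an assumed example request size.
    print(candidate_pool_size(10, 600))
```

Any matching value that ranks outside this pool in the underlying search
is never considered, which would be consistent with suggestions going
missing on large data sets.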

Unfortunately, I can't use other lookup factory implementations, because
context filtering is necessary to limit suggestions to certain groups of
users. AnalyzingInfixLookupFactory, which also supports context fields,
doesn't help with the situation.

Does anyone have ideas on how to get exhaustive suggestions with decent
performance for large data sets? I'd appreciate any hints.

Cheers,
Chris
