Hello everyone,

I'm using suggesters with Solr 6.4 to get suggestions for a field that has a decent number of distinct values across a large number of documents. The suggester is configured like this:

<lst name="suggester">
  <str name="name">vendorSuggester</str>
  <str name="lookupImpl">BlendedInfixLookupFactory</str>
  <int name="numFactor">600</int>
  <bool name="highlight">false</bool>
  <str name="dictionaryImpl">DocumentDictionaryFactory</str>
  <str name="field">attrib_vendor</str>
  <str name="contextField">group</str>
  <str name="contextFilterQueryTokenizer">keyword</str>
  <str name="buildOnStartup">false</str>
  <str name="suggestAnalyzerFieldType">text_result_autocomplete</str>
  <str name="indexPath">vendorSuggester</str>
</lst>

The field type is configured like this, although I'm pretty sure that it's not the culprit, because I tried multiple field types with no improvement:

<fieldType class="solr.TextField" name="text_result_autocomplete_de">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory" rule="unicode"/>
    <filter class="solr.WordDelimiterFilterFactory" preserveOriginal="1" splitOnCaseChange="0" catenateWords="1" catenateNumbers="1" catenateAll="1" splitOnNumerics="0" stemEnglishPossessive="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.GermanNormalizationFilterFactory"/>
    <filter class="solr.ASCIIFoldingFilterFactory"/>
    <filter class="custom.DictionaryCompoundWordTokenFilterFactory" dictionary="lang/german-common-nouns.txt"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory" rule="unicode"/>
    <filter class="solr.WordDelimiterFilterFactory" splitOnCaseChange="0" splitOnNumerics="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.GermanNormalizationFilterFactory"/>
    <filter class="solr.ASCIIFoldingFilterFactory"/>
  </analyzer>
</fieldType>

It works very well for small data sets, but for larger ones, fewer suggestions are returned than requested with count, although I know that the data set contains more values suitable for completion. Even with a query that exactly matches the expected suggestion, I don't get that suggestion.
In particular, it doesn't seem to suggest shorter values with the same prefix, only the longest, but that might just be an artifact of the cases I tested.

I already mitigated the situation by cranking up numFactor, but that only makes users less likely to experience this problem, and increasing numFactor further would make performance unacceptable.

Unfortunately, I can't use other lookup factory implementations, because context filtering is necessary to limit suggestions to certain groups of users. AnalyzingInfixLookupFactory, which also supports context fields, doesn't help with the situation.

Does anyone have ideas on how to get exhaustive suggestions with decent performance for large data sets? I'd appreciate any hints.

Cheers,
Chris