Hi Roland,

Could you check the Analysis tab (https://lucene.apache.org/solr/guide/8_1/analysis-screen.html) and tell us how the term is analyzed at both query time and index time?
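As a rough offline illustration of what the Analysis screen would show, the Python sketch below approximates the query-time chain of short_text_hu: tokenizer, then StopFilter with ignoreCase="true", then LowerCaseFilter. The regex tokenizer is only a stand-in for ClassicTokenizer. The stopword set is a hypothetical excerpt, because stopwords_hu.txt is not shown in this thread. Whether your file really lists "jó" (Hungarian for "good") is exactly what the Analysis tab should confirm. If it does, the prefix "Jó" is removed entirely, so the suggester has nothing left to match, while "Józ" passes through.

```python
import re

# Hypothetical excerpt of stopwords_hu.txt; the real file is not shown in the
# thread, so check whether it actually contains "jó".
STOPWORDS_HU = {"a", "az", "egy", "és", "jó"}


def analyze_query(text, stopwords=STOPWORDS_HU):
    """Approximate the short_text_hu query analyzer:
    tokenizer -> StopFilter(ignoreCase=true) -> LowerCaseFilter."""
    # \w+ is a crude stand-in for ClassicTokenizer: it keeps accented letters
    # together but ignores ClassicTokenizer's handling of e-mails, acronyms, etc.
    tokens = re.findall(r"\w+", text)
    # ignoreCase=true: tokens are compared to the stopword list case-insensitively
    tokens = [t for t in tokens if t.lower() not in stopwords]
    # LowerCaseFilter runs last and lowercases whatever survived
    return [t.lower() for t in tokens]


for prefix in ["Al", "Jó", "Józ"]:
    print(prefix, "->", analyze_query(prefix))
# Al -> ['al']
# Jó -> []
# Józ -> ['józ']
```

If the Analysis tab confirms that "Jó" is dropped, one option worth testing is pointing suggestAnalyzerFieldType at a field type without the StopFilter, so short prefixes are never removed before the lookup.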
Kind Regards,
Furkan KAMACI

On Tue, Jul 30, 2019 at 4:50 PM Szűcs Roland <szucs.rol...@bookandwalk.hu> wrote:
> Hi All,
>
> I have an author suggester (a search component and the related request
> handler) defined in solrconfig:
> <searchComponent name="suggest" class="solr.SuggestComponent">
>   <!-- All suggester components must have different file paths to avoid
>   write lock issues -->
>   <lst name="suggester">
>     <str name="name">author</str>
>     <str name="lookupImpl">AnalyzingInfixLookupFactory</str>
>     <str name="dictionaryImpl">DocumentDictionaryFactory</str>
>     <str name="field">BOOK_productAuthor</str>
>     <str name="suggestAnalyzerFieldType">short_text_hu</str>
>     <str name="indexPath">suggester_infix_author</str>
>     <str name="buildOnStartup">false</str>
>     <str name="buildOnCommit">false</str>
>     <str name="minPrefixChars">2</str>
>   </lst>
> </searchComponent>
>
> <requestHandler name="/suggesthandler" class="solr.SearchHandler" startup="lazy" >
>   <lst name="defaults">
>     <str name="suggest">true</str>
>     <str name="suggest.count">10</str>
>     <str name="suggest.dictionary">author</str>
>   </lst>
>   <arr name="components">
>     <str>suggest</str>
>   </arr>
> </requestHandler>
>
> The author field has only minimal text processing at query and index time,
> based on the following definition:
> <fieldType name="short_text_hu" class="solr.TextField" positionIncrementGap="100" multiValued="true">
>   <analyzer type="index">
>     <charFilter class="solr.HTMLStripCharFilterFactory"/>
>     <tokenizer class="solr.ClassicTokenizerFactory"/>
>     <filter class="solr.StopFilterFactory" words="stopwords_hu.txt" ignoreCase="true"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>   </analyzer>
>   <analyzer type="query">
>     <tokenizer class="solr.ClassicTokenizerFactory"/>
>     <filter class="solr.StopFilterFactory" words="stopwords_hu.txt" ignoreCase="true"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>   </analyzer>
> </fieldType>
> <fieldType name="string" class="solr.StrField" sortMissingLast="true" docValues="true"/>
> <fieldType name="strings" class="solr.StrField" sortMissingLast="true" docValues="true" multiValued="true"/>
> <fieldType name="text_ar" class="solr.TextField" positionIncrementGap="100">
>   <analyzer>
>     <tokenizer class="solr.StandardTokenizerFactory"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>     <filter class="solr.StopFilterFactory" words="lang/stopwords_ar.txt" ignoreCase="true"/>
>     <filter class="solr.ArabicNormalizationFilterFactory"/>
>     <filter class="solr.ArabicStemFilterFactory"/>
>   </analyzer>
> </fieldType>
>
> When I use queries with only ASCII characters, the results are correct:
> "Al":{
>   "term":"<b>Al</b>exandre Dumas", "weight":0, "payload":""}
>
> When I try it with a Hungarian author name containing a special character:
> "Jó":"author":{
>   "Jó":{ "numFound":0, "suggestions":[]}}
>
> When I try it with three letters, it works again:
> "Józ":"author":{
>   "Józ":{ "numFound":10, "suggestions":[
>     { "term":"Bajza <b>Józ</b>sef", "weight":0, "payload":""},
>     { "term":"Eötvös <b>Józ</b>sef", "weight":0, "payload":""},
>     { "term":"Eötvös <b>Józ</b>sef", "weight":0, "payload":""},
>     { "term":"Eötvös <b>Józ</b>sef", "weight":0, "payload":""},
>     { "term":"<b>Józ</b>sef Attila", "weight":0, "payload":""}..
>
> Any idea how a longer string can have more matches than a shorter one?
> It is inconsistent. What can I do to fix it? It results in a poor customer
> experience: users feel that sometimes they need 2 characters and sometimes
> 3 to get suggestions.
>
> Thanks in advance,
> Roland
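To reproduce the behaviour outside the client application, the /suggesthandler handler can be queried directly. The Python sketch below only builds the request URLs; the host, port and core name ("books") are hypothetical placeholders. Because buildOnStartup and buildOnCommit are both false in this config, the author dictionary has to be built explicitly with suggest.build=true before any suggestions can appear. Using urlencode also guarantees that "Jó" is sent UTF-8 percent-encoded, which helps rule out a client-side encoding problem.

```python
from urllib.parse import urlencode

# Hypothetical host, port and core name; replace with your own installation.
SUGGEST_ENDPOINT = "http://localhost:8983/solr/books/suggesthandler"


def suggest_url(prefix: str, build: bool = False) -> str:
    """Build a request URL for the /suggesthandler handler.

    suggest.dictionary and suggest.count are not repeated here because the
    handler's <lst name="defaults"> already sets them.
    """
    params = {"suggest.q": prefix}
    if build:
        # buildOnStartup and buildOnCommit are both false, so the dictionary
        # must be built explicitly at least once.
        params["suggest.build"] = "true"
    # urlencode percent-encodes non-ASCII characters as UTF-8 ("ó" -> %C3%B3).
    return SUGGEST_ENDPOINT + "?" + urlencode(params)


print(suggest_url("Józ", build=True))
# http://localhost:8983/solr/books/suggesthandler?suggest.q=J%C3%B3z&suggest.build=true
print(suggest_url("Jó"))
# http://localhost:8983/solr/books/suggesthandler?suggest.q=J%C3%B3
```

Building once and then comparing the raw responses for "Jó" and "Józ" separates a dictionary or encoding issue from an analysis issue such as stopword removal.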