Bug with solr.SuggestComponent, Solr 6.30, and the delimiter between multiple words in term?

Purple Mynxie Wed, 13 Dec 2017 07:06:49 -0800

Is there a bug with solr.SuggestComponent, Solr 6.30, and the delimiter
between multiple words in term?


If there are multiple words in the term suggested, the words are delimited
with an information separator, U+001E.
https://www.fileformat.info/info/unicode/char/001e/index.htm

In Chrome Version 62.0.3202.94 (Official Build) (64-bit) the separator
appears as square boxes.
In Microsoft Edge 40.15063.674.0 I can see \u001e:

{"responseHeader":{"status":0,"QTime":1},"suggest":{"docSuggester":{"signal processing to":{"numFound":5,"suggestions":[{"term":"signal\u001eprocessing\u001etoolbox","weight":8106483484358234112,"payload":""},{"term":"signal\u001eprocessing\u001eto","weight":10637033833300398,"payload":""},{"term":"signal\u001eprocessing\u001etool","weight":2127406766660079,"payload":""},{"term":"signal\u001eprocessing\u001etools","weight":1063703383330039,"payload":""},{"term":"processing\u001etogether","weight":335151600176409,"payload":""}]}}}}

I am expecting just a plain space as a delimiter.
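
As a stopgap on the client side, the separator can be replaced after the response is parsed. This is only a minimal sketch, assuming the JSON response shape shown above; the function name normalize_suggestions and its suggester argument are hypothetical and not part of Solr, and this does not change what Solr itself returns.

```python
import json

# U+001E, the separator observed between words in the suggested terms.
SEPARATOR = "\u001e"


def normalize_suggestions(response_text, suggester="docSuggester"):
    """Return the suggested terms with U+001E replaced by a plain space.

    Assumes the response layout shown above:
    suggest -> <suggester name> -> <query string> -> suggestions[].term
    """
    response = json.loads(response_text)
    terms = []
    # Each query string maps to its own block of suggestions.
    for result in response["suggest"][suggester].values():
        for suggestion in result["suggestions"]:
            terms.append(suggestion["term"].replace(SEPARATOR, " "))
    return terms
```

Calling normalize_suggestions on the raw response body would give terms such as "signal processing toolbox" with ordinary spaces.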

solrconfig.xml

  <searchComponent class="solr.SuggestComponent" name="doc_suggest">
    <lst name="suggester">
      <str name="name">docSuggester</str>
      <str name="lookupImpl">FreeTextLookupFactory</str>
      <str name="dictionaryImpl">DocumentDictionaryFactory</str>
      <str name="field">doc_suggestions</str>
      <str name="weightField">weight</str>
      <str name="suggestFreeTextAnalyzerFieldType">suggestTypeLc</str>
      <str name="ngrams">3</str>
      <str name="buildOnStartup">false</str>
    </lst>
  </searchComponent>

  <requestHandler class="org.apache.solr.handler.component.SearchHandler" name="/doc_suggest">
    <lst name="defaults">
      <str name="suggest">true</str>
      <str name="suggest.count">5</str>
    </lst>
    <arr name="components">
      <str>doc_suggest</str>
    </arr>
  </requestHandler>

schema.xml

  <fieldType name="textSpell" class="solr.TextField" positionIncrementGap="100">
    <analyzer type="index">
      <tokenizer class="solr.KeywordTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.EdgeNGramFilterFactory" maxGramSize="100"/>
    </analyzer>
    <analyzer type="query">
      <tokenizer class="solr.KeywordTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>

  <fieldType name="suggestTypeLc" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
      <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="[^a-zA-Z0-9]" replacement=" " />
      <tokenizer class="solr.WhiteSpaceTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>


  <field name="doc_suggestions" type="textSpell" stored="true" indexed="true" multiValued="true" />

  <copyField source="body_en" dest="doc_suggestions" />
  <copyField source="title_en" dest="doc_suggestions" />
  <copyField source="primary_header_en" dest="doc_suggestions" />


I am using FreeTextLookupFactory because of this bug:
https://issues.apache.org/jira/browse/SOLR-9458

Thank you.

Tiffany
