Hi,

I've implemented a Solr suggester with FuzzyLookupFactory and it's working
perfectly, except for one glitch: it only treats a query as an exact match
when the case matches as well.
For example, the results for "mumbai" and "Mumbai" are different.

This is too restrictive and rather defeats the purpose of the suggester.

I've posted this on Stack Overflow:

http://stackoverflow.com/questions/41320424/solr-fuzzylookupfactory-exactmatch-is-case-sensitive

The following is the text I posted on Stack Overflow:

I have implemented a Solr suggester for a list of cities and areas. I have
used FuzzyLookupFactory for this. My schema looks like this:

<fieldType name="suggestTypeLc" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="[^a-zA-Z0-9]" replacement=" " />
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

synonyms.txt is used for mapping older city names to their new ones, like
Madras=>Chennai, Saigon=>Ho Chi Minh city

My suggester definition looks like this:

  <searchComponent name="suggest" class="solr.SuggestComponent">
        <lst name="suggester">
              <str name="name">suggestions</str>
              <str name="lookupImpl">FuzzyLookupFactory</str>
              <str name="dictionaryImpl">DocumentDictionaryFactory</str>
              <str name="field">searchfield</str>
              <str name="weightField">searchscore</str>
              <str name="suggestAnalyzerFieldType">suggestTypeLc</str>
              <str name="buildOnStartup">false</str>
              <str name="buildOnCommit">false</str>
              <str name="storeDir">autosuggest_dict</str>
        </lst>
  </searchComponent>
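
Note that the definition above does not set exactMatchFirst. According to
the Solr Reference Guide, this parameter defaults to true for
FuzzyLookupFactory, which would explain why exact matches are promoted at
all. Below is a sketch of the same suggester with that parameter written out
explicitly. It assumes the documented default and only makes the current
behaviour visible; it is not a verified fix for the case sensitivity.

```xml
<searchComponent name="suggest" class="solr.SuggestComponent">
  <lst name="suggester">
    <str name="name">suggestions</str>
    <str name="lookupImpl">FuzzyLookupFactory</str>
    <str name="dictionaryImpl">DocumentDictionaryFactory</str>
    <str name="field">searchfield</str>
    <str name="weightField">searchscore</str>
    <str name="suggestAnalyzerFieldType">suggestTypeLc</str>
    <!-- Assumption: written out only to show the documented default (true);
         the original definition leaves this parameter unset. -->
    <str name="exactMatchFirst">true</str>
    <str name="buildOnStartup">false</str>
    <str name="buildOnCommit">false</str>
    <str name="storeDir">autosuggest_dict</str>
  </lst>
</searchComponent>
```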

My request handler looks like this:

  <requestHandler name="/suggest" class="solr.SearchHandler" startup="lazy">
        <lst name="defaults">
                <str name="suggest">true</str>
                <str name="suggest.count">10</str>
                <str name="suggest.dictionary">suggestions</str>
                <str name="suggest.dictionary">results</str>
        </lst>
        <arr name="components">
                <str>suggest</str>
        </arr>
  </requestHandler>

Now the problem is that the suggester shows exact matches first, but the
exact-match check is case sensitive. For example,

/suggest?suggest.q=mumbai (starting with a lower case "m")

returns the exact match in 4th place:

{
  "responseHeader":{
    "status":0,
    "QTime":19},
  "suggest":{
    "suggestions":{
      "mumbai":{
        "numFound":10,
        "suggestions":[{
            "term":"Mumbai Domestic Airport",
            "weight":11536},
          {
            "term":"Mumbai Chhatrapati Shivaji Intl Airport",
            "weight":11376},
          {
            "term":"Mumbai Pune Highway",
            "weight":2850},
          {
            "term":"Mumbai",
            "weight":2248},
.....

Whereas calling /suggest?suggest.q=Mumbai (starting with an upper case "M")

returns the exact match in 1st place:

{
  "responseHeader":{
    "status":0,
    "QTime":16},
  "suggest":{
    "suggestions":{
      "Mumbai":{
        "numFound":10,
        "suggestions":[{
            "term":"Mumbai",
            "weight":2248},
          {
            "term":"Mumbai Domestic Airport",
            "weight":11536},
          {
            "term":"Mumbai Chhatrapati Shivaji Intl Airport",
            "weight":11376},
          {
            "term":"Mumbai Pune Highway",
            "weight":2850},
...

What am I missing here? What can be done to make "Mumbai" the first
result even when the query is the lower case "mumbai"? I thought
case sensitivity was being handled by the "suggestTypeLc" field type I've
defined.
-- 
Ciao
Diwakar
