Bug with full-text search fields in multiple languages (Solr 5)

erantone Thu, 30 Apr 2015 16:33:12 -0700

Dear all,

I have defined two dynamic fields:

    <dynamicField name="*_texts_en" stored="true" type="text_en" multiValued="true" indexed="true"/>
    <dynamicField name="*_texts_pt" stored="true" type="text_pt" multiValued="true" indexed="true"/>

for documents in English and in Portuguese, with the following index and
query analyzers:

    <fieldType name="text_en" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <charFilter class="solr.HTMLStripCharFilterFactory"/>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_en.txt" />
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.EnglishPossessiveFilterFactory"/>
        <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
        <filter class="solr.PorterStemFilterFactory"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_en.txt" />
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.EnglishPossessiveFilterFactory"/>
        <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
        <filter class="solr.PorterStemFilterFactory"/>
      </analyzer>
    </fieldType>

    <fieldType name="text_pt" class="solr.TextField" omitNorms="false">
      <analyzer type="index">
        <charFilter class="solr.HTMLStripCharFilterFactory"/>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_pt.txt" format="snowball" />
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.PortugueseLightStemFilterFactory"/>
        <filter class="solr.ASCIIFoldingFilterFactory" preserveOriginal="true"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory" />
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_pt.txt" format="snowball" />
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.PortugueseLightStemFilterFactory"/>
        <filter class="solr.ASCIIFoldingFilterFactory" preserveOriginal="false"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory" />
      </analyzer>
    </fieldType>

A document is either in Portuguese or in English. A document in English
stores its text in a field such as 'body_texts_en'; a document in
Portuguese uses 'body_texts_pt'.

However, I run into problems when I query both fields at once and
solr.StopFilterFactory is part of the analyzer chains. When I search
without knowing the document's language, I query Solr like this:

{
  "responseHeader": {
    "status": 0,
    "QTime": 1,
    "params": {
      "q": "suco de limão",
      "defType": "edismax",
      "indent": "true",
      "qf": " body_texts_pt  body_texts_en",
      "wt": "json",
      "lowercaseOperators": "true",
      "stopwords": "true",
      "_": "1430434475811"
    }
  },
  "response": {
    "numFound": 0,
    "start": 0,
    "docs": []
  }
}
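To reproduce the request outside the admin UI, here is a minimal Python sketch that rebuilds the same select URL from the parameters echoed in responseHeader. The endpoint (host, port and the core name "mycore") is hypothetical. The debugQuery=true parameter was not part of the original request; it is added because Solr's debug output includes the parsed query, which shows which clauses edismax actually built for each field.

```python
from urllib.parse import urlencode

# Hypothetical endpoint: host, port and core name are assumptions.
SOLR_SELECT = "http://localhost:8983/solr/mycore/select"

params = {
    "q": "suco de limão",
    "defType": "edismax",
    # Whitespace-separated field list (same fields as in the original request).
    "qf": "body_texts_pt body_texts_en",
    "lowercaseOperators": "true",
    "stopwords": "true",
    "wt": "json",
    "indent": "true",
    # Not in the original request: asks Solr to return the parsed query.
    "debugQuery": "true",
}

# urlencode percent-encodes values as UTF-8 and turns spaces into '+'.
url = SOLR_SELECT + "?" + urlencode(params)
print(url)
```

Comparing the parsedquery entry in the debug section with and without body_texts_en in qf should show whether the two requests differ in the clauses they contain.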

The query above uses Portuguese terms. Even though the index contains
matching documents, no results are returned.
On the other hand, as soon as I do either of the following, the matching
documents are returned correctly:
- remove 'body_texts_en' from the 'qf' parameter of the Solr request, OR
- remove all solr.StopFilterFactory filters from all analyzers.

Thus, the problem appears only when solr.StopFilterFactory is combined with
querying two fields at once, each field having its own
solr.StopFilterFactory with a different stopword list (as shown above).
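The observations above are consistent with one hypothesis, sketched here as a simplified Python model rather than Solr's actual code. Edismax analyzes each query term once per qf field and groups the results into one clause per term. "de" is in the Portuguese stopword list but, as far as I know, not in Solr's default English list, so its clause only targets body_texts_en. If the effective minimum-should-match (mm) for this request is 100% (an assumption about this setup), every clause must match, and a document that only has body_texts_pt can never satisfy the "de" clause. The stopword sets are small assumed excerpts; stemming and ASCII folding are omitted.

```python
# Simplified model (an assumption, not Solr's actual code) of how an
# edismax-style parser turns "suco de limão" into one clause per term.

# Small assumed excerpts of lang/stopwords_pt.txt and lang/stopwords_en.txt.
PT_STOPWORDS = {"de", "a", "o", "e", "do", "da"}
EN_STOPWORDS = {"a", "an", "and", "of", "the", "to"}


def analyze(text, stopwords):
    """Lowercase, split on whitespace and drop stopwords (no stemming/folding)."""
    return [t for t in text.lower().split() if t not in stopwords]


def clauses(query, qf):
    """For each query term, list the qf fields whose analyzer keeps it."""
    result = []
    for term in query.lower().split():
        fields = [f for f, stop in qf.items() if term not in stop]
        if fields:  # a term dropped by every field yields no clause
            result.append((term, fields))
    return result


def matches(doc, query, qf):
    """doc maps field -> indexed tokens; every clause must match (mm=100%, assumed)."""
    return all(
        any(term in doc.get(f, set()) for f in fields)
        for term, fields in clauses(query, qf)
    )


query = "suco de limão"
doc_text = "receita de suco de limão"  # hypothetical Portuguese document

both = {"body_texts_pt": PT_STOPWORDS, "body_texts_en": EN_STOPWORDS}
pt_doc = {"body_texts_pt": set(analyze(doc_text, PT_STOPWORDS))}

# The 'de' clause only targets body_texts_en, which this document lacks.
print(clauses(query, both))
print(matches(pt_doc, query, both))  # False

# Workaround 1: query only the Portuguese field ('de' yields no clause).
print(matches(pt_doc, query, {"body_texts_pt": PT_STOPWORDS}))  # True

# Workaround 2: no stop filters anywhere, so 'de' is indexed as well.
no_stop = {"body_texts_pt": set(), "body_texts_en": set()}
plain_doc = {"body_texts_pt": set(analyze(doc_text, set()))}
print(matches(plain_doc, query, no_stop))  # True
```

Under these assumptions the model reproduces both workarounds described above; the parsed query from Solr's debug output would confirm or refute whether the real request contains a clause for "de" on body_texts_en only.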

Is there any way to make the query above work as expected?

Thanks in advance.

With best regards,
Eric

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Bug-with-full-text-search-fields-in-multiple-languages-solr-5-tp4203367.html
Sent from the Solr - User mailing list archive at Nabble.com.
