Re: Multilingual indexing, search results, edismax and stopwords

Jan Høydahl Tue, 25 Mar 2014 04:31:57 -0700

If using stopwords with edismax, please make sure that ALL fields referred to in 
"qf" have stopwords defined in their fieldType, and also that the stopword 
dictionary is the SAME for all of them. This way you will avoid the 
infamous edismax+stopwords bug described in 
https://issues.apache.org/jira/browse/SOLR-3085
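To see why a mismatch matters, here is a deliberately simplified Python model. It is an assumption for illustration, not Solr's actual edismax code. Each query word becomes one clause across the "qf" fields, and a word that is a stopword in only some of those fields still yields a clause, which then counts towards mm. The field names come from the schema quoted further down. The stopword sets, and the idea that the "label" field has no stop filter, are hypothetical.

```python
# Simplified model, NOT Solr's implementation: one clause per query word,
# each clause listing the (field, term) pairs it could match across "qf".

def analyze(word, stopwords):
    """Return the term a field would search for, or None if it is a stopword there."""
    term = word.lower()
    return None if term in stopwords else term


def build_clauses(query, qf):
    """qf maps each field name to the stopword set its analyzer uses."""
    clauses = []
    for word in query.split():
        alternatives = []
        for field, stops in qf.items():
            term = analyze(word, stops)
            if term is not None:
                alternatives.append((field, term))
        # A word that disappears in EVERY field produces no clause at all.
        if alternatives:
            clauses.append(alternatives)
    return clauses


EN_STOPS = {"and", "the", "of"}  # hypothetical excerpt of stopwords_en.txt
same_stops = {"i18n_label_en": EN_STOPS, "label": EN_STOPS}
# Hypothetical: the "label" field's "text" type has no StopFilterFactory.
mixed_stops = {"i18n_label_en": EN_STOPS, "label": set()}

query = "biological and chemical analyses"
print(len(build_clauses(query, same_stops)))   # 3
print(len(build_clauses(query, mixed_stops)))  # 4, "and" survives via "label"
```

In the mixed case, mm=100% would make the "and" clause mandatory, which is the kind of surprise that identical stopword handling across all "qf" fields avoids.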


--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

On 23 March 2014 at 19:37, Jack Krupansky <j...@basetechnology.com> wrote:

> Setting the default query operator to AND is the preferred approach: q.op=AND.
> 
> That said, I'm not sure that counting ignored and empty terms towards the mm 
> % makes sense. IOW, if a term transforms to nothing, either because it is a 
> stop word or empty synonym replacement or pure punctuation, I don't think it 
> should count as a term. I think this is worth a Jira.
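A minimal Python sketch of the point above, assuming a simplified model in which each query word is one mm clause and a word that analyzes to nothing either is or is not counted. The count_empty flag and the function name are hypothetical, not Solr parameters. Percentages are rounded down.

```python
def mm_outcome(analyzed, mm_percent, count_empty):
    """analyzed: one entry per query word, None if analysis removed the word.

    Returns (clauses required, clauses that can actually match).
    """
    matchable = sum(1 for term in analyzed if term is not None)
    counted = len(analyzed) if count_empty else matchable
    required = counted * mm_percent // 100  # percentage rounded down
    return required, matchable


# "biological and chemical analyses" with "and" removed as a stopword
words = ["biological", None, "chemical", "analyses"]
print(mm_outcome(words, 100, count_empty=True))   # (4, 3): nothing can match
print(mm_outcome(words, 100, count_empty=False))  # (3, 3)
```

When the required count exceeds the number of terms that can match, no document satisfies the query, which mirrors the "mm=100% plus stopwords returns nothing" symptom in the original message.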
> 
> -- Jack Krupansky
> 
> -----Original Message----- From: kastania44
> Sent: Thursday, March 20, 2014 11:00 AM
> To: solr-user@lucene.apache.org
> Subject: Multilingual indexing, search results, edismax and stopwords
> 
> On our multilingual Drupal system we use Apache Solr 3.5.
> The problem is well known; I have read about it on various blogs and sites.
> The search results are not the ones we want.
> In our code, in hook apachesolr_query_alter, we override the default operator
> behaviour by setting mm:
> $query->replaceParam('mm', '90%');
> The requirement: when I search for "biological analyses", I want to fetch
> only the results that contain both words.
> When I search for "biological and chemical analyses", I want to fetch only
> the results that contain biological, chemical and analyses. The word "and" is
> not indexed because it is a stopword.
> 
> If I set mm to 100% and my query contains stopwords, it returns no results.
> If I set mm to 100% and my query does not contain stopwords, it returns the
> desired results.
> If I set mm to anything between 50% and 99%, it returns unwanted results, such
> as results that contain only one of the searched keywords, or words similar to
> the searched keywords, like analyse (even though I searched for analyses).
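The 50%-99% behaviour follows from the arithmetic: mm percentages are rounded down, so for a two-word query every value from 50% to 99% requires only one word. Below is a simplified Python sketch of turning a plain mm value into a required-clause count. The function name is hypothetical, it ignores conditional specs such as "3<90%", and the clamping to the clause count is this sketch's assumption.

```python
def min_should_match(num_clauses: int, spec: str) -> int:
    """Required-clause count for a plain mm value: "N", "-N", "P%" or "-P%"."""
    spec = spec.strip()
    if spec.endswith("%"):
        pct = int(spec[:-1])
        if pct >= 0:
            # positive percentage of the clauses, rounded down
            required = num_clauses * pct // 100
        else:
            # negative percentage: how many clauses may be missing (rounded down)
            required = num_clauses - (num_clauses * -pct // 100)
    else:
        n = int(spec)
        # negative integer: how many clauses may be missing
        required = n if n >= 0 else num_clauses + n
    # clamp to the range [0, num_clauses]
    return max(0, min(required, num_clauses))


for spec in ("50%", "90%", "99%", "100%"):
    print(spec, min_should_match(2, spec))
# 50% 1
# 90% 1
# 99% 1
# 100% 2
```

So for "biological analyses", only 100% requires both words; anything lower lets a single matching word through.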
> 
> If I put + before the mandatory words, it works fine, but it is not user
> friendly to ask users to type + before each word except the
> stopwords.
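The + workaround could be applied by the application rather than typed by the user. Here is a hypothetical Python sketch of that logic; the site itself uses PHP in apachesolr_query_alter, so this only illustrates the idea. It assumes the stopword set is the same list the analyzers use for the query language (stopwords_en.txt for English) and does not handle phrases or other query syntax. Dropping stopwords, rather than leaving them unprefixed as described above, is a design choice that keeps them from becoming stray optional clauses.

```python
def require_all_terms(user_query, stopwords):
    """Prefix each non-stopword with '+' so it becomes a mandatory clause.

    Stopwords are dropped instead of being left as optional clauses.
    Words the user already prefixed with '+' or '-' are kept unchanged.
    """
    parts = []
    for word in user_query.split():
        if word.lower() in stopwords:
            continue
        parts.append(word if word.startswith(("+", "-")) else "+" + word)
    return " ".join(parts)


print(require_all_terms("biological and chemical analyses", {"and", "the"}))
# +biological +chemical +analyses
```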
> 
> Do I make any sense?
> 
> Below are some of our configuration details:
> 
> All the indexed fields are of type text_<language>,
> e.g. from our schema.xml:
> <field name="label" type="text" indexed="true" stored="true"
>        termVectors="true" omitNorms="true"/>
> <field name="i18n_label_en" type="text_en" indexed="true" stored="true"
>        termVectors="true" omitNorms="true"/>
> <field name="i18n_label_fr" type="text_fr" indexed="true" stored="true"
>        termVectors="true" omitNorms="true"/>
> All the text fieldTypes have the same configuration except for the
> protected, words and dictionary parameters, which are language specific,
> e.g. from our schema.xml:
> <fieldType name="text_en" class="solr.TextField" positionIncrementGap="100">
>   <analyzer type="index">
>     <charFilter class="solr.MappingCharFilterFactory"
>                 mapping="mapping-ISOLatin1Accent_en.txt"/>
>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>     <filter class="solr.StopFilterFactory" ignoreCase="true"
>             words="stopwords_en.txt" enablePositionIncrements="true"/>
>     <filter class="solr.WordDelimiterFilterFactory" protected="protwords.txt"
>             generateWordParts="1" generateNumberParts="1" catenateWords="1"
>             catenateNumbers="1" catenateAll="1" splitOnCaseChange="1"
>             preserveOriginal="1" splitOnNumerics="1" stemEnglishPossessive="1"/>
>     <filter class="solr.LengthFilterFactory" min="2" max="100"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>     <filter class="solr.DictionaryCompoundWordTokenFilterFactory"
>             dictionary="compoundwords_en.txt" minWordSize="5" minSubwordSize="4"
>             maxSubwordSize="15" onlyLongestMatch="true"/>
>     <filter class="solr.SnowballPorterFilterFactory" language="English"
>             protected="protwords_en.txt"/>
>     <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
>   </analyzer>
>   <analyzer type="query">
>     <charFilter class="solr.MappingCharFilterFactory"
>                 mapping="mapping-ISOLatin1Accent_en.txt"/>
>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>     <filter class="solr.SynonymFilterFactory" synonyms="synonyms_en.txt"
>             ignoreCase="true" expand="true"/>
>     <filter class="solr.StopFilterFactory" ignoreCase="true"
>             words="stopwords_en.txt" enablePositionIncrements="true"/>
>     <filter class="solr.WordDelimiterFilterFactory" protected="protwords.txt"
>             generateWordParts="1" generateNumberParts="1" catenateWords="1"
>             catenateNumbers="1" catenateAll="1" splitOnCaseChange="1"
>             preserveOriginal="1" splitOnNumerics="1" stemEnglishPossessive="1"/>
>     <filter class="solr.LengthFilterFactory" min="2" max="100"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>     <filter class="solr.DictionaryCompoundWordTokenFilterFactory"
>             dictionary="compoundwords_en.txt" minWordSize="5" minSubwordSize="4"
>             maxSubwordSize="15" onlyLongestMatch="true"/>
>     <filter class="solr.SnowballPorterFilterFactory" language="English"
>             protected="protwords_en.txt"/>
>     <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
>   </analyzer>
> </fieldType>
> 
> <solrQueryParser defaultOperator="AND"/>
> 
> solrconfig.xml
> 
> <requestHandler name="pinkPony" class="solr.SearchHandler" default="true">
>   <lst name="defaults">
>     <str name="defType">edismax</str>
>     <str name="echoParams">explicit</str>
>     <bool name="omitHeader">true</bool>
>     <float name="tie">0.01</float>
>     <int name="timeAllowed">${solr.pinkPony.timeAllowed:-1}</int>
>     <str name="q.alt">*:*</str>
>     <str name="spellcheck">false</str>
>     <str name="spellcheck.onlyMorePopular">true</str>
>     <str name="spellcheck.extendedResults">false</str>
>     <str name="spellcheck.count">1</str>
>   </lst>
>   <arr name="last-components">
>     <str>spellcheck</str>
>   </arr>
> </requestHandler>
> 
> 
> ANY ideas are appreciated!
> 
> 
> 
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Multilingual-indexing-search-results-edismax-and-stopwords-tp4125746.html
> Sent from the Solr - User mailing list archive at Nabble.com.
