On our Drupal multilingual system we use Apache Solr 3.5. The problem is well known; I have read about it on several blogs and sites. The search results are not the ones we want. In our code, in hook_apachesolr_query_alter, we override the default operator behaviour by setting mm: $query->replaceParam('mm', '90%'); The requirement is: when I search for "biological analyses", I want to fetch only the results that contain both words. When I search for "biological and chemical analyses", I want to fetch only the results that contain biological, chemical and analyses. The word "and" is not indexed because it is in the stopwords list.
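For reference, here is a minimal Python sketch (not Solr code) of how a simple mm value turns into a number of required clauses. It follows the rule in the Solr dismax documentation that a positive percentage is rounded down. The function name min_should_match is hypothetical, conditional specs such as 3<90% are not handled, and it assumes each whitespace-separated word becomes one optional clause.

```python
def min_should_match(spec: str, optional_clauses: int) -> int:
    """Hypothetical helper (not part of Solr): clauses required by a simple mm spec.

    Handles plain integers ("2", "-1") and percentages ("90%", "-10%").
    Conditional specs such as "3<90%" are not handled.
    """
    spec = spec.strip()
    if spec.endswith("%"):
        percent = int(spec[:-1])
        # The percentage is applied to the clause count and truncated toward
        # zero, so a positive percentage is effectively rounded down.
        calc = int(optional_clauses * percent / 100)
        required = calc if percent >= 0 else optional_clauses + calc
    else:
        value = int(spec)
        # A negative integer means "all but this many clauses".
        required = value if value >= 0 else optional_clauses + value
    # Clamp to the range [0, optional_clauses].
    return max(0, min(required, optional_clauses))


# Assumption: one optional clause per whitespace-separated word.
for query in ("biological analyses", "biological chemical analyses"):
    n = len(query.split())
    for mm in ("90%", "100%"):
        print(f"{query!r} mm={mm}: {min_should_match(mm, n)} of {n} words required")
```

Under these assumptions, mm=90% on a two-word query requires only one of the two words (floor of 1.8), which is consistent with getting results that contain only one of the searched keywords.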
If I set mm to 100% and my query contains stopwords, it returns no results. If I set mm to 100% and my query contains no stopwords, it returns the desired results. If I set mm to anything between 50% and 99%, it returns unwanted results: documents that contain only one of the searched keywords, or words similar to them, such as analyse (even though I searched for analyses). If I put + before the mandatory words it works fine, but it is not user friendly to ask users to type + before every word except the stopwords. Do I make any sense?

Below are some of our configuration details. All the indexed fields are of type text_<language>, e.g. from our schema.xml:

    <field name="label" type="text" indexed="true" stored="true" termVectors="true" omitNorms="true"/>
    <field name="i18n_label_en" type="text_en" indexed="true" stored="true" termVectors="true" omitNorms="true"/>
    <field name="i18n_label_fr" type="text_fr" indexed="true" stored="true" termVectors="true" omitNorms="true"/>

All the text field types have the same configuration except for the protected, words and dictionary parameters, which are language specific.
e.g. from our schema.xml:

    <fieldType name="text_en" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent_en.txt"/>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords_en.txt" enablePositionIncrements="true"/>
        <filter class="solr.WordDelimiterFilterFactory" protected="protwords.txt" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="1" splitOnCaseChange="1" preserveOriginal="1" splitOnNumerics="1" stemEnglishPossessive="1"/>
        <filter class="solr.LengthFilterFactory" min="2" max="100"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.DictionaryCompoundWordTokenFilterFactory" dictionary="compoundwords_en.txt" minWordSize="5" minSubwordSize="4" maxSubwordSize="15" onlyLongestMatch="true"/>
        <filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords_en.txt"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
      </analyzer>
      <analyzer type="query">
        <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent_en.txt"/>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms_en.txt" ignoreCase="true" expand="true"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords_en.txt" enablePositionIncrements="true"/>
        <filter class="solr.WordDelimiterFilterFactory" protected="protwords.txt" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="1" splitOnCaseChange="1" preserveOriginal="1" splitOnNumerics="1" stemEnglishPossessive="1"/>
        <filter class="solr.LengthFilterFactory" min="2" max="100"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.DictionaryCompoundWordTokenFilterFactory" dictionary="compoundwords_en.txt" minWordSize="5" minSubwordSize="4" maxSubwordSize="15" onlyLongestMatch="true"/>
        <filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords_en.txt"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
      </analyzer>
    </fieldType>

    <solrQueryParser defaultOperator="AND"/>

solrconfig.xml:

    <requestHandler name="pinkPony" class="solr.SearchHandler" default="true">
      <lst name="defaults">
        <str name="defType">edismax</str>
        <str name="echoParams">explicit</str>
        <bool name="omitHeader">true</bool>
        <float name="tie">0.01</float>
        <int name="timeAllowed">${solr.pinkPony.timeAllowed:-1}</int>
        <str name="q.alt">*:*</str>
        <str name="spellcheck">false</str>
        <str name="spellcheck.onlyMorePopular">true</str>
        <str name="spellcheck.extendedResults">false</str>
        <str name="spellcheck.count">1</str>
      </lst>
      <arr name="last-components">
        <str>spellcheck</str>
      </arr>
    </requestHandler>

Any ideas are appreciated!

--
View this message in context: http://lucene.472066.n3.nabble.com/Multilingual-indexing-search-results-edismax-and-stopwords-tp4125746.html
Sent from the Solr - User mailing list archive at Nabble.com.