Setting the default query operator to AND is the preferred approach:
q.op=AND.
That said, I'm not sure that counting ignored and empty terms toward the mm
percentage makes sense. In other words, if a term analyzes to nothing, whether
because it is a stop word, an empty synonym replacement, or pure punctuation,
I don't think it should count as a term. I think this is worth a Jira issue.
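
For illustration, a request that sets the operator per query could be built
along these lines. This is only a sketch: the endpoint URL and the helper name
build_select_url are hypothetical, and how edismax combines q.op with an
explicit mm may differ between Solr versions, so check it against the version
in use.

```python
from urllib.parse import urlencode

# Hypothetical Solr endpoint; adjust host, port, and core to your setup.
SOLR_SELECT_URL = "http://localhost:8983/solr/select"


def build_select_url(user_query: str) -> str:
    """Return a select URL that sets the default operator to AND for this request."""
    params = {
        "q": user_query,
        "defType": "edismax",
        "q.op": "AND",  # per-request default operator
    }
    return SOLR_SELECT_URL + "?" + urlencode(params)


url = build_select_url("biological analyses")
print(url)
```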
-- Jack Krupansky
-----Original Message-----
From: kastania44
Sent: Thursday, March 20, 2014 11:00 AM
To: solr-user@lucene.apache.org
Subject: Multilingual indexing, search results, edismax and stopwords
On our multilingual Drupal system we use Apache Solr 3.5.
The problem is well known; I have read about it on various blogs and sites.
The search results are not the ones we want.
In our code, in hook apachesolr_query_alter, we override the defaultOperator
by setting mm:
$query->replaceParam('mm', '90%');
The requirement is: when I search for "biological analyses", I want to fetch
only the results that contain both words.
When I search for "biological and chemical analyses", I want to fetch only
the results that contain biological, chemical, and analyses. The word "and"
is not indexed because it is a stopword.
If I set mm to 100% and my query has stopwords, it does not fetch any results.
If I set mm to 100% and my query does not have stopwords, it fetches the
desired results.
If I set mm to anything between 50% and 99%, it fetches unwanted results, such
as results that contain only one of the searched keywords, or words similar to
the searched keywords, like analyse (even if I searched for analyses).
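
The effect of a percentage mm can be worked through with a little arithmetic.
The sketch below assumes the documented behaviour for positive percentages,
where the computed number of required clauses is rounded down; the function
name required_clauses is made up for this example. Under that assumption, 90%
of two terms means only one term has to match, which would be consistent with
results that contain just one of the keywords. At 100% every clause has to
match, so a query can return nothing if a stopword is still counted as a
clause.

```python
def required_clauses(optional_clauses: int, mm_percent: int) -> int:
    """Number of optional clauses that must match for a positive mm percentage.

    Assumption: the computed value is rounded down, as documented for
    positive percentages. Negative percentages are not handled here.
    """
    if not 0 <= mm_percent <= 100:
        raise ValueError("this sketch only handles percentages from 0 to 100")
    return (optional_clauses * mm_percent) // 100


for terms in (2, 3, 4):
    for mm in (90, 100):
        needed = required_clauses(terms, mm)
        print(f"{terms} terms, mm={mm}% -> {needed} must match")
```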
If I search using + before the words that are mandatory it works OK, but it
is not user friendly to ask the user to type + before each word except the
stopwords.
Do I make any sense?
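
One way to avoid asking users to type + themselves would be to rewrite the
query before it is sent to Solr. The following is a minimal sketch of that
idea, not something taken from the setup described here: the function name
require_terms and the stopword set are hypothetical, and the split on
whitespace is deliberately naive.

```python
def require_terms(user_query: str, stopwords: set) -> str:
    """Prefix every non-stopword term with '+' so the parser treats it as mandatory."""
    rewritten = []
    for term in user_query.split():
        if term.lower() in stopwords or term.startswith(("+", "-")):
            # Leave stopwords and terms with an explicit operator untouched.
            rewritten.append(term)
        else:
            rewritten.append("+" + term)
    return " ".join(rewritten)


# Hypothetical stopword list; in practice it would mirror stopwords_en.txt.
STOPWORDS_EN = {"and", "or", "the", "of"}

print(require_terms("biological and chemical analyses", STOPWORDS_EN))
```

The sketch does not handle quoted phrases, field prefixes, or other query
syntax, so those would need extra care in a real implementation.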
Below are some of our configuration details:
All the indexed fields are of type text_language,
e.g from our schema.xml
<field name="label" type="text" indexed="true" stored="true"
       termVectors="true" omitNorms="true"/>
<field name="i18n_label_en" type="text_en" indexed="true" stored="true"
       termVectors="true" omitNorms="true"/>
<field name="i18n_label_fr" type="text_fr" indexed="true" stored="true"
       termVectors="true" omitNorms="true"/>
All the text field types have the same configuration except for the
protected, words, and dictionary parameters, which are language specific.
e.g from our schema.xml
<fieldType name="text_en" class="solr.TextField"
           positionIncrementGap="100">
  <analyzer type="index">
    <charFilter class="solr.MappingCharFilterFactory"
                mapping="mapping-ISOLatin1Accent_en.txt"/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords_en.txt" enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory"
            protected="protwords.txt" generateWordParts="1"
            generateNumberParts="1" catenateWords="1" catenateNumbers="1"
            catenateAll="1" splitOnCaseChange="1" preserveOriginal="1"
            splitOnNumerics="1" stemEnglishPossessive="1"/>
    <filter class="solr.LengthFilterFactory" min="2" max="100"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.DictionaryCompoundWordTokenFilterFactory"
            dictionary="compoundwords_en.txt" minWordSize="5"
            minSubwordSize="4" maxSubwordSize="15" onlyLongestMatch="true"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English"
            protected="protwords_en.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <charFilter class="solr.MappingCharFilterFactory"
                mapping="mapping-ISOLatin1Accent_en.txt"/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms_en.txt"
            ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords_en.txt" enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory"
            protected="protwords.txt" generateWordParts="1"
            generateNumberParts="1" catenateWords="1" catenateNumbers="1"
            catenateAll="1" splitOnCaseChange="1" preserveOriginal="1"
            splitOnNumerics="1" stemEnglishPossessive="1"/>
    <filter class="solr.LengthFilterFactory" min="2" max="100"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.DictionaryCompoundWordTokenFilterFactory"
            dictionary="compoundwords_en.txt" minWordSize="5"
            minSubwordSize="4" maxSubwordSize="15" onlyLongestMatch="true"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English"
            protected="protwords_en.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>
<solrQueryParser defaultOperator="AND"/>
solrconfig.xml
<requestHandler name="pinkPony" class="solr.SearchHandler"
                default="true">
  <lst name="defaults">
    <str name="defType">edismax</str>
    <str name="echoParams">explicit</str>
    <bool name="omitHeader">true</bool>
    <float name="tie">0.01</float>
    <int name="timeAllowed">${solr.pinkPony.timeAllowed:-1}</int>
    <str name="q.alt">*:*</str>
    <str name="spellcheck">false</str>
    <str name="spellcheck.onlyMorePopular">true</str>
    <str name="spellcheck.extendedResults">false</str>
    <str name="spellcheck.count">1</str>
  </lst>
  <arr name="last-components">
    <str>spellcheck</str>
  </arr>
</requestHandler>
ANY ideas are appreciated!
--
View this message in context:
http://lucene.472066.n3.nabble.com/Multilingual-indexing-search-results-edismax-and-stopwords-tp4125746.html