Hi All,
I am trying to use synonyms in Solr 3.4 and am facing the issue below with multiword
synonyms.
I am using the edismax query parser with the following fields in qf and pf:
qf: name^1.2,name_synonym^0.5
pf: phrase_name^3
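For reference, the same qf/pf setup could be expressed as request handler defaults in solrconfig.xml. This is a sketch under assumptions: the handler name `search` is hypothetical, qf is written space-separated (the field^boost list form the DisMax documentation uses), and tie=0.5 and ps=5 are only inferred from the `~0.5` tie-breaker and `~5` phrase slop that appear in the parsed queries quoted in this message.

```xml
<!-- Hypothetical handler; name and defaults are assumptions, not the poster's config -->
<requestHandler name="search" class="solr.SearchHandler" default="true">
  <lst name="defaults">
    <!-- use the extended dismax query parser -->
    <str name="defType">edismax</str>
    <!-- query fields: space-separated field^boost pairs -->
    <str name="qf">name^1.2 name_synonym^0.5</str>
    <!-- phrase fields: boost documents where the query terms appear as a phrase -->
    <str name="pf">phrase_name^3</str>
    <!-- phrase slop for pf, inferred from "~5" in the parsed query -->
    <str name="ps">5</str>
    <!-- dismax tie-breaker, inferred from "~0.5" in the parsed query -->
    <str name="tie">0.5</str>
  </lst>
</requestHandler>
```

Adding `debugQuery=true` to a request makes Solr include `parsedquery_toString` in the debug section of the response, which is where the queries quoted here come from.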
The analyzer I am using for name_synonym is as follows:
<fieldType name="text_synonym" class="solr.TextField"
           positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory"
            ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1"
            catenateWords="0" catenateNumbers="0" catenateAll="0"
            splitOnCaseChange="0" preserveOriginal="0" />
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SynonymFilterFactory"
            synonyms="synonyms.txt" ignoreCase="true" expand="true"
            tokenizerFactory="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPorterFilterFactory"
            protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>
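One approach often recommended for multiword synonyms (see the SynonymFilterFactory notes on the Solr wiki) is to expand them at index time only: the query parser splits the query text on whitespace before the analyzer sees it, so a multiword synonym cannot be matched as a unit at query time. The sketch below applies that idea to the field type in this message. It is untested against 3.4. It keeps every filter from the original chain, moves SynonymFilterFactory into an index-only analyzer, and drops tokenizerFactory="solr.KeywordTokenizerFactory" on the assumption that the rules should be split on whitespace (the documented default), so that an entry like `xxx zzz` is parsed as the same two tokens WhitespaceTokenizerFactory produces.

```xml
<!-- Sketch: synonyms expanded at index time only (assumption, not a verified fix) -->
<fieldType name="text_synonym" class="solr.TextField"
           positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory"
            ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1"
            catenateWords="0" catenateNumbers="0" catenateAll="0"
            splitOnCaseChange="0" preserveOriginal="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- no tokenizerFactory: rules are parsed with the default
         whitespace tokenizer, so "xxx zzz" becomes two tokens -->
    <filter class="solr.SynonymFilterFactory"
            synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPorterFilterFactory"
            protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <!-- query time: identical chain without SynonymFilterFactory -->
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory"
            ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1"
            catenateWords="0" catenateNumbers="0" catenateAll="0"
            splitOnCaseChange="0" preserveOriginal="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPorterFilterFactory"
            protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>
```

Changes to index-time analysis only apply to documents indexed after the change, so existing documents would need to be reindexed.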
With the above configuration, the following types of synonyms work fine:
foobar => foo bar
FnB => foo and bar
aaa,bbb,ccc
However, for the following multiword synonym rule, the dismax query is formed
incorrectly for the qf field:
xxx zzz, aaa bbb, mmm nnn, aaabbb
The parsedquery_toString generated for the query aaabbb is as follows:
+(name:aaabbb^1.2 | name_synonym:" xxx zzz aaa bbb mmm (nnn aaabbb)"^0.5)~0.5
(phrase_name:" xxx zzz aaa bbb mmm (nnn aaabbb)"~5^3.0)~0.5
I am expecting a query like:
+(name:aaabbb^1.2 | ((name_synonym:xxx zzz name_synonym:aaa bbb
name_synonym:mmm nnn name_synonym:aaabbb)^0.5))~0.5
Similarly, for the query xxx zzz I get the following parsedquery_toString from
dismax:
+((name:xxx^1.2 | name_synonym:xxx^0.5 | name:zzz^1.2 |
name_synonym:zzz^0.5)~0.5) (phrase_name:"xxx zzz"~5^3.0)~0.5
But I am expecting the following query:
+((name:xxx^1.2 | name_synonym:xxx^0.5 | name:zzz^1.2 |
name_synonym:zzz^0.5)~0.5) (phrase_name:"xxx zzz"~5^3.0 |
phrase_name:"aaa bbb"~5^3.0 | phrase_name:"mmm nnn"~5^3.0 |
phrase_name:"aaabbb"~5^3.0)~0.5
However, that is not what I get.
Please let me know if I am missing something or if this is expected behavior. Also,
please let me know what should be done to get the desired output.
Thanks in advance.
Pravin