Hi All,


I am trying to use synonyms in Solr 3.4 and am facing the issue below with
multiword synonyms.



I am using the edismax query parser with the following fields in qf and pf:



qf: name^1.2,name_synonym^0.5

pf: phrase_name^3



The analyzer that I am using for name_synonym is as follows:



<fieldType name="text_synonym" class="solr.TextField"
           positionIncrementGap="100">
    <analyzer>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.StopFilterFactory"
                ignoreCase="true" words="stopwords.txt"/>
        <filter class="solr.WordDelimiterFilterFactory"
                generateWordParts="1" generateNumberParts="1"
                catenateWords="0" catenateNumbers="0" catenateAll="0"
                splitOnCaseChange="0" preserveOriginal="0"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.SynonymFilterFactory"
                synonyms="synonyms.txt" ignoreCase="true" expand="true"
                tokenizerFactory="solr.KeywordTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.EnglishPorterFilterFactory"
                protected="protwords.txt"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    </analyzer>
</fieldType>



With the above configuration, the following types of synonyms work fine:

foobar => foo bar

FnB => foo and bar

aaa,bbb,ccc





However, for the following multiword synonym, the dismax query is formed
incorrectly for the qf field:

xxx zzz, aaa bbb, mmm nnn, aaabbb
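
As a rough illustration of how the synonym rules above are meant to be read, here is a minimal Python sketch. It is a simplified model, not Solr's actual parser: the function name parse_synonym_rules is hypothetical, escaping is ignored, and each comma-separated entry is kept as one whole string, which is intended to mirror tokenizerFactory="solr.KeywordTokenizerFactory".

```python
def parse_synonym_rules(lines, expand=True, ignore_case=True):
    """Simplified model of synonyms.txt rules (hypothetical helper, not Solr code).

    Each comma-separated entry is kept as one whole string, so a multiword
    entry such as "xxx zzz" is a single key rather than two words.
    """
    mapping = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if ignore_case:
            line = line.lower()
        if "=>" in line:
            # Explicit mapping: every left-hand entry maps to the right-hand entries.
            lhs, rhs = line.split("=>", 1)
            sources = [s.strip() for s in lhs.split(",") if s.strip()]
            targets = [t.strip() for t in rhs.split(",") if t.strip()]
        else:
            # Equivalence list: with expand=True each entry maps to all entries,
            # otherwise every entry maps to the first one only.
            terms = [t.strip() for t in line.split(",") if t.strip()]
            sources = terms
            targets = terms if expand else terms[:1]
        for source in sources:
            bucket = mapping.setdefault(source, [])
            for target in targets:
                if target not in bucket:
                    bucket.append(target)
    return mapping


rules = [
    "foobar => foo bar",
    "FnB => foo and bar",
    "aaa,bbb,ccc",
    "xxx zzz, aaa bbb, mmm nnn, aaabbb",
]
synonyms = parse_synonym_rules(rules)
print(synonyms["aaabbb"])
```

In this model the single word aaabbb is a key, while the individual words xxx and zzz are not; only the whole string "xxx zzz" is.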





The parsedquery_tostring that gets formed for the query aaabbb is as follows



+(name:aaabbb^1.2 | name_synonym:" xxx zzz aaa bbb mmm (nnn aaabbb)"^0.5)~0.5 
(phrase_name:" xxx zzz aaa bbb mmm (nnn aaabbb)"~5^3.0)~0.5
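
For anyone who wants to reproduce this output, the following Python sketch builds a request URL that asks Solr for debug information, which is where parsedquery_tostring is reported. The base URL, the helper name build_debug_url, and the whitespace-separated qf value are assumptions for illustration, not details of my setup.

```python
from urllib.parse import urlencode

# Assumed location of the select handler; adjust to the actual host and core.
BASE_URL = "http://localhost:8983/solr/select"


def build_debug_url(query, base_url=BASE_URL):
    """Build a select URL that requests edismax parsing with debug output."""
    params = {
        "q": query,
        "defType": "edismax",
        "qf": "name^1.2 name_synonym^0.5",  # assumed whitespace-separated form
        "pf": "phrase_name^3",
        "debugQuery": "true",  # adds the debug section with parsedquery_tostring
    }
    return base_url + "?" + urlencode(params)


url = build_debug_url("aaabbb")
print(url)
```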



I am expecting a query like



+(name:aaabbb^1.2 | ((name_synonym:xxx zzz name_synonym:aaa bbb 
name_synonym:mmm nnn name_synonym:aaabbb)^0.5))~0.5



Similarly, for the query xxx zzz I am getting the following
parsedquery_tostring from dismax:



+((name:xxx^1.2 | name_synonym:xxx^0.5 | name:zzz^1.2 | 
name_synonym:zzz^0.5)~0.5) (phrase_name:"xxx zzz"~5^3.0)~0.5



But I am expecting the following query:



+((name:xxx^1.2 | name_synonym:xxx^0.5 | name:zzz^1.2 | 
name_synonym:zzz^0.5)~0.5) (phrase_name:"xxx zzz"~5^3.0 | phrase_name:"aaa 
bbb"~5^3.0 | phrase_name:"mmm nnn"~5^3.0 | phrase_name:"aaabbb"~5^3.0)~0.5
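
To spell out the shape of the phrase part I am expecting, here is a small Python sketch that assembles one phrase clause per synonym alternative. It only formats a string in the style of parsedquery_tostring; the helper phrase_clauses is hypothetical and does not describe how edismax builds queries internally.

```python
def phrase_clauses(field, alternatives, slop=None, boost=None):
    """Join one quoted phrase clause per alternative with ' | ' (display only)."""
    clauses = []
    for text in alternatives:
        clause = f'{field}:"{text}"'
        if slop is not None:
            clause += f"~{slop}"   # phrase slop, as in ~5 above
        if boost is not None:
            clause += f"^{boost}"  # field boost, as in ^3.0 above
        clauses.append(clause)
    return " | ".join(clauses)


expected_pf = phrase_clauses(
    "phrase_name",
    ["xxx zzz", "aaa bbb", "mmm nnn", "aaabbb"],
    slop=5,
    boost=3.0,
)
print(expected_pf)
```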





However, that is not the case.

Please let me know whether I am missing something or whether this is expected
behavior. Also, please let me know what should be done to get my desired output.



Thanks in advance.

Pravin
