Re: solr synonyms behaviour

Chris Hostetter Fri, 25 Jan 2008 16:31:10 -0800

: so when i do a debug this is the parsedquery_tostring i see:
: (((text:divorc^0.8 | name:divorc^2.0)~0.01 (text:mediat^0.8 |
: name:mediat^2.0)~0.01)~2) (text:"(divorc altern) (disput mediat)
: resolut"~5^0.8 | name:"(divorc altern) (disput mediat) resolut"~5^2.0)~0.01


FYI: it's very hard to make sense of this kind of information without also 
knowing what your original URL is and how your request handler is 
configured ... i assume you are using dismax and the original request is 
something like...
                  q = divorce mediation
                 qf = text^0.8 name^2
                 pf = text^0.8 name^2
                 ps = 5

...correct?  if so then you didn't cut/paste the full query tostring ... 
there should be "+" in front of that first "("
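
For reference, a minimal sketch of how a request with the parameters guessed 
above could be assembled. The host and handler path are hypothetical, and 
whether this path actually reaches a dismax handler depends on your 
solrconfig.xml; debugQuery=on is what produces the parsed query output quoted 
at the top of this message.

```python
from urllib.parse import urlencode, parse_qs

# Parameters as guessed in the message above; the real request may differ.
params = {
    "q": "divorce mediation",
    "qf": "text^0.8 name^2",
    "pf": "text^0.8 name^2",
    "ps": "5",
    "debugQuery": "on",  # ask Solr to include the parsed query in the response
}

# Hypothetical host and handler path, for illustration only.
base_url = "http://localhost:8983/solr/select"

# urlencode percent-encodes "^" and turns spaces into "+".
query_string = urlencode(params)
request_url = base_url + "?" + query_string
print(request_url)
```

Posting the full URL along with the debug output makes it much easier for 
others to see which fields and boosts are in play.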

: Now what i don't understand is how its doing the matching
: 
: Does it mean it will find all matches with either of the words (divorc
: altern), either of the words (disput mediat) (and/or) resolut

anything matching both "divorce" and "mediation" in either the "text" or 
"name" field will be considered a match ... the synonyms aren't affecting 
anything mandatory in the query, because the query parser has already 
split the input up on whitespace before it's analyzed, so the multi-word 
synonyms don't come into play at all 

This is "Issue #1" regarding trying to use query time multi word synonyms 
discussed on the wiki...

>> "The Lucene QueryParser tokenizes on white space before giving any 
>> text to the Analyzer, so if a person searches for the words sea biscit 
>> the analyzer will be given the words "sea" and "biscit" separately, and 
>> will not know that they match a synonym."
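
A toy model of Issue #1, as a rough sketch only. The synonym table and the 
function names below are made up for illustration and are not Solr or Lucene 
APIs; the point is just that an analyzer which is handed one whitespace chunk 
at a time can never see a two-word synonym key.

```python
# Hypothetical synonym table with one multi-word entry (illustration only).
SYNONYMS = {
    ("sea", "biscit"): ["seabiscuit"],
}

def analyze(text):
    """Toy analyzer: lowercase, tokenize, append any multi-word synonyms found."""
    tokens = text.lower().split()
    output = list(tokens)
    for key, replacements in SYNONYMS.items():
        n = len(key)
        for i in range(len(tokens) - n + 1):
            if tuple(tokens[i:i + n]) == key:
                output.extend(replacements)
    return output

def parse_then_analyze(query):
    """Mimic the query parser: split on whitespace, analyze each chunk alone."""
    result = []
    for chunk in query.split():
        result.extend(analyze(chunk))
    return result

def analyze_whole(query):
    """Analyze the whole input as one string, as happens for a quoted phrase."""
    return analyze(query)

print(parse_then_analyze("sea biscit"))  # the synonym is never seen
print(analyze_whole("sea biscit"))       # the synonym is found
```

In this model the chunk-by-chunk path only ever returns the original words, 
while analyzing the whole string adds "seabiscuit".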

on the "boosting" part of the query (where the dismax handler 
automagically quotes the entire input and queries it against the "pf" 
fields), the synonyms do get used (because the whole input is analyzed as 
one string) but in this case the phrase queries will match any of these 
phrases...

   divorce dispute resolution
   alternative mediation resolution
   divorce mediation resolution
   etc...

..it will *NOT* match either of these phrases...

   divorce mediation
   alternative dispute resolution

...because the SynonymFilter has no way to tell the query parser which 
words should be linked to which other words when building up the phrase 
query.  

This is "Issue #2" regarding trying to use query time multi word synonyms
discussed on the wiki...

>> "Phrase searching (ie: "sea biscit") will cause the QueryParser to pass 
>> the entire string to the analyzer, but if the SynonymFilter is 
>> configured to expand the synonyms, then when the QueryParser gets the  
>> resulting list of tokens back from the Analyzer, it will construct a  
>> MultiPhraseQuery that will not have the desired effect. This is because  
>> of the limited mechanism available for the Analyzer to indicate that 
>> two terms occupy the same position: there is no way to indicate that a  
>> "phrase" occupies the same position as a term. For our example the  
>> resulting MultiPhraseQuery would be "(sea | sea | seabiscuit) (biscuit 
>> | biscit)" which would not match the simple case of "seabiscuit" 
>> occurring in a document."
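
A simplified model of the wiki's Issue #2 example, as a sketch under stated 
assumptions: it treats the query as a list of positions, each holding the set 
of terms allowed there, and ignores slop entirely. The function name is 
hypothetical and this is not how Lucene implements MultiPhraseQuery; it only 
shows why a two-position query cannot match a one-token document.

```python
def multi_phrase_matches(query_positions, doc_tokens):
    """Return True if some run of consecutive document tokens has, at every
    query position, one of the terms allowed at that position (no slop)."""
    n = len(query_positions)
    for start in range(len(doc_tokens) - n + 1):
        if all(doc_tokens[start + i] in allowed
               for i, allowed in enumerate(query_positions)):
            return True
    return False

# "(sea | sea | seabiscuit) (biscuit | biscit)" from the wiki text above.
query = [{"sea", "seabiscuit"}, {"biscuit", "biscit"}]

print(multi_phrase_matches(query, ["seabiscuit"]))            # one token, two positions
print(multi_phrase_matches(query, ["sea", "biscuit"]))        # one term per position
print(multi_phrase_matches(query, ["seabiscuit", "biscit"]))  # mixed-up combination
```

In this model the single-token document "seabiscuit" does not match, while 
"sea biscuit" and the unintended mix "seabiscuit biscit" both do, because 
each position is checked independently of the others.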

: I have the synonym filter only at query time coz i can't re-index data (or
: portion of data) everytime i add a synonym and a couple of other reasons.

Use cases like yours will *never* work as a query time synonym ... hence 
all of the information about multi-word synonyms and the caveats about 
using them in the wiki...

http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#SynonymFilter


-Hoss
