Inline...
On Sep 11, 2007, at 7:27 AM, Laurent Gilles wrote:

Hi,

I'm currently facing a relevancy issue with multiword synonyms.

Let me illustrate it with a test case:

Given the following synonym definitions:

--------------------------------------------------------------------

capital punishment, death sentence, death penalty

--------------------------------------------------------------------

And with a synonym filter defined at index time, the document:

--------------------------------------------------------------------

The prisoner escaped just before the death sentence had been set.

--------------------------------------------------------------------

will be indexed as:

--------------------------------------------------------------------

The prisoner escaped just before the (death sentence | death penalty |
capital punishment) had been set.

--------------------------------------------------------------------
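To make the effect concrete, here is a toy model of index-time expansion in Python. It is a hypothetical sketch, not Solr's synonym filter: it ignores token positions, lowercases the document and strips punctuation, and simply adds every word of every phrase in the synonym group wherever one phrase of the group occurs.

```python
# Hypothetical toy model of index-time synonym expansion.
# NOT Solr's synonym filter: positions are ignored, only the set of
# indexed terms is modeled.

SYNONYM_LINE = "capital punishment, death sentence, death penalty"


def parse_synonym_group(line):
    """Split one comma-separated synonym line into lists of words."""
    return [phrase.strip().lower().split() for phrase in line.split(",")]


def expand(tokens, group):
    """Return the set of terms indexed for `tokens` after expansion.

    Wherever any phrase of `group` occurs in `tokens`, the words of
    every phrase in the group are added to the indexed terms.
    """
    terms = set(tokens)
    for i in range(len(tokens)):
        for phrase in group:
            if tokens[i:i + len(phrase)] == phrase:
                for synonym in group:
                    terms.update(synonym)
    return terms


doc = "the prisoner escaped just before the death sentence had been set".split()
indexed = expand(doc, parse_synonym_group(SYNONYM_LINE))
print("capital" in indexed)  # True: the lone word "capital" now matches
```

Because "capital" ends up among the document's terms, a single-word query for it matches, which is the relevance problem described next.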

Now, if a user searches for "capital" (which could mean 'Paris, capital of France'), the system will match the word "capital" in the document expanded with index-time synonyms, which makes no sense.

I was expecting that, to match a document containing "death sentence", I would have to enter the entire expression "capital punishment", not just part of it.

This seems to be normal Solr behaviour, but it leads to a relevance problem in the results: a word contained in an expression can have a completely different meaning from the same word in isolation.

Is there a trick or a way to match only the complete synonym expression, and not the individual words that the expansion has added to documents?


Ah, the ambiguity of language :-)

I can think of a couple of different suggestions to try:
1. Index your phrase synonyms as single tokens, such as capital_punishment, death_penalty, etc. This requires that you be able to recognize phrases during both indexing and querying, since you will want to transform "capital punishment" in your documents into capital_punishment. Alternatively, you could create a query like ("capital punishment" OR capital_punishment).

2. On the query side, you could produce queries like: capital AND -"capital punishment"
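A minimal Python sketch of the first suggestion, assuming a hypothetical pre-processing step that runs identically on document text and on query text before anything reaches Solr. It greedily collapses the longest known phrase at each position into one underscore-joined token; the phrase list and the `join_phrases` name are illustrative, not part of Solr.

```python
# Hypothetical pre-processing step: collapse known multiword phrases
# into single tokens (capital punishment -> capital_punishment).
# Must be applied the same way at index time and at query time.

PHRASES = [
    ["capital", "punishment"],
    ["death", "sentence"],
    ["death", "penalty"],
]


def join_phrases(tokens, phrases=PHRASES):
    """Greedily replace the longest known phrase at each position."""
    by_length = sorted(phrases, key=len, reverse=True)
    out, i = [], 0
    while i < len(tokens):
        for phrase in by_length:
            if tokens[i:i + len(phrase)] == phrase:
                out.append("_".join(phrase))  # whole phrase -> one token
                i += len(phrase)
                break
        else:
            out.append(tokens[i])  # no phrase starts here; keep the word
            i += 1
    return out


doc = "the prisoner escaped just before the death sentence had been set".split()
print(join_phrases(doc))
# ['the', 'prisoner', 'escaped', 'just', 'before', 'the', 'death_sentence', 'had', 'been', 'set']
print(join_phrases(["capital"]))  # ['capital']
```

With the tokens joined this way, the synonym line would be written in the same single-token form (capital_punishment, death_sentence, death_penalty), so expansion only ever adds whole-phrase tokens and a lone "capital" no longer matches them.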

I don't know your system, but there is always the chance that a user searching for capital really does want all occurrences of capital (assuming no other context), which may cause problems.
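The second suggestion can be sketched as a small query builder in Python. This is a hypothetical helper, not a Solr feature: for a single-word query it adds one prohibited phrase clause, in Lucene query-parser syntax, for each known multiword synonym containing that word. It assumes the term is a plain word with no query-syntax characters, so nothing is escaped.

```python
# Hypothetical query builder for the "exclude the phrase" approach.
# Assumes `term` is a plain word; Lucene special characters are not escaped.

PHRASES = ["capital punishment", "death sentence", "death penalty"]


def build_query(term, phrases=PHRASES):
    """Return e.g. 'capital AND -"capital punishment"' for 'capital'."""
    exclusions = [p for p in phrases if term in p.split()]
    clauses = [term] + [f'-"{p}"' for p in exclusions]
    return " AND ".join(clauses)


print(build_query("capital"))  # capital AND -"capital punishment"
print(build_query("death"))    # death AND -"death sentence" AND -"death penalty"
```

As noted above, this also drops documents that mention both a capital city and a death sentence, since the excluded phrase is present in their expanded terms.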

HTH,
Grant
