Inline...
On Sep 11, 2007, at 7:27 AM, Laurent Gilles wrote:
Hi,
I'm currently facing a relevancy issue with multiword synonyms.
Let me illustrate it with a test case.
Given the following synonym definition:
--------------------------------------------------------------------
capital punishment, death sentence, death penalty
--------------------------------------------------------------------
And with a [EMAIL PROTECTED] defined at index time, the document:
--------------------------------------------------------------------
The prisoner escaped just before the death sentence had been set.
--------------------------------------------------------------------
will be indexed as:
--------------------------------------------------------------------
The prisoner escaped just before the (death sentence | death penalty |
capital punishment) had been set.
--------------------------------------------------------------------
Now, if a user searches for "capital", the system will match "capital"
(which could mean 'Paris, capital of France') against the
synonym-expanded document, which doesn't make sense.
I expected that I would have to give the entire expression
"capital punishment", and not just part of it, to match a document
that contains "death sentence".
This seems to be normal Solr behaviour, but it gives me a relevance
problem with the results, since a word contained in an expression can
have a completely different meaning from the same word in isolation.
Is there a trick or a way to match only the complete synonym
expression, and not the individual words that the expansion has added
to the documents?
Ah, the ambiguity of language :-)
I can think of a couple of different suggestions to try:
1. Index your phrase synonyms as a single token, such as
capital_punishment, death_penalty, etc. This requires that you be
able to recognize phrases during indexing and querying, since you
will want to transform capital punishment in your documents to
capital_punishment. Alternatively, you could create a query like
("capital punishment" OR capital_punishment)
2. On the query side, you could produce queries like: capital AND
-"capital punishment"
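As a rough sketch of both suggestions (plain Python, used only for
illustration; the phrase list and function names are hypothetical and
not part of Solr), the phrase collapsing and the two query forms could
look something like this:

```python
import re

# Hypothetical phrase list, taken from the synonym group in this thread.
PHRASES = ["capital punishment", "death sentence", "death penalty"]


def collapse_phrases(text, phrases=PHRASES):
    """Replace each known phrase with one underscore-joined token.

    The same function would have to run on documents before indexing
    and on user queries before searching.
    """
    for phrase in phrases:
        words = phrase.split()
        # Match the words of the phrase separated by any whitespace.
        pattern = r"\b" + r"\s+".join(re.escape(w) for w in words) + r"\b"
        token = "_".join(w.lower() for w in words)
        text = re.sub(pattern, token, text, flags=re.IGNORECASE)
    return text


def phrase_or_token_query(phrase):
    """Suggestion 1, query side: ("capital punishment" OR capital_punishment)."""
    return '("{0}" OR {1})'.format(phrase, "_".join(phrase.split()))


def exclude_phrase_query(term, phrase):
    """Suggestion 2: the bare term, minus documents containing the phrase."""
    return '{0} AND -"{1}"'.format(term, phrase)


doc = "The prisoner escaped just before the death sentence had been set."
print(collapse_phrases(doc))
print(phrase_or_token_query("capital punishment"))
print(exclude_phrase_query("capital", "capital punishment"))
```

With the phrases collapsed this way, the synonym list would presumably
map the single tokens to each other (capital_punishment, death_sentence,
death_penalty), so a query for the bare word capital would no longer hit
the expanded document.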
I don't know your system, but I suppose there is always the chance
that a user searching for capital really does want all occurrences of
capital (assuming no other context), which may cause problems.
HTH,
Grant