Re: Automatic synonyms for multiple variations of a word

Mike Sokolov Tue, 26 Apr 2011 13:14:21 -0700

Suppose your analysis stack includes lower-casing, but your synonyms are only supposed to apply to upper-case tokens. For example, "PET" might be a synonym of "positron emission tomography", but "pet" wouldn't be.
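
To make that concrete, here is a toy Python sketch. It is not Lucene or Solr code: the `lowercase` filter, the `SYNONYMS` table, and the helper names are all made up for illustration. It only shows that if the synonym keys are lower-cased along with the text, the ordinary word "pet" starts picking up the acronym's expansion.

```python
# Toy illustration only; none of this is the Lucene/Solr API.

def lowercase(tokens):
    """Stand-in for a lower-casing filter in the analysis chain."""
    return [t.lower() for t in tokens]

# Hypothetical synonym entry, meant to apply to the upper-case acronym only.
SYNONYMS = {"PET": ["positron", "emission", "tomography"]}

def build_map(synonyms, filters):
    """Run each (single-token) synonym key through the filters, then store it."""
    table = {}
    for key, expansion in synonyms.items():
        tokens = [key]
        for f in filters:
            tokens = f(tokens)
        table[tokens[0]] = expansion
    return table

def expand(tokens, table):
    """Append the expansion after any token that has a synonym entry."""
    out = []
    for t in tokens:
        out.append(t)
        out.extend(table.get(t, []))
    return out

# Synonyms matched before lower-casing: only the acronym expands.
case_sensitive = build_map(SYNONYMS, [])
acronym_query = expand(["PET", "scan"], case_sensitive)
word_query = expand(["pet", "food"], case_sensitive)

# Synonym keys lower-cased along with the text: "pet" now expands as well.
folded = build_map(SYNONYMS, [lowercase])
folded_word_query = expand(lowercase(["pet", "food"]), folded)
```

In the last case a query about pet food would also match documents about tomography, which is the behaviour a case-sensitive synonym list is trying to avoid.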


-Mike


On 04/26/2011 09:51 AM, Robert Muir wrote:

On Tue, Apr 26, 2011 at 12:24 AM, Otis Gospodnetic
<otis_gospodne...@yahoo.com>  wrote:

But somehow this feels bad (well, so does sticking word variations in what's
supposed to be a synonyms file), partly because it means that the person adding
new synonyms would need to know what they stem to (or always check it against
Solr before editing the file).

Currently, when creating the synonym map from your input file, the
factory uses only your Tokenizer to pre-process the synonyms file.

One idea would be to use the tokenstream up to the synonymfilter
itself (including filters). This way if you put a stemmer before the
synonymfilter, it would stem your synonyms file, too.
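
A rough Python sketch of the difference, under heavy assumptions: `toy_stem`, `build_map`, and the rule "hounds => dogs" are hypothetical, and this only mimics the idea rather than how the factory is actually implemented. It compares pre-processing the synonyms file with the tokenizer alone against pre-processing it with the tokenizer plus the filters that precede the synonym step.

```python
# Toy illustration only; none of this is the Lucene/Solr API.

def tokenize(text):
    """Stand-in tokenizer: split on whitespace."""
    return text.split()

def toy_stem(tokens):
    """Hypothetical stemmer: strip a trailing 's' from tokens longer than 3 chars."""
    return [t[:-1] if len(t) > 3 and t.endswith("s") else t for t in tokens]

def analyze(text, filters):
    """Tokenize, then apply each filter in order."""
    tokens = tokenize(text)
    for f in filters:
        tokens = f(tokens)
    return tokens

def build_map(rules, filters):
    """Parse 'a => b' rules (single-token left side), analyzing both sides."""
    table = {}
    for rule in rules:
        left, right = rule.split("=>")
        table[analyze(left, filters)[0]] = analyze(right, filters)
    return table

def apply_synonyms(tokens, table):
    """Replace each token that has a rule with its right-hand side."""
    out = []
    for t in tokens:
        out.extend(table.get(t, [t]))
    return out

RULES = ["hounds => dogs"]  # as a user might type it, unstemmed

# The document text is stemmed before it reaches the synonym step.
text_tokens = analyze("two hounds", [toy_stem])

# Tokenizer-only pre-processing: the key stays "hounds" and never matches "hound".
tokenizer_only = build_map(RULES, [])
missed = apply_synonyms(text_tokens, tokenizer_only)

# Full chain up to the synonym step: the key is stemmed to "hound" and matches.
full_chain = build_map(RULES, [toy_stem])
matched = apply_synonyms(text_tokens, full_chain)
```

With the second approach the person editing the file would not need to know what each word stems to.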

I haven't totally thought the whole thing through to see if there's a
big reason why this wouldn't work (the SynonymFilter is complicated,
sorry). But it does seem like it would produce more consistent
results... and perhaps the inconsistency isn't so obvious since in the
default configuration the SynonymFilter is directly after the
tokenizer.
