Suppose your analysis stack includes lower-casing, but your synonyms are only supposed to apply to upper-case tokens. For example, "PET" might be a synonym of "positron emission tomography", but "pet" wouldn't be.

-Mike

On 04/26/2011 09:51 AM, Robert Muir wrote:
On Tue, Apr 26, 2011 at 12:24 AM, Otis Gospodnetic
<otis_gospodne...@yahoo.com>  wrote:

But somehow this feels bad (well, so does sticking word variations in what's
supposed to be a synonyms file), partly because it means that the person adding
new synonyms would need to know what they stem to (or always check it against
Solr before editing the file).
when creating the synonym map from your input file, currently the
factory actually uses your Tokenizer only to pre-process the synonyms
file.

One idea would be to use the tokenstream up to the synonymfilter
itself (including filters). This way if you put a stemmer before the
synonymfilter, it would stem your synonyms file, too.

I haven't totally thought the whole thing through to see if theres a
big reason why this wouldn't work (the synonymsfilter is complicated,
sorry). But it does seem like it would produce more consistent
results... and perhaps the inconsistency isnt so obvious since in the
default configuration the synonymfilter is directly after the
tokenizer.

Reply via email to