Re: Synonyms and stemming revisited

Chris Hostetter Sat, 06 Sep 2008 18:47:58 -0700

: I see two solutions:
: 
: Either put all possible endings in the synonym file - I do not really
: like this solution, as it would make the file very large, and it also
: is too easy to miss some specific ending. Or run the stemmer before
: the synonym filter, in which case the synonym definitions need to
: appear in their stemmed forms. Am I missing something, or does the


Based on my understanding of your description of your problem, I think I 
agree with you.

If I've given different advice in the past, I'm sure I had a good reason 
for it, possibly due to some aspect of those problems that is subtly 
different from yours. Can you post links to the specific messages 
you're referring to? It might help jog my memory.

: conversion of the synonym text file need to be done by hand at the
: moment? I suppose that it would not be too difficult to write some

A recently added feature is that when configuring SynonymFilterFactory 
you can give it the name of a TokenizerFactory to use when parsing the 
synonym file.  This could be used to stem words *if* you write a 
TokenizerFactory that calls out to your Stemmer.

(See SOLR-319 for the background on why you can only specify a Tokenizer 
and not a full "fieldType" to get the analysis chain from. In a 
nutshell: 1. it would have been harder to implement; 2. the only use cases 
people could think of were tokenization-based.)
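To make "the synonym definitions need to appear in their stemmed forms" concrete, here is a rough standalone Java sketch of the offline conversion you mentioned. It is not the TokenizerFactory approach described above (that needs Solr's plugin classes); it just rewrites synonyms.txt-style lines. The toyStem method is a hypothetical stand-in that strips a few English suffixes. In practice you'd call the same stemmer your field's analyzer uses, otherwise the stemmed synonyms won't line up with the stemmed tokens. It only handles plain comma lists and "=>" mappings, and ignores backslash escapes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class StemSynonyms {

    // HYPOTHETICAL stand-in for a real stemmer (Porter, Snowball, ...):
    // lowercases the word and strips one of a few common English suffixes.
    static String toyStem(String word) {
        String w = word.toLowerCase(Locale.ROOT);
        String[] suffixes = {"ing", "ed", "s"};
        for (String suf : suffixes) {
            // Only strip if a reasonably long stem remains.
            if (w.length() > suf.length() + 2 && w.endsWith(suf)) {
                return w.substring(0, w.length() - suf.length());
            }
        }
        return w;
    }

    // Stem every whitespace-separated word of a (possibly multi-word) term.
    static String stemTerm(String term) {
        List<String> out = new ArrayList<>();
        for (String word : term.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                out.add(toyStem(word));
            }
        }
        return String.join(" ", out);
    }

    // Stem every term in a comma-separated list, dropping empty terms.
    static String stemList(String list) {
        List<String> out = new ArrayList<>();
        for (String term : list.split(",")) {
            String stemmed = stemTerm(term);
            if (!stemmed.isEmpty()) {
                out.add(stemmed);
            }
        }
        return String.join(",", out);
    }

    // Rewrite one synonyms line: blank lines and "#" comments pass through,
    // "a,b => c" mappings and "a,b,c" lists get every term stemmed.
    static String stemLine(String line) {
        String trimmed = line.trim();
        if (trimmed.isEmpty() || trimmed.startsWith("#")) {
            return line;
        }
        int arrow = trimmed.indexOf("=>");
        if (arrow >= 0) {
            return stemList(trimmed.substring(0, arrow)) + " => "
                    + stemList(trimmed.substring(arrow + 2));
        }
        return stemList(trimmed);
    }

    public static void main(String[] args) {
        String[] lines = {"# example", "walking, strolled", "cats, kittens => felines"};
        for (String l : lines) {
            System.out.println(stemLine(l));
        }
    }
}
```

Running main prints "# example", "walk,stroll" and "cat,kitten => feline". You'd point SynonymFilterFactory at the converted file and put the synonym filter after the stemmer in the analyzer chain, so both sides see stemmed tokens.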


-Hoss
