On Fri, Oct 2, 2009 at 11:31 PM, Prasanna Ranganathan <[email protected]> wrote:
> Does the PatternReplaceFilter have an option where you can keep the
> original token in addition to the modified token? From what I looked at it
> does not seem to but I want to confirm the same.

No, it does not.

> Alternatively, is there a filter available which takes in a pattern and
> produces additional forms of the token depending on the pattern? The use
> case I am looking at here is using such a filter to automate synonym
> generation. In our application, quite a few of the synonym file entries
> match a specific pattern and having such a filter would make it easier I
> believe. Pl. do correct me in case I am missing some unwanted side-effect
> with this approach.

I do not understand this. TokenFilters are used for things like stemming,
replacing patterns, lowercasing, n-gramming etc. The synonym filter inserts
additional tokens (synonyms) from a file for each token. What exactly are you
trying to do with synonyms? I guess you could do stemming etc with synonyms
but why do you want to do that?

> Continuing on that line, what is the performance hit in having additional
> index-time filters as opposed to using a synonym file with more entries? How
> does the overhead of using a bigger synonym file as opposed to additional
> filters compare?

Note that a change in synonym file needs a re-index of the affected
documents. Also, the synonym map is kept in memory.

--
Regards,
Shalin Shekhar Mangar.
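
To make the question concrete, here is a minimal sketch of the behaviour being asked about: keep each original token and, when a pattern matches, also emit the rewritten form at the same position, much as the synonym filter stacks its synonyms. This is plain Python for illustration only, not Solr or Lucene code; `expand_tokens` and the (term, position increment) pair representation are hypothetical names chosen for the example.

```python
import re
from typing import Iterable, Iterator, Tuple


def expand_tokens(
    tokens: Iterable[str], pattern: str, replacement: str
) -> Iterator[Tuple[str, int]]:
    """Yield (term, position_increment) pairs.

    Hypothetical helper, not a Solr API. Every original token is emitted
    with increment 1. If applying the pattern changes the token, the
    rewritten form is emitted right after it with increment 0, meaning
    "same position as the previous token".
    """
    regex = re.compile(pattern)
    for token in tokens:
        yield token, 1
        variant = regex.sub(replacement, token)
        if variant != token:
            # Only add a second token when the pattern actually changed something.
            yield variant, 0


if __name__ == "__main__":
    # Strip hyphens, but keep the hyphenated original as well.
    for term, increment in expand_tokens(["wi-fi", "router"], r"-", ""):
        print(term, increment)
```

With this input the tokens come out as `wi-fi` (increment 1), `wifi` (increment 0) and `router` (increment 1). A real filter of this kind would still have to be written against the TokenFilter API, and any change to the pattern would need a re-index of the affected documents, just as a synonym file change does.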
