Re: Question about PatternReplace filter and automatic Synonym generation

Shalin Shekhar Mangar Mon, 05 Oct 2009 02:48:28 -0700

On Fri, Oct 2, 2009 at 11:31 PM, Prasanna Ranganathan <[email protected]> wrote:


>
> Does the PatternReplaceFilter have an option where you can keep the
> original token in addition to the modified token? From what I have looked
> at, it does not seem to, but I want to confirm.
>
>
No, it does not.
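PatternReplaceFilter emits only the rewritten token. One way to keep both forms, which the filter itself does not offer, is a custom filter that passes the original through and stacks the rewritten form at the same position (position increment 0). That is the same way the synonym filter injects its extra tokens. The sketch below models the idea in plain Java with `java.util.regex` rather than the Lucene TokenFilter API; `PatternVariants`, `Token` and `expand` are hypothetical names used only for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical sketch: keep each original token and stack a pattern-rewritten variant on it. */
public class PatternVariants {

    /** A term plus its position increment relative to the previous token, as in Lucene. */
    public static final class Token {
        public final String term;
        public final int positionIncrement;

        Token(String term, int positionIncrement) {
            this.term = term;
            this.positionIncrement = positionIncrement;
        }
    }

    /**
     * Emits every input term with increment 1 and, when the pattern changes it,
     * the rewritten form with increment 0 (i.e. at the same position as the original).
     */
    public static List<Token> expand(List<String> terms, Pattern pattern, String replacement) {
        List<Token> out = new ArrayList<>();
        for (String term : terms) {
            out.add(new Token(term, 1));
            String rewritten = pattern.matcher(term).replaceAll(replacement);
            if (!rewritten.equals(term)) {
                out.add(new Token(rewritten, 0)); // stacked variant, like an injected synonym
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Pattern hyphen = Pattern.compile("-");
        for (Token t : expand(List.of("wi-fi", "router"), hyphen, "")) {
            System.out.println(t.term + " +" + t.positionIncrement);
        }
    }
}
```

Running `main` prints `wi-fi +1`, `wifi +0` and `router +1`. In an actual Lucene filter the same idea would live in `incrementToken()`: buffer the variant and emit it on the following call with a position increment of 0.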


> Alternatively, is there a filter available which takes in a pattern and
> produces additional forms of the token depending on the pattern? The use
> case I am looking at here is using such a filter to automate synonym
> generation. In our application, quite a few of the synonym file entries
> match a specific pattern, and I believe such a filter would make things
> easier. Please do correct me if I am missing some unwanted side effect
> of this approach.
>
>
I do not understand this. TokenFilters are used for things like stemming,
pattern replacement, lowercasing, n-gramming, etc. The synonym filter inserts
additional tokens (synonyms), read from a file, for each token that has an
entry in that file.

What exactly are you trying to do with synonyms? I guess you could do
stemming etc. with synonyms, but why would you want to do that?
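For the pattern-driven synonym entries described in the question, one alternative (not proposed in this thread) is to generate the repetitive lines offline and keep loading them through the synonym file. A minimal sketch, assuming a plain vocabulary list, terms that contain no commas, and the comma-separated equivalence format of Solr's synonyms.txt; `SynonymLineGenerator` and `generate` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical helper: turns a pattern rule into synonyms.txt-style equivalence lines. */
public class SynonymLineGenerator {

    /** Returns one "term,variant" line for every vocabulary term the pattern actually rewrites. */
    public static List<String> generate(List<String> vocabulary, Pattern pattern, String replacement) {
        List<String> lines = new ArrayList<>();
        for (String term : vocabulary) {
            String variant = pattern.matcher(term).replaceAll(replacement);
            if (!variant.equals(term)) { // skip terms the rule leaves unchanged
                lines.add(term + "," + variant);
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        List<String> vocab = List.of("e-mail", "wi-fi", "router");
        generate(vocab, Pattern.compile("-"), "").forEach(System.out::println);
    }
}
```

Running `main` prints `e-mail,email` and `wi-fi,wifi`; `router` is skipped because the pattern leaves it unchanged.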


> Continuing on that line, what is the performance hit of having additional
> index-time filters as opposed to using a synonym file with more entries? How
> does the overhead of a bigger synonym file compare with that of additional
> filters?
>
>
Note that a change to the synonym file requires re-indexing the affected
documents when synonyms are applied at index time. Also, the synonym map is
kept in memory.
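To illustrate both points, the sketch below parses comma-separated synonym lines into a `HashMap` that stays in memory, then expands a token list by emitting each token followed by its synonyms. It is a simplified model, not Solr's implementation: it ignores the `=>` mapping syntax and multi-word synonyms, and `InMemorySynonymMap` is a hypothetical name. Because expansion happens while tokens are produced, documents indexed with an older map keep the older expansions until they are re-indexed.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical, simplified model of an index-time synonym map held in memory. */
public class InMemorySynonymMap {
    private final Map<String, List<String>> map = new HashMap<>();

    /** Parses equivalence lines such as "e-mail,email": every term maps to all others in its group. */
    public InMemorySynonymMap(List<String> lines) {
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty() || trimmed.startsWith("#")) continue; // skip blanks and comments
            List<String> group = new ArrayList<>();
            for (String part : trimmed.split(",")) group.add(part.trim());
            for (String term : group) {
                List<String> others = map.computeIfAbsent(term, k -> new ArrayList<>());
                for (String other : group) {
                    if (!other.equals(term) && !others.contains(other)) others.add(other);
                }
            }
        }
    }

    /** Emits each token followed by its synonyms; tokens without an entry pass through unchanged. */
    public List<String> expand(List<String> tokens) {
        List<String> out = new ArrayList<>();
        for (String token : tokens) {
            out.add(token);
            out.addAll(map.getOrDefault(token, List.of()));
        }
        return out;
    }

    public static void main(String[] args) {
        InMemorySynonymMap syn = new InMemorySynonymMap(List.of("# comment", "e-mail,email", "tv,television"));
        System.out.println(syn.expand(List.of("my", "email", "tv")));
    }
}
```

Running `main` prints `[my, email, e-mail, tv, television]`.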

-- 
Regards,
Shalin Shekhar Mangar.
