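CharFilterFactories are applied to the raw input before tokenization.
The tokenizer then splits that filtered input into tokens, and each
token is sent through the rest of the chain (the TokenFilterFactories)
in order. Only the tokens that come out of the end of the chain
are indexed.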
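
To make the order concrete, here is a minimal Python sketch of the chain. It is a toy model, not Solr code: `analyze`, the lambda filters and the whitespace tokenizer are hypothetical stand-ins for the factories you would configure in schema.xml. What it tries to show is that the intermediate values are only inputs to the next stage, and the list returned at the end is the part that corresponds to what gets indexed.

```python
import re

def analyze(text, char_filters, tokenizer, token_filters):
    """Toy model of an analysis chain (not Solr's actual implementation)."""
    # 1. Char filters see the whole raw input, one after another.
    for char_filter in char_filters:
        text = char_filter(text)
    # 2. Exactly one tokenizer splits the filtered text into tokens.
    tokens = tokenizer(text)
    # 3. Each token filter processes the tokens from the previous stage.
    for token_filter in token_filters:
        tokens = token_filter(tokens)
    # Only this final token list stands in for what would be indexed.
    return tokens

# Hypothetical stand-ins for real factories:
strip_tags = lambda s: re.sub(r"<[^>]+>", " ", s)   # like an HTML-stripping char filter
whitespace = lambda s: s.split()                     # like a whitespace tokenizer
lowercase = lambda toks: [t.lower() for t in toks]   # like a lowercase filter
drop_stop = lambda toks: [t for t in toks if t not in {"the", "a"}]

indexed = analyze("<b>The</b> Quick Fox", [strip_tags], whitespace,
                  [lowercase, drop_stop])
print(indexed)  # ['quick', 'fox']
```

Note that "The" never reaches the result: it survives the char filter and the tokenizer, but the last filter drops it.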

The Analysis page available from the Solr admin page is
invaluable for seeing, in great detail, what each part of
an analysis chain does.

TokenFilterFactories are applied to each token emitted from
the tokenizer, and this includes PatternReplaceFilterFactory,
which is easy to confuse with the similarly named
PatternReplaceCharFilterFactory. The difference is that
PatternReplaceCharFilterFactory is applied to the entire input
stream before tokenization, while PatternReplaceFilterFactory
is applied to each token emitted by the tokenizer.

And to make it even more fun, you can do both!
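
A small sketch of why the position of the pattern replacement matters. This is plain Python, assuming a whitespace tokenizer and using Python's `re` as a stand-in for the Java regexes Solr actually uses; the two function names are made up for illustration. The same pattern is applied once to the whole input and once to each token separately.

```python
import re

def char_filter_then_tokenize(text, pattern, replacement):
    # Replacement runs once over the entire input, before tokenization
    # (the PatternReplaceCharFilterFactory position in the chain).
    return re.sub(pattern, replacement, text).split()

def tokenize_then_token_filter(text, pattern, replacement):
    # Replacement runs separately on each token, after tokenization
    # (the PatternReplaceFilterFactory position in the chain).
    return [re.sub(pattern, replacement, tok) for tok in text.split()]

text = "wi fi router"
before = char_filter_then_tokenize(text, r"wi fi", "wifi")
after = tokenize_then_token_filter(text, r"wi fi", "wifi")
print(before)  # ['wifi', 'router']
print(after)   # ['wi', 'fi', 'router']
```

The pattern spans a space, so it can only match while the input is still one string. Once the tokenizer has split on whitespace, no single token contains "wi fi" and the per-token replacement has nothing to match.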

Best
Erick

On Wed, Apr 13, 2011 at 8:14 AM, Ben Davies <ben.dav...@gmail.com> wrote:

> Hi there,
>
> Just a quick question that the wiki page (
> http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters) didn't seem
> to answer very well.
>
> Given an analyzer that has zero or more Char Filter Factories, one
> Tokenizer Factory, and zero or more Token Filter Factories, which value(s)
> are indexed?
>
> Is every value that is produced from each char filter, tokenizer, and
> filter indexed?
> Or is only the final value after completing the whole chain indexed?
>
> Cheers,
> Ben
>
