Re: ClassicTokenizer

Shawn Heisey Tue, 09 Jan 2018 16:09:03 -0800

On 1/9/2018 9:36 AM, Rick Leir wrote:
> A while ago the default was changed to StandardTokenizer from 
> ClassicTokenizer. The biggest difference seems to be that Classic does not 
> break on hyphens. There is also a different character pr(mumble). I prefer 
> the Classic's non-break on hyphens.


To have any ability to research changes, we're going to need to know
precisely what you mean by "default" in that statement.

Are you talking about the example schemas, or some kind of inherent
default when an analysis chain is not specified?

Probably the reason for the change is an attempt to move into the modern
era, become more standardized, and stop using old/legacy
implementations.  The name of the new default contains the word
"Standard" which would fit in with that goal.

I can't locate any changes in the last couple of years that switch the
tokenizer from classic to standard.  Maybe I just don't know the right place
to look.

> What was the reason for changing this default? If I understand this better I 
> can avoid some pitfalls, perhaps.

If you are talking about example schemas, then the following may apply:

Because you understand how analysis components work well enough to even
ask your question, I think you're probably the kind of admin who is
going to thoroughly customize the schema and not rely on the defaults
for TextField types that come with Solr.  You're free to continue using
the classic tokenizer in your schema if that meets your needs better
than whatever changes are made to the examples by the devs.  The
examples are only starting points; virtually all Solr installs require
customizing the schema.
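
As a rough illustration of keeping the classic tokenizer, a field type
along these lines could go in your schema.  This is only a sketch: the
type name "text_classic" is made up for the example, and the lowercase
filter is just a placeholder for whatever filters you actually need.

```xml
<!-- Hypothetical field type; the name "text_classic" is an assumption -->
<fieldType name="text_classic" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- Use the classic tokenizer instead of solr.StandardTokenizerFactory -->
    <tokenizer class="solr.ClassicTokenizerFactory"/>
    <!-- Placeholder filter; replace with the filters your chain requires -->
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

Running some sample text through the Analysis tab in the admin UI is the
easiest way to confirm that a chain like this handles hyphens the way you
expect before you reindex.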

Thanks,
Shawn
