On 1/9/2018 9:36 AM, Rick Leir wrote:
> A while ago the default was changed to StandardTokenizer from
> ClassicTokenizer. The biggest difference seems to be that Classic does not
> break on hyphens. There is also a different character pr(mumble). I prefer
> the Classic's non-break on hyphens.

To have any ability to research changes, we're going to need to know precisely what you mean by "default" in that statement. Are you talking about the example schemas, or some kind of inherent default when an analysis chain is not specified?

Probably the reason for the change is an attempt to move into the modern era, become more standardized, and stop using old/legacy implementations. The name of the new default contains the word "Standard", which would fit in with that goal.

I can't locate any changes in the last couple of years that switch from the classic tokenizer to the standard one. Maybe I just don't know the right place to look.

> What was the reason for changing this default? If I understand this better I
> can avoid some pitfalls, perhaps.

If you are talking about example schemas, then the following may apply:

Because you understand how analysis components work well enough to even ask your question, I think you're probably the kind of admin who is going to thoroughly customize the schema and not rely on the defaults for TextField types that come with Solr. You're free to continue using the classic tokenizer in your schema if that meets your needs better than whatever changes are made to the examples by the devs. The examples are only starting points; virtually all Solr installs require customizing the schema.

Thanks,
Shawn
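
P.S. In case it helps, here is a minimal sketch of a field type that keeps the classic tokenizer. The name "text_classic" and the lowercase filter are only assumptions for illustration and are not taken from any shipped example schema; adjust the analysis chain to your own needs.

```xml
<!-- Hypothetical field type; "text_classic" is an illustrative name only. -->
<fieldType name="text_classic" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- Use the classic tokenizer instead of solr.StandardTokenizerFactory. -->
    <tokenizer class="solr.ClassicTokenizerFactory"/>
    <!-- Assumed filter: lowercase tokens at both index and query time. -->
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

The Analysis screen in the admin UI is a quick way to check how a given value is actually tokenized before you reindex.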