Shawn,

I did not express that clearly. The reference guide says: "The Classic Tokenizer preserves the same behavior as the Standard Tokenizer of Solr versions 3.1 and previous."
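For context, the same reference guide describes the hyphen handling of the two tokenizers differently: ClassicTokenizer splits words at hyphens unless the token contains a number, in which case the whole token is kept as a product number, while StandardTokenizer follows the Unicode UAX#29 word-break rules, under which a hyphen is a word boundary. The Python below is only a toy sketch of that one rule, assuming whitespace splitting and simple punctuation trimming. `classic_like` and `standard_like` are hypothetical helpers, not Lucene or Solr APIs, and they ignore everything else the real tokenizers do (email and hostname recognition, dots inside tokens, token types, full UAX#29 tables).

```python
# Toy illustration of hyphen handling only. NOT the Lucene implementations:
# classic_like/standard_like are hypothetical helpers for comparison.

PUNCT = ".,;:!?\"'()"  # assumed set of surrounding punctuation to trim


def _words(text):
    """Whitespace-split and trim surrounding punctuation (a simplification)."""
    for raw in text.split():
        word = raw.strip(PUNCT)
        if word:
            yield word


def _split_hyphens(word):
    """Break a word at every hyphen, dropping empty pieces."""
    return [part for part in word.split("-") if part]


def classic_like(text):
    """Hyphen rule as the reference guide describes ClassicTokenizer:
    split at hyphens unless the token contains a digit."""
    tokens = []
    for word in _words(text):
        if any(ch.isdigit() for ch in word):
            tokens.append(word)  # kept whole, like a product number
        else:
            tokens.extend(_split_hyphens(word))
    return tokens


def standard_like(text):
    """Hyphen rule of StandardTokenizer (UAX#29): a hyphen is always a break."""
    tokens = []
    for word in _words(text):
        tokens.extend(_split_hyphens(word))
    return tokens


if __name__ == "__main__":
    sample = "Order part XJ-1200 for the wi-fi router."
    print(classic_like(sample))
    # ['Order', 'part', 'XJ-1200', 'for', 'the', 'wi', 'fi', 'router']
    print(standard_like(sample))
    # ['Order', 'part', 'XJ', '1200', 'for', 'the', 'wi', 'fi', 'router']
```

In this sketch the two differ only on hyphenated tokens that contain a digit, such as part numbers; a plain word like "wi-fi" is split by both.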
I am curious why StandardTokenizer was changed after 3.1 to break on hyphens, when the old behavior seems to work better for me.

Thanks
Rick

On January 9, 2018 7:07:59 PM EST, Shawn Heisey <apa...@elyograg.org> wrote:
>On 1/9/2018 9:36 AM, Rick Leir wrote:
>> A while ago the default was changed to StandardTokenizer from ClassicTokenizer. The biggest difference seems to be that Classic does not break on hyphens. There is also a different character pr(mumble). I prefer the Classic's non-break on hyphens.
>
>To have any ability to research changes, we're going to need to know precisely what you mean by "default" in that statement.
>
>Are you talking about the example schemas, or some kind of inherent default when an analysis chain is not specified?
>
>Probably the reason for the change is an attempt to move into the modern era, become more standardized, and stop using old/legacy implementations. The name of the new default contains the word "Standard" which would fit in with that goal.
>
>I can't locate any changes in the last couple of years that change the classic tokenizer to standard. Maybe I just don't know the right place to look.
>
>> What was the reason for changing this default? If I understand this better I can avoid some pitfalls, perhaps.
>
>If you are talking about example schemas, then the following may apply:
>
>Because you understand how analysis components work well enough to even ask your question, I think you're probably the kind of admin who is going to thoroughly customize the schema and not rely on the defaults for TextField types that come with Solr. You're free to continue using the classic tokenizer in your schema if that meets your needs better than whatever changes are made to the examples by the devs. The examples are only starting points, virtually all Solr installs require customizing the schema.
>
>Thanks,
>Shawn

-- 
Sorry for being brief. Alternate email is rickleir at yahoo dot com