On 1/10/2018 2:27 PM, Rick Leir wrote:
I did not express that clearly.
The reference guide says "The Classic Tokenizer preserves the same behavior as the Standard Tokenizer of Solr versions 3.1 and previous."

So I am curious to know why they changed StandardTokenizer after 3.1 to break on hyphens, when the old behavior seems to me to work better?

I really have no idea. Those are Lucene classes, not Solr. Maybe someone who was around for whatever discussions happened on Lucene lists back in those days will comment.

I wasn't able to find the issue where ClassicTokenizer was created, and I couldn't find any information discussing the change.

If I had to guess why StandardTokenizer was changed this way, I think it was to accommodate searches where someone looked for a single word that only appeared in the index as part of a larger hyphenated term, so the search wasn't finding it. There was probably a discussion among the developers about what a typical Lucene user would want, to decide how the standard tokenizer should behave.
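To make the trade-off concrete, here is a toy sketch (plain regexes, not actual Lucene tokenizer grammars) of the two behaviors being discussed: one tokenizer that splits on hyphens, and one that keeps hyphenated runs together. The function names and regexes are my own illustration, not Lucene code.

```python
import re

def split_on_hyphens(text):
    # Sketch of the newer behavior: hyphens act as token breaks,
    # so each part of a hyphenated compound is individually searchable.
    return re.findall(r"[A-Za-z0-9]+", text.lower())

def keep_hyphenated(text):
    # Sketch of the older behavior Rick prefers: a hyphenated run
    # stays together as a single token.
    return re.findall(r"[A-Za-z0-9]+(?:-[A-Za-z0-9]+)*", text.lower())

doc = "A state-of-the-art tokenizer"
print(split_on_hyphens(doc))  # ['a', 'state', 'of', 'the', 'art', 'tokenizer']
print(keep_hyphenated(doc))   # ['a', 'state-of-the-art', 'tokenizer']

# A search for the bare word "art" only matches under the first behavior:
print("art" in split_on_hyphens(doc))  # True
print("art" in keep_hyphenated(doc))   # False
```

With the splitting behavior, a query for "art" matches the document above; with the old behavior, only a query containing the full hyphenated token does. Which is "better" depends entirely on the data and the users, which is presumably why both tokenizers were kept.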

Likely because a vocal segment of the community relied on the old behavior, they preserved it in ClassicTokenizer, while updating the standard one to do what they felt a typical user would expect.

Obviously *your* needs do not fall in line with what was decided ... so the standard tokenizer isn't going to work for you.

Thanks,
Shawn
