LowerCaseTokenizerFactory - Tokenizer Options? Why does it behave this way?

Scott Gonyea Tue, 14 Sep 2010 10:55:50 -0700

Hi,

I'm tweaking my schema, and the LowerCaseTokenizerFactory doesn't just
lower-case characters when it creates tokens: it also splits on, and
drops, non-letter characters.  Is there a way to tell it NOT to drop
non-letters?  It's amazingly frustrating that the LowerCaseTokenizerFactory
and the LowerCaseFilterFactory have two entirely different modes of
behavior.  If I wanted to tokenize on non-letter characters, wouldn't I
use, say, LetterTokenizerFactory and tack on the LowerCaseFilterFactory?
Or any number of other combinations that achieve that specific end result?
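
For illustration, the difference can be sketched in plain Python.  This is
only a rough emulation of the documented behavior (LowerCaseTokenizer acts
like LetterTokenizer plus LowerCaseFilter), not Lucene or Solr code; the
function names and the sample string are made up, and the regex is only an
approximation of "letter":

```python
import re

def lowercase_tokenizer(text):
    """Approximates LowerCaseTokenizer: split at non-letters, drop them, lower-case."""
    # [^\W\d_]+ matches runs of letters only (word chars minus digits and underscore).
    return [m.group().lower() for m in re.finditer(r"[^\W\d_]+", text)]

def whitespace_then_lowercase(text):
    """Approximates a whitespace tokenizer followed by a lower-case filter."""
    # Only whitespace separates tokens, so digits and punctuation survive.
    return [tok.lower() for tok in text.split()]

sample = "Solr-1.4 Rocks"  # hypothetical input
print(lowercase_tokenizer(sample))        # ['solr', 'rocks']
print(whitespace_then_lowercase(sample))  # ['solr-1.4', 'rocks']
```

The second function is the kind of behavior being asked for here: lower-case
everything, but leave the non-letter characters in the tokens.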


So... is there a way for me to tell it NOT to split on non-letter
characters?  If not, I'd really like to submit a patch to make it behave as
advertised--which is the next best thing to yelling incoherently at the poor
guy who wrote it :).

Scott
