Hi, I'm tweaking my schema, and the LowerCaseTokenizerFactory doesn't create tokens based solely on lower-casing characters: it also splits on non-letter characters and drops them. Is there a way to tell it NOT to drop non-letter characters? It's amazingly frustrating that the TokenizerFactory and the FilterFactory have two entirely different modes of behavior. If I wanted to tokenize on non-letter characters, wouldn't I use, say, LetterTokenizerFactory and tack on the LowerCaseFilterFactory? Or any number of other combinations that would achieve that same end result?
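To illustrate the combination I mean, here is a minimal sketch of an explicit "split on non-letters, then lower-case" analyzer. The field type name is hypothetical, and this is only meant to show the chain, not a tested configuration:

```xml
<!-- Hypothetical field type name, for illustration only. -->
<fieldType name="text_letter_lower" class="solr.TextField">
  <analyzer>
    <!-- Splits the input on non-letter characters and discards them. -->
    <tokenizer class="solr.LetterTokenizerFactory"/>
    <!-- Lower-cases each token produced by the tokenizer. -->
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

As far as I can tell, this is the behavior I'm currently getting from LowerCaseTokenizerFactory on its own.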
So... is there a way for me to tell it NOT to split on non-letter characters? If not, I'd really like to submit a patch to make it behave as advertised, which is the next best thing to yelling incoherently at the poor guy who wrote it :).

Scott