Thanks Robert, that worked perfectly for the index side of the house. On the query side I have a similar Tokenizer, but it's not behaving quite the way I want. The query tokenizer generates the tokens properly, but I end up with a phrase query, i.e. field:"1 2 3 4", when I really want field:1 OR field:2 OR field:3 OR field:4. Is there something that needs to be set in the tokenizer to generate this type of query, or is it something in the query parser?
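To make the difference concrete, here is a plain-Java sketch that builds both query shapes from the same token list. It does not call Lucene at all, and the class and method names (QueryShapes, phrase, disjunction) are hypothetical. One setting that may be relevant, stated as an assumption to check rather than something confirmed here: from Lucene 3.1 the classic QueryParser has setAutoGeneratePhraseQueries(boolean), and Solr exposes a matching autoGeneratePhraseQueries attribute on the fieldType. It controls whether several tokens produced from one whitespace-separated chunk are combined into a phrase.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper: shows the two query shapes from the question as strings.
// It does not use Lucene; it only illustrates how the same tokens can be combined.
public class QueryShapes {

    // All tokens kept together as one phrase: field:"1 2 3 4"
    static String phrase(String field, List<String> tokens) {
        return field + ":\"" + String.join(" ", tokens) + "\"";
    }

    // Each token becomes its own clause, joined with OR: field:1 OR field:2 ...
    static String disjunction(String field, List<String> tokens) {
        return tokens.stream()
                .map(t -> field + ":" + t)
                .collect(Collectors.joining(" OR "));
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("1", "2", "3", "4");
        System.out.println(phrase("field", tokens));
        System.out.println(disjunction("field", tokens));
    }
}
```

The token stream is identical in both cases. Only the way the parser combines the tokens differs, which suggests looking at the parser configuration before changing the tokenizer.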
On Thu, Feb 9, 2012 at 9:02 PM, Robert Muir <rcm...@gmail.com> wrote:
> On Thu, Feb 9, 2012 at 8:54 PM, Jamie Johnson <jej2...@gmail.com> wrote:
>> Again, thanks. I'll take a stab at that. Are you aware of any
>> resources/examples of how to do this? I figured I'd start with
>> WhitespaceTokenizer, but wasn't sure if there was a simpler place to
>> start.
>>
>
> Well, the easiest is if you can build what you need out of existing resources...
>
> But if you need to write your own, and if your input is not massive
> documents/you have no problem processing the whole field in RAM at
> once, you could try looking at PatternTokenizer for an example:
>
> http://svn.apache.org/repos/asf/lucene/dev/trunk/modules/analysis/common/src/java/org/apache/lucene/analysis/pattern/PatternTokenizer.java
>
> --
> lucidimagination.com
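Robert's "process the whole field in RAM" approach can be sketched without Lucene: read the entire Reader into a String, then split it with a regex while recording each token's start and end offsets. This is a minimal, hypothetical illustration. WholeFieldSplitter, Token and tokenize are made-up names, and this is not Lucene's Tokenizer API. A real tokenizer would subclass Tokenizer and set CharTermAttribute and OffsetAttribute inside incrementToken(), as PatternTokenizer does.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical, Lucene-free sketch: buffer the whole field, then split on a pattern.
public class WholeFieldSplitter {

    // A token plus its character offsets [start, end) in the original field text.
    static final class Token {
        final String text;
        final int start;
        final int end;

        Token(String text, int start, int end) {
            this.text = text;
            this.start = start;
            this.end = end;
        }

        @Override
        public String toString() {
            return text + "[" + start + "," + end + ")";
        }
    }

    // Reads the entire Reader into memory; only sensible when fields are small.
    static String readAll(Reader reader) {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[1024];
        try {
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    // Splits the buffered text on the delimiter pattern, skipping empty pieces.
    static List<Token> tokenize(Reader reader, Pattern delimiter) {
        String text = readAll(reader);
        List<Token> tokens = new ArrayList<>();
        Matcher m = delimiter.matcher(text);
        int start = 0;
        while (m.find()) {
            if (m.start() > start) {
                tokens.add(new Token(text.substring(start, m.start()), start, m.start()));
            }
            start = m.end();
        }
        if (start < text.length()) {
            tokens.add(new Token(text.substring(start), start, text.length()));
        }
        return tokens;
    }

    public static void main(String[] args) {
        List<Token> tokens = tokenize(new StringReader("1 2  3,4"), Pattern.compile("[\\s,]+"));
        System.out.println(tokens);
    }
}
```

Buffering the whole field keeps the splitting logic trivial, because offsets are just indexes into one String. That simplicity comes at the cost of holding the full field in memory, which is the caveat Robert raises about massive documents.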