On Thu, Feb 9, 2012 at 8:54 PM, Jamie Johnson <jej2...@gmail.com> wrote:
> Again thanks. I'll take a stab at that are you aware of any
> resources/examples of how to do this? I figured I'd start with
> WhiteSpaceTokenizer but wasn't sure if there was a simpler place to
> start.
>
Well, the easiest route is to build what you need out of existing components. But if you need to write your own, and your input is not massive documents (that is, you have no problem processing the whole field in RAM at once), you could look at PatternTokenizer as an example:

http://svn.apache.org/repos/asf/lucene/dev/trunk/modules/analysis/common/src/java/org/apache/lucene/analysis/pattern/PatternTokenizer.java

--
lucidimagination.com
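To make the "whole field in RAM at once" approach concrete, here is a minimal standalone sketch of the general idea: buffer the entire input, then run a regex over it and emit each match as a token with its offsets. It uses only java.util.regex and deliberately does not use the Lucene API. The names RegexTokenSketch, Token, and tokenize are hypothetical, made up for illustration; a real tokenizer would extend Lucene's Tokenizer class and set term and offset attributes as PatternTokenizer does in the linked source.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical standalone sketch; this is NOT Lucene's PatternTokenizer or its API.
class RegexTokenSketch {

    // A token's text plus its character offsets in the original input.
    static final class Token {
        final String text;
        final int start; // inclusive
        final int end;   // exclusive

        Token(String text, int start, int end) {
            this.text = text;
            this.start = start;
            this.end = end;
        }
    }

    // Reads the whole input into memory, then emits one token per regex match.
    // Only suitable when the field comfortably fits in RAM.
    static List<Token> tokenize(Reader input, Pattern tokenPattern) {
        StringBuilder whole = new StringBuilder();
        char[] buf = new char[1024];
        try {
            int n;
            while ((n = input.read(buf)) != -1) {
                whole.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }

        List<Token> tokens = new ArrayList<>();
        Matcher m = tokenPattern.matcher(whole);
        while (m.find()) {
            if (m.end() > m.start()) { // skip empty matches
                tokens.add(new Token(m.group(), m.start(), m.end()));
            }
        }
        return tokens;
    }
}
```

Calling RegexTokenSketch.tokenize(new java.io.StringReader(text), Pattern.compile("\\S+")) would behave roughly like a whitespace tokenizer; swapping in a different pattern changes what counts as a token without touching the buffering logic.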