On Thu, Feb 9, 2012 at 8:54 PM, Jamie Johnson <jej2...@gmail.com> wrote:
> Again thanks.  I'll take a stab at that. Are you aware of any
> resources/examples of how to do this?  I figured I'd start with
> WhitespaceTokenizer but wasn't sure if there was a simpler place to
> start.
>

Well, the easiest option is to build what you need out of existing resources, if you can...

But if you do need to write your own, and your input isn't massive
documents (i.e. you have no problem processing the whole field in RAM
at once), you could look at PatternTokenizer as an example:

http://svn.apache.org/repos/asf/lucene/dev/trunk/modules/analysis/common/src/java/org/apache/lucene/analysis/pattern/PatternTokenizer.java
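To make the "whole field in RAM" idea concrete, here is a minimal plain-Java sketch. It is not Lucene code, and the class and method names (WholeFieldTokenizerSketch, readFully, tokenize) are hypothetical. It drains the entire Reader into a String, then walks a regex over it and emits the text between delimiter matches along with start/end offsets, which is roughly what PatternTokenizer does in its split mode. A real Lucene Tokenizer would instead hand each token out through attributes (e.g. CharTermAttribute, OffsetAttribute) from incrementToken(); that wiring is left out here.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WholeFieldTokenizerSketch {

    /** A token's text plus its [start, end) character offsets in the original field. */
    public static final class Token {
        public final String text;
        public final int start;
        public final int end;

        Token(String text, int start, int end) {
            this.text = text;
            this.start = start;
            this.end = end;
        }

        @Override
        public String toString() {
            return text + "[" + start + "," + end + ")";
        }
    }

    /** Buffers the whole Reader in memory: this is why huge fields are a bad fit. */
    static String readFully(Reader input) {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[1024];
        try {
            int n;
            while ((n = input.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    /** Splits the buffered text on the delimiter pattern, skipping empty pieces. */
    public static List<Token> tokenize(Reader input, Pattern delimiter) {
        String text = readFully(input);
        List<Token> tokens = new ArrayList<>();
        Matcher m = delimiter.matcher(text);
        int start = 0;
        while (m.find()) {
            if (m.start() > start) {
                tokens.add(new Token(text.substring(start, m.start()), start, m.start()));
            }
            start = m.end();
        }
        // Whatever follows the last delimiter is the final token.
        if (start < text.length()) {
            tokens.add(new Token(text.substring(start), start, text.length()));
        }
        return tokens;
    }

    public static void main(String[] args) {
        List<Token> tokens = tokenize(new StringReader("foo, bar;baz"), Pattern.compile("[,;\\s]+"));
        System.out.println(tokens); // prints [foo[0,3), bar[5,8), baz[9,12)]
    }
}
```

Keeping the offsets alongside the text matters because Lucene uses them for things like highlighting; the buffering step is the trade-off mentioned above, since nothing is emitted until the whole field has been read.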

-- 
lucidimagination.com
