Hi,

I've been playing around with the ICUTokenizer from 4.0.0. Using the code below, I was getting an ArrayIndexOutOfBoundsException on the call to tokenizer.incrementToken(). Looking at the ICUTokenizer source, I can see why this is occurring (usableLength defaults to -1):
    ICUTokenizer tokenizer = new ICUTokenizer(myReader);
    CharTermAttribute termAtt = tokenizer.getAttribute(CharTermAttribute.class);
    while (tokenizer.incrementToken()) {   // throws ArrayIndexOutOfBoundsException here
        System.out.println(termAtt.toString());
    }

After poking around a little more, I found that I can just call tokenizer.reset() (which initializes usableLength to 0) right after constructing the object; org.apache.lucene.analysis.icu.segmentation.TestICUTokenizer does a similar step in its superclass. (The full pattern I ended up with is below my signature.) I was wondering if someone could explain why I need to call tokenizer.reset() before using the tokenizer for the first time.

Thanks in advance,
Shane
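For reference, here is the full pattern that avoids the exception for me. This is only a minimal sketch: the StringReader input and the main() wrapper are just for illustration, and I've added end()/close() on the assumption that the usual TokenStream consumer contract (reset, incrementToken loop, end, close) applies here as well.

    import java.io.Reader;
    import java.io.StringReader;

    import org.apache.lucene.analysis.icu.segmentation.ICUTokenizer;
    import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

    public class ICUTokenizerExample {
        public static void main(String[] args) throws Exception {
            // Sample input for illustration only
            Reader myReader = new StringReader("The quick brown fox");

            ICUTokenizer tokenizer = new ICUTokenizer(myReader);
            CharTermAttribute termAtt = tokenizer.getAttribute(CharTermAttribute.class);

            tokenizer.reset();   // without this, incrementToken() throws ArrayIndexOutOfBoundsException
            try {
                while (tokenizer.incrementToken()) {
                    System.out.println(termAtt.toString());
                }
                tokenizer.end();     // finalize state after the last token
            } finally {
                tokenizer.close();
            }
        }
    }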