Hello,

It seems that Tokenizer may violate the contract set forth by the TokenStream.reset method. Specifically, the TokenStream.reset javadoc states:
"*Resets this stream to a clean state. Stateful implementations must implement this method so that they can be reused, just as if they had been created fresh.*"

Tokenizer does not do this: a Tokenizer can only be reset once. On subsequent resets, IllegalStateReader is swapped in as the Reader, and incrementToken throws an exception.

The complication arises because Tokenizer takes a Reader, and LUCENE-2387 was filed to intentionally unset the input (Reader) to prevent a memory leak. However, unsetting the input means we can never read from the Tokenizer a second time (unless the Reader is set again), and thus the contract is violated.

Should there be a way to reuse Tokenizers?

Thanks,
Dan
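P.S. To make the failure mode concrete, here is a minimal self-contained sketch in plain Java. It is a toy model of the behavior described above, not Lucene's actual classes: ToyTokenizer, readChar, and the sentinel Reader are my own names, standing in for Tokenizer, incrementToken, and the internal illegal-state Reader.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

public class TokenizerReuseDemo {
    public static void main(String[] args) throws IOException {
        ToyTokenizer t = new ToyTokenizer();
        t.setReader(new StringReader("ab"));
        t.reset();
        while (t.readChar() != -1) {}        // first pass consumes the stream fine
        t.close();                           // close() unsets the Reader (cf. LUCENE-2387)

        t.reset();                           // reset() alone does not restore the input...
        try {
            t.readChar();                    // ...so the sentinel Reader throws here
        } catch (IllegalStateException e) {
            System.out.println("second pass failed");
        }

        t.setReader(new StringReader("ab")); // only setting a fresh Reader allows reuse
        t.reset();
        System.out.println("after setReader: " + (char) t.readChar());
    }
}

// Toy model (NOT Lucene code) of a Tokenizer whose close() swaps in a
// Reader that always throws, so reset() cannot make it "fresh" again.
class ToyTokenizer {
    // Sentinel standing in for Lucene's internal illegal-state Reader.
    private static final Reader SENTINEL = new Reader() {
        @Override public int read(char[] cbuf, int off, int len) {
            throw new IllegalStateException("input Reader was unset by close()");
        }
        @Override public void close() {}
    };

    private Reader input = SENTINEL;

    void setReader(Reader r) { input = r; }

    void reset() {} // deliberately does NOT restore the Reader, mirroring the report

    // Unsets the input to avoid leaking the Reader, which is exactly what
    // makes a second consumption impossible without setReader().
    void close() throws IOException {
        input.close();
        input = SENTINEL;
    }

    int readChar() throws IOException {
        char[] buf = new char[1];
        return input.read(buf, 0, 1) == -1 ? -1 : buf[0];
    }
}
```

Running it prints "second pass failed" for the reset-only attempt, and "after setReader: a" once a new Reader is installed, which is the reuse pattern the reset contract seems to promise without requiring.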