In 4.x and trunk there is a close() method on Tokenizers and Filters. In the currently released versions up to 4.3, there is instead a reset(stream) method, which is how a Tokenizer or Filter is reset for the next document in the same upload.

In both cases I had to track the first time the tokens are consumed, and do all of the setup then. If you do this, then reset(stream) can clear the native resources and let you re-load them on the next consume.
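A minimal sketch of that lazy-setup pattern, using hypothetical names rather than the real Lucene API (no Lucene classes here, just the shape of the idea): setup is deferred until the first token is consumed, so reset() can free the native resources and the next consume reloads them.

```java
// Hypothetical sketch of the lazy-init pattern; not the actual
// OpenNLPTokenizer code from LUCENE-2899.
public class LazyNativeTokenizer {
    private boolean initialized = false; // tracks first consumption
    private Object nativeHandle = null;  // stands in for a native resource
    private int loads = 0;               // counts native loads, for illustration

    private void loadNativeResources() {
        nativeHandle = new Object();     // real code would allocate via JNI here
        initialized = true;
        loads++;
    }

    /** Analogous to incrementToken(): do all setup lazily on first call. */
    public String consume() {
        if (!initialized) {
            loadNativeResources();
        }
        return "token";
    }

    /** Analogous to reset(stream): clear native resources; they are
     *  re-loaded lazily on the next consume(). */
    public void reset() {
        nativeHandle = null;             // real code would free the native memory
        initialized = false;
    }

    public int nativeLoadCount() {
        return loads;
    }
}
```

After a reset(), the next consume() triggers a fresh load, which is the behavior described above for the next document in the upload.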

Look at OpenNLPTokenizer and OpenNLPFilter.java in LUCENE-2899 to see what I had to do.

But yes, to be absolutely sure, you need to add a finalizer.
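A hedged sketch of that safety-net finalizer (hypothetical class and field names, not the actual patch code): the finalizer just delegates to close(), so native memory is released even if the caller never closes the stream.

```java
// Hypothetical example: a filter-like class holding a native resource,
// with a finalizer as a last-resort cleanup path.
public class NativeBackedFilter {
    private long nativeHandle = 42L; // stands in for a JNI pointer

    public boolean isClosed() {
        return nativeHandle == 0;
    }

    /** Idempotent release of the native resource. */
    public void close() {
        if (nativeHandle != 0) {
            nativeHandle = 0;        // real code would free the native memory here
        }
    }

    @Override
    protected void finalize() throws Throwable {
        try {
            close();                 // safety net if close() was never called
        } finally {
            super.finalize();
        }
    }
}
```

Since close() is idempotent, calling it from both user code and the finalizer is harmless; the finalizer only matters when the normal close path was skipped.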

On 06/12/2013 04:34 AM, Benson Margulies wrote:
Could I have some help on the combination of these two? Right now, it
appears that I'm stuck with a finalizer to chase after native
resources in a Tokenizer. Am I missing something?
