[ 
https://issues.apache.org/jira/browse/LUCENE-9588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17222151#comment-17222151
 ] 

Robert Muir commented on LUCENE-9588:
-------------------------------------

Why would a Tokenizer invoke incrementToken on a TokenFilter?

The idea of this class is that it handles the I/O itself, and that the subclass 
deals only with word segmentation. So I don't think the methods should 
declare {{throws IOException}}, since that would encourage wrong usage.

Of the three "real" subclasses in Lucene's source tree, none does 
I/O in these methods:
* ThaiTokenizer
* OpenNLPTokenizer
* HMMChineseTokenizer
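
To illustrate the division of labor described above, here is a minimal, self-contained Java sketch. It does not use Lucene: {{SegmentingBase}}, {{WhitespaceSegmenter}} and the helper {{words}} are hypothetical stand-ins, and the method signatures are simplified for illustration (in this sketch {{incrementWord}} returns the next word or null). The point is only that the base class is the sole place that touches the Reader, so the two subclass hooks work on an in-memory buffer and have no reason to declare a checked exception.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class SegmentingSketch {

    /** Hypothetical stand-in for the base class: all I/O happens here. */
    abstract static class SegmentingBase {
        /** In-memory copy of the input; subclasses only read from this. */
        protected char[] buffer = new char[0];

        /** Reads the whole input, splits it into sentences, and collects the words. */
        final List<String> tokenize(Reader input) throws IOException {
            StringBuilder sb = new StringBuilder();
            char[] chunk = new char[1024];
            for (int n; (n = input.read(chunk)) != -1; ) {
                sb.append(chunk, 0, n);
            }
            buffer = sb.toString().toCharArray();

            BreakIterator sentences = BreakIterator.getSentenceInstance(Locale.ROOT);
            sentences.setText(sb.toString());

            List<String> words = new ArrayList<>();
            int start = sentences.first();
            for (int end = sentences.next(); end != BreakIterator.DONE;
                     start = end, end = sentences.next()) {
                setNextSentence(start, end);           // no I/O: offsets into buffer
                for (String w; (w = incrementWord()) != null; ) {
                    words.add(w);
                }
            }
            return words;
        }

        /** Subclass hook: start working on buffer[start, end). No checked exceptions. */
        protected abstract void setNextSentence(int start, int end);

        /** Subclass hook: next word of the current sentence, or null when exhausted. */
        protected abstract String incrementWord();
    }

    /** Hypothetical subclass: pure word segmentation on the buffer, split on whitespace. */
    static class WhitespaceSegmenter extends SegmentingBase {
        private int pos;
        private int end;

        @Override
        protected void setNextSentence(int start, int end) {
            this.pos = start;
            this.end = end;
        }

        @Override
        protected String incrementWord() {
            while (pos < end && Character.isWhitespace(buffer[pos])) {
                pos++;                                 // skip leading whitespace
            }
            if (pos >= end) {
                return null;                           // sentence exhausted
            }
            int wordStart = pos;
            while (pos < end && !Character.isWhitespace(buffer[pos])) {
                pos++;
            }
            return new String(buffer, wordStart, pos - wordStart);
        }
    }

    /** Convenience wrapper for the demo: segments an in-memory string. */
    static List<String> words(String text) {
        try {
            return new WhitespaceSegmenter().tokenize(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(words("Hello world. Second sentence here."));
    }
}
```

Only {{tokenize}} declares {{throws IOException}} in this sketch. A subclass that wanted to pull tokens from another stream inside {{incrementWord}} would be working against this design rather than with it.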



> Exceptions handling in methods of SegmentingTokenizerBase
> ---------------------------------------------------------
>
>                 Key: LUCENE-9588
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9588
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: modules/analysis
>    Affects Versions: 8.6.3
>            Reporter: Nguyen Minh Gia Huy
>            Priority: Minor
>
> The current signatures of the *setNextSentence* and *incrementWord* methods in 
> *SegmentingTokenizerBase* do not declare any checked exceptions, which makes 
> the class troublesome to extend.
>  For example, if we override _incrementWord_ with logic that invokes 
> _incrementToken_ on another token filter, _incrementToken_ can throw an 
> _IOException_, but _incrementWord_ is not declared to throw it. 
> I think letting _setNextSentence_ and _incrementWord_ declare _IOException_ 
> would make *SegmentingTokenizerBase* easier to use.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org
