[ https://issues.apache.org/jira/browse/LUCENE-9588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17222151#comment-17222151 ]
Robert Muir commented on LUCENE-9588:
-------------------------------------

Why would a Tokenizer invoke incrementToken on another tokenfilter? The idea of this class is that it handles the I/O itself, and that the subclass deals only with word segmentation.

So I don't think the methods should declare {{throws IOException}}, because it's encouraging wrong usage?

Of the three "real" subclasses in Lucene's source tree, none of them are doing I/O in these methods:
* ThaiTokenizer
* OpenNLPTokenizer
* HMMChineseTokenizer

> Exceptions handling in methods of SegmentingTokenizerBase
> ---------------------------------------------------------
>
>                 Key: LUCENE-9588
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9588
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: modules/analysis
>    Affects Versions: 8.6.3
>            Reporter: Nguyen Minh Gia Huy
>            Priority: Minor
>
> The current signatures of the *setNextSentence* and *incrementWord* methods in
> *SegmentingTokenizerBase* do not declare any checked exceptions, which makes
> the class troublesome to extend.
> For example, if we override _incrementWord_ with logic that invokes
> _incrementToken_ on another token filter, _incrementToken_ can throw an
> _IOException_, but _incrementWord_ is not declared to throw it.
> I think letting _setNextSentence_ and _incrementWord_ declare the IOException
> would make *SegmentingTokenizerBase* easier to use.
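
The division of labour described in the comment (the base class does the I/O, the subclass only segments words in an in-memory buffer) can be illustrated with a small, self-contained sketch. This is not Lucene code: `SegmenterBase`, `WhitespaceSegmenter`, `tokenize`, `segment`, and the `wordStart`/`wordEnd` fields are hypothetical stand-ins, and the sketch treats the whole input as a single sentence. Only the two hook names, `setNextSentence` and `incrementWord`, and the fact that they declare no checked exceptions, mirror the methods under discussion.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

public class SegmentingSketch {

    /** Hypothetical stand-in for the base class: it owns all I/O. */
    abstract static class SegmenterBase {
        protected char[] buffer = new char[0];
        // Offsets of the current word; the subclass sets them in incrementWord().
        protected int wordStart;
        protected int wordEnd;

        /** Reads the input (the only step that can throw IOException), then asks the hooks for words. */
        final List<String> tokenize(Reader input) throws IOException {
            StringBuilder sb = new StringBuilder();
            char[] chunk = new char[1024];
            int n;
            while ((n = input.read(chunk)) != -1) {
                sb.append(chunk, 0, n);
            }
            buffer = sb.toString().toCharArray();

            // Simplifying assumption: the whole input is one "sentence".
            setNextSentence(0, buffer.length);
            List<String> words = new ArrayList<>();
            while (incrementWord()) {
                words.add(new String(buffer, wordStart, wordEnd - wordStart));
            }
            return words;
        }

        // The hooks work on the in-memory buffer only, so neither declares IOException.
        protected abstract void setNextSentence(int sentenceStart, int sentenceEnd);

        protected abstract boolean incrementWord();
    }

    /** Subclass that only segments: splits the current sentence on whitespace. */
    static final class WhitespaceSegmenter extends SegmenterBase {
        private int pos;
        private int end;

        @Override
        protected void setNextSentence(int sentenceStart, int sentenceEnd) {
            pos = sentenceStart;
            end = sentenceEnd;
        }

        @Override
        protected boolean incrementWord() {
            // Skip leading whitespace.
            while (pos < end && Character.isWhitespace(buffer[pos])) {
                pos++;
            }
            if (pos >= end) {
                return false; // no more words in this sentence
            }
            wordStart = pos;
            while (pos < end && !Character.isWhitespace(buffer[pos])) {
                pos++;
            }
            wordEnd = pos;
            return true;
        }
    }

    /** Convenience entry point for the sketch. */
    public static List<String> segment(String text) throws IOException {
        return new WhitespaceSegmenter().tokenize(new StringReader(text));
    }

    public static void main(String[] args) throws IOException {
        System.out.println(segment("no io  in hooks"));
    }
}
```

In this arrangement the `IOException` surfaces only from `tokenize`, where the reading happens, so the two hooks have nothing to declare. A subclass that did need to call I/O-performing code from a hook would have to wrap the exception itself, which is the usage the comment argues against encouraging.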