[ https://issues.apache.org/jira/browse/LUCENE-9588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17222651#comment-17222651 ]
Nguyen Minh Gia Huy commented on LUCENE-9588:
---------------------------------------------

My original statement *_a Tokenizer invoke incrementToken on another tokenfilter_* could be misleading. To be clear, it may invoke incrementToken on another *Tokenizer*.

The existing subclasses of SegmentingTokenizerBase perform word segmentation without having to deal with I/O exceptions, but that is not always the case. Word segmentation is sometimes I/O-aware, e.g. tokenizing a Japanese sentence using [JapaneseTokenizer|https://github.com/apache/lucene-solr/blob/master/lucene/analysis/kuromoji/src/java/org/apache/lucene/analysis/ja/JapaneseTokenizer.java#L526].

Additionally, the method [incrementSentence|https://github.com/apache/lucene-solr/blob/9ce4b98af2155ba9d6d41e12ff12017c557a9ea4/lucene/analysis/common/src/java/org/apache/lucene/analysis/util/SegmentingTokenizerBase.java#L174-L195] is currently declared to throw IOException, but none of the statements inside it throws IOException. Isn't that a signal that either (1) the IOException is unnecessary for *incrementSentence*, or (2) *setNextSentence* and *incrementWord* should throw IOException?

> Exceptions handling in methods of SegmentingTokenizerBase
> ---------------------------------------------------------
>
>                 Key: LUCENE-9588
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9588
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: modules/analysis
>    Affects Versions: 8.6.3
>            Reporter: Nguyen Minh Gia Huy
>            Priority: Minor
>
> The current signatures of the *setNextSentence* and *incrementWord* methods in
> *SegmentingTokenizerBase* do not declare any checked exceptions, which makes
> the class troublesome to extend.
> For example, if we override _incrementWord_ with logic that invokes
> _incrementToken_ on another token filter, _incrementToken_ may throw an
> _IOException_, but _incrementWord_ is not declared to throw it.
>
> I think having _setNextSentence_ and _incrementWord_ declare IOException
> would make *SegmentingTokenizerBase* easier to use.
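To make option (2) concrete, here is a minimal sketch in Java. It deliberately does not use Lucene's classes: `SegmentingBaseSketch`, `DelegatingSegmentingSketch`, `WordSource`, `WhitespaceWordSource`, `nextToken` and `tokenizeAll` are all hypothetical names. The base class loosely mirrors the loop in *incrementSentence*, except that *setNextSentence* and *incrementWord* declare IOException; `WordSource.next()` stands in for a delegate Tokenizer's *incrementToken*. Using `java.text.BreakIterator` for sentence boundaries is an assumption made only for this demo.

```java
import java.io.IOException;
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.function.Function;

// Hypothetical stand-in for a delegate Tokenizer: next() may throw IOException,
// like Tokenizer.incrementToken(). Returns null when exhausted.
interface WordSource {
  String next() throws IOException;
}

// Trivial in-memory delegate that splits a sentence on whitespace.
final class WhitespaceWordSource implements WordSource {
  private final String[] words;
  private int pos = 0;

  WhitespaceWordSource(String sentence) {
    String trimmed = sentence.trim();
    words = trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
  }

  @Override
  public String next() {
    return pos < words.length ? words[pos++] : null;
  }
}

// Lucene-free model of option (2): the hooks declare IOException, so the
// "throws IOException" on incrementSentence() has a real source.
abstract class SegmentingBaseSketch {
  protected final String text;
  protected String term; // word from the last successful incrementWord()
  private final BreakIterator sentences = BreakIterator.getSentenceInstance(Locale.ROOT);
  private boolean started = false;

  SegmentingBaseSketch(String text) {
    this.text = text;
    sentences.setText(text);
  }

  // Rough analogue of incrementToken(): the next word, or null at end of input.
  public final String nextToken() throws IOException {
    if (!started || !incrementWord()) {
      started = true;
      if (!incrementSentence()) return null;
    }
    return term;
  }

  // Loosely mirrors the loop in SegmentingTokenizerBase.incrementSentence().
  protected boolean incrementSentence() throws IOException {
    while (true) {
      int start = sentences.current();
      int end = sentences.next();
      if (end == BreakIterator.DONE) return false; // no more sentences
      setNextSentence(start, end);
      if (incrementWord()) return true; // otherwise the sentence was empty; try the next
    }
  }

  protected abstract void setNextSentence(int sentenceStart, int sentenceEnd) throws IOException;

  protected abstract boolean incrementWord() throws IOException;
}

// Subclass that delegates word segmentation to an I/O-aware WordSource.
final class DelegatingSegmentingSketch extends SegmentingBaseSketch {
  private final Function<String, WordSource> sourceFactory;
  private WordSource current; // delegate for the current sentence

  DelegatingSegmentingSketch(String text, Function<String, WordSource> sourceFactory) {
    super(text);
    this.sourceFactory = sourceFactory;
  }

  @Override
  protected void setNextSentence(int sentenceStart, int sentenceEnd) {
    // An override may declare fewer checked exceptions than the abstract hook.
    current = sourceFactory.apply(text.substring(sentenceStart, sentenceEnd));
  }

  @Override
  protected boolean incrementWord() throws IOException {
    if (current == null) return false;
    String word = current.next(); // a checked IOException propagates; no wrapping needed
    if (word == null) return false;
    term = word;
    return true;
  }

  // Collects every word from the text; used only for the demo.
  static List<String> tokenizeAll(String text, Function<String, WordSource> factory)
      throws IOException {
    SegmentingBaseSketch tokenizer = new DelegatingSegmentingSketch(text, factory);
    List<String> out = new ArrayList<>();
    for (String w = tokenizer.nextToken(); w != null; w = tokenizer.nextToken()) {
      out.add(w);
    }
    return out;
  }
}
```

For example, `tokenizeAll("Hello world. Bye now.", WhitespaceWordSource::new)` yields `Hello`, `world.`, `Bye`, `now.`, while a delegate whose `next()` throws makes the IOException surface from `nextToken()` as a checked exception rather than having to be caught and rethrown unchecked inside *incrementWord*. Because Java lets an override declare fewer checked exceptions than the method it overrides, existing subclasses whose implementations never throw would still compile under this change.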