cbuescher commented on a change in pull request #1073: LUCENE-9088: JapaneseNumberFilter uses inaccurate PartOfSpeechAttribute
URL: https://github.com/apache/lucene-solr/pull/1073#discussion_r356639454
##########
File path: lucene/analysis/kuromoji/src/java/org/apache/lucene/analysis/ja/JapaneseNumberFilter.java
##########
@@ -218,6 +228,11 @@ public final boolean incrementToken() throws IOException {
       // capture the state of this token and emit it on our next incrementToken()
       state = captureState();
     }
+    // we restore state to when we read the last numeral token to get its attributes (e.g. part-of-speech)
+    if (lastNumeralTokenState != null) {
+      restoreState(lastNumeralTokenState);

Review comment:
Note: simply setting the PartOfSpeechAttribute to "noun-numeric" on the emitted token wasn't as straightforward as I expected, since the implementation wraps a whole `org.apache.lucene.analysis.ja.Token`. This is why I explored tracking and restoring the last "good" token's state here.
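
To illustrate the track-and-restore idea, here is a minimal, self-contained sketch. It is a toy model, not the Lucene classes and not the code in this PR: `StateRestoreSketch`, `Attrs`, `capture()`, `restore()` and `mergeNumerals()` are hypothetical names that only mimic the spirit of `captureState()`/`restoreState()`, the tags are plain strings, and numerals are merged by simple concatenation rather than real number normalization. It assumes a single shared attribute holder that is overwritten by every token read, so by the time a merged number is emitted the holder already describes the lookahead token.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: these classes are hypothetical stand-ins, not Lucene APIs.
final class StateRestoreSketch {

    static final String NUMERIC = "noun-numeric";

    /** Shared, mutable attributes of the "current" token. */
    static final class Attrs {
        String term;
        String partOfSpeech;

        Attrs(String term, String partOfSpeech) {
            this.term = term;
            this.partOfSpeech = partOfSpeech;
        }

        /** Take a snapshot of the current values. */
        Attrs capture() {
            return new Attrs(term, partOfSpeech);
        }

        /** Overwrite the current values with a snapshot. */
        void restore(Attrs snapshot) {
            this.term = snapshot.term;
            this.partOfSpeech = snapshot.partOfSpeech;
        }
    }

    /**
     * Merges each run of consecutive numeral tokens into one token.
     * Each input row is {term, partOfSpeech}; output entries are "term/partOfSpeech".
     */
    static List<String> mergeNumerals(String[][] input) {
        List<String> out = new ArrayList<>();
        Attrs current = new Attrs("", "");   // overwritten for every token read
        Attrs lastNumeral = null;            // snapshot of the last numeral token seen
        StringBuilder number = new StringBuilder();

        for (int i = 0; i <= input.length; i++) {
            boolean end = (i == input.length);
            if (!end) {
                current.term = input[i][0];
                current.partOfSpeech = input[i][1];
                if (NUMERIC.equals(current.partOfSpeech)) {
                    number.append(current.term);
                    lastNumeral = current.capture();
                    continue;
                }
            }
            // Here `current` describes the non-numeral lookahead token (or input ended).
            if (lastNumeral != null) {
                Attrs lookahead = current.capture();
                current.restore(lastNumeral);        // bring back the numeral's attributes
                current.term = number.toString();    // then replace only the term
                out.add(current.term + "/" + current.partOfSpeech);
                current.restore(lookahead);          // put the lookahead token back
                lastNumeral = null;
                number.setLength(0);
            }
            if (!end) {
                out.add(current.term + "/" + current.partOfSpeech);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String[][] tokens = {
            {"1", NUMERIC}, {"2", NUMERIC}, {"apples", "noun-common"}
        };
        // prints [12/noun-numeric, apples/noun-common]
        System.out.println(mergeNumerals(tokens));
    }
}
```

Without the `current.restore(lastNumeral)` step, the merged token in this model would be emitted with the lookahead token's part-of-speech, which is the kind of inaccuracy the issue title describes.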