[GitHub] [lucene-solr] cbuescher commented on a change in pull request #1073: LUCENE-9088: JapaneseNumberFilter uses inaccurate PartOfSpeechAttribute

GitBox Wed, 11 Dec 2019 06:47:08 -0800

cbuescher commented on a change in pull request #1073: LUCENE-9088: JapaneseNumberFilter uses inaccurate PartOfSpeechAttribute
URL: https://github.com/apache/lucene-solr/pull/1073#discussion_r356639454


 ##########
 File path: lucene/analysis/kuromoji/src/java/org/apache/lucene/analysis/ja/JapaneseNumberFilter.java
 ##########
 @@ -218,6 +228,11 @@ public final boolean incrementToken() throws IOException {
         // capture the state of this token and emit it on our next incrementToken()
         state = captureState();
       }
+      // we restore state to when we read the last numeral token to get its attributes (e.g. part-of-speech)
+      if (lastNumeralTokenState != null) {
+        restoreState(lastNumeralTokenState);
 
 Review comment:
   Note: simply setting the PartOfSpeechAttribute to "noun-numeric" on the 
emitted token wasn't as straightforward as I expected, since the implementation 
wraps a whole `org.apache.lucene.analysis.ja.Token`. This is why I explored 
tracking and restoring the state of the last "good" (numeral) token here.
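
   To illustrate the idea, here is a minimal, stdlib-only model of the pattern. It does not use Lucene's API and is not the code in this PR. The record `Tok`, the methods `isNumeral` and `merge`, and the part-of-speech labels are hypothetical stand-ins. The real filter works on `AttributeSource` state via `captureState()`/`restoreState()` and normalizes kanji numerals. This sketch only concatenates consecutive digit tokens. The point it shows is that the merged number token takes its attributes from a snapshot of the *last numeral token* (the analogue of `lastNumeralTokenState`). It does not take them from the look-ahead token that ended the number, which is what would give the merged token an inaccurate part-of-speech.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of "restore the last numeral token's state before emitting".
final class NumeralMergeSketch {

    // Stand-in for a token's attribute state: term text plus part-of-speech.
    record Tok(String term, String pos) {}

    // Hypothetical numeral test: a non-empty token made only of digit characters.
    static boolean isNumeral(Tok t) {
        return !t.term().isEmpty() && t.term().chars().allMatch(Character::isDigit);
    }

    // Merges runs of numeral tokens into one token.
    static List<Tok> merge(List<Tok> input) {
        List<Tok> out = new ArrayList<>();
        StringBuilder number = new StringBuilder();
        Tok lastNumeral = null; // snapshot of the last numeral token read (cf. lastNumeralTokenState)

        for (Tok t : input) {
            if (isNumeral(t)) {
                number.append(t.term());
                lastNumeral = t; // remember this numeral's attributes
            } else {
                if (lastNumeral != null) {
                    // "Restore" the last numeral's attributes and overwrite only the term text,
                    // instead of inheriting attributes from the look-ahead token t.
                    out.add(new Tok(number.toString(), lastNumeral.pos()));
                    number.setLength(0);
                    lastNumeral = null;
                }
                out.add(t); // the look-ahead token keeps its own attributes
            }
        }
        if (lastNumeral != null) { // number at end of stream
            out.add(new Tok(number.toString(), lastNumeral.pos()));
        }
        return out;
    }
}
```

   For example, merging `12`, `34` (both "noun-numeric") followed by `円` ("noun-suffix") yields `1234` tagged "noun-numeric" and then `円` unchanged. Taking attributes from the token that ended the number would instead have tagged `1234` as "noun-suffix".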
