[ https://issues.apache.org/jira/browse/LUCENE-10364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17470558#comment-17470558 ]
Robert Muir commented on LUCENE-10364:
--------------------------------------

> It was complaining about Character#getNumericValue(): This is a good hint,
> but in our case we were only using DECIMAL digits. For DecimalDigitFilter
> this is fine. Maybe rmuir should have a look at the unicode rules processing
> in GenerateUTR30DataFiles. Please don't see this as "Robert does not know
> Unicode", I just want to verify that the SuppressWarnings is fine, because I
> did not understand the code there. The problem is that
> UCharacter.getNumericValue() returns values outside 0..9 for roman numbers
> like 50. So adding it to the character '0' (0x30) to generate ASCII digit is
> not a good idea. DecimalDigitFilter does not do this, but for
> GenerateUTR30DataFiles I am unsure. So this should be verified!

I didn't write this file, but I may have "touched it last" :)

The code applies a UnicodeSet to filter the codepoints it works on:
* https://github.com/apache/lucene/blob/main/lucene/analysis/icu/src/tools/java/org/apache/lucene/analysis/icu/GenerateUTR30DataFiles.java#L233-L234
* https://github.com/apache/lucene/blob/main/lucene/analysis/icu/src/data/utr30/NativeDigitFolding.txt#L33

You can see the set visually here:
https://util.unicode.org/UnicodeJsps/list-unicodeset.jsp?a=%5B%5B%5B%3ANumeric_Type%3DDigit%3A%5D%5B%3ANd%3A%5D%5D+-+%5B%5B%3AChanges_When_NFKC_Casefolded%3DYes%3A%5D%5B%3ABlock%3DSuperscripts_And_Subscripts%3A%5D%5B%5Cu00B2%5Cu00B3%5Cu00B9%5D%5B%5Cu0030-%5Cu0039%5D%5D%5D&g=&i=

The key is the first part of the expression in the set:
{{[[:Numeric_Type=Digit:][:Nd:]]}}. This logic only operates on DIGITS. There
is nothing wrong with it.

So to me this check from error-prone is stupid and noisy, and should be
disabled if possible?
(just like the rest of error-prone, sorry)

> Prepare and update errorprone plugin for Java 17
> ------------------------------------------------
>
>                 Key: LUCENE-10364
>                 URL: https://issues.apache.org/jira/browse/LUCENE-10364
>             Project: Lucene - Core
>          Issue Type: Bug
>          Components: general/build
>            Reporter: Uwe Schindler
>            Assignee: Uwe Schindler
>            Priority: Major
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> When working on LUCENE-10283 and also SOLR-15876, we figured out that
> errorprone is now also able to run with Java 17, if we update it and if it
> runs inside Gradle's JVM. This was caused by the add-opens we did for
> Spotless previously.
> There is only one case where it does not work: If you run spotless in a
> forked compiler, because the Gradle options are not applied then. The new
> Spotless plugin can handle this, but it won't work with our customized build
> for some reason. So I changed the if clause a bit, so it won't run errorprone
> if you use a JDK-18 preview build with RUNTIME_JAVA_HOME.
> When updating the rules it also found new bugs, some of them were real
> problems:
> - Some tests were comparing Longs as Floats. The reason was that Suggesters
> changed to use Longs instead of Floats. In a similar way, we sometimes
> assign a long to a float score. The first one was easy to fix by removing
> the epsilon from the assertEquals; the latter was mostly adding an explicit
> cast (to make it clear in our scorers).
> - There were also some concurrent modification exceptions possible; I fixed
> this in the tests by making a clone before modifying. For those using a
> TreeMap it was fine.
> - It was complaining about Character#getNumericValue(): This is a good hint,
> but in our case we were only using DECIMAL digits. For DecimalDigitFilter
> this is fine. Maybe [~rmuir] should have a look at the unicode rules
> processing in GenerateUTR30DataFiles. Please don't see this as "Robert does
> not know Unicode", I just want to verify that the SuppressWarnings is fine,
> because I did not understand the code there. The problem is that
> UCharacter.getNumericValue() returns values outside 0..9 for roman numbers
> like 50. So adding it to the character '0' (0x30) to generate ASCII digit is
> not a good idea. DecimalDigitFilter does not do this, but for
> GenerateUTR30DataFiles I am unsure. So this should be verified!
> - Some equals() methods were comparing primitives with Objects.equals(). This
> causes boxing and should be avoided (although Hotspot removes this after
> enough iterations).

--
This message was sent by Atlassian Jira
(v8.20.1#820001)
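
To illustrate the getNumericValue() concern discussed in this thread, here is a
minimal sketch. It is not the code from GenerateUTR30DataFiles or
DecimalDigitFilter. It assumes the JDK's java.lang.Character as a stand-in for
ICU's UCharacter, and the class and method names (NumericValueDemo, naiveFold,
guardedFold) are hypothetical. The guard only approximates the [:Nd:] part of
the UnicodeSet above, since the JDK does not expose Numeric_Type directly.

```java
final class NumericValueDemo {

    // Naive folding: adds the numeric value to '0' without checking that
    // the value is in 0..9. This is the pattern the warning is about.
    static int naiveFold(int codePoint) {
        return '0' + Character.getNumericValue(codePoint);
    }

    // Guarded folding: only code points in general category Nd (decimal
    // digit) are mapped to ASCII; everything else is returned unchanged.
    static int guardedFold(int codePoint) {
        if (Character.getType(codePoint) == Character.DECIMAL_DIGIT_NUMBER) {
            return '0' + Character.digit(codePoint, 10);
        }
        return codePoint;
    }

    public static void main(String[] args) {
        int arabicIndicThree = 0x0663; // ARABIC-INDIC DIGIT THREE (Nd)
        int romanFifty = 0x216C;       // ROMAN NUMERAL FIFTY (not Nd)

        System.out.println(Character.getNumericValue(arabicIndicThree));
        System.out.println(Character.getNumericValue(romanFifty));
        System.out.println((char) naiveFold(arabicIndicThree));
        System.out.println((char) naiveFold(romanFifty));
        System.out.println(guardedFold(romanFifty) == romanFifty);
    }
}
```

With the guard in place, a code point such as the Roman numeral fifty never
reaches the addition, which is the same effect the UnicodeSet filter has when
it restricts the input to digits up front.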