[ 
https://issues.apache.org/jira/browse/LUCENE-10364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17470558#comment-17470558
 ] 

Robert Muir commented on LUCENE-10364:
--------------------------------------

> It was complaining about Character#getNumericValue(): This is a good hint, 
> but in our case we were only using DECIMAL digits. For DecimalDigitFilter 
> this is fine. Maybe rmuir should have a look at the unicode rules processing 
> in GenerateUTR30DataFiles. Please don't see this as "Robert does not know 
> Unicode", I just want to verify that the SuppressWarnings is fine, because I 
> did not understand the code there. The problem is that 
> UCharacter.getNumericValue() returns values outside 0..9 for roman numbers 
> like 50. So adding it to the character '0' (0x30) to generate ASCII digit is 
> not a good idea. DecimalDigitFilter does not do this, but for 
> GenerateUTR30DataFiles I am unsure. So this should be verified!

I didn't write this file, but I may have "touched it last" :)

The code applies UnicodeSet to filter codepoints it works on: 
* https://github.com/apache/lucene/blob/main/lucene/analysis/icu/src/tools/java/org/apache/lucene/analysis/icu/GenerateUTR30DataFiles.java#L233-L234
* https://github.com/apache/lucene/blob/main/lucene/analysis/icu/src/data/utr30/NativeDigitFolding.txt#L33

You can see the set visually here:

https://util.unicode.org/UnicodeJsps/list-unicodeset.jsp?a=%5B%5B%5B%3ANumeric_Type%3DDigit%3A%5D%5B%3ANd%3A%5D%5D+-+%5B%5B%3AChanges_When_NFKC_Casefolded%3DYes%3A%5D%5B%3ABlock%3DSuperscripts_And_Subscripts%3A%5D%5B%5Cu00B2%5Cu00B3%5Cu00B9%5D%5B%5Cu0030-%5Cu0039%5D%5D%5D&g=&i=

The key is the first part of the expression in the set: 
{{[[:Numeric_Type=Digit:][:Nd:]]}}. This logic only operates on DIGITS. There 
is nothing wrong with it.
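
To illustrate why the filter matters, here is a minimal sketch using only java.lang.Character instead of ICU's UnicodeSet. It is not the code in GenerateUTR30DataFiles: the class and method names are made up, and it approximates the set with the Nd general category only (plain Java cannot query Numeric_Type=Digit). The idea is the same though: check that the codepoint is a digit first, and only then add its numeric value to '0'.

```java
final class DigitFoldSketch {

    /**
     * Hypothetical helper: folds a decimal digit (general category Nd) to its
     * ASCII equivalent and returns every other codepoint unchanged.
     */
    static int foldDecimalDigit(int codePoint) {
        // Filter first: only Nd codepoints are touched, so the numeric value
        // added below is a single decimal digit.
        if (Character.getType(codePoint) != Character.DECIMAL_DIGIT_NUMBER) {
            return codePoint;
        }
        return '0' + Character.getNumericValue(codePoint);
    }

    public static void main(String[] args) {
        int arabicIndicThree = 0x0663; // ARABIC-INDIC DIGIT THREE, category Nd
        int romanFifty = 0x216C;       // ROMAN NUMERAL FIFTY, category Nl

        // Without a filter, getNumericValue() alone is not safe to add to '0':
        // for the roman numeral it returns 50.
        System.out.println(Character.getNumericValue(romanFifty));

        // With the filter, the digit is folded and the roman numeral is left alone.
        System.out.println((char) foldDecimalDigit(arabicIndicThree));
        System.out.println(foldDecimalDigit(romanFifty) == romanFifty);
    }
}
```

The roman numeral never reaches the addition because it is not in the filtered set, which is the same reason the real UnicodeSet makes the flagged call harmless.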

So to me this check from error-prone is stupid and noisy, and should be 
disabled if possible? (just like the rest of error-prone, sorry)

> Prepare and update errorprone plugin for Java 17
> ------------------------------------------------
>
>                 Key: LUCENE-10364
>                 URL: https://issues.apache.org/jira/browse/LUCENE-10364
>             Project: Lucene - Core
>          Issue Type: Bug
>          Components: general/build
>            Reporter: Uwe Schindler
>            Assignee: Uwe Schindler
>            Priority: Major
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> When working on LUCENE-10283 and also SOLR-15876, we figured out that 
> errorprone is now also able to run with Java 17, if we update it and if it 
> runs inside Gradle's JVM. This was caused by the add-opens we did for 
> Spotless previously.
> There is only one case where it does not work: If you run spotless in a 
> forked compiler, because the Gradle options are not applied then. The new 
> Spotless plugin can handle this, but it won't work with our customized build 
> for some reason. So I changed the if clause a bit, so it won't run errorprone 
> if you use a JDK-18 preview build with RUNTIME_JAVA_HOME.
> When updating the rules it also found new bugs, some of them were real 
> problems:
> - Some tests were comparing Longs as Floats. The reason for this was that 
> Suggesters changed to use Longs instead of Floats. In a similar way sometimes 
> we assign a long to a float score. The first one was easy to fix by removing 
> the epsilon from the assertEquals, the latter was mostly adding an explicit 
> cast (to make it clear in our scorers)
> - There were also some concurrent modification exceptions possible, I fixed 
> this in the tests by making a clone before modifying. For those using a 
> TreeMap it was fine.
> - It was complaining about Character#getNumericValue(): This is a good hint, 
> but in our case we were only using DECIMAL digits. For DecimalDigitFilter 
> this is fine. Maybe [~rmuir] should have a look at the unicode rules 
> processing in GenerateUTR30DataFiles. Please don't see this as "Robert does 
> not know Unicode", I just want to verify that the SuppressWarnings is fine, 
> because I did not understand the code there. The problem is that 
> UCharacter.getNumericValue() returns values outside 0..9 for roman numbers 
> like 50. So adding it to the character '0' (0x30) to generate ASCII digit is 
> not a good idea. DecimalDigitFilter does not do this, but for 
> GenerateUTR30DataFiles I am unsure. So this should be verified!
> - Some equals() methods were comparing primitives with Objects.equals(). This 
> causes boxing and should be avoided (although Hotspot removes this after 
> enough iterations)


