[ 
https://issues.apache.org/jira/browse/LUCENE-9080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16996785#comment-16996785
 ] 

Erick Erickson commented on LUCENE-9080:
----------------------------------------

OK, here's the short form. I _think_ that *gennorm2* and *icupkg* must 
somehow be told to generate version 3 for *utr30.nrm*, and the fresh install of 
those tools that I used is generating version 4.

[~jpountz] [~rcmuir] [~jim.ferenczi] You've all successfully checked in a new 
"utr30.nrm" file. Do you remember the magic?

I'll look at this some more this evening unless some kind person says "Just 
export -DGENNORM2VERSION=3" or something.

If this is on the right track, the broader question is why we're still using 
utr30.nrm, assuming there's a utr40.nrm available. One reason I'm obsessing on 
this is that [~dweiss] is heroically trying to port all this over to the gradle 
build and we need a story to tell...

tl;dr;

Normalizer2Impl$IsAcceptable _from ICU4J 62.1_ checks for version 3 (hard-coded), 
but "utr30.nrm" has version 4 after regeneration:
{code:java}
private static final class IsAcceptable implements ICUBinary.Authenticate {
    @Override
    public boolean isDataVersionAcceptable(byte version[]) {
        return version[0]==3;
    }
}
{code}
But according to the debugger, version[0] == 4.
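To double-check the debugger's reading without loading ICU at all, the version bytes can be read straight from the file. The following is a minimal sketch, not part of ICU4J or Lucene: the class name NrmHeaderPeek is hypothetical, and the offsets assume the standard ICU binary header layout (magic bytes 0xDA 0x27 at offsets 2-3, dataFormat at 12-15, formatVersion at 16-19), which is an assumption worth verifying against the ICU sources for the release in use.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;

/** Hypothetical helper (not part of ICU4J or Lucene): peeks at an ICU data file header. */
final class NrmHeaderPeek {
    // Assumed layout: bytes 0-1 header size, 2-3 magic (0xDA 0x27),
    // 4-11 other UDataInfo fields, 12-15 dataFormat, 16-19 formatVersion.
    static final int PREFIX_LENGTH = 20;

    /** Returns the four formatVersion bytes; rejects input that lacks the ICU magic bytes. */
    static byte[] formatVersion(byte[] header) {
        if (header.length < PREFIX_LENGTH) {
            throw new IllegalArgumentException("need at least " + PREFIX_LENGTH + " header bytes");
        }
        if ((header[2] & 0xFF) != 0xDA || (header[3] & 0xFF) != 0x27) {
            throw new IllegalArgumentException("missing ICU magic bytes");
        }
        return Arrays.copyOfRange(header, 16, 20);
    }

    /** Returns the four dataFormat bytes as ASCII text. */
    static String dataFormat(byte[] header) {
        return new String(header, 12, 4, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java NrmHeaderPeek <path-to-.nrm-file>");
            return;
        }
        byte[] header = new byte[PREFIX_LENGTH];
        try (DataInputStream in = new DataInputStream(Files.newInputStream(Paths.get(args[0])))) {
            in.readFully(header); // throws EOFException if the file is shorter than the prefix
        }
        System.out.println("dataFormat=" + dataFormat(header)
            + " formatVersion[0]=" + formatVersion(header)[0]);
    }
}
```

Running this against the regenerated utr30.nrm and against the checked-in copy should show whether the two files really differ in formatVersion[0], independent of which ICU4J jar is on the classpath.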
When I break in the debugger at the above test, the path shown for 
Normalizer2Impl is the version that I expect:

/Users/Erick/.ideaLibSources/icu4j-62.1-sources.jar!/com/ibm/icu/impl/Normalizer2Impl.java

The file being loaded is from here in ICUFoldingFilter (related to Robert's 
comment from the dev thread, I think):
{code:java}
public static final Normalizer2 NORMALIZER = Normalizer2.getInstance(
  // TODO: if the wrong version of the ICU jar is used, loading these data files may give a strange error.
  // maybe add an explicit check? http://icu-project.org/apiref/icu4j/com/ibm/icu/util/VersionInfo.html
  ICUFoldingFilter.class.getResourceAsStream("utr30.nrm"),
  "utr30", Normalizer2.Mode.COMPOSE);
 {code}
What's weird is that utr30.nrm seems like something that _would_ be 
version 3 rather than 4, just from the name. Indeed, if I roll back _just_ that 
file, the test passes.

So I can make this all work by just rolling utr30.nrm back, but that seems like 
a horrible hack. Any suggestions?

Here's the stack trace at the failure point:

"TEST-TestICUFoldingFilterFactory.testBogusArguments-seed#[70D45DC4E868A173]"@1,675 in group "TGRP-TestICUFoldingFilterFactory": RUNNING
isDataVersionAcceptable:423, Normalizer2Impl$IsAcceptable \{com.ibm.icu.impl}
readHeader:603, ICUBinary \{com.ibm.icu.impl}
readHeaderAndDataVersion:556, ICUBinary \{com.ibm.icu.impl}
load:431, Normalizer2Impl \{com.ibm.icu.impl}
createInstance:351, Norm2AllModes$1 \{com.ibm.icu.impl}
createInstance:344, Norm2AllModes$1 \{com.ibm.icu.impl}
getInstance:69, SoftCache \{com.ibm.icu.impl}
getInstance:341, Norm2AllModes \{com.ibm.icu.impl}
getInstance:202, Normalizer2 \{com.ibm.icu.text}
<clinit>:72, ICUFoldingFilter \{org.apache.lucene.analysis.icu}
<init>:51, ICUFoldingFilterFactory \{org.apache.lucene.analysis.icu}
lambda$testBogusArguments$0:59, TestICUFoldingFilterFactory \{org.apache.lucene.analysis.icu}
run:-1, 1247295720 \{org.apache.lucene.analysis.icu.TestICUFoldingFilterFactory$$Lambda$78}
_expectThrows:2849, LuceneTestCase \{org.apache.lucene.util}
expectThrows:2724, LuceneTestCase \{org.apache.lucene.util}
expectThrows:2719, LuceneTestCase \{org.apache.lucene.util}
testBogusArguments:58, TestICUFoldingFilterFactory \{org.apache.lucene.analysis.icu}

 

> "ant regenerate" fails on master
> --------------------------------
>
>                 Key: LUCENE-9080
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9080
>             Project: Lucene - Core
>          Issue Type: Bug
>            Reporter: Erick Erickson
>            Assignee: Erick Erickson
>            Priority: Major
>         Attachments: after_regen.patch, before_regen.patch, status.res
>
>
> The root cause is that RamUsageEstimator.NUM_BYTES_INT has been removed and 
> the python scripts still reference it in the generated scripts. That part's 
> easy to fix.
> Last time I looked, though, the regenerate produces some differences in the 
> generated files that should be looked at to ensure they're benign.
> Not really sure whether this should be a Lucene or Solr JIRA. Putting it in 
> Lucene since one of the failed files is: 
> lucene/core/src/java/org/apache/lucene/util/packed/Packed8ThreeBlocks.java
> I do know that one of the Solr jflex-produced files has an unexplained 
> difference so it may bleed over.
> "ant regenerate" needs about 24G on my machine FWIW.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

