[
https://issues.apache.org/jira/browse/LUCENE-9080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16996785#comment-16996785
]
Erick Erickson commented on LUCENE-9080:
----------------------------------------
OK, here's the short form. I _think_ that the *gennorm2* and *icupkg* must
somehow be told to generate version 3 for *utr30.nrm*, and the fresh install of
those I used is generating version 4.
[~jpountz] [~rcmuir] [~jim.ferenczi] You've all successfully checked in a new
"utr30.nrm" file, do you remember the magic?
I'll look at this some more this evening unless some kind person says "Just
export -DGENNORM2VERSION=3" or something.
If this is on the right track, the broader question is why we're still using
utr30.nrm, assuming there's a utr40.nrm available. One reason I'm obsessing on
this is that [~dweiss] is heroically trying to port all this over to the gradle
build and we need a story to tell...
tl;dr;
Normalizer2Impl$IsAcceptable _from ICU4J 62.1_ checks for version 3 (hard-coded),
but "utr30.nrm" has version 4 after regeneration:
{code:java}
private static final class IsAcceptable implements ICUBinary.Authenticate {
  @Override
  public boolean isDataVersionAcceptable(byte version[]) {
    return version[0]==3;
  }
}
{code}
But according to the debugger, version[0] == 4.
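To confirm which format version a given utr30.nrm carries without a debugger, the header can be read directly. This is only a minimal sketch: it assumes the standard ICU binary data header layout (magic bytes 0xda 0x27 at offsets 2 and 3, the 4-byte dataFormat at offset 12, the 4-byte formatVersion at offset 16), and the class name IcuHeaderPeek is made up for illustration. It is not part of Lucene or ICU.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper: peeks at the header of an ICU binary data file.
// Offsets below are an assumption based on the usual ICU data header layout.
public class IcuHeaderPeek {
  static final int MIN_HEADER_BYTES = 24;

  // Returns the 4-character data format tag (normalization data is expected to be "Nrm2").
  static String dataFormat(byte[] header) {
    checkHeader(header);
    return new String(header, 12, 4, StandardCharsets.US_ASCII);
  }

  // Returns the 4-byte format version; element 0 is what IsAcceptable compares against 3.
  static byte[] formatVersion(byte[] header) {
    checkHeader(header);
    return new byte[] {header[16], header[17], header[18], header[19]};
  }

  private static void checkHeader(byte[] header) {
    if (header.length < MIN_HEADER_BYTES) {
      throw new IllegalArgumentException("need at least " + MIN_HEADER_BYTES + " header bytes");
    }
    if ((header[2] & 0xff) != 0xda || (header[3] & 0xff) != 0x27) {
      throw new IllegalArgumentException("bad magic bytes, not an ICU data file");
    }
  }

  public static void main(String[] args) throws IOException {
    if (args.length != 1) {
      System.err.println("usage: java IcuHeaderPeek <path-to-.nrm-file>");
      System.exit(2);
    }
    byte[] header = new byte[MIN_HEADER_BYTES];
    try (DataInputStream in = new DataInputStream(Files.newInputStream(Paths.get(args[0])))) {
      in.readFully(header); // throws EOFException if the file is shorter than the header
    }
    System.out.println("dataFormat=" + dataFormat(header)
        + " formatVersion[0]=" + formatVersion(header)[0]);
  }
}
```

Running it against both the regenerated file and the rolled-back one should show whether the two really differ in formatVersion[0].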
When I break in the debugger at the above test, the path shown for
Normalizer2Impl is the version I expect:
/Users/Erick/.ideaLibSources/icu4j-62.1-sources.jar!/com/ibm/icu/impl/Normalizer2Impl.java
The file being loaded is from here in ICUFoldingFilter (Related to Robert's
comment from the dev thread I think):
{code:java}
public static final Normalizer2 NORMALIZER = Normalizer2.getInstance(
    // TODO: if the wrong version of the ICU jar is used, loading these data
    // files may give a strange error.
    // maybe add an explicit check?
    // http://icu-project.org/apiref/icu4j/com/ibm/icu/util/VersionInfo.html
    ICUFoldingFilter.class.getResourceAsStream("utr30.nrm"),
    "utr30", Normalizer2.Mode.COMPOSE);
{code}
What's weird is that, just from the name, utr30.nrm seems like something that
_would_ be version 3 rather than 4. Indeed, if I roll back _just_ that file,
the test passes.
So I can make this all work by rolling utr30.nrm back, but that seems like a
horrible hack. Any suggestions?
Here's the stack trace at the failure point:
"TEST-TestICUFoldingFilterFactory.testBogusArguments-seed#[70D45DC4E868A173]"@1,675 in group "TGRP-TestICUFoldingFilterFactory": RUNNING
isDataVersionAcceptable:423, Normalizer2Impl$IsAcceptable {com.ibm.icu.impl}
readHeader:603, ICUBinary {com.ibm.icu.impl}
readHeaderAndDataVersion:556, ICUBinary {com.ibm.icu.impl}
load:431, Normalizer2Impl {com.ibm.icu.impl}
createInstance:351, Norm2AllModes$1 {com.ibm.icu.impl}
createInstance:344, Norm2AllModes$1 {com.ibm.icu.impl}
getInstance:69, SoftCache {com.ibm.icu.impl}
getInstance:341, Norm2AllModes {com.ibm.icu.impl}
getInstance:202, Normalizer2 {com.ibm.icu.text}
<clinit>:72, ICUFoldingFilter {org.apache.lucene.analysis.icu}
<init>:51, ICUFoldingFilterFactory {org.apache.lucene.analysis.icu}
lambda$testBogusArguments$0:59, TestICUFoldingFilterFactory {org.apache.lucene.analysis.icu}
run:-1, 1247295720 {org.apache.lucene.analysis.icu.TestICUFoldingFilterFactory$$Lambda$78}
_expectThrows:2849, LuceneTestCase {org.apache.lucene.util}
expectThrows:2724, LuceneTestCase {org.apache.lucene.util}
expectThrows:2719, LuceneTestCase {org.apache.lucene.util}
testBogusArguments:58, TestICUFoldingFilterFactory {org.apache.lucene.analysis.icu}
> "ant regenerate" fails on master
> --------------------------------
>
> Key: LUCENE-9080
> URL: https://issues.apache.org/jira/browse/LUCENE-9080
> Project: Lucene - Core
> Issue Type: Bug
> Reporter: Erick Erickson
> Assignee: Erick Erickson
> Priority: Major
> Attachments: after_regen.patch, before_regen.patch, status.res
>
>
> The root cause is that RamUsageEstimator.NUM_BYTES_INT has been removed and
> the python scripts still reference it in the generated scripts. That part's
> easy to fix.
> Last time I looked, though, the regenerate produces some differences in the
> generated files that should be looked at to ensure they're benign.
> Not really sure whether this should be a Lucene or Solr JIRA. Putting it in
> Lucene since one of the failed files is:
> lucene/core/src/java/org/apache/lucene/util/packed/Packed8ThreeBlocks.java
> I do know that one of the Solr jflex-produced files has an unexplained
> difference so it may bleed over.
> "ant regenerate" needs about 24G on my machine FWIW.