[ https://issues.apache.org/jira/browse/LUCENE-9080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16996785#comment-16996785 ]
Erick Erickson commented on LUCENE-9080:
----------------------------------------

OK, here's the short form. I _think_ that *gennorm2* and *icupkg* must somehow be told to generate version 3 for *utr30.nrm*, and the fresh install of those I used is generating version 4.

[~jpountz] [~rcmuir] [~jim.ferenczi] You've all successfully checked in a new "utr30.nrm" file, do you remember the magic? I'll look at this some more this evening unless some kind person says "Just export -DGENNROM2VERSION=3" or something.

If this is on the right track, the broader question is why we're still using utr30.nrm, assuming there's a utr40.nrm available.

One reason I'm obsessing on this is that [~dweiss] is heroically trying to port all this over to the gradle build and we need a story to tell...

tl;dr: Normalizer2Impl$IsAcceptable _from icu4j 62.1_ is checking for version 3 (hard coded) but "utr30.nrm" has version 4 after regeneration:
{code:java}
private static final class IsAcceptable implements ICUBinary.Authenticate {
    @Override
    public boolean isDataVersionAcceptable(byte version[]) {
        return version[0]==3;
    }
}
{code}
But according to the debugger, version[0] == 4.

When I break in the debugger at the above test, the path shown for Normalizer2Impl is the same version that I expect:

/Users/Erick/.ideaLibSources/icu4j-62.1-sources.jar!/com/ibm/icu/impl/Normalizer2Impl.java

The file being loaded is from here in ICUFoldingFilter (related to Robert's comment from the dev thread, I think):
{code:java}
public static final Normalizer2 NORMALIZER = Normalizer2.getInstance(
    // TODO: if the wrong version of the ICU jar is used, loading these data files may give a strange error.
    // maybe add an explicit check? http://icu-project.org/apiref/icu4j/com/ibm/icu/util/VersionInfo.html
    ICUFoldingFilter.class.getResourceAsStream("utr30.nrm"),
    "utr30", Normalizer2.Mode.COMPOSE);
{code}
What's weird is that utr30.nrm seems like it's something that _would_ be version 3 rather than 4, just from the name.
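
A quick way to confirm which format version a given .nrm file carries, without a debugger, might be to read its header bytes directly. The sketch below assumes the standard ICU binary data header layout (magic bytes 0xda 0x27 at offsets 2-3, the four-byte data format at offset 12, the four-byte format version at offset 16), which to my understanding is what ICUBinary.readHeader parses. NrmHeaderPeek and its method names are hypothetical and not part of Lucene or ICU4J.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;

// Hypothetical helper for illustration only; not part of Lucene or ICU4J.
final class NrmHeaderPeek {
  // Assumed offsets within the ICU common data header.
  static final int MAGIC_OFFSET = 2;
  static final int DATA_FORMAT_OFFSET = 12;
  static final int FORMAT_VERSION_OFFSET = 16;
  static final int BYTES_NEEDED = 20;

  // Rejects input that is too short or lacks the 0xda 0x27 magic bytes.
  static void checkMagic(byte[] header) {
    if (header.length < BYTES_NEEDED
        || (header[MAGIC_OFFSET] & 0xff) != 0xda
        || (header[MAGIC_OFFSET + 1] & 0xff) != 0x27) {
      throw new IllegalArgumentException("not an ICU binary data header");
    }
  }

  // Returns the four-character data format tag as ASCII text.
  static String dataFormat(byte[] header) {
    checkMagic(header);
    return new String(header, DATA_FORMAT_OFFSET, 4, StandardCharsets.US_ASCII);
  }

  // Returns the four format-version bytes; element 0 is the major version.
  static byte[] formatVersion(byte[] header) {
    checkMagic(header);
    return Arrays.copyOfRange(header, FORMAT_VERSION_OFFSET, FORMAT_VERSION_OFFSET + 4);
  }

  // Usage: java NrmHeaderPeek path/to/utr30.nrm
  public static void main(String[] args) throws IOException {
    byte[] header = new byte[BYTES_NEEDED];
    try (InputStream in = Files.newInputStream(Paths.get(args[0]));
         DataInputStream data = new DataInputStream(in)) {
      data.readFully(header);
    }
    System.out.println(dataFormat(header) + " format version " + formatVersion(header)[0]);
  }
}
```

Running this against both the regenerated and the rolled-back utr30.nrm should show whether the two files differ in the first format-version byte, which is the byte isDataVersionAcceptable compares against 3.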

Indeed, if I roll back _just_ that file then the test passes. So I can make this all work by just rolling utr30.nrm back, but that seems like a horrible hack. Any suggestions?

Here's the stack trace at the failure point:

"TEST-TestICUFoldingFilterFactory.testBogusArguments-seed#[70D45DC4E868A173]"@1,675 in group "TGRP-TestICUFoldingFilterFactory": RUNNING
isDataVersionAcceptable:423, Normalizer2Impl$IsAcceptable \{com.ibm.icu.impl}
readHeader:603, ICUBinary \{com.ibm.icu.impl}
readHeaderAndDataVersion:556, ICUBinary \{com.ibm.icu.impl}
load:431, Normalizer2Impl \{com.ibm.icu.impl}
createInstance:351, Norm2AllModes$1 \{com.ibm.icu.impl}
createInstance:344, Norm2AllModes$1 \{com.ibm.icu.impl}
getInstance:69, SoftCache \{com.ibm.icu.impl}
getInstance:341, Norm2AllModes \{com.ibm.icu.impl}
getInstance:202, Normalizer2 \{com.ibm.icu.text}
<clinit>:72, ICUFoldingFilter \{org.apache.lucene.analysis.icu}
<init>:51, ICUFoldingFilterFactory \{org.apache.lucene.analysis.icu}
lambda$testBogusArguments$0:59, TestICUFoldingFilterFactory \{org.apache.lucene.analysis.icu}
run:-1, 1247295720 \{org.apache.lucene.analysis.icu.TestICUFoldingFilterFactory$$Lambda$78}
_expectThrows:2849, LuceneTestCase \{org.apache.lucene.util}
expectThrows:2724, LuceneTestCase \{org.apache.lucene.util}
expectThrows:2719, LuceneTestCase \{org.apache.lucene.util}
testBogusArguments:58, TestICUFoldingFilterFactory \{org.apache.lucene.analysis.icu}

> "ant regenerate" fails on master
> --------------------------------
>
> Key: LUCENE-9080
> URL: https://issues.apache.org/jira/browse/LUCENE-9080
> Project: Lucene - Core
> Issue Type: Bug
> Reporter: Erick Erickson
> Assignee: Erick Erickson
> Priority: Major
> Attachments: after_regen.patch, before_regen.patch, status.res
>
>
> The root cause is that RamUsageEstimator.NUM_BYTES_INT has been removed and
> the python scripts still reference it in the generated scripts. That part's
> easy to fix.
> Last time I looked, though, the regenerate produces some differences in the
> generated files that should be looked at to ensure they're benign.
> Not really sure whether this should be a Lucene or Solr JIRA. Putting it in
> Lucene since one of the failed files is:
> lucene/core/src/java/org/apache/lucene/util/packed/Packed8ThreeBlocks.java
> I do know that one of the Solr jflex-produced files has an unexplained
> difference so it may bleed over.
> "ant regenerate" needs about 24G on my machine FWIW.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)