davido commented on issue #12307: URL: https://github.com/apache/lucene/issues/12307#issuecomment-1807097946
@uschindler Thank you for clarifying and for the links to the specifications.

What confused me is that the problem showed up after two changes at once: using JDK 21 at runtime, and updating to Lucene 9.8.0. Apparently, starting with Lucene 9.x, Multi-Release JAR files are used:

```bash
davido@wizball:~/projects/gerrit/junk/WEB-INF/lib (jdk_21_support %)$ unzip -t lucene-core-9.8.0.jar | grep MemorySegmentIndexInputProvider
    testing: META-INF/versions/19/org/apache/lucene/store/MemorySegmentIndexInputProvider.class   OK
    testing: META-INF/versions/20/org/apache/lucene/store/MemorySegmentIndexInputProvider.class   OK
    testing: META-INF/versions/21/org/apache/lucene/store/MemorySegmentIndexInputProvider.class   OK
```

Clearly, because Bazel currently doesn't have MR-JAR support [1], merging `lucene-core.jar` and `lucene-backward-codecs.jar` breaks the MR-JAR format, which explains the `NoClassDefFoundError` we are seeing.

The Gerrit Code Review project started merging `lucene-core` and `backward-codecs` 8 years ago so that it could "understand" the index format created by older Lucene releases, and so that a Gerrit site could be reindexed by a new Gerrit release that ships a new Lucene version. This is the explanation in the commit message [2]:

```
Merge Lucene core and backward-codecs jars

Both of these jars provide a provider-configuration file in
META-INF/services/org.apache.lucene.codecs.Codec registering their
respective implementations as providers of this codec. The proper way
to merge these files is to concatenate them, but the normal Buck build
process would otherwise choose one arbitrarily.

Add a new custom rule merge_maven_jars to merge multiple Maven jars
together using a simple Python script. The script concatenates all the
entries in two zip files, preferring the entry found in the first file
on the command line, which is still arbitrary but at least
deterministic. It specially handles files in the META-INF/services
directory by concatenating them.

Use this new rule to merge the old :core and :backward-codecs rules
into a single :core-and-backward-codecs rule.
```

In fact, both JARs still ship that provider-configuration file in Lucene 9.8.0:

lucene-backward-codecs/META-INF/services/org.apache.lucene.codecs.Codec

```
org.apache.lucene.backward_codecs.lucene80.Lucene80Codec
org.apache.lucene.backward_codecs.lucene84.Lucene84Codec
org.apache.lucene.backward_codecs.lucene86.Lucene86Codec
org.apache.lucene.backward_codecs.lucene87.Lucene87Codec
org.apache.lucene.backward_codecs.lucene70.Lucene70Codec
org.apache.lucene.backward_codecs.lucene90.Lucene90Codec
org.apache.lucene.backward_codecs.lucene91.Lucene91Codec
org.apache.lucene.backward_codecs.lucene92.Lucene92Codec
org.apache.lucene.backward_codecs.lucene94.Lucene94Codec
```

and lucene-core/META-INF/services/org.apache.lucene.codecs.Codec

```
org.apache.lucene.codecs.lucene95.Lucene95Codec
```

However, the better question is: why do those files need to be merged at all? To avoid messing around with the MR-JAR file format, wouldn't it be sufficient to just put `lucene-backward-codecs` and `lucene-core` AS-IS on the classpath?

So, I stopped merging the JARs and preserved the original JARs. Now the tests are passing, and with the latest Lucene release 9.8.0 I was able to reindex a Gerrit site whose index was created with the previous Lucene release 8.11.2:

```
$ davido@wizball:~/projects/gerrit (jdk_21_support %)$ unzip -t bazel-bin/gerrit.war | grep lucene | grep 9.8.0
    testing: WEB-INF/lib/lucene-core-9.8.0.jar   OK
    testing: WEB-INF/lib/lucene-backward-codecs-9.8.0.jar   OK
    testing: WEB-INF/lib/lucene-queryparser-9.8.0.jar   OK
    testing: WEB-INF/lib/lucene-analysis-common-9.8.0.jar   OK
    testing: WEB-INF/lib/lucene-misc-9.8.0.jar   OK
```

Am I understanding correctly that, with recent Lucene releases, merging the `lucene-backward-codecs` and `lucene-core` JARs is not necessary any more?
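For anyone who wants to check whether a merged JAR still qualifies as multi-release, here is a minimal sketch in Python (the language of the original merge script). It relies only on two facts from the JAR specification: a multi-release JAR declares `Multi-Release: true` in the main section of `META-INF/MANIFEST.MF`, and its versioned classes live under `META-INF/versions/<N>/`. The function name `multi_release_report` is made up for illustration, and I have not verified which of the two the Gerrit merge step actually loses, so treat this as a diagnostic aid rather than a confirmed explanation.

```python
import sys
import zipfile


def multi_release_report(jar):
    """Inspect a JAR (path or file-like object) for multi-release markers.

    Returns (declared, versions):
      declared -- True if the main manifest section has 'Multi-Release: true'
      versions -- sorted Java versions that have entries under META-INF/versions/
    """
    with zipfile.ZipFile(jar) as zf:
        names = zf.namelist()

        declared = False
        if "META-INF/MANIFEST.MF" in names:
            manifest = zf.read("META-INF/MANIFEST.MF").decode("utf-8", "replace")
            for line in manifest.splitlines():
                if not line.strip():
                    break  # main attributes end at the first blank line
                key, sep, value = line.partition(":")
                if sep and key.strip().lower() == "multi-release":
                    declared = value.strip().lower() == "true"

        prefix = "META-INF/versions/"
        versions = set()
        for name in names:
            if name.startswith(prefix):
                first = name[len(prefix):].split("/", 1)[0]
                if first.isdigit():
                    versions.add(int(first))

        return declared, sorted(versions)


if __name__ == "__main__":
    # Usage: python mr_check.py some.jar [other.jar ...]
    for path in sys.argv[1:]:
        declared, versions = multi_release_report(path)
        print(f"{path}: Multi-Release declared={declared}, versioned dirs={versions}")
```

Running it against both the original `lucene-core-9.8.0.jar` and the merged JAR should show whether the manifest attribute or the versioned entries went missing. A JAR that has versioned directories but no `Multi-Release: true` attribute is treated by the JVM as a plain JAR, so the classes under `META-INF/versions/` are ignored.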
[1] https://github.com/bazelbuild/bazel/issues/5947
[2] https://gerrit-review.googlesource.com/c/gerrit/+/69850

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org