davido commented on issue #12307:
URL: https://github.com/apache/lucene/issues/12307#issuecomment-1807097946

   @uschindler 
   
   Thank you for clarifying and for the links to the specifications.
   
   What confused me is that the problem showed up after two changes: using JDK 21 at runtime and updating to Lucene 9.8.0.
   
   Apparently, starting with Lucene 9.x, Multi-Release JAR (MR-JAR) files are used:
   
   ```bash
   davido@wizball:~/projects/gerrit/junk/WEB-INF/lib (jdk_21_support %)$ unzip -t lucene-core-9.8.0.jar | grep MemorySegmentIndexInputProvider
       testing: META-INF/versions/19/org/apache/lucene/store/MemorySegmentIndexInputProvider.class   OK
       testing: META-INF/versions/20/org/apache/lucene/store/MemorySegmentIndexInputProvider.class   OK
       testing: META-INF/versions/21/org/apache/lucene/store/MemorySegmentIndexInputProvider.class   OK
   ```
   
   Bazel currently has no MR-JAR support [1], so merging `lucene-core.jar` and `lucene-backward-codecs.jar` breaks the MR-JAR format, which explains the `NoClassDefFoundError` we are seeing.
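
A quick way to check whether a given JAR still looks like an MR-JAR is to inspect both the `Multi-Release: true` manifest attribute and the `META-INF/versions/N/` entries, since the JVM only honors the versioned entries when the attribute is present. The following is a minimal sketch of such a check; the function name `describe_multi_release` and the returned dictionary keys are made up for illustration and are not part of Lucene, Bazel, or Gerrit.

```python
import zipfile

VERSIONS_PREFIX = "META-INF/versions/"


def describe_multi_release(jar):
    """Report the MR-JAR markers of a JAR (hypothetical helper).

    `jar` is a path or a binary file-like object. Returns whether the
    manifest carries `Multi-Release: true` and which Java feature
    versions have entries under META-INF/versions/.
    """
    with zipfile.ZipFile(jar) as zf:
        names = zf.namelist()
        manifest = ""
        if "META-INF/MANIFEST.MF" in names:
            manifest = zf.read("META-INF/MANIFEST.MF").decode("utf-8")

    # The attribute name and value are compared case-insensitively here.
    flag = any(
        line.strip().lower() == "multi-release: true"
        for line in manifest.splitlines()
    )

    versions = set()
    for name in names:
        if name.startswith(VERSIONS_PREFIX):
            first = name[len(VERSIONS_PREFIX):].split("/", 1)[0]
            if first.isdigit():
                versions.add(int(first))

    return {"multi_release_flag": flag, "versions": sorted(versions)}
```

Running this on the original `lucene-core-9.8.0.jar` and on the merged JAR should make it visible whether the merge dropped the manifest attribute, the versioned entries, or both.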
   
   The Gerrit Code Review project started merging `lucene-core` and `backward-codecs` 8 years ago [2] so that it could "understand" index formats created by older Lucene releases, which lets a Gerrit site be reindexed by a new Gerrit release that ships a newer Lucene version. The commit message explains:
   
   ```
   Merge Lucene core and backward-codecs jars
   
   Both of these jars provide a provider-configuration file in
   META-INF/services/org.apache.lucene.codecs.Codec registering their
   respective implementations as providers of this codec. The proper way
   to merge these files is to concatenate them, but the normal Buck build
   process would otherwise choose one arbitrarily.
   
   Add a new custom rule merge_maven_jars to merge multiple Maven jars
   together using a simple Python script. The script concatenates all the
   entries in two zip files, preferring the entry found in the first file
   on the command line, which is still arbitrary but at least
   deterministic. It specially handles files in the META-INF/services
   directory by concatenating them.
   
   Use this new rule to merge the old :core and :backward-codecs rules
   into a single :core-and-backward-codecs rule.
   ```
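
To make the behavior described in that commit message concrete, here is a rough sketch of such a merge. It is not the actual Gerrit script; the function name `merge_jars` and its signature are assumptions for illustration. It keeps the first copy of every entry and concatenates files under `META-INF/services/`.

```python
import zipfile

SERVICES_PREFIX = "META-INF/services/"


def merge_jars(inputs, output):
    """Merge JARs into `output` (hypothetical sketch of the described rule).

    For ordinary entries the first input that contains the entry wins.
    Provider-configuration files under META-INF/services/ are
    concatenated in input order.
    """
    first_seen = {}  # entry name -> bytes from the first JAR containing it
    services = {}    # service file name -> list of bytes, one per JAR
    order = []       # output order of entry names

    for src in inputs:
        with zipfile.ZipFile(src) as jar:
            for info in jar.infolist():
                if info.is_dir():
                    continue
                name = info.filename
                data = jar.read(info)
                if name.startswith(SERVICES_PREFIX):
                    if name not in services:
                        services[name] = []
                        order.append(name)
                    services[name].append(data)
                elif name not in first_seen:
                    first_seen[name] = data
                    order.append(name)

    with zipfile.ZipFile(output, "w", zipfile.ZIP_DEFLATED) as out:
        for name in order:
            if name in services:
                # Ensure each chunk ends with a newline so that provider
                # names from different JARs do not run together.
                data = b"".join(
                    chunk if chunk.endswith(b"\n") else chunk + b"\n"
                    for chunk in services[name]
                )
            else:
                data = first_seen[name]
            out.writestr(name, data)
```

Note that in this sketch `META-INF/MANIFEST.MF` is an ordinary entry, so it is taken from the first input only. Whether the merged JAR keeps a `Multi-Release: true` attribute therefore depends on the input order.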
   
   In fact, it's still true for Lucene 9.8.0, where:
   
   lucene-backward-codecs/META-INF/services/org.apache.lucene.codecs.Codec
   ```
   org.apache.lucene.backward_codecs.lucene80.Lucene80Codec
   org.apache.lucene.backward_codecs.lucene84.Lucene84Codec
   org.apache.lucene.backward_codecs.lucene86.Lucene86Codec
   org.apache.lucene.backward_codecs.lucene87.Lucene87Codec
   org.apache.lucene.backward_codecs.lucene70.Lucene70Codec
   org.apache.lucene.backward_codecs.lucene90.Lucene90Codec
   org.apache.lucene.backward_codecs.lucene91.Lucene91Codec
   org.apache.lucene.backward_codecs.lucene92.Lucene92Codec
   org.apache.lucene.backward_codecs.lucene94.Lucene94Codec
   ```
   
   and
   
   lucene-core/META-INF/services/org.apache.lucene.codecs.Codec
   ```
   org.apache.lucene.codecs.lucene95.Lucene95Codec
   ```
   
   However, the better question is: why do those files need to be merged at all? To avoid messing with the MR-JAR file format, wouldn't it be sufficient to just put `lucene-backward-codecs` and `lucene-core` AS-IS on the classpath?
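
For context, `java.util.ServiceLoader` reads every provider-configuration file with a given name that is visible on the classpath, not just the first one it finds. The following Python sketch only imitates that lookup over a list of separate JARs to show the effect; `list_providers` is a made-up helper, not JDK or Lucene code.

```python
import zipfile


def list_providers(jars, service):
    """Collect provider class names for `service` across separate JARs.

    Imitates how provider-configuration files are aggregated: every JAR
    is consulted in order, text after '#' is a comment, blank lines are
    skipped, and duplicate names are ignored.
    """
    entry = "META-INF/services/" + service
    providers = []
    for jar_path in jars:
        with zipfile.ZipFile(jar_path) as jar:
            if entry not in jar.namelist():
                continue
            text = jar.read(entry).decode("utf-8")
        for line in text.splitlines():
            name = line.split("#", 1)[0].strip()
            if name and name not in providers:
                providers.append(name)
    return providers
```

Called with the unmerged `lucene-core` and `lucene-backward-codecs` JARs and the service name `org.apache.lucene.codecs.Codec`, this would list the codecs from both files without any merging step.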
   
   So I stopped merging the JARs and kept the original JARs. Now the tests pass, and with the latest Lucene release 9.8.0 I was able to reindex a Gerrit site whose index was created with the previous Lucene release 8.11.2:
   
   ```
     $ davido@wizball:~/projects/gerrit (jdk_21_support %)$ unzip -t bazel-bin/gerrit.war | grep lucene | grep 9.8.0
       testing: WEB-INF/lib/lucene-core-9.8.0.jar   OK
       testing: WEB-INF/lib/lucene-backward-codecs-9.8.0.jar   OK
       testing: WEB-INF/lib/lucene-queryparser-9.8.0.jar   OK
       testing: WEB-INF/lib/lucene-analysis-common-9.8.0.jar   OK
       testing: WEB-INF/lib/lucene-misc-9.8.0.jar   OK
   ```
   
   Am I understanding correctly, that with the recent Lucene releases, the 
merging of `lucene-backward-codecs` and `lucene-core` JARs is not necessary any 
more?
   
   [1] https://github.com/bazelbuild/bazel/issues/5947
   [2] https://gerrit-review.googlesource.com/c/gerrit/+/69850
   


