rahulgoswami commented on code in PR #14607:
URL: https://github.com/apache/lucene/pull/14607#discussion_r2476449164


##########
lucene/core/src/java/org/apache/lucene/index/SegmentInfos.java:
##########
@@ -402,11 +389,38 @@ private static void parseSegmentInfos(
     }
 
     long totalDocs = 0;
+
     for (int seg = 0; seg < numSegments; seg++) {
       String segName = input.readString();
       byte[] segmentID = new byte[StringHelper.ID_LENGTH];
       input.readBytes(segmentID, 0, segmentID.length);
-      Codec codec = readCodec(input);
+      Codec codec = null;
+      try {
+        codec = readCodec(input);

Review Comment:
   >you seemed to be saying your system would be able to read an index for 
which we don't have any backward codec available ("As such, we are able to read 
the SegmentInfos file and extract the indexCreatedVersionMajor which is how it 
was done previously"), but how can that be?
   
   Not the index, but looks like the nightly test failure suggests we can at 
least read the 'indexCreatedVersion' from SegmentInfos (aka 'segments_*' file) 
written in 7x without the codec present. If you see there are only two places 
at which readCommit() throws IndexFormatTooOldException. The previous nightly 
failure occurred because the below check was removed
   ```
   if (indexCreatedVersion < minSupportedMajorVersion) {...}
   ``` 
   which led to checking individual segments in parseSegmentInfos() (which is 
what we intend). I think it is in reading the segment level metadata/data where 
it needs the codec (`Codec codec = readCodec(input)`) which is where the 
failure occurred.  
   Which also means even the original implementation assumes that we'll always 
be able to read the "indexCreatedVersion" from SegmentInfos once we make it 
past the first check in readCommit() for too old index (`if (magic != 
CodecUtil.CODEC_MAGIC)`)
   
   Exception from nightly for reference:
   ```
       java.lang.IllegalArgumentException: Could not load codec 'Lucene70'. Did 
you forget to add lucene-backward-codecs.jar?
           at 
__randomizedtesting.SeedInfo.seed([14FBA2DCF615D67D:EF0090FF468E7121]:0)
           at 
[email protected]/org.apache.lucene.index.SegmentInfos.readCodec(SegmentInfos.java:532)
           at 
[email protected]/org.apache.lucene.index.SegmentInfos.parseSegmentInfos(SegmentInfos.java:397)
           at 
[email protected]/org.apache.lucene.index.SegmentInfos.readCommit(SegmentInfos.java:354)
   ```
   
   <br/>
   
   > I guess with your change we can still read indexes that were originally 
created with 8x and 9x (which would not have been possible before) as long as 
their segments have been rewritten by 10x or 11x
   
   Exactly
   
   > but if the index was created by 7x or earlier we would not. 
   
   Going by the above explanation, I think at least for index created by 7x, we 
_will_ be able to read, provided the segments have been rewritten by 10x or 11x.
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to