mikemccand commented on PR #12908:
URL: https://github.com/apache/lucene/pull/12908#issuecomment-1850181921

   > Hmmm... I agree we can't expect `BasePostingsFormatTestCase` to catch all 
bw compat problems, but the `TestLucene90PostingsFormat` from this PR writes 
data in the 9.8 format of the terms dictionary, and then reads the data with 
the 9.9 logic, only relying on version checks to read outputs with regular 
vlongs. So it should be exposing the bug?
   
   Oh!  I didn't realize it did this, OK.  Then indeed it had a chance of 
uncovering the bug, but failed to do so :(
   
   Note that the bug was quite hard to repro -- I started with various random 
simple strings, then realistic unicode strings, varying counts, term lengths, 
etc. (using one Java tool to write the index using 9.8.0, and another to 
randomly search the index using 9.9.0), but the bug would not repro even after 
a great many iterations.
   
   So then I switched to contiguous slices of `wikibigall` terms, and also had 
to randomize the `WildcardQuery` substring, and only then did it repro quite 
readily.
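
   A rough sketch of the two randomization steps, in case it helps anyone 
build a similar repro.  This is not the actual tool: the class name, the 
method names and the `*substring*` pattern shape are all assumptions made for 
illustration, and it is plain Java with no Lucene dependency.  It assumes the 
terms are already sorted and non-empty.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical helper, not the tool used for the repro described above.
final class WildcardReproSketch {

  /** Returns a contiguous run of up to sliceSize terms from an already-sorted term list. */
  static List<String> contiguousSlice(List<String> sortedTerms, int sliceSize, Random random) {
    int size = Math.min(sliceSize, sortedTerms.size());
    // Pick a start offset so the whole slice fits inside the list.
    int start = random.nextInt(sortedTerms.size() - size + 1);
    return new ArrayList<>(sortedTerms.subList(start, start + size));
  }

  /**
   * Picks a random non-empty substring of a random term and wraps it as "*substring*".
   * Works on code points so a surrogate pair is never split. Assumes every term is non-empty.
   */
  static String randomWildcardPattern(List<String> terms, Random random) {
    String term = terms.get(random.nextInt(terms.size()));
    int[] codePoints = term.codePoints().toArray();
    int start = random.nextInt(codePoints.length);
    int count = 1 + random.nextInt(codePoints.length - start);

    StringBuilder pattern = new StringBuilder("*");
    for (int i = start; i < start + count; i++) {
      int cp = codePoints[i];
      // Escape characters that are special in wildcard syntax: '*', '?' and '\'.
      if (cp == '*' || cp == '?' || cp == '\\') {
        pattern.append('\\');
      }
      pattern.appendCodePoint(cp);
    }
    return pattern.append('*').toString();
  }
}
```

   The slice would be indexed with the older release, and each generated 
pattern passed to `new WildcardQuery(new Term(field, pattern))` when searching 
with the newer one; a fixed `Random` seed keeps a failing run reproducible.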
   
   Too many of our tests use unrealistic random terms ... realistic term 
distributions tickle the many `if` statements better.  Does 
`BasePostingsFormatTestCase` index from `LineFileDocs`?  If so, the nightly 
line docs file is quite a bit beefier and may have a better shot at repro?  
(And they are somewhat extracted from `enwiki` text...).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


