mikemccand commented on pull request #128: URL: https://github.com/apache/lucene/pull/128#issuecomment-849677698
OK, I have good news and bad news.

Good news first! I wrote a [simple little Python tool](https://github.com/mikemccand/luceneutil/commit/77ef7e6708ccaed7077ef83e009da8e3b91f45ad) to randomly flip a random bit in a random file in a provided directory.

Bad news! I ran the tool, confirmed it seems to flip just the one bit, then ran the `CheckIndex` here, and no corruption was detected!! Then I also ran `CheckIndex` from a clean `main` checkout, and we still fail to detect the corruption.

WTF? Surely the bit flip would alter the checksum and we should have detected that in `CheckIndex`? Or is it possible `CheckIndex` does not actually fully `checkIntegrity` too?

For the record, this is how I ran the new bit-flipper tool:

```
python3 -u /l/util/src/python/flip_random_bit.py /l/indices/trunk.nightly.index.prev.broken/index -seed 7 -real
```

and this is its output:

```
python3 -u /l/util.nightly/src/python/flip_random_bit.py /l/indices/trunk.nightly.index.prev.broken/index -seed 7 -real
RANDOM SEED: 0x7
Directory has 302 files:
  _32.fdm _32.fdt _32.fdx _32.fnm _32.kdd _32.kdi _32.kdm _32.nvd
  ...
  _h2.fdm _h2.fdt _h2.fdx _h2.fnm _h2.kdd _h2.kdi _h2.kdm _h2.nvd _h2.nvm _h2.si
  _h2_Lucene90HnswVectorFormat_0.vec _h2_Lucene90HnswVectorFormat_0.vem _h2_Lucene90HnswVectorFormat_0.vex
  _h2_Lucene90_0.doc _h2_Lucene90_0.dvd _h2_Lucene90_0.dvm _h2_Lucene90_0.pos _h2_Lucene90_0.tim _h2_Lucene90_0.tip _h2_Lucene90_0.tmd
  segments_2
  write.lock

**WARNING**: this tool will soon corrupt bit 39544 (of 152368 bits) in /l/indices/trunk.nightly.index.prev.broken/index/_gm.kdi!!!

Be really certain this is what you want... you have 5 seconds to change your mind!
5...
4...
3...
2...
1...
**BOOOOOOOM**
```

And then `cmp` and `diff` confirm the file is indeed changed, yet `CheckIndex` (with or without this PR) doesn't catch it.

I'll try a few more bit flips.
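For readers who want to reproduce the experiment without the linked tool, here is a minimal sketch of the same idea. It is not the actual `flip_random_bit.py` from luceneutil: the function names `flip_random_bit` and `crc32_of` are hypothetical, it has no countdown or `-real` safety switch, and it uses `zlib.crc32` over the whole file as a stand-in for a file checksum rather than reading any Lucene footer. It also skips empty files such as `write.lock`, which is an assumption of this sketch.

```python
import os
import random
import zlib


def flip_random_bit(directory, seed):
    """Flip one randomly chosen bit in one randomly chosen non-empty file.

    Hypothetical helper, not the luceneutil tool.
    Returns (path, bit_index, total_bits) for the corrupted file.
    """
    rng = random.Random(seed)
    # Sort so that a given seed picks the same file regardless of listing order.
    names = sorted(
        name
        for name in os.listdir(directory)
        if os.path.isfile(os.path.join(directory, name))
        and os.path.getsize(os.path.join(directory, name)) > 0
    )
    if not names:
        raise ValueError(f"no non-empty files in {directory}")

    path = os.path.join(directory, rng.choice(names))
    total_bits = os.path.getsize(path) * 8
    bit_index = rng.randrange(total_bits)
    byte_offset, bit_in_byte = divmod(bit_index, 8)

    # Rewrite exactly one byte in place with a single bit inverted.
    with open(path, "r+b") as f:
        f.seek(byte_offset)
        original = f.read(1)[0]
        f.seek(byte_offset)
        f.write(bytes([original ^ (1 << bit_in_byte)]))

    return path, bit_index, total_bits


def crc32_of(path):
    """CRC32 of the whole file; a stand-in checksum for this sketch."""
    with open(path, "rb") as f:
        return zlib.crc32(f.read())
```

Run it only against a copy of an index. Recording `crc32_of` for every file before and after the flip should show exactly one file with a different value, since CRC32 detects any single-bit error. That is the expectation behind the question above: a checker that verifies per-file checksums ought to notice the flip.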