https://bugs.kde.org/show_bug.cgi?id=434926
--- Comment #13 from nyanpasu64 <nyanpas...@tuta.io> ---
After renaming my corrupted database to data.mdb (and keeping a backup copy), I decided to check whether the corruption happened in Baloo's memory or whether the database was already corrupt on-disk. It's corrupt on-disk.

> mdb_dump -s documenttimedb .|pv>/dev/null
mdb.c:5856: Assertion 'IS_BRANCH(mc->mc_pg[mc->mc_top])' failed in mdb_cursor_sibling()
10.1MiB 0:00:00
fish: Process 127289, 'mdb_dump' from job 1, 'mdb_dump -s documenttimedb .|pv…' terminated by signal SIGABRT (Abort)

> mdb_dump -a .|pv>/dev/null
mdb.c:5856: Assertion 'IS_BRANCH(mc->mc_pg[mc->mc_top])' failed in mdb_cursor_sibling()
68.7MiB 0:00:00

Running gdb on both `mdb_dump -s documenttimedb . -f /dev/null` and `mdb_dump -a . -f /dev/null`, I found that the bad page that crashes (a sibling of other midway pages, but holding block-like data) occurs at a *different* file/mmap offset (0x03CAB000) than my initial Baloo crash (0x5925000)! Are they similar or not? (somewhat:)

- 0x03CAB000 comes after data containing strings like "Fpathpercent" and "Fpathetique", which I believe was created by TermGenerator::indexFileNameText() inserting "F" + (words appearing in filenames) into LMDB. The page starting at 0x03CAB000 itself has a weak 10-byte periodicity. The 32-bit integer 0x00003CAB (page address >> 12) appears 10 times in the database file.
- 0x5925000 lies in a region with a strong 10-byte periodicity both before and after the page start. The 32-bit integer 0x00005925 appears a whopping 307 times in the file!

Seeking Audacity to offset 93474816 (= 0x5925000), I see data with a periodicity of 10, whereas valid page headers show 2-byte periodicities. This pointer doesn't point to page metadata! Either the page contents were overwritten or never written, or the page pointer itself was written incorrectly.

I haven't tried modifying LMDB to scan the *entire* database, continuing on errors and logging *all* data inconsistencies. I think that would help gather more data to understand what kind of corruption is happening. (A rough sketch of what such a scan might look like is below, after the page-reclamation notes.)

(In reply to tagwerk19 from comment #12)
> I'm not so sure how/when baloo_file recognises when the index is being
> "read" and therefore has to append instead of update, however it's clear
> that this is happening if you look at Bug 437754 (where you see that a
> "balooctl status", which seems to enumerate files to be indexed, means that
> updates are "appends" and the index grows dramatically).

https://schd.ws/hosted_files/buildstuff14/96/20141120-BuildStuff-Lightning.pdf describes page reclamation. Of note:

> LMDB maintains a free list tracking the IDs of unused pages
> Old pages are reused as soon as possible, so data volumes don't grow without bound

And if you get this code wrong, it's a fast path to data corruption.

If I understand correctly, a write transaction never reclaims the pages it itself stops using; it can only reuse pages abandoned by an *earlier* write transaction, and only if no active reader predates that earlier transaction's commit. So an active read transaction, which I assume snapshots the root page and relies on writers not overwriting the tree it references, prevents writers from reusing pages freed by *all* writes that commit after the read transaction started. So yeah, long-running read transactions cause written-but-unused data to pile up. And since the PDF says "No compaction or garbage collection phase is ever needed", I suspect Baloo's index file size will *never* decrease, even if data gets freed (e.g. by closing a long-running read transaction, excluding folders from indexing, deleting files, or turning off content indexing). This is... suboptimal.
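To make that read-transaction point concrete, here is a minimal sketch of my own (plain LMDB C API, nothing from Baloo; it assumes liblmdb is installed, an empty ./testenv directory exists, and default settings; error checking omitted): it pins a read snapshot, repeatedly overwrites the same keys, and then reports how far the data file has grown via mdb_env_info().

/* demo.c (hypothetical name) -- build: cc demo.c -llmdb
 * Shows that an open read transaction prevents LMDB from reusing freed
 * pages, so the environment keeps allocating new ones even though the
 * amount of live data stays constant. Error checking omitted for brevity. */
#include <lmdb.h>
#include <stdio.h>
#include <string.h>

static size_t last_pgno(MDB_env *env) {
    MDB_envinfo info;
    mdb_env_info(env, &info);
    return info.me_last_pgno;          /* highest page used in data.mdb */
}

int main(void) {
    MDB_env *env;
    MDB_txn *read_txn, *txn;
    MDB_dbi dbi;
    char val[1024];

    mdb_env_create(&env);
    mdb_env_set_mapsize(env, 1UL << 30);
    mdb_env_open(env, "./testenv", 0, 0664);   /* directory must exist */

    /* Pin a snapshot, like a long-running Baloo query might. */
    mdb_txn_begin(env, NULL, MDB_RDONLY, &read_txn);

    for (int round = 0; round < 100; round++) {
        mdb_txn_begin(env, NULL, 0, &txn);
        mdb_dbi_open(txn, NULL, 0, &dbi);
        for (int i = 0; i < 100; i++) {
            MDB_val k = { sizeof i, &i }, v = { sizeof val, val };
            memset(val, round, sizeof val);
            mdb_put(txn, dbi, &k, &v, 0);      /* overwrite the same 100 keys */
        }
        mdb_txn_commit(txn);
    }
    /* With the reader open, pages freed by each overwrite cannot be recycled,
     * so this number keeps climbing. Abort the reader first and repeat the
     * loop, and growth should level off as old pages get reused. */
    printf("last page used: %zu\n", last_pgno(env));

    mdb_txn_abort(read_txn);
    mdb_env_close(env);
    return 0;
}

Separately, the "scan the whole file, continue on errors, log every inconsistency" idea from earlier doesn't necessarily need a patched liblmdb for a first pass: just walking data.mdb page by page and checking each page header against its own offset would already flag pages like the two above. A rough sketch, again mine, assuming a 64-bit build, the default 4096-byte page size, and the MDB_page header layout from mdb.c (8-byte page number, 2 bytes padding, 2-byte flags, then either lower/upper free-space offsets or an overflow page count):

/* scan.c (hypothetical name) -- build: cc scan.c -o scan && ./scan data.mdb
 * Flags every page whose header doesn't look like an LMDB page stored at
 * that offset. Assumptions as stated above; not verified against every
 * LMDB build. */
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define PAGESIZE 4096
/* Page-type flag bits as defined in mdb.c */
#define P_BRANCH   0x01
#define P_LEAF     0x02
#define P_OVERFLOW 0x04
#define P_META     0x08

int main(int argc, char **argv) {
    FILE *f = fopen(argc > 1 ? argv[1] : "data.mdb", "rb");
    if (!f) { perror("fopen"); return 1; }

    unsigned char page[PAGESIZE];
    uint64_t pgno = 0, skip = 0;

    while (fread(page, 1, PAGESIZE, f) == PAGESIZE) {
        if (skip) { skip--; pgno++; continue; }  /* overflow data pages carry no header */

        uint64_t stored_pgno;
        uint16_t flags;
        uint32_t ovpages;
        memcpy(&stored_pgno, page, 8);
        memcpy(&flags, page + 10, 2);
        memcpy(&ovpages, page + 12, 4);

        int known_type = flags & (P_BRANCH | P_LEAF | P_OVERFLOW | P_META);
        int all_zero = 1;
        for (int i = 0; i < 16; i++) if (page[i]) { all_zero = 0; break; }

        if (all_zero) {
            /* never-written / sparse region: ignore */
        } else if (stored_pgno != pgno || !known_type) {
            printf("page %llu (offset 0x%llx): header pgno=%llu flags=0x%04x"
                   " -- doesn't look like page metadata\n",
                   (unsigned long long)pgno, (unsigned long long)(pgno * PAGESIZE),
                   (unsigned long long)stored_pgno, (unsigned)flags);
        } else if (flags & P_OVERFLOW) {
            skip = ovpages ? ovpages - 1 : 0;    /* rest of the overflow chunk */
        }
        pgno++;
    }
    fclose(f);
    return 0;
}

Something like this run over my corrupted file should list every page whose header was overwritten with foreign data, not just the first one mdb_dump happens to trip over.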
> > ... I don't know who wrote the corrupted file
> I know there was a flood of "corruption" reports (Bug 389848). This issue
> was found but the fix left the index corrupt and it became normal to
> recommend purging and rebuilding the index (Bug 431664). Yes, still quite a
> while ago and the number of these reports is dropping away but it did
> resurface when people upgraded from Debian 10 to 11 (which was only the end
> of last year)

Reading https://www.openldap.org/lists/openldap-devel/201710/msg00019.html, I'm scared of yet another category of corruption: corrupting the in-memory data queued up in a write transaction, *before it is ever committed to disk*! Does baloo_file have any memory-corruption bugs that overwrite data with a 10-byte stride? I don't know!

> Interesting in that baloo "batches up" its content indexing work (where it
> analyses 40 files at a time and writes the results to the index) however it
> deals with the initial scan of files it needs to index in a single tranche;
> give it a hundred thousand files it needs to index, it will collect the
> information for all of them and write the results to the index in one go.
> This can be pretty horrible (see Bug 394750)
>
> No reason that this is a cause but it is a behaviour that might raise the
> stakes...

This could be fixed separately, I assume.

> > ... evaluate the performance differences
> One of the joys of baloo is its amazing speed, that you can type a search
> string and see the results refine themselves on screen.

https://github.com/LumoSQL/LumoSQL claims LMDB is still somewhat faster than SQLite's standard engine (though SQLite is catching up). I trust LMDB less to avoid corrupting data, though.

-- 
You are receiving this mail because:
You are watching all bug changes.