https://bugs.kde.org/show_bug.cgi?id=380456

--- Comment #33 from Adam Fontenot <adam.m.fontenot+...@gmail.com> ---
> The way that Baloo provides results for searches so quickly is that it jumps 
> to the word in the database and pulls a page from disk that lists all the 
> files that the word appears in. When you index a file, you extract a list of 
> words, look up each word in the database, get the list of files it appears 
> in, insert this new file (as an ID rather than filename) into the list and 
> save it back.

This is firmly in the realm of speculative feature requests, but the scheme
described above makes deletes sound extremely expensive... wouldn't it be much
cheaper to keep a hashset of deleted fileIDs, and then filter search results
against it before returning them? You could then clean up the database on a
regular basis, say once a month, or when triggered by a balooctl command. This
deleted-files hashset would be small enough to keep in memory, and hash table
lookups are O(1) on average, so filtering wouldn't measurably slow down
searches.

I think a file indexer that turns ordinary development work into 500 GB of
disk writes is probably not going to be usable for a lot of people.

-- 
You are receiving this mail because:
You are watching all bug changes.