https://bugs.kde.org/show_bug.cgi?id=380456
tagwer...@innerjoin.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |tagwer...@innerjoin.org

--- Comment #17 from tagwer...@innerjoin.org ---
(In reply to Adam Fontenot from comment #16)
> ... it is completely unreasonable for a file indexer to ever make a user's
> system unusable. Any time it takes baloo_file_extractor more than 30 seconds
> to pull the text out of a file, or it starts using more than 10% of the
> user's total RAM, it should be instantly killed and the file blacklisted.
> Only the file name (not contents) should be available to search results ...

OOoooo. Ouch!

If you look at htop, you'll see that baloo_file and baloo_file_extractor run with minimum priority. They'll yield to nearly everything that wants a CPU. They should take all the time they need without annoying anything else...

Memory usage is different. Baloo "memory maps" the index and pulls pages from disc into memory as needed; they'll be "forgotten" again if the RAM is needed (and the pages have not been modified). You might see that baloo_file / baloo_file_extractor use a lot of memory, but that can be "just cache".

The kicker is when indexing and you're building a *large* transaction: that might take a lot of memory (possibly, alas, stretching to swap). If you kill the process before the commit is done, you're condemning yourself to repeat the work. On a system with Out Of Memory (OOM) protections, you might hit this.

You can see a little of what's happening (the switching between reading the source files and writing the updates to the index) with iotop.

> ... Surprisingly, the Poppler devs say there's nothing wrong with Poppler
> here (despite the fact that their pdftotext tool hangs for over an hour on
> this file). That's because the R script which generated it is apparently
> using the "I" character repeatedly as part of a graph. I don't know why R
> does that, but it does ...
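To illustrate the earlier point that killing the indexer before its commit throws the work away, here is a small Python sketch. It is not Baloo's code. Baloo's index uses its own memory-mapped storage engine (LMDB), and SQLite from the Python standard library is only a stand-in here. The function names `index_batch` and `count_rows` and the `postings` table are hypothetical. Closing a connection without committing stands in for the process being killed mid-transaction.

```python
import os
import sqlite3
import tempfile


def index_batch(db_path, rows, commit=True):
    """Write a batch of (term, document) rows inside one transaction."""
    conn = sqlite3.connect(db_path)
    conn.execute("CREATE TABLE IF NOT EXISTS postings (term TEXT, doc TEXT)")
    conn.commit()
    # The INSERTs implicitly open a transaction that stays pending until commit.
    conn.executemany("INSERT INTO postings VALUES (?, ?)", rows)
    if commit:
        conn.commit()
    # close() does not commit: an open transaction is discarded, much as it
    # would be rolled back if the process were killed before committing.
    conn.close()


def count_rows(db_path):
    """Return how many rows actually made it into the database."""
    conn = sqlite3.connect(db_path)
    try:
        return conn.execute("SELECT COUNT(*) FROM postings").fetchone()[0]
    finally:
        conn.close()


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "index.db")
    index_batch(path, [("report", "a.pdf"), ("graph", "a.pdf")])
    # A large batch that never gets committed, standing in for the "killed" case.
    index_batch(path, [("I", "plot.pdf")] * 10_000, commit=False)
    print(count_rows(path))  # prints 2: only the committed batch survives
```

The bigger the uncommitted batch, the more work is lost when the process dies. This is why killing a memory-hungry indexer mid-transaction can simply repeat the same expensive work on the next run.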
I'm tempted to say that if this is an application-generated file with little or no human-readable information in it (that happens to be a PDF), it would make sense to have an application-specific mimetype for it. Then that mimetype can be added to baloo's "exclude filters" list. I suspect, though, that if the file is generated by a script, that might not be possible.

> So in general, while there *may* be specific bugs with Baloo that need
> fixing or some crazy files that perhaps "shouldn't" exist, the probable
> cause of this problem for *most* users is that Baloo simply doesn't give up
> on trying to index a file when it really, really should.

Baloo does have a mechanism for flagging files as "failed": "balooctl failed" will list them. I think that needs more love...
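Adam's proposal combines three ideas: give up on a file after a time limit, remember it as failed, and keep only its name searchable. Together with the low-priority scheduling described above, the proposal can be sketched roughly as follows. This is a hypothetical Python illustration, not how Baloo (a C++ project) implements extraction or its "failed" list. The function names, the 30-second default (taken from the quoted proposal) and the name-only fallback are assumptions. The 10% RAM limit is not modelled. `preexec_fn` with `os.nice(19)` is POSIX-only.

```python
import os
import subprocess
import sys

# Hypothetical per-file limit, taken from the proposal quoted in the comment.
EXTRACT_TIMEOUT_S = 30


def extract_with_timeout(cmd, timeout=EXTRACT_TIMEOUT_S):
    """Run an extractor command at the lowest CPU priority.

    Returns (status, text) where status is "ok", "failed" or "timeout".
    """
    try:
        result = subprocess.run(
            cmd,
            capture_output=True,
            text=True,
            timeout=timeout,
            # Raise the child's niceness to the maximum (19) so it yields
            # the CPU to nearly everything else (POSIX only).
            preexec_fn=lambda: os.nice(19),
        )
    except subprocess.TimeoutExpired:
        # subprocess.run has already killed and reaped the child here.
        return "timeout", ""
    if result.returncode != 0:
        return "failed", ""
    return "ok", result.stdout


def index_files(paths, make_cmd, timeout=EXTRACT_TIMEOUT_S):
    """Index file contents where possible; otherwise record the file as failed.

    Failed files would then be searchable by name only (hypothetical policy).
    """
    contents, failed = {}, []
    for path in paths:
        status, text = extract_with_timeout(make_cmd(path), timeout)
        if status == "ok":
            contents[path] = text
        else:
            failed.append(path)
    return contents, failed


if __name__ == "__main__":
    # Usage: a stand-in "extractor" that just prints some text.
    status, text = extract_with_timeout([sys.executable, "-c", "print('hello')"])
    print(status, text.strip())
```

One design caveat from the comment still applies. A per-file timeout only bounds text extraction. Killing the process that holds a large, uncommitted index transaction would still throw that work away.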