https://bugs.kde.org/show_bug.cgi?id=444520
--- Comment #6 from tagwer...@innerjoin.org ---
(In reply to Adam Fontenot from comment #5)
> To be clear ... the specific issue I'm seeing here is that when the
> normal indexing resumes (possibly after a reboot, possibly not),
> baloo_file_extractor starts trying to index file content again despite
> that feature being disabled in the settings.

Thanks! The implication, as I see it, is that baloo_file flags (in its
index) that it has queued a batch of files for content indexing. That
feels, well, strange :-/

> The PDF in question is only 20 MB. It's "large" only in the sense that
> it has a ton of indexable text (according to the poppler devs). This
> suggests another pretty obvious heuristic in addition to those I
> mentioned in the bug report on Baloo memory use:

I know there's a rough 10 MByte limit for .txt and .html files, see
https://bugs.kde.org/show_bug.cgi?id=410680#c7. Larger files are not
content indexed, and there is an existing rationale for such a limit. I'm
happy to check the behaviour if you can generate a test PDF/SVG and
upload/attach it.

> If the index for a file grows to be larger than the original file,
> kill the extraction process, add the file to a list of failed files,
> delete the index for it, and don't try indexing the content of the
> file again.

I don't think there's an easy relation between the size of the source and
the size of the index. The index contains "lookups": you type a search
term and a list of hits gets pulled off disc. The design decision was for
speed; you get a refined list of hits in Dolphin as you type more letters
into the search box, or you can view your files in folders based on the
tags you've given them. (The two sketches at the end of this comment
illustrate what I mean by a size guard and by "lookups".)
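
Purely to illustrate the kind of guard being discussed, here is a minimal
sketch in plain C++. None of this is Baloo's actual code; the 10 MiB cap,
the 10x expansion factor and all of the names are made up for the example:

    // Hypothetical extraction guard: refuse oversized sources up front,
    // and bail out if the extracted text balloons past a multiple of the
    // file's size (the 20 MB PDF / "ton of indexable text" case).
    #include <cstdint>
    #include <iostream>
    #include <string>

    constexpr std::uint64_t kMaxSourceBytes = 10ULL * 1024 * 1024; // like the .txt/.html cap
    constexpr std::uint64_t kMaxExpansion   = 10;                  // give up past 10x the source

    struct ExtractionGuard {
        std::uint64_t sourceBytes;        // size of the file on disc
        std::uint64_t extractedBytes = 0; // text seen so far

        // Feed each chunk of extracted text in; false means "stop extracting".
        bool accept(const std::string& chunk) {
            extractedBytes += chunk.size();
            return extractedBytes <= sourceBytes * kMaxExpansion;
        }
    };

    bool shouldContentIndex(std::uint64_t sourceBytes) {
        return sourceBytes <= kMaxSourceBytes;
    }

    int main() {
        ExtractionGuard guard{20ULL * 1024 * 1024}; // the 20 MB PDF from comment #5
        const std::string chunk(1024 * 1024, 'a');  // pretend poppler emits 1 MiB at a time
        std::uint64_t chunks = 0;
        while (guard.accept(chunk)) ++chunks;
        std::cout << "bailed after " << chunks << " MiB of text\n";
    }

A real implementation would also have to record the failure so the file is
not retried, which is exactly the bookkeeping the earlier comments suggest
is going wrong.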
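
And to show why source size and index size aren't simply related: the index
is essentially a map from terms to posting lists, so its size depends on how
many distinct terms there are and how many files each appears in, not on the
byte count of the sources. A toy sketch of the "type more letters, get a
refined hit list" lookup (again, made-up names, nothing from Baloo's code):

    // Hypothetical inverted index: term -> set of matching documents.
    #include <iostream>
    #include <map>
    #include <set>
    #include <string>

    using DocId = int;
    using Index = std::map<std::string, std::set<DocId>>;

    // As-you-type lookup: union of the posting lists for every term that
    // starts with what has been typed so far.
    std::set<DocId> prefixSearch(const Index& index, const std::string& prefix) {
        std::set<DocId> hits;
        for (auto it = index.lower_bound(prefix);
             it != index.end() &&
             it->first.compare(0, prefix.size(), prefix) == 0;
             ++it) {
            hits.insert(it->second.begin(), it->second.end());
        }
        return hits;
    }

    int main() {
        Index index;
        index["index"]    = {2};
        index["indexing"] = {2, 3};
        index["invoice"]  = {1, 3};

        // Each extra letter narrows the hit list without rescanning files.
        for (const char* typed : {"in", "ind"}) {
            std::cout << typed << " ->";
            for (DocId d : prefixSearch(index, typed)) std::cout << ' ' << d;
            std::cout << '\n';
        }
    }

Output here would be "in -> 1 2 3" then "ind -> 2 3"; the lookup cost tracks
the number of matching terms, which is why a small file full of unusual
words can contribute more to the index than a large file full of repeats.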