https://bugs.kde.org/show_bug.cgi?id=444520

--- Comment #6 from tagwer...@innerjoin.org ---
(In reply to Adam Fontenot from comment #5)
> To be clear ... the specific issue I'm seeing here
> is that when the normal indexing resumes (possibly after a reboot, possibly
> not), baloo_file_extractor starts trying to index file content again despite
> that feature being disabled in the settings.
Thanks!

The implication, as I see it, is that baloo_file flags (in its index) that it's
queued a batch of files for content indexing. That feels, well, strange :-/

> The PDF in question is only 20 MB. It's "large" only in the sense that it
> has a ton of indexable text (according to the poppler devs). This suggests
> another pretty obvious heuristic in addition to those I mentioned in the bug
> report on Baloo memory use:
I know there's a rough 10 MB limit for .txt and .html files, see
https://bugs.kde.org/show_bug.cgi?id=410680#c7. Larger files are not
content-indexed, so there is an existing rationale for such a limit.
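That kind of size gate is straightforward to express. A minimal sketch (the names and the exact cutoff here are illustrative assumptions, not Baloo's actual code):

```python
import os

# Hypothetical cutoff mirroring the rough 10 MB limit mentioned above.
# The constant and function names are assumptions for illustration only.
MAX_CONTENT_INDEX_BYTES = 10 * 1024 * 1024

def should_content_index(path: str) -> bool:
    """Skip content extraction for files above the size cutoff."""
    return os.path.getsize(path) <= MAX_CONTENT_INDEX_BYTES
```

The open question in this report is whether a PDF that is small on disk but expands to a huge amount of extracted text would slip past a check like this.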

I'm happy to check the behaviour if you can generate a test PDF/SVG and
upload/attach it.

> If the index for a file grows to be larger than the original file, kill the
> extraction process, add the file to a list of failed files, delete the index
> for it, and don't try indexing the content of the file again. 
I don't think there's an easy relation between the size of the source and the
size of the index. The index contains "lookups": you type a search term and a
list of hits gets pulled off disc. The design decision was for speed; you get a
refined list of hits in Dolphin as you type more letters into the search box, or
you view your files in folders based on the tags you've given them.
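To illustrate the "lookups" idea, here is a toy inverted index with prefix search (a sketch of the general technique, not Baloo's actual on-disk format): each term maps to the set of documents containing it, and the hit list narrows as more letters are typed.

```python
from collections import defaultdict

# Toy inverted index: term -> set of document ids containing that term.
# This is an illustration of the general design, not Baloo's implementation.
class InvertedIndex:
    def __init__(self):
        self.postings = defaultdict(set)

    def add(self, doc_id, text):
        # Naive tokenisation: lowercase, split on whitespace.
        for term in text.lower().split():
            self.postings[term].add(doc_id)

    def search_prefix(self, prefix):
        """Return all documents with a term starting with the prefix;
        the result narrows as the user types more letters."""
        hits = set()
        for term, docs in self.postings.items():
            if term.startswith(prefix.lower()):
                hits |= docs
        return hits

idx = InvertedIndex()
idx.add("a.txt", "kernel panic report")
idx.add("b.txt", "kde plasma panel")
print(idx.search_prefix("pa"))    # both files match ("panic", "panel")
print(idx.search_prefix("pane"))  # narrows to b.txt only
```

Note that the index size depends on the number of distinct terms and postings, not directly on the byte size of the source files, which is why a source-size heuristic doesn't map cleanly onto index size.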

