https://bugs.kde.org/show_bug.cgi?id=444520

--- Comment #5 from Adam Fontenot <adam.m.fontenot+...@gmail.com> ---
(In reply to tagwerk19 from comment #4)
Just a couple of clarifications:
> > Sooner or later, whenever Baloo kicks back in, it may also restart indexing
> > file content, despite being disabled.
> Baloo should really not restart on its own. If disabled it should stay
> disabled - although if the baloo_file process died or was killed it would be
> restarted (at least) at the next logon. 
To be clear, the issue here is that *content* indexing was disabled (the "also
index file content" option in the file search settings) and
baloo_file_extractor was killed. Normal indexing (the basic file name search
option) was *not* disabled. So the specific issue I'm seeing is that when normal
indexing resumes (possibly after a reboot, possibly not), baloo_file_extractor
starts trying to index file content again, even though that feature is disabled
in the settings.

> I'd also suspect the "very large" PDF being the reason for the large index
> (baloo will write a reverse index entry for each of the "random words"),
> however there are other things that can also trigger the index to balloon in
> size.
The PDF in question is only 20 MB. It's "large" only in the sense that it has a
ton of indexable text (according to the poppler devs). This suggests another
pretty obvious heuristic in addition to those I mentioned in the bug report on
Baloo memory use:

If the index for a file grows to be larger than the original file, kill the
extraction process, add the file to a list of failed files, delete the index
for it, and don't try indexing the content of the file again. 
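A minimal Python sketch of that heuristic, purely for illustration under stated
assumptions: ContentIndexGuard, index_file and the extract_terms callback are
hypothetical names, not Baloo's actual code (Baloo itself is C++ and keeps its
index in LMDB). The index size is crudely approximated as the total UTF-8 byte
length of the extracted terms, and simply stopping consumption of the extractor
stands in for killing baloo_file_extractor.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List, Set


@dataclass
class ContentIndexGuard:
    """Hypothetical guard illustrating the 'index larger than file' heuristic."""
    failed: Set[str] = field(default_factory=set)          # files never to content-index again
    index: Dict[str, List[str]] = field(default_factory=dict)  # path -> indexed terms

    def index_file(self, path: str, file_size: int,
                   extract_terms: Callable[[str], Iterable[str]]) -> bool:
        # Skip files that already blew up once; the extractor is not even started.
        if path in self.failed:
            return False
        entries: List[str] = []
        index_bytes = 0  # crude proxy for the on-disk index size (assumption)
        for term in extract_terms(path):
            index_bytes += len(term.encode("utf-8"))
            if index_bytes > file_size:
                # Index would exceed the original file: stop extracting
                # (stand-in for killing the extractor), drop any index
                # already stored for this file, and remember the failure.
                self.index.pop(path, None)
                self.failed.add(path)
                return False
            entries.append(term)
        self.index[path] = entries
        return True
```

The failed set here lives only in memory; for "don't try again" to survive a
restart or reboot, it would have to be persisted alongside the index.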

I realize the reality might be a bit more complicated than "just" doing that,
but at the end of the day Baloo desperately needs some better heuristics given
the large number of resource consumption issues users report with it.
