https://bugs.kde.org/show_bug.cgi?id=380456

--- Comment #16 from Adam Fontenot <adam.m.fontenot+...@gmail.com> ---
I actually filed an upstream bug with Poppler for its handling of the specific
PDF file I was seeing issues with.
https://gitlab.freedesktop.org/poppler/poppler/-/issues/1173

Surprisingly, the Poppler devs say there's nothing wrong with Poppler here
(despite the fact that their pdftotext tool hangs for over an hour on this
file). Their explanation is that the R script that generated the file is
apparently using the "I" character repeatedly as part of a graph. I don't know
why R does that, but it does.

Quoting the dev response:

> whether this bug is fixed or not baloo needs to understand that extracting
> the text of a pdf file can take forever, and thus give up after X
> seconds/minutes

Obviously this won't be the cause of everyone's issues, but it's a good
example of the point I made earlier:

> it is completely unreasonable for a file indexer to ever make a user's system
> unusable. Any time it takes baloo_file_extractor more than 30 seconds to pull
> the text out of a file, or it starts using more than 10% of the user's total
> RAM, it should be instantly killed and the file blacklisted. Only the file
> name (not contents) should be available to search results.

So in general, while there *may* be specific bugs with Baloo that need fixing
or some crazy files that perhaps "shouldn't" exist, the probable cause of this
problem for *most* users is that Baloo simply doesn't give up on trying to
index a file when it really, really should.
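
To make the "give up" behaviour concrete, here is a rough Python sketch. It is
not Baloo's actual code (Baloo is C++). The function name extract_text, the
in-memory blocklist set, and the default limits are assumptions for
illustration only. It runs pdftotext as a child process, kills it when the
time limit is hit, and records the file so its contents are never retried.

```python
import resource
import subprocess


def extract_text(path, blocklist, cmd=("pdftotext",), timeout_s=30, max_bytes=None):
    """Return the extracted text, or None if the file is (or becomes) blocklisted.

    Hypothetical helper: names and defaults are illustrative, not Baloo's API.
    """
    if path in blocklist:
        return None  # index the file name only, never retry the contents

    def limit_memory():
        # Runs in the child just before exec; caps its address space (POSIX only).
        resource.setrlimit(resource.RLIMIT_AS, (max_bytes, max_bytes))

    try:
        result = subprocess.run(
            [*cmd, path, "-"],  # "pdftotext FILE -" writes the text to stdout
            capture_output=True,
            text=True,
            errors="replace",
            timeout=timeout_s,  # the child is killed when this expires
            check=True,  # a crash or non-zero exit also counts as a failure
            preexec_fn=limit_memory if max_bytes else None,
        )
    except (subprocess.TimeoutExpired, subprocess.CalledProcessError):
        blocklist.add(path)
        return None
    return result.stdout
```

A caller would keep the blocklist somewhere persistent and pass, for example,
timeout_s=30 and max_bytes set to a tenth of physical RAM. Because the limits
apply to the child process rather than to the indexer itself, one bad file
costs at most timeout_s seconds.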

