https://bugs.kde.org/show_bug.cgi?id=394750

            Bug ID: 394750
           Summary: baloo_file fills RAM and disk for hours with no
                    visible progress
           Product: frameworks-baloo
           Version: 5.46.0
          Platform: Neon Packages
                OS: Linux
            Status: UNCONFIRMED
          Severity: normal
          Priority: NOR
         Component: Baloo File Daemon
          Assignee: baloo-bugs-n...@kde.org
          Reporter: thaddee....@gmail.com
  Target Milestone: ---

The baloo_file process has been running for five hours and uses about 4±2 GiB
of RAM, causing swapping, and not a single file has been indexed yet:

$ balooctl -v
baloo 5.46.0
$ balooctl status
Baloo File Indexer is running
Indexer state: Initial Indexing
Indexed 0 / 0 files
Current size of index is 21.26 GiB
$ ps -C baloo_file -o comm,etime,%cpu,%mem,vsz,rss
COMMAND             ELAPSED %CPU %MEM    VSZ   RSS
baloo_file         05:09:33 43.6 32.5 274650148 3965904
$ ls -lh .local/share/baloo/index
-rw-rw-r-- 1 tyl tyl 22G May 27 14:04 .local/share/baloo/index

I am filing this bug as suggested by https://community.kde.org/Baloo/Debugging.

I really like the idea of Baloo, so I wish for it to work a bit better.

I don't know how often Baloo works flawlessly. My setup is only slightly
unusual: I have some directories with a million small files (records of Go
games, downloaded by this command:
https://github.com/espadrine/badukjs/blob/master/Makefile#L13), and some quite
large files, such as a few Linux .iso images. In total, /home holds about
150 GiB, including the 22 GiB Baloo index, which is a lot of space for
"0 files indexed".

If that large folder and the .iso images are what baloo_file chokes on, could
we make Baloo give up when it spends more than 10 seconds on a single file or
folder? (An `ls` on the Go games folder takes 11 minutes.)
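
To make the give-up idea concrete, here is a rough Python sketch. It is not
Baloo's code (Baloo is written in C++) and it assumes each file is handled by
a separate extractor command. The name `extract_with_timeout` and the command
passed to it are hypothetical.

```python
import subprocess
import sys


def extract_with_timeout(command, timeout_s=10.0):
    """Run one per-file extraction command with a time budget.

    `command` is a hypothetical extractor invocation (a list of arguments).
    Returns the command's stdout, or None if the file should be skipped.
    """
    try:
        result = subprocess.run(
            command, capture_output=True, text=True, timeout=timeout_s
        )
    except subprocess.TimeoutExpired:
        # subprocess.run() kills the child before raising, so nothing lingers.
        return None
    if result.returncode != 0:
        # A failing extractor is treated like a timeout: skip the file.
        return None
    return result.stdout


if __name__ == "__main__":
    # Stand-in for a slow extractor: sleeps longer than the budget allows.
    slow = [sys.executable, "-c", "import time; time.sleep(30)"]
    print(extract_with_timeout(slow, timeout_s=1.0))
```

A skipped file could be recorded somewhere so that it is not retried on every
run, but that is beyond this sketch.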

But really, I only care about indexing the contents of my PDFs and LibreOffice
documents, and maybe my images. All told, a few thousand files.

Philosophically, it makes more sense to whitelist file types than to blacklist
them: there are far more unreadable file types than supported ones. Looking
through the configuration parameters, it appears that files are currently
blacklisted by type. Most users only care about indexing .pdf, .docx and .jpg
files, and maybe a handful of others. I don't see a use case for indexing an
.iso file, yet .iso is in neither excludeFilters nor excludeMimetypes by
default.
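
As an illustration of the whitelist approach, a minimal Python sketch follows.
The set of extensions and the name `should_index_content` are examples of
mine, not Baloo defaults or Baloo configuration keys.

```python
from pathlib import PurePath

# Hypothetical allowlist: only these extensions get their content indexed.
INDEXABLE_SUFFIXES = {
    ".pdf", ".odt", ".ods", ".odp", ".docx", ".jpg", ".jpeg", ".png",
}


def should_index_content(path):
    """Return True only if the file extension is on the allowlist."""
    return PurePath(path).suffix.lower() in INDEXABLE_SUFFIXES


if __name__ == "__main__":
    for name in ("thesis.pdf", "neon.iso", "game-000001.sgf"):
        print(name, should_index_content(name))
```

With this default-deny rule, the .iso images and the million Go records are
never opened at all, without having to be listed anywhere.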

An aside: does Baloo index the file paths themselves? That would be both
inefficient and a duplication of effort, since mlocate already does it very
well and unobtrusively: /var/lib/mlocate is 98 MiB, and `locate '*.pdf'` takes
about a second to run.

Could we make Baloo stream its processing? For each file extension in the
whitelist proposed above, it would periodically use locate(1) to list the
matching files and feed those that changed since the last run to the content
indexer, and that's it.
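
A rough Python sketch of that pipeline, under some assumptions: locate(1) is
installed and supports `-0` (as mlocate does), its database is fresh enough
(updatedb usually runs only periodically), and "updated" means the file's
mtime is newer than the previous run. The function names are hypothetical.

```python
import os
import subprocess


def locate_paths(pattern):
    """Yield paths matching a locate(1) glob, read incrementally from a pipe."""
    proc = subprocess.Popen(["locate", "-0", pattern], stdout=subprocess.PIPE)
    buffer = b""
    with proc.stdout:
        for chunk in iter(lambda: proc.stdout.read(65536), b""):
            buffer += chunk
            # Entries are NUL-terminated; keep the unfinished tail for later.
            *complete, buffer = buffer.split(b"\0")
            for raw in complete:
                yield os.fsdecode(raw)
    if buffer:
        yield os.fsdecode(buffer)
    proc.wait()


def changed_since(paths, last_run):
    """Yield only the paths modified after `last_run` (seconds since epoch)."""
    for path in paths:
        try:
            if os.stat(path).st_mtime > last_run:
                yield path
        except OSError:
            # The locate database can list files that no longer exist.
            continue


if __name__ == "__main__":
    last_run = 0.0  # a real indexer would persist this between runs
    for path in changed_since(locate_paths("*.pdf"), last_run):
        print("would index:", path)
```

Because both functions are generators, memory use stays flat no matter how
many paths locate returns.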

Finally, when Baloo does pointless busywork, more debugging tools would be
welcome. For example, balooctl could have a command that shows what baloo_file
is currently indexing.
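
In the meantime, a crude approximation on Linux is to list the files a process
has open through /proc. The Python sketch below assumes that the process keeps
the file it is working on open, which may not hold for baloo_file; the name
`open_files` is mine.

```python
import os


def open_files(pid):
    """Return the sorted regular paths that process `pid` has open (Linux)."""
    fd_dir = f"/proc/{pid}/fd"
    paths = set()
    for fd in os.listdir(fd_dir):
        try:
            target = os.readlink(os.path.join(fd_dir, fd))
        except OSError:
            continue  # the descriptor was closed while we were looking
        if target.startswith("/"):
            # Skip sockets, pipes and other non-path descriptors.
            paths.add(target)
    return sorted(paths)


if __name__ == "__main__":
    # Demonstrated on this script's own process; pass the PID of the
    # process of interest instead.
    for path in open_files(os.getpid()):
        print(path)
```

Reading another user's /proc entries requires matching permissions, so this
has to run as the same user as the inspected process.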

