https://bugs.kde.org/show_bug.cgi?id=394750
Bug ID: 394750
Summary: baloo_file fills RAM and disk for hours with no visible progress
Product: frameworks-baloo
Version: 5.46.0
Platform: Neon Packages
OS: Linux
Status: UNCONFIRMED
Severity: normal
Priority: NOR
Component: Baloo File Daemon
Assignee: baloo-bugs-n...@kde.org
Reporter: thaddee....@gmail.com
Target Milestone: ---

The baloo_file process has been running for five hours and uses about 4±2 GiB of RAM, causing swapping, and not a single file has been indexed yet:

$ balooctl -v
baloo 5.46.0
$ balooctl status
Baloo File Indexer is running
Indexer state: Initial Indexing
Indexed 0 / 0 files
Current size of index is 21.26 GiB
$ ps -C baloo_file -o comm,etime,%cpu,%mem,vsz,rss
COMMAND     ELAPSED %CPU %MEM       VSZ     RSS
baloo_file 05:09:33 43.6 32.5 274650148 3965904
$ ls -lh .local/share/baloo/index
-rw-rw-r-- 1 tyl tyl 22G May 27 14:04 .local/share/baloo/index

This link suggested I file this bug: https://community.kde.org/Baloo/Debugging.

I really like the idea of Baloo, so I wish for it to work a bit better. I don't know how often Baloo works flawlessly. My setup is barely unusual: I have some directories with a million small files (records of Go games obtained from this command: https://github.com/espadrine/badukjs/blob/master/Makefile#L13), and some files which are quite big, like a few Linux .iso. In total, I have about 150 GiB in /home, including the 22 GiB of Baloo index, which is now a significant amount of "0 files indexed".

If that large folder and the iso are the files that baloo_file chokes on, could we make Baloo give up if it spends more than 10 seconds on a single file or folder? (An `ls` on the Go games folder takes 11 minutes.)

But really, I only care about indexing the contents of my PDFs and LibreOffice documents, and maybe my images. All told, a few thousand files. Philosophically, it makes more sense to whitelist files by type than to index files that are unlikely to be properly read.
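To make the whitelist idea concrete, here is a rough sketch in Python. It is not how Baloo works today, and the extension list and the function name `iter_whitelisted` are only my assumptions for illustration. It walks a tree with `os.scandir`, which yields entries lazily, so a directory with a million files is never listed in full before the first match comes out.

```python
import os
from typing import Iterator, Set

# Hypothetical whitelist: the file types this report says are worth indexing.
WHITELIST: Set[str] = {".pdf", ".odt", ".ods", ".odp", ".docx", ".jpg", ".png"}


def iter_whitelisted(root: str, whitelist: Set[str] = WHITELIST) -> Iterator[str]:
    """Yield paths under root whose extension is in the whitelist.

    Everything else (.iso images, Go game records, ...) is skipped
    without being opened.
    """
    stack = [root]
    while stack:
        directory = stack.pop()
        try:
            # scandir is lazy: entries are produced one at a time.
            with os.scandir(directory) as entries:
                for entry in entries:
                    if entry.is_dir(follow_symlinks=False):
                        stack.append(entry.path)
                    elif entry.is_file(follow_symlinks=False):
                        ext = os.path.splitext(entry.name)[1].lower()
                        if ext in whitelist:
                            yield entry.path
        except OSError:
            # Unreadable or vanished directory: skip it rather than stall.
            continue
```

With a filter like this, the content indexer would only ever see the few thousand documents and images, whatever else sits next to them.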
Looking through the configuration parameters, it looks like files are blacklisted by type. It would make more sense to whitelist them: there are more file types that are unreadable than there are supported ones. Most users only care about indexing of .pdf, .docx and .jpg files, maybe a handful of others. I don't see a use-case for indexing an .iso file. Yet it is neither in excludeFilters nor in excludeMimetypes by default.

Aside. Is Baloo indexing file paths themselves? It would be both pretty inefficient and a duplication of effort, since mlocate does it stellarly and yet unnoticeably. /var/lib/mlocate is 98 MiB and `locate *.pdf` takes about a second to run.

Could we make Baloo stream its processing? For each file extension in the whitelist we discussed, it would regularly use locate(1) to get them, feed them to the content indexer if they were updated, and that's it.

Finally, when Baloo does pointless busywork, it would be welcome to have more debugging tools. balooctl could have a command to debug what baloo_file is currently indexing.
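A rough sketch of the locate-based streaming proposal, in Python. This is only an illustration of the idea, not Baloo code: the helper names `locate_by_extension` and `changed_since_last_run` are made up, the in-memory dict stands in for whatever store would remember modification times, and it assumes an mlocate-compatible `locate` on PATH and paths without embedded newlines.

```python
import os
import subprocess
from typing import Dict, Iterable, Iterator, List


def locate_by_extension(ext: str) -> List[str]:
    """Ask locate(1) for every path ending in ext (assumes `locate` is installed)."""
    result = subprocess.run(
        ["locate", f"*{ext}"],
        capture_output=True,
        text=True,
        check=False,  # locate exits non-zero when nothing matches
    )
    return [line for line in result.stdout.splitlines() if line]


def changed_since_last_run(
    paths: Iterable[str], seen_mtimes: Dict[str, float]
) -> Iterator[str]:
    """Yield only the paths that are new or modified since the previous pass.

    seen_mtimes maps path -> last recorded modification time and is
    updated in place (a hypothetical stand-in for a persistent store).
    """
    for path in paths:
        try:
            mtime = os.stat(path).st_mtime
        except OSError:
            # The locate database can be stale; the file may be gone.
            continue
        if seen_mtimes.get(path) != mtime:
            seen_mtimes[path] = mtime
            yield path
```

A periodic pass would then be something like `for path in changed_since_last_run(locate_by_extension(".pdf"), seen): ...`, handing each yielded path to the content indexer and nothing else.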