https://bugs.kde.org/show_bug.cgi?id=432717

            Bug ID: 432717
           Summary: Baloo scans content from too many files
           Product: frameworks-baloo
           Version: unspecified
          Platform: Other
                OS: Linux
            Status: REPORTED
          Severity: normal
          Priority: NOR
         Component: Baloo File Daemon
          Assignee: stefan.bru...@rwth-aachen.de
          Reporter: c...@palacio.io
  Target Milestone: ---

I have disabled file content indexing because it not only takes a heavy toll
on disk I/O on my system, but also scans and indexes the content of useless
program data files.

I have a few Wine prefixes in plain view, in unhidden folders in my home
directory, so quite a lot of data files are accessible to Baloo under the
default configuration. I have caught Baloo scanning a Daz Studio data file and
indexing keywords from it. For example:

$ balooshow -x "/home/user/Wine/Daz/drive_c/users/Public/Documents/My DAZ 3D Library/data/DAZ 3D/Genesis 8/Male/Morphs/DAZ 3D/Base Pose Head/alias_head_eCTRLEyelidsUpperUp-DownL.dsf"
425600051801229316 2052 99092734
/home/user/Wine/Daz/drive_c/users/Public/Documents/My DAZ 3D Library/data/DAZ
3D/Genesis 8/Male/Morphs/DAZ 3D/Base Pose
Head/alias_head_eCTRLEyelidsUpperUp-DownL.dsf
        Mtime: 1503348208 2017-08-21T15:43:28
        Ctime: 1567044300 2019-08-28T21:05:00
        Cached properties:
                Line Count: 44

Internal Info
Terms: 0.2784314 0.3254902 0.3764706 0.6.0.0 06 1 1.0 2017 203d 208 20head
20pose 21t23 27 34z 3d Mplain Mtext T5 T8 X20-44 alias asset author base
channel colors com contributor controls data daz daz3d description down downl
dsf ectrleyelidsupperup email eyelids eyes file genesis genesis8male group head
http icon id info label large left library male modified modifier modifiers
morphs name parent pose presentation revision scene support target type up
upper url value version website www 
File Name Terms: Falias Fdownl Fdsf Fectrleyelidsupperup Fhead 
XAttr Terms: 
lineCount: 44

I can't imagine the amount of program data it might have indexed from my home
folder.
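
One rough way to put a number on that is to walk the directory tree and tally
files by extension. The following is only a sketch, not something Baloo
provides: the function name count_by_extension is hypothetical, and skipping
hidden directories is an assumption meant to mirror the fact that only
unhidden folders are being picked up here.

```python
import os
from collections import Counter


def count_by_extension(root):
    """Tally files under `root` by lowercase extension, skipping hidden directories."""
    counts = Counter()
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune hidden directories in place so os.walk does not descend into them.
        dirnames[:] = [d for d in dirnames if not d.startswith(".")]
        for name in filenames:
            # Files without an extension are grouped under "(none)".
            ext = os.path.splitext(name)[1].lower() or "(none)"
            counts[ext] += 1
    return counts
```

Calling count_by_extension(os.path.expanduser("~")).most_common(20) would list
the twenty most frequent extensions, which gives a first impression of how
much program data sits in indexable locations.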

In my opinion, Baloo should extract keywords only from a very limited
selection of file types. Bug #358098 is related to this issue, and I strongly
disagree with it. Sure, scanning more files might interest a few people, but
it is a potentially harmful default for most users. Unknown data should be
skipped, and so should source code. The default should be simpler. An
extension blacklist isn't the appropriate solution; a whitelist is.
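
To illustrate the difference, here is a minimal sketch of the two policies.
Both extension sets and both function names are hypothetical and are not
Baloo's actual defaults or API. Matching on file extension is also a
simplification: the Mtext and Mplain terms in the output above suggest the
.dsf file was classified as plain text, so a real filter might need to
consider MIME types as well.

```python
import os

# Hypothetical lists for illustration only; these are not Baloo's defaults.
CONTENT_WHITELIST = {".txt", ".md", ".odt", ".pdf"}
CONTENT_BLACKLIST = {".o", ".so", ".pyc"}


def allowed_by_whitelist(path, whitelist=CONTENT_WHITELIST):
    """Index content only for extensions that are explicitly listed."""
    return os.path.splitext(path)[1].lower() in whitelist


def allowed_by_blacklist(path, blacklist=CONTENT_BLACKLIST):
    """Index content for everything except explicitly listed extensions."""
    return os.path.splitext(path)[1].lower() not in blacklist


dsf = "alias_head_eCTRLEyelidsUpperUp-DownL.dsf"
print(allowed_by_blacklist(dsf))  # True: an unknown extension slips through
print(allowed_by_whitelist(dsf))  # False: an unknown extension is skipped
```

With a blacklist, every new program data format is indexed until someone adds
it to the list. With a whitelist, unknown formats are skipped by default.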


SOFTWARE/OS VERSIONS
Linux/KDE Plasma: Debian unstable
KDE Plasma Version: 5.20.5
KDE Frameworks Version: 5.78.0
Qt Version: 5.15.2
