https://bugs.kde.org/show_bug.cgi?id=444520
--- Comment #8 from tagwer...@innerjoin.org ---
(In reply to Adam Fontenot from comment #7)
> Here's the original file that caused the problem:
> https://ipfs.io/ipfs/QmVqWhPuQkE7reTN5F9TiSeA75z62VNaZUSFZz3FdWTLbC

You are right about the warning... also best not to open the link with a browser that wants to render PDFs itself 8-]

Yes, 20MB but the plot content is compressed, as plaintext it could be *very* much larger. It is titled "R Graphics Output" so maybe there's a possibility to recognise such files - even though I'm sure "R" allows you to set a title yourself.

> That's a fair point. Let me put it a different way.

Good arguments...

> ... Perhaps an
> option to limit the size of the Baloo cache could be provided: either X GB
> or X% of free space. Given the available space, Baloo could manage its
> storage to not index files that are less usefully indexed. E.g. if there's
> one file that is 20 MB but using 2 GB of index space, it's going to be the
> first to go ...

I don't know "the internals" well enough to say. I do know that the underlying library (LMDB) is designed to withstand normal desktop misuse (killing processes, turning things off in the middle of an update). You can get times when the index grows because a transaction is being appended while another process is reading the index... Another design decision.

For the 20MB PDFs, it may be that indexing the first file generates a 2 GB index but the second one only adds a few additional MB. There's no guessing with edge cases...

> ... For example, biologists frequently use
> plain text "SAM" files, which contain long strings of meaningful but not
> indexable text, representing bits of DNA and metadata. E.g.
> "ATAGCACTCAAGCAATCAAATCAAATAGCCAACTCCTTATCTCAACTCTCC". These files might be
> under 10 MB, and they might have a .sam, .txt, or no extension at all.

In this case, I'd hope that SAM files have their own Mimetype (although looks like not...
perhaps possible to build a rule if the files follow the "Recommended Practice").

I know the SAM files were just an example but if you _did_ want to index them, you'd hit baloo's "25 character limit" (Bug 412421) :-/

See this with:

    $ echo "abcdefghijklmnopqrstuvwxyz" > testfile.txt
    $ balooshow -x testfile.txt
    13fc000000fc01 64513 1309696 testfile.txt [/home/user/Documents/testfile.txt]
        Mtime: 1637231394 2021-11-18T11:29:54
        Ctime: 1637231394 2021-11-18T11:29:54
        Cached properties:
            Line Count: 1

    Internal Info
    Terms: Mplain Mtext T5 T8 X20-1 abcdefghijklmnopqrstuvwxy
    File Name Terms: Ftestfile Ftxt
    XAttr Terms:
    lineCount: 1

    $ baloosearch abc
    /home/user/Documents/testfile.txt
    Elapsed: 0.31964 msecs

    $ baloosearch abcdefghijklmnopqrstuvwxyz
    Elapsed: 0.215223 msecs

So there's compromises here as well....

In a way it's a question of what you mean by "just works"....
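
For anyone wanting to see why the 26-letter search comes back empty while "abc" is found, here is a toy model in plain shell. It is only a sketch and not Baloo's code: it assumes that a word is cut to 25 characters when it is stored and that a search matches stored terms by prefix, which is consistent with the balooshow and baloosearch output above. The names LIMIT, stored_term and matches are made up for the illustration.

```shell
#!/bin/sh
# Toy model only, NOT Baloo's implementation.
# Assumption: terms are cut to 25 characters when stored, and a search
# matches a stored term when the query is a prefix of it.
LIMIT=25

# Hypothetical helper: print the term as this toy index would store it.
# printf's precision truncates the string (bytes, so ASCII is assumed).
stored_term() {
    printf "%.${LIMIT}s" "$1"
}

# Hypothetical helper: succeed if query $1 is a prefix of the stored form of $2.
matches() {
    case "$(stored_term "$2")" in
        "$1"*) return 0 ;;
        *)     return 1 ;;
    esac
}

word="abcdefghijklmnopqrstuvwxyz"

# Show the stored form, then try a short query and the full word.
echo "$(stored_term "$word")"
if matches "abc" "$word"; then echo "abc: hit"; else echo "abc: miss"; fi
if matches "$word" "$word"; then echo "full word: hit"; else echo "full word: miss"; fi
```

In this model the stored term ends at "y", so the short query hits and the full 26-letter word misses, the same pattern as in the transcript. Whether Baloo itself shortens the query before searching is not something this sketch says anything about.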