On Thursday, 16 October 2014, 14:20:06, Luca Beltrame wrote:
> On Thursday, 16 October 2014, 14:15:15, Martin Gräßlin wrote:
> > genome data is really huge, wouldn't it make sense to go rather for file
> > size or abort the indexing if it's obvious random gibberish?
>
> As the person who mentioned this first (hey, I'm famous ;), I'm guessing
> that limiting on file size would work in principle.
>
> For reference on the sizes, these kinds of files range from tens of MB to a
> few GB. Perhaps a size cutoff would work without giving up on indexing
> everything (which IMO is a nice feature and shouldn't be disabled).
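To make the size cutoff idea concrete, here is a minimal sketch in Python. It is not Baloo's actual code or API: the function name `should_index_content` and the 10 MiB threshold are assumptions chosen only for illustration, since the thread does not name a concrete limit.

```python
import os

# Hypothetical cutoff for illustration; the thread only says the problematic
# files range from tens of MB to a few GB.
SIZE_CUTOFF_BYTES = 10 * 1024 * 1024


def should_index_content(path, cutoff=SIZE_CUTOFF_BYTES):
    """Return True if the file at `path` is small enough for content indexing."""
    try:
        return os.path.getsize(path) <= cutoff
    except OSError:
        # File vanished or is unreadable: do not try to index its content.
        return False
```

With a check like this, small text and source files would still be indexed in full, and only files above the cutoff would be skipped.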
Could limiting on file size also be done like this: index just the first,
say, 100 KiB of a file instead of not indexing it at all? And in the search
results, perhaps include a hint that the file has only been partially
indexed? Or would that be worse than not indexing it at all?

For my file index I currently have:

martin@merkaba:~/.local/share/baloo> LANG=C du -sch file/* | sort -rh
1.2G    total
638M    file/position.DB
250M    file/postlist.DB
160M    file/termlist.DB
103M    file/fileMap.sqlite3
2.5M    file/fileMap.sqlite3-wal
19M     file/record.DB
4.0K    file/termlist.baseB
4.0K    file/termlist.baseA
4.0K    file/record.baseB
4.0K    file/record.baseA
4.0K    file/postlist.baseB
4.0K    file/postlist.baseA
4.0K    file/iamchert
32K     file/fileMap.sqlite3-shm
12K     file/position.baseB
12K     file/position.baseA
0       file/flintlock

That's less than the last Nepomuk index:

martin@merkaba:~/.kde/share/apps/nepomuk/repository/main/data/virtuosobackend> LANG=C du -sch * | sort -rh
3.1G    total
3.1G    soprano-virtuoso.db
2.1M    soprano-virtuoso.log
8.0K    soprano-virtuoso-temp.db
20K     missed_flush.txt
0       soprano-virtuoso.trx
0       soprano-virtuoso.pxa
0       soprano-virtuoso.lock

And as it's still performant, I wouldn't mind if it indexed some nice *.txt
or source files :). Actually, I think I would like to be able to do a
full-text search in these.

-- 
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA  B82F 991B EAAC A599 84C7