https://bugs.kde.org/show_bug.cgi?id=460882
Bug ID: 460882
Summary: Wishlist: Baloo and .eml files
Classification: Frameworks and Libraries
Product: frameworks-kfilemetadata
Version: unspecified
Platform: Neon
OS: Linux
Status: REPORTED
Severity: wishlist
Priority: NOR
Component: general
Assignee: pinak.ah...@gmail.com
Reporter: tagwer...@innerjoin.org
Target Milestone: ---

SUMMARY
Baloo can index *very* many terms when indexing an .eml file (an exported email message).

STEPS TO REPRODUCE
Save an email message (one that contains encoded attachments such as images) to a whatever.eml file in a folder being indexed by Baloo. Wait for it to be indexed and check the extracted data with:

    balooshow -x whatever.eml

OBSERVED RESULTS
Likely *very* many indexed terms...

    Internal Info
    Terms: + ++ +++ 0 00 000 0000 003qzxnsptaacsfyy7v7za55g 003vrg9mqpdg5l2qwdeyibwrz .... zxdpbmcgq29uzgl0aw9uiglui zxfahckfvzgvgwppcdshla zxfimr20rc5aaj69sbyfxpowo zxfl5frdetlhmgrqwwqltgf4u zxhvjklib4iyhh zxhzk9uciwo0e5gcdkdnj zxidunvh5eo5 zxigd2l0acbwawn0dxjlcybvz zxirdqt1ym5pwnkzsfmfc0djq zxjpzjtjb2xvcjojnem3nkeyi zxjwb2xhdgugdhj1zs9mzw5nd zxk zxk8461lf38mq25nzlaerkcy1 zxknmzleroscscd06vn9ocqec xlektnijydyatjnnj4bzl0kl zxlpmhv zxlyoj7dbvyojfbitx7y6 zxm9ha9xqmkhljd29sk zxmv6raojowfj5bbxkghstgn1 zxnfcojqsjptsbrjdhmwu2v9l zxnjaaaa zxnqx zxodub5 zxpu zxrk9lh8m6t zxrovfd7

I've seen a test .eml generate 60,000 indexed terms. That is a bit unfair on Baloo :-/

EXPECTED RESULTS
Ideally, only the extracted text/plain body of the email would be indexed, possibly with RFC 822 header lines indexed under specific tags (From:, To:, CC:, Subject:, Date:?).

SOFTWARE/OS VERSIONS
Neon Unstable
Plasma: 5.26.80
Frameworks: 5.100.0
Qt: 5.15.6

ADDITIONAL INFORMATION
I think this will require a kfilemetadata extractor for "message/rfc822" type files. There might be established code (munpack?) that can be invoked to do the "extraction".
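The extraction described under EXPECTED RESULTS can be prototyped outside Baloo. Below is a minimal sketch, assuming Python's standard `email` package rather than anything in kfilemetadata (an actual extractor would presumably be a C++ plugin registered for "message/rfc822"). It parses the message, keeps only the From/To/Cc/Subject/Date headers, and collects the text/plain parts while skipping attachments, so base64-encoded images never reach the indexer. The function name `extract_eml` and the `HEADER_FIELDS` tuple are hypothetical.

```python
from email import policy
from email.parser import BytesParser

# Hypothetical: header fields suggested in the report as candidates for their own tags.
HEADER_FIELDS = ("From", "To", "Cc", "Subject", "Date")


def extract_eml(raw: bytes) -> dict:
    """Hypothetical helper: return selected headers and the text/plain body of an RFC 822 message.

    Attachment parts are skipped, so their encoded payload is never returned for indexing.
    """
    msg = BytesParser(policy=policy.default).parsebytes(raw)

    # Keep only the headers that are actually present, decoded to plain strings.
    headers = {name: str(msg[name]) for name in HEADER_FIELDS if msg[name] is not None}

    body_parts = []
    for part in msg.walk():
        # Skip multipart containers and anything explicitly marked as an attachment.
        if part.is_multipart() or part.get_content_disposition() == "attachment":
            continue
        # Only text/plain parts are kept; inline images and other types are ignored.
        if part.get_content_type() == "text/plain":
            body_parts.append(part.get_content())

    return {"headers": headers, "body": "\n".join(body_parts)}
```

Reading whatever.eml in binary mode and passing its bytes to `extract_eml` would give the indexer a handful of header values and the readable body text instead of thousands of fragments of base64 data.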