https://bugs.kde.org/show_bug.cgi?id=438455
Bug ID: 438455
Summary: Baloo doesn't index Microsoft Office .doc files
Product: frameworks-baloo
Version: 5.82.0
Platform: Fedora RPMs
OS: Linux
Status: REPORTED
Severity: normal
Priority: NOR
Component: Baloo File Daemon
Assignee: stefan.bru...@rwth-aachen.de
Reporter: skierp...@gmail.com
CC: baloo-bugs-n...@kde.org, n...@kde.org
Target Milestone: ---

SUMMARY
`baloosearch` couldn't locate a word processing file by a term it contains. The file was a .doc file, not .docx or .odt.

STEPS TO REPRODUCE
1. In LibreOffice Writer, create a document containing just "baloopleaseindexme".
2. File > Save As in Word 97-2003 format as baloo_indexing_test.doc in some directory that Baloo indexes.
3. In a terminal, run `baloosearch baloopleaseindexme`
4. In a terminal, run `balooshow -x /path/to/baloo_indexing_test.doc`

OBSERVED RESULT
The document contents aren't indexed, so a baloosearch for the content fails. balooshow doesn't list any words from the document, just

Terms: Mapplication Mmsword T5 X19-0 X20-0

EXPECTED RESULT
Baloo should index these files as it does .odt and .docx files.

SOFTWARE/OS VERSIONS
Linux/KDE Plasma:
KDE Plasma Version: 5.21.5
KDE Frameworks Version: 5.82.0
Qt Version: 5.15.2 on Wayland

ADDITIONAL INFORMATION
There are tools to extract text from MS Office files. For example,

% flatpak run org.libreoffice.LibreOffice --invisible --convert-to txt --outdir /tmp/ /path/to/baloo_indexing_test.doc

will convert a .doc file to .txt. The TDF/DocumentLiberation project also offers introspection tools like mso-dumper's doc-dump, which dumps in some weird XML format.

In the interim this limitation should be mentioned somewhere, but I can't see where Baloo describes the file types whose content it does index. I don't know if Baloo indexes the contents of other MS Office 1990-2000 formats. Again, I shouldn't have to create test files to find out; known limitations should be documented.
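
The check in step 4 can be scripted. This is a minimal sketch, not part of Baloo: `has_content_terms` is a hypothetical helper, and it assumes (the report does not confirm this) that metadata terms in the `Terms:` line always start with an uppercase prefix such as M, T or X, while indexed content words do not.

```shell
#!/bin/sh
# Hypothetical helper, not shipped with Baloo.
# Assumption: metadata terms begin with an uppercase prefix (M, T, X, ...),
# so any term that does not begin with an uppercase letter is a content word.
has_content_terms() {
    # $1: a "Terms:" line copied from `balooshow -x` output
    printf '%s\n' "${1#Terms:}" \
        | tr -s ' ' '\n' \
        | grep -q '^[^[:upper:]]'   # exit status 0 if any unprefixed term exists
}

# The line quoted under OBSERVED RESULT holds only prefixed terms:
if has_content_terms "Terms: Mapplication Mmsword T5 X19-0 X20-0"; then
    echo "content indexed"
else
    echo "no content terms"
fi
```

Run against the line from the report, the sketch prints "no content terms", which matches the observation that only MIME type and property terms were stored for the .doc file.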