https://bugs.kde.org/show_bug.cgi?id=410680
--- Comment #6 from tagwer...@innerjoin.org ---
(In reply to skierpage from comment #5)
> ... Now "Design" and "Principles" are indexed 🎉 ... but
> still not words later on like "SSLv3" and "CANPENDING" ...

You don't get anything from:

    baloosearch SSLv3

? Maybe you are stumbling over "case issues"? For me, if I add some unique text at the end of the test-utf8.html file, baloo finds it...

> I strace'd baloo_file of the original non-utf-8 files, and some child
> process does one 4096-byte read of the start of the file, then packs it in!
> That's why baloo indexed so few terms in the original files; I filed bug
> 439857.

Yes, I'd say the indexer met an invalid character and stopped. Best consider this file messed up in terms of encoding 8-]

It has "charset" information in an HTTP-EQUIV header line, but it is commented out:

    <!--<meta HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=ISO-8859-1"> -->

However, even if I remove the comment markers to test whether the indexer recognises the HTTP-EQUIV, indexing still fails.

I feel I don't always see all of baloo's error messages, but the trick of running "balooctl purge" makes them start appearing on screen. I then get:

    Invalid encoding. Ignoring "/home/test/stadyn_largepagewithimages.html"

Ideally this file would be flagged as "failed to index".
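To see where a file like this stops being valid UTF-8 (presumably the point where the indexer gives up), a quick check outside baloo can help. The following is a minimal Python sketch, not baloo's own code. It assumes the file really is ISO-8859-1, as the commented-out meta tag suggests. The helper names first_invalid_utf8_offset and latin1_to_utf8 are hypothetical and exist only for this illustration.

```python
import sys
from typing import Optional


def first_invalid_utf8_offset(data: bytes) -> Optional[int]:
    """Return the byte offset of the first invalid UTF-8 sequence, or None if all valid."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as err:
        # err.start is the index of the first byte that could not be decoded
        return err.start
    return None


def latin1_to_utf8(data: bytes) -> bytes:
    """Re-encode ISO-8859-1 bytes as UTF-8 (assumes the input really is ISO-8859-1)."""
    # Every byte value maps to a character in ISO-8859-1, so this decode cannot fail
    return data.decode("iso-8859-1").encode("utf-8")


if __name__ == "__main__" and len(sys.argv) > 1:
    path = sys.argv[1]
    with open(path, "rb") as f:
        raw = f.read()
    offset = first_invalid_utf8_offset(raw)
    if offset is None:
        print(f"{path}: valid UTF-8")
    else:
        print(f"{path}: first invalid UTF-8 byte at offset {offset} (0x{raw[offset]:02x})")
```

Running it against the original HTML file should report an offset if the file is not UTF-8. Converting a copy with latin1_to_utf8 and indexing that copy is one way to confirm that the encoding, rather than the content, is what stops the indexer.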