https://bugs.kde.org/show_bug.cgi?id=410680

--- Comment #6 from tagwer...@innerjoin.org ---
(In reply to skierpage from comment #5)
> ... Now "Design" and "Principles" are indexed 🎉 ...  but
> still not words later on like "SSLv3" and "CANPENDING" ...
Do you really get nothing back from the following?
    baloosearch SSLv3
Maybe you are stumbling over "case issues"?

For me, if I add some unique text at the end of the test-utf8.html file,
baloo finds it.

> I strace'd baloo_file of the original non-utf-8 files, and some child
> process does one 4096-byte read of the start of the file, then packs it in!
> That's why baloo indexed so few terms in the original files; I filed bug
> 439857.
Yes, I'd say the indexer met an invalid character and stopped. It is best to
treat this file as having a broken encoding 8-]
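
To check this outside baloo, here is a minimal Python sketch that reports where
the first invalid UTF-8 byte sits. The function name is made up for
illustration, and this says nothing about how baloo itself validates text; it
only shows whether the bytes are valid UTF-8.

```python
from typing import Optional


def first_invalid_utf8_offset(data: bytes) -> Optional[int]:
    """Return the byte offset of the first invalid UTF-8 sequence, or None.

    Hypothetical helper for illustration; not part of baloo.
    """
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as err:
        # err.start is the index of the first byte that could not be decoded.
        return err.start
    return None


if __name__ == "__main__":
    import sys
    from pathlib import Path

    # Usage: python check_utf8.py FILE
    offset = first_invalid_utf8_offset(Path(sys.argv[1]).read_bytes())
    print("valid UTF-8" if offset is None else f"invalid byte at offset {offset}")
```

An offset inside the first 4096 bytes would at least be consistent with the
single read seen in the strace output.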

It does have "charset" information in an HTTP-EQUIV meta line, but that line is
commented out:

    <!--<meta HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=ISO-8859-1"> -->

However, even if I remove the comment markers to test whether the indexer
recognises the HTTP-EQUIV, indexing still fails.
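
For comparison, a UTF-8 copy of such a file can be made outside baloo. This is
a minimal sketch, assuming the file really is ISO-8859-1 as the commented-out
header suggests. The function name is hypothetical.

```python
def latin1_to_utf8(data: bytes) -> bytes:
    """Re-encode bytes assumed to be ISO-8859-1 as UTF-8.

    Hypothetical helper for illustration. Decoding as ISO-8859-1 never
    raises, because every byte value maps to a character, so a wrong
    assumption about the source encoding yields wrong text, not an error.
    """
    return data.decode("iso-8859-1").encode("utf-8")


if __name__ == "__main__":
    import sys
    from pathlib import Path

    # Usage: python to_utf8.py INPUT OUTPUT
    source, target = Path(sys.argv[1]), Path(sys.argv[2])
    target.write_bytes(latin1_to_utf8(source.read_bytes()))
```

Any charset declaration inside the copy would still say ISO-8859-1 and would
need editing by hand.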

I have the feeling that I don't always see all of baloo's error messages, but
after running "balooctl purge" they start appearing on screen. I then get:

    Invalid encoding. Ignoring "/home/test/stadyn_largepagewithimages.html"

Ideally this file would be flagged as "failed to index".
