https://bugs.kde.org/show_bug.cgi?id=506187

--- Comment #15 from Stefan Brüns <[email protected]> ---
Git commit 9fa1aaaf4a841224161e791cb8ffd366485dc7e3 by Stefan Brüns.
Committed on 06/07/2025 at 18:16.
Pushed by bruns into branch 'master'.

[PlaintextExtractor] Fix various issues with UTF-16

Read the file in binary mode, feed the complete data into QStringDecoder
with the detected encoding, and split the lines last.
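
The order described above (decode the whole buffer first, split into lines
last) can be sketched as follows. This is only an illustration, not the code
from the commit: the commit uses QStringDecoder with a detected encoding,
whereas this sketch assumes UTF-16LE without a BOM and uses only the C++
standard library. The names decodeUtf16LE and splitDecodedLines are
hypothetical.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Decode raw UTF-16LE bytes into 16-bit code units.
// Assumption: no BOM; a trailing odd byte is ignored in this sketch.
std::u16string decodeUtf16LE(const std::string &bytes)
{
    std::u16string text;
    for (std::size_t i = 0; i + 1 < bytes.size(); i += 2) {
        const auto lo = static_cast<unsigned char>(bytes[i]);
        const auto hi = static_cast<unsigned char>(bytes[i + 1]);
        text.push_back(static_cast<char16_t>(lo | (hi << 8)));
    }
    return text;
}

// Split already-decoded text at U+000A and drop a preceding U+000D.
std::vector<std::u16string> splitDecodedLines(const std::u16string &text)
{
    std::vector<std::u16string> lines;
    std::u16string current;
    for (char16_t ch : text) {
        if (ch == u'\n') {
            if (!current.empty() && current.back() == u'\r')
                current.pop_back();
            lines.push_back(current);
            current.clear();
        } else {
            current.push_back(ch);
        }
    }
    if (!current.empty())
        lines.push_back(current);
    return lines;
}
```

Because the line split operates on decoded code units rather than raw bytes,
a CR LF pair is recognized regardless of how many bytes each character
occupies in the file.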

Opening a file with open mode "QIODevice::Text" mangles Carriage Return
sequences: the UTF-16LE sequence "\r\0\n\0" ends up as "\0\n\0", i.e.
an invalid sequence.
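
The effect can be reproduced without Qt. The sketch below assumes that the
text-mode read simply drops the 0x0D byte, which matches the result quoted
above; stripCarriageReturnBytes is a hypothetical stand-in for that behaviour,
not Qt code, and pairBytesLE is likewise a made-up helper.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for a text-mode read: every 0x0D byte is dropped.
std::string stripCarriageReturnBytes(const std::string &bytes)
{
    std::string out;
    for (char c : bytes) {
        if (c != '\r')
            out.push_back(c);
    }
    return out;
}

// Pair bytes into little-endian 16-bit code units; a leftover byte is ignored.
std::vector<std::uint16_t> pairBytesLE(const std::string &bytes)
{
    std::vector<std::uint16_t> units;
    for (std::size_t i = 0; i + 1 < bytes.size(); i += 2) {
        const auto lo = static_cast<unsigned char>(bytes[i]);
        const auto hi = static_cast<unsigned char>(bytes[i + 1]);
        units.push_back(static_cast<std::uint16_t>(lo | (hi << 8)));
    }
    return units;
}
```

Feeding the eight bytes of "A\r\nB" in UTF-16LE through
stripCarriageReturnBytes leaves seven bytes. Everything after the removed byte
is shifted by one, so the remaining bytes pair up as 0x0041, 0x0A00, 0x4200
with one byte left over, instead of 'A', LF, 'B'.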

QIODevice::readLine() only supports 8-bit encodings (see QTBUG-121812),
and the fixup attempts here did not work in general.
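
Why a byte-oriented line reader is a poor fit for UTF-16 can be shown with a
small sketch. It assumes a reader that ends each line right after a 0x0A
byte; splitAtNewlineBytes is a hypothetical helper written for this
illustration and does not reproduce Qt's implementation.

```cpp
#include <string>
#include <vector>

// Split raw bytes after every 0x0A byte, as a byte-oriented line reader would.
std::vector<std::string> splitAtNewlineBytes(const std::string &bytes)
{
    std::vector<std::string> chunks;
    std::string current;
    for (char c : bytes) {
        current.push_back(c);
        if (c == '\n') {
            chunks.push_back(current);
            current.clear();
        }
    }
    if (!current.empty())
        chunks.push_back(current);
    return chunks;
}
```

For the six UTF-16LE bytes of "a\nb", the split lands in the middle of the
two-byte newline: both resulting chunks are three bytes long, so neither is a
whole number of 16-bit code units, and the second chunk starts with the high
byte of the newline.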

Unfortunately, QTextStream::setEncoding() only supports UTF encodings,
and none of the legacy ISO-8859 or Windows encodings, nor e.g. GB18030.

M  +0    -2    autotests/indexerextractortests.cpp
M  +53   -25   src/extractors/plaintextextractor.cpp

https://invent.kde.org/frameworks/kfilemetadata/-/commit/9fa1aaaf4a841224161e791cb8ffd366485dc7e3
