https://bugs.kde.org/show_bug.cgi?id=448112
Bug ID: 448112
Summary: Parsing broken subjects and other possibly UTF-8-encoded headers
Product: kmail2
Version: unspecified
Platform: Other
OS: Linux
Status: REPORTED
Severity: normal
Priority: NOR
Component: general
Assignee: kdepim-b...@kde.org
Reporter: m...@ratijas.tk
Target Milestone: ---

SUMMARY

Sometimes I get automated emails from systems that fail to properly encode multi-line UTF-8 subjects. For example:

> Кассовый чек 500 ₽ от «ПАО "ТАТ��ЕЛЕКОМ"»

In the source view of that email, the subject was represented on one line like this:

> From: "OFD.RU" <nore...@ofd.ru>
> Subject:
> =?UTF-8?B?0JrQsNGB0YHQvtCy0YvQuSDRh9C10LogNTAw?==?UTF-8?B?IOKCvSDQvtGCIMKr0J/QkNCeICLQotCQ0KLQ?==?UTF-8?B?otCV0JvQldCa0J7QnCLCuw==?=

There are two problems, as far as I can tell:

1. The header was supposed to be split across multiple lines, after each closing ?= sequence.
2. Unicode code points should not be split across multiple =?UTF-8?B?...?= chunks.

But maybe we could make our lives easier by trying to recover broken subjects? At least, we are already doing a good job of recovering from an unspecified encoding, such as in this follow-up email I got from my internet provider:

> From: <p...@ais.tattelecom.ru>
> Subject: ÐвиÑанÑÐ¸Ñ Ð¿Ð¾ оплаÑе ÑÑлÑг ÑвÑзи ÐÐÐ
> «ТаÑÑелеком»

…which KMail tried hard to «correctly» recover as

> Квитанция по оплате услуг связи ПАО «Таттелеком»

STEPS TO REPRODUCE

1. Get an email from OFD.RU

OBSERVED RESULT

Unicode symbols are shredded into pieces, as in

> "ТАТ��ЕЛЕКОМ"

EXPECTED RESULT

> "ТАТТЕЛЕКОМ"

SOFTWARE/OS VERSIONS

Operating System: Arch Linux
KDE Plasma Version: 5.23.80
KDE Frameworks Version: 5.90.0
Qt Version: 5.15.2
Kernel Version: 5.15.12-arch1-1 (64-bit)
Graphics Platform: X11
Processors: 8 × Intel® Core™ i7-6700HQ CPU @ 2.60GHz
Memory: 15.6 GiB of RAM
Graphics Processor: NVIDIA GeForce GTX 970M/PCIe/SSE2

ADDITIONAL INFORMATION
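
To illustrate the kind of recovery suggested in the report, here is a minimal Python sketch. It is not KMail or KMime code, and the names `ENCODED_WORD`, `decode_per_chunk`, and `decode_lenient` are hypothetical. It assumes that only base64 ("B") encoded words are present and that any text between them can be ignored. The first function decodes each chunk on its own, which appears to reproduce the observed result; the second joins the raw bytes of adjacent chunks that share a charset before decoding, which is one possible way to recover the intended subject.

```python
import base64
import re

# Matches one base64 ("B") encoded word: =?charset?B?payload?=
# Assumption: "Q" encoded words and language tags are out of scope here.
ENCODED_WORD = re.compile(r"=\?([^?]+)\?[Bb]\?([^?]*)\?=")

# The subject exactly as quoted in the report (three chunks, no whitespace).
RAW_SUBJECT = (
    "=?UTF-8?B?0JrQsNGB0YHQvtCy0YvQuSDRh9C10LogNTAw?="
    "=?UTF-8?B?IOKCvSDQvtGCIMKr0J/QkNCeICLQotCQ0KLQ?="
    "=?UTF-8?B?otCV0JvQldCa0J7QnCLCuw==?="
)


def decode_per_chunk(header: str) -> str:
    """Decode every encoded word separately.

    A multi-byte character split across two chunks cannot be decoded,
    so each half becomes U+FFFD (the replacement character).
    """
    return "".join(
        base64.b64decode(payload).decode(charset, errors="replace")
        for charset, payload in ENCODED_WORD.findall(header)
    )


def decode_lenient(header: str) -> str:
    """Join the raw bytes of all chunks first, then decode once.

    Only done when every chunk declares the same charset; otherwise
    fall back to per-chunk decoding.
    """
    chunks = ENCODED_WORD.findall(header)
    if not chunks:
        return header  # nothing to decode
    charsets = {charset.lower() for charset, _ in chunks}
    if len(charsets) != 1:
        return decode_per_chunk(header)
    raw = b"".join(base64.b64decode(payload) for _, payload in chunks)
    return raw.decode(charsets.pop(), errors="replace")


if __name__ == "__main__":
    print(decode_per_chunk(RAW_SUBJECT))
    print(decode_lenient(RAW_SUBJECT))
```

In this subject the second chunk ends with the first byte of a two-byte "Т" and the third chunk starts with its second byte, so per-chunk decoding yields two replacement characters while the joined decoding restores the letter.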