Package: libpython3.9-minimal
Version: 3.9.2-1
Severity: important

Dear Maintainer,

While running getmail, which calls this library, to download the spam folder from a Gmail account for further processing, I ran into an error in email/header.py. It is triggered when a message contains an invalid UTF-8 sequence. For example:

    b'Body Revolution - Medico Postura\xe2"\xa2 Body Posture Corrector'

Note the double quote (") in the middle of the multi-byte sequence: \xe2 opens a three-byte UTF-8 sequence, but the next byte is " rather than a continuation byte. This triggers the following condition:

    Exception: please read docs/BUGS and include the following information in any bug report:

      getmail version 6.14
      Python version 3.9.2 (default, Feb 28 2021, 17:03:44)
    [GCC 10.2.1 20210110]

    Unhandled exception follows:
        File "/usr/bin/getmail", line 932, in main
          success = go(configs, options.idle)
        File "/usr/bin/getmail", line 244, in go
          msg = mail_filter.filter_message(msg, retriever)
        File "/usr/lib/python3/dist-packages/getmailcore/filters.py", line 79, in filter_message
          exitcode, newmsg, err = self._filter_message(msg)
        File "/usr/lib/python3/dist-packages/getmailcore/filters.py", line 289, in _filter_message
          msg.add_header('X-getmail-filter-classifier', line)
        File "/usr/lib/python3/dist-packages/getmailcore/message.py", line 210, in add_header
          self.__msg[name] = Header(content.rstrip(), 'utf-8')
        File "/usr/lib/python3.9/email/header.py", line 217, in __init__
          self.append(s, charset, errors)
        File "/usr/lib/python3.9/email/header.py", line 295, in append
          s = s.decode(input_charset, errors)
      UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe2 in position 32: invalid continuation byte

    Please also include configuration information from running getmail
    with your normal options plus "--dump".

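The failure does not seem to need getmail at all. The following is a minimal sketch of a reproducer that passes the same bytes straight to email.header.Header. The variable names (raw, caught, lenient) are only illustrative. The second half shows a possible caller-side workaround using Header's existing errors parameter; whether getmail should do this is a separate question.

```python
from email.header import Header

# The bytes from the report: 0xe2 opens a three-byte UTF-8 sequence,
# but the byte after it is an ASCII double quote (0x22).
raw = b'Body Revolution - Medico Postura\xe2"\xa2 Body Posture Corrector'

caught = None
try:
    Header(raw, 'utf-8')  # errors defaults to 'strict'
except UnicodeDecodeError as exc:
    caught = exc
    print(exc)

# Possible workaround on the caller side: let Header substitute
# U+FFFD for the undecodable bytes instead of raising.
lenient = Header(raw, 'utf-8', errors='replace')
print(str(lenient))
```

With errors='replace' the header is built, at the cost of losing the two offending bytes.
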
The code looks like this:

    if not isinstance(s, str):
        input_charset = charset.input_codec or 'us-ascii'
        if input_charset == _charset.UNKNOWN8BIT:
            s = s.decode('us-ascii', 'surrogateescape')
        else:
            s = s.decode(input_charset, errors)

I think this may need a try/except around that last s.decode() call, or something similar, to catch the case where the input is invalid UTF-8. I don't think it should fail like this. If the input is not valid UTF-8, it should probably fall back to latin-1. I can't think of anything better.

-- System Information:
Debian Release: 11.0
  APT prefers stable-security
  APT policy: (500, 'stable-security'), (500, 'stable'), (250, 'testing'), (10, 'unstable'), (1, 'experimental')
Architecture: amd64 (x86_64)

Kernel: Linux 5.10.0-8-amd64 (SMP w/2 CPU threads)
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8) (ignored: LC_ALL set to en_US.UTF-8), LANGUAGE not set
Shell: /bin/sh linked to /bin/dash
Init: systemd (via /run/systemd/system)
LSM: AppArmor: enabled

Versions of packages libpython3.9-minimal depends on:
ii  libc6      2.31-13
ii  libssl1.1  1.1.1k-1

Versions of packages libpython3.9-minimal recommends:
ii  libpython3.9-stdlib  3.9.2-1

libpython3.9-minimal suggests no packages.

-- no debconf information
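
Addendum: to make the fallback suggested above concrete, here is a rough sketch of the decoding behaviour I have in mind. decode_with_fallback is a hypothetical standalone helper, not a function in email/header.py and not a proposed patch; it only illustrates trying the declared charset first and falling back to latin-1, which can decode any byte sequence.

```python
def decode_with_fallback(data, input_charset, errors='strict'):
    """Decode bytes with the declared charset, falling back to latin-1.

    Hypothetical helper for illustration only; not part of the stdlib.
    """
    try:
        return data.decode(input_charset, errors)
    except UnicodeDecodeError:
        # latin-1 maps every byte 0x00-0xff to a code point, so this
        # second attempt cannot raise UnicodeDecodeError.
        return data.decode('latin-1')


raw = b'Body Revolution - Medico Postura\xe2"\xa2 Body Posture Corrector'
print(decode_with_fallback(raw, 'utf-8'))

# Valid UTF-8 is unaffected by the fallback.
print(decode_with_fallback('Postura\u2122'.encode('utf-8'), 'utf-8'))
```

The trade-off is that the invalid bytes come out as unrelated latin-1 characters rather than raising, so the text may be wrong but the message still gets processed.
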