Package: lynx-cur
Version: 2.8.9dev1-2

From the utf-8(7) manpage:

"The Unicode and UCS standards require that producers of UTF‐8 shall use the shortest form possible, for example, producing a two‐byte sequence with first byte 0xc0 is nonconforming. Unicode 3.1 has added the requirement that conforming programs must not accept non‐shortest forms in their input."
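To illustrate what a non-shortest form looks like, here is a minimal Python sketch. It is not taken from lynx, and the helper names are hypothetical. It builds the two-byte overlong encoding of an ASCII character and checks that Python's built-in strict UTF-8 decoder refuses it. The sample bytes are only an example; they are not claimed to be the contents of the attached utf8.html.

```python
def encode_overlong_2byte(ch: str) -> bytes:
    """Encode an ASCII character as a nonconforming two-byte UTF-8 sequence.

    Hypothetical helper for illustration only: a conforming encoder would
    emit the single byte ord(ch) instead.
    """
    cp = ord(ch)
    if cp > 0x7F:
        raise ValueError("only ASCII characters have a two-byte overlong form")
    # Two-byte layout: 110xxxxx 10xxxxxx. For cp <= 0x7F the lead byte is
    # always 0xC0 or 0xC1, which never occurs in valid UTF-8.
    return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])


def strict_decode_ok(data: bytes) -> bool:
    """Return True if Python's strict UTF-8 decoder accepts the bytes."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


overlong_slash = encode_overlong_2byte("/")   # bytes 0xC0 0xAF
print(overlong_slash.hex(), strict_decode_ok(overlong_slash))
print(b"/".hex(), strict_decode_ok(b"/"))
```

A conforming decoder is expected to behave like `strict_decode_ok` here: the shortest form is accepted and the overlong one is rejected.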
But lynx happily accepts such overlong sequences:

$ lynx -dump utf8.html
If you see this, the parser accepts overlong UTF-8 sequences.

-- System Information:
Debian Release: jessie/sid
  APT prefers unstable
  APT policy: (990, 'unstable'), (500, 'experimental')
Architecture: i386 (x86_64)
Foreign Architectures: amd64

Kernel: Linux 3.16-2-amd64 (SMP w/2 CPU cores)
Locale: LANG=C, LC_CTYPE=pl_PL.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash

Versions of packages lynx-cur depends on:
ii  libbsd0             0.7.0-2
ii  libbz2-1.0          1.0.6-7
ii  libc6               2.19-11
ii  libgcrypt20         1.6.2-3
ii  libgnutls-deb0-28   3.3.8-2
ii  libidn11            1.29-1
ii  libncursesw5        5.9+20140913-1
ii  libtinfo5           5.9+20140913-1
ii  zlib1g              1:1.2.8.dfsg-2

-- 
Jakub Wilk
Attachment: utf8.html.gz (application/gzip)