Bug#763268: lynx-cur: accepts overlong UTF-8 sequences

Jakub Wilk Sun, 28 Sep 2014 11:31:13 -0700

Package: lynx-cur
Version: 2.8.9dev1-2

From the utf-8(7) manpage: "The Unicode and UCS standards require that producers of UTF‐8 shall use the shortest form possible, for example, producing a two‐byte sequence with first byte 0xc0 is nonconforming. Unicode 3.1 has added the requirement that conforming programs must not accept non‐shortest forms in their input."


But lynx happily accepts such overlong sequences:

$ lynx -dump utf8.html
  If you see this, the parser accepts overlong UTF-8 sequences.


-- System Information:
Debian Release: jessie/sid
 APT prefers unstable
 APT policy: (990, 'unstable'), (500, 'experimental')
Architecture: i386 (x86_64)
Foreign Architectures: amd64

Kernel: Linux 3.16-2-amd64 (SMP w/2 CPU cores)
Locale: LANG=C, LC_CTYPE=pl_PL.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash

Versions of packages lynx-cur depends on:
ii  libbsd0            0.7.0-2
ii  libbz2-1.0         1.0.6-7
ii  libc6              2.19-11
ii  libgcrypt20        1.6.2-3
ii  libgnutls-deb0-28  3.3.8-2
ii  libidn11           1.29-1
ii  libncursesw5       5.9+20140913-1
ii  libtinfo5          5.9+20140913-1
ii  zlib1g             1:1.2.8.dfsg-2

--
Jakub Wilk

Attachment: utf8.html.gz (application/gzip)
