Package: util-linux
Version: 2.16.1-4
Severity: important
Tags: patch

Attached is a file which may be used to demonstrate the problem.

With a terminal of the standard 80-column width, more displays the text
correctly.  The longest line (line 11: 91 characters, 91×3=273 bytes) is
correctly folded over two lines.

─────────────────┼──────────┼──────────┼─────────────┼─────────────┼───────────────────────
                                                                                   col 91 ↑
⇒

─────────────────┼──────────┼──────────┼─────────────┼─────────────┼────────────
───────────
                                                                        col 80 ↑

Now resize the terminal to more than 85 columns wide, and one sees this:

─────────────────┼──────────┼──────────┼─────────────┼─────────────┼─────────────────
��─────
                                                                             col 85 ↑

A newline is inserted after 85 characters, and the first byte of the
following 3-byte UTF-8 sequence is lost (replaced by \n?).  The two
bytes left behind are continuation bytes with no lead byte, so they are
invalid UTF-8 and display as garbage.

Why is this happening?

I believe it's partly down to
  #define LINSIZ  256
in text-utils/more.c.  All the UTF-8 characters on that line are 3-byte
sequences, and 256 = 85×3 + 1, so the buffer fills one byte into the
86th character.  But there must be a bug elsewhere in the code as well,
since it is not only flushing the buffer, it is corrupting it.
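
To make the arithmetic concrete, here is a minimal sketch in C.  It is
not code from more.c; the helper names whole_chars, stray_bytes and
is_continuation are made up for illustration.  It only shows how many
fixed-width characters fit in a buffer, and why the tail of a 3-byte
sequence is invalid once its lead byte has gone.

```c
#include <stddef.h>

/* How many complete characters of a fixed encoded width fit in a buffer. */
static size_t whole_chars(size_t bufsize, size_t bytes_per_char)
{
    return bufsize / bytes_per_char;
}

/* Bytes left over after the last complete character. */
static size_t stray_bytes(size_t bufsize, size_t bytes_per_char)
{
    return bufsize % bytes_per_char;
}

/* Bytes of the form 10xxxxxx can only continue a UTF-8 sequence,
 * they can never start one. */
static int is_continuation(unsigned char b)
{
    return (b & 0xC0) == 0x80;
}
```

With LINSIZ at 256, whole_chars(256, 3) is 85 and stray_bytes(256, 3) is
1, which matches the break after column 85.  U+2500 (─) encodes as
E2 94 80: if the E2 is lost, both 94 and 80 satisfy is_continuation()
and cannot be decoded on their own.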

Partial solution: 256 bytes for the line buffer is way too small.  For a
modern system using UTF-8, I'd suggest that 1024 bytes would be a more
sensible default, since this would allow at least 256 columns of 4-byte
UTF-8 sequences.  4096 bytes would be even safer, and since it's a
single static buffer, the increased overhead is minimal.  I've built
with the following patch and it does prevent the corruption.

There's still the matter of corruption in the case of overflow, which
would still need addressing: the increased buffer size just hides it
rather than fixing it.  On overflow, more should probably only flush up
to the end of the last complete UTF-8 sequence.
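
As a rough sketch of what "flush up to the last complete sequence" could
look like (this is not how more.c is currently structured, and the
function names utf8_seq_len and utf8_flush_len are hypothetical), one
could look back from the end of the buffer for an unfinished sequence
and hold those bytes over for the next line.  It assumes the input is
otherwise well-formed UTF-8 and does no full validation.

```c
#include <stddef.h>

/* Expected total length of a UTF-8 sequence given its first byte,
 * or 0 if the byte cannot start a sequence. */
static size_t utf8_seq_len(unsigned char lead)
{
    if (lead < 0x80) return 1;              /* 0xxxxxxx: ASCII */
    if ((lead & 0xE0) == 0xC0) return 2;    /* 110xxxxx */
    if ((lead & 0xF0) == 0xE0) return 3;    /* 1110xxxx */
    if ((lead & 0xF8) == 0xF0) return 4;    /* 11110xxx */
    return 0;                               /* continuation or invalid */
}

/* Return how many of the first len bytes of buf can be flushed without
 * cutting the final multi-byte sequence in half.  Bytes from the
 * returned offset up to len form an incomplete sequence and should be
 * carried over rather than written out. */
static size_t utf8_flush_len(const unsigned char *buf, size_t len)
{
    size_t i = len;
    size_t back = 0;

    /* Step back over at most three trailing continuation bytes. */
    while (i > 0 && back < 3 && (buf[i - 1] & 0xC0) == 0x80) {
        i--;
        back++;
    }
    if (i == 0)
        return len;             /* no lead byte in range: flush as-is */

    size_t need = utf8_seq_len(buf[i - 1]);
    size_t have = back + 1;     /* lead byte plus continuation bytes */

    if (need > have)
        return i - 1;           /* incomplete: stop before the lead byte */
    return len;                 /* complete (or malformed): flush it all */
}
```

For the 273-byte line in the attachment, a 256-byte buffer would then
flush 255 bytes (85 whole characters) and keep the stray lead byte for
the next chunk, instead of emitting a broken sequence.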

diff -urN util-linux-2.16.1.orig/text-utils/more.c util-linux-2.16.1/text-utils/more.c
--- util-linux-2.16.1.orig/text-utils/more.c    2009-07-04 00:20:07.000000000 +0100
+++ util-linux-2.16.1/text-utils/more.c 2009-10-27 11:11:32.046127972 +0000
@@ -107,7 +107,7 @@
 FILE *checkf (char *, int *);
 
 #define TBUFSIZ        1024
-#define LINSIZ 256
+#define LINSIZ 4096
 #define ctrl(letter)   (letter & 077)
 #define RUBOUT '\177'
 #define ESC    '\033'


Regards,
Roger

-- System Information:
Debian Release: squeeze/sid
  APT prefers unstable
  APT policy: (550, 'unstable')
Architecture: amd64 (x86_64)

Kernel: Linux 2.6.30-2-amd64 (SMP w/4 CPU cores)
Locale: LANG=en_GB.UTF-8, LC_CTYPE=en_GB.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash

Versions of packages util-linux depends on:
ii  dpkg                   1.15.4.1          Debian package management system
ii  initscripts            2.87dsf-8         scripts for initializing and shutt
ii  install-info           4.13a.dfsg.1-5    Manage installed documentation in 
ii  libblkid1              2.16.1-4          block device id library
ii  libc6                  2.10.1-2          GNU C Library: Shared libraries
ii  libncurses5            5.7+20090803-2    shared libraries for terminal hand
ii  libselinux1            2.0.88-1          SELinux runtime shared libraries
ii  libslang2              2.2.1-1           The S-Lang programming library - r
ii  libuuid1               2.16.1-4          Universally Unique ID library
ii  lsb-base               3.2-23            Linux Standard Base 3.2 init scrip
ii  tzdata                 2009o-2           time zone and daylight-saving time
ii  zlib1g                 1:1.2.3.3.dfsg-15 compression library - runtime

util-linux recommends no packages.

Versions of packages util-linux suggests:
ii  console-tools              1:0.2.3dbs-66 Linux console and font utilities
ii  dosfstools                 3.0.6-1       utilities for making and checking 
ii  util-linux-locales         2.16.1-4      Locales files for util-linux

-- no debconf information
psql (8.5devel, server 8.4.1)
WARNING: psql version 8.5, server version 8.4.
         Some psql features might not work.
Type "help" for help.

rleigh=# \pset pager off
Pager usage is off.
rleigh=# \l
                                     List of databases
      Name       │  Owner   │ Encoding │  Collation  │    Ctype    │   Access privileges   
─────────────────┼──────────┼──────────┼─────────────┼─────────────┼───────────────────────
 merkelpb        │ rleigh   │ UTF8     │ en_GB.UTF-8 │ en_GB.UTF-8 │
 postgres        │ postgres │ UTF8     │ en_GB.UTF-8 │ en_GB.UTF-8 │
 projectb        │ rleigh   │ UTF8     │ en_GB.UTF-8 │ en_GB.UTF-8 │
 rleigh          │ rleigh   │ UTF8     │ en_GB.UTF-8 │ en_GB.UTF-8 │
 rleigh-amarok   │ rleigh   │ UTF8     │ en_GB.UTF-8 │ en_GB.UTF-8 │
 sbuild-packages │ rleigh   │ UTF8     │ en_GB.UTF-8 │ en_GB.UTF-8 │
 scratch         │ rleigh   │ UTF8     │ en_GB.UTF-8 │ en_GB.UTF-8 │
 scratch2        │ rleigh   │ UTF8     │ en_GB.UTF-8 │ en_GB.UTF-8 │
 template0       │ postgres │ UTF8     │ en_GB.UTF-8 │ en_GB.UTF-8 │ =c/postgres          ↵
                 │          │          │             │             │ postgres=CTc/postgres 
 template1       │ postgres │ UTF8     │ en_GB.UTF-8 │ en_GB.UTF-8 │ =c/postgres          ↵
                 │          │          │             │             │ postgres=CTc/postgres 
 test            │ rleigh   │ UTF8     │ en_GB.UTF-8 │ en_GB.UTF-8 │
 test2           │ rleigh   │ UTF8     │ en_GB.UTF-8 │ en_GB.UTF-8 │
 test3           │ rleigh   │ UTF8     │ en_GB.UTF-8 │ en_GB.UTF-8 │
 test4           │ rleigh   │ UTF8     │ en_GB.UTF-8 │ en_GB.UTF-8 │
 test5           │ rleigh   │ UTF8     │ en_GB.UTF-8 │ en_GB.UTF-8 │
 testp           │ rleigh   │ UTF8     │ en_GB.UTF-8 │ en_GB.UTF-8 │
 vtest           │ rleigh   │ UTF8     │ en_GB.UTF-8 │ en_GB.UTF-8 │
(17 rows)
