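Package: uni2ascii
Version: 3.9-1
Severity: normal
Tags: patch

uni2ascii fails if a read returns a partial result in the middle of a multi-byte UTF-8 sequence. This only happens if the UTF-8 sequence is at least 3 bytes long: the first byte is read on its own, and the remaining bytes are then requested in a single read(), so a short read is only possible when there are two or more trailing bytes. Here's a test case: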
------------------------------------------------------------
[EMAIL PROTECTED]:~$ (echo -ne '\344\270'; echo -e '\200') | uni2ascii
0x4E00
1 tokens converted out of 2 characters
[EMAIL PROTECTED]:~$ (echo -ne '\344\270'; sleep 2; echo -e '\200') | uni2ascii
Truncated UTF-8 sequence encountered at byte 0, character 0.
------------------------------------------------------------

The byte sequence '\344\270\200' is a single UTF-8 sequence (U+4E00). In the first command, with no pause, the read returns the expected result. In the second command we insert a pause between the 2nd and 3rd bytes, so the read for the trailing bytes returns only one of them, and the result is a spurious error message.

(The bug was found by Chung-Chieh Shan, cc:ed.)

Patch attached. I note that the file patched, Get_UTF32_From_UTF8.c, is written by Bill Poser, who is not listed as the author of uni2ascii; perhaps this file came from some other package, and the patch should be passed upstream?

Peace,
Dylan Thurston

-- System Information:
Debian Release: testing/unstable
  APT prefers unstable
  APT policy: (500, 'unstable'), (500, 'testing')
Architecture: i386 (i686)
Shell:  /bin/sh linked to /bin/bash
Kernel: Linux 2.6.17.1
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8)

Versions of packages uni2ascii depends on:
ii  libc6                         2.3.6-15   GNU C Library: Shared libraries

uni2ascii recommends no packages.

-- no debconf information
--- uni2ascii-3.9/Get_UTF32_From_UTF8.c	2006-05-11 21:20:37.000000000 -0400
+++ uni2ascii-3.9.new/Get_UTF32_From_UTF8.c	2006-06-28 00:44:18.000000000 -0400
@@ -86,6 +86,7 @@
 UTF32
 Get_UTF32_From_UTF8 (int fd, int *bytes, unsigned char **bstr)
 {
+  int BytesSoFar;
   int BytesRead;
   int BytesNeeded;		/* Additional bytes after initial byte */
   static unsigned char c[6];
@@ -102,9 +103,13 @@
 
   /* Now get the remaining bytes */
   BytesNeeded = (int) TrailingBytesForUTF8[c[0]];
-  BytesRead = read(fd,(void *) &c[1],(size_t) BytesNeeded);
-  if(BytesRead != BytesNeeded) return(UTF8_NOTENOUGHBYTES);
-  *bytes = BytesRead+1;
+  BytesSoFar = 0;
+  do {
+    BytesRead = read(fd,(void *) &c[BytesSoFar+1],(size_t) (BytesNeeded-BytesSoFar));
+    BytesSoFar += BytesRead;
+  } while (BytesRead > 0 && BytesSoFar < BytesNeeded);
+  if(BytesSoFar != BytesNeeded) return(UTF8_NOTENOUGHBYTES);
+  *bytes = BytesNeeded+1;
   *bstr = &c[0];
 
   /* Check validity of source */
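
For reference, the retry loop could also be written as a standalone helper. This is only a sketch: read_fully is a hypothetical name, not something that exists in uni2ascii, and the EINTR handling is an assumption beyond what the patch itself does. It keeps calling read() until the requested number of bytes has arrived, end of file is reached, or a real error occurs.

```c
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of uni2ascii): read exactly `want` bytes
 * from fd into buf, retrying after short reads.
 * Returns the number of bytes stored: `want` on success, fewer if end of
 * file arrived first, or -1 on a read error (errno is left as set by read()).
 */
static ssize_t read_fully(int fd, unsigned char *buf, size_t want)
{
    size_t got = 0;

    while (got < want) {
        ssize_t n = read(fd, buf + got, want - got);
        if (n > 0) {
            got += (size_t) n;   /* partial result: ask for the rest */
        } else if (n == 0) {
            break;               /* end of file: the sequence is truncated */
        } else if (errno != EINTR) {
            return -1;           /* genuine read error */
        }
        /* n < 0 with EINTR: interrupted before any data arrived, so retry */
    }
    return (ssize_t) got;
}
```

With such a helper, Get_UTF32_From_UTF8 would presumably call read_fully(fd, &c[1], (size_t) BytesNeeded) and return UTF8_NOTENOUGHBYTES whenever the result differs from BytesNeeded.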