Package: diff
Version: 1:3.0-1
Severity: normal

When diff compares two files that are identical line by line, except that one begins with a UTF-8 byte order mark (BOM) and the other does not, its output does not make the actual difference visible. (The BOM is a kind of invisible space character, U+FEFF, placed before the first line to indicate which encoding the file uses, as explained on Wikipedia.)

This can happen when both files are intended for software that is not fully UTF-8 compliant: one file works and the other does not. The first step in solving this mystery is to look at what differs between the two files. For this reason, the diff command should take this case into account, either in its man page or in its executable.

On the one hand, diff is line oriented, so it knows the meaning of a newline. On the other hand, it does not appear to know the meaning of a byte order mark, so it effectively performs a binary comparison. This makes it unclear whether diff is meant to work on binary files, on full Unicode files, or only on some kind of intermediate legacy files.

Worse, this also happens with diff -w!

My proposals to solve these bugs:

When diff is used without any option, it seems very strange to see two apparently identical lines displayed with no visible difference between them. This case should at least be documented in the man page: now that UTF-8 is the encoding used everywhere, the issue will become more and more common, and it will make people believe that GNU/Linux systems are randomly and mysteriously odd.

When diff is run with an option such as -w (or a new option under any other name), it should report the files as identical in this case.

Moreover, a verbose option that explicitly indicates that the files differ by their BOM might be useful.

An alternative would be to develop a Unicode diff and to have the man page recommend that people use it instead of a legacy diff in this case.

I believe that taking the BOM into account is not a big job, but it would make life easier for naive users.
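
To make the requested check concrete, here is a minimal Python sketch of how a tool could tell that two inputs differ only by a leading UTF-8 BOM (the bytes EF BB BF). It is not taken from diffutils and does not describe how diff works internally; the function names strip_bom and differ_only_by_bom are hypothetical, chosen only for this illustration.

```python
import codecs


def strip_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 BOM (EF BB BF), if one is present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


def differ_only_by_bom(a: bytes, b: bytes) -> bool:
    """True when a and b differ as raw bytes but are equal once a leading BOM is ignored."""
    return a != b and strip_bom(a) == strip_bom(b)


if __name__ == "__main__":
    # Hypothetical sample content standing in for the two files being compared.
    plain = "première ligne\nseconde ligne\n".encode("utf-8")
    with_bom = codecs.BOM_UTF8 + plain

    print(differ_only_by_bom(plain, with_bom))  # True
    print(differ_only_by_bom(plain, plain))     # False
```

A check of this kind could back either proposal: reporting the files as identical when the BOM is to be ignored, or printing an explicit message saying that the files differ only by their BOM.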

-- System Information:
Debian Release: 6.0.5
  APT prefers stable-updates
  APT policy: (500, 'stable-updates'), (500, 'proposed-updates'), (500, 'stable')
Architecture: amd64 (x86_64)

Kernel: Linux 2.6.32-5-amd64 (SMP w/2 CPU cores)
Locale: LANG=fr_FR.UTF-8, LC_CTYPE=fr_FR.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/bash

Versions of packages diff depends on:
ii  diffutils                     1:3.0-1    File comparison utilities

diff recommends no packages.

diff suggests no packages.

-- no debconf information