Package: diff
Version: 1:3.0-1
Severity: normal

When diff is used to compare two files which are identical line by line, except 
that one of them starts with a BOM indicating UTF-8 encoding and the other does 
not, diff does not make the real difference visible. (The Byte Order Mark is an 
invisible, zero-width space character placed before the first line to indicate 
which encoding the file uses, as explained on Wikipedia.)
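To make the difference concrete, here is a small Python sketch (Python is used 
only for illustration; nothing here is part of diffutils) showing what the BOM 
is at the byte level: three bytes, EF BB BF, which decode to the single 
invisible code point U+FEFF. The file contents are made-up examples.

```python
import codecs
import unicodedata

# The UTF-8 encoding of the byte order mark, as exposed by the standard library.
bom = codecs.BOM_UTF8
print(bom)  # b'\xef\xbb\xbf'

# Decoded, it is the single code point U+FEFF, a format (Cf) character.
ch = bom.decode("utf-8")
print(hex(ord(ch)), unicodedata.name(ch), unicodedata.category(ch))
# 0xfeff ZERO WIDTH NO-BREAK SPACE Cf

# Two example files that display identically: only three leading bytes differ.
plain = b"hello\nworld\n"
with_bom = bom + plain
print(len(plain), len(with_bom))  # 12 15
```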

This can happen when both files are intended to be used with software that is 
not fully UTF-8 compliant: one file works and the other one does not.

The first step in solving this mystery is to look at what differs between the 
two files.
For this reason, diff should take this case into account, either in its man 
page or in the program itself.

On one hand, diff is line oriented and compares lines character by character.
On the other hand, while it appears to understand newlines, it does not appear 
to understand the Byte Order Mark, and so treats it as plain bytes.
This makes it unclear whether diff is meant to work on binary files, on fully 
Unicode-aware text files, or on some kind of intermediate legacy files.
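A rough illustration of this line-oriented behaviour, using Python's difflib as 
a stand-in for a line-oriented diff (it is not GNU diff, and its output only 
approximates diff -u). The first line is reported as changed, but on a terminal 
the "-" and "+" lines look the same because U+FEFF is invisible; only repr() 
reveals it.

```python
import codecs
import difflib

plain = b"hello\nworld\n"
with_bom = codecs.BOM_UTF8 + plain

# Decode as ordinary UTF-8: the BOM survives as U+FEFF at the start of line 1.
a = plain.decode("utf-8").splitlines(keepends=True)
b = with_bom.decode("utf-8").splitlines(keepends=True)

diff = list(difflib.unified_diff(a, b, "a.txt", "b.txt"))

# Printed as-is, "-hello" and "+hello" would look identical on a terminal;
# repr() escapes the invisible U+FEFF so the real difference shows up.
for line in diff:
    print(repr(line))
```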

Worse, this also happens with diff -w!
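Why -w does not help, sketched under an assumption: the helper strip_whitespace 
below is hypothetical and uses Python's str.isspace() only as an approximation 
of what diff -w treats as white space. U+FEFF is a format character, not white 
space, so it survives the normalization and the lines still differ.

```python
import codecs

def strip_whitespace(line: str) -> str:
    # Hypothetical model of "ignore all white space": drop every character
    # that Python's str.isspace() considers white space.
    return "".join(ch for ch in line if not ch.isspace())

plain_line = "hello world\n"
bom_line = codecs.BOM_UTF8.decode("utf-8") + plain_line

print("\ufeff".isspace())  # False: U+FEFF is not white space
print(strip_whitespace(plain_line) == strip_whitespace(bom_line))  # False
```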


My proposals to solve these bugs:

When diff is used without any option, it is very confusing to see two lines 
that look identical displayed as different, with no visible difference between 
them.

This case should at least be documented in the man page: now that UTF-8 is the 
encoding used almost everywhere, the issue will become more and more common, 
making people believe that GNU/Linux systems are randomly and mysteriously odd.

When diff is run with an option such as -w (or another, suitably named option), 
it should report the files as identical in this case.
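The proposed behaviour could look like the following minimal sketch. The helper 
names strip_utf8_bom and same_ignoring_bom are hypothetical and do not 
correspond to any existing diff option; the idea is simply to drop one leading 
UTF-8 BOM from each file before comparing.

```python
import codecs

def strip_utf8_bom(data: bytes) -> bytes:
    # Remove one leading UTF-8 BOM (EF BB BF), if present.
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data

def same_ignoring_bom(a: bytes, b: bytes) -> bool:
    # Hypothetical semantics for a BOM-ignoring option:
    # compare contents after dropping a leading BOM from each side.
    return strip_utf8_bom(a) == strip_utf8_bom(b)

plain = b"hello\nworld\n"
with_bom = codecs.BOM_UTF8 + plain
print(plain == with_bom)                   # False
print(same_ignoring_bom(plain, with_bom))  # True
```

Stripping only at the start of the file keeps the change narrow: a U+FEFF in 
the middle of a file would still be reported.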

Moreover, a verbose option that explicitly states that the files differ only by 
their BOM might be useful.
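A possible shape for such a verbose report, again as a hypothetical sketch: 
bom_note and its message wording are invented for illustration. It returns a 
message only when exactly one file starts with a BOM and the remaining bytes 
are identical.

```python
import codecs
from typing import Optional

def bom_note(name_a: str, a: bytes, name_b: str, b: bytes) -> Optional[str]:
    # Return an explanatory message when the only difference is a leading
    # UTF-8 BOM; otherwise None. The wording is hypothetical.
    bom = codecs.BOM_UTF8
    a_has, b_has = a.startswith(bom), b.startswith(bom)
    if a_has == b_has:
        return None
    a_body = a[len(bom):] if a_has else a
    b_body = b[len(bom):] if b_has else b
    if a_body != b_body:
        return None
    with_bom = name_a if a_has else name_b
    return (f"Files {name_a} and {name_b} differ only by a UTF-8 "
            f"byte order mark at the start of {with_bom}")

print(bom_note("a.txt", b"hi\n", "b.txt", codecs.BOM_UTF8 + b"hi\n"))
```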

An alternative would be to develop a Unicode-aware diff, and to have the man 
page recommend using it instead of the legacy diff in this case.
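For the Unicode-aware alternative, Python's standard "utf-8-sig" codec already 
does the essential part: it decodes UTF-8 and discards a leading BOM if one is 
present. A sketch, with a hypothetical helper unicode_lines and difflib again 
standing in for diff:

```python
import codecs
import difflib
from typing import List

def unicode_lines(data: bytes) -> List[str]:
    # "utf-8-sig" decodes UTF-8 and drops a single leading BOM if present.
    return data.decode("utf-8-sig").splitlines(keepends=True)

plain = b"hello\nworld\n"
with_bom = codecs.BOM_UTF8 + plain

diff = list(difflib.unified_diff(unicode_lines(plain),
                                 unicode_lines(with_bom),
                                 "a.txt", "b.txt"))
print(diff)  # [] : no differences once the BOM is treated as encoding metadata
```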


I believe taking the BOM into account is not a big job, but it would make life 
easier for naive users.


-- System Information:
Debian Release: 6.0.5
  APT prefers stable-updates
  APT policy: (500, 'stable-updates'), (500, 'proposed-updates'), (500, 'stable')
Architecture: amd64 (x86_64)

Kernel: Linux 2.6.32-5-amd64 (SMP w/2 CPU cores)
Locale: LANG=fr_FR.UTF-8, LC_CTYPE=fr_FR.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/bash

Versions of packages diff depends on:
ii  diffutils                     1:3.0-1    File comparison utilities

diff recommends no packages.

diff suggests no packages.

-- no debconf information


