On Sun, 15 Jul 2012, jeanmichel....@free.fr wrote:

*see* when I make the diff:

1c1
< Hello.
---
> Hello.


Yes, this is the issue I consider in this bugreport.

First, the example above shows that the difference is not visible in a 
regular terminal.

So what? The same happens with spaces. You don't see the spaces because they 
are invisible.
But this is not a bug in diff. Spaces were already invisible before diff 
existed.
diff does not make them more invisible, diff respects them as they are.
And the same happens with BOM. It is respected.

Secondly, when I pipe the diff I received from you by mail through the 
"cat | od -c" command, this gives:
0000000   >       1   c   1  \n   >       <       H   e   l   l   o   .
0000020  \n   >       -   -   -  \n   >       >       H   e   l   l   o
0000040   .  \n
0000042
which just means that, in one way, diff output cannot be copied losslessly.

No, this is wrong because I didn't try to copy the output losslessly in the 
above example.
I typed the above by hand. Whatever output you might obtain from "od -c" on my 
previous email
does not show anything but the already known fact that BOM is as invisible as 
spaces.

Please forget the contents of my previous email and run "od -c" on text files
created as I explained. You will see that diff output is correct.
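
The same check can be reproduced without a terminal or a mail client in the
way. What follows is only a minimal sketch in Python, assuming two made-up
file contents that differ by nothing but a leading UTF-8 BOM. difflib stands
in for diff here (it is not GNU diff), and dump() is a hypothetical helper
playing the role of od.

```python
import codecs
import difflib

# Made-up file contents: identical except for a leading UTF-8 BOM.
with_bom = codecs.BOM_UTF8 + b"Hello.\n"
without_bom = b"Hello.\n"


def dump(data: bytes) -> str:
    """Hypothetical helper: hex dump of the bytes, similar in spirit to od."""
    return " ".join(f"{byte:02x}" for byte in data)


# Decode as UTF-8 so that the BOM becomes the single character U+FEFF.
old_lines = with_bom.decode("utf-8").splitlines(keepends=True)
new_lines = without_bom.decode("utf-8").splitlines(keepends=True)

# Keep only the removed ("-") and added ("+") lines of the comparison.
delta = [line for line in difflib.ndiff(old_lines, new_lines)
         if line[0] in "+-"]

print(dump(with_bom))
print(dump(without_bom))
for line in delta:
    # repr() makes the otherwise invisible U+FEFF show up as an escape.
    print(repr(line))
```

Printed plainly, the two changed lines look the same; printed with repr(),
or dumped as bytes, the BOM is there. The comparison reported a real
difference in both cases.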

For a regular user, this does mean the output of diff is simply not
understandable, in at least two ways (first and second).

You seem to imply that every character that might be in the output of "diff" should be 
"visible".
The fact that spaces are invisible invalidates your theory.

Only an advanced geek can understand this diff output.

You would be confused by extra spaces in exactly the same way.
Do you have to be a geek to understand a diff which is caused by extra spaces?

For me, it is because diff output is not compatible with the Unicode specification.

What do you mean by that? Does the Unicode specification mandate that the BOM 
should be visible?
Please give a reference for that.

but if I redirect the output to a file and use patch, the first file
becomes identical to the second file, so the diff is correct.

This just means that diff is a tool compatible with the patch tool. Such 
compatibility is essential.
But one can expect more from a diff command:
For instance, one can expect that its output can be copied losslessly
by mail; this is not the case, as explained above.

Again, you are drawing conclusions by assuming that I used diff output "as is" 
in my previous email.
This is not the case: I typed it by hand. In fact, I put emphasis on the word 
"see" to indicate
that this was *just* what I *saw*.

Third, you suggest storing diff output in a file. I wonder how such a file 
should be considered.
Is this an octet stream (binary file) to be handled with byte
oriented tools? It is not what it looks like.  Is this a character
stream (text file) compliant with UTF-8, to be handled with text
tools? I do not think so, as explained above, because not every byte
sequence is valid UTF-8.
This is ambiguous, and I hope a future version including UTF-8 support will 
make it less ambiguous.

Where do you get the idea that diff does not support UTF-8?
A text file including "ñ" or "é" is diffed correctly.
A text file containing BOM is also diffed correctly.
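
On the question of whether such a file is a valid UTF-8 text file at all,
here is a minimal sketch in Python. The sample bytes are made up for
illustration. A strict decode would raise an error on invalid UTF-8, so a
successful decode suggests that a leading BOM and accented letters are both
ordinary, well-formed UTF-8.

```python
import codecs

# Made-up sample: a BOM followed by a line with non-ASCII letters.
sample = codecs.BOM_UTF8 + "ñ é\n".encode("utf-8")

# errors="strict" raises UnicodeDecodeError if the bytes are not valid UTF-8.
text = sample.decode("utf-8", errors="strict")

# The BOM decodes to one ordinary code point at the start of the text.
first = text[0]
print(f"first code point: U+{ord(first):04X}")
print(f"{len(sample)} bytes decode to {len(text)} characters")
```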

The only problem is that the BOM itself is not visible but I repeat
that diff is not to blame for that.

Are you reporting that BOM is invisible like spaces?
Why should we consider that as a bug in the "diff" program?
If it's invisible, blame the terminal, not diff!

Fourthly, if BOM is invisible like spaces, diff -w should take care of it. This 
is not the case.

The meaning of diff -w is explained in the texinfo manual:

`-w'
`--ignore-all-space'
     Ignore white space when comparing lines.  *Note White Space::.

 "White space" characters include tab, vertical tab,
form feed, carriage return, and space;

Because BOM is neither tab, vertical tab, form feed, carriage return
nor space, diff is right in showing a difference for that.


But I can understand that including BOM in the mix would be useful.
I'll consider that as a wishlist item and will forward it upstream.
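
To make the -w discussion above concrete, here is a minimal sketch in
Python. It is not how GNU diff is implemented. It only models "ignore these
characters when comparing lines", using the character list quoted from the
manual. The names same_ignoring and strip_chars are hypothetical, and the
last call models the wishlist variant, which is an assumption about a
possible future behaviour, not an existing option.

```python
# White space as listed in the manual excerpt quoted above:
# tab, vertical tab, form feed, carriage return, and space.
MANUAL_WHITE_SPACE = "\t\v\f\r "
BOM = "\ufeff"


def strip_chars(line: str, ignored: str) -> str:
    """Hypothetical helper: drop every character of `ignored` from `line`."""
    return "".join(ch for ch in line if ch not in ignored)


def same_ignoring(a: str, b: str, ignored: str = MANUAL_WHITE_SPACE) -> bool:
    """Hypothetical helper: compare two lines after dropping ignored characters."""
    return strip_chars(a, ignored) == strip_chars(b, ignored)


# An extra space is ignored under the documented definition.
space_case = same_ignoring("Hello .", "Hello.")
# A BOM is not in the documented list, so the lines still differ.
bom_case = same_ignoring(BOM + "Hello.", "Hello.")
# Wishlist variant (assumption): add the BOM to the ignored set.
wishlist_case = same_ignoring(BOM + "Hello.", "Hello.", MANUAL_WHITE_SPACE + BOM)

print(space_case, bom_case, wishlist_case)
```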

Fifth, if we can have a message such as «No newline at end of file»,
we should also have a message such as «No BOM at start of file»/«BOM
at start of file».

That would probably complicate the work for the patch program.


To summarize:

I don't see the point in modifying the output of diff without -w,
because the output is correct.

The only thing I see where diff could improve is in the output of
"diff -w". I'll forward that upstream as wishlist.
