Re: git-merge-changelog and EOL format on Windows

Eli Zaretskii Mon, 20 Jan 2014 07:09:41 -0800

> Date: Mon, 20 Jan 2014 03:57:40 +0100 (CET)
> From: Bruno Haible <br...@clisp.org>
> Cc: br...@clisp.org
> 
> > no similar explanation is provided for the output.
> 
> The explanation is the general "Newline Guidelines" from unicode.org:
> <http://www.unicode.org/versions/Unicode6.2.0/ch05.pdf#G10213> p. 156
> rule R3a:
>   "If the intended target is unknown, map to the platform
>    newline convention."
> And rule R4a as well.


I doubt that Unicode Newline Guidelines are a good source in this
case.  They are just general guidelines for when no other higher-level
protocols tell us what is TRT.  But even those guidelines say, in rule
R3:

  If the intended target is known, map NLF, LS, and PS depending on
  the target conventions.

And in this case, the intended target _is_ indeed known: it is a Git
repository that clones an upstream one, where files should preserve
their EOL format, by virtue of the Git core.autocrlf setting, and
because the files from the upstream repository will get tarred as-is
into release tarballs on a Posix platform.

> According to the git documentation <http://git-scm.com/book/ch7-1.html>,
> section "core.autocrlf", the setting
>   git config --global core.autocrlf input
> should be perfect for Windows users.

I don't think the way I set up my Git repositories is the issue here:
after all, I used a documented value of a documented option.  But
since you've raised this...

I'm not sure "core.autocrlf input" is really perfect: the
documentation is unclear about what it does in all kinds of corner
cases, like with files that have CRLF EOL format, such as DOS/Windows
batch files, that are kept in the repository.  In particular, if you
commit changes in such a file, will Git convert its CRLF EOLs into
Unix-style LF-only EOLs?  If it will, that will (a) mark every line
modified, and (b) cause the file to have Unix-style EOLs in the
release tarballs, which will preclude the batch files from working
correctly at least on some versions of DOS/Windows.  Also, what if I
create a new file with CRLF EOLs and commit it -- will it keep its EOL
format?  And what happens if I decide to deliberately convert a
versioned file from Unix to DOS EOL format, or vice versa?

I see no answers to these questions in the documentation, so even if
today Git does work as I expect, it might change that in the future
based on some judgment of the maintainers (especially since msysgit is
maintained separately from the upstream Git, and includes quite a lot
of code that is not in the upstream).  Undocumented behavior is
subject to change without notice.  AFAICT, it already did in the
not-so-far past.

Therefore, I believe "core.autocrlf false" _is_ the right choice, as
it can never do any harm by itself.  It leaves the EOL issue to the
user.  And if you use Emacs as your text editor, editing will preserve
the EOL format of existing files, so files that started as Unix EOL
will stay that way.

> I can understand why you don't want
>   git config --global core.autocrlf true
> (namely if you work a lot with Cygwin / Unix tools).

I don't use Cygwin, and native ports of Unix tools I use have no
trouble with CRLF EOLs.  Emacs certainly doesn't, and neither do GCC,
Grep, Diffutils, Patch, etc.

So Cygwin etc. is not the source of the problem that triggered this
patch.

Anyway, IMO "core.autocrlf true" is even worse than "input",
especially with binary files and files with mixed EOLs.  (Yes, I know
about core.safecrlf, but I don't trust any program to be entirely
"safe" when it futzes with the EOL format, and have enough gray hair
to bear witness.)

What triggered this patch was that I used git-merge-changelog with the
msysgit port of Git for Windows, to cherry-pick a changeset from one
branch to another.  See this discussion on gdb-patches:

  https://sourceware.org/ml/gdb-patches/2014-01/msg00348.html

and its 2 follow-ups for the details.

As you might know, Emacs is on its way to switch to Git as its VCS.
Emacs does have files with CRLF EOLs in its repository, does maintain
at least 2 branches with cherry-picks/merges between them being
routine, and some of its contributors (including myself) work on
MS-Windows.  In bzr, the current VCS used by Emacs, the default EOL
handling is the equivalent of Git's "autocrlf false", so users are
accustomed to this setup, and will most probably want to carry it to
Git.  It would be a pity to have this problem get in the way when we
switch.  I was lucky to discover this issue in time to have it
resolved.

> Given the above general rule, I believe that git-merge-changelog is
> not the only program that outputs CR-LFs on Windows, for input that
> had only LF as line terminator. Therefore setting core.autolf to 'true'
> should be the better solution, compared to modifying git-merge-changelog.

I hope the above convinced you otherwise.  In a nutshell, I think the
EOL issue is best left to the user; programs should not make any
changes there, unless specifically instructed by the user.  The
proposed changes will cause git-merge-changelog behave like that.

In general, it is my experience that preserving EOL format is always
either exactly what the users want or at least the lesser evil.

But all of this shifts the focus of the discussion from a concrete
problem with git-merge-changelog to what we think should be the best
setup of Git on Windows.  I came here to talk about the former, not
the latter.

And even if you still think that setting autocrlf to "input" or "true"
is a better choice on Windows, the changes I suggest will support that
configuration as well, because git-merge-changelog will always
preserve the EOL format, be it Unix or DOS/Windows.  By contrast, the
current code does _not_ support the "false" setting, which is a
legitimate user choice advertised by the Git installer on Windows (the
user is asked at installation time to select one of the 3 possible
values) and by Git documentation.  I don't think it would be good for
git-merge-changelog to refuse to support one of the 3 possible
settings, just because that setting is deemed "wrong" or suboptimal in
some sense by someone.

> > + p = strchr (buf, '\n');
> > +
> > + if (p > buf)
> > +   {
> > +     if (p[-1] == '\r')
> > +       result = true;
> > +   }
> > + }
> 
> If, despite your expectations, the first line is longer than 255 bytes,
> p will be NULL, and with a high probability on Linux/x86 (buf being
> stack-allocated, it is at an address between 0x80000000 and 0xC0000000),
> p > buf will evaluate to true, and the next statement will crash.

In the MinGW32 build, the stack size is 2MB, which makes its top be
very far from the MSB (I get addresses like 0x0028fe10).  So this
problem does not exist there.  And on GNU/Linux, the code in question
will not be compiled.

But if you meant to say that

    if (p && p > buf)

is safer, then I agree.

(Obviously, if you are not going to admit these changes, this is a
moot point, so I hope your mentioning that is a sign that the changes
will be accepted.)

Thanks.

Re: git-merge-changelog and EOL format on Windows

Reply via email to