> Date: Mon, 20 Jan 2014 03:57:40 +0100 (CET) > From: Bruno Haible <br...@clisp.org> > Cc: br...@clisp.org > > > no similar explanation is provided for the output. > > The explanation is the general "Newline Guidelines" from unicode.org: > <http://www.unicode.org/versions/Unicode6.2.0/ch05.pdf#G10213> p. 156 > rule R3a: > "If the intended target is unknown, map to the platform > newline convention." > And rule R4a as well.
I doubt that Unicode Newline Guidelines are a good source in this case. They are just general guidelines for when no other higher-level protocols tell us what is TRT. But even those guidelines say, in rule R3: If the intended target is known, map NLF, LS, and PS depending on the target conventions. And in this case, the intended target _is_ indeed known: it is a Git repository that clones an upstream one, where files should preserve their EOL format, by virtue of the Git core.autocrlf setting, and because the files from the upstream repository will get tarred as-is into release tarballs on a Posix platform. > According to the git documentation <http://git-scm.com/book/ch7-1.html>, > section "core.autocrlf", the setting > git config --global core.autocrlf input > should be perfect for Windows users. I don't think the way I set up my Git repositories is the issue here: after all, I used a documented value of a documented option. But since you've raised this... I'm not sure "core.autocrlf input" is really perfect: the documentation is unclear about what it does in all kinds of corner cases, like with files that have CRLF EOL format, such as DOS/Windows batch files, that are kept in the repository. In particular, if you commit changes in such a file, will Git convert its CRLF EOLs into Unix-style LF-only EOLs? If it will, that will (a) mark every line modified, and (b) cause the file to have Unix-style EOLs in the release tarballs, which will preclude the batch files from working correctly at least on some versions of DOS/Windows. Also, what if I create a new file with CRLF EOLs and commit it -- will it keep its EOL format? And what happens if I decide to deliberately convert a versioned file from Unix to DOS EOL format, or vice versa? I see no answers to these questions in the documentation, so even if today Git does work as I expect, it might change that in the future based on some judgment of the maintainers (especially since msysgit is maintained separately from the upstream Git, and includes quite a lot of code that is not in the upstream). Undocumented behavior is subject to change without notice. AFAICT, it already did in the not-so-far past. Therefore, I believe "core.autocrlf false" _is_ the right choice, as it can never do any harm by itself. It leaves the EOL issue to the user. And if you use Emacs as your text editor, editing will preserve the EOL format of existing files, so files that started as Unix EOL will stay that way. > I can understand why you don't want > git config --global core.autocrlf true > (namely if you work a lot with Cygwin / Unix tools). I don't use Cygwin, and native ports of Unix tools I use have no trouble with CRLF EOLs. Emacs certainly doesn't, and neither do GCC, Grep, Diffutils, Patch, etc. So Cygwin etc. is not the source of the problem that triggered this patch. Anyway, IMO "core.autocrlf true" is even worse than "input", especially with binary files and files with mixed EOLs. (Yes, I know about core.safecrlf, but I don't trust any program to be entirely "safe" when it futzes with the EOL format, and have enough gray hair to bear witness.) What triggered this patch was that I used git-merge-changelog with the msysgit port of Git for Windows, to cherry-pick a changeset from one branch to another. See this discussion on gdb-patches: https://sourceware.org/ml/gdb-patches/2014-01/msg00348.html and its 2 follow-ups for the details. As you might know, Emacs is on its way to switch to Git as its VCS. Emacs does have files with CRLF EOLs in its repository, does maintain at least 2 branches with cherry-picks/merges between them being routine, and some of its contributors (including myself) work on MS-Windows. In bzr, the current VCS used by Emacs, the default EOL handling is the equivalent of Git's "autocrlf false", so users are accustomed to this setup, and will most probably want to carry it to Git. It would be a pity to have this problem get in the way when we switch. I was lucky to discover this issue in time to have it resolved. > Given the above general rule, I believe that git-merge-changelog is > not the only program that outputs CR-LFs on Windows, for input that > had only LF as line terminator. Therefore setting core.autolf to 'true' > should be the better solution, compared to modifying git-merge-changelog. I hope the above convinced you otherwise. In a nutshell, I think the EOL issue is best left to the user; programs should not make any changes there, unless specifically instructed by the user. The proposed changes will cause git-merge-changelog behave like that. In general, it is my experience that preserving EOL format is always either exactly what the users want or at least the lesser evil. But all of this shifts the focus of the discussion from a concrete problem with git-merge-changelog to what we think should be the best setup of Git on Windows. I came here to talk about the former, not the latter. And even if you still think that setting autocrlf to "input" or "true" is a better choice on Windows, the changes I suggest will support that configuration as well, because git-merge-changelog will always preserve the EOL format, be it Unix or DOS/Windows. By contrast, the current code does _not_ support the "false" setting, which is a legitimate user choice advertised by the Git installer on Windows (the user is asked at installation time to select one of the 3 possible values) and by Git documentation. I don't think it would be good for git-merge-changelog to refuse to support one of the 3 possible settings, just because that setting is deemed "wrong" or suboptimal in some sense by someone. > > + p = strchr (buf, '\n'); > > + > > + if (p > buf) > > + { > > + if (p[-1] == '\r') > > + result = true; > > + } > > + } > > If, despite your expectations, the first line is longer than 255 bytes, > p will be NULL, and with a high probability on Linux/x86 (buf being > stack-allocated, it is at an address between 0x80000000 and 0xC0000000), > p > buf will evaluate to true, and the next statement will crash. In the MinGW32 build, the stack size is 2MB, which makes its top be very far from the MSB (I get addresses like 0x0028fe10). So this problem does not exist there. And on GNU/Linux, the code in question will not be compiled. But if you meant to say that if (p && p > buf) is safer, then I agree. (Obviously, if you are not going to admit these changes, this is a moot point, so I hope your mentioning that is a sign that the changes will be accepted.) Thanks.