Hi Matt

This is definitely worth fixing. I've got a couple of comments below.

On 02/09/2019 15:01, Matt R via GitGitGadget wrote:
From: Matt R <matt...@gmail.com>

The `label` todo command in interactive rebases creates temporary refs
in the `refs/rewritten/` namespace. These refs are stored as loose refs,
i.e. as files in `.git/refs/rewritten/`, therefore they have to conform
with file name limitations on the current filesystem.

This poses a problem in particular on NTFS/FAT, where e.g. the colon
character is not a valid part of a file name.

Being picky, I'll point out that ':' is not valid in refs either. Looking at https://docs.microsoft.com/en-us/windows/win32/fileio/naming-a-file, I think only " and | are not allowed on NTFS/FAT but are valid in refs (see the man page for git check-ref-format for all the details). So the main limitation is actually what git allows in refs.

Let's safeguard against this by replacing not only white-space
characters by dashes, but all non-alpha-numeric ones.

However, we exempt non-ASCII UTF-8 characters from that, as it should be
quite possible to reflect branch names such as `↯↯↯` in refs/file names.

Signed-off-by: Matthew Rogers <matt...@gmail.com>
Signed-off-by: Johannes Schindelin <johannes.schinde...@gmx.de>
---
  sequencer.c              | 12 +++++++++++-
  t/t3430-rebase-merges.sh |  5 +++++
  2 files changed, 16 insertions(+), 1 deletion(-)

diff --git a/sequencer.c b/sequencer.c
index 34ebf8ed94..23f4a0876a 100644
--- a/sequencer.c
+++ b/sequencer.c
@@ -4635,8 +4635,18 @@ static int make_script_with_merges(struct pretty_print_context *pp,
                else
                        strbuf_addbuf(&label, &oneline);
+               /*
+                * Sanitize labels by replacing non-alpha-numeric characters
+                * (including white-space ones) by dashes, as they might be
+                * illegal in file names (and hence in ref names).
+                *
+                * Note that we retain non-ASCII UTF-8 characters (identified
+                * via the most significant bit). They should be all acceptable
+                * in file names. We do not validate the UTF-8 here, that's not
+                * the job of this function.
+                */
                for (p1 = label.buf; *p1; p1++)
-                       if (isspace(*p1))
+                       if (!(*p1 & 0x80) && !isalnum(*p1))
                                *(char *)p1 = '-';

I'm slightly concerned that this opens up the possibility of unexpected effects if two different labels get sanitized to the same string. I suspect it's unlikely to happen in practice, but doing something like percent-encoding non-alphanumeric characters would avoid the problem entirely.
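
To make the collision concrete, here is a rough standalone sketch, not a proposed patch. The helper names (is_ascii_alnum, sanitize_with_dashes, percent_encode_label) are made up for illustration and do not exist in sequencer.c, and the sketch assumes '%' is acceptable in ref names, which I believe is the case according to git check-ref-format.

```c
#include <stddef.h>

/* Hypothetical helper: locale-independent ASCII alphanumeric check. */
static int is_ascii_alnum(unsigned char c)
{
	return (c >= '0' && c <= '9') ||
	       (c >= 'A' && c <= 'Z') ||
	       (c >= 'a' && c <= 'z');
}

/*
 * Hypothetical helper mirroring the loop in the patch: every ASCII
 * non-alphanumeric byte becomes '-', bytes with the high bit set are kept.
 */
static void sanitize_with_dashes(char *s)
{
	for (; *s; s++) {
		unsigned char c = (unsigned char)*s;
		if (!(c & 0x80) && !is_ascii_alnum(c))
			*s = '-';
	}
}

/*
 * Hypothetical alternative: percent-encode every ASCII non-alphanumeric
 * byte as %XX. Because '%' and '-' are themselves encoded, two different
 * inputs can never produce the same output.
 * Returns 0 on success, -1 if `out` is too small.
 */
static int percent_encode_label(const char *in, char *out, size_t outlen)
{
	static const char hex[] = "0123456789ABCDEF";
	size_t n = 0;

	if (!outlen)
		return -1;
	for (; *in; in++) {
		unsigned char c = (unsigned char)*in;
		if ((c & 0x80) || is_ascii_alnum(c)) {
			if (n + 1 >= outlen)
				return -1;
			out[n++] = (char)c;
		} else {
			if (n + 3 >= outlen)
				return -1;
			out[n++] = '%';
			out[n++] = hex[c >> 4];
			out[n++] = hex[c & 0x0f];
		}
	}
	out[n] = '\0';
	return 0;
}
```

With the dash approach, "fix: a" and "fix- a" both end up as "fix--a", whereas the percent-encoded forms stay distinct ("fix%3A%20a" and "fix%2D%20a"). The cost is that the labels in the todo list become harder to read.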

Best Wishes

Phillip

                strbuf_reset(&buf);
diff --git a/t/t3430-rebase-merges.sh b/t/t3430-rebase-merges.sh
index 7b6c4847ad..737396f944 100755
--- a/t/t3430-rebase-merges.sh
+++ b/t/t3430-rebase-merges.sh
@@ -441,4 +441,9 @@ test_expect_success '--continue after resolving conflicts after a merge' '
        test_path_is_missing .git/MERGE_HEAD
  '
+test_expect_success '--rebase-merges with commit that can generate bad characters for filename' '
+       git checkout -b colon-in-label E &&
+       git merge -m "colon: this should work" G &&
+       git rebase --rebase-merges --force-rebase E
+'
  test_done
