On Mon, May 13, 2019 at 04:17:24PM -0700, Elijah Newren wrote:
> When fast-export encounters a commit with an 'encoding' header, it tries
> to reencode in utf-8 and then drops the encoding header.  However, if it
> fails to reencode in utf-8 because e.g. one of the characters in the
> commit message was invalid in the old encoding, then we need to retain
> the original encoding or otherwise we lose information needed to
> understand all the other (valid) characters in the original commit
> message.

Minor nit: "utf-8" or "UTF-8"?
We mostly write "UTF-8" in Git.

>
> Signed-off-by: Elijah Newren <new...@gmail.com>
> ---
>  builtin/fast-export.c                        |  7 +++++--
>  t/t9350-fast-export.sh                       | 21 ++++++++++++++++++++
>  t/t9350/broken-iso-8859-7-commit-message.txt |  1 +
>  3 files changed, 27 insertions(+), 2 deletions(-)
>  create mode 100644 t/t9350/broken-iso-8859-7-commit-message.txt
>
> diff --git a/builtin/fast-export.c b/builtin/fast-export.c
> index 9e283482ef..7734a9f5a5 100644
> --- a/builtin/fast-export.c
> +++ b/builtin/fast-export.c
> @@ -642,9 +642,12 @@ static void handle_commit(struct commit *commit, struct rev_info *rev,
>       printf("commit %s\nmark :%"PRIu32"\n", refname, last_idnum);
>       if (show_original_ids)
>               printf("original-oid %s\n", oid_to_hex(&commit->object.oid));
> -     printf("%.*s\n%.*s\ndata %u\n%s",
> +     printf("%.*s\n%.*s\n",
>              (int)(author_end - author), author,
> -            (int)(committer_end - committer), committer,
> +            (int)(committer_end - committer), committer);
> +     if (!reencoded && encoding)
> +             printf("encoding %s\n", encoding);
> +     printf("data %u\n%s",
>              (unsigned)(reencoded
>                         ? strlen(reencoded) : message
>                         ? strlen(message) : 0),
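
To spell out what this hunk does, here is a minimal standalone sketch of the
same decision. It is an illustration only, not code from git: the helper name
format_message_header and its buffer-based interface are hypothetical, and it
assumes `reencoded` is NULL when the conversion to UTF-8 failed and `encoding`
is NULL when the commit has no 'encoding' header.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not part of git): write the "encoding"/"data" part
 * of a fast-export commit record into buf.
 *
 *   message   - original commit message bytes, may be NULL
 *   reencoded - message converted to UTF-8, or NULL if conversion failed
 *   encoding  - value of the commit's 'encoding' header, or NULL if absent
 *
 * Returns the number of bytes written, or -1 if buf is too small.
 */
static int format_message_header(char *buf, size_t size,
                                 const char *message,
                                 const char *reencoded,
                                 const char *encoding)
{
    const char *body = reencoded ? reencoded : message ? message : "";
    int n = 0, m;

    assert(buf && size > 0);

    /*
     * Keep the header only when the original bytes are passed through
     * unchanged; a successfully reencoded message is already UTF-8.
     */
    if (!reencoded && encoding)
        n = snprintf(buf, size, "encoding %s\n", encoding);
    if (n < 0 || (size_t)n >= size)
        return -1;

    m = snprintf(buf + n, size - (size_t)n, "data %u\n%s",
                 (unsigned)strlen(body), body);
    if (m < 0 || (size_t)m >= size - (size_t)n)
        return -1;

    return n + m;
}
```

With reencoded == NULL and encoding == "iso-8859-7", the output starts with an
"encoding iso-8859-7" line; in every other case only the "data" line is
emitted, which matches the behaviour before the patch.
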
> diff --git a/t/t9350-fast-export.sh b/t/t9350-fast-export.sh
> index c721026260..4fd637312a 100755
> --- a/t/t9350-fast-export.sh
> +++ b/t/t9350-fast-export.sh
> @@ -118,6 +118,27 @@ test_expect_success 'iso-8859-7' '
>                ! grep ^encoding actual)
>  '
>
> +test_expect_success 'encoding preserved if reencoding fails' '
> +
> +     test_when_finished "git reset --hard HEAD~1" &&
> +     test_config i18n.commitencoding iso-8859-7 &&
> +     echo rosten >file &&
> +     git commit -s -F "$TEST_DIRECTORY/t9350/broken-iso-8859-7-commit-message.txt" file &&
> +     git fast-export wer^..wer >iso-8859-7.fi &&
> +     sed "s/wer/i18n-invalid/" iso-8859-7.fi |
> +             (cd new &&
> +              git fast-import &&
> +              git cat-file commit i18n-invalid >actual &&
> +              # Make sure the commit still has the encoding header
> +              grep ^encoding actual &&
> +              # Verify that the commit has the expected size; i.e.
> +              # that no bytes were re-encoded to a different encoding.
> +              test 252 -eq "$(git cat-file -s i18n-invalid)" &&
> +              # ...and check for the original special bytes
> +              grep $(printf "\360") actual &&
> +              grep $(printf "\377") actual)
> +'
> +
>  test_expect_success 'import/export-marks' '
>
>       git checkout -b marks master &&
> diff --git a/t/t9350/broken-iso-8859-7-commit-message.txt b/t/t9350/broken-iso-8859-7-commit-message.txt
> new file mode 100644
> index 0000000000..d06ad75b44
> --- /dev/null
> +++ b/t/t9350/broken-iso-8859-7-commit-message.txt
> @@ -0,0 +1 @@
> +Pi: ?; Invalid: ?
> \ No newline at end of file
> --
> 2.21.0.782.gd8be4ee826
>
