On Thu, Sep 9, 2010 at 3:48 AM, Daniel Shahaf <d...@daniel.shahaf.name> wrote: > CC += dev@, and let me point you to our patch submission guidelines: > http://subversion.apache.org/docs/community-guide/general.html#patches > > Summary for dev@: allow svnsync to translate non-UTF-8 log messages to UTF-8. > > (more below) > > Daniel Trebbien wrote on Wed, Sep 08, 2010 at 18:58:06 -0700: >> On Wed, Sep 8, 2010 at 4:39 PM, Daniel Trebbien <dtrebb...@gmail.com> wrote: >> > I think that a call to `svn_subst_translate_string` >> > (http://svn.collab.net/svn-doxygen/svn__subst_8h.html#a29) needs to be >> > added in the `svnsync_normalize_revprops` function when `propname` is >> > "svn:log". >> >> After applying the following patch to `subversion/svnsync/sync.c`, I >> can confirm that the "svnsync: Error while replaying commit" error >> disappears, allowing the sync to complete, and that the problem is >> indeed a log message encoding issue: >> diff --git a/subversion/svnsync/sync.c b/subversion/svnsync/sync.c >> index 8f7be9e..c7a5e38 100644 >> --- a/subversion/svnsync/sync.c >> +++ b/subversion/svnsync/sync.c >> @@ -114,6 +114,14 @@ svnsync_normalize_revprops(apr_hash_t *rev_props, >> /* And count this. */ >> (*normalized_count)++; >> } >> + >> + if (strcmp(propname, "svn:log") == 0) >> + { > > Should this use svn_prop_needs_translation()?
Though it does not appear so, the added lines are within the check for `svn_prop_needs_translation`. >> + svn_string_t *new_propval; >> + SVN_ERR(svn_subst_translate_string(&new_propval, propval, >> "ISO-8859-1", pool)); >> + apr_hash_set(rev_props, propname, APR_HASH_KEY_STRING, >> propval = new_propval); > > Please, please, please, no assignment inside a function call. > ^^^^^^^^^ Fixed >> + (*normalized_count)++; >> + } >> } >> } >> return SVN_NO_ERROR; >> >> >> Here I hard-coded the encoding, but I think that the encoding to use >> should be retrieved from the Subversion config file. Now the questions >> are: >> >> 1. What is the best way to pass the `log-encoding` option of the >> [miscellany] section to the `svnsync_normalize_revprops` function? >> > > What is 'log-encoding' normally used for? Only for recoding the commit > message supplied by an editor, right? So I'm not sure we should use it > here, perhaps a new command-line option will be better. (svnsync > $subcommand --source-encoding=%s) I like the idea of adding a command line option, so I think that this is the way to go. Do other properties need to be re-encoded, potentially? I was only thinking about `svn:log`, but perhaps other properties might need re-encoding as well. If not, then I think that the command line option should be more specific (e.g.: `--source-log-encoding=%s`). > And either way, the default behaviour should be to translate nothing. > (If you always translate from latin1, you break syncs for people who, > correctly, used UTF-8 log messages the entire time.) I agree. The default behavior should definitely be to not re-encode the log messages. >> 2. Should `svnsync` always store the log messages in UTF-8 even though >> the original repository log message encoding is different? >> > > You have no choice on the matter: the RA API instructs you to supply it > a UTF-8 log message. (IIRC, even the libsvn_client API expects an > already-UTF-8 message.) > >> 3. Should there be separate configuration options for the log message >> encoding of the source repository and the log message encoding of the >> destination repository? > > No, see (2). > Another question: which line of code raises the "svnsync: Error while replaying commit" error message? I would like to try to make this more helpful.