[
https://issues.apache.org/jira/browse/HADOOP-9801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13726054#comment-13726054
]
Chris Nauroth commented on HADOOP-9801:
---------------------------------------
bq. For the unit test(testMultiByteCharacters) to test the issue even when
running on Linux, we may want to set the default character to non-utf8 in the
test.
Thanks, Brandon! This would be a good idea, but unfortunately, the
documentation I've read indicates that the default character set gets read at
JVM launch time and cached within relevant JDK classes that make use of a
character set. You can change it by passing JVM arguments at process launch
time, but I don't see a way to change it programmatically within a running JVM.
Do you have any other ideas on how to do this? If not, then I think we'll have
to go with the patch as is.
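The caching behavior described above can be demonstrated with a small sketch. This is an illustrative example, not part of the patch: `Charset.defaultCharset()` computes and caches the default charset on first use, so setting the `file.encoding` system property afterwards in a running JVM has no effect.

```java
import java.nio.charset.Charset;

public class DefaultCharsetDemo {
    public static void main(String[] args) {
        // First call computes the default charset from file.encoding
        // (as read at JVM launch) and caches it in a static field.
        Charset before = Charset.defaultCharset();

        // Changing the system property at runtime does not change the
        // cached value, so tests cannot flip the default programmatically.
        System.setProperty("file.encoding", "ISO-8859-1");
        Charset after = Charset.defaultCharset();

        System.out.println(before.equals(after)); // prints "true"
    }
}
```

This is why the test would need `-Dfile.encoding=...` passed on the command line at JVM launch rather than a programmatic switch inside the test method.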
> Configuration#writeXml uses platform default encoding, which may mishandle
> multi-byte characters.
> ----------------------------------------------------------------------------------------------------
>
> Key: HADOOP-9801
> URL: https://issues.apache.org/jira/browse/HADOOP-9801
> Project: Hadoop Common
> Issue Type: Bug
> Components: conf
> Affects Versions: 3.0.0, 1-win, 1.3.0, 2.1.1-beta
> Reporter: Chris Nauroth
> Assignee: Chris Nauroth
> Attachments: HADOOP-9801-branch-1.1.patch, HADOOP-9801-trunk.1.patch
>
>
> The overload of {{Configuration#writeXml}} that accepts an {{OutputStream}}
> does not set encoding explicitly, so it chooses the platform default
> encoding. Depending on the platform's default encoding, this can cause
> incorrect output data when encoding multi-byte characters.
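The general shape of the fix for this class of bug (a sketch, not the attached patch itself) is to wrap the caller's `OutputStream` in a `Writer` constructed with an explicit charset, so the encoding no longer depends on the platform:

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

public class ExplicitEncodingDemo {
    public static void main(String[] args) throws Exception {
        // Two multi-byte characters ("hello" in Chinese).
        String value = "\u4f60\u597d";

        ByteArrayOutputStream out = new ByteArrayOutputStream();
        // Passing an explicit charset avoids the platform-default
        // behavior described in the issue summary.
        try (Writer writer = new OutputStreamWriter(out, StandardCharsets.UTF_8)) {
            writer.write(value);
        }

        // Each of these characters encodes to 3 bytes in UTF-8.
        System.out.println(out.size()); // prints "6"
    }
}
```

On a platform whose default encoding cannot represent these characters, the unwrapped `OutputStream` path would instead emit replacement bytes, corrupting the XML output.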
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira