[ https://issues.apache.org/jira/browse/HADOOP-9801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13724141#comment-13724141 ]
Chris Nauroth commented on HADOOP-9801:
---------------------------------------
Thanks to [~daijy] for finding and reporting this bug via Hive testing on
Windows, where the default encoding is CP-1252.
> Configuration#writeXml uses platform default encoding, which may mishandle
> multi-byte characters.
> ----------------------------------------------------------------------------------------------------
>
> Key: HADOOP-9801
> URL: https://issues.apache.org/jira/browse/HADOOP-9801
> Project: Hadoop Common
> Issue Type: Bug
> Components: conf
> Affects Versions: 3.0.0, 1-win, 2.1.0-beta, 1.3.0
> Reporter: Chris Nauroth
> Assignee: Chris Nauroth
>
> The overload of {{Configuration#writeXml}} that accepts an {{OutputStream}}
> does not set encoding explicitly, so it chooses the platform default
> encoding. Depending on the platform's default encoding, this can cause
> incorrect output data when encoding multi-byte characters.
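
The effect can be reproduced without Hadoop on the classpath. The following is a minimal sketch under stated assumptions: the `writeXml` helper below is hypothetical and is not Hadoop's `Configuration` API. It writes a tiny XML document whose declaration claims UTF-8, once encoded as UTF-8 and once as windows-1252 (the CP-1252 default mentioned above). It then decodes the bytes as UTF-8, as a parser trusting the declaration would.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class EncodingDemo {
  // Hypothetical stand-in for Configuration#writeXml(OutputStream): the XML
  // declaration claims UTF-8, but characters are encoded with the given charset.
  static void writeXml(String value, OutputStream out, Charset charset) throws IOException {
    Writer writer = new OutputStreamWriter(out, charset);
    writer.write("<?xml version=\"1.0\" encoding=\"UTF-8\"?>");
    writer.write("<configuration><property><name>k</name><value>");
    writer.write(value);
    writer.write("</value></property></configuration>");
    writer.flush();
  }

  // Writes with writeCharset, decodes the bytes as UTF-8 (as a parser trusting
  // the declaration would), and returns the text between the <value> tags.
  static String roundTrip(String value, Charset writeCharset) {
    ByteArrayOutputStream buf = new ByteArrayOutputStream();
    try {
      writeXml(value, buf, writeCharset);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    String xml = new String(buf.toByteArray(), StandardCharsets.UTF_8);
    int start = xml.indexOf("<value>") + "<value>".length();
    int end = xml.indexOf("</value>");
    return xml.substring(start, end);
  }

  public static void main(String[] args) {
    String value = "caf\u00e9 \u4f60\u597d"; // non-ASCII text, multi-byte in UTF-8
    Charset cp1252 = Charset.forName("windows-1252");
    System.out.println(value.equals(roundTrip(value, StandardCharsets.UTF_8))); // true
    System.out.println(value.equals(roundTrip(value, cp1252)));                 // false
  }
}
```

With windows-1252, the accented character becomes a single byte that is invalid UTF-8 and the CJK characters are replaced with `?`, so the value read back no longer matches. The general remedy this suggests is to pass an explicit charset, for example `new OutputStreamWriter(out, StandardCharsets.UTF_8)`, rather than relying on `Charset.defaultCharset()`; whether the committed patch takes exactly this form is not stated here.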