[ 
https://jira.codehaus.org/browse/MASSEMBLY-371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=312643#comment-312643
 ] 

Dennis Lundberg commented on MASSEMBLY-371:
-------------------------------------------

More fixes in 
[r1403897|http://svn.apache.org/viewvc?view=revision&revision=1403897].
                
> Converting line endings corrupts ISO-8859-1 files when platform encoding is 
> UTF-8
> ---------------------------------------------------------------------------------
>
>                 Key: MASSEMBLY-371
>                 URL: https://jira.codehaus.org/browse/MASSEMBLY-371
>             Project: Maven 2.x Assembly Plugin
>          Issue Type: Bug
>    Affects Versions: 2.2-beta-2, 2.2
>         Environment: Linux with platform encoding set to UTF-8
>            Reporter: Håvard Wigtil
>            Assignee: Dennis Lundberg
>             Fix For: 2.4
>
>         Attachments: assembly-encoding.zip
>
>
> Converting line endings for a text file encoded in ISO-8859-1 replaces every 
> non-ASCII character with the three characters ï¿½.
> What happens is that the file to be converted is read as text in the platform 
> encoding (apparently in the method readFile in class FileFormatter). When the 
> platform encoding is UTF-8, any non-ASCII byte from ISO-8859-1 is decoded as 
> the Unicode replacement character "�" (U+FFFD, the placeholder for an unknown 
> or broken character). 
> I've attached a small sample project that shows this problem on Linux with 
> platform encoding set to UTF-8.
> I see two possible fixes for this: one is to read the file as bytes and do a 
> search/replace for line endings, and the other is to be able to specify 
> an encoding for a fileset or file.
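
The first fix suggested in the description (treating the file as bytes) can be
sketched roughly as follows. This is only an illustration, not necessarily the
approach taken in the plugin: the class LineEndingSketch and both method names
are hypothetical. It assumes an ASCII-compatible encoding such as ISO-8859-1 or
UTF-8, where the bytes 0x0D and 0x0A only ever mean CR and LF; it would not be
safe for UTF-16. The first method reproduces the corruption described above,
the second converts line endings without decoding the text at all.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

public class LineEndingSketch {

    // Reproduces the reported corruption: bytes that are not valid UTF-8
    // are decoded to U+FFFD, which is then written back as EF BF BD.
    static byte[] roundTripAsUtf8(byte[] input) {
        String decoded = new String(input, StandardCharsets.UTF_8);
        return decoded.getBytes(StandardCharsets.UTF_8);
    }

    // Byte-level conversion: CRLF, a lone CR, and a lone LF are each
    // replaced by lineEnding; every other byte is copied unchanged.
    static byte[] convertLineEndings(byte[] input, byte[] lineEnding) {
        ByteArrayOutputStream out = new ByteArrayOutputStream(input.length);
        for (int i = 0; i < input.length; i++) {
            byte b = input[i];
            if (b == '\r') {
                // Treat CRLF as a single line break by skipping the LF.
                if (i + 1 < input.length && input[i + 1] == '\n') {
                    i++;
                }
                out.write(lineEnding, 0, lineEnding.length);
            } else if (b == '\n') {
                out.write(lineEnding, 0, lineEnding.length);
            } else {
                out.write(b);
            }
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // "Håvard" followed by CRLF, encoded as ISO-8859-1 (å is the byte 0xE5).
        byte[] latin1 = "H\u00e5vard\r\n".getBytes(StandardCharsets.ISO_8859_1);
        byte[] lf = {'\n'};

        byte[] corrupted = roundTripAsUtf8(latin1);
        byte[] converted = convertLineEndings(latin1, lf);

        System.out.println("input length:     " + latin1.length);
        System.out.println("via UTF-8 decode: " + corrupted.length);
        System.out.println("byte-level:       " + converted.length);
    }
}
```

The second fix (an explicit encoding per fileset or file) would instead keep
the text-based conversion and pass the configured charset to the reader and
writer, which also covers encodings that are not ASCII-compatible.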

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://jira.codehaus.org/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

