[jira] [Commented] (MRESOURCES-171) ISO8859-1 properties files get changed into UTF-8 when filtered

Aaron Digulla (Jira) Wed, 15 Jul 2020 05:47:22 -0700


    [ 
https://issues.apache.org/jira/browse/MRESOURCES-171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17158128#comment-17158128
 ]


Aaron Digulla commented on MRESOURCES-171:
------------------------------------------

Short discussion regarding the default value:

project.build.sourceEncoding:

Pro: It's not a breaking change.

Con: The vast majority of Java developers are not aware that the problem even
exists. Many are US developers who don't care about characters outside the ASCII
charset, so they're not affected. This means that most builds will stay broken
without anyone noticing. Only when translations into other languages are added
will weird things happen, and people will be confused.

ISO-8859-1:

Pro: That's what it should have been all along: java.util.Properties.load(InputStream)
reads properties files as ISO-8859-1.

ISO-8859-1 can pass UTF-8 through unchanged because the encoding is
byte-transparent: every byte value 0x00-0xFF maps to exactly one character and
back, so reading and writing with ISO-8859-1 reproduces the input byte for byte.
So while a human would see garbled sequences instead of umlauts and special
characters, the computer doesn't care. This can only fail when people use
resource filtering and replace a variable with a System property that contains
special characters. Pure ASCII replacements still work. That's the only corner
case where we get the dreaded double encoding of UTF-8 sequences (where you
start to see those Ã characters).
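The byte-transparency argument can be checked directly. The sketch below is a
minimal illustration, not code from the plugin; the class and method names are
hypothetical. It shows that an ISO-8859-1 decode/encode round trip leaves UTF-8
bytes untouched, and that decoding as ISO-8859-1 but writing as UTF-8 produces
the double-encoded "Ã" sequences.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class Latin1RoundTrip {

    // Decode as ISO-8859-1, then encode as ISO-8859-1 again. Every byte value
    // 0x00-0xFF maps to exactly one char U+0000-U+00FF, so this returns the
    // input unchanged for ANY byte sequence, including UTF-8 data.
    static byte[] latin1RoundTrip(byte[] input) {
        String decoded = new String(input, StandardCharsets.ISO_8859_1);
        return decoded.getBytes(StandardCharsets.ISO_8859_1);
    }

    // The failure mode: decode as ISO-8859-1 but write the text back as UTF-8.
    // Each byte >= 0x80 became its own char and is now encoded as two bytes,
    // which is the double encoding that shows up as A-tilde garbage.
    static byte[] latin1ThenUtf8(byte[] input) {
        String decoded = new String(input, StandardCharsets.ISO_8859_1);
        return decoded.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // U+00E4 (a-umlaut) encoded as UTF-8 is the two bytes 0xC3 0xA4.
        byte[] utf8Umlaut = "\u00e4".getBytes(StandardCharsets.UTF_8);

        // Byte-transparent round trip: prints true.
        System.out.println(Arrays.equals(utf8Umlaut, latin1RoundTrip(utf8Umlaut)));

        // Double encoding yields 0xC3 0x83 0xC2 0xA4: prints 4.
        byte[] doubled = latin1ThenUtf8(utf8Umlaut);
        System.out.println(doubled.length);

        // Read back as UTF-8 this is U+00C3 U+00A4, the familiar mojibake: prints true.
        System.out.println(new String(doubled, StandardCharsets.UTF_8).equals("\u00c3\u00a4"));
    }
}
```

The round trip holds for all 256 byte values, which is why a build that reads
and writes resources as ISO-8859-1 never corrupts a UTF-8 file it does not
modify.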

Con: There is a chance that builds will break if people added the wrong
workaround to fix the issue. One such fix is the complex config above; as far
as I can tell, it is compatible with ISO-8859-1 as the default. It can get
messy when people have changed the loading code to use UTF-8.
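To see why changed loading code gets messy, compare the standard
Properties.load(InputStream), which assumes ISO-8859-1, with a typical UTF-8
workaround via Properties.load(Reader). This is a minimal sketch; the class and
method names are hypothetical, and real workarounds may look different. Each
loader only works for files in the encoding it expects, so switching the file
encoding without switching the loader (or vice versa) garbles non-ASCII values.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

public class PropertiesLoading {

    // Standard loading: Properties.load(InputStream) decodes ISO-8859-1.
    static String loadDefault(byte[] file, String key) throws IOException {
        Properties props = new Properties();
        props.load(new ByteArrayInputStream(file));
        return props.getProperty(key);
    }

    // A common workaround: load through a Reader that decodes UTF-8 instead.
    static String loadUtf8(byte[] file, String key) throws IOException {
        Properties props = new Properties();
        try (Reader reader = new InputStreamReader(
                new ByteArrayInputStream(file), StandardCharsets.UTF_8)) {
            props.load(reader);
        }
        return props.getProperty(key);
    }

    public static void main(String[] args) throws IOException {
        String text = "greeting=\u00e4";                                   // value is U+00E4
        byte[] latin1File = text.getBytes(StandardCharsets.ISO_8859_1);   // value byte 0xE4
        byte[] utf8File = text.getBytes(StandardCharsets.UTF_8);          // value bytes 0xC3 0xA4

        // Each loader is correct for the encoding it expects: both print true.
        System.out.println(loadDefault(latin1File, "greeting").equals("\u00e4"));
        System.out.println(loadUtf8(utf8File, "greeting").equals("\u00e4"));

        // Mismatches: UTF-8 bytes read as ISO-8859-1 give U+00C3 U+00A4, and the
        // lone 0xE4 is invalid UTF-8, so it is replaced by U+FFFD. Both print true.
        System.out.println(loadDefault(utf8File, "greeting").equals("\u00c3\u00a4"));
        System.out.println(loadUtf8(latin1File, "greeting").contains("\uFFFD"));
    }
}
```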

That being said, if we chose to keep the default at UTF-8, projects would
silently fail for a long time without anyone noticing. I think this is bad.
When something is broken, it should blow up in a way that people can see and do
something about.

So as I see it, using the correct default (as Java defines it) will break a
small number of builds, but the fix is easy: remove all workarounds.

What I would like is a warning or error when you're affected. Maybe we should
check whether resource filtering is enabled and the file contains characters
with codePoint >= 128, and print a warning in that case?
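The proposed check could look roughly like the sketch below. It is not part of
the Maven Resources Plugin; the class, the warnIfAffected hook, and the message
text are hypothetical. It scans raw bytes rather than decoded characters: in
ISO-8859-1 a byte >= 0x80 is exactly a code point >= 128, and in UTF-8 every
non-ASCII code point consists only of bytes >= 0x80, so the same test works
without knowing which of the two encodings the file uses.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

public class NonAsciiCheck {

    // Returns the 1-based numbers of lines that contain at least one byte >= 0x80,
    // i.e. at least one non-ASCII character in ISO-8859-1 or UTF-8.
    static List<Integer> nonAsciiLines(byte[] content) {
        List<Integer> lines = new ArrayList<>();
        int line = 1;
        boolean flagged = false; // report each line at most once
        for (byte b : content) {
            if (b == '\n') {
                line++;
                flagged = false;
            } else if ((b & 0xFF) >= 0x80 && !flagged) {
                lines.add(line);
                flagged = true;
            }
        }
        return lines;
    }

    // Hypothetical hook: only warn for filtered resources, since the reported
    // problem only shows up when filtering is enabled.
    static void warnIfAffected(Path file, boolean filtering) throws IOException {
        if (!filtering) {
            return;
        }
        List<Integer> lines = nonAsciiLines(Files.readAllBytes(file));
        if (!lines.isEmpty()) {
            System.err.println("[WARNING] " + file + ": non-ASCII characters on lines "
                    + lines + " may be re-encoded by resource filtering;"
                    + " consider Unicode escapes.");
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("test", ".properties");
        Files.write(tmp, "a=1\nb=\u00e4\n".getBytes(StandardCharsets.ISO_8859_1));
        warnIfAffected(tmp, true);   // warns about line 2
        warnIfAffected(tmp, false);  // filtering disabled: no output
        Files.deleteIfExists(tmp);
    }
}
```

Whether this should be a warning or a build failure is the open question above;
the sketch only prints to stderr.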

> ISO8859-1 properties files get changed into UTF-8 when filtered
> ---------------------------------------------------------------
>
>                 Key: MRESOURCES-171
>                 URL: https://issues.apache.org/jira/browse/MRESOURCES-171
>             Project: Maven Resources Plugin
>          Issue Type: Bug
>          Components: filtering
>            Reporter: Alex Collins
>            Priority: Minor
>         Attachments: filtering-bug.zip
>
>
> Create:
> src/main/resources/test.properties
> And add an ISO8859-1 character that is not ASCII (e.g. ä); do not use \uXXXX
> formatting.
> When adding this line:
> <resource><directory>src/main/resources</directory><filtering>true</filtering></resource>
> Expected:
> ISO8859-1 encoded file in jar.
> Actual:
> UTF-8 encoded file in jar.
> ---
> If there are any property files (which can only be ISO8859-1) they appear to 
> be converted into UTF-8 in the jar.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
