[
https://issues.apache.org/jira/browse/MRESOURCES-171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17158128#comment-17158128
]
Aaron Digulla commented on MRESOURCES-171:
------------------------------------------
Short discussion regarding the default value:
project.build.sourceEncoding:
Pro: It's not a breaking change.
Con: 99% of all Java developers are not aware that the problem even exists.
Many are US developers who don't care about characters outside the ASCII
charset, so they're not affected. This would mean that most builds will stay
broken without anyone noticing. Only once translations into other languages are
added will weird things happen, and people will be confused.
ISO-8859-1:
Pro: That's what it should have been all along.
ISO-8859-1 can process UTF-8 unchanged since the encoding is binary stable
(every byte of input maps to the same byte of output). So while a human would
see those UTF-8 sequences for umlauts and special characters, the computer
doesn't care. This can only fail when people use resource filtering and try to
replace a variable with a System property with special characters. Pure ASCII
replacements still work. That's the only corner case where we get the dreaded
UTF-8 sequence unrolling (where you start to see those Ã characters).
Con: There is a chance that builds will break if people added the wrong
workaround to fix the issue. One fix would be the complex config above. As far
as I can tell, the fix above is compatible with ISO-8859-1 as default. It can
get messy when people have changed the loading code to use UTF-8.
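To make the binary-stability argument from the "Pro" point concrete, here is a minimal Java sketch. It is not the plugin's code; the class and method names (Latin1RoundTrip, roundTripLatin1, filterAsLatin1) and the ${user} token are made up for illustration. It shows that UTF-8 bytes survive an ISO-8859-1 decode/encode cycle unchanged, and that an ASCII replacement is harmless while a non-ASCII replacement leaves a mixed-encoding file.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class Latin1RoundTrip {

    /** Decodes the bytes as ISO-8859-1 and encodes them again as ISO-8859-1. */
    public static byte[] roundTripLatin1(byte[] input) {
        String decoded = new String(input, StandardCharsets.ISO_8859_1);
        return decoded.getBytes(StandardCharsets.ISO_8859_1);
    }

    /**
     * Hypothetical stand-in for resource filtering: reads the template as
     * ISO-8859-1, replaces a literal token, and writes ISO-8859-1 again.
     */
    public static byte[] filterAsLatin1(byte[] template, String token, String value) {
        String decoded = new String(template, StandardCharsets.ISO_8859_1);
        return decoded.replace(token, value).getBytes(StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // A file that was actually saved as UTF-8 (contains u-umlaut and sharp s).
        byte[] template = "greeting=Gr\u00fc\u00dfe ${user}".getBytes(StandardCharsets.UTF_8);

        // Every byte maps to one char and back, so nothing changes.
        System.out.println(Arrays.equals(template, roundTripLatin1(template)));

        // ASCII replacement: the result is still the valid UTF-8 file one would expect.
        byte[] ascii = filterAsLatin1(template, "${user}", "Bob");
        byte[] expectedAscii = "greeting=Gr\u00fc\u00dfe Bob".getBytes(StandardCharsets.UTF_8);
        System.out.println(Arrays.equals(ascii, expectedAscii));

        // Non-ASCII replacement: the inserted u-umlaut is written as the single
        // byte 0xFC, while the template's own u-umlaut stays the UTF-8 pair 0xC3 0xBC.
        byte[] mixed = filterAsLatin1(template, "${user}", "J\u00fcrgen");
        byte[] expectedMixed = "greeting=Gr\u00fc\u00dfe J\u00fcrgen".getBytes(StandardCharsets.UTF_8);
        System.out.println(Arrays.equals(mixed, expectedMixed));
    }
}
```

The last comparison is the corner case described above: the output is neither clean ISO-8859-1 nor clean UTF-8 once a replacement value contains special characters.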
That being said, if you chose to keep UTF-8 as the default, projects would
silently fail for a long time without anyone noticing. I think this is bad.
When something is broken, it should blow up in a way that people can see and do
something about it.
So as I see it, using the correct default (as Java defines it) will break a
small number of builds but the fix is easy: Remove all workarounds.
What I would like is a warning or error when you're affected. Maybe we should
check for characters with codePoint >= 128, check whether resource filtering
is enabled, and print a warning if both are true?
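A rough sketch of what such a check could look like, in Java. This is not existing plugin code; NonAsciiFilterCheck, containsNonAscii and shouldWarn are hypothetical names. As a simplifying assumption it inspects raw bytes instead of decoded code points: a byte value of 128 or above means a non-ASCII character in both ISO-8859-1 and UTF-8, so the check does not need to know the file's encoding.

```java
import java.nio.charset.StandardCharsets;

public class NonAsciiFilterCheck {

    /** Returns true if any byte lies outside the 7-bit ASCII range. */
    public static boolean containsNonAscii(byte[] content) {
        for (byte b : content) {
            // Mask to an unsigned value; Java bytes are signed.
            if ((b & 0xFF) >= 128) {
                return true;
            }
        }
        return false;
    }

    /** Warn only when filtering is enabled and the resource has non-ASCII bytes. */
    public static boolean shouldWarn(byte[] content, boolean filteringEnabled) {
        return filteringEnabled && containsNonAscii(content);
    }

    public static void main(String[] args) {
        byte[] plain = "key=value".getBytes(StandardCharsets.US_ASCII);
        byte[] umlaut = "key=J\u00fcrgen".getBytes(StandardCharsets.ISO_8859_1);

        System.out.println(shouldWarn(plain, true));
        System.out.println(shouldWarn(umlaut, true));
        System.out.println(shouldWarn(umlaut, false));
    }
}
```

Values written with \uXXXX escapes are pure ASCII on disk, so they would not trigger the warning.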
> ISO8859-1 properties files get changed into UTF-8 when filtered
> ---------------------------------------------------------------
>
> Key: MRESOURCES-171
> URL: https://issues.apache.org/jira/browse/MRESOURCES-171
> Project: Maven Resources Plugin
> Issue Type: Bug
> Components: filtering
> Reporter: Alex Collins
> Priority: Minor
> Attachments: filtering-bug.zip
>
>
> Create:
> src/main/resources/test.properties
> And add an ISO8859-1 character that is not ASCII or UTF-8; do not use \uXXXX
> formatting.
> When adding this line:
> <resource><directory>src/main/resources</directory><filtering>true</filtering></resource>
> Expected:
> ISO8859-1 encoded file in jar.
> Actual:
> UTF-8 encoded file in jar.
> ---
> If there are any property files (which can only be ISO8859-1) they appear to
> be converted into UTF-8 in the jar.