[ http://jira.codehaus.org/browse/DOXIA-278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Herve Boutemy closed DOXIA-278.
-------------------------------

Assignee: Herve Boutemy
Resolution: Not A Bug

Auto-detecting an encoding isn't a bullet-proof feature: nobody can reliably detect the encoding of an arbitrary byte stream. The best that can be done is a guess, without any guarantee.

XML encoding selection is possible because the encoding is written into the XML document in a [precise manner|http://www.w3.org/TR/2004/REC-xml-20040204/#sec-guessing], so what XML offers is automatic encoding *selection*, not detection. If the stream's effective encoding differs from the encoding declared in {{<?xml encoding="..."?>}}, there will be broken characters, because the parser uses the encoding the header declares. FYI, there has already been a [long discussion on the Maven dev list|http://www.nabble.com/-VOTE--POM-Element-for-Source-File-Encoding-to16515820.html#a16558356] about this.

The APT format does not provide such a convention: it's pure text, without encoding information. If a convention similar to the XML one were added, for example:

bq. an APT file starting with {{~~ encoding="xxx"}} should be considered as being written in the specified encoding

then we could implement a text reader using it. I don't know whether such a comment at the start of an APT file is compatible with title headers, though...
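To make the proposal concrete, here is a minimal Java sketch, assuming the hypothetical {{~~ encoding="xxx"}} first-line convention quoted above were adopted. It is not part of APT or Doxia today, and {{AptEncodingSketch}}, {{detectCharset}} and {{openAptReader}} are illustrative names, not Doxia API. The reader peeks at the first line, switches to the declared charset if the JVM supports it, and otherwise falls back to a default such as ISO-8859-1. {{main}} also reproduces the broken character from the bug report: the UTF-8 bytes of the copyright symbol (0xC2 0xA9) decoded as ISO-8859-1 yield an extra "Â" before the symbol.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.PushbackInputStream;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AptEncodingSketch {
    // Hypothetical convention (NOT part of APT): first line is ~~ encoding="xxx"
    private static final Pattern ENCODING_COMMENT =
            Pattern.compile("~~\\s*encoding=\"([A-Za-z0-9._:+-]+)\"");
    // Number of bytes peeked at to find the first line.
    private static final int PEEK_LIMIT = 256;

    /** Returns the charset named on the first line, or the fallback if absent or unsupported. */
    static Charset detectCharset(byte[] head, int len, Charset fallback) {
        // ISO-8859-1 maps every byte to one char, so an ASCII header is always readable.
        String text = new String(head, 0, len, StandardCharsets.ISO_8859_1);
        int eol = 0;
        while (eol < text.length() && text.charAt(eol) != '\n' && text.charAt(eol) != '\r') {
            eol++;
        }
        Matcher m = ENCODING_COMMENT.matcher(text.substring(0, eol));
        if (m.lookingAt()) {
            try {
                if (Charset.isSupported(m.group(1))) {
                    return Charset.forName(m.group(1));
                }
            } catch (IllegalArgumentException e) {
                // Illegal charset name: ignore the header and use the fallback.
            }
        }
        return fallback;
    }

    /** Wraps an APT byte stream in a Reader, honoring the hypothetical header. */
    static Reader openAptReader(InputStream in, Charset fallback) throws IOException {
        PushbackInputStream pin = new PushbackInputStream(in, PEEK_LIMIT);
        byte[] head = new byte[PEEK_LIMIT];
        int n = pin.readNBytes(head, 0, PEEK_LIMIT); // Java 9+
        pin.unread(head, 0, n); // push the peeked bytes back; the ~~ line stays an APT comment
        return new InputStreamReader(pin, detectCharset(head, n, fallback));
    }

    static String readAll(Reader r) throws IOException {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[1024];
        int k;
        while ((k = r.read(buf)) != -1) {
            sb.append(buf, 0, k);
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // The reported symptom: UTF-8 bytes of U+00A9 (C2 A9) decoded as ISO-8859-1.
        String wrong = new String("\u00A9".getBytes(StandardCharsets.UTF_8),
                StandardCharsets.ISO_8859_1);
        System.out.println(wrong.equals("\u00C2\u00A9")); // true: a stray U+00C2 precedes the symbol

        // With the hypothetical header, the same bytes decode correctly.
        String apt = "~~ encoding=\"UTF-8\"\nCopyright \u00A9 2009\n";
        try (Reader r = openAptReader(
                new ByteArrayInputStream(apt.getBytes(StandardCharsets.UTF_8)),
                StandardCharsets.ISO_8859_1)) {
            System.out.println(readAll(r).equals(apt)); // true
        }
    }
}
```

A real implementation would still have to settle how this first line interacts with a UTF-8 byte order mark and with APT title headers.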
Last point: the user complaint about encoding problems you pointed to was caused by a real bug, when encoding wasn't properly handled in Doxia and maven-site-plugin. This is fixed now: see MSITE-314 and [POM Element for Source File Encoding|http://docs.codehaus.org/display/MAVENUSER/POM+Element+for+Source+File+Encoding].

> Character encoding autodetection fails for APT source files
> -----------------------------------------------------------
>
> Key: DOXIA-278
> URL: http://jira.codehaus.org/browse/DOXIA-278
> Project: Maven Doxia
> Issue Type: Bug
> Components: Module - Apt
> Affects Versions: 1.0-alpha-11
> Environment: Mac OS X 10.5.6, Java 1.6.0_07
> Reporter: Trevor Harmon
> Assignee: Herve Boutemy
> Attachments: HelloWorld.zip
>
>
> Doxia unnecessarily forces all APT source files to be encoded in ISO-8859-1.
> Files encoded in UTF-8 can have garbage characters as a result. Doxia should
> be able to autodetect the encoding of the APT file to prevent this problem,
> as it already does for XML (see DOXIA-133).
> A test case is attached. It includes two APT source files, one encoded in
> ISO-8859-1 and another encoded in UTF-8. Both contain the copyright symbol.
> To reproduce the problem, simply run "mvn site" on the project and open
> target/site/test-utf8.html and target/site/test-iso-8859-1.html. The file
> encoded in ISO-8859-1 should display the copyright symbol correctly, while
> the one encoded in UTF-8 contains a garbage character immediately before
> the symbol.

--
This message is automatically generated by JIRA.