[ 
http://jira.codehaus.org/browse/DOXIA-278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Herve Boutemy closed DOXIA-278.
-------------------------------

      Assignee: Herve Boutemy
    Resolution: Not A Bug

Auto-detecting encoding isn't a bullet-proof feature: nobody can reliably 
detect the encoding of a byte stream. The best that can be done is a 
guess, without any guarantee.
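
To illustrate why only a guess is possible, here is a minimal Java sketch (the class and method names are made up for this example, they are not Doxia code). The same two bytes are valid both as UTF-8 and as ISO-8859-1, so the byte stream alone does not say which one was meant. It also shows where the extra character reported before the copyright symbol comes from.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of Doxia.
final class EncodingGuessDemo {

    /** Decodes the given bytes with the charset chosen by the caller. */
    static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        // The copyright sign (U+00A9) is stored as the two bytes 0xC2 0xA9 in UTF-8.
        byte[] utf8Bytes = "\u00A9".getBytes(StandardCharsets.UTF_8);

        // Both decodings succeed without error: ISO-8859-1 maps every byte
        // to a character, so it can never be ruled out by inspection alone.
        String asUtf8 = decode(utf8Bytes, StandardCharsets.UTF_8);
        String asLatin1 = decode(utf8Bytes, StandardCharsets.ISO_8859_1);

        System.out.println(asUtf8.length());   // one character: the copyright sign
        System.out.println(asLatin1.length()); // two characters: U+00C2 followed by U+00A9
    }
}
```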

XML encoding selection is possible because the encoding is written into the 
XML document in a [precise 
manner|http://www.w3.org/TR/2004/REC-xml-20040204/#sec-guessing]: here, we have 
automatic encoding *selection*, not detection. If the stream's effective 
encoding is different from the encoding in {{<?xml encoding="..."?>}}, there 
will be broken characters, because the parser uses what it is told in the header.
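
A simplified Java sketch of that selection follows. It is an assumption-laden illustration, not the algorithm of the W3C appendix and not Doxia's XML reader: it only handles ASCII-compatible encodings, ignores byte order marks, and falls back to UTF-8 when no declaration is found. The names {{XmlDeclaredEncoding}}, {{declaredEncoding}} and {{decode}} are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not the Doxia implementation.
final class XmlDeclaredEncoding {

    // Matches the encoding pseudo-attribute of an XML declaration at the start of the input.
    private static final Pattern DECL = Pattern.compile(
        "^<\\?xml[^>]*?\\bencoding\\s*=\\s*[\"']([A-Za-z][A-Za-z0-9._-]*)[\"']");

    /** Returns the encoding named in the XML declaration, or null if there is none. */
    static String declaredEncoding(byte[] bytes) {
        // The declaration itself is ASCII, so a short ASCII view of the head is enough.
        int len = Math.min(bytes.length, 100);
        String head = new String(bytes, 0, len, StandardCharsets.US_ASCII);
        Matcher m = DECL.matcher(head);
        return m.find() ? m.group(1) : null;
    }

    /** Decodes the document with the declared encoding, trusting the header blindly. */
    static String decode(byte[] bytes) {
        String name = declaredEncoding(bytes);
        Charset cs = (name != null && Charset.isSupported(name))
            ? Charset.forName(name)
            : StandardCharsets.UTF_8; // assumed default when nothing usable is declared
        return new String(bytes, cs);
    }
}
```

If a document declares ISO-8859-1 but was actually saved as UTF-8, {{decode}} still follows the header, and each non-ASCII character comes out as two wrong characters.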

FYI, there has already been a [long discussion on the Maven dev 
list|http://www.nabble.com/-VOTE--POM-Element-for-Source-File-Encoding-to16515820.html#a16558356]
 about this.

APT format does not provide such a convention: it is plain text, with no 
encoding information.
If a convention similar to the XML one were added, for example
bq. an APT file starting with {{~~ encoding="xxx"}} should be considered as 
being written in the specified encoding
then we could implement a text reader that uses it.
I don't know whether such a comment at the start of an APT file is compatible 
with title headers, though...
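
A rough Java sketch of what such a reader could look like, purely as an assumption: the {{~~ encoding="xxx"}} first-line convention does not exist in APT today, and {{AptEncodingHeader}} and {{detect}} are hypothetical names. The caller supplies the fallback charset used when the header is absent or names an unsupported charset.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical reader support for a proposed, non-existent APT convention.
final class AptEncodingHeader {

    // First line of the form: ~~ encoding="xxx"
    private static final Pattern HEADER = Pattern.compile(
        "^~~\\s*encoding\\s*=\\s*\"([A-Za-z0-9][A-Za-z0-9._-]*)\"");

    /** Returns the charset named on the first line, or the fallback if none is usable. */
    static Charset detect(byte[] bytes, Charset fallback) {
        // Find the end of the first line.
        int end = 0;
        while (end < bytes.length && bytes[end] != '\n' && bytes[end] != '\r') {
            end++;
        }
        // The header itself is expected to be plain ASCII.
        String firstLine = new String(bytes, 0, end, StandardCharsets.US_ASCII);
        Matcher m = HEADER.matcher(firstLine);
        if (m.find() && Charset.isSupported(m.group(1))) {
            return Charset.forName(m.group(1));
        }
        return fallback;
    }

    /** Decodes the whole file with the selected charset. */
    static String read(byte[] bytes, Charset fallback) {
        return new String(bytes, detect(bytes, fallback));
    }
}
```

As with XML, this would be encoding selection, not detection: a file whose header names the wrong encoding would still be decoded incorrectly.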


Last point: the user you cite who was complaining about encoding problems was 
hitting a real bug: encoding wasn't properly handled in Doxia and 
maven-site-plugin. This is fixed now: see MSITE-314 and [POM Element for Source File 
Encoding|http://docs.codehaus.org/display/MAVENUSER/POM+Element+for+Source+File+Encoding]

> Character encoding autodetection fails for APT source files
> -----------------------------------------------------------
>
>                 Key: DOXIA-278
>                 URL: http://jira.codehaus.org/browse/DOXIA-278
>             Project: Maven Doxia
>          Issue Type: Bug
>          Components: Module - Apt
>    Affects Versions: 1.0-alpha-11
>         Environment: Mac OS X 10.5.6, Java 1.6.0_07
>            Reporter: Trevor Harmon
>            Assignee: Herve Boutemy
>         Attachments: HelloWorld.zip
>
>
> Doxia unnecessarily forces all APT source files to be encoded in ISO-8859-1. 
> Files encoded in UTF-8 can have garbage characters as a result. Doxia should 
> be able to autodetect the encoding of the APT file to prevent this problem, 
> as it already does for XML (see DOXIA-133).
> A test case is attached. It includes two APT source files, one encoded in 
> ISO-8859-1 and another encoded in UTF-8. Both contain the copyright symbol. 
> To reproduce the problem, simply run "mvn site" on the project and open the 
> target/site/test-utf8.html and target/site/test-iso-8859-1.html. The file 
> encoded with ISO-8859-1 should display the copyright symbol correctly, while 
> the one encoded with UTF-8 contains a garbage character immediately before 
> the symbol.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: 
http://jira.codehaus.org/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
