[ http://jira.codehaus.org/browse/DOXIA-239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=133205#action_133205 ]
Lukas Theussl commented on DOXIA-239:
-------------------------------------

Some links:

[http://www.w3.org/TR/html4/struct/links.html#h-12.2.1]
[http://www.w3.org/TR/html4/appendix/notes.html#non-ascii-chars]

I think encodeId() should replace non-ASCII characters according to the recommendation in the second link above.

> Handle non-ASCII characters in anchors and id's
> -----------------------------------------------
>
>                 Key: DOXIA-239
>                 URL: http://jira.codehaus.org/browse/DOXIA-239
>             Project: Maven Doxia
>          Issue Type: Bug
>          Components: Core, Documentation, Modules, Sink API
>            Reporter: Lukas Theussl
>
> From DOXIA-236:
> The javadoc for the method HtmlTools.encodeId() mentions the pattern
> [A-Za-z][A-Za-z0-9:_.-]* for its output. To me, this looks like the term
> "letter" is meant to refer to ASCII characters in this context. However, the
> employed method Character.isLetter() will classify characters according to
> the Unicode data file. For instance, the characters "ä" and "ß" are letters
> in the Unicode sense. encodeId() will pass these through to its output,
> violating the ASCII-only pattern stated in its javadoc.
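
To make the suggestion in the comment concrete, here is a minimal sketch, not the actual HtmlTools code. The HTML 4 note on non-ASCII characters recommends representing each character in UTF-8 and escaping every byte as %HH. Since '%' is not part of the pattern [A-Za-z][A-Za-z0-9:_.-]*, the sketch assumes '.' as the escape marker instead; that substitution, the class name AnchorIdSketch, and the 'a' prefix for ids that do not start with a letter are all hypothetical choices made for illustration.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch; not the Doxia implementation of HtmlTools.encodeId(). */
public class AnchorIdSketch {

    private static final String HEX = "0123456789ABCDEF";

    /** True only for A-Z and a-z, unlike Character.isLetter(), which follows Unicode. */
    public static boolean isAsciiLetter(int cp) {
        return (cp >= 'A' && cp <= 'Z') || (cp >= 'a' && cp <= 'z');
    }

    /** True for characters allowed after the first position by the javadoc pattern. */
    public static boolean isAllowed(int cp) {
        return isAsciiLetter(cp) || (cp >= '0' && cp <= '9')
                || cp == ':' || cp == '_' || cp == '.' || cp == '-';
    }

    /** Returns an ASCII-only id matching [A-Za-z][A-Za-z0-9:_.-]*. */
    public static String encodeId(String id) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < id.length(); ) {
            int cp = id.codePointAt(i);
            i += Character.charCount(cp);
            if (isAllowed(cp)) {
                out.append((char) cp);
            } else {
                // Escape each UTF-8 byte as .HH (assumed stand-in for %HH).
                byte[] utf8 = new String(Character.toChars(cp))
                        .getBytes(StandardCharsets.UTF_8);
                for (byte b : utf8) {
                    out.append('.')
                       .append(HEX.charAt((b >> 4) & 0x0F))
                       .append(HEX.charAt(b & 0x0F));
                }
            }
        }
        // Assumed fix-up: the pattern requires a leading ASCII letter.
        if (out.length() == 0 || !isAsciiLetter(out.charAt(0))) {
            out.insert(0, 'a');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Character.isLetter() accepts "ä" (U+00E4), which is the reported problem.
        System.out.println(Character.isLetter('\u00E4'));   // true
        System.out.println(encodeId("Stra\u00DFe"));        // Stra.C3.9Fe
    }
}
```

Because a literal '.' is also allowed in the output, this encoding is not reversible; that is acceptable for anchors that only need to be valid and stable, but a different marker would be needed if ids had to be decoded again.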