Andrius Velykis created DOXIA-480:
-------------------------------------

             Summary: XhtmlBaseParser ignores XHTML default entities
                 Key: DOXIA-480
                 URL: https://jira.codehaus.org/browse/DOXIA-480
             Project: Maven Doxia
          Issue Type: Bug
          Components: Core, Module - Xhtml
    Affects Versions: 1.4
            Reporter: Andrius Velykis
         Attachments: doxia-core-XhtmlBaseParser.patch, doxia-xhtml-entities-bug.zip

XHTML defines a number of default entities that can appear in valid XHTML files
(http://www.w3.org/TR/xhtml1/#h-A2), such as the curly quotation marks &ldquo;
and &rsquo;, and many others.

XhtmlBaseParser, however, ignores XHTML default entities appearing in the
source. This is because it delegates parsing to AbstractXmlParser, which uses a
plain MXParser. MXParser only recognises the five predefined XML entities
(&amp;, &lt;, &gt;, &quot; and &apos;).
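
To illustrate the underlying XML rule, here is a minimal sketch using the JDK's
built-in SAX parser rather than MXParser (an assumption made so the example runs
without extra dependencies; the JDK parser rejects an undeclared entity as a
fatal error, whereas this report observes MXParser-based parsing silently
dropping it). It shows that only predefined entities resolve unless the document
declares the others, here via an internal DTD subset. The class and method names
are illustrative only.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

class PredefinedEntities {

    /** Returns the character data of the document, or null if the parser rejects it. */
    static String textOf(String xml) {
        StringBuilder text = new StringBuilder();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length);
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (SAXParseException e) {
            // An undeclared entity is a well-formedness (fatal) error in XML.
            return null;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return text.toString();
    }

    public static void main(String[] args) {
        // &amp; is one of the five predefined XML entities, so it resolves.
        System.out.println(textOf("<p>&amp;</p>"));
        // &ldquo; is an XHTML entity; without a declaration the parser rejects it.
        System.out.println(textOf("<p>&ldquo;hi&rdquo;</p>"));
        // Declaring the entities (here in an internal DTD subset) makes them resolve.
        System.out.println(textOf("<!DOCTYPE p [<!ENTITY ldquo \"&#8220;\">"
                + "<!ENTITY rdquo \"&#8221;\">]><p>&ldquo;hi&rdquo;</p>"));
    }
}
```

Running main prints "&", then "null" for the undeclared entities, then the
curly-quoted "hi" once the entities are declared.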

Because the XML parser does not resolve these entities, neither does the XHTML
parser, so they are not rendered by the XHTML module. I have attached a sample
Maven site project that uses the XHTML module: the source file contains
double/single quotes, but the generated output does not.

This also affects other parsers that extend XhtmlParser, e.g. MarkdownParser
(see DOXIA-473 for the reported bug), because the Pegdown library used to parse
Markdown generates &ldquo; and similar entities for quotes.

I have attached a patch that fixes this problem. It exposes the XmlPullParser
(MXParser) for configuration before parsing, so that subclasses can define
additional entities; XhtmlBaseParser then adds the default XHTML entities to
the parser. Since MarkdownParser extends XhtmlParser, the patch also fixes
DOXIA-473.
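
The approach described above can be sketched as a template-method hook. This is
a hypothetical, self-contained illustration, not the attached patch:
FakePullParser stands in for MXParser and only mirrors the
XmlPullParser#defineEntityReplacementText(String, String) call, while the hook
name initXmlParser, the toy expand() method and the classes BaseXmlParser and
XhtmlLikeParser are assumptions. Only a handful of the XHTML entities are
registered.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EntityHookSketch {
    public static void main(String[] args) {
        String source = "&ldquo;quoted&rdquo;";
        // Base parser defines no extra entities: references stay unresolved.
        System.out.println(new BaseXmlParser().createParser().expand(source));
        // XHTML-like parser registered the entities in its hook: quotes appear.
        System.out.println(new XhtmlLikeParser().createParser().expand(source));
    }
}

/** Hypothetical stand-in for MXParser: only the entity table matters here. */
class FakePullParser {
    private static final Pattern ENTITY = Pattern.compile("&([A-Za-z][A-Za-z0-9]*);");
    private final Map<String, String> entities = new LinkedHashMap<>();

    /** Mirrors XmlPullParser#defineEntityReplacementText(String, String). */
    void defineEntityReplacementText(String name, String replacementText) {
        entities.put(name, replacementText);
    }

    /** Toy expansion: known entity references are replaced, unknown ones left as-is. */
    String expand(String text) {
        Matcher m = ENTITY.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String replacement = entities.getOrDefault(m.group(1), m.group());
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }
}

/** Hypothetical base parser: creates the pull parser and hands it to a hook before parsing. */
class BaseXmlParser {
    final FakePullParser createParser() {
        FakePullParser parser = new FakePullParser();
        initXmlParser(parser); // subclasses configure the parser here
        return parser;
    }

    /** Hook for subclasses; by default no extra entities are defined. */
    protected void initXmlParser(FakePullParser parser) {
    }
}

/** Hypothetical XHTML parser: registers a few of the XHTML default entities. */
class XhtmlLikeParser extends BaseXmlParser {
    @Override
    protected void initXmlParser(FakePullParser parser) {
        super.initXmlParser(parser);
        parser.defineEntityReplacementText("nbsp", "\u00A0");
        parser.defineEntityReplacementText("lsquo", "\u2018");
        parser.defineEntityReplacementText("rsquo", "\u2019");
        parser.defineEntityReplacementText("ldquo", "\u201C");
        parser.defineEntityReplacementText("rdquo", "\u201D");
    }
}
```

Keeping the hook empty in the base class means plain XML parsers are unaffected;
only subclasses that opt in (here the XHTML-like parser) see the extra entities,
and anything extending that subclass inherits them automatically.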


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://jira.codehaus.org/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
