[ 
https://issues.apache.org/jira/browse/XERCESC-2016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16790612#comment-16790612
 ] 

Scott Cantor commented on XERCESC-2016:
---------------------------------------

I guess there is odd language in the 5th edition that actually changes the
VersionNum production to allow '1.' [0-9]+, but this change wasn't applied
consistently to the DOM, if that's even allowed or correct to do. So that's
the genesis of the change. It's possible the SAX parser was modified to allow
this but the DOM parser was not.
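
For reference, the 5th edition's VersionNum production is '1.' [0-9]+, so
any 1.x version string is syntactically acceptable in the XML declaration.
A minimal sketch of that check follows. The function name is hypothetical
and is not part of the Xerces-C API; it only illustrates the production,
not how either parser actually implements it.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not Xerces-C API): returns true if `version`
// matches the XML 1.0 5th edition production  VersionNum ::= '1.' [0-9]+
bool matchesFifthEditionVersionNum(const std::string& version) {
    // Need the literal prefix "1." plus at least one digit.
    if (version.size() < 3 || version.compare(0, 2, "1.") != 0) {
        return false;
    }
    // Every remaining character must be an ASCII digit.
    for (std::size_t i = 2; i < version.size(); ++i) {
        if (version[i] < '0' || version[i] > '9') {
            return false;
        }
    }
    return true;
}
```

Under this production "1.0", "1.1" and "1.7" all match, while "2.0" and
"1." do not.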

> XML 1.0 5th edition support
> ---------------------------
>
>                 Key: XERCESC-2016
>                 URL: https://issues.apache.org/jira/browse/XERCESC-2016
>             Project: Xerces-C++
>          Issue Type: Improvement
>          Components: Non-Validating Parser
>         Environment: All
>            Reporter: Rob Cameron
>            Assignee: Alberto Massari
>            Priority: Major
>             Fix For: 3.2.0
>
>         Attachments: diff5e
>
>
> Xerces-C currently applies XML 1.0 4th edition rules to name characters
> in XML 1.0 documents.  XML 1.0 5th edition permits a broader class
> of name characters, based on those permitted in XML 1.1.
> Proposal: that Xerces-C 3.2.0 be updated to include support for XML 1.0
> 5th edition.
> Although our main work is with icXML, we've looked at making this change
> in the original Xerces-C code base so that icXML's support for XML 1.0 5e
> stays compatible with Xerces-C.
> I'm not entirely sure that I've handled everything, but the following change
> works in our tests.  The change plan is below and an svn diff file is
> attached.
> ----------------------------------
> (1)  internal/CharTypeTables.hpp
> Rename gFirstNameChars1_1 to be gFirstNameChars
> Rename gNameChars1_1 to be gNameChars
> (2) util/XMLChar.cpp
> (2a)
>    Update initCharFlagTable1_1() to use the gFirstNameChars, gNameChars
>    Update initCharFlagTable() to use the set-ups from initCharFlagTable1_1()
>      to define gNameCharMask, gNCNameCharMask, and gFirstNameCharMask.
>     //
>     //  Name characters are special. A name is made up of a number of
>     //  different tables and some special case characters.
>     //
>     initOneTable(gNameChars, gNameCharMask);
>     //
>     //  NCName characters are the same as name characters, except that
>     //  the colon is not allowed.
>     //
>     initOneTable(gNameChars, gNCNameCharMask);
>     gTmpCharTable[chColon] &= ~gNCNameCharMask;
>     //
>     //  Then do the first name char
>     //
>     initOneTable(gFirstNameChars, gFirstNameCharMask);
> (2b) #define NEED_TO_GEN_TABLE
> Compile and do a sample run of a Xerces app to generate table.out.
> (2c) Replace the XMLChar1_0::fgCharCharsTable1_0 definition of XMLChar.cpp
> with that from table.out.
> (3) XMLChar.hpp
>     Modify XMLChar1_0::isFirstNameChar, XMLChar1_0::isFirstNCNameChar,
> XMLChar1_0::isNameChar, XMLChar1_0::isNCNameChar
>     to each check for and allow characters in the #x10000-#xEFFFF range
>     else {
>         if ((toCheck >= 0xD800) && (toCheck <= 0xDB7F))
>            if ((toCheck2 >= 0xDC00) && (toCheck2 <= 0xDFFF))
>                return true;
>     }
> (4)  Modify XMLReader::getName and XMLReader::getNCName
>        to allow surrogate pairs in Names and NCNames
>        (i.e., use the version 1.1 logic for both 1.0 and 1.1).
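
The surrogate test in step (3) can be sketched on its own as below. The
high-surrogate range 0xD800-0xDB7F paired with any low surrogate
0xDC00-0xDFFF covers exactly the code points #x10000-#xEFFFF. The
function names are hypothetical and are not the Xerces-C API; 16-bit
code units are assumed.

```cpp
#include <cstdint>

// Hypothetical helper (not Xerces-C API): true if the two UTF-16 code
// units form a surrogate pair encoding a code point in #x10000-#xEFFFF,
// the supplementary range allowed in names by the change plan.
inline bool isSupplementaryNameCharPair(std::uint16_t high, std::uint16_t low) {
    return high >= 0xD800 && high <= 0xDB7F &&
           low  >= 0xDC00 && low  <= 0xDFFF;
}

// Hypothetical helper: decodes a valid surrogate pair to its code point.
// The caller is expected to have validated the pair first.
inline std::uint32_t decodeSurrogatePair(std::uint16_t high, std::uint16_t low) {
    return 0x10000u
         + ((static_cast<std::uint32_t>(high) - 0xD800u) << 10)
         + (static_cast<std::uint32_t>(low) - 0xDC00u);
}
```

The pair (0xD800, 0xDC00) decodes to #x10000 and (0xDB7F, 0xDFFF) decodes
to #xEFFFF, so a high surrogate of 0xDB80 or above falls outside the
permitted name character range.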



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
