All XML validation in Qt is based on XML 1.0 (and not the newer 1.1 standard).
I found at least 3 places where validity is checked:

1. in qxmlstream.cpp:

Method resolveCharRef:

        //checks for validity
        ok &= (s == 0x9 || s == 0xa || s == 0xd || (s >= 0x20 && s <= 0xd7ff)
                || (s >= 0xe000 && s <= 0xfffd)
                || (s >= 0x10000 && s <= QChar::LastValidCodePoint));


Method scanUntil:

        //checks for invalidity
        if (c < 0x20 || (c > 0xFFFD && c < 0x10000)
                || c > QChar::LastValidCodePoint)


2. In qxmlutils.cpp:

bool QXmlUtils::isChar(const QChar c)
{
    return (c.unicode() >= 0x0020 && c.unicode() <= 0xD7FF)
           || c.unicode() == 0x0009
           || c.unicode() == 0x000A
           || c.unicode() == 0x000D
           || (c.unicode() >= 0xE000 && c.unicode() <= 0xFFFD);
}

It is pretty much the same as the checks above, except that it does not accept 
characters in the range 0x10000 - 0x10FFFF.
I think this is a bug, especially because the source refers to the 
standard at http://www.w3.org/TR/REC-xml/#NT-Char, which says:

[2]   Char             ::=   #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD]
                             | [#x10000-#x10FFFF]
      /* any Unicode character, excluding the surrogate blocks, FFFE, and FFFF. */
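
To make the comparison concrete, here is a minimal sketch of production [2] 
written against a full Unicode code point instead of a QChar. The name 
isXml10Char is hypothetical and not part of Qt. It takes a 32-bit value 
because a QChar holds a single 16-bit UTF-16 code unit, so a check on one 
QChar cannot see the range 0x10000 - 0x10FFFF without first combining a 
surrogate pair.

```cpp
#include <cstdint>

// Hypothetical helper (not part of Qt): XML 1.0 production [2] Char,
// applied to a full Unicode code point rather than a UTF-16 code unit.
constexpr bool isXml10Char(std::uint32_t cp)
{
    return cp == 0x9 || cp == 0xA || cp == 0xD      // tab, LF, CR
        || (cp >= 0x20 && cp <= 0xD7FF)             // BMP below the surrogates
        || (cp >= 0xE000 && cp <= 0xFFFD)           // BMP above the surrogates
        || (cp >= 0x10000 && cp <= 0x10FFFF);       // supplementary planes
}
```

With a helper like this, the checks in resolveCharRef and QXmlUtils::isChar 
could in principle share one implementation, assuming the callers can supply 
a decoded code point.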


------------------------------------
Now, I have three questions:

1. Can someone confirm if the check in QXmlUtils is actually a bug?
2. Wouldn't it be better to move these checks to QChar, so that at least there 
is only one implementation?
3. Is there a reason to stick to XML 1.0, or should Qt also implement the 
XML 1.1 standard?
According to the XML 1.1 standard (http://www.w3.org/TR/xml11/#charsets), 
allowed characters are:

[2]   Char             ::=   [#x1-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
      /* any Unicode character, excluding the surrogate blocks, FFFE, and FFFF. */
[2a]  RestrictedChar   ::=   [#x1-#x8] | [#xB-#xC] | [#xE-#x1F] | [#x7F-#x84]
                             | [#x86-#x9F]

So the allowed character range is slightly extended: it now includes all 
characters from 0x0001 to 0x001F, not only tab, LF, and CR. In addition, 
XML 1.1 defines some characters as highly discouraged, but still valid.
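
For illustration, the two XML 1.1 productions quoted above could be sketched 
like this. Both function names are hypothetical and not part of Qt, and the 
code only classifies a code point against [2] and [2a]; it makes no attempt 
to model where a RestrictedChar may appear in a document.

```cpp
#include <cstdint>

// Hypothetical helper (not part of Qt): XML 1.1 production [2] Char.
constexpr bool isXml11Char(std::uint32_t cp)
{
    return (cp >= 0x1 && cp <= 0xD7FF)
        || (cp >= 0xE000 && cp <= 0xFFFD)
        || (cp >= 0x10000 && cp <= 0x10FFFF);
}

// Hypothetical helper (not part of Qt): XML 1.1 production [2a] RestrictedChar.
constexpr bool isXml11RestrictedChar(std::uint32_t cp)
{
    return (cp >= 0x1 && cp <= 0x8)
        || (cp >= 0xB && cp <= 0xC)
        || (cp >= 0xE && cp <= 0x1F)
        || (cp >= 0x7F && cp <= 0x84)
        || (cp >= 0x86 && cp <= 0x9F);
}
```

Note that 0x9, 0xA, 0xD, and 0x85 fall in the gaps of [2a], so they are 
plain Char values and not restricted.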

