Hi Alberto, Thank you for your quick reply ! I'll apply the same patch in order to spare the same confusion from other people in the future might they bump into the same code :)
Cheers, Amit On Fri, Jul 20, 2012 at 10:15 PM, Alberto Massari < [email protected]> wrote: > Hi Amit, > thanks for spotting this; even if the code was not going to crash (as the > first character after the authority is still in the allocated memory and > must be a / or a NULL, and both would have failed the isHex test, avoiding > that the second isHex would access memory after the end of the string) I > added the extra check to make it clear. > > Thanks, > Alberto > > Il 19/07/2012 23:00, Amit K ha scritto: > >> Hi, >> >> >> I am using the C++ distribution of xerces 2.7, the implementation of >> the function in question in it is: >> >> >> bool XMLUri::**isValidRegistryBasedAuthority(**const XMLCh* const >> authority, >> >> const int authLen) >> { >> // check authority int index = 0; >> while (index < authLen) >> { >> if (isUnreservedCharacter(**authority[index]) || >> (XMLString::indexOf(REG_NAME_**CHARACTERS, >> authority[index]) != -1)) >> { >> index++; >> } >> else if (authority[index] == chPercent) // '%' >> { >> *if (XMLString::isHex(authority[**index+1]) && // 1st >> >> hex XMLString::isHex(authority[**index+2]) ) // 2nd >> hex index +=3;* >> >> else >> return false; >> } >> else >> return false; >> } //while >> return true; >> } >> >> >> I've boldened the lines which I want to further discuss here. >> >> and also note that I've seen that the implementation is the same in >> the latest version of xerces. >> >> However, I noticed that the Java implementation is different in >> relation to the same lines: >> >> /** >> + * Determines whether the given string is a registry based >> authority. >> + * >> + * @param authority the authority component of a URI >> + * >> + * @return true if the given string is a registry based authority >> + */ >> + private boolean isValidRegistryBasedAuthority(**String authority) { >> + int index = 0; >> + int end = authority.length(); >> + char testChar; >> + >> + while (index < end) { >> + testChar = authority.charAt(index); >> + >> + // check for valid escape sequence >> + if (testChar == '%') { >> + *if (index+2 >= end || >> + !isHex(authority.charAt(index+**1)) || >> + !isHex(authority.charAt(index+**2))) {* >> >> + return false; >> + } >> + index += 2; >> + } >> + // can check against path characters because the set >> + // is the same except for '/' which we've already excluded. >> + else if (!isPathCharacter(testChar)) { >> + return false; >> + } >> + ++index; >> + } >> + return true; >> + } >> >> >> The important difference is of course, the bounds check on the string >> / character array. Why is it omitted in the C++ version ? >> >> I thought it might be because the string can be assume to be null >> terminated in the C++ version.. but I can't be sure whether it's just >> a bug or not. >> >> I would thank your reply >> >> >> Sincerely, >> >> Amit >> . >> >> >
