Sam Varshavchik <[email protected]> writes: > char *p=strdup("example.com\xe3"); > err=idna_to_unicode_8z8z(p, &utf8_ptr, 0); ... > When g_utf8_next_char() gets 0xe3, this loop will merrily skip over > the trailing \0 in the C string, and off it goes, into merry-land.
Right.

> Yes, idna_to_unicode_8z8z() is documented as taking valid UTF-8 for
> input.  But, is it unreasonable for me to take an address from an
> E-mail header, and feed it to idna_to_unicode_8z8z(), without having
> to validate it for proper UTF-8-ness?

It has to be validated for proper UTF-8-ness.  The UTF-8 functions in
libidn (copied from glib) assume valid UTF-8 strings.

I agree it is way too easy to end up using libidn the way you did.  I'm
split between improving the documentation to explain the issue and
adding input sanitization to all libidn functions that accept UTF-8
data.  I know IDNA operations are a performance bottleneck in some
environments, and validating UTF-8 takes some CPU time.  But probably
not that much...

/Simon

_______________________________________________
Help-libidn mailing list
[email protected]
https://lists.gnu.org/mailman/listinfo/help-libidn
