On 07/05/2011 08:40 AM, Bruce Korb wrote:
> 2. Assuming that you want a localized file name for this archive file,
>    you thus still want to encode the file name for transmission.
>    To do this, you would use code like this:
>      dst = malloc(2 * strlen(p) + 1);
>      while (*p) {
>        if (*p == '/') // if I am not mistaken, '/' is always a '/' char

The next version of POSIX will require that '/' and '.' be unambiguous
across all POSIX encodings supported by all locales on a system (it was
a happy accident that no POSIX system has attempted to do otherwise).
It will also further clarify that yes, filenames are not necessarily
character strings in all locales, unless those filenames are drawn
solely from the portable filename character set.  See
http://austingroupbugs.net/view.php?id=291

There are, however, some non-POSIX encodings where '/' can appear as
the second byte of a multi-byte character in a stateful (shift-state)
encoding such as ISO-2022-JP-2, although they are rare in practice
these days.  Also, if you worry about systems where backslash is a
directory separator, there are encodings such as Shift_JIS where '\\'
can appear as the second byte within a multi-byte character (hence,
'\\' is ambiguous, even though '/' is not).

> 3. Any uuencode-ed file with an encoded file name in it would need to
>    be marked so that uudecode could cope (translate the encoded name).
>    This format change should be compatible with POSIX specifications
>    for the uuencode output.  e.g. a preamble to the "begin"
>    line and not be part of that begin line?  Maybe a prefix line:
>      puts("encoded-file-name\n");
>    Eric Blake would be a better person for suggesting ways to "extend"
>    the POSIX format.  If this is worth the bother, then adding options
>    after the file name on the begin line would surely be "more
>    convenient"....

I'm not quite sure what you are asking me to do here.  Maybe it helps
to read the current POSIX requirements on uuencode output:

http://pubs.opengroup.org/onlinepubs/9699919799/utilities/uuencode.html

Note this statement:

"The standard output shall be a text file"

but if filename is _not_ a character string in the current locale, then
the output would _not_ be a text file (among other things, a text file
has the property that at least one locale can interpret every byte
sequence in the file as valid characters).

At which point, we are no longer constrained by POSIX, and can arguably
do whatever we want!  That is, supporting file names that contain
characters outside of the portable file name character set (a-z, A-Z,
0-9, '.', '_', and '-', plus '/' as the separator) is already outside
the realm of what POSIX requires uuencode to support.  It would be just
as reasonable for uuencode to refuse to operate on such file names as it
would be for uuencode to emit some sort of header that tells uudecode
how to try to decode a string back into characters appropriate for the
current locale.

>> 1. strlen may be wrong to count how many bytes in argv[optind].

No, strlen is _always_ the way to count how many bytes are in an
element of argv, since each argv entry is always a NUL-terminated
sequence of bytes (which might, but is not required to, have meaning
when interpreted as multi-byte characters under the current locale).

-- 
Eric Blake   ebl...@redhat.com    +1-801-349-2682
Libvirt virtualization library http://libvirt.org
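
To make the last two points concrete, here is a minimal sketch in C.
It is not code from uuencode, and the helper names name_is_portable
and name_byte_length are made up for illustration.  It assumes an
ASCII-compatible execution character set, and it treats '/' as
acceptable alongside the portable filename character set, as discussed
above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of uuencode or POSIX): returns true if
 * every byte of name is in the portable filename character set
 * (A-Z, a-z, 0-9, '.', '_', '-') or is the '/' separator.
 * An empty string trivially passes. */
static bool name_is_portable(const char *name)
{
    static const char portable[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
        "abcdefghijklmnopqrstuvwxyz"
        "0123456789._-/";

    assert(name != NULL);
    /* strspn returns the length of the leading run of bytes drawn from
     * the set; the name is portable when that run reaches the NUL. */
    return name[strspn(name, portable)] == '\0';
}

/* Hypothetical helper: byte length of an argv element.  strlen counts
 * bytes up to the terminating NUL regardless of how the current locale
 * would group those bytes into characters. */
static size_t name_byte_length(const char *name)
{
    assert(name != NULL);
    return strlen(name);
}
```

A caller could use name_is_portable(argv[optind]) to decide whether the
name is safe to emit on the "begin" line as-is, and fall back to
refusing or to some encoding otherwise.  Either way,
name_byte_length(argv[optind]) is the right size to use when
allocating a buffer for the name.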