Re: uuencode: multi-bytes char in remote file name contains bytes >0x80

Bruce Korb Tue, 05 Jul 2011 09:07:57 -0700

Hi Duhuanpeng,

On 07/05/11 06:44, 張叁 wrote:

Let me try to write something in English.
Please correct my English. :-)


Eric is helping me in some i18n stuff for NTP, hopefully he can help
translate when things become confused.  Please include original
Chinese plus your English so he can detect miscommunication
(Thank you so much for helping, Eric!).

The problem is that a user who uuencodes a file may expect every
byte in the encoded file to be ASCII, but when a non-ASCII file name
appears, that expectation no longer holds.


Either I am not understanding your response, or you misunderstood the points
raised by Bruno, Eli et al.  In any event, before going forward, one needs
to understand several things:

1. why not use pax (or some other standard utility) to create an archive
   that embeds the file name within it?  At that point, the archive
   can be uuencoded for transfer by email with no loss in file names.

2. Assuming that you want a localized file name for this archive file,
   you thus still want to encode the file name for transmission.
   To do this, you would use code like this:
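      dst = res = malloc(2 * strlen(p) + 1);  // keep res: dst advances below
      if (res == NULL)
        abort();  // or report the error however the caller prefers
      while (*p) {
        if (*p == '/') // if I am not mistaken, '/' is always a '/' char
          *(dst++) = '/';
        else
          {
             // go through unsigned char: a plain (unsigned) cast sign-extends
             // bytes >= 0x80 and sprintf would then write 8 hex digits
             sprintf(dst, "%02X", (unsigned)(unsigned char)*p);
             dst += 2;
          }
        p++;
      }
      *dst = '\0';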

3. Any uuencode-ed file with an encoded file name in it would need to
   be marked so that uudecode could cope (translate the encoded name).
   This format change should stay compatible with the POSIX specification
   for uuencode output, e.g. a preamble before the "begin" line rather
   than a change to the begin line itself.  Maybe a prefix line:
      puts("encoded-file-name");  // puts() appends the newline itself
   Eric Blake would be a better person for suggesting ways to "extend"
   the POSIX format.  If this is worth the bother, then adding options
   after the file name on the begin line would surely be "more convenient"....

4. uudecode needs matching changes, to detect the encoded file name.

5. your patch is still based on very, very old code.
   Please base it on current code:
   http://ftp.gnu.org/gnu/sharutils/sharutils-4.11.1.tar.gz
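
A minimal sketch of the matching decode step from point 4, assuming the
hex scheme from point 2 ('/' copied as-is, every other byte written as two
hex digits).  The names hex_value and decode_file_name are hypothetical
and are not taken from sharutils; how uudecode would detect an encoded
name in the first place is left open, as in point 3.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: value of one hex digit, or -1 if c is not one. */
static int hex_value(int c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

/*
 * Hypothetical decoder for the scheme sketched in point 2.
 * Returns a malloc'd string the caller must free, or NULL on
 * allocation failure or malformed input.
 */
char *decode_file_name(const char *src)
{
    /* The decoded name is never longer than the encoded one. */
    char *res = malloc(strlen(src) + 1);
    char *dst = res;

    if (res == NULL)
        return NULL;

    while (*src != '\0') {
        if (*src == '/') {              /* separators were not encoded */
            *dst++ = *src++;
            continue;
        }
        int hi = hex_value((unsigned char)src[0]);
        /* Only look at the second digit if the first one was valid. */
        int lo = (hi < 0) ? -1 : hex_value((unsigned char)src[1]);
        if (lo < 0) {                   /* bad digit or odd length */
            free(res);
            return NULL;
        }
        *dst++ = (char)((hi << 4) | lo);
        src += 2;
    }
    *dst = '\0';
    return res;
}
```

Rejecting malformed input instead of guessing keeps uudecode from
creating a file under a half-decoded name.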

Let's focus on this question before discussing further.

Q: Is it necessary to do this:
     add an option that makes uuencode support file name encoding?

I am also posting my code here, but it is still buggy.
1. strlen may be the wrong way to count the bytes in argv[optind].


It is the correct count, but an output buffer should be allocated.
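
A small sketch of that point: strlen counts bytes up to the terminating
NUL, not characters, so it gives the right input length for sizing the
buffer.  The helper name encoded_name_size is hypothetical, and the
2 * n + 1 size assumes the two-hex-digits-per-byte scheme from point 2.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: bytes needed for the hex-encoded form of name,
 * including the terminating NUL.  Each input byte becomes at most two
 * output bytes ('/' stays a single byte), so this is an upper bound.
 */
size_t encoded_name_size(const char *name)
{
    return 2 * strlen(name) + 1;
}
```

For the two-character name 張叁 stored as UTF-8, strlen reports 6 bytes,
so 13 bytes would be allocated.
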
Regards, Bruce

2011/7/5 Bruno Haible <br...@clisp.org>

    Eli,

     > > An obvious problem with the patch is that it considers a file name
     > > to be a byte sequence. But different users may work in different
     > > locales, with different encodings.

    And users want to see the original filenames. Users don't want to see
    mojibake, that is, a mix of garbled characters (see attached screenshot).

     > Doesn't the same problem exist with the file's data itself?

    No, there is normally no problem with the contents of the files, because
    users have learned to use file formats that are independent of locale.
    When users send images (.jpeg or .png), text documents (.html, .odt,
    .rtf, even .doc), presentations (.pdf, .odp), etc. they have no problem.
    And those few users who receive plain text (.txt) files have the option
    to change the character encoding in the browser they use to view the
    text file (in mozilla: via the View > Encoding menu).

    But when uudecode has created files with garbled names on the receiver's
    disk, there is no program which will magically fix it.

     > IMO, it's not uuencode's problem to solve.  The correspondents need to
     > solve it "by some other means" (TM), for file data as for its name.

    No, communication that matches users' reasonable expectations does not
    work like this.
