Re: special characters in filenames in error messages

Bruno Haible Sat, 06 Dec 2008 07:41:19 -0800

Henri Sivonen wrote:
> >   So my proposal is:
> >
> >   - For parsing:
> >     - If the first character is a '"', then the escaped syntax is
> >       in use. The filename is enclosed in "..."; inside,
> >         - occurrences of '"' and '%' are escaped as %22 and %25,
> >           respectively,
> >         - other ASCII characters may be escaped in %nn syntax as well,
> >           where nn is the hexadecimal notation (case insignificant)
> >           of the byte value in the ASCII encoding.
> >     - Otherwise, the filename ends at the first ':' or end of line.
> 
> 
> The reason for suggesting quoting in the first place was allowing  
> absolute URIs as file names in the GNU error format. URIs already use  
> % for escaping, so making % special on the layer carrying the URI  
> would be very inconvenient, since it would break copy-pasteability and  
> human-readability of URIs.


A URI is always presented in escaped form (RFC 2396, section 2.4.2).
Also, the characters that my proposal requires to be escaped, namely
'"', '%', and newline, are already required to be escaped in URIs.
(RFC 2396, section 2.4.3: '"' and '%' are subsumed under 'delims'
and therefore disallowed in [the escaped form of] URIs.)

Therefore, when you deal with a URI, you should use a different
algorithm of presentation within a GNU error message than when you
deal with a filename.

To make things precise, here are the four algorithms:

* To embed a filename as a location in a GNU error message:
  - Determine whether to use the escaped syntax. This is required when
    the filename contains a ':' or newline, or starts with a '"'.
    The escaped syntax may also be used in other cases.
  - If the escaped syntax is used:
    - Determine which US-ASCII characters to escape. The characters
      '"', '%', newline must be escaped. Other US-ASCII characters may
      be escaped. (Non-ASCII characters should *not* be escaped,
      otherwise a character set identification would be needed for
      parsing. See also RFC 2396, section 2.1.)
    - Output a '"', then for each character in the filename: if the
      character is escaped, output it in %nn syntax, where nn is the
      hexadecimal representation of its ASCII code (upper or lower case
      does not matter). Finally output a '"'.
  - Otherwise, output the filename literally, unmodified.
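
A minimal Python sketch of the filename-embedding steps above. The name
embed_filename is hypothetical, and the sketch makes two choices the
algorithm leaves open: it uses the escaped syntax only when required, and
it escapes only the three mandatory characters.

```python
def embed_filename(filename: str) -> str:
    """Format a filename as the location part of a GNU error message."""
    # Characters that must be escaped inside the quoted form.
    must_escape = {'"', '%', '\n'}

    # The escaped syntax is required in exactly these cases.
    needs_escaped_syntax = (
        ':' in filename or '\n' in filename or filename.startswith('"')
    )
    if not needs_escaped_syntax:
        return filename  # output literally, unmodified

    parts = []
    for ch in filename:
        if ch in must_escape:
            parts.append('%%%02X' % ord(ch))  # %nn, hexadecimal ASCII code
        else:
            parts.append(ch)  # non-ASCII characters are never escaped
    return '"' + ''.join(parts) + '"'
```

For example, embed_filename('a:b.c') yields "a:b.c" including the
double-quotes, while embed_filename('foo.c') yields foo.c unchanged.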

* To embed a URI or URL as a location in a GNU error message:
  - The URI or URL should not contain '"' or newline characters, since it
    is assumed to be already escaped according to RFC 2396.
  - Determine whether to use the escaped syntax. This is required when
    the URI or URL contains a ':'. The escaped syntax may also be used in
    other cases.
  - If the escaped syntax is used: Output a '"', then the URI or URL,
    then a '"'.
  - Otherwise, output the URI or URL literally, unmodified.
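
The URI case is simpler, as this Python sketch illustrates. The name
embed_uri is hypothetical; rejecting a URI that contains '"' or newline
with an exception is an assumption of the sketch, since the algorithm only
states that such input should not occur.

```python
def embed_uri(uri: str) -> str:
    """Format an already-escaped URI or URL as a GNU error location."""
    # Assumption: treat input that violates the precondition as an error.
    if '"' in uri or '\n' in uri:
        raise ValueError('URI is not in RFC 2396 escaped form')

    # The escaped syntax is required when the URI contains a ':'.
    if ':' in uri:
        return '"' + uri + '"'  # no %-rewriting, only surrounding quotes
    return uri  # output literally, unmodified
```

The URI text itself is never modified, so existing %nn sequences in it
pass through as they are.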

* To parse a filename from a GNU error message:
  - Read a line.
  - If the line starts with '"': There must be a second '"' in the line.
    Take the substring from the first to the second '"' (exclusive).
    Every '%' in this substring must be followed by two hexadecimal digits.
    Replace every %nn sequence with the US-ASCII character with code nn.
    This yields the filename. Continue parsing after the second '"'.
  - Otherwise find the first ':' or, if not found, the end of line. The
    filename extends from the beginning of the line up to this point.
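
A Python sketch of the filename-parsing steps above. The name
parse_filename and the (filename, rest) return shape are hypothetical. The
sketch assumes the line is passed without its terminating newline, and it
rejects %nn codes above 7F because the algorithm speaks of US-ASCII
characters only.

```python
import re
from typing import Tuple


def _decode(match: "re.Match") -> str:
    code = int(match.group(1), 16)
    if code > 0x7F:  # assumption: only US-ASCII codes are valid
        raise ValueError('escape is outside US-ASCII')
    return chr(code)


def parse_filename(line: str) -> Tuple[str, str]:
    """Split a GNU error message line into (filename, rest of line)."""
    if line.startswith('"'):
        end = line.find('"', 1)
        if end == -1:
            raise ValueError('missing closing double-quote')
        quoted = line[1:end]
        # Every '%' must be followed by two hexadecimal digits.
        if re.search(r'%(?![0-9A-Fa-f]{2})', quoted):
            raise ValueError('malformed %nn escape')
        filename = re.sub(r'%([0-9A-Fa-f]{2})', _decode, quoted)
        return filename, line[end + 1:]

    # Unquoted: the filename ends at the first ':' or at end of line.
    colon = line.find(':')
    if colon == -1:
        return line, ''
    return line[:colon], line[colon:]
```

The second element of the result is where parsing of the line number and
message would continue.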

* To parse a URI or URL from a GNU error message:
  - Read a line.
  - If the line starts with '"': There must be a second '"' in the line.
    Take the substring from the first to the second '"' (exclusive).
    This is the URI or URL. Continue parsing after the second '"'.
  - Otherwise find the first ':' or, if not found, the end of line. The
    URI or URL extends from the beginning of the line up to this point.
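
The corresponding Python sketch for URIs performs no %nn decoding at all.
The name parse_uri and the (uri, rest) return shape are hypothetical, and
the line is assumed to be passed without its terminating newline.

```python
from typing import Tuple


def parse_uri(line: str) -> Tuple[str, str]:
    """Split a GNU error message line into (URI or URL, rest of line)."""
    if line.startswith('"'):
        end = line.find('"', 1)
        if end == -1:
            raise ValueError('missing closing double-quote')
        # The text between the quotes is the URI, taken verbatim.
        return line[1:end], line[end + 1:]

    # Unquoted: the URI ends at the first ':' or at end of line.
    colon = line.find(':')
    if colon == -1:
        return line, ''
    return line[:colon], line[colon:]
```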

Since URIs and URLs (in RFC 2396 escaped syntax) are either output literally
or simply surrounded by double-quotes, copy-pasteability is guaranteed.

Bruno
