Henri Sivonen wrote:
> > So my proposal is:
> >
> > - For parsing:
> >   - If the first character is a '"', then the escaped syntax is
> >     in use. The filename is enclosed in "..."; inside,
> >     - occurrences of '"' and '%' are escaped as %22 and %25,
> >       respectively,
> >     - other ASCII characters may be escaped in %nn syntax as well,
> >       where nn is the hexadecimal notation (case insignificant)
> >       of the byte value in the ASCII encoding.
> >   - Otherwise, the filename ends at the first ':' or end of line.
>
> The reason for suggesting quoting in the first place was allowing
> absolute URIs as file names in the GNU error format. URIs already use
> % for escaping, so making % special on the layer carrying the URI
> would be very inconvenient, since it would break copy-pasteability and
> human-readability of URIs.

A URI is always presented in escaped form. (RFC 2396, section 2.4.2)
Also, the characters that my proposal requires to be escaped, namely '"',
'%' and newline, are already required to be escaped in URIs.
(RFC 2396, section 2.4.3: '"' and '%' are subsumed under 'delims' and
therefore disallowed in [the escaped form of] URIs.)

Therefore, when you deal with a URI, you should use a different
presentation algorithm within a GNU error message than when you deal
with a filename.

To make things precise, here are the four algorithms:

* To embed a filename as a location in a GNU error message:
  - Determine whether to use the escaped syntax. This is required when
    the filename contains a ':' or newline, or starts with a '"'.
    The escaped syntax may also be used in other cases.
  - If the escaped syntax is used:
    - Determine which US-ASCII characters to escape. The characters
      '"', '%' and newline must be escaped. Other US-ASCII characters
      may be escaped. (Non-ASCII characters should *not* be escaped,
      otherwise a character set identification would be needed for
      parsing. See also RFC 2396, section 2.1.)
    - Output a '"', then for each character in the filename: if the
      character is to be escaped, output it in %nn syntax, where nn is
      the hexadecimal representation of its ASCII code (upper or lower
      case does not matter); otherwise output it unchanged. Finally
      output a '"'.
  - Otherwise, output the filename literally, unmodified.

* To embed a URI or URL as a location in a GNU error message:
  - The URI or URL should not contain '"' or newline characters, since
    it is assumed to be already escaped according to RFC 2396.
  - Determine whether to use the escaped syntax. This is required when
    the URI or URL contains a ':'. The escaped syntax may also be used
    in other cases.
  - If the escaped syntax is used: Output a '"', then the URI or URL,
    then a '"'.
  - Otherwise, output the URI or URL literally, unmodified.

* To parse a filename from a GNU error message:
  - Read a line.
  - If the line starts with '"': There must be a second '"' in the
    line. Take the substring between the first and the second '"'
    (both exclusive). Every '%' in this substring must be followed by
    two hexadecimal digits. Replace every %nn sequence with the
    US-ASCII character with code nn. This yields the filename.
    Continue parsing after the second '"'.
  - Otherwise find the first ':' or, if not found, the end of line.
    The filename extends from the beginning of the line up to this
    point.

* To parse a URI or URL from a GNU error message:
  - Read a line.
  - If the line starts with '"': There must be a second '"' in the
    line. Take the substring between the first and the second '"'
    (both exclusive). This is the URI or URL. Continue parsing after
    the second '"'.
  - Otherwise find the first ':' or, if not found, the end of line.
    The URI or URL extends from the beginning of the line up to this
    point.

Since URIs and URLs (in RFC 2396 escaped syntax) are either output
literally or simply surrounded by double-quotes, copy-pasteability is
guaranteed.

Bruno
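
For illustration, the four algorithms above could be sketched in Python
roughly as follows. This is only a sketch under stated assumptions, not
a reference implementation: the function names are hypothetical, only
the three mandatory characters are escaped when embedding (the proposal
allows escaping more), the line is assumed to arrive without its
trailing newline, and %nn codes above 7F are rejected because the
proposal only covers US-ASCII.

```python
import re

# Characters that must be escaped inside "..." according to the proposal.
MUST_ESCAPE = {'"', '%', '\n'}


def embed_filename(name):
    """Render a filename as the location part of a GNU error message."""
    if not (':' in name or '\n' in name or name.startswith('"')):
        return name  # literal syntax is sufficient
    body = ''.join(
        '%{:02X}'.format(ord(ch)) if ch in MUST_ESCAPE else ch
        for ch in name
    )
    return '"' + body + '"'


def embed_uri(uri):
    """Render an already RFC 2396 escaped URI; it is never re-escaped."""
    if '"' in uri or '\n' in uri:
        raise ValueError('URI is not in RFC 2396 escaped form')
    return '"' + uri + '"' if ':' in uri else uri


def _split_location(line):
    """Return (raw location, rest of line, whether it was quoted)."""
    if line.startswith('"'):
        end = line.find('"', 1)
        if end < 0:
            raise ValueError('missing closing double quote')
        return line[1:end], line[end + 1:], True
    end = line.find(':')
    if end < 0:
        end = len(line)
    return line[:end], line[end:], False


def _unescape(match):
    code = int(match.group(1), 16)
    if code > 0x7F:
        # Assumption of this sketch: non-ASCII codes are an error.
        raise ValueError('escape does not denote a US-ASCII character')
    return chr(code)


def parse_filename(line):
    """Return (filename, rest of line) for a line of a GNU error message."""
    raw, rest, quoted = _split_location(line)
    if not quoted:
        return raw, rest
    # Every '%' must be followed by exactly two hexadecimal digits.
    if re.search(r'%(?![0-9A-Fa-f]{2})', raw):
        raise ValueError("'%' not followed by two hexadecimal digits")
    return re.sub(r'%([0-9A-Fa-f]{2})', _unescape, raw), rest


def parse_uri(line):
    """Return (URI, rest of line); the URI is taken verbatim."""
    raw, rest, _quoted = _split_location(line)
    return raw, rest
```

With these definitions, embed_filename() followed by parse_filename()
gives back the original filename, and a URI passed through embed_uri()
appears in the message byte for byte, merely wrapped in double-quotes.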