Re: filenames in error messages

Micah Cowan Thu, 14 Feb 2008 16:36:33 -0800

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Karl Berry wrote:
>     | "C escapes" means to use the backslash character as escape character.
>     | This is a particularly bad choice, because - as you know - on some systems,
>     | backslashes are used as directory separator.
> 
> That hardly seems an insurmountable objection to me, since
> (1) many such paths will not have any special characters (such as : or a
> control character), and therefore will not be quoted, and therefore the
> \'s will just appear as-is, and
> (2) for those paths which are quoted, the \'s just get doubled. Big deal.
> 
> Using \ is soooooo conventional in these situations.  I'd find it very
> strange for the coding standards to recommend uri-style % escapes.  Of
> course it will suffice, any quoting mechanism will suffice, what is
> *natural* for compiler-like error messages in our world?  \.  IMHO.


Heartily agreed.

Bruno Haible:
> The thing that you want to quote are filenames and URLs. Filenames and
> URLs are special cases of URIs. The syntax of URIs is defined in RFC 3986 [1].
> It uses the percent character as escape character.

I don't believe it's accurate to claim that _filenames_ are special
cases of URIs (RFC 3986 §2.5 refers to filenames as "local names", which
may need transformation in order to be represented as URIs). Certainly,
pathnames aren't, as URIs use forward-slashes, always, to separate URI
components (in which case we'd have no need to worry about backslashes
any longer).

> The use of URI syntax rather than backslash-escaping is also more
> understandable to humans, because all users who use a browser eventually
> see a percent-escaped URL in the main text field. Whereas
> backslash-escaping is known only to programmers, a small minority among
> the users.

I couldn't agree less. I believe there are many more people who will
grok something like "\"My\"\ File" than people who can make sense of
"%22My%22%20File", especially if they have some idea of what the naked
filename looked like.

C escapes have the distinct advantage over URI percent-encoding that
most escaped characters still give a direct expression of what the
original character was: " -> \". The only characters that end up hex- or
octal-encoded are ones that wouldn't have been readable in the first
place (and even then, not all of those: NL -> \n, FF -> \f).
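
A minimal sketch of the comparison above, in Python. The helper name
c_quote and its escape table are hypothetical, not taken from any GNU
tool; it assumes the name arrives as bytes and that the result is wrapped
in double quotes, so a space is left as-is. urllib.parse.quote is the
standard library's percent-encoder.

```python
from urllib.parse import quote

# Hypothetical escape table: characters that keep a readable C escape.
_NAMED = {
    '"': '\\"', "\\": "\\\\",
    "\a": "\\a", "\b": "\\b", "\f": "\\f", "\n": "\\n",
    "\r": "\\r", "\t": "\\t", "\v": "\\v",
}

def c_quote(name: bytes) -> str:
    """Return name wrapped in double quotes, using C-style escapes."""
    out = []
    for byte in name:
        ch = chr(byte)
        if ch in _NAMED:
            # Readable escape: the original character is still visible.
            out.append(_NAMED[ch])
        elif 0x20 <= byte < 0x7F:
            # Printable ASCII is passed through untouched.
            out.append(ch)
        else:
            # Anything else was unreadable anyway: three-digit octal.
            out.append("\\%03o" % byte)
    return '"' + "".join(out) + '"'

c_style = c_quote(b'"My" File')          # "\"My\" File"
uri_style = quote('"My" File', safe="")  # %22My%22%20File
```

In the C-style result every character of the original name is still
recognizable; in the percent-encoded one the quotes and the space are not.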

There's also the fact that URI percent-encoding is context-sensitive; in
particular, %2F and / do not have the same meaning (the first would be
used for a literal / _within_ a path component; the latter to separate
path components); and a question mark within the path would need to be
encoded, whereas a question mark that delimits, or appears within, a
query string should not be.
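
The context-sensitivity can be seen with Python's urllib.parse (a sketch
only; the example.org URL and the component names are made up for
illustration):

```python
from urllib.parse import quote, unquote, urlsplit

# Within one path component, a literal "/" has to be encoded ...
component = quote("a/b", safe="")

# ... whereas the "/" between components is a delimiter and stays bare.
path = "/".join(quote(part, safe="") for part in ["dir", "a/b"])

# Decoding the whole path blindly loses that distinction: the result
# now looks like three components instead of two.
flattened = unquote(path)

# A "?" inside the path must be written %3F; the first bare "?" starts
# the query, and a further "?" inside the query is left alone.
parts = urlsplit("http://example.org/dir/what%3F?q=a?b")
path_part, query_part = parts.path, parts.query
```

So an encoder has to know which part of the URI it is looking at before
it can decide what to escape.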


All that being said, I think representing URLs and other URIs in
anything other than percent-encoding is begging for major confusion.
That _is_ the accepted encoding mechanism for encoding characters within
URIs. So routines that know that their arguments are URIs ought to
percent-encode them (but then, of course, they should already _be_
properly encoded, else they're not valid URIs).

Routines that know only that it's a "filename", or that know nothing at
all, are much better off IMO to use some other quoting mechanism, and
C-style quoting seems an awfully good choice (in no small part because,
as Karl says, it's "soooooo conventional").

- --
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer...
http://micah.cowan.name/
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.7 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFHtN6B7M8hyUobTrERAltpAJ4uste1DmgUw8GnJC0aIk8T5+w/SwCfX0uV
CyYXtt/F4JUq/Bye3ZvCflM=
=sptW
-----END PGP SIGNATURE-----
