On Fri, 14 Jan 2005 21:02:21 +0100, Marc Lehmann writes:
>   At least one of your files (...) contains a character (.) which
>   is not allowed inside an IMG SRC tag.  See the official URI syntax specs at:
>   http://www.ietf.org/rfc/rfc2396.txt
>   section 2.4.3.  URIs may not contain delimiters such as <, >, #, %, \" or 
>   white space.  iGal can rename all your files to suppress or replace these
>   characters.
>
>The author would be well-advised to actually read the rfc that is being
>referenced, as the very same rfc explains how to encode unsafe
>characters.

your snooty report offends. your holier-than-thou affectation *might* be
acceptable if you had provided a patch for the problem. i see no patch here.

>The message
>is confusing because a program limitation (igal cannot correctly encode uris)
>is misinterpreted as a principal limitation. 

yes and no. for ascii-only urls, you're right: %-encoding things
like <, >, # etc. is safe.
i'll add code for escaping these to igal.
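
for illustration only, a minimal sketch of such escaping for ascii-only
names. it is python rather than igal's own code, and the helper name
escape_src is made up for this example:

```python
from urllib.parse import quote

def escape_src(filename):
    # percent-encode every character outside the unreserved set
    # (letters, digits, "_", ".", "-", "~"); safe="" means that even "/"
    # gets encoded, which is what we want for a bare filename
    return quote(filename, safe="")

print(escape_src("my <pic> #1.jpg"))
```

this covers the delimiters named in the igal message (<, >, #, %, " and
white space), but says nothing yet about characters outside ascii.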

but if your url was iso-8859-1 and outside ascii (eg. äöüß etc), 
then you're wrong: there is a fundamental limitation making all 
character encodings in urls a tricky endeavour, as urls don't transport 
their own character encoding information. 

http://www.w3.org/TR/html40/appendix/notes.html#h-B.2.1 *suggests*
conversion to utf-8 and then a % encoding for urls, and mentions the older 
practice of using just iso-8859-1 and its %-encoding. 

not all webservers distinguish properly between these 
two cases and the mechanism also depends on the web server filesystem (whether
it wants to see iso-8859-1 filenames or whether unicode is expected).
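
to make the ambiguity concrete, here is a rough sketch of the kind of
guessing a receiver would have to do. this is an assumed heuristic for
illustration (try utf-8, fall back to iso-8859-1), not what any
particular web server actually implements, and guess_filename is a
hypothetical name:

```python
from urllib.parse import unquote_to_bytes

def guess_filename(url_path):
    # the url only carries bytes; which charset they are in is not stated
    raw = unquote_to_bytes(url_path)
    try:
        # strict utf-8 decoding fails on most iso-8859-1 byte sequences
        return raw.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        # iso-8859-1 maps every byte to a character, so this never fails
        return raw.decode("iso-8859-1"), "iso-8859-1"

print(guess_filename("bl%F6dian.jpg"))
print(guess_filename("bl%C3%B6dian.jpg"))
```

the guess can be wrong: an iso-8859-1 name whose bytes happen to form
valid utf-8 is silently misread, which is exactly why this is tricky.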

>It is true that IMG SRC cannot contain spaces (For example), but this
>does in no way mean that image filenames were at fault.  

this is silly. the image filename is unrepresentable -> the filename poses
the problem. igal confronts you with a problem report.

"blödian.jpg" can be represented as "bl%F6dian.jpg" or "bl%C3%B6dian.jpg".

both are legal, both are possible, both have been or are in use out there,
either of them will work or fail in a specific situation. for example,
apache on my debian box likes the first and doesn't grok the latter.
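
the two forms can be reproduced with a short python sketch (python's
urllib is used here purely for illustration; igal itself does not work
this way):

```python
from urllib.parse import quote

name = "blödian.jpg"

# older practice: take the iso-8859-1 bytes, then %-encode them
latin1_form = quote(name, encoding="iso-8859-1")

# html 4 appendix B.2.1 suggestion: convert to utf-8 first, then %-encode
utf8_form = quote(name, encoding="utf-8")

print(latin1_form)  # bl%F6dian.jpg
print(utf8_form)    # bl%C3%B6dian.jpg
```

same filename, two different urls, and nothing in either url says which
charset was used.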

so, which of two evils do you want igal to choose? 

i think that suggesting the safe course (ie. to avoid the charset trouble)
to the user is actually a reasonable approach. 

having said that, i'll think about it a bit more and maybe add 
both common encodings to the list of choices.

regards
az


-- 
+ Alexander Zangerl + DSA 42BD645D + (RSA 5B586291)
Rex is to Regina as Vax is to... -- Vadim Vygonets
