I was assuming urllib.quote/unquote would only be called on text
intended to be used in non-hostname portions of the URIs. I'm not sure
if this is the actual intent of urllib.quote and perhaps the
documentation should be updated to specify what precisely it does and
then people can decide w
> If this is indeed the case, it sounds perfectly legal (according to the
> RFC) and perfectly practical (as required by numerous popular websites)
> to have urllib.quote and urllib.quote_plus do an automatic UTF-8
> encoding of unicode strings before percent encoding them.
It's probably legal, bu
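
For illustration, here is a minimal sketch of the "UTF-8 encode first, then percent-encode" behaviour proposed above. It assumes Python 3, where the functions live in urllib.parse rather than in the urllib module the thread is discussing; the helper names quote_utf8 and quote_plus_utf8 and the sample strings are made up for this example.

```python
from urllib.parse import quote, quote_plus


def quote_utf8(text, safe="/"):
    """Encode text as UTF-8, then percent-encode the resulting bytes.

    Hypothetical helper: the explicit encode step mirrors the proposal
    in the thread rather than any existing urllib function.
    """
    return quote(text.encode("utf-8"), safe=safe)


def quote_plus_utf8(text, safe=""):
    """Same idea, but spaces become '+' as in form-style query strings."""
    return quote_plus(text.encode("utf-8"), safe=safe)


print(quote_utf8("/music/Björk"))      # /music/Bj%C3%B6rk
print(quote_plus_utf8("Sigur Rós"))    # Sigur+R%C3%B3s
```

This only covers path and query text. It does nothing sensible for hostnames, which is the part still under discussion.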
> Maybe I didn't understand the RFC quite right, but it seemed like how to
> handle hostnames was left as a choice between IDNA encoding the hostname
> or replacing the non-ascii characters with dashes? I guess in practice
> IDNA is the right decision.
I haven't fully understood it, either, but I
Maybe I didn't understand the RFC quite right, but it seemed like how
to handle hostnames was left as a choice between IDNA encoding the
hostname or replacing the non-ascii characters with dashes? I guess in
practice IDNA is the right decision.
Another part I wasn't clear on is whether urll
"Martin v. Löwis" wrote:
> The proper way to implement this would be IRIs (RFC 3987),
> in particular section 3.1. This is not as simple as just
> encoding it as UTF-8, as you might have to apply IDNA to
> the host part.
>
> Code doing so just hasn't been contributed yet.
But if someone wanted to
I may be missing something, but it seems that RFC 3987 (which is about
IRIs) basically says:
1) IRIs are identical to URIs except they may have unicode characters
in them
2) IRIs must be converted to URIs before being used in HTTP
3) The way to convert IRIs to URIs is to UTF-8 encode the uni
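
The conversion step summarised above can be sketched roughly as follows, assuming Python 3. The function name escape_non_ascii is hypothetical, and the sketch only handles the non-hostname parts of an IRI: it leaves every ASCII character (including existing %XX escapes) untouched and ignores the normalization questions the RFC also raises.

```python
def escape_non_ascii(component):
    """Percent-encode only the non-ASCII characters of an IRI component.

    Each non-ASCII character is converted to its UTF-8 bytes and every
    byte is written as %XX. ASCII characters pass through unchanged, so
    already-escaped sequences are not escaped a second time.
    """
    pieces = []
    for ch in component:
        if ord(ch) < 128:
            pieces.append(ch)
        else:
            pieces.extend("%%%02X" % byte for byte in ch.encode("utf-8"))
    return "".join(pieces)


print(escape_non_ascii("/wiki/Zürich?x=a%20b"))  # /wiki/Z%C3%BCrich?x=a%20b
```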
> -Original Message-
> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED] On Behalf
> Of Jeroen Ruigrok van der Werven
> Sent: Wednesday, May 07, 2008 05:20
> To: Tom Pinckney
> Cc: python-dev@python.org
> Subject: Re: [Python-Dev] urllib unicode handling
>
Hi,
Jeroen Ruigrok van der Werven in-nomine.org> writes:
> Would people object if such functionality got added to urllib?
I would ;-) There are IRIs, just that nobody wrote a useful module for that.
There are algorithms in the RFC that can convert URIs to IRIs and the other way
round. IMO tha
-On [20080507 04:06], Tom Pinckney ([EMAIL PROTECTED]) wrote:
>While in theory UTF-8 is not a standard, sites like Last.fm, Facebook and
>Wikipedia seem to have embraced it (as have pretty much all other major web
>sites). As with HTML, there is what the standard says and what the actual
>browse
> Thanks for any thoughts on this,
The proper way to implement this would be IRIs (RFC 3987),
in particular section 3.1. This is not as simple as just
encoding it as UTF-8, as you might have to apply IDNA to
the host part.
Code doing so just hasn't been contributed yet.
Regards,
Martin
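
To make the point about the host part concrete, here is a rough sketch of an IRI-to-URI conversion that applies IDNA to the hostname and UTF-8 percent-encoding to everything else. It assumes Python 3 and its built-in "idna" codec (which implements the older IDNA 2003 rules); the name iri_to_uri and the choice of safe characters are assumptions of this example, not code from urllib. It drops any user:password part and is not a complete RFC 3987 implementation.

```python
from urllib.parse import quote, urlsplit, urlunsplit

# Characters left alone outside the host part; '%' is kept so that
# existing escapes are not escaped again. This set is an assumption.
_SAFE = "/%:@!$&'()*+,;=?~"


def iri_to_uri(iri):
    """Convert an IRI string to an ASCII-only URI string (sketch)."""
    parts = urlsplit(iri)

    # Host part: IDNA, not percent-encoding.
    host = parts.hostname or ""
    netloc = host.encode("idna").decode("ascii")
    if parts.port is not None:
        netloc = "%s:%d" % (netloc, parts.port)

    # Remaining parts: UTF-8 encode, then percent-encode.
    path = quote(parts.path, safe=_SAFE)
    query = quote(parts.query, safe=_SAFE)
    fragment = quote(parts.fragment, safe=_SAFE)
    return urlunsplit((parts.scheme, netloc, path, query, fragment))


print(iri_to_uri("http://bücher.example/café?q=ü"))
# http://xn--bcher-kva.example/caf%C3%A9?q=%C3%BC
```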