[Python-Dev] patch 1462525 or similar solution?

2006-10-31 Thread Paul Jimenez

I submitted patch 1462525 a while back to
solve the problem described even longer ago in
http://mail.python.org/pipermail/python-dev/2005-November/058301.html
and I'm wondering what my appropriate next steps are. Honestly, I don't
care if you take my patch or someone else's proposed solution, but I'd
like to see something go into the stdlib so that I can eventually stop
having to ship custom code for what is really a standard problem.

  --pj


___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


[Python-Dev] New uriparse.py

2006-04-02 Thread Paul Jimenez

Announcing uriparse.py, submitted for inclusion in the standard library.
Patch request 1462525.
Per the original discussion at
http://mail.python.org/pipermail/python-dev/2005-November/058301.html
I'm submitting a library meant to deprecate the
existing urlparse library. Questions and comments welcome.

  --pj



[Python-Dev] uriparsing library (Patch #1462525)

2006-05-30 Thread Paul Jimenez

http://sourceforge.net/tracker/index.php?func=detail&aid=1462525&group_id=5470&atid=305470
 

This is just a note to ask when the best time to try to get this in is.
I've seen other new/changed libs going in for 2.5 and wanted to make
sure this didn't fall off the radar. If now's the wrong time, please let
me know when the right time is so I can stick my head up again then.

Thanks,

  --pj




Re: [Python-Dev] Some more comments re new uriparse module, patch 1462525

2006-06-03 Thread Paul Jimenez
On Friday, Jun 2, 2006, John J Lee writes:
>[Not sure whether this kind of thing is best posted as tracker comments 
>(but then the tracker gets terribly long and is mailed out every time a 
>change happens) or posted here.  Feel free to tell me I'm posting in the 
>wrong place...]

I think this is a fine place - more googleable, still archived, etc.

>Some comments on this patch (a new module, submitted by Paul Jimenez, 
>implementing the rules set out in RFC 3986 for URI parsing, joining URI 
>references with a base URI etc.)
>
>http://python.org/sf/1462525

Note that like many opensource authors, I wrote this to 'scratch an
itch' that I had... and am submitting it in hopes of saving someone else
somewhere some essentially identical work. I'm not married to it; I just
want something *like* it to end up in the stdlib so that I can use it.

>Sorry for the pause, Paul.  I finally read RFC 3986 -- which I must say is 
>probably the best-written RFC I've read (and there was much rejoicing).

No worries.  Yeah, the RFC is pretty clear (for once) :)

>I still haven't read 3987 and got to grips with the unicode issues 
>(whatever they are), but I have just implemented the same stuff you did, 
>so have some comments on non-unicode aspects of your implementation (the 
>version labelled "v23" on the tracker):
>
>
>Your urljoin implementation seems to pass the tests (the tests taken from 
>the RFC), but I have to admit I don't understand it :-)  It doesn't seem 
>to take account of the distinction between undefined and empty URI 
>components.  For example, the authority of the URI reference may be empty 
>but still defined.  Anyway, if you're taking advantage of some subtle 
>identity that implies that you can get away with truth-testing in place of 
>"is None" tests, please don't ;-) It's slower than "is [not] None" tests 
>both for the computer and (especially!) the reader.

First of all, I must say that urljoin is my least favorite part of this
module; I include it only so as not to break backward compatibility -
I don't have any personal use-cases for such. That said, some of the
'join' semantics are indeed a bit subtle; it took a bit of tinkering to
make all the tests work. I was indeed using 'if foo:' instead of 'if
foo is not None:', but that can be easily fixed; I didn't know there
was a performance issue there. Stylistically I find them about the same
clarity-wise.
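
To make the undefined-versus-empty distinction concrete, here is a minimal sketch (not code from the patch) built on the regular expression given in RFC 3986, Appendix B. The name split_uri and the example URIs are assumptions made for illustration only. An unmatched group comes back as None (undefined), while a matched but empty group comes back as '' (defined but empty), so a truth test cannot tell the two apart.

```python
import re

# Regular expression from RFC 3986, Appendix B.
URI_RE = re.compile(
    r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?"
)


def split_uri(uri):
    """Hypothetical helper: return (scheme, authority, path, query, fragment).

    A component that is absent from the URI is None; a component that is
    present but empty is ''.
    """
    m = URI_RE.match(uri)  # every part is optional, so this always matches
    return m.group(2), m.group(4), m.group(5), m.group(7), m.group(9)


for uri in ("file:///etc/hosts", "mailto:pj@example.org"):
    authority = split_uri(uri)[1]
    # 'if authority:' treats both cases as false; 'is not None' does not.
    print(uri, "| defined:", authority is not None, "| truthy:", bool(authority))
```

In 'file:///etc/hosts' the authority is defined but empty; in 'mailto:pj@example.org' it is undefined. Both are false under 'if authority:', which is the case the "is None" tests are meant to separate.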

>I don't like the use of module posixpath to implement the algorithm 
>labelled "remove_dot_segments".  URIs are not POSIX filesystem paths, and 
>shouldn't depend on code meant to implement the latter.  But my own 
>implementation is exceedingly ugly ATM, so I'm in no position to grumble 
>too much :-)

While URIs themselves are not, of course, POSIX filesystem paths, I believe
there's a strong case that their path components are semantically identical
in this usage.  I see no need to duplicate code that I know can be fairly
tricky to get right; better to let someone else worry about the corner cases
and take advantage of their work when I can.
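
For comparison, a rough sketch of the remove_dot_segments procedure from RFC 3986, section 5.2.4, set next to posixpath.normpath. This is an illustration written from the RFC text, not the code in the patch or in John's implementation. The two agree on the common cases and differ on a few corner cases, such as a trailing slash.

```python
import posixpath


def remove_dot_segments(path):
    """Sketch of RFC 3986 section 5.2.4 (illustrative, not from the patch)."""
    output = []
    while path:
        if path.startswith("../"):
            path = path[3:]
        elif path.startswith("./"):
            path = path[2:]
        elif path.startswith("/./"):
            path = path[2:]              # "/./x" becomes "/x"
        elif path == "/.":
            path = "/"
        elif path.startswith("/../"):
            path = path[3:]              # "/../x" becomes "/x"
            if output:
                output.pop()             # drop the previous segment
        elif path == "/..":
            path = "/"
            if output:
                output.pop()
        elif path in (".", ".."):
            path = ""
        else:
            # Move the first segment (with its leading "/", if any)
            # from the input to the output.
            start = 1 if path.startswith("/") else 0
            idx = path.find("/", start)
            if idx == -1:
                output.append(path)
                path = ""
            else:
                output.append(path[:idx])
                path = path[idx:]
    return "".join(output)


for p in ("/a/b/c/./../../g", "/a/b/../"):
    print(repr(p), "->", repr(remove_dot_segments(p)), "vs", repr(posixpath.normpath(p)))
```

Both functions turn '/a/b/c/./../../g' into '/a/g', but for '/a/b/../' the RFC procedure keeps the trailing slash ('/a/') while posixpath.normpath drops it ('/a').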

>Normalisation of the base URI is optional, and your urljoin function
>never normalises.  Instead, it parses the base and reference, then
>follows the algorithm of section 5.2 of the RFC.  Parsing is required
>before normalisation takes place.  So urljoin forces people who need
>to normalise the URI beforehand to parse it twice, which is annoying.
>There should be some way to pass 5-tuples in instead of URIs.  E.g.,
>from my implementation:
>
>def urljoin(base_uri, uri_reference):
>    return urlunsplit(urljoin_parts(urlsplit(base_uri),
>                                    urlsplit(uri_reference)))
>

It would certainly be easy to add a version which took tuples instead
of strings, but I was attempting, as previously stated, to conform to the
extant urlparse.urljoin API for backward compatibility.  Also as I previously
stated, I have no personal use-cases for urljoin so the issue of having to
double-parse if you do normalization never came to my attention.

>It would be nice to have a 5-tuple-like class (I guess implemented as a 
>subclass of tuple) that also exposes attributes (.authority, .path, etc.) 
>-- the same way module time does it.

That starts to edge over into a 'generic URI' class, which I'm uncomfortable
with due to the possibility of opaque URIs that don't conform to that spec.
The fallback of putting everything other than the scheme into 'path' doesn't
appeal to me.
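
As a sketch of what such a 5-tuple-like class might look like, the following uses collections.namedtuple from the current standard library. The class name SplitURI, its field names and the example values are assumptions for illustration; neither implementation under discussion is shown here. It also shows the opaque-URI fallback mentioned above, where everything after the scheme ends up in 'path'.

```python
from collections import namedtuple

# Hypothetical result type: still a plain 5-tuple, but with named fields.
SplitURI = namedtuple("SplitURI", "scheme authority path query fragment")

hierarchical = SplitURI("http", "www.example.org", "/a/b", "x=1", None)

# Behaves like the existing tuple API...
scheme, authority, path, query, fragment = hierarchical
# ...and also exposes attributes.
print(hierarchical.authority, hierarchical[1])

# The fallback for an opaque URI: only scheme and path carry information.
opaque = SplitURI("mailto", None, "pj@example.org", None, None)
print(opaque.path)
```

Because the result is still a tuple, code written against the plain 5-tuple keeps working unchanged.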

>The path component is required, though may be empty.  Your parser
>returns None (meaning "undefined") where it should return an empty
>string.

Indeed.  Fixed now; a fresh look at the

Re: [Python-Dev] Some more comments re new uriparse module, patch 1462525

2006-06-09 Thread Paul Jimenez
On Thursday, Jun 8, 2006, Mike Brown writes:
>
>It appears that Paul uploaded a new version of his library on June 3:
>http://python.org/sf/1462525
>I'm unclear on the relationship between the two now. Are they both up for 
>consideration?

That version was in response to comments from JJ Lee.  Email also went to pydev
(archived at http://mail.python.org/pipermail/python-dev/2006-June/065583.html)
about it.

>One thing I forgot to mention in private email is that I'm concerned that the
>inclusion of URI reference resolution functionality has exceeded the scope of
>this 'urischemes' module that purports to be for 'extensible URI parsing'.  It
>is becoming a scheme-aware and general-purpose syntactic processing library
>for URIs, and should be characterized as such in its name as well as in its
>documentation. 

...which is why I called it 'uriparse'.

>Even without a new name and more accurately documented scope, people are going
>to see no reason not to add the rest of STD 66's functionality to it
>(percent-encoding, normalization for testing equivalence, syntax
>validation...). As you can see in Ft.Lib.Uri, the latter two are not at all
>hard to implement, especially if you use regular expressions. These all fall 
>under syntactic operations on URIs, just like reference-resolution.
>
>Percent-encoding gets very hairy with its API details due to application-level
>uses that don't jibe with STD 66 (e.g. the fuzzy specs and convoluted history
>governing application/x-www-form-urlencoded), the nuances of character
>encoding and Python string types, and widely varying expectations of users.

...all of which I consider scope creep. If someone else wants to add
it, more power to them; I wrote this code to fix the deficiencies in
the existing urlparse library, not to be an all-singing all-dancing
STD 66 module. In fact, I don't care whether it's my code or someone
else's that goes into the library - I just want something better than
the existing urlparse library to go in, because the existing stuff has
been acknowledged as insufficient. I've even provided working code with
modifications to fix comments and criticism I've received. If you or
someone else want to extend what I've done to add features or other
functionality, that would be fine with me. If you want to rewrite the
entire thing in a different vein (as Nick Coghlan has done), be my guest.
I'm not married to my code or API or anything but getting an improved
library into the stdlib. To that end, if it's decided to go with my
code, I'll happily put in the work to bring it up to python community
standards. Additional functionality will have to come from someone else
though, as I'm not willing to try and scratch an itch I don't have - and
I've already got a day job.

  --pj




[Python-Dev] urlparse brokenness

2005-11-23 Thread Paul Jimenez

It is my assertion that urlparse is currently broken.  Specifically, I 
think that urlparse breaks an abstraction boundary with ill effect.

In writing a mailclient, I wished to allow my users to specify their
imap server as a url, such as 'imap://user:[EMAIL PROTECTED]:port/'. Which
worked fine. I then thought that the natural extension to support
configuration of imapssl would be 'imaps://user:[EMAIL PROTECTED]:port/'
which failed - user:[EMAIL PROTECTED]:port got parsed as the *path* of
the URL instead of the network location. It turns out that urlparse
keeps a table of url schemes that 'use netloc'... that is to say,
that have a 'user:[EMAIL PROTECTED]:port' part to their URL. I think this
'special knowledge' about particular schemes 1) breaks an abstraction
boundary by having a function whose charter is to pull apart a
particularly-formatted string behave differently based on the meaning of
the string instead of the structure of it and 2) fails to be extensible
or forward compatible due to hardcoded 'magic' strings - if schemes were
somehow 'registerable' as 'netloc using' or not, then this objection
might be nullified, but the previous objection would still stand.
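
The difference between the two approaches can be sketched as follows. This is a simplified illustration, not the actual urlparse source: USES_NETLOC is a hypothetical, abbreviated stand-in for the module's table, and the function names, host name and port numbers are made up for the example.

```python
# Hypothetical, abbreviated stand-in for urlparse's table of
# "netloc-using" schemes; the real table is longer.
USES_NETLOC = {"http", "https", "ftp", "imap"}


def _take_netloc(rest):
    """Split '//netloc/path...' into (netloc, remainder)."""
    end = rest.find("/", 2)
    if end == -1:
        end = len(rest)
    return rest[2:end], rest[end:]


def split_by_table(url):
    """Recognise a netloc only for schemes listed in the table."""
    scheme, _, rest = url.partition(":")
    if scheme in USES_NETLOC and rest.startswith("//"):
        netloc, rest = _take_netloc(rest)
    else:
        netloc = ""
    return scheme, netloc, rest


def split_by_structure(url):
    """Recognise a netloc whenever the string has the '//' marker."""
    scheme, _, rest = url.partition(":")
    if rest.startswith("//"):
        netloc, rest = _take_netloc(rest)
    else:
        netloc = ""
    return scheme, netloc, rest


for url in ("imap://user@mail.example.org:143/",
            "imaps://user@mail.example.org:993/"):
    print(split_by_table(url), split_by_structure(url))
```

With the table, the 'imaps' URL comes back with an empty netloc and everything else in the path; splitting on structure treats both URLs the same way.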

So I propose that urlsplit, the main offender, be replaced with something
that looks like:

def urlsplit(url, scheme='', allow_fragments=1, default=('','','','','')):
    """Parse a URL into 5 components:
    <scheme>://<netloc>/<path>?<query>#<fragment>
    Return a 5-tuple: (scheme, netloc, path, query, fragment).
    Note that we don't break the components up in smaller bits
    (e.g. netloc is a single string) and we don't expand % escapes."""
    # _parse_cache, MAX_CACHE_SIZE and clear_cache() are the existing
    # module-level names in urlparse.py.
    key = url, scheme, allow_fragments, default
    cached = _parse_cache.get(key, None)
    if cached:
        return cached
    if len(_parse_cache) >= MAX_CACHE_SIZE: # avoid runaway growth
        clear_cache()

    # Split on structure alone; no per-scheme knowledge is consulted.
    if "://" in url:
        uscheme, npqf = url.split("://", 1)
    else:
        uscheme = scheme
        if not uscheme:
            uscheme = default[0]
        npqf = url
    pathidx = npqf.find('/')
    if pathidx == -1:  # no path: everything left is the netloc
        netloc = npqf
        path, query, fragment = default[2:5]
    else:
        netloc = npqf[:pathidx]
        pqf = npqf[pathidx:]
        # Take the fragment off first so it is found even without a query.
        if ('#' in pqf) and allow_fragments:
            pq, fragment = pqf.split('#', 1)
        else:
            pq, fragment = pqf, default[4]
        if '?' in pq:
            path, query = pq.split('?', 1)
        else:
            path, query = pq, default[3]
    result = (uscheme, netloc, path, query, fragment)
    _parse_cache[key] = result
    return result

Note that I'm not sold on the _parse_cache, but I'm assuming it was there
for a reason so I'm leaving that functionality as-is.

If this isn't the right forum for this discussion, or the right place to 
submit code, please let me know.  Also, please cc: me directly on responses
as I'm not subscribed to the firehose that is python-dev.

  --pj
