[Python-Dev] patch 1462525 or similar solution?
I submitted patch 1462525 a while back to solve the problem described even longer ago in http://mail.python.org/pipermail/python-dev/2005-November/058301.html and I'm wondering what my appropriate next steps are. Honestly, I don't care if you take my patch or someone else's proposed solution, but I'd like to see something go into the stdlib so that I can eventually stop having to ship custom code for what is really a standard problem.

--pj

___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
[Python-Dev] New uriparse.py
Announcing uriparse.py, submitted for inclusion in the standard library as patch request 1462525. Per the original discussion at http://mail.python.org/pipermail/python-dev/2005-November/058301.html I'm submitting a library meant to deprecate the existing urlparse library. Questions and comments welcome.

--pj
[Python-Dev] uriparsing library (Patch #1462525)
http://sourceforge.net/tracker/index.php?func=detail&aid=1462525&group_id=5470&atid=305470

This is just a note to ask when the best time to try and get this in is - I've seen other new/changed libs going in for 2.5 and wanted to make sure this didn't fall off the radar. If now's the wrong time, please let me know when the right time is so I can stick my head up again then.

Thanks,
--pj
Re: [Python-Dev] Some more comments re new uriparse module, patch 1462525
On Friday, Jun 2, 2006, John J Lee writes:

>[Not sure whether this kind of thing is best posted as tracker comments
>(but then the tracker gets terribly long and is mailed out every time a
>change happens) or posted here. Feel free to tell me I'm posting in the
>wrong place...]

I think this is a fine place - more googleable, still archived, etc.

>Some comments on this patch (a new module, submitted by Paul Jimenez,
>implementing the rules set out in RFC 3986 for URI parsing, joining URI
>references with a base URI etc.)
>
>http://python.org/sf/1462525

Note that like many open-source authors, I wrote this to 'scratch an itch' that I had... and am submitting it in hopes of saving someone else somewhere some essentially identical work. I'm not married to it; I just want something *like* it to end up in the stdlib so that I can use it.

>Sorry for the pause, Paul. I finally read RFC 3986 -- which I must say is
>probably the best-written RFC I've read (and there was much rejoicing).

No worries. Yeah, the RFC is pretty clear (for once) :)

>I still haven't read 3987 and got to grips with the unicode issues
>(whatever they are), but I have just implemented the same stuff you did,
>so have some comments on non-unicode aspects of your implementation (the
>version labelled "v23" on the tracker):
>
>Your urljoin implementation seems to pass the tests (the tests taken from
>the RFC), but I have to admit I don't understand it :-) It doesn't seem
>to take account of the distinction between undefined and empty URI
>components. For example, the authority of the URI reference may be empty
>but still defined. Anyway, if you're taking advantage of some subtle
>identity that implies that you can get away with truth-testing in place of
>"is None" tests, please don't ;-) It's slower than "is [not] None" tests
>both for the computer and (especially!) the reader.
First of all, I must say that urljoin is my least favorite part of this module; I include it only so as not to break backward compatibility - I don't have any personal use-cases for it. That said, some of the 'join' semantics are indeed a bit subtle; it took a bit of tinkering to make all the tests work. I was indeed using 'if foo:' instead of 'if foo is not None:', but that can be easily fixed; I didn't know there was a performance issue there. Stylistically I find them about the same clarity-wise.

>I don't like the use of module posixpath to implement the algorithm
>labelled "remove_dot_segments". URIs are not POSIX filesystem paths, and
>shouldn't depend on code meant to implement the latter. But my own
>implementation is exceedingly ugly ATM, so I'm in no position to grumble
>too much :-)

While URIs themselves are not, of course, POSIX filesystem paths, I believe there's a strong case that their path components are semantically identical in this usage. I see no need to duplicate code that I know can be fairly tricky to get right; better to let someone else worry about the corner cases and take advantage of their work when I can.

>Normalisation of the base URI is optional, and your urljoin function
>never normalises. Instead, it parses the base and reference, then
>follows the algorithm of section 5.2 of the RFC. Parsing is required
>before normalisation takes place. So urljoin forces people who need
>to normalise the URI beforehand to parse it twice, which is annoying.
>There should be some way to pass 5-tuples in instead of URIs. E.g.,
>from my implementation:
>
>def urljoin(base_uri, uri_reference):
>    return urlunsplit(urljoin_parts(urlsplit(base_uri),
>                                    urlsplit(uri_reference)))

It would certainly be easy to add a version which took tuples instead of strings, but I was attempting, as previously stated, to conform to the extant urlparse.urljoin API for backward compatibility.
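For reference, the overlap Paul is relying on can be seen with posixpath.normpath, which collapses '.' and '..' segments much like the remove_dot_segments algorithm of RFC 3986 section 5.2.4 - though the two are not identical, as the sketch below shows (the inputs are examples from the RFC):

    import posixpath

    # Dot-segment removal on RFC 3986 section 5.2.4 example inputs:
    print(posixpath.normpath('/a/b/c/./../../g'))    # -> /a/g
    print(posixpath.normpath('mid/content=5/../6'))  # -> mid/6

    # A corner case where the semantics differ: remove_dot_segments
    # preserves the trailing slash of '/a/b/', but normpath strips it.
    print(posixpath.normpath('/a/b/'))               # -> /a/b

So any implementation built on posixpath has to patch up at least the trailing-slash case itself.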
Also as I previously stated, I have no personal use-cases for urljoin, so the issue of having to double-parse if you do normalization never came to my attention.

>It would be nice to have a 5-tuple-like class (I guess implemented as a
>subclass of tuple) that also exposes attributes (.authority, .path, etc.)
>-- the same way module time does it.

That starts to edge over into a 'generic URI' class, which I'm uncomfortable with due to the possibility of opaque URIs that don't conform to that spec. The fallback of putting everything other than the scheme into 'path' doesn't appeal to me.

>The path component is required, though may be empty. Your parser
>returns None (meaning "undefined") where it should return an empty
>string.

Indeed. Fixed now; a fresh look at the
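John's 5-tuple-with-named-attributes suggestion can be sketched as a small tuple subclass (the class and attribute names here are illustrative, not from the patch; the stdlib later adopted essentially this idea):

    class URITuple(tuple):
        """Illustrative 5-tuple that also exposes named URI components."""
        __slots__ = ()

        def __new__(cls, scheme, authority, path, query, fragment):
            # Still an ordinary 5-tuple, so index access keeps working.
            return super().__new__(cls, (scheme, authority, path, query, fragment))

        scheme    = property(lambda self: self[0])
        authority = property(lambda self: self[1])
        path      = property(lambda self: self[2])
        query     = property(lambda self: self[3])
        fragment  = property(lambda self: self[4])

    t = URITuple('http', 'example.com', '/a', 'q=1', None)
    print(t.authority)  # same value as t[1]

Note that such a class can carry the undefined-vs-empty distinction discussed above directly: None for an undefined component, '' for an empty-but-defined one.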
Re: [Python-Dev] Some more comments re new uriparse module, patch 1462525
On Thursday, Jun 8, 2006, Mike Brown writes:

>It appears that Paul uploaded a new version of his library on June 3:
>http://python.org/sf/1462525
>I'm unclear on the relationship between the two now. Are they both up for
>consideration?

That version was in response to comments from JJ Lee. Email also went to pydev (archived at http://mail.python.org/pipermail/python-dev/2006-June/065583.html) about it.

>One thing I forgot to mention in private email is that I'm concerned that the
>inclusion of URI reference resolution functionality has exceeded the scope of
>this 'urischemes' module that purports to be for 'extensible URI parsing'. It
>is becoming a scheme-aware and general-purpose syntactic processing library
>for URIs, and should be characterized as such in its name as well as in its
>documentation.

...which is why I called it 'uriparse'.

>Even without a new name and more accurately documented scope, people are going
>to see no reason not to add the rest of STD 66's functionality to it
>(percent-encoding, normalization for testing equivalence, syntax
>validation...). As you can see in Ft.Lib.Uri, the latter two are not at all
>hard to implement, especially if you use regular expressions. These all fall
>under syntactic operations on URIs, just like reference-resolution.
>
>Percent-encoding gets very hairy with its API details due to application-level
>uses that don't jibe with STD 66 (e.g. the fuzzy specs and convoluted history
>governing application/x-www-form-urlencoded), the nuances of character
>encoding and Python string types, and widely varying expectations of users.

...all of which I consider scope creep. If someone else wants to add it, more power to them; I wrote this code to fix the deficiencies in the existing urlparse library, not to be an all-singing all-dancing STD 66 module.
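For the record, the regular-expression approach Mike alludes to is spelled out in RFC 3986 itself: Appendix B gives a regex that splits any URI reference into its five components. A minimal sketch (the split_uri wrapper is mine, not from either library):

    import re

    # The component-splitting regex from RFC 3986, Appendix B.
    URI_RE = re.compile(
        r'^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?'
    )

    def split_uri(uri):
        """Return (scheme, authority, path, query, fragment); None = undefined."""
        m = URI_RE.match(uri)
        return m.group(2), m.group(4), m.group(5), m.group(7), m.group(9)

    print(split_uri('http://example.com/a?q=1#frag'))
    # -> ('http', 'example.com', '/a', 'q=1', 'frag')

Because each bracketed group is optional, undefined components come back as None while empty-but-defined ones come back as '', which is the distinction argued about earlier in this thread.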
In fact, I don't care whether it's my code or someone else's that goes into the library - I just want something better than the existing urlparse library to go in, because the existing stuff has been acknowledged as insufficient. I've even provided working code with modifications to address comments and criticism I've received. If you or someone else want to extend what I've done to add features or other functionality, that would be fine with me. If you want to rewrite the entire thing in a different vein (as Nick Coghlan has done), be my guest. I'm not married to my code or API or anything but getting an improved library into the stdlib. To that end, if it's decided to go with my code, I'll happily put in the work to bring it up to python community standards. Additional functionality will have to come from someone else though, as I'm not willing to try and scratch an itch I don't have - and I've already got a day job.

--pj
[Python-Dev] urlparse brokenness
It is my assertion that urlparse is currently broken. Specifically, I think that urlparse breaks an abstraction boundary with ill effect.

In writing a mailclient, I wished to allow my users to specify their imap server as a url, such as 'imap://user:password@host:port/'. Which worked fine. I then thought that the natural extension to support configuration of imapssl would be 'imaps://user:password@host:port/', which failed - user:password@host:port got parsed as the *path* of the URL instead of the network location. It turns out that urlparse keeps a table of url schemes that 'use netloc'... that is to say, that have a 'user:password@host:port' part to their URL.

I think this 'special knowledge' about particular schemes 1) breaks an abstraction boundary by having a function whose charter is to pull apart a particularly-formatted string behave differently based on the meaning of the string instead of the structure of it, and 2) fails to be extensible or forward compatible due to hardcoded 'magic' strings - if schemes were somehow 'registerable' as 'netloc using' or not, then this objection might be nullified, but the previous objection would still stand.

So I propose that urlsplit, the main offender, be replaced with something that looks like:

    def urlsplit(url, scheme='', allow_fragments=1, default=('','','','','')):
        """Parse a URL into 5 components:
        <scheme>://<netloc>/<path>?<query>#<fragment>
        Return a 5-tuple: (scheme, netloc, path, query, fragment).
        Note that we don't break the components up in smaller bits
        (e.g. netloc is a single string) and we don't expand % escapes."""
        key = url, scheme, allow_fragments, default
        cached = _parse_cache.get(key, None)
        if cached:
            return cached
        if len(_parse_cache) >= MAX_CACHE_SIZE:  # avoid runaway growth
            clear_cache()
        if "://" in url:
            uscheme, npqf = url.split("://", 1)
        else:
            uscheme = scheme
            if not uscheme:
                uscheme = default[0]
            npqf = url
        pathidx = npqf.find('/')
        if pathidx == -1:  # no slash: the whole remainder is the netloc
            netloc = npqf
            path, query, fragment = default[2:5]
        else:
            netloc = npqf[:pathidx]
            pqf = npqf[pathidx:]
            if '?' in pqf:
                path, qf = pqf.split('?', 1)
                if ('#' in qf) and allow_fragments:
                    query, fragment = qf.split('#', 1)
                else:
                    query, fragment = qf, default[4]
            elif ('#' in pqf) and allow_fragments:
                path, fragment = pqf.split('#', 1)
                query = default[3]
            else:
                path = pqf
                query, fragment = default[3:5]
        result = (uscheme, netloc, path, query, fragment)
        _parse_cache[key] = result
        return result

Note that I'm not sold on the _parse_cache, but I'm assuming it was there for a reason so I'm leaving that functionality as-is. If this isn't the right forum for this discussion, or the right place to submit code, please let me know. Also, please cc: me directly on responses as I'm not subscribed to the firehose that is python-dev.

--pj
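[Archive footnote: the scheme-table behaviour objected to above was later relaxed; in later Python versions, urlsplit treats any URL whose scheme is followed by '//' as having a netloc, regardless of the uses_netloc table. A quick check against the modern urllib.parse module (the Python 3 successor to urlparse) shows the imaps case now parsing as Paul wanted; the credentials and host below are placeholders:]

    from urllib.parse import urlsplit

    # An unregistered scheme with an authority component now parses generically:
    parts = urlsplit('imaps://user:password@host:993/INBOX')
    print(parts.scheme)   # imaps
    print(parts.netloc)   # user:password@host:993
    print(parts.path)     # /INBOX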