[issue22118] urljoin fails with messy relative URLs

2014-09-22 Thread Roundup Robot
Roundup Robot added the comment: New changeset 901e4e52b20a by Senthil Kumaran in branch 'default': Issue #22278: Fix urljoin problem with relative urls, a regression observed https://hg.python.org/cpython/rev/901e4e52b20a

[issue22118] urljoin fails with messy relative URLs

2014-08-22 Thread Nick Coghlan
Nick Coghlan added the comment: Issue #1500504 (the "urischemes" proposal that never got turned into a PyPI package) has several additional test cases. This particular attachment is one AMK cleaned up to run on Py3k: http://bugs.python.org/file32591/urischemes.py The module itself likely isn'

[issue22118] urljoin fails with messy relative URLs

2014-08-22 Thread Stefan Behnel
Stefan Behnel added the comment: I'm now getting duplicated slashes in URLs, e.g.: https://new//foo.html http://my.little.server/url//logo.gif In both cases, the base URL that gets joined with the postfix had a trailing slash, e.g. "http://my.little.server/url/" + "logo.gif" -> "http://my.l
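
For reference, a minimal sketch of the join described in this report, meant to be run on a current Python 3 release. The base URL and file name are taken from the report; the expectation that a correct join contains exactly one slash before "logo.gif" is the only assumption.

```python
from urllib.parse import urljoin

# Base URL with a trailing slash, as in the report.
base = "http://my.little.server/url/"
joined = urljoin(base, "logo.gif")

# A correct join keeps a single slash between "url" and "logo.gif".
print(joined)
assert "//logo.gif" not in joined
```

If the final assertion fails, the interpreter in use shows the duplicated-slash behaviour reported here.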

[issue22118] urljoin fails with messy relative URLs

2014-08-21 Thread Antoine Pitrou
Antoine Pitrou added the comment: The patch is now committed to the future Python 3.5. Thank you very much for this contribution! -- resolution: -> fixed stage: patch review -> resolved status: open -> closed

[issue22118] urljoin fails with messy relative URLs

2014-08-21 Thread Roundup Robot
Roundup Robot added the comment: New changeset b116489d31ff by Antoine Pitrou in branch 'default': Issue #22118: Switch urllib.parse to use RFC 3986 semantics for the resolution of relative URLs, rather than RFCs 1808 and 2396. http://hg.python.org/cpython/rev/b116489d31ff -- nosy: +pyt
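
As an illustration of what RFC 3986 resolution means in practice, the sketch below checks a few of the reference-resolution examples from RFC 3986 section 5.4 against urljoin. It assumes Python 3.5 or later (where this changeset applies); the base URI and expected results are the ones given in the RFC, and the names BASE, cases, and results are only for this example.

```python
from urllib.parse import urljoin

# Base URI used throughout RFC 3986 section 5.4.
BASE = "http://a/b/c/d;p?q"

# Relative reference -> expected target, as listed in the RFC.
cases = {
    "g": "http://a/b/c/g",
    "../g": "http://a/b/g",
    "../../g": "http://a/g",
    "../../../g": "http://a/g",  # excess ".." segments are dropped at the root
    "/./g": "http://a/g",
    "/../g": "http://a/g",
}

results = {ref: urljoin(BASE, ref) for ref in cases}
for ref, target in results.items():
    print(f"{ref!r:14} -> {target}")
```

Under the older RFC 1808 and RFC 2396 rules, the last three references kept their dot segments in the result, which is the behaviour this issue set out to change.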

[issue22118] urljoin fails with messy relative URLs

2014-08-11 Thread Demian Brecht
Demian Brecht added the comment: Thanks Mike, it's always nice to get positive feedback :)

[issue22118] urljoin fails with messy relative URLs

2014-08-11 Thread Mike Lissner
Mike Lissner added the comment: Just hopping in here to say that the work going down here is beautiful. I've filed a lot of bugs. This one's not particularly difficult, but damn, I appreciate the speed and quality going into fixing it. Glad to see the Python language is a happy place with fas

[issue22118] urljoin fails with messy relative URLs

2014-08-11 Thread Demian Brecht
Demian Brecht added the comment: Uploaded new patch. Removed support for RFC1808-specific behaviour. Extracted non-compliant tests into comment blocks indicating the behaviour is no longer supported. -- Added file: http://bugs.python.org/file36347/issue22118_2.patch

[issue22118] urljoin fails with messy relative URLs

2014-08-10 Thread Nick Coghlan
Nick Coghlan added the comment: Unfortunately, I haven't looked at the RFC compliance side of things in this space in years. At the time, we were considering a new module, leaving the old one alone. These days, I'd be more inclined to just fix it for 3.5, and suggest anyone really needing the ol

[issue22118] urljoin fails with messy relative URLs

2014-08-10 Thread Demian Brecht
Demian Brecht added the comment: FWIW, I think that it would be ever so slightly preferable to maintain support for the old behaviour but perhaps add a deprecation note in the docs. It's a little difficult to tell how much of an impact this change will have to existing code. Maintaining suppor

[issue22118] urljoin fails with messy relative URLs

2014-08-10 Thread Antoine Pitrou
Antoine Pitrou added the comment: Demian, thanks for the update! I'm not sure the rfc1808 flag is necessary. I would be fine with switching wholesale to the new semantics. Nick, since you've done work in the past on URIs, what do you think?

[issue22118] urljoin fails with messy relative URLs

2014-08-06 Thread Demian Brecht
Demian Brecht added the comment: Update based on review comments, added legacy support, fixed the tests up. -- Added file: http://bugs.python.org/file36296/issue22118_1.patch

[issue22118] urljoin fails with messy relative URLs

2014-08-05 Thread Antoine Pitrou
Antoine Pitrou added the comment:
> It /does/ break backwards compatibility, but it seems that previous logic was incorrect (based on my upcoming checking for consistency between RFCs). As such, I'm not sure that it should be fixed < 3.5. Thoughts?
Actually, the logic seems to be correct a

[issue22118] urljoin fails with messy relative URLs

2014-08-05 Thread Demian Brecht
Demian Brecht added the comment: Here's an initial patch with a fix that passes all the test cases other than the ones that are incorrect based on examples and pseudocode in RFC 3986. I haven't checked obsoleted RFCs yet to ensure consistency, but will do so when I get a chance (likely tomorro

[issue22118] urljoin fails with messy relative URLs

2014-08-05 Thread Mike Lissner
Mike Lissner added the comment: @demian.brecht, that'd make me very pleased if you took this over. I won't have time to devote to it, unfortunately.

[issue22118] urljoin fails with messy relative URLs

2014-08-05 Thread Demian Brecht
Demian Brecht added the comment: I've only had a couple minutes to look into this so far, but the bug does indeed seem to be valid across all versions. In fact, the line "# XXX The stuff below is bogus in various ways..." in urljoin tipped me off to something potentially being askew ;) Unless

[issue22118] urljoin fails with messy relative URLs

2014-08-05 Thread Antoine Pitrou
Changes by Antoine Pitrou: -- versions: +Python 3.4, Python 3.5

[issue22118] urljoin fails with messy relative URLs

2014-08-05 Thread Antoine Pitrou
Antoine Pitrou added the comment: Actually, according to RFC 3986, it seems you are right and we should remove extraneous leading "/../" and "/./" components in the path. http://tools.ietf.org/html/rfc3986#section-5.4 -- nosy: +ncoghlan
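
The removal of "/../" and "/./" components mentioned here corresponds to the remove_dot_segments routine in RFC 3986 section 5.2.4. What follows is a rough sketch of that routine written directly from the RFC text; it is not the code committed to urllib.parse, and the function name is simply borrowed from the RFC.

```python
def remove_dot_segments(path: str) -> str:
    """Sketch of RFC 3986 section 5.2.4; not the urllib.parse implementation."""
    output = []  # completed segments, each with its leading "/" if it had one
    while path:
        if path.startswith("../"):
            path = path[3:]
        elif path.startswith("./"):
            path = path[2:]
        elif path.startswith("/./"):
            path = path[2:]          # "/./" collapses to "/"
        elif path == "/.":
            path = "/"
        elif path.startswith("/../"):
            path = path[3:]          # "/../" collapses to "/" ...
            if output:
                output.pop()         # ... and drops the previous segment, if any
        elif path == "/..":
            path = "/"
            if output:
                output.pop()
        elif path in (".", ".."):
            path = ""
        else:
            # Move the first segment (with its leading "/") to the output.
            start = 1 if path.startswith("/") else 0
            idx = path.find("/", start)
            if idx == -1:
                output.append(path)
                path = ""
            else:
                output.append(path[:idx])
                path = path[idx:]
    return "".join(output)


print(remove_dot_segments("/a/b/c/./../../g"))
print(remove_dot_segments("/../Decisions/CR20130096OPN.pdf"))
```

The first call uses the worked example from the RFC; the second uses the path from the URL in the original report, where the leading "/.." has no parent segment to remove and is simply discarded.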

[issue22118] urljoin fails with messy relative URLs

2014-08-05 Thread Mike Lissner
Mike Lissner added the comment: @pitrou, I haven't delved into URLs in a long while, but the general idea is: scheme://domain:port/path?query_string#fragment_id When would it ever make sense to have something up a level from the root of the domain?
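
To make the layout sketched in this comment concrete, the snippet below splits a URL into those pieces with urllib.parse.urlsplit. The URL itself (example.com, the port, query, and fragment values) is made up for illustration.

```python
from urllib.parse import urlsplit

# Hypothetical URL following scheme://domain:port/path?query_string#fragment_id
parts = urlsplit("https://example.com:8443/path/to/page?key=value#section")

print("scheme:  ", parts.scheme)
print("domain:  ", parts.hostname)
print("port:    ", parts.port)
print("path:    ", parts.path)
print("query:   ", parts.query)
print("fragment:", parts.fragment)
```

Only the path component is subject to ".." and "." resolution, which is why the question of going "up a level from the root" arises at all.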

[issue22118] urljoin fails with messy relative URLs

2014-08-05 Thread Antoine Pitrou
Antoine Pitrou added the comment: The thing is, urljoin() isn't HTTP-specific, and such URLs *may* make sense for other protocols. -- nosy: +pitrou

[issue22118] urljoin fails with messy relative URLs

2014-08-05 Thread Demian Brecht
Changes by Demian Brecht: -- nosy: +demian.brecht

[issue22118] urljoin fails with messy relative URLs

2014-08-04 Thread Ezio Melotti
Changes by Ezio Melotti: -- keywords: +easy nosy: +ezio.melotti, orsenthil stage: -> needs patch

[issue22118] urljoin fails with messy relative URLs

2014-08-01 Thread Mike Lissner
Mike Lissner added the comment: FWIW, the workaround that I've just created for this problem is this:

    u = 'https://www.appeals2.az.gov/../Decisions/CR20130096OPN.pdf'
    # Split the url and rejoin it, nuking any '/..' patterns at the
    # beginning of the path.
    s = urlsplit(u)
    urlunsplit(s[:2] + (re.s
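
The snippet above is cut off in this archive, so the following is only a sketch of a workaround in the same spirit, not a reconstruction of the original code. The helper name strip_leading_parent_refs and the regular expression are assumptions made for this example; only urlsplit, urlunsplit, and the sample URL come from the comment.

```python
import re
from urllib.parse import urlsplit, urlunsplit


def strip_leading_parent_refs(url: str) -> str:
    """Hypothetical helper: drop '/..' segments at the start of the path."""
    parts = urlsplit(url)
    # Assumed pattern: one or more '/..' segments anchored at the start of
    # the path, each followed by another '/' or by the end of the path.
    cleaned_path = re.sub(r"^(/\.\.)+(?=/|$)", "", parts.path)
    return urlunsplit(
        (parts.scheme, parts.netloc, cleaned_path, parts.query, parts.fragment)
    )


u = "https://www.appeals2.az.gov/../Decisions/CR20130096OPN.pdf"
print(strip_leading_parent_refs(u))
```

The lookahead keeps segments such as "/..foo" intact, since those are ordinary names rather than parent references.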

[issue22118] urljoin fails with messy relative URLs

2014-08-01 Thread Mike Lissner
New submission from Mike Lissner: Not sure if this is desired behavior, but it's making my code break, so I figured I'd get it filed. I'm trying to crawl this website: https://www.appeals2.az.gov/ODSPlus/recentDecisions2.cfm Unfortunately, most of the URLs in the HTML are relative, taking the