[issue23505] Urlparse insufficient validation leads to open redirect
New submission from Yassine ABOUKIR:

The urlparse module lacks proper validation of its input, which leads to an open redirect vulnerability. The issue is that URLs do not survive the round trip through `urlunparse(urlparse(url))`. Python sees `////foo.com` as a URL with no hostname or scheme and a path of `//foo.com`, but when it reconstructs the URL after parsing, it becomes `//foo.com`.

This can be exploited in practice this way: http://example.com/login?next=////evil.com

The fix for this would be for `urlunparse()` to serialize paths with two leading slashes as '/%2F', at least when `scheme` and `netloc` are empty.

--
components: Library (Lib)
messages: 236470
nosy: yaaboukir
priority: normal
severity: normal
status: open
title: Urlparse insufficient validation leads to open redirect
type: security

___
Python tracker <http://bugs.python.org/issue23505>
___
___
Python-bugs-list mailing list
Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
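A minimal sketch of the serialization proposed in this report, written against Python 3's `urllib.parse` rather than the Python 2 `urlparse` module used in the thread. The wrapper name `urlunparse_safe` is hypothetical and is not part of the standard library; it only illustrates the '/%2F' idea for the case where `scheme` and `netloc` are both empty.

```python
from urllib.parse import urlparse, urlunparse


def urlunparse_safe(parts):
    """Hypothetical wrapper around urlunparse(), not a stdlib function.

    With no scheme and no netloc, a path starting with '//' could be
    serialized into something that reads as a network-path reference,
    so the second slash is percent-encoded before serializing.
    """
    scheme, netloc, path, params, query, fragment = parts
    if not scheme and not netloc and path.startswith("//"):
        path = "/%2F" + path[2:]
    return urlunparse((scheme, netloc, path, params, query, fragment))


parts = urlparse("////evil.com")   # netloc is '', path is '//evil.com'
rebuilt = urlunparse_safe(parts)   # the path no longer starts with '//'
print(rebuilt, repr(urlparse(rebuilt).netloc))
```

Note that '/%2Fevil.com' is not byte-for-byte the same path as '//evil.com', so this approach trades exact path preservation for a `netloc` that stays empty after re-parsing.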
[issue23505] Urlparse insufficient validation leads to open redirect
Changes by Yassine ABOUKIR:

--
nosy: +benjamin.peterson, pitrou, python-dev
[issue23505] Urlparse insufficient validation leads to open redirect
Yassine ABOUKIR added the comment:

For your information, this security issue has been assigned a CVE ID: CVE-2015-2104
[issue23505] Urlparse insufficient validation leads to open redirect
Yassine ABOUKIR added the comment:

Yes, by exploiting this bug an attacker may redirect a specific victim to a malicious website, in our case evil.com.

>>> x = urlparse("////evil.com")

////evil.com will be parsed as a relative-path URL, which is the correct, expected behaviour:

>>> print x
ParseResult(scheme='', netloc='', path='//evil.com', params='', query='', fragment='')

As you can see, two slashes are removed and it is marked as a relative-path URL, but when we reconstruct the URL using the urlunparse() function, the URL is treated as an absolute URL to which you will be redirected.

>>> x = urlunparse(urlparse("////evil.com"))
>>> urlparse(x)
ParseResult(scheme='', netloc='evil.com', path='', params='', query='', fragment='')
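The comparison described in this comment can be sketched as a small helper that checks whether `netloc` changes across one parse and unparse round trip. It uses Python 3's `urllib.parse` (an assumption; the session in the comment is Python 2), and the helper name is hypothetical. The result for the four-slash input is printed rather than asserted, since it may differ between Python releases.

```python
from urllib.parse import urlparse, urlunparse


def netloc_changes_on_roundtrip(url):
    """Hypothetical helper: True if parse -> unparse -> parse alters netloc."""
    first = urlparse(url)
    second = urlparse(urlunparse(first))
    return first.netloc != second.netloc


# Print the outcome for a plain path, a network-path reference,
# and the four-slash input discussed in this issue.
for candidate in ("/login", "//evil.com/x", "////evil.com"):
    print(candidate, netloc_changes_on_roundtrip(candidate))
```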
[issue23505] Urlparse insufficient validation leads to open redirect
Yassine ABOUKIR added the comment:

When you type //evil.com or evil.com directly into the Firefox URL bar, you will be redirected to evil.com. This is well known; see: http://homakov.blogspot.com/2014/01/evolution-of-open-redirect-vulnerability.html

Here is a video demonstration of the vulnerability: http://youtu.be/l0uDAqpRPpo
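Because browsers treat a leading `//` as the start of a host, one defensive option is to validate a redirect target as a string before redirecting, instead of relying on a parse and unparse round trip. The sketch below is an illustration only: `is_safe_local_path` is a hypothetical helper, it uses Python 3's `urllib.parse`, and the backslash check rests on the assumption that some browsers read `\` like `/`.

```python
from urllib.parse import urlparse


def is_safe_local_path(target):
    """Hypothetical check for a same-site redirect target such as '/account'."""
    if not target.startswith("/"):
        return False   # must be an absolute path on this site
    if target.startswith("//"):
        return False   # '//host' and '////host' can look like a host to a browser
    if "\\" in target:
        return False   # assumption: backslashes may be read as slashes
    parts = urlparse(target)
    return not parts.scheme and not parts.netloc


for candidate in ("/account", "//evil.com", "////evil.com", "http://evil.com/"):
    print(candidate, is_safe_local_path(candidate))
```

Rejecting the raw string up front means the decision does not depend on how a particular Python release serializes the parsed components.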
[issue23505] Urlparse insufficient validation leads to open redirect
Yassine ABOUKIR added the comment:

I am not quite sure about the first proposal, but I strongly believe the appropriate way to fix this is to check whether the path starts with two slashes and, if so, to URL-encode those two leading slashes.
[issue23505] Urlparse insufficient validation leads to open redirect
Yassine ABOUKIR added the comment:

"Following the syntax specifications in RFC 1808, urlparse recognizes a netloc only if it is properly introduced by ‘//’. Otherwise the input is presumed to be a relative URL and thus to start with a path component."
https://docs.python.org/2/library/urlparse.html

2015-03-03 22:16 GMT+00:00 Paul McMillan <>:

Yeah. I agree the lack of round trip is surprising, and I agree we should fix it.

I think the underlying issue here is that urlparse has a pretty different view of the world when compared with the browsers. I know that bit me when I first started using python, and it periodically surfaces in cases like this, where the browser thinks that "//evil.com" is a url, but we've parsed it as part of a path.

Backwards compatibility makes it hard to update urlparse to precisely match browser behavior, but there's probably room for a new library designed with browser compatibility as a primary feature.

-Paul

On Tue, Mar 3, 2015 at 10:07 PM, Antoine Pitrou <> wrote:
>
> Hi Paul,
>
> On 03/03/2015 23:01, Paul McMillan wrote:
>> I understand how this works. You don't need to paste the example again.
>>
>> The documentation makes no guarantee that parse/unparse will do what
>> you want them to do, and does explicitly lay out the specific rules
>> used for separating the parts.
>
> Well, I don't know if it's a security issue, but failure to roundtrip
> *is* surprising (and IMHO dangerous for that reason) behaviour to say
> the least.
>
> Moreover, the urlunparse() documentation (in 3.x) says:
> """
> Construct a URL from a tuple as returned by urlparse(). [...] This may
> result in a slightly different, but equivalent URL, if the URL that was
> parsed originally had unnecessary delimiters
> """
> (https://docs.python.org/3/library/urllib.parse.html#urllib.parse.urlunparse)
>
> which implies that any divergence when roundtripping should only consist
> in cosmetic, not essential, differences ("equivalent URL").
>
> Regards
>
> Antoine.
[issue23505] Urlparse insufficient validation leads to open redirect
Yassine ABOUKIR added the comment:

From: cve-assign () mitre org
Date: Thu, 5 Mar 2015 16:42:02 -0500 (EST)

We think that the issue reduces to the question of whether it's acceptable for urlparse to provide inconsistent information about the structure of a URL.

https://docs.python.org/2/library/urlparse.html says:

urlparse.urlparse(urlstring[, scheme[, allow_fragments]])
Parse a URL into six components, returning a 6-tuple. This corresponds to the general structure of a URL: scheme://netloc/path;parameters?query#fragment.

urlparse.urlunparse(parts)
Construct a URL from a tuple as returned by urlparse(). The parts argument can be any six-item iterable. This may result in a slightly different, but equivalent URL, if the URL that was parsed originally had unnecessary delimiters (for example, a ? with an empty query; the RFC states that these are equivalent).

The first issue is that the urlunparse documentation is ambiguous. We believe the reasonable interpretation is that there is a missing third sentence: "This ALWAYS results in a URL that is either identical or equivalent to the URL that was parsed originally." There's another interpretation that we believe is unreasonable: "This may result in a slightly different, but equivalent URL, if the URL that was parsed originally had unnecessary delimiters. If the URL that was parsed originally did not have unnecessary delimiters, then the behavior of urlunparse is UNDEFINED."

So, our expectation is that urlunparse(urlparse(original_url)) should not have any significant effect on the meaning of original_url. We also think that a Python user should be able to rely on that property to make security-relevant decisions.

To simplify the situation, consider a case where the URL is used exclusively within Python code, and is never accessed by any web browser.
The actual behavior is:

>>> from urlparse import urlparse, urlunparse
>>> print urlparse("////example.com")
ParseResult(scheme='', netloc='', path='//example.com', params='', query='', fragment='')
>>> print urlparse(urlunparse(urlparse("////example.com")))
ParseResult(scheme='', netloc='example.com', path='', params='', query='', fragment='')
>>> print urlparse(urlunparse(urlparse(urlunparse(urlparse("////example.com")))))
ParseResult(scheme='', netloc='example.com', path='', params='', query='', fragment='')

Here, urlunparse(urlparse(original_url)) does have a significant effect on the meaning of original_url. The Python user may have wanted to make a security-relevant decision based on whether netloc was an empty string. However, netloc is different depending on whether urlunparse(urlparse(original_url)) occurs at least once. The user's application (suppose it's called "PyNetlocExaminer") is affected in a security-relevant way.

The next question is, if there is a CVE for a report of a security-relevant problem, what product is named as the primary affected product within that CVE. There is no perfect answer to this question. Especially in the case of a general-purpose language such as Python, there's an extremely wide range of bugs that might become security-relevant in some applications. What we usually try to do is make the CVE useful to users who may need to perform a software update. Specifically:

1. If the language implementation is not ever going to be changed (for example: because the language maintainer believes the observed behavior has always been correct, or the language maintainer believes that it has retroactively become correct because any change would break compatibility with other applications), then the application is named as the primary affected product in the CVE. In other words, if the inconsistency between netloc='' and netloc='example.com' were actually the intended behavior all along, then PyNetlocExaminer would be named in the CVE.
Here, realistically, the end user would need to update or manually fix PyNetlocExaminer.

2. If the language implementation is incorrect and is planned to be changed at some point, and that would eliminate the security-relevant problem, then the language implementation is named in the CVE. (An application might also be named in the CVE, especially if there are very few affected applications.) This option occurs regardless of whether the language maintainer believes that it is a language vulnerability. (The language maintainer has the option of composing a dispute that would be appended to the CVE.) Here, the end user may ultimately deci
[issue23505] Urlparse insufficient validation leads to open redirect
Yassine ABOUKIR added the comment:

From: Amos Jeffries
Date: Fri, 06 Mar 2015 14:09:55 +1300

On 6/03/2015 10:42 a.m., cve-assign () mitre org wrote:
> We think that the issue reduces to the question of whether it's
> acceptable for urlparse to provide inconsistent information about the
> structure of a URL.
>
> https://docs.python.org/2/library/urlparse.html says:
>
> urlparse.urlparse(urlstring[, scheme[, allow_fragments]])
> Parse a URL into six components, returning a 6-tuple. This corresponds
> to the general structure of a URL:
> scheme://netloc/path;parameters?query#fragment.

My 2c ... no it does not. There are 7 parts in a URL.

What is called "netloc" in that description is actually two fields:

[userinfo '@'] authority

The userinfo field is very much alive and well in non-HTTP schemes. Ignoring the userinfo field leaves implementations open to attacks of the form:

scheme://example.com@phishing.com/path

AYJ
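The point about userinfo can be seen directly in Python 3's `urllib.parse` (assumed here in place of the Python 2 module discussed in the thread): `netloc` holds the userinfo and the host together, while the `username` and `hostname` attributes of the parse result separate them. The `http` scheme and the host names are placeholders chosen for illustration.

```python
from urllib.parse import urlparse

# A URL whose userinfo part is made to look like a trusted host name.
parts = urlparse("http://example.com@phishing.com/path")

print(parts.netloc)    # userinfo and host together
print(parts.username)  # the part before '@'
print(parts.hostname)  # the host a client would actually contact
```

Code that compares `netloc` against an allow-list by prefix or substring would be misled by such a URL, so comparing `hostname` is the more cautious choice in a sketch like this.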
[issue23505] Urlparse insufficient validation leads to open redirect
Yassine ABOUKIR added the comment:

Are there any updates on this issue? Is it going to be fixed, or will the documentation at least be modified to warn developers about this behaviour?