[issue44744] [security] Open redirect attack due to insufficient validation in Urlparse
New submission from ready-research:

`urlparse` mishandles certain uses of an extra slash or backslash (such as `https:///`, `https:/`, `https:\`) and interprets the URI as a relative path. Userland logic that bases its decisions on the result of `urlparse()` may introduce a security vulnerability because of the function's unexpected return values. These vulnerabilities may manifest as SSRF, open redirect, and other issues related to incorrectly trusting a URL.

```
from urllib.parse import urlparse

url1 = urlparse('https://www.attacker.com/a/b')
url2 = urlparse('https:///www.attacker.com/a/b')
url3 = urlparse('https:/www.attacker.com/a/b')
# Raw string so the backslash is kept literally (avoids an invalid-escape warning).
url4 = urlparse(r'https:\www.attacker.com/a/b')

print("Normal behaviour: HOSTNAME should be in netloc\n")
print(url1)
print("\nMishandling hostname and returning it as path\n")
print(url2)
print(url3)
print(url4)
```

OUTPUT:

```
Normal behaviour: HOSTNAME should be in netloc

ParseResult(scheme='https', netloc='www.attacker.com', path='/a/b', params='', query='', fragment='')

Mishandling hostname and returning it as path

ParseResult(scheme='https', netloc='', path='/www.attacker.com/a/b', params='', query='', fragment='')
ParseResult(scheme='https', netloc='', path='/www.attacker.com/a/b', params='', query='', fragment='')
ParseResult(scheme='https', netloc='', path='\\www.attacker.com/a/b', params='', query='', fragment='')
```

--
components: Parser
messages: 398232
nosy: lys.nikolaou, pablogsal, ready-research
priority: normal
severity: normal
status: open
title: [security] Open redirect attack due to insufficient validation in Urlparse
versions: Python 3.10, Python 3.11, Python 3.6, Python 3.7, Python 3.8, Python 3.9

___
Python tracker <https://bugs.python.org/issue44744>
___
___
Python-bugs-list mailing list
Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
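To illustrate the kind of userland logic the report describes, here is a minimal sketch, assuming a redirect handler that treats any URL with an empty `netloc` as a same-site path. Both helper names (`is_safe_redirect_naive`, `is_safe_redirect_strict`) are hypothetical and not part of `urllib`; the stricter variant is only one possible mitigation, not an exhaustive validator.

```python
from urllib.parse import urlparse


def is_safe_redirect_naive(target: str) -> bool:
    """Hypothetical check: an empty netloc is assumed to mean 'relative, same site'."""
    return urlparse(target).netloc == ""


def is_safe_redirect_strict(target: str) -> bool:
    """Hypothetical stricter check: accept only plain absolute paths.

    Rejects anything with a scheme, a netloc, a backslash, or a leading '//'.
    """
    if "\\" in target:
        return False
    parts = urlparse(target)
    if parts.scheme or parts.netloc:
        return False
    return parts.path.startswith("/") and not parts.path.startswith("//")


if __name__ == "__main__":
    for candidate in (
        "/a/b",
        "https://www.attacker.com/a/b",
        "https:/www.attacker.com/a/b",
        "https:///www.attacker.com/a/b",
        r"https:\www.attacker.com/a/b",
    ):
        print(candidate, is_safe_redirect_naive(candidate), is_safe_redirect_strict(candidate))
```

With these definitions, the naive check accepts `https:/www.attacker.com/a/b` because `netloc` is empty, while the strict variant rejects it because a scheme is present; a plain path such as `/a/b` passes both.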
ready-research added the comment:

Some other examples to test this behaviour:

urlparse('https:/\/\/\www.attacker.com/a/b')
urlparse('https:/\www.attacker.com/a/b')

## Comparing it to other languages/runtimes

How do other languages and their runtimes work with URL parsing functions?

Here's Node.js, also showing that it is missing the `host` and `hostname`, with behavior similar to the currently reported "buggy" Python `urlparse()`:

```
node
> require("url").parse("https:/\/\/\www.attacker.com/a/b");

Will return

Url {
  protocol: 'https:',
  slashes: true,
  auth: null,
  host: '',
  port: null,
  hostname: '',
  hash: null,
  search: null,
  query: null,
  pathname: '/www.attacker.com/a/b',
  path: '/www.attacker.com/a/b',
  href: 'https:///www.attacker.com/a/b'
}
```

But it is already documented that using Node.js `url.parse` can lead to security issues: https://nodejs.org/dist/latest-v16.x/docs/api/url.html#url_url_parse_urlstring_parsequerystring_slashesdenotehost

`Use of the legacy url.parse() method is discouraged. Users should use the WHATWG URL API. Because the url.parse() method uses a lenient, non-standard algorithm for parsing URL strings, security issues can be introduced. Specifically, issues with host name spoofing and incorrect handling of usernames and passwords have been identified.`

Here's Ruby, also showing that it is missing the `host` and `hostname`, with behavior similar to the currently reported "buggy" Python `urlparse()`:

```sh
irb(main):001:0> require 'uri'
=> false
irb(main):002:0> uri = URI.parse('https:/www.attacker.com/a/b')
=> #
irb(main):003:0> uri.host
=> nil
irb(main):004:0> uri.hostname
=> nil
irb(main):005:0> uri.scheme
=> "https"
irb(main):006:0> uri.path
=> "/www.attacker.com/a/b"
```

That said, it seems that Ruby throws on other permutations of the bad URL, which Python does not.
For example:

```
irb(main):011:0> other_uri = URI.parse('https:/\/\/\www.attacker.com/a/b')
Traceback (most recent call last):
        8: from /usr/bin/irb:23:in `'
        7: from /usr/bin/irb:23:in `load'
        6: from /Library/Ruby/Gems/2.6.0/gems/irb-1.0.0/exe/irb:11:in `'
        5: from (irb):11
        4: from (irb):11:in `rescue in irb_binding'
        3: from /System/Library/Frameworks/Ruby.framework/Versions/2.6/usr/lib/ruby/2.6.0/uri/common.rb:234:in `parse'
        2: from /System/Library/Frameworks/Ruby.framework/Versions/2.6/usr/lib/ruby/2.6.0/uri/rfc3986_parser.rb:73:in `parse'
        1: from /System/Library/Frameworks/Ruby.framework/Versions/2.6/usr/lib/ruby/2.6.0/uri/rfc3986_parser.rb:67:in `split'
URI::InvalidURIError (bad URI(is not URI?): "https:/\\/\\/\\www.attacker.com/a/b")
```

The same goes for this other URI, which Ruby does not accept (unlike Python, which accepts it and returns a result with the host and hostname missing, as shown earlier in this report):

```
irb(main):012:0> other_uri = URI.parse('https:/\www.attacker.com/a/b')
Traceback (most recent call last):
        8: from /usr/bin/irb:23:in `'
        7: from /usr/bin/irb:23:in `load'
        6: from /Library/Ruby/Gems/2.6.0/gems/irb-1.0.0/exe/irb:11:in `'
        5: from (irb):12
        4: from (irb):12:in `rescue in irb_binding'
        3: from /System/Library/Frameworks/Ruby.framework/Versions/2.6/usr/lib/ruby/2.6.0/uri/common.rb:234:in `parse'
        2: from /System/Library/Frameworks/Ruby.framework/Versions/2.6/usr/lib/ruby/2.6.0/uri/rfc3986_parser.rb:73:in `parse'
        1: from /System/Library/Frameworks/Ruby.framework/Versions/2.6/usr/lib/ruby/2.6.0/uri/rfc3986_parser.rb:67:in `split'
URI::InvalidURIError (bad URI(is not URI?): "https:/\\www.attacker.com/a/b")
```

Let's look at PHP.
PHP's `parse_url()` function behaves much like Python's: it fails to identify the host property for all 3 examples provided in this report:

```
❯ php -a
Interactive shell

php > var_dump(parse_url('https:/\www.attacker.com/a/b'));
array(2) {
  ["scheme"]=>
  string(5) "https"
  ["path"]=>
  string(22) "/\www.attacker.com/a/b"
}
php > var_dump(parse_url('https:/www.attacker.com/a/b'));
array(2) {
  ["scheme"]=>
  string(5) "https"
  ["path"]=>
  string(21) "/www.attacker.com/a/b"
}
php > var_dump(parse_url('https:/\/\/\www.attacker.com/a/b'));
array(2) {
  ["scheme"]=>
  string(5) "https"
  ["path"]=>
  string(26) "/\/\/\www.attacker.com/a/b"
}
php > var_dump(parse_url('https://www.attacker.com/a/b'));
array(3) {
  ["scheme"]=>
  string(5)
```
ready-research added the comment:

Node.js recommends using the WHATWG URL API, which handles all of these correctly. We can test this using https://jsdom.github.io/whatwg-url/

For example, each of the inputs below returns the same (correct) result:

https:///www.attacker.com/a/b
https:/www.attacker.com/a/b
https:\www.attacker.com/a/b
https:/\/\/\www.attacker.com/a/b
https:/\www.attacker.com/a/b

```
href      https://www.attacker.com/a/b
protocol  https:
username  (empty string)
password  (empty string)
port      (empty string)
hostname  www.attacker.com
pathname  /a/b
search    (empty string)
hash      (empty string)
```

SUMMARY: Python's `urlparse()` function should also handle all of the above in the same way.
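As a rough illustration of the behaviour requested here, the sketch below collapses any run of slashes and backslashes after a special scheme into `//` before calling `urlparse()`. The helper `urlparse_lenient_slashes` and its scheme list are assumptions made for this example, not an existing `urllib` API; it only approximates the WHATWG rules for inputs without a base URL and does not rewrite backslashes later in the path.

```python
import re
from urllib.parse import ParseResult, urlparse

# Assumption for this sketch: schemes that get the lenient slash handling.
_SPECIAL_PREFIX = re.compile(r"^(https?|ftp|wss?):[\\/]*", re.IGNORECASE)


def urlparse_lenient_slashes(url: str) -> ParseResult:
    """Hypothetical wrapper: normalise 'scheme:' + any mix of / and \\ to 'scheme://'."""
    normalized = _SPECIAL_PREFIX.sub(
        lambda m: m.group(1).lower() + "://",  # lambda avoids backslash escapes in the replacement
        url,
        count=1,
    )
    return urlparse(normalized)


if __name__ == "__main__":
    for raw in (
        "https:///www.attacker.com/a/b",
        "https:/www.attacker.com/a/b",
        r"https:\www.attacker.com/a/b",
        r"https:/\/\/\www.attacker.com/a/b",
        r"https:/\www.attacker.com/a/b",
    ):
        result = urlparse_lenient_slashes(raw)
        print(raw, "->", result.netloc, result.path)
```

Under these assumptions every input in the list ends up with the host in `netloc`, so a check based on `netloc` sees the attacker's hostname instead of an empty string. URLs with other schemes are passed to `urlparse()` unchanged.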