[issue44744] [security] Open redirect attack due to insufficient validation in Urlparse

2021-07-26 Thread ready-research


New submission from ready-research :

`urlparse` mishandles URLs that have an extra slash, a missing slash, or a 
backslash after the scheme (such as `https:///`, `https:/`, `https:\`). It 
leaves `netloc` empty and interprets the rest of the URI as a path. 

Application code that bases security decisions on the result of urlparse() 
may become vulnerable because of these unexpected return values. Such 
vulnerabilities may manifest as SSRF, open redirects, and other issues 
caused by incorrectly trusting a URL.

```
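from urllib.parse import urlparse

# Well-formed URL: the hostname ends up in netloc.
url1 = urlparse('https://www.attacker.com/a/b')
# Malformed variants: extra slash, missing slash, backslash.
url2 = urlparse('https:///www.attacker.com/a/b')
url3 = urlparse('https:/www.attacker.com/a/b')
# Raw string, so that the backslash is not read as an (invalid) escape.
url4 = urlparse(r'https:\www.attacker.com/a/b')

print("Normal behaviour: HOSTNAME should be in netloc\n")
print(url1)
print("\nMishandling hostname and returning it as path\n")
print(url2)
print(url3)
print(url4)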
```

OUTPUT:
```
Normal behaviour: HOSTNAME should be in netloc

ParseResult(scheme='https', netloc='www.attacker.com', path='/a/b', params='', query='', fragment='')

Mishandling hostname and returning it as path

ParseResult(scheme='https', netloc='', path='/www.attacker.com/a/b', params='', query='', fragment='')
ParseResult(scheme='https', netloc='', path='/www.attacker.com/a/b', params='', query='', fragment='')
ParseResult(scheme='https', netloc='', path='\\www.attacker.com/a/b', params='', query='', fragment='')
```
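
To make the risk concrete, here is a minimal sketch of the kind of application check described above. The function name `is_safe_redirect_naive` and the allow-list entry `example.com` are hypothetical and exist only for illustration. The check assumes that an empty `netloc` means "relative path on our own site".

```python
from urllib.parse import urlparse

ALLOWED_HOSTS = {"example.com"}  # hypothetical allow-list, for illustration only


def is_safe_redirect_naive(target):
    """Hypothetical validator: trusts any URL whose netloc is empty."""
    parts = urlparse(target)
    # An empty netloc is assumed to mean "relative path on our own site".
    return parts.netloc == "" or parts.netloc in ALLOWED_HOSTS


for candidate in (
    "https://www.attacker.com/a/b",
    "https:///www.attacker.com/a/b",
    "https:/www.attacker.com/a/b",
    "https:\\www.attacker.com/a/b",
):
    print(candidate, is_safe_redirect_naive(candidate))
```

Only the first URL is rejected. The other three pass the check because `urlparse` reports an empty `netloc` for them, even though a client that parses them differently may still end up at www.attacker.com.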

--
components: Parser
messages: 398232
nosy: lys.nikolaou, pablogsal, ready-research
priority: normal
severity: normal
status: open
title: [security] Open redirect attack due to insufficient validation in 
Urlparse
versions: Python 3.10, Python 3.11, Python 3.6, Python 3.7, Python 3.8, Python 
3.9

___
Python tracker 
<https://bugs.python.org/issue44744>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com




ready-research  added the comment:

Some other examples that exhibit this behaviour (raw strings, so that the 
backslashes are kept as written):
urlparse(r'https:/\/\/\www.attacker.com/a/b')
urlparse(r'https:/\www.attacker.com/a/b')

## Comparing it to other languages/runtimes

How do other languages and their runtimes work with URL parsing functions?

Here is Node.js. Its legacy `url.parse()` also leaves `host` and `hostname` 
empty, similar to the reported Python `urlparse()` behaviour:
```
node
>require("url").parse("https:/\/\/\www.attacker.com/a/b");

Will return

Url {
  protocol: 'https:',
  slashes: true,
  auth: null,
  host: '',
  port: null,
  hostname: '',
  hash: null,
  search: null,
  query: null,
  pathname: '/www.attacker.com/a/b',
  path: '/www.attacker.com/a/b',
  href: 'https:///www.attacker.com/a/b'
}
```
However, the Node.js documentation already warns that url.parse() can lead 
to security issues: 
https://nodejs.org/dist/latest-v16.x/docs/api/url.html#url_url_parse_urlstring_parsequerystring_slashesdenotehost
`Use of the legacy url.parse() method is discouraged. Users should use the 
WHATWG URL API. Because the url.parse() method uses a lenient, non-standard 
algorithm for parsing URL strings, security issues can be introduced. 
Specifically, issues with host name spoofing and incorrect handling of 
usernames and passwords have been identified.`


Here is Ruby. `URI.parse` also returns no `host` or `hostname`, similar to 
the reported Python `urlparse()` behaviour:

```sh
irb(main):001:0> require 'uri'
=> false
irb(main):002:0> uri = URI.parse('https:/www.attacker.com/a/b')
=> #
irb(main):003:0> uri.host
=> nil
irb(main):004:0> uri.hostname
=> nil
irb(main):005:0> uri.scheme
=> "https"
irb(main):006:0> uri.path
=> "/www.attacker.com/a/b"
```

That said, Ruby raises an error on other permutations of the bad URL, where 
Python does not. For example:

```
irb(main):011:0> other_uri = URI.parse('https:/\/\/\www.attacker.com/a/b')
Traceback (most recent call last):
8: from /usr/bin/irb:23:in `'
7: from /usr/bin/irb:23:in `load'
6: from /Library/Ruby/Gems/2.6.0/gems/irb-1.0.0/exe/irb:11:in `'
5: from (irb):11
4: from (irb):11:in `rescue in irb_binding'
3: from /System/Library/Frameworks/Ruby.framework/Versions/2.6/usr/lib/ruby/2.6.0/uri/common.rb:234:in `parse'
2: from /System/Library/Frameworks/Ruby.framework/Versions/2.6/usr/lib/ruby/2.6.0/uri/rfc3986_parser.rb:73:in `parse'
1: from /System/Library/Frameworks/Ruby.framework/Versions/2.6/usr/lib/ruby/2.6.0/uri/rfc3986_parser.rb:67:in `split'
URI::InvalidURIError (bad URI(is not URI?): "https:/\\/\\/\\www.attacker.com/a/b")
```

The same applies to this other URI. Ruby rejects it, whereas Python accepts 
it and returns a result with no host, as shown earlier in this report:

```
irb(main):012:0> other_uri = URI.parse('https:/\www.attacker.com/a/b')
Traceback (most recent call last):
8: from /usr/bin/irb:23:in `'
7: from /usr/bin/irb:23:in `load'
6: from /Library/Ruby/Gems/2.6.0/gems/irb-1.0.0/exe/irb:11:in `'
5: from (irb):12
4: from (irb):12:in `rescue in irb_binding'
3: from /System/Library/Frameworks/Ruby.framework/Versions/2.6/usr/lib/ruby/2.6.0/uri/common.rb:234:in `parse'
2: from /System/Library/Frameworks/Ruby.framework/Versions/2.6/usr/lib/ruby/2.6.0/uri/rfc3986_parser.rb:73:in `parse'
1: from /System/Library/Frameworks/Ruby.framework/Versions/2.6/usr/lib/ruby/2.6.0/uri/rfc3986_parser.rb:67:in `split'
URI::InvalidURIError (bad URI(is not URI?): "https:/\\www.attacker.com/a/b")
```
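
For comparison, a caller can get similar fail-fast behaviour in Python today by wrapping `urlparse`. This is only a sketch: `parse_strict` is a hypothetical helper, not part of `urllib.parse`, and its two rules (no backslashes, no http/https URL without a host) are assumptions chosen to cover the inputs in this report. It is stricter than Ruby, since it also rejects `https:/www.attacker.com/a/b`.

```python
from urllib.parse import urlparse


def parse_strict(url):
    """Hypothetical wrapper: parse url, raising ValueError on ambiguous input."""
    if "\\" in url:
        # Backslashes are not valid URI characters (RFC 3986).
        raise ValueError(f"backslash in URL: {url!r}")
    parts = urlparse(url)
    if parts.scheme in ("http", "https") and not parts.netloc:
        # A web scheme without an authority, e.g. 'https:/host/path'.
        raise ValueError(f"missing host in URL: {url!r}")
    return parts
```

With this wrapper, `parse_strict('https://www.attacker.com/a/b')` and a plain relative path such as `/a/b` still parse normally, while every malformed variant from this report raises `ValueError`.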

Let's look at PHP. PHP's parse_url() function behaves much like Python: it 
fails to identify the host for all 3 examples provided in this report:
```
❯ php -a
Interactive shell

php > var_dump(parse_url('https:/\www.attacker.com/a/b'));
array(2) {
  ["scheme"]=>
  string(5) "https"
  ["path"]=>
  string(22) "/\www.attacker.com/a/b"
}
php > var_dump(parse_url('https:/www.attacker.com/a/b'));
array(2) {
  ["scheme"]=>
  string(5) "https"
  ["path"]=>
  string(21) "/www.attacker.com/a/b"
}
php > var_dump(parse_url('https:/\/\/\www.attacker.com/a/b'));
array(2) {
  ["scheme"]=>
  string(5) "https"
  ["path"]=>
  string(26) "/\/\/\www.attacker.com/a/b"
}
php > var_dump(parse_url('https://www.attacker.com/a/b'));
array(3) {
  ["scheme"]=>
  string(5)
```



ready-research  added the comment:

Node.js recommends using the WHATWG URL API, which handles all of these 
correctly. This can be tested using https://jsdom.github.io/whatwg-url/

For example, the inputs below all return the same (correct) result:
https:///www.attacker.com/a/b
https:/www.attacker.com/a/b
https:\www.attacker.com/a/b
https:/\/\/\www.attacker.com/a/b
https:/\www.attacker.com/a/b



```
href      https://www.attacker.com/a/b
protocol  https:
username  (empty string)
password  (empty string)
port      (empty string)
hostname  www.attacker.com
pathname  /a/b
search    (empty string)
hash      (empty string)
```

SUMMARY:
Python's urlparse() function should also handle all of the above in the same 
way.
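
As a rough illustration of what that could look like from the caller's side, the sketch below normalizes the input before handing it to `urlparse`. It is a simplified approximation of the WHATWG behaviour for http and https only, not an implementation of the standard, and `normalize_then_parse` is a hypothetical helper name. It assumes that any run of slashes or backslashes after the scheme should collapse to `//`, and that backslashes before the query or fragment should be read as slashes.

```python
import re
from urllib.parse import urlparse

# "http:" or "https:" followed by any run of slashes and backslashes.
_WEB_PREFIX = re.compile(r"(?i)^(https?):[/\\]*")


def normalize_then_parse(url):
    """Hypothetical helper: approximate WHATWG slash handling, then urlparse."""
    match = _WEB_PREFIX.match(url)
    if match is None:
        # Not an http(s) URL: leave it to urlparse unchanged.
        return urlparse(url)
    rest = url[match.end():]
    # Convert backslashes to slashes only before the query or fragment starts.
    cut = re.search(r"[?#]", rest)
    end = cut.start() if cut else len(rest)
    rest = rest[:end].replace("\\", "/") + rest[end:]
    return urlparse(f"{match.group(1).lower()}://{rest}")


for raw in (
    "https:///www.attacker.com/a/b",
    "https:/www.attacker.com/a/b",
    "https:\\www.attacker.com/a/b",
    "https:/\\/\\/\\www.attacker.com/a/b",
    "https:/\\www.attacker.com/a/b",
):
    parts = normalize_then_parse(raw)
    print(parts.hostname, parts.path)
```

Each of the five inputs then parses with `www.attacker.com` as the hostname and `/a/b` as the path, so a host-based check sees the host the WHATWG parser reports above.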
