Re: Add support for IDNA 2008

2022-09-14 Thread 'Julien Bernard' via Django developers (Contributions to Django itself)
Hi Carlton,

On Tuesday, 13 September 2022 at 07:17:31 UTC-4, carlton...@gmail.com wrote:

> Hi Julien. 
>
> I didn't get a canonical answer from the security team yet, but it may be 
> that we can make the idna an optional dependency quite easily. I already 
> have it installed in my dev environment, for instance, coming from selenium 
> and requests. 
>
> From the package docs: https://pypi.org/project/idna/
>
> > You may use the codec encoding and decoding methods using
> > the idna.codec module:
> > >>> import idna.codec
> > >>> print('домен.испытание'.encode('idna'))
> > b'xn--d1acufc.xn--80akhbyknj4f'
>
> So "use if installed" (catching the ImportError if not) would look 
> feasible. (The usage in the punycode helper is just `domain.encode("idna")` 
> which matches this example already.)
>

That's great news! Thanks.
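
As a rough sketch of that "use if installed" approach (not the actual Django patch): the helper name punycode_optional below is made up for illustration, and it calls idna.encode() directly instead of relying on the 'idna' codec name, since the codec route returned the IDNA 2003 result in the session quoted further down.

```python
try:
    import idna  # optional third-party package implementing IDNA 2008
except ImportError:
    idna = None  # fall back to the standard library codec (IDNA 2003)


def punycode_optional(domain):
    """Hypothetical helper: return the ASCII form of a domain name as str."""
    if idna is not None:
        # IDNA 2008 path; raises idna.IDNAError on invalid input.
        return idna.encode(domain).decode("ascii")
    # Standard library path, which implements IDNA 2003.
    return domain.encode("idna").decode("ascii")
```

With the package installed, 'fuß.standcore.com' should come back as the xn-- form; without it, the standard library maps 'ß' to 'ss'.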
 

>
> Would you fancy looking at a PR around that?
>

Yes, no problem.
 

>
> We'd need *some* tests for both the installed and not-installed cases, 
> ideally showing the difference. I didn't immediately have success with your 
> https://fuss.standcore.com/ example: 
>
> % python
> Python 3.10.6 (v3.10.6:9c7b4bd164, Aug  1 2022, 17:13:48) [Clang 13.0.0 (clang-1300.0.29.30)] on darwin
> Type "help", "copyright", "credits" or "license" for more information.
> >>> print('https://fuß.standcore.com/'.encode('idna'))
> b'https://fuss.standcore.com/'
> >>> import idna.codec
> >>> print('https://fuß.standcore.com/'.encode('idna'))
> b'https://fuss.standcore.com/'  # Was expecting https://xn--fu-hia.standcore.com/ from discussion 🤔
> >>> import idna
> >>> idna.encode('https://fuß.standcore.com/')
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "/Users/carlton/Envs/django/lib/python3.10/site-packages/idna/core.py", line 357, in encode
>     s = alabel(label)
>   File "/Users/carlton/Envs/django/lib/python3.10/site-packages/idna/core.py", line 269, in alabel
>     check_label(label)
>   File "/Users/carlton/Envs/django/lib/python3.10/site-packages/idna/core.py", line 250, in check_label
>     raise InvalidCodepoint('Codepoint {} at position {} of {} not allowed'.format(_unot(cp_value), pos+1, repr(label)))
> idna.core.InvalidCodepoint: Codepoint U+003A at position 6 of 'https://fuß' not allowed
>

I was not able to get .encode('idna') to work either. I reported this issue 
https://github.com/kjd/idna/issues/128 to check why this is not working as 
expected.
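
On the tests for both the installed and not-installed cases, here is a minimal sketch of how they could look. The helper to_ascii and the test class are hypothetical names, not existing Django code. The not-installed case is simulated by putting a None entry for idna in sys.modules, which makes the import statement raise ImportError.

```python
import sys
import unittest
from unittest import mock


def to_ascii(domain):
    """Hypothetical helper: IDNA 2008 if idna is importable, else IDNA 2003."""
    try:
        import idna  # optional third-party dependency
    except ImportError:
        return domain.encode("idna").decode("ascii")
    return idna.encode(domain).decode("ascii")


class OptionalIdnaTests(unittest.TestCase):
    domain = "fuß.standcore.com"

    def test_without_idna(self):
        # A None entry in sys.modules makes "import idna" raise ImportError.
        with mock.patch.dict(sys.modules, {"idna": None}):
            self.assertEqual(to_ascii(self.domain), "fuss.standcore.com")

    def test_with_idna(self):
        try:
            import idna  # noqa: F401
        except ImportError:
            self.skipTest("idna is not installed")
        self.assertEqual(to_ascii(self.domain), "xn--fu-hia.standcore.com")
```

Run with python -m unittest; the second test is skipped when the package is missing, so the suite passes in both environments.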

For the last part, idna works with labels or domains, so you would have to 
provide only the domain to the encode method:

% python
Python 3.10.7 (main, Sep  6 2022, 21:22:27) [GCC 12.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import idna
>>> idna.encode('fuß.standcore.com')
b'xn--fu-hia.standcore.com'
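
When starting from a full URL, the host has to be extracted first. A small sketch, assuming urllib.parse.urlsplit is acceptable for the parsing step; url_host_to_ascii is a hypothetical name used only for this example:

```python
from urllib.parse import urlsplit


def url_host_to_ascii(url):
    """Hypothetical helper: encode only the host part of a URL, as bytes."""
    host = urlsplit(url).hostname  # drops scheme, userinfo, port and path
    if host is None:
        raise ValueError(f"no host found in {url!r}")
    try:
        import idna  # optional third-party package (IDNA 2008)
    except ImportError:
        return host.encode("idna")  # standard library codec (IDNA 2003)
    return idna.encode(host)
```

This avoids the InvalidCodepoint error on U+003A, because the scheme and the slashes never reach the encoder.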

Best regards,
Julien
 

>
> Possibly there's some objection to such a change, but I'm struggling to 
> imagine it short of concrete cases... 
>
> Thanks! 
>
> Kind Regards,
>
> Carlton
>
>
>
> On Wednesday, 7 September 2022 at 08:18:13 UTC+2 Carlton Gibson wrote:
>
>> Hey Julien. 
>>
>> Thanks, OK... 📖
>>
>> The Python docs have it:
>>
>> > If you need the IDNA 2008 standard from *RFC 5891* and *RFC 5895*,
>> > use the third-party idna module.
>>
>> So the question is do we **need** the newer standard?
>>
>> I will have a read of the various resources here, and I'll also ask the 
>> Django Security Team if they have any thoughts. 
>>
>> Kind Regards,
>>
>> Carlton
>>
>>
>> On Tue, 6 Sept 2022 at 23:03, 'Julien Bernard' via Django developers 
>> (Contributions to Django itself)  wrote:
>>
>>> Hi Carlton,
>>>
>>> IDNA 2008 changed which IDNs are valid, and how some characters are
>>> converted to Punycode, compared to IDNA 2003.
>>> A difference that is often used as an example is the German 'ß'
>>> character. In IDNA 2003 it is mapped to 'ss', while in IDNA 2008 it is
>>> kept and encoded in Punycode.
>>> It means that, depending on which standard is implemented, the same IDN
>>> may resolve to entirely different domains, which may lead to security
>>> issues.
>>> For example, the URL https://fuß.standcore.com/ would be
>>> https://fuss.standcore.com/ with IDNA 2003 and
>>> https://xn--fu-hia.standcore.com/ with IDNA 2008.
>>> This is only a very brief overview; for further quick reading,
>>> https://www.unicode.org/faq/idn.html is quite informative too.
>>>
>>> Best regards,
>>> Julien
>>>
>>> On Tuesday, 6 September 2022 at 14:39:49 UTC-4, carlton...@gmail.com
>>> wrote:
>>>
>>>> Hey Julien.
>>>>
>>>> What's maybe missing is some concrete cases. "T

Re: Add support for IDNA 2008

2022-09-14 Thread Carlton Gibson
OK, great, thanks.

I'll await your PR. Let's continue on GitHub for the moment, then.
Good hustle 👍
