Re: [Python-Dev] Counting collisions for the win

2012-01-19 Thread Ivan Kozik
On Fri, Jan 20, 2012 at 00:48, Victor Stinner wrote:
> I propose to solve the hash collision vulnerability by counting
> collisions because it does fix the vulnerability with a minor or no
> impact on applications or backward compatibility. I don't see why we
> should use a different fix for Python 3.3. If counting collisions
> solves the issue for stable versions, it is also enough for Python
> 3.3. We now know all issues of the randomized hash solution, and I
> think that there are more drawbacks than advantages. IMO the
> randomized hash is overkill to fix the hash collision issue.

I'd like to point out that an attacker is not limited to sending just
one dict full of colliding keys.  Given a 22ms stall for a dict full
of 1000 colliding keys, and 100 such objects inside a parent object
(perhaps JSON), you can stall a server for 2.2+ seconds.  Going with
the raise-at-1000 approach doesn't solve the problem for everyone.
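
To make the arithmetic explicit (a back-of-the-envelope sketch in
Python, using the figures above rather than any new measurements):

    stall_per_dict = 0.022      # ~22 ms to build one dict of 1000 colliding keys
    dicts_per_payload = 100     # e.g. 100 such objects nested in one JSON body
    print(stall_per_dict * dicts_per_payload)   # ~2.2 seconds of CPU per request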

In addition, because the raise-at-N-collisions approach raises an
exception, everyone who wants to handle this error condition properly
has to change their code to catch a previously-unexpected exception.
(I know they're usually still better off with the fix, but why force
many people to change code when you can actually fix the hashing
problem?)
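
For illustration, here is the kind of handler change the proposal would
force on callers.  The exception name is purely hypothetical (the
counting proposal had not settled on one), and the class below is only
a stand-in:

    import json

    class TooManyHashCollisions(Exception):
        """Hypothetical stand-in for whatever the counting proposal would raise."""

    def load_request(body):
        try:
            return json.loads(body)
        except ValueError:
            return None                  # malformed JSON: the error we already handle
        except TooManyHashCollisions:    # the new case every caller would have to add
            return None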

Another issue is that even with a configurable limit, different
modules can't have their own limits.  One module might want a
relatively safe raise-at-100, and another module creating massive
dicts might want raise-at-1000.  How does a developer know whether
they can raise or lower the limit, given that they use a bunch of
different modules?
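
Roughly (purely illustrative; no such process-wide setting exists in
CPython, and the variable name is made up):

    collision_limit = 100    # module A: small, attacker-facing dicts
    collision_limit = 1000   # module B, imported later: huge but legitimate
                             # dicts -- silently undoes module A's choice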

I actually went with this stop-at-N-collisions approach a few years ago
by patching my CPython to limit the critical `for` loop in dictobject
and setobject to 100 iterations (I realize this might handle fewer than
100 collisions).  This worked fine until I tried to compile
PyPy, where the translator blew up due to a massive dict.  This,
combined with the second problem (needing to catch an exception), led
me to abandon this approach and write Securetypes, which has a
securedict that uses SHA-1.  Not that I like this either; I think I'm
happy with the randomize-hash() approach.
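
For anyone curious, the securedict idea is roughly the following; this
is a minimal sketch of the concept, not the actual Securetypes code:

    import hashlib

    class SecureDictSketch:
        """Key the underlying dict on the SHA-1 digest of the original key,
        so an attacker cannot choose keys that collide in the hash table."""

        def __init__(self):
            self._d = {}

        @staticmethod
        def _digest(key):
            if isinstance(key, str):
                key = key.encode("utf-8")
            return hashlib.sha1(key).digest()

        def __setitem__(self, key, value):
            self._d[self._digest(key)] = (key, value)

        def __getitem__(self, key):
            return self._d[self._digest(key)][1]

        def __contains__(self, key):
            return self._digest(key) in self._d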

Ivan


Re: [Python-Dev] Counting collisions for the win

2012-01-19 Thread Ivan Kozik
On Fri, Jan 20, 2012 at 03:48, Guido van Rossum wrote:
> I think that's because your collision-counting algorithm was much more
> primitive than MAL's.

Conceded.

>> This,
>> combined with the second problem (needing to catch an exception), led
>> me to abandon this approach and write Securetypes, which has a
>> securedict that uses SHA-1.  Not that I like this either; I think I'm
>> happy with the randomize-hash() approach.
>
>
> Why did you need to catch the exception? Were you not happy with the program
> simply terminating with a traceback when it got attacked?

No, I wasn't happy with termination.  I wanted to treat it just like a
JSON decoding error, and send the appropriate response.

I actually forgot to mention the main reason I abandoned the
stop-at-N-collisions approach.  I had a server with a dict that stayed
in memory across many requests.  It was being populated with
identifiers chosen by clients.  I couldn't have my server stay broken
if this dict filled up with a bunch of colliding keys.  (I don't think
I could have done anything else either, like nuking the dict or
evicting some keys.)
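
A minimal sketch of that situation (names are illustrative); the point
is that the dict outlives any single request:

    client_ids = {}   # lives for the whole server process

    def register(identifier, session):
        # Under a raise-at-N-collisions rule, this insert could start
        # raising for innocent clients too, once enough colliding
        # identifiers have accumulated -- not just for the attacker.
        client_ids[identifier] = session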

Ivan