Michael Foord wrote:
Hello all,

A paper (well, presentation) has been published highlighting security problems 
with the hashing algorithm (exploiting collisions) in many programming 
languages Python included:

         
http://events.ccc.de/congress/2011/Fahrplan/attachments/2007_28C3_Effective_DoS_on_web_application_platforms.pdf

Although it's a security issue I'm posting it here because it is now public and 
seems important.

The issue they report can cause (for example) handling an http post to consume 
horrible amounts of cpu. For Python the figures they quoted:

        reasonable-sized attack strings only for 32 bits Plone has max. POST 
size of 1 MB
        7 minutes of CPU usage for a 1 MB request
        ~20 kbits/s → keep one Core Duo core busy

This was apparently reported to the security list, but hasn't been responded to beyond an acknowledgement on November 24th (the original report didn't make it onto the security list because it was held in a moderation queue).
The same vulnerability was reported against various languages and web 
frameworks, and is already fixed in some of them.

Their recommended fix is to randomize the hash function.


The attack relies on being able to predict the hash value for a given string. Randomising the string hash function is quite straightforward.
There is no need to change the dictionary code.

A possible (*untested*) patch is attached. I'll leave it for those more familiar with unicodeobject.c to do properly.

Cheers,
Mark
diff -r f1298f4ec638 Objects/unicodeobject.c
--- a/Objects/unicodeobject.c	Wed Dec 28 14:59:40 2011 +0000
+++ b/Objects/unicodeobject.c	Thu Dec 29 11:11:09 2011 +0000
@@ -41,6 +41,7 @@
 #define PY_SSIZE_T_CLEAN
 #include "Python.h"
 #include "ucnhash.h"
+#include <time.h>               /* for seeding of hash value */
 
 #ifdef MS_WINDOWS
 #include <windows.h>
@@ -184,6 +185,9 @@
 /* Single character Unicode strings in the Latin-1 range are being
    shared as well. */
 static PyObject *unicode_latin1[256];
+
+/* Base hash value (hash of empty string) */
+static Py_uhash_t base_hash;
 
 /* Fast detection of the most frequent whitespace characters */
 const unsigned char _Py_ascii_whitespace[] = {
@@ -11107,7 +11111,7 @@
 
     /* The hash function as a macro, gets expanded three times below. */
 #define HASH(P) \
-    x = (Py_uhash_t)*P << 7; \
+    x = base_hash + (Py_uhash_t)*P << 7; \
     while (--len >= 0) \
         x = (1000003*x) ^ (Py_uhash_t)*P++;
 
@@ -13869,6 +13873,14 @@
 int _PyUnicode_Init(void)
 {
     int i;
+    time_t now;
+
+    time(&now);
+    base_hash = (Py_uhash_t)now;
+    /* Avoid overly large numbers to reduce the chance of small strings
+     * hashing to -1 */
+    if (base_hash == (Py_uhash_t)-1)
+        base_hash = (Py_uhash_t)-2;
 
     /* XXX - move this array to unicodectype.c ? */
     Py_UCS2 linebreak[] = {
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com

Reply via email to