[Python-Dev] py2.7: dictobject not properly resizing

2013-03-30 Thread Micha Gorelick
I was taking a look at dictobject.c and realized that the logic
controlling whether a resizedict will occur in
dict_set_item_by_hash_or_entry disallows for the shrinking of a
dictionary.  This is contrary to what the comments directly above say:

(http://hg.python.org/cpython/file/f3032825f637/Objects/dictobject.c#l771)
771 /* If we added a key, we can safely resize. Otherwise just return!
772 * If fill >= 2/3 size, adjust size. Normally, this doubles or
773 * quaduples the size, but it's also possible for the dict to shrink
774 * (if ma_fill is much larger than ma_used, meaning a lot of dict
775 * keys have been * deleted).

The "bug" occures in the following conditional since we exit out of
the function without checking the relative magnitudes of ma_filled to
ma_used. Instead, we only check if we still have a correct loading
factor (and the "don't resize on modification" bit).

This can be fixed by changing the following conditional on line 785 to:

if (mp->ma_used <= n_used || (mp->ma_fill*3 < (mp->ma_mask+1)*2 &&
mp->ma_used*5 > mp->ma_fill))

The factor of 5 was chosen arbitrarily... I'm sure with some
benchmarking we could tune it to an optimal value for the internal use
of dictionaries.  However, before I put this effort in I was wondering if
this is in fact desired behavior or if it is indeed a bug.  At the
very least, the comments should be updated to reflect the actual
resizing dynamics of the dictionary.

Micha
-
http://micha.gd/
http://github.com/mynameisfiber/
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] py2.7: dictobject not properly resizing

2013-03-30 Thread Micha Gorelick
True, but your example mechanism of getting a shrink event is purely
based on ma_fill.  This is happening because your last loop is
increasing ma_fill to the point where it thinks it needs to resize
because of the load factor and then it calculates the new size based
on ma_used.  The comment that I pointed out from the code seems to
imply that simply having ma_fill >> ma_used will trigger a resize.
The two conditions for a resize are definitely not equivalent!

Micha
-
http://micha.gd/
http://github.com/mynameisfiber/
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] py2.7: dictobject not properly resizing

2013-03-31 Thread Micha Gorelick
So I did a bit of benchmarking and attached is the code I used.  With
downsizing happening when ma_used * 2 <= ma_filled, or the following
for the condition in question:

if (mp->ma_used <= n_used || (mp->ma_fill*3 < (mp->ma_mask+1)*2 &&
mp->ma_used*2 > mp->ma_fill))

I see marginally faster performance with downsizing.  I chose a factor
of 2x because it will ensure downsizings before the ma_fill load
factor comes into play and will also not cause a resize on the next
insert.  Using the vanilla python2.7.3 code, there are never any
resize events where the new size is smaller than the old size.

Cheers,
Micha
-
http://micha.gd/
http://github.com/mynameisfiber/


test.py
Description: Binary data
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com