[Python-Dev] py2.7: dictobject not properly resizing
I was taking a look at dictobject.c and realized that the logic controlling whether a resizedict will occur in dict_set_item_by_hash_or_entry disallows for the shrinking of a dictionary. This is contrary to what the comments directly above say: (http://hg.python.org/cpython/file/f3032825f637/Objects/dictobject.c#l771) 771 /* If we added a key, we can safely resize. Otherwise just return! 772 * If fill >= 2/3 size, adjust size. Normally, this doubles or 773 * quaduples the size, but it's also possible for the dict to shrink 774 * (if ma_fill is much larger than ma_used, meaning a lot of dict 775 * keys have been * deleted). The "bug" occures in the following conditional since we exit out of the function without checking the relative magnitudes of ma_filled to ma_used. Instead, we only check if we still have a correct loading factor (and the "don't resize on modification" bit). This can be fixed by changing the following conditional on line 785 to: if (mp->ma_used <= n_used || (mp->ma_fill*3 < (mp->ma_mask+1)*2 && mp->ma_used*5 > mp->ma_fill)) The factor of 5 was chosen arbitrarily... I'm sure with some benchmarking we could tune it to an optimal value for the internal use of dictionaries. However, before I put this effort in I was wondering if this is in fact desired behavior or if it is indeed a bug. At the very least, the comments should be updated to reflect the actual resizing dynamics of the dictionary. Micha - http://micha.gd/ http://github.com/mynameisfiber/ ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] py2.7: dictobject not properly resizing
True, but your example mechanism of getting a shrink event is purely based on ma_fill. This is happening because your last loop is increasing ma_fill to the point where it thinks it needs to resize because of the load factor and then it calculates the new size based on ma_used. The comment that I pointed out from the code seems to imply that simply having ma_fill >> ma_used will trigger a resize. The two conditions for a resize are definitely not equivalent! Micha - http://micha.gd/ http://github.com/mynameisfiber/ ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] py2.7: dictobject not properly resizing
So I did a bit of benchmarking and attached is the code I used. With downsizing happening when ma_used * 2 <= ma_filled, or the following for the condition in question: if (mp->ma_used <= n_used || (mp->ma_fill*3 < (mp->ma_mask+1)*2 && mp->ma_used*2 > mp->ma_fill)) I see marginally faster performance with downsizing. I chose a factor of 2x because it will ensure downsizings before the ma_fill load factor comes into play and will also not cause a resize on the next insert. Using the vanilla python2.7.3 code, there are never any resize events where the new size is smaller than the old size. Cheers, Micha - http://micha.gd/ http://github.com/mynameisfiber/ test.py Description: Binary data ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com