Interesting, thanks for the responses. And yeah, I meant 1/3, I always mix up negatives.

Agreed that, as you point out, the biggest slowdown will be on classes that define their own __hash__; however, since classes use instance dicts and this would reduce the dict size from 96 -> 64 bytes, we could blow 4 bytes to cache the hash on the object.
In fact, PyObject_Hash could 'intern' the result of __hash__ into a __hashvalue__ member of the class dict. This might be the best of both worlds, since it'll only use space for the hash value if it's needed.

Oh, and the reason I brought up strings was that one can grab the ob_shash from the string object in lookdict_string to avoid even the function call to get the hash for a string, so it's just the same as storing the hash on the entry for strings.
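(To make the "intern the hash" idea above concrete, here is a very rough sketch against the 2.x C API. The __hashvalue__ key comes from the message, but the helper name is made up, error handling is glossed over, and objects without an instance dict just fall through to the normal path -- this is not how PyObject_Hash works today.)

    #include <Python.h>

    /* Hypothetical sketch: cache the result of a (possibly expensive)
     * __hash__ call in the instance dict under a "__hashvalue__" key,
     * so a dict entry would no longer need to store me_hash itself. */
    static long
    cached_instance_hash(PyObject *o)
    {
        PyObject **dictptr = _PyObject_GetDictPtr(o);
        PyObject *cached;
        long h;

        if (dictptr != NULL && *dictptr != NULL) {
            cached = PyDict_GetItemString(*dictptr, "__hashvalue__");
            if (cached != NULL)
                return PyInt_AsLong(cached);   /* reuse the stored hash */
        }
        h = PyObject_Hash(o);                  /* normal __hash__ path */
        if (h != -1 && dictptr != NULL && *dictptr != NULL) {
            cached = PyInt_FromLong(h);
            if (cached != NULL) {
                PyDict_SetItemString(*dictptr, "__hashvalue__", cached);
                Py_DECREF(cached);
            }
        }
        return h;
    }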
The reason I looked into this to begin with was that my code used up a bunch of memory, which was traceable to lots of little objects with instance dicts, so it seemed that if instance dicts took less memory I wouldn't have to go and add __slots__ to a bunch of my classes, or rewrite things as tuples/lists, etc.
thanks!

-Kirat

On 4/23/06, Tim Peters <[EMAIL PROTECTED]> wrote:
[Kirat Singh]
> Hi, this is my first python dev post, so please forgive me if this topic has
> already been discussed.

It's hard to find one that hasn't -- but it's even harder to find the
old discussions ;-)
> It seemed to me that removing me_hash from a dict entry would save 2/3 of
> the space

1/3, right?

> used by dictionaries and also improve alignment of the entries
> since they'd be 8 bytes instead of 12.

How would that help?  On 32-bit boxes, we have 3 4-byte members in
PyDictEntry, and they're all 4-byte aligned.  In what respect related
to alignment is that sub-optimal?
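(For reference, a sketch of the entry layout under discussion -- the first struct follows 2.x Include/dictobject.h as far as I recall; the trimmed variant is hypothetical and only illustrates the arithmetic behind the 1/3 figure.)

    /* Roughly as in Include/dictobject.h: three 4-byte members on a
     * 32-bit box, i.e. 12 bytes per entry. */
    typedef struct {
        long me_hash;         /* cached hash code of me_key */
        PyObject *me_key;
        PyObject *me_value;
    } PyDictEntry;

    /* Hypothetical entry with me_hash dropped: 8 bytes, i.e. 1/3 smaller,
     * at the cost of recomputing (or re-fetching) the hash on demand. */
    typedef struct {
        PyObject *me_key;
        PyObject *me_value;
    } SmallDictEntry;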
> And sets end up having just 4 byte entries.
>
> I'm guessing that string dicts are the most common (hence the specialized
> lookdict_string routine),

String dicts were the only kind at first, and their speed is critical
because Python itself makes heavy use of them (e.g., to implement
instance and module namespaces, and keyword arguments).

> and since strings already contain their hash, this would probably mitigate
> the performance impact.

No slowdown in string dicts would be welcome, but since strings
already cache their own hash, they seem unaffected by this.  It's the
speed of other key types that would suffer, and for classes that
define their own __hash__ method that could well be deadly (see
Martin's reply for more detail).
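(The string-side caching being referred to looks roughly like the sketch below -- the shape follows 2.x stringobject.c, but the hash loop itself is elided behind a made-up compute_string_hash helper.)

    /* Rough sketch of how 2.x strings cache their hash: ob_shash starts
     * out as -1 and is filled in the first time the hash is requested;
     * this is the value the proposal above would read from lookdict_string. */
    static long
    string_hash(PyStringObject *a)
    {
        long x;

        if (a->ob_shash != -1)
            return a->ob_shash;            /* already cached */
        x = compute_string_hash(a->ob_sval, a->ob_size);  /* hypothetical helper */
        a->ob_shash = x;                   /* cache for next time */
        return x;
    }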
> One could also add a hash to Tuples since they are immutable.

A patch to do that was recently rejected.  You can read its comments
for some of the reasons:

    http://www.python.org/sf/1462796

More reasons were given in a python-dev thread about the same thing
earlier this month:

    http://mail.python.org/pipermail/python-dev/2006-April/063275.html

> If this isn't a totally stupid idea, I'd be happy to volunteer to try the
> experiment and run any suggested tests.

I'd be -1 if it slowed dict operations for classes that define their
own __hash__.  I do a lot of that ;-)

> PS any opinion on making _Py_StringEq a macro?

Yes: don't bother unless it provably speeds something "important" :-)
It's kinda messy for a macro otherwise, macros always make debugging
harder (can't step through the source expansion in a debugger w/o a
lot of pain), etc.

> inline function would be nice but I hesitate to bring up the C/C++ debate, both
> languages suck in their own special way ;-)

Does the Python source even compile as C++ now?  People have been
working toward that, but my last impression was that it's not there
yet.