On Thu, 4 Mar 2010 06:47:04 pm spir wrote: > Hello, > > In python like in most languages, I guess, objects (at least > composite ones -- I don't know about ints, for instance -- someone > knows?) are internally represented as associative arrays.
No. You can consider a Python object to be something like a C struct, or a Pascal record. The actual structure of the object depends on the type of the object (ints, strings, lists, etc will be slightly different). See Dave Angel's post for a good description. And of course this is implementation dependent: CPython, being written in C, uses C structs to implement objects. Other Pythons, written in other languages, will use whatever data structure makes sense for their language. That's almost certainly going to be a struct-like structure, since that is a pretty fundamental data type, but it could be different. If you want to know exactly how objects are represented internally by Python, I'm afraid you will probably need to read the source code. But a good start is here: http://effbot.org/zone/python-objects.htm > Python > associative arrays are dicts, which in turn are implemented as hash > tables. Correct? Does this mean that the associative arrays > representing objects are implemented like python dicts, thus hash > tables? "Associate array" is a generic term for a data structure that maps keys to items. There are lots of ways of implementing such an associative array: at least two different sorts of hash tables, about a dozen different types of binary trees, and so on. In Python, associate arrays are called "dicts", and they are implemented in CPython as hash tables with chaining. But objects are not hash tables. *Some* objects INCLUDE a hash table as one of its fields, but not all. For example: >>> int.__dict__ <dictproxy object at 0xb7f09674> >>> (2).__dict__ Traceback (most recent call last): File "<stdin>", line 1, in <module> AttributeError: 'int' object has no attribute '__dict__' >>> class C: pass ... >>> C.__dict__ {'__module__': '__main__', '__doc__': None} That is why you can't add attributes to arbitrary built-in objects: >>> {}.x = None Traceback (most recent call last): File "<stdin>", line 1, in <module> AttributeError: 'dict' object has no attribute 'x' Because the dict instance has no __dict__, there is nowhere to insert the attribute. > I was wondering about the question because I guess the constraints > are quite different: * Dict keys are of any type, including > heterogeneous (mixed). Object keys are names, ie a subset of strings. CPython's implementation of hash tables is highly optimized for speed. The biggest bottleneck is the hash function, and that is tuned to be extremely efficient for strings and ints. [...] > So, I guess the best implementations for objects and dicts may be > quite different. i wonder about alternatives for objects, in > particuliar trie and variants: http://en.wikipedia.org/wiki/Trie, > because they are specialised for associative arrays which keys are > strings. *shrug* Maybe, maybe not. Tries are a more complicated data structure, which means bigger code and more bugs. They don't fit in very well with CPython's memory management system. And they use a large number of pointers, which can be wasteful. E.g. a trie needs six pointers just to represent the single key "python": '' -> 'p' -> 'y' -> 't' -> 'h' -> 'o' -> 'n' while a hash table uses just one: -> 'python' -- Steven D'Aprano _______________________________________________ Tutor maillist - Tutor@python.org To unsubscribe or change subscription options: http://mail.python.org/mailman/listinfo/tutor