If this is not a replay of an old message, please move the discussion to python-ideas.
On 2/20/07, Steven Elliott <[EMAIL PROTECTED]> wrote: > I'm finally getting back into this. I'd like to take one more shot at > it with a revised version of what I proposed before. > > For those of you that did not see the original thread it was about ways > that accessing builtins could be more efficient. It's a bit much to > summarize again now, but you should be able to find it in the archive > with this subject and a date of 2006-03-08. > > On Fri, 2006-03-10 at 12:46 +1300, Greg Ewing wrote: > > Steven Elliott wrote: > > > One way of handling it is to > > > alter STORE_ATTR (op code for assigning to mod.str) to always check to > > > see if the key being assigned is one of the default builtins. If it is, > > > then the module's indexed array of builtins is assigned to. > > > > As long as you're going to all that trouble, it > > doesn't seem like it would be much harder to treat > > all global names that way, instead of just a predefined > > set. The compiler already knows all of the names that > > are used as globals in the module's code. > > What I have in mind may be close to what you are suggesting above. My > thought now is that builtins are a set of tokens that typically, but > don't necessarily, point to the same objects in all modules. Such > tokens, which I'll refer to as "global tokens", can be roughly broken > into two sets: > 1) Global tokens that typically point to the same object in all > modules. > 2) Global tokens that that are likely to point to the different > objects (or be undefined) in different modules. > Set 1) is pretty much the the builtins. "True" and "len" are likely to > point to the same objects in all modules, but not necessarily. Set 2) > might be things like "os" and "sys" which are often defined (imported) > in modules, but not necessarily. > > Access to the globals of a module, including the current module, is done > with one of three opcodes (LOAD_GLOBAL, LOAD_ATTR and LOAD_NAME). For > each of these opcodes the following snippet of code from ceval.c (for > LOAD_GLOBAL) is relevant to this discussion: > /* This is the un-inlined version of the code above */ > x = PyDict_GetItem(f->f_globals, w); > if (x == NULL) { > x = PyDict_GetItem(f->f_builtins, w); > if (x == NULL) { > load_global_error: > format_exc_check_arg( > PyExc_NameError, > GLOBAL_NAME_ERROR_MSG, w); > break; > } > } > > So, to avoid the hash table lookups above maybe the global tokens could > be assigned an index value that is fixed for any given version of the > interpreter and that is the same for all modules (that "True" is always > index 7, "len" is always index 3, etc.) > > Once a set of indexes have been determined a new opcode, that I'll call > "LOAD_GTOKEN", could be created that avoids the hash table lookup by > functioning in a way that is similar to LOAD_FAST (pull a local variable > value out of an array). For example, static references to "True" could > always be compiled to > LOAD_GTOKEN 7 (True) > > As to set 1) and set 2) that I mentioned above - there is only a need to > distinguish between the two sets if a copy-on-write mechanism is used. > That way global tokens that are likely to have their value changed > (group 2) ) can all be together in one group so that only that group > needs to be copied when one of the global tokens is written to. For > example code such as: > True = 1 > print True > would be compiled into something like: > 1 LOAD_CONST 1 (1) > STORE_GTOKEN1 7 (True) > 2 LOAD_GTOKEN1 7 (True) > PRINT_ITEM > PRINT_NEWLINE > Note that "1" has been appended to "STORE_GTOKEN" to indicate that group > 1) is being worked with. The store command will copy the array of > pointers once, the first time it is called. > > Just as a new opcode is needed for LOAD_GLOBAL one would be needed for > LOAD_ATTR. Perhaps "LOAD_ATOKEN" would work. For example: > amodule.len = my_len > print amodule.len > would be compiled into something like: > 1 LOAD_GLOBAL 0 (my_len) > LOAD_GLOBAL 1 (amodule) > STORE_ATOKEN1 3 (len) > > 2 LOAD_GLOBAL 1 (amodule) > LOAD_ATOKEN1 3 (len) > PRINT_ITEM > PRINT_NEWLINE > LOAD_CONST 0 (None) > RETURN_VALUE > > Note that it looks almost identical to the code that is currently > generated, but the oparg "3" shown for the "LOAD_ATOKEN1" above indexes > into an array (like LOAD_FAST) to get at the attribute directly whereas > the oparg that would be shown for LOAD_ATTR is an index into an array of > constants/strings which is then used to retrieve the attribute from the > module's global hash table. > > > > That's great, but I'm curious if additional gains can be > > > made be focusing just on builtins. > > > > As long as builtins can be shadowed, I can't see how > > to make any extra use of the fact that it's a builtin. > > A semantic change would be needed, such as forbidding > > shadowing of builtins, or at least forbidding this > > from outside the module. > > I now think that it best not to think of builtins as being a special > case. What really matters is that if all modules agree on a way of > indexing global tokens (possible since they are all compiled with the > same compiler) then it should be possible to avoid any hash table > lookups when those global tokens are read or written to. So > while True: > can be just as efficient as > while 1: > > Is it reasonable to require the compiler to be aware of the set of > builtins and their indexes at the time the bytecode is produced? That > is one cost to consider - a less clean separation between the compiler > and the meaning behind the code it compiles. > > One cost to consider is the memory consumed by unsused slots in the > array of global tokens for a module if that module uses a small number > of builtins, but also writes to one of them so that the array needs to > be copied. But hopefully that won't be too bad. > > -- > ----------------------------------------------------------------------- > | Steven Elliott | [EMAIL PROTECTED] | > ----------------------------------------------------------------------- > > > _______________________________________________ > Python-Dev mailing list > Python-Dev@python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > http://mail.python.org/mailman/options/python-dev/guido%40python.org > -- --Guido van Rossum (home page: http://www.python.org/~guido/) _______________________________________________ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com