[Python-Dev] Google Summer of Code proposal: New class for working with AVL and RB binary trees as with the standard dictionary.
I would like to participate in Google Summer of Code. The idea is to create a specialized class for working with AVL and red-black (RB) binary trees, usable like the standard dictionary. Part of the idea is taken from Parrot (Perl 6), where a specialized type is provided for pair values.

Advantages worth noting:

* High speed. For simple key types it is only about 2.7 times slower than the standard dictionary.
* Safe operation in multithreaded mode.
* A method for freezing the object is provided.
* Data is stored in sorted order.
* High performance of the `items' and `iteritems' methods, since no new objects have to be created.
* With a large number of objects in a tree, memory is used much less intensively.
* Absence of collisions. As a consequence, it is impossible to craft bad data for a DoS attack on the dictionary of passed arguments.
* Key objects do not need a __hash__ method.
* By overloading operations, the behaviour of this kind of dictionary can be changed so that it supports a multi-version representation, based on layering the data and rolling back afterwards.

Additional requirements on key objects:

* Key objects must be comparable with the cmp function.

There are reasons that slow this project down:

* The `gc' header is mandatory for every compound object (some objects do not need this structure, but removing it requires modifying the `gc' module).
* The absence of a standard "pair" object, which probably follows from the previous item.

Drawbacks of a binary tree:

* Average search time for a hash is O(1), while for trees it is O(log2 N).
* A lot of memory per single tree element: (3*sizeof(void *) + sizeof(int))*2. One allocation is used for the pair itself, and a second block of memory is allocated for the tree node.

In defence of the "pair" object:

* The logic of this object's methods is noticeably simpler than tuple's, which in the end can change the program's speed for the better.
* 8-byte alignment makes little sense on current architectures, where a cache fetch reads at least 64 bytes. Using a "pair" type will give a noticeable win with 4-byte alignment; otherwise, on 64-bit platforms, it is much more profitable to use the tuple object.

This project may require reworking the `gcmodule.c' and `tupleobject.c' modules. The size of static objects needs to be reduced; for that, it must become possible to transparently pass objects that have no direct support from `gcmodule.c'. The `obmalloc.c' module will also need partial rework for more efficient memory allocation. I will be glad to answer questions on this topic.
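For illustration, here is a minimal pure-Python sketch of the semantics such a class is after: a mapping kept in sorted key order. It uses a sorted list plus bisect instead of an AVL/RB tree, so insertion is O(N) rather than O(log N); it only models the interface, not the performance claims, and the SortedMap name is invented for the sketch.

import bisect

class SortedMap(object):
    """Dict-like mapping whose keys stay in sorted order."""
    def __init__(self):
        self._keys = []
        self._vals = []
    def __setitem__(self, key, value):
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            self._vals[i] = value        # replace existing value
        else:
            self._keys.insert(i, key)    # keep keys sorted on insert
            self._vals.insert(i, value)
    def __getitem__(self, key):
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            return self._vals[i]
        raise KeyError(key)
    def __iter__(self):
        return iter(self._keys)

m = SortedMap()
m['banana'] = 2
m['apple'] = 1
m['cherry'] = 3
print list(m)    # ['apple', 'banana', 'cherry']: iteration is sorted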
Re: [Python-Dev] Google Summer of Code proposal: New class for working with AVL and RB binary trees as with the standard dictionary.
Josiah Carlson wrote:
> There exist various C and Python implementations of both AVL and
> Red-Black trees. For users of Python who want to use AVL and/or
> Red-Black trees, I would urge them to use the Python implementations.
> In the case of *needing* the speed of a C extension, there already
> exists a CPython extension module for AVL trees:
> http://www.python.org/pypi/pyavl/1.1
>
> I would suggest you look through some suggested SoC projects in the
> wiki:
> http://wiki.python.org/moin/SummerOfCode
>
> - Josiah

Thanks for the answer! I have already seen pyavl-1.1, and that is exactly why I would like to see the module in the Python standard library. The functionality of pyavl and dict is hard to compare; the functionality of my module will differ from dict's for the better.

I have run some tests with different types, both for the pyavl-1.1 package and for the prototype of my own module. The benchmark script is in the attached file avl-timeit.py, and the files timeit-result-*-*.txt contain its results; the first number in a file name is the number of elements added, the second is the argument of the timeit method. They show that even though the xtree module does more work than the pyavl module (for each newly inserted [key, value] pair two elements are created: a Python pair object and an internal tree element), it still works a little faster. Besides, the pyavl module is unstable in multithreaded applications (see the attached file module-avl-is-thread-unsafe.py).

I think that adding this type (together with a pair type) will make programming more convenient, since sorting with the sort function will be needed less often. I can probably keep working on this module outside the framework of the Google project; what interests me is whether there is demand for such a data type. Because it requires reworking `gcmodule.c' and `obmalloc.c', this module cannot be implemented as an external module.
#!/usr/local/bin/python
# Initialize modules
import avl
import xtree
import types
import sys
import timeit

if len(sys.argv) != 3:
    iterations = 100
    count = 10
else:
    iterations = int(sys.argv[1])
    count = int(sys.argv[2])

print "Iterations", iterations
print "Repeats", count
print

result_avl = timeit.Timer("""
import avl
a = avl.new()
for i in xrange(%d):
    a.insert(i)
""" % iterations).timeit(count)
print "object avl.new()", result_avl

result_xtree = timeit.Timer("""
import xtree
a = xtree.xtree()
for i in xrange(%d):
    a[i] = True
""" % iterations).timeit(count)
print "object xtree.xtree()", result_xtree

result_dict = timeit.Timer("""
import types
a = dict()
for i in xrange(%d):
    a[i] = True
""" % iterations).timeit(count)
print "object dict()", result_dict

print "      Dict  Xtree  AVL"
print "Dict  %.2f  %.2f  %.2f" % (float(result_dict)/result_dict,
                                  float(result_dict)/result_xtree,
                                  float(result_dict)/result_avl)
print "Xtree %.2f  %.2f  %.2f" % (float(result_xtree)/result_dict,
                                  float(result_xtree)/result_xtree,
                                  float(result_xtree)/result_avl)
print "AVL   %.2f  %.2f  %.2f" % (float(result_avl)/result_dict,
                                  float(result_avl)/result_xtree,
                                  float(result_avl)/result_avl)

fox:root!~/Downloads/Python/PYTHON/pyavl-1.1 > python setup.py install
running install
running build
running build_ext
building 'avl' extension
creating build
creating build/temp.freebsd-5.4-RELEASE-i386-2.4
cc -fno-strict-aliasing -DNDEBUG -O -pipe -D__wchar_t=wchar_t -DTHREAD_STACK_SIZE=0x2 -O2 -fPIC -DHAVE_AVL_VERIFY -DAVL_FOR_PYTHON -I/usr/local/include/python2.4 -c avl.c -o build/temp.freebsd-5.4-RELEASE-i386-2.4/avl.o -O2 -Wno-parentheses -Wno-uninitialized
cc -fno-strict-aliasing -DNDEBUG -O -pipe -D__wchar_t=wchar_t -DTHREAD_STACK_SIZE=0x2 -O2 -fPIC -DHAVE_AVL_VERIFY -DAVL_FOR_PYTHON -I/usr/local/include/python2.4 -c avlmodule.c -o build/temp.freebsd-5.4-RELEASE-i386-2.4/avlmodule.o -O2 -Wno-parentheses -Wno-uninitialized
creating build/lib.freebsd-5.4-RELEASE-i386-2.4
cc -shared -pthread -O2 build/temp.freebsd-5.4-RELEASE-i386-2.4/avl.o build/temp.freebsd-5.4-RELEASE-i386-2.4/avlmodule.o -o build/lib.freebsd-5.4-RELEASE-i386-2.4/avl.so -Wl,-x
running install_lib
copying build/lib.freebsd-5.4-RELEASE-i386-2.4/avl.so -> /usr/local/lib/python2.4/site-packages
#!/usr/local/bin/python
import avl
import time

a = avl.new()
crash = 61215  # may be any in range 0 .. 8
a.insert(crash)
a.insert(crash + 1)
b = iter(a)
b.next()
for i in xrange(10):
    a.insert(i)
a.remove(crash)
k = []
for i in xrange(100):
    # fill memory with padding for 32-bit machine
    k.append(pow(65536, 2))
    # fill memory with padding for 64-bit machine
    k.append(pow(65536, 10))
for i in b:
    print i

Iterations 30
Repeats 100

object avl.new() 44.3226678371
object xtree.xtree() 30.8406031132
object dict() 12.9507939816

      Dict  Xtree  AVL
Dict  1.00  0.42  0.29
Xtre
Re: [Python-Dev] Google Summer of Code proposal: New class for working with AVL and RB binary trees as with the standard dictionary.
Hye-Shik Chang wrote:
> On 4/25/06, Vladimir 'Yu' Stepanov <[EMAIL PROTECTED]> wrote:
>> Thanks for the answer, but after applying the patch to python-2.4.3
>> I could not compile it. The output of the compilation stage is in
>> the attached file.
>
> Aah. The patch is for Python in Subversion trunk. I think you can
> add the following line somewhere in collectionsmodule.c to avoid
> the error.
>
> #define Py_ssize_t int
>
> Hye-Shik

Thank you!

I have compared the implementations. The collections.rbtree object works very well; its speed is comparable to the speed of the standard dictionary. Why is it still distributed in the form of a patch? Here are comparative results of its speed. My module loses to that implementation in speed because it keeps no pointer to the parent element and allocates two objects per element.

There is a suggestion for improving its operation: add new exceptions to indicate the fact of locking, for example:

Exception
 |__ StandardError
      |__ LockError
           |__ FreezeError
           |__ RDLockError
           |__ WRLockError

So that checking each kind of lock remains fast, only one counter needs to be examined. For example, these macros:

void
_lock_except(int lockcnt)
{
    if (lockcnt > 0) {
        if ((lockcnt & 1) == 0)
            PyErr_SetNone(PyExc_RDLockError);
        else
            PyErr_SetNone(PyExc_FreezeLockError);
    }
    else
        PyErr_SetNone(PyExc_WRLockError);
}

#define wrlock_SET(x,r) if ((x)->ob_lockcnt != 0) { \
                            _lock_except((x)->ob_lockcnt); \
                            return r; \
                        } \
                        (x)->ob_lockcnt = -1
#define wrlock_UNSET(x) (x)->ob_lockcnt = 0
#define rdlock_SET(x,r) if ((x)->ob_lockcnt < 0) { \
                            _lock_except((x)->ob_lockcnt); \
                            return r; \
                        } \
                        (x)->ob_lockcnt += 2
#define rdlock_UNSET(x) (x)->ob_lockcnt -= 2
#define freezelock_SET(x, r) if ((x)->ob_lockcnt < 0) { \
                            _lock_except((x)->ob_lockcnt); \
                            return r; \
                        } \
                        (x)->ob_lockcnt |= 1;

The object must define the counter as a signed integer field, ob_lockcnt:

struct newobject {
    PyObject_HEAD   /* or PyObject_VAR_HEAD */
    ...
    int ob_lockcnt;
    ...
};

In every call, if the method only reads the object's data, the rdlock_* sequence should be used. Example:

PyObject *
method_for_read_something(PyObject *ob)
{
    rdlock_SET(ob, NULL);
    ... do something ...
    if (failure)
        goto failed;
    ... do something else ...
    rdlock_UNSET(ob);
    Py_INCREF(Py_None);
    return Py_None;
failed:
    rdlock_UNSET(ob);
    return NULL;
}

If the method can make critical changes, the wrlock_* sequence should be used. Example:

PyObject *
method_for_some_change(PyObject *ob)
{
    wrlock_SET(ob, NULL);
    ... do something ...
    if (failure)
        goto failed;
    ... do something else ...
    wrlock_UNSET(ob);
    Py_INCREF(Py_None);
    return Py_None;
failed:
    wrlock_UNSET(ob);
    return NULL;
}

If an object needs to be frozen, that is equivalent to setting a permanent read lock, so such a lock can be placed on an object that is already being read. For open iterators a read lock must be taken; it can be released either when the iterator finishes its work or when it is closed:

PyObject *
map_object_keys(PyObject *ob)
{
    rdlock_SET(ob, NULL);
    ... allocate iterator and initialize ...
}

map_object_keys_iter_destroy(PyObject *ob)
{
    ... destroy data ...
    rdlock_UNSET(ob);
}
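For context, here is a runnable pure-Python model of the single-counter scheme sketched above (the exception names follow the proposed hierarchy; everything else is invented for illustration). A negative counter means write-locked, the low bit means frozen, and each reader adds 2:

class LockError(StandardError): pass
class FreezeError(LockError): pass
class RDLockError(LockError): pass
class WRLockError(LockError): pass

class Lockable(object):
    def __init__(self):
        self.ob_lockcnt = 0
    def _lock_except(self):
        # Mirror of _lock_except() above: report which lock is held.
        if self.ob_lockcnt > 0:
            if self.ob_lockcnt & 1:
                raise FreezeError
            raise RDLockError
        raise WRLockError
    def wrlock(self):
        if self.ob_lockcnt != 0:
            self._lock_except()
        self.ob_lockcnt = -1
    def rdlock(self):
        if self.ob_lockcnt < 0:
            self._lock_except()
        self.ob_lockcnt += 2
    def rdunlock(self):
        self.ob_lockcnt -= 2
    def freeze(self):
        if self.ob_lockcnt < 0:
            self._lock_except()
        self.ob_lockcnt |= 1

ob = Lockable()
ob.rdlock()      # two readers may coexist...
ob.rdlock()
ob.freeze()      # ...and freezing is a permanent read lock
try:
    ob.wrlock()
except FreezeError:
    print "write refused: object is frozen"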
Re: [Python-Dev] binary trees. Review obmalloc.c
Josiah Carlson wrote:
> "Vladimir 'Yu' Stepanov" <[EMAIL PROTECTED]> wrote:
>> Thanks for the answer!
>>
>> I have already seen pyavl-1.1, and that is exactly why I would like to
>> see the module in the Python standard library. [...] Besides, the
>> pyavl module is unstable in multithreaded applications (see the
>> attached file module-avl-is-thread-unsafe.py).
>
> I'm not concerned about the speed of the external AVL module, and I'm
> not concerned about the speed of trees in Python; so far people have
> found that dictionaries are generally sufficient. More specifically,
> while new lambda syntaxes are presented almost weekly, I haven't heard
> anyone complain about Python's lack of a tree module in months. As a
> result, I don't find the possible addition of any tree structure to the
> collections module as being a generally compelling addition.

The dict object is irreplaceable in most cases, that is so. I do not suppose that binary trees could ever replace the standard `dict' object. Sorting a list likewise is and will remain an irreplaceable function. But there are a number of problems which can be solved rather easily by means of binary trees. Most likely there is an alternative for each of them, but as a rule it is less obvious to the programmer, and most likely the implementation will not be as fast.

Perhaps nobody has mentioned needing binary trees; in my opinion that is not a reason for judging whether the type is needed. Take PHP, for example: the array and the dictionary there are implemented as a list, and there is not even a hint of a hash object. Those users who needed a hash have long since chosen perl/python/ruby/etc. A similar situation can arise concerning binary trees.

> Again, I believe that the existence of 3rd party extension modules
> which implement AVL and red-black trees, regardless of their lack of
> thread safety, being slower than your module by a constant, or lack of
> particular functionality, is basically immaterial to this discussion.
> In my mind, there are only two relevant items to this discussion:
>
> 1. Is having a tree implementation (AVL or Red-Black) important, and if
> so, is it important enough to include in the collections module?
> 2. Is a tree implementation important enough for google to pay for its
> inclusion, given that there exists pyavl and a collectionsmodule.c patch
> for a red-black tree implementation?
>
> Then again, you already have your own implementation of a tree module,
> and it seems as though you would like to be paid for this already-done
> work. I don't know how google feels about such things, but you should
> remember to include this information in your application.

I have looked at the patch for collectionsmodule.c. In my opinion it is a fine implementation of binary trees. If that class exists, there is no need for me to create one more copy; my persistence simply speaks to the demand for this type of data. As to my message about starting this as a SoC project, please consider that the topic was never raised :) For me it was a minor question.
Re: [Python-Dev] binary trees. Review obmalloc.c
Tim Peters wrote:

[Vladimir 'Yu' Stepanov]
>> * Adapt block allocation to other alignments. Right now the alignment
>> is hard-wired to 8 bytes. As a variant, 4-byte alignment could be
>> used, and this value could be set at interpreter start-up through
>> arguments/environment variables/etc. In testing with 4- versus 8-byte
>> alignment, the difference in speed was not appreciable.

[Martin v. Löwis]
> That depends on the hardware you use, of course. Some architectures
> absolutely cannot stand mis-aligned accesses, and programs will just
> crash if they try to perform such accesses.

[Tim Peters]
> Note that we _had_ to add a goofy "long double" to the PyGC_Head union
> because some Python platform required 8-byte alignment for some types
> ... see rev 25454. Any spelling of malloc() also needs to return
> memory aligned for any legitimate purpose, and 8-byte alignment is the
> strictest requirement we know of across current Python platforms.

Could I get a reference to those tests? I have run some tests which show a very small difference in speed between 4- and 8-byte alignment (a maximum of 8%, and in most cases less than one percent). The tests used sequential lookup of elements within one block of memory. I understand that a test with randomly chosen addresses is still needed, to minimize the effect of the processor cache, and certainly not all platforms will show the same result (I will try not to forget to attach the source code of the test and its output :) ). On the other hand, reducing the memory consumed by each individual object instance can gain back the same percentage of speed that is lost by changing the alignment from 8 bytes to 4 :) Besides, the extra allocation machinery on top of some objects, starting with int, probably becomes unnecessary.

[Martin v. Löwis]
> So Python should err on the safe side, and only use a smaller alignment
> when it is known not to hurt.

[Tim Peters]
> OTOH, I don't see the *advantage* in reducing the alignment. It could
> cut wasted bytes. There is no per-object memory overhead in a
> release-build obmalloc: the allocatable part of a pool is entirely used
> for user-visible object memory, except when the alignment requirement
> ends up forcing unused (by both the user and by obmalloc) pad bytes.
> For example, Python ints consume 12 bytes on 32-bit boxes, but if space
> were allocated for them by obmalloc (it's not), obmalloc would have to
> use 16 bytes per int to preserve 8-byte alignment.

Good point.

[Tim Peters]
> OTOH, obmalloc (unlike PyGC_Head) has always used 8-byte alignment,
> because that's what worked best for Vladimir Marangozov during his
> extensive timing tests. It's not an isolated decision -- e.g., it's
> also influenced by, and influences, "the best" pool size, and (of
> course) cutting alignment to 4 would double the number of "size
> classes" obmalloc has to juggle.

Yes, the maximum number of size classes will really be doubled. But that should affect neither system performance nor the total amount of memory consumed. The change in memory fragmentation cannot be computed in advance, since it will depend on the concrete application.

[Vladimir 'Yu' Stepanov]
>> * Expand the maximal size which can be allocated by means of this
>> module. Right now the maximal size is 256 bytes.

[Martin v. Löwis]
> Right. This is somewhat deliberate, though; the expectation is that
> fragmentation will increase dramatically if even larger size classes
> are supported.

[Tim Peters]
> It's entirely deliberate ;-) obmalloc is in no way trying to be a
> general-purpose allocator. It was designed specifically for the common
> memory patterns in Python, aiming at large numbers of small objects of
> the same size. That's extremely common in Python, and allocations
> larger than 256 bytes are not. The maximum size was actually 64 bytes
> at first. After that, dicts grew an embedded 8-element table, which
> vastly increased the size of the dict struct. obmalloc's limit was
> boosted to 256 then, although it could have stopped at the size of a
> dict (in the rough ballpark of 150 bytes). There was no reason (then
> or now) to go beyond 256. See below.

[Vladimir 'Yu' Stepanov]
>> * At allocation sizes close to the maximum, block allocation becomes
>> inefficient. For example, for block sizes 248 and 256 (blocksize),
>> the number of elements per page (PAGESIZE/blocksize) is 16 in both
>> cases. I.e. the same page layout could be used for block size 248 as
>> for block size 256.

[Martin v. Löwis]
> Good idea; that shouldn't be too difficult to implement: for sizes
> above 215, pools need to be spaced apart only at 16 bytes.

[Tim Peters]
> I'd rather drop the limit to 215 then <0.3 wink>. Allocations that
> large probably still aren't important to obmalloc, but it is important
> that finding a requested allocation's "size index" be as cheap as
> possible. Uniformity helps that.

The existing allocation module really is not suited to general-purpose use, and its speed of work
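As a rough, self-contained illustration of the "size index" bookkeeping under discussion (a sketch only; real obmalloc pool layout is more involved than this):

ALIGNMENT = 8                  # obmalloc's current alignment
SMALL_REQUEST_THRESHOLD = 256  # obmalloc's current upper limit

def size_index(nbytes, alignment=ALIGNMENT):
    # Round the request up to a multiple of the alignment and map it to
    # a size-class index; with 8-byte alignment this is (nbytes-1) >> 3.
    return (nbytes - 1) // alignment

print size_index(1), size_index(8)       # 0 0  -> share one size class
print size_index(248), size_index(256)   # 30 31 -> two distinct classes
# Halving the alignment doubles the number of classes to juggle:
print SMALL_REQUEST_THRESHOLD // 8       # 32 classes at 8-byte alignment
print SMALL_REQUEST_THRESHOLD // 4       # 64 classes at 4-byte alignment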
Re: [Python-Dev] binary trees. Review obmalloc.c
Josiah Carlson wrote:
> "Vladimir 'Yu' Stepanov" <[EMAIL PROTECTED]> wrote:
>> Comparing sorting functions with binary trees is not entirely correct.
>> I think that the sort function will lose considerably on larger lists,
>> especially after the insertion or removal of a single element.
>
> Generally speaking, people who understand at least some of Python's
> internals (list internals specifically), will not be *removing* entries
> from lists one at a time (at least not using list.pop(0) ), because
> that is slow. If they need to remove items one at a time from the
> smallest to the largest, they will usually use list.reverse(), then
> repeatedly list.pop(), as this is quite fast (in general).

Yes. I understood that when I worked out the example.

> However, as I just said, people usually don't remove items from
> just-sorted lists, they tend to iterate over them via 'for i in list:' .

Such a problem arises when maintaining a list of timers, and that list changes constantly: elements are added to and removed from it all the time. collections.deque does not fit here, because insertion into a large list, or a search for the nearest value, requires on average N/2 checks, while for a binary tree the number of necessary checks is still log2(N).

Another use of binary trees is matching against ranges. Such ranges might be blocks of IP addresses, and this kind of dictionary can be used to find quickly whether a given point falls inside one of the intervals. These problems can be solved by means of binary search, but binary search over lists has the same drawbacks as in the example above: insertion into and removal from a list are slow, and after an insertion the list has to be sorted. Apart from that, a binary search function is absent from the standard Python package; one can write an analogue in pure Python, but it will be more than twice as slow.

> - Josiah
>
> As an aside, I have personally implemented trees a few times for
> different reasons. One of the issues I have with most tree
> implementations is that it is generally difficult to do things like
> "give me the kth smallest/largest item". Of course the standard
> solution is what is generally called a "partial order" or "order
> statistic" tree, but then you end up with yet another field in your
> structure. One nice thing about Red-Black trees combined with
> order-statistic trees, is that you can usually use the uppermost bit
> of the order statistics for red/black information. Of course, this is
> really only interesting from an "implement this in C" perspective;
> because if one were implementing it in Python, one may as well be
> really lazy and not bother implementing guaranteed balanced trees and
> be satisfied with randomized-balancing via Treaps.

Here is what I found through Google for the phrase "order statistic binary tree":

http://www.cs.mcgill.ca/~cs251/OldCourses/1997/topic20/

It turns out I have already implemented something similar :). The test shows its worst result, which is certainly worse than an ordinary tree. The bad part is that I have not tested this implementation all the way through; I am 90 percent sure that it is sound.

There is a question: which API should it have? If the object is addressed as a MAP object, __getitem__ should return one kind of result; if as a list, another. Here is the list of already existing attributes and methods:

class xtree(__builtin__.object)
 |  X-Tree. Binary tree with AVL balance mechanism.
 |
 |  Methods defined here:
 |
 |  __contains__(...)
 |      x.__contains__(y) <==> y in x
 |
 |  __delitem__(...)
 |      x.__delitem__(y) <==> del x[y]
 |
 |  __getitem__(...)
 |      x.__getitem__(y) <==> x[y]
 |
 |  __iter__(...)
 |      x.__iter__() <==> iter(x)
 |
 |  __len__(...)
 |      x.__len__() <==> len(x)
 |
 |  __repr__(...)
 |      x.__repr__() <==> repr(x)
 |
 |  __setitem__(...)
 |      x.__setitem__(i, y) <==> x[i]=y
 |
 |  append(...)
 |      D.append((k: v)) -> None, append new pair element into tree with sorting.
 |      Return pair element if argument is exist.
 |
 |  clear(...)
 |      D.clear() -> None. Remove all items from D.
 |
 |  copy(...)
 |      D.copy() -> a shallow copy of D.
 |
 |  get(...)
 |      D.get(k[,d]) -> D[k] if k in D, else d. d defaults to None.
 |
 |  has_key(...)
 |      D.has_key(k) -> True if D has a key k, else False.
 |
 |  items(...)
 |      D.items() -> list of D's (key, value) pairs.
 |
 |  iteritems(...)
 |      D.iteritems() -> an iterator over the (key, value) items of D.
 |
 |  iterkeys(...)
 |      D.iterkeys() -> an iterator over the keys of D.
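To make the "order statistic" idea above concrete, here is a small runnable sketch (node layout and names invented for illustration): keep a subtree size in every node, and the k-th smallest key is found in O(height) steps.

class Node(object):
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None
        self.size = 1            # number of nodes in this subtree

def select(node, k):
    # Return the k-th smallest key (k counts from 0), walking down and
    # using the stored subtree sizes to pick a direction.
    while node is not None:
        lsize = 0
        if node.left is not None:
            lsize = node.left.size
        if k < lsize:
            node = node.left
        elif k == lsize:
            return node.key
        else:
            k -= lsize + 1
            node = node.right
    raise IndexError("rank out of range")

# Hand-built three-node tree: 10 <- 20 -> 30
root = Node(20)
root.left, root.right = Node(10), Node(30)
root.size = 3
print select(root, 0), select(root, 1), select(root, 2)   # 10 20 30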
Re: [Python-Dev] binary trees.
Josiah Carlson wrote:
> "Vladimir 'Yu' Stepanov" <[EMAIL PROTECTED]> wrote:
>> Josiah Carlson wrote:
>>> However, as I just said, people usually don't remove items from
>>> just-sorted lists, they tend to iterate over them via 'for i in list:' .
>>
>> Such a problem arises when maintaining a list of timers.
>
> I've never seen that particular use-case.
>
> Please understand something. I believe that trees can be useful in
> some cases. However, I don't believe that they are generally useful
> enough in Python, given that we already have key,value dictionaries
> and sortable lists. They become even less worthwhile in Python, given
> the total ordering problem that I describe later in this message.

I am being tiresome, I know :) Sorry for that.

>> Here is what I found through Google for the phrase "order statistic
>> binary tree":
>>
>> http://www.cs.mcgill.ca/~cs251/OldCourses/1997/topic20/
>
> Yes, that link does describe what an 'order statistic tree' is. One
> thing to note is that you can add order statistics to any tree with
> one more field on each tree node. That is, you can have your AVL or
> Red-Black tree, and add order statistics to it. Allowing tree['x'] to
> return the associated value for the key 'x' (the same as a dictionary),
> as well as offer the ability to do tree.select(10) to get the 10th (or
> 11th if one counts from 0) smallest key (or key,value), or get the
> 'rank' of a key with tree.rank('x').

Thanks.

> This problem has nothing to do with dictionaries and hashing, it has
> to do with the fact that there may not be a total ordering on the
> elements of a sequence.

That is sad; I did not know it, which is why I had not understood. I searched Google for "python dev total ordering" and found nothing about intentions to correct this problem. Has it been discussed before?

-- Vladimir Stepanov
Re: [Python-Dev] binary trees.
Martin v. Löwis wrote:
> Vladimir 'Yu' Stepanov wrote:
>> Yes. I understood that when I worked out the example.
>>
>>> However, as I just said, people usually don't remove items from
>>> just-sorted lists, they tend to iterate over them via 'for i in list:' .
>>
>> Such a problem arises when maintaining a list of timers.
>
> For a list of timers/timeout events, a priority queue is often the best
> data structure, which is already available through the heapq module.

In most cases that is indeed so; heapq suits these purposes very well. The only remaining problem is the removal of an arbitrary element from the list.

-- Vladimir Stepanov
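For reference, the usual workaround for that remaining problem, sketched here with invented names: do not remove an arbitrary entry from the heap at all, but mark it cancelled and discard it lazily when it reaches the top.

import heapq

heap = []
entries = {}        # timer id -> heap entry, for O(1) cancellation
CANCELLED = None

def schedule(when, timer_id):
    entry = [when, timer_id]
    entries[timer_id] = entry
    heapq.heappush(heap, entry)

def cancel(timer_id):
    # Mark the entry instead of searching the heap for it.
    entries.pop(timer_id)[1] = CANCELLED

def pop_next():
    # Skip cancelled entries lazily as they surface.
    while heap:
        when, timer_id = heapq.heappop(heap)
        if timer_id is not CANCELLED:
            del entries[timer_id]
            return when, timer_id
    raise KeyError("no pending timers")

schedule(5.0, "a"); schedule(1.0, "b"); schedule(3.0, "c")
cancel("b")
print pop_next()    # (3.0, 'c'): the cancelled timer never fires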
Re: [Python-Dev] total ordering.
On Sat, May 06, 2006 at 03:12:11AM -0700, Josiah Carlson wrote:
> "Vladimir 'Yu' Stepanov" <[EMAIL PROTECTED]> wrote:
>> Josiah Carlson wrote:
>>> This problem has nothing to do with dictionaries and hashing, it has
>>> to do with the fact that there may not be a total ordering on the
>>> elements of a sequence.
>>
>> That is sad; I did not know it, which is why I had not understood.
>>
>> I searched Google for "python dev total ordering" and found nothing
>> about intentions to correct this problem. Has it been discussed
>> before?
>
> Not really. It has to do with how non-comparable instances compare to
> each other. Generally speaking, when instances aren't comparable,
> Python compares their type, which is actually comparing the names of
> types. This wouldn't be a big deal, except that:
>
> >>> str < tuple < unicode
> True
>
> And you can actually compare str and unicode, so, if you have a str
> that is greater than the unicode, you run into this issue. With
> unicode becoming str in Py3k, we may not run into this issue much
> then, unless bytes are comparable to str, in which case we end up with
> the same problems.
>
> Actually, according to my google of "python dev total ordering", it
> gives me...
>
> http://mail.python.org/pipermail/python-dev/2003-March/034169.html
> http://mail.python.org/pipermail/python-list/2003-March/154142.html
>
> Which were earlier discussions on this topic, which are quite topical.
> The ultimate solution is to choose a total ordering on types and
> consider the problem solved. Then list.sort() and binary trees will
> work properly.

Thanks for such a detailed answer. In the referenced discussions the question is presented in the form "this is how it is", rather than "how to correct it", so I would like to raise the question of how to correct it.

Where types are incomparable, it would make sense to compare identifiers assigned to each concrete type; or rather, to compute from them a weight which plays the deciding role when comparison of the types themselves is impossible. Below is one variant of a solution to this problem.

Add three fields to every data type in Python:

    ...
    int tp_id;
    int tp_qualifier;
    int tp_weight;
    ...

Rigidly assign identifiers (tp_id) to certain built-in and external types; the rest receive free identifiers on their initialization. Besides that, types should be classified by their basic properties (tp_qualifier). A type can be classified as:

0 = None
1 = integer (bool, int, long, etc..)
2 = float (decimal, float, etc..)
3 = math (complex, matrix may be ?? )
4 = string (str, unicode, splice, etc..)
5 = sequence (iterators, deque, xrange, enumerate, etc..)
6 = array (tuple, list, bytes, array, etc..)
7 = dictionary (dict, defdict, etc..)
8 = set (set, etc..)
9 = file (file, socket.socket, cStringIO.StringIO)
...
127 = non qualifier (sre.*, datetime.*, ClassType, InstanceType, etc..)

This is an approximate list by which it would be convenient (in my opinion) to classify types. The weight can be set explicitly in the structure or, if it equals -1, it is calculated by the formula:

    ob_type->tp_weight = (ob_type->tp_qualifier << 24) + (ob_type->tp_id << 8)

Thus objects with similar properties will follow one after another. External types could pose a problem, but if the classification is developed further, problems should not arise.

-- SY. Vladimir Stepanov
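A runnable model of the weighting scheme just proposed (the qualifier and id values here are invented; in the proposal they would live on the C type structure):

QUALIFIER = {type(None): 0, bool: 1, int: 1, long: 1, float: 2,
             complex: 3, str: 4, unicode: 4, tuple: 6, list: 6, dict: 7}
TYPE_ID = {}

def type_id(tp):
    # "Free identifiers on initialization": first come, first served.
    return TYPE_ID.setdefault(tp, len(TYPE_ID))

def weight(obj):
    tp = type(obj)
    return (QUALIFIER.get(tp, 127) << 24) + (type_id(tp) << 8)

# Mixed types now sort by type weight first, then by value within a type:
items = [u't', 3, 'abc', None, 1.5, (1, 2)]
print sorted(items, key=lambda o: (weight(o), o))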
Re: [Python-Dev] total ordering.
Guido van Rossum wrote:
> On 5/6/06, Vladimir Yu. Stepanov <[EMAIL PROTECTED]> wrote:
> [proposing a total ordering between types]
>
> It Ain't Gonna Happen. (From now on, I'll write this as IAGH.)
>
> In Python 3000, we'll actually *remove* ordering between arbitrary
> types as a feature; only types that explicitly care to be ordered with
> respect to one another will be ordered. Equality tests are unaffected,
> x==y will simply return False if x and y are of incomparable types;
> but x<y will raise an exception.
>
> --
> --Guido van Rossum (home page: http://www.python.org/~guido/)

When a comparison like ("something" < 123) is made, raising an exception is the right thing. In sorting, or in the use of binary trees, it is not so obvious: the behaviour of the comparison function in those cases depends on the needs of the user, and at present redefining that behaviour is complicated. Let me give an example. Calling the sort method of a list can produce completely different exceptions depending on the order of the sorted values or data (thanks to David Mertz and Josiah Carlson):

>>> [chr(255), 1j, 1, u't'].sort()
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
TypeError: no ordering relation is defined for complex numbers

>>> [chr(255), 1j, u't', 1].sort()
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
UnicodeDecodeError: 'ascii' codec can't decode byte 0xff in position 0: ordinal not in range(128)

If something similar is shown in Python 3000 for the types str(), int(), complex() and so on, and the type of the exception varies this strongly, it will make redefining the behaviour of the sort function problematic. It would be good to create one more class of exceptions, one that unifies the errors raised when incomparable types or data are compared.
Re: [Python-Dev] binary trees.
Josiah Carlson wrote:
> And you can actually compare str and unicode, so, if you have a str
> that is greater than the unicode, you run into this issue. With
> unicode becoming str in Py3k, we may not run into this issue much
> then, unless bytes are comparable to str, in which case we end up with
> the same problems.
>
> Actually, according to my google of "python dev total ordering", it
> gives me...
>
> http://mail.python.org/pipermail/python-dev/2003-March/034169.html
> http://mail.python.org/pipermail/python-list/2003-March/154142.html
>
> Which were earlier discussions on this topic, which are quite topical.
> The ultimate solution is to choose a total ordering on types and
> consider the problem solved. Then list.sort() and binary trees will
> work properly.

Under the second reference a question arose: can complex be partially comparable with int, long and float? After all, 1 == 1+0j.
Re: [Python-Dev] total ordering.
Guido van Rossum wrote:
> On 5/11/06, Vladimir 'Yu' Stepanov <[EMAIL PROTECTED]> wrote:
>> If something similar is shown in Python 3000 for the types str(),
>> int(), complex() and so on, and the type of the exception varies
>> strongly, it will make redefining the behaviour of the sort function
>> problematic.
>
> Not really. We'll just document that sort() should only be used on a
> list of objects that implement a total ordering. The behavior
> otherwise will simply be undefined; it will raise whatever exception
> is first raised by an unsupported comparison (most likely TypeError).
> In practice this won't be a problem.

In my opinion, for comparison functions there is no suitable kind of exception. When the action performed is not a type-coercion operation, a TypeError exception most often means a programming mistake. If an exception class specific to comparison were created, the `.__r(eq|ne|le|lt|ge|gt|cmp)__()' methods could be invoked without fear of hiding a TypeError caused by a programming mistake (sorry for the tautology). It would be convenient to use it as a control-flow exception, to indicate the need to invoke the `.__r(eq|ne|le|lt|ge|gt|cmp)__()' method; this kind of class would play a role similar to StopIteration for `.next()'.

On an unsuccessful comparison, the user would then have a simple way to define, in the `cmp' function of the .sort() method, exactly the behaviour he needs, including the possibility of sorting heterogeneous elements. At present a similar function is performed by NotImplemented, which is produced by a number of mathematical operations. For this reason I ask that the creation of a new class be considered.
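A sketch of what that would buy (the UncomparableError name is hypothetical; no such class exists in Python): a dedicated comparison exception lets a sort comparator fall back to a user-chosen order without masking genuine TypeErrors from buggy comparison methods.

class UncomparableError(Exception):
    pass

def strict_cmp(a, b):
    # Stand-in for the envisioned built-in behaviour: refuse to invent
    # an order between values of different types.
    if type(a) is not type(b):
        raise UncomparableError(type(a), type(b))
    return cmp(a, b)

def cmp_with_fallback(a, b):
    try:
        return strict_cmp(a, b)
    except UncomparableError:
        # User-chosen policy for mixed types, e.g. order by type name.
        return cmp(type(a).__name__, type(b).__name__)

print sorted([3, 'a', 1, 'b', 2.5], cmp=cmp_with_fallback)
# [2.5, 1, 3, 'a', 'b']: float before int before str, by type name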
Re: [Python-Dev] total ordering.
Jason Orendorff wrote:
> On 5/11/06, Vladimir 'Yu' Stepanov <[EMAIL PROTECTED]> wrote:
>> If something similar is shown in Python 3000 for the types str(),
>> int(), complex() and so on, and the type of the exception varies
>> strongly, it will make redefining the behaviour of the sort function
>> problematic.
>
> I don't see what you mean by "redefinition of behavior of function of
> sorting". Is this something a Python programmer might want to do?
> Can you give an example?

It comes up with external types, when each of them can be compared with some built-in type but not with one another. This concerns functions and methods implemented through the CPython API: for example, the big-integer types of different mathematical libraries, or several different representations of character arrays.

Here is a more realistic example, working with a database. In Python 3000 only compatible types will be comparable, so there can be a problem sorting data whose types are incompatible for comparison:

conn = MySQLdb.connect(...)
cu = conn.cursor()
cu.execute("""
    CREATE TABLE bigtable (
        id INT NOT NULL AUTO_INCREMENT,
        group_id INT NULL,
        value VARCHAR(255) NOT NULL,
        PRIMARY KEY (id)
    )""")
for query in (
    "INSERT INTO bigtable VALUES (NULL, 50, 'i am lazy')",
    "INSERT INTO bigtable VALUES (NULL, 20, 'i am very lazy')",
    "INSERT INTO bigtable VALUES (NULL, 19, 'i am very very lazy')",
    "INSERT INTO bigtable VALUES (NULL, 40, 'i am very very very lazy :)')",
    "INSERT INTO bigtable VALUES (NULL, NULL, 'Something different')"
):
    cu.execute(query)
cu.execute("SELECT * FROM bigtable ORDER BY id")
lst = [r for r in cu]
... do something in the given order ...
lst.sort(cmp=lambda a, b: cmp(a[1], b[1]))

If the records are to be sorted on the group_id column, this will hardly be possible in Python 3000, since in that column numbers alternate with None values. Certainly it is possible to filter the list beforehand, but that will not be efficient, especially if we suppose that the sorted values can be of several mutually incomparable types (which is already beyond the scope of this example).

The essence of my proposal is to first allow the type check through CPython's fast mechanisms on methods, and only in case of incompatible types to use a fallback model of comparison, and only when necessary. Creating a single, uniform exception would make the try-except construction easier to build. The TypeError class is too widely used; for comparison functions and methods a specialized class is needed. After all, nobody uses the StopIteration class for reporting errors :)

Another example is when classes of different modules are used and they have to be sorted together (see the attached "example1.py").

> On 5/16/06, Vladimir 'Yu' Stepanov <[EMAIL PROTECTED]> wrote:
>> It would be convenient to use it as a control-flow exception, to
>> indicate the need to invoke the `.__r(eq|ne|le|lt|ge|gt|cmp)__()'
>> method. This kind of class would play a role similar to StopIteration
>> for `.next()'.
>
> There are no .__r(eq|ne|le|lt|ge|gt|cmp)__() methods, for a logical
> reason which you might enjoy deducing yourself...

Oops :) I was simply too hasty. I once read about __rcmp__, and it surfaced in memory only here, when I had already forgotten everything about it. How inopportune.

>> At present a similar function is performed by NotImplemented, which
>> is produced by a number of mathematical operations. For this reason
>> I ask that the creation of a new class be considered.
>
> Can you explain this? NotImplemented isn't an exception.
> (NotImplementedError is, but that's something quite different.)

I was mistaken, sorry. It is indeed not a class of exceptions.

> NotImplemented has exactly one purpose in Python, as far as I can
> tell. What mathematical operations do you mean?

Multiplication/division/addition/subtraction/binary operations/etc.: when the types are not compatible (binary_op1 in abstract.c). It is a very universal element. The CPython API transforms a returned NotImplemented into raising a TypeError exception. Is that good? In my opinion a programming mistake can be hidden that way. See the attached file example2.py as an example. Thanks.

#!/usr/local/bin/python
import types

class CompareError:
    pass

# class in module1
class A:
    uniform_global_sort_criterion = 1
    def __init__(self, val):
        self.val = val
    def __cmp__(self, other):
        if type(self) is type(other) and self.__class__ ==
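For what it's worth, a sketch of how the group_id ordering could still be expressed under the stricter Python 3000 rules (the row data below is the same invented sample as in the table above): sort with a key that places the NULL rows explicitly instead of relying on a cross-type comparison between None and int.

rows = [
    (1, 50, 'i am lazy'),
    (2, 20, 'i am very lazy'),
    (3, 19, 'i am very very lazy'),
    (4, 40, 'i am very very very lazy :)'),
    (5, None, 'Something different'),
]

# NULL group_ids sort first by decree of the key, so None is never
# compared against an int:
rows.sort(key=lambda r: (r[1] is not None, r[1]))
print [r[0] for r in rows]    # [5, 3, 2, 4, 1]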
[Python-Dev] xrange vs. int.__getslice__
Has the xrange function ever bothered you? :) I suggest replacing it:

for i in xrange(100): pass
vs.
for i in int[:100]: pass

for i in xrange(1000, 1020): pass
vs.
for i in int[1000:1020]: pass

for i in xrange(200, 100, -2): pass
vs.
for i in int[200:100:-2]: pass
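The spelling can already be imitated for a user-defined type with a metaclass __getitem__ (a sketch only; the proposal itself would require touching the built-in int, and the `Int' stand-in here is invented):

class SliceableIntMeta(type):
    def __getitem__(cls, item):
        # Int[a:b:c] arrives here as a slice object; map it to xrange.
        start = item.start
        if start is None:
            start = 0
        step = item.step
        if step is None:
            step = 1
        return xrange(start, item.stop, step)

class Int(int):
    __metaclass__ = SliceableIntMeta

print list(Int[:5])          # [0, 1, 2, 3, 4]
print list(Int[200:190:-2])  # [200, 198, 196, 194, 192]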
Re: [Python-Dev] xrange vs. int.__getslice__
Thomas Wouters wrote:
> http://www.python.org/dev/peps/pep-0204/
>
> (If you must really discuss this, which would probably be futile and
> senseless, please do it on python-3000 only.)

It certainly looks very similar. PEP 204 demands changes in the parser and considers the new construct a replacement for the range function. My proposal can be considered a replacement for the xrange function, and no change to the syntax of the language is needed for it. Thanks.