[Python-Dev] Google Summer of Code proposal: New class for work with binary trees AVL and RB as with the standard dictionary.

2006-04-24 Thread Vladimir 'Yu' Stepanov
I would like to participate in Google Summer of Code. The idea is to create a
specialized class for working with AVL and red-black (RB) binary trees. Part
of the idea is taken from Parrot (Perl 6), where a specialized type is
provided for pair values. As advantages it is possible to note:
  * High speed. For simple types with element counts up to 1 it is only
    about 2.7 times slower than the standard dictionary.
  * Safe operation in multithreaded mode.
  * A method for freezing the object is provided.
  * Data is stored in sorted order.
  * High performance of the `items' and `iteritems' methods, since no new
    objects need to be created.
  * With a large number of objects in a tree, memory is used much less
    intensively.
  * Absence of collisions. As a consequence, it is impossible to craft bad
    data for DoS attacks against the dictionary of passed arguments.
  * Key objects do not need a __hash__ method.
  * By means of overloaded operations, the same kind of dictionary can
    change its behaviour so that it supports multi-version representation
    based on layering of data and subsequent rollbacks.

Additional requirements for key objects:
  * Key objects must be comparable with the cmp function.
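A quick sketch (modern Python, using a sorted list as a stand-in for the proposed tree) of why comparison is the only key requirement: unhashable but orderable keys, which a dict must reject, work fine in an ordered container. The `tree_put' helper is illustrative, not part of the proposed API.

```python
import bisect

# Toy ordered mapping: keys need only support comparison, not hashing.
keys, values = [], []

def tree_put(key, value):
    # Binary search for the slot is comparison-only, like a tree descent.
    i = bisect.bisect_left(keys, key)
    if i < len(keys) and keys[i] == key:
        values[i] = value
    else:
        keys.insert(i, key)
        values.insert(i, value)

tree_put([2, 0], "net-a")   # lists are unhashable: a dict would reject them
tree_put([1, 5], "net-b")
print(keys)                 # keys stay in sorted order
```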

There are reasons slowing development of this project:
  * The `gc' module header is compulsory for each compound object (this
    structure is unnecessary for some objects, but modifying the `gc'
    module would be required).
  * The absence of a standard pair object, which probably depends on the
    previous item.

Drawbacks of a binary tree:
  * Average search time is O(1) for a hash and O(log2 N) for a tree.
  * A lot of memory per single tree element:
        (3*sizeof(void *) + sizeof(int))*2,
    since one allocation is used for the pair, and a second piece of memory
    is allocated for the tree node.
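On my reading of the formula above, each element costs three pointers plus an int, doubled because both the pair object and the tree node are allocated. A quick check of the arithmetic (modern Python; the exact sizes depend on the platform ABI):

```python
import ctypes

# Per-element overhead: (3*sizeof(void *) + sizeof(int)) * 2
ptr = ctypes.sizeof(ctypes.c_void_p)   # 4 on 32-bit, 8 on 64-bit
i32 = ctypes.sizeof(ctypes.c_int)      # 4 on common platforms
per_element = (3 * ptr + i32) * 2
print(per_element)                     # 32 bytes on 32-bit, 56 on 64-bit
```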

In defense of the "pair" object:
  * The logic of its methods is noticeably simpler than tuple's, which as a
    result can affect the speed of the program for the better.
  * Alignment on 8 bytes makes little sense on present architectures, where
    a cache fetch is a minimum of 64 bytes. Use of the "pair" type will give
    an appreciable win with 4-byte alignment. Otherwise, on 64-bit platforms
    it is much more profitable to use the tuple object.

This project may demand reworking of the modules `gcmodule.c' and
`tupleobject.c'. It is necessary to reduce the size of static objects; for
this purpose it must be possible to transparently handle objects that have
no direct support from the `gcmodule.c' module.

It will also be partially necessary to rework the module `obmalloc.c' for
more efficient distribution of memory.

I shall be glad to answer questions on this theme.

___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com



Re: [Python-Dev] Google Summer of Code proposal: New class for work with binary trees AVL and RB as with the standard dictionary.

2006-04-24 Thread Vladimir 'Yu' Stepanov

Josiah Carlson wrote:

There exists various C and Python implementations of both AVL and
Red-Black trees.  For users of Python who want to use AVL and/or
Red-Black trees, I would urge them to use the Python implementations. 
In the case of *needing* the speed of a C extension, there already

exists a CPython extension module for AVL trees:
http://www.python.org/pypi/pyavl/1.1

I would suggest you look through some suggested SoC projects in the
wiki:
http://wiki.python.org/moin/SummerOfCode

 - Josiah

  

Thanks for the answer!

I have already seen pyavl-1.1. But for this very reason I would like to see
the module in the standard Python package. The functionality of pyavl and
dict is difficult to compare; the functionality of my module will compare
favourably with dict's. I have run some tests for work with different types,
both for the pyavl-1.1 package and for the prototype of my own module. The
test script is given in the attached file avl-timeit.py, and the files
timeit-result-*-*.txt hold the results of this test. The first number in a
file name is the quantity of added elements, the second is the argument of
the timeit method. It is visible there that, despite the fact that the xtree
module is more complex than the pyavl module (for each newly inserted pair
[key, value] two elements are created: a Python object, the pair, and an
internal element of the tree), even so it works a little faster. Besides,
the pyavl module is unstable in multithreaded applications (see the attached
file module-avl-is-thread-unsafe.py).

I think that creation of this type (together with the pair type) will make
programming more convenient, since sorting with the sort function will be
required less often.

I can probably work on this module outside the framework of the Google
project. Whether such a data type is in demand is the interesting question.
Because `gcmodule.c' and `obmalloc.c' need reworking, this module cannot be
implemented as an external module.
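The "sort less often" point can be sketched with the stdlib's sorted-insert helper (modern Python): the container stays sorted after every insert, so no separate sort() pass is needed. A tree would give the same always-sorted property with cheaper inserts on large collections.

```python
import bisect

# Keep data sorted on every insert instead of sorting afterwards.
data = [3, 9, 14]
for x in (7, 1, 12):
    bisect.insort(data, x)   # O(log N) search, O(N) shift per insert
print(data)                  # always in order, no sort() call needed
```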


#!/usr/local/bin/python

# Initialize modules
import avl
import xtree
import types
import sys

import timeit

if len(sys.argv) != 3:
    iterations = 100
    count = 10
else:
    iterations = int(sys.argv[1])
    count = int(sys.argv[2])

print "Iterations", iterations
print "Repeats", count
print

result_avl = timeit.Timer("""
import avl
a = avl.new()
for i in xrange(%d):
    a.insert(i)
""" % iterations).timeit(count)
print "object avl.new()", result_avl

result_xtree = timeit.Timer("""
import xtree
a = xtree.xtree()
for i in xrange(%d):
    a[i] = True
""" % iterations).timeit(count)
print "object xtree.xtree()", result_xtree

result_dict = timeit.Timer("""
import types
a = dict()
for i in xrange(%d):
    a[i] = True
""" % iterations).timeit(count)
print "object dict()", result_dict

print "  Dict  Xtree AVL"
print "Dict  %.2f  %.2f  %.2f" % (float(result_dict)/result_dict, float(result_dict)/result_xtree, float(result_dict)/result_avl)
print "Xtree %.2f  %.2f  %.2f" % (float(result_xtree)/result_dict, float(result_xtree)/result_xtree, float(result_xtree)/result_avl)
print "AVL   %.2f  %.2f  %.2f" % (float(result_avl)/result_dict, float(result_avl)/result_xtree, float(result_avl)/result_avl)
fox:root!~/Downloads/Python/PYTHON/pyavl-1.1 > python setup.py install
running install
running build
running build_ext
building 'avl' extension
creating build
creating build/temp.freebsd-5.4-RELEASE-i386-2.4
cc -fno-strict-aliasing -DNDEBUG -O -pipe -D__wchar_t=wchar_t -DTHREAD_STACK_SIZE=0x2 -O2 -fPIC -DHAVE_AVL_VERIFY -DAVL_FOR_PYTHON -I/usr/local/include/python2.4 -c avl.c -o build/temp.freebsd-5.4-RELEASE-i386-2.4/avl.o -O2 -Wno-parentheses -Wno-uninitialized
cc -fno-strict-aliasing -DNDEBUG -O -pipe -D__wchar_t=wchar_t -DTHREAD_STACK_SIZE=0x2 -O2 -fPIC -DHAVE_AVL_VERIFY -DAVL_FOR_PYTHON -I/usr/local/include/python2.4 -c avlmodule.c -o build/temp.freebsd-5.4-RELEASE-i386-2.4/avlmodule.o -O2 -Wno-parentheses -Wno-uninitialized
creating build/lib.freebsd-5.4-RELEASE-i386-2.4
cc -shared -pthread -O2 build/temp.freebsd-5.4-RELEASE-i386-2.4/avl.o build/temp.freebsd-5.4-RELEASE-i386-2.4/avlmodule.o -o build/lib.freebsd-5.4-RELEASE-i386-2.4/avl.so -Wl,-x
running install_lib
copying build/lib.freebsd-5.4-RELEASE-i386-2.4/avl.so -> /usr/local/lib/python2.4/site-packages



#!/usr/local/bin/python

import avl
import time

a = avl.new()

crash = 61215 # may be any in range 0 .. 8

a.insert(crash)
a.insert(crash + 1)

b = iter(a)
b.next()

for i in xrange(10):
    a.insert(i)

a.remove(crash)

k = []
for i in xrange(100):
    # fill memory with padding for 32-bit machine
    k.append(pow(65536, 2))
    # fill memory with padding for 64-bit machine
    k.append(pow(65536, 10))

for i in b:
    print i
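The crash in the script above is an iterator left dangling after the tree is mutated. For contrast, CPython's own dict detects this situation and raises an exception instead of crashing (modern Python sketch; the same guard is what a safe tree module would need):

```python
# Mutating a dict while iterating raises instead of crashing.
d = {1: "a", 2: "b"}
it = iter(d)
d[3] = "c"              # mutation invalidates the live iterator
try:
    next(it)
except RuntimeError as e:
    print("caught:", e)  # "dictionary changed size during iteration"
```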
Iterations 30
Repeats 100

object avl.new() 44.3226678371
object xtree.xtree() 30.8406031132
object dict() 12.9507939816
  Dict  Xtree AVL
Dict  1.00  0.42  0.29
Xtre

Re: [Python-Dev] Google Summer of Code proposal: New class for work with binary trees AVL and RB as with the standard dictionary.

2006-04-26 Thread Vladimir 'Yu' Stepanov
Hye-Shik Chang wrote:
> On 4/25/06, Vladimir 'Yu' Stepanov <[EMAIL PROTECTED]> wrote:
>   
>> Thanks for the answer, but after application of a patch on python-2.4.3
>> I could not compile it. A conclusion of a stage of compilation in the
>> attached file.
>>
>> 
>
> Aah.  The patch is for Python in Subversion trunk.  I think you can
> add the following line somewhere in collectionsmodule.c to avoid
> to error.
>
> #define Py_ssize_t int
>
> Thank you!
>
> Hye-Shik
>
>   
Thanks! I have compared the implementations. The collections.rbtree object
works very well. Why is it still distributed only in the form of a patch?

Its speed can be compared to the speed of the standard dictionary; here are
comparative speed results. My module loses in speed to that implementation,
which works without a pointer to the parent element, while in mine two
objects are allocated for each element.

There is a proposal for improving its operation: add new exceptions to
indicate the fact of locking, for example:

Exception
 |__ StandardError
      |__ LockError
           |__ FreezeError
           |__ RDLockError
           |__ WRLockError

So that the check for each kind of locking stays fast enough, only one
counter needs to be checked. For example, with these macros:
-
void
_lock_except(int lockcnt) {
    if (lockcnt > 0) {
        if ((lockcnt & 1) == 0)
            PyErr_SetNone(PyExc_RDLockError);
        else
            PyErr_SetNone(PyExc_FreezeLockError);
    } else
        PyErr_SetNone(PyExc_WRLockError);
}

#define wrlock_SET(x,r) if ((x)->ob_lockcnt != 0) {     \
                            _lock_except((x)->ob_lockcnt); \
                            return r;                   \
                        }                               \
                        (x)->ob_lockcnt = -1
#define wrlock_UNSET(x) (x)->ob_lockcnt = 0

#define rdlock_SET(x,r) if ((x)->ob_lockcnt < 0) {      \
                            _lock_except((x)->ob_lockcnt); \
                            return r;                   \
                        }                               \
                        (x)->ob_lockcnt += 2
#define rdlock_UNSET(x) (x)->ob_lockcnt -= 2

#define freezelock_SET(x, r) if ((x)->ob_lockcnt < 0) { \
                                 _lock_except((x)->ob_lockcnt); \
                                 return r;              \
                             }                          \
                             (x)->ob_lockcnt |= 1
-

In the object, a signed integer counter ob_lockcnt should be defined:
-
struct newobject {
    PyObject_HEAD // or PyObject_VAR_HEAD
    ...
    int ob_lockcnt;
    ...
};
-

Any method that only reads the object's data should use the rdlock_*
sequence. Example:
-
PyObject *
method_for_read_something(PyObject *ob) {
    rdlock_SET(ob, NULL);
    ... do something ...
    if (failure)
        goto failed;
    ... do some else ...
    rdlock_UNSET(ob);
    Py_INCREF(Py_None);
    return Py_None;
failed:
    rdlock_UNSET(ob);
    return NULL;
}
-

If a method can make critical changes, the wrlock_* sequence should be
used. Example:
-
PyObject *
method_for_some_change(PyObject *ob) {
    wrlock_SET(ob, NULL);
    ... do something ...
    if (failure)
        goto failed;
    ... do some else ...
    wrlock_UNSET(ob);
    Py_INCREF(Py_None);
    return Py_None;
failed:
    wrlock_UNSET(ob);
    return NULL;
}
-

Freezing an object is equivalent to setting a permanent read lock; thus it
is possible to freeze an object that is already being read.

Open iterators must acquire a read lock. It is released either when the
iterator finishes its work or when it is closed:
-
PyObject *
map_object_keys(PyObject *ob) {
    rdlock_SET(ob, NULL);
    ... allocate iterator and initialize ...
}

void
map_object_keys_iter_destroy(PyObject *ob) {
    ... destroy data ...
    rdlock_UNSET(ob);
}
-
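The counter semantics encoded by the macros above can be modelled in a few lines of modern Python; this is my reading of ob_lockcnt (0 = free, -1 = write-locked, +2 per reader, low bit = frozen), shown as a sketch, not the proposed implementation:

```python
class LockError(Exception): pass
class RDLockError(LockError): pass     # blocked by active readers
class WRLockError(LockError): pass     # blocked by an active writer
class FreezeError(LockError): pass     # blocked by a frozen object

class LockCounter:
    """Toy model of ob_lockcnt: 0 free, -1 writer, +2 per reader, bit 0 frozen."""
    def __init__(self):
        self.ob_lockcnt = 0

    def _lock_except(self):            # mirrors _lock_except() in the C code
        if self.ob_lockcnt > 0:
            if self.ob_lockcnt & 1 == 0:
                raise RDLockError
            raise FreezeError
        raise WRLockError

    def wrlock_set(self):
        if self.ob_lockcnt != 0:
            self._lock_except()
        self.ob_lockcnt = -1

    def wrlock_unset(self):
        self.ob_lockcnt = 0

    def rdlock_set(self):
        if self.ob_lockcnt < 0:
            self._lock_except()
        self.ob_lockcnt += 2

    def rdlock_unset(self):
        self.ob_lockcnt -= 2

    def freeze_set(self):
        if self.ob_lockcnt < 0:
            self._lock_except()
        self.ob_lockcnt |= 1

ob = LockCounter()
ob.rdlock_set()
ob.rdlock_set()                        # two concurrent readers are fine
try:
    ob.wrlock_set()                    # a writer must wait while read-locked
except RDLockError:
    print("write blocked by readers")
```

Note that a single int check decides every case, which is the speed argument made above.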

Re: [Python-Dev] binary trees. Review obmalloc.c

2006-04-27 Thread Vladimir 'Yu' Stepanov
Josiah Carlson wrote:
> "Vladimir 'Yu' Stepanov" <[EMAIL PROTECTED]> wrote:
>   
>> Josiah Carlson wrote:
>> 
>>> There exists various C and Python implementations of both AVL and
>>> Red-Black trees.  For users of Python who want to use AVL and/or
>>> Red-Black trees, I would urge them to use the Python implementations. 
>>> In the case of *needing* the speed of a C extension, there already
>>> exists a CPython extension module for AVL trees:
>>> http://www.python.org/pypi/pyavl/1.1
>>>
>>> I would suggest you look through some suggested SoC projects in the
>>> wiki:
>>> http://wiki.python.org/moin/SummerOfCode
>>>
>>>  - Josiah
>>>
>>>   
>>>   
>> Thanks for the answer!
>>
>> I already saw pyavl-1.1. But for this reason I would like to see the module
>> in a standard package python. Functionality for pyavl and dict to compare
>> difficultly. Functionality of my module will differ from functionality dict
>> in the best party. I have lead some tests on for work with different types
>> both for a package pyavl-1.1, and for the prototype of own module. The 
>> script
>> of check is resulted in attached a file avl-timeit.py In files
>> timeit-result-*-*.txt results of this test. The first in the name of a file
>> means quantity of the added elements, the second - argument of a method
>> timeit. There it is visible, that in spite of the fact that the module 
>> xtree
>> is more combined in comparison with pyavl the module (for everyone again
>> inserted pair [the key, value], is created two elements: python object - 
>> pair,
>> and an internal element of a tree), even in this case it works a little bit
>> more quickly. Besides the module pyavl is unstable for work in multithread
>> appendices (look the attached file module-avl-is-thread-unsafe.py).
>> 
>
> I'm not concerned about the speed of the external AVL module, and I'm
> not concerned about the speed of trees in Python; so far people have
> found that dictionaries are generally sufficient.  More specifically,
> while new lambda syntaxes are presented almost weekly, I haven't heard
> anyone complain about Python's lack of a tree module in months.  As a
> result, I don't find the possible addition of any tree structure to the
> collections module as being a generally compelling addition.
>   
The dict object is irreplaceable in most cases; that is so. I do not assume
that binary trees can ever replace the standard `dict' object. Sorting of a
list also is and will remain an irreplaceable function.

But there is a number of problems which can be solved rather easily by
means of binary trees. Most likely there will be an alternative for them,
but as a rule it is less obvious to the programmer, and most likely the
implementation will not be as fast.

Probably nobody has mentioned the necessity of binary trees. In my opinion
that is not, by itself, a measure of the necessity of the given type.

Take PHP, for example. The array and the dictionary there are realized as a
list; there is not even a hint of a hash object. Those users who needed a
hash have already chosen perl/python/ruby/etc. A similar situation can arise
concerning binary trees.
> Again, I believe that the existance of 3rd party extension modules which
> implement AVL and red-black trees, regardless of their lack of thread
> safety, slower than your module by a constant, or lack of particular
> functionality to be basically immaterial to this discussion.  In my mind,
> there are only two relevant items to this discussion:
>
> 1. Is having a tree implementation (AVL or Red-Black) important, and if
> so, is it important enough to include in the collections module?
> 2. Is a tree implementation important enough for google to pay for its
> inclusion, given that there exists pyavl and a collectionsmodule.c patch
> for a red-black tree implementation?
>
> Then again, you already have your own implementation of a tree module,
> and it seems as though you would like to be paid for this already-done
> work.  I don't know how google feels about such things, but you should
> remember to include this information in your application.
>
>   
I have looked at the patch for collectionsmodule.c. In my opinion it is a
fine implementation of binary trees. If this class exists, there is no need
for me to create one more copy.

My persistence speaks for the demand for this type of data. As for my
message about starting the project for SoC, please consider that this topic
has not been raised :) For me it was a minor question.
>> I think, that creation of this type (togeth

Re: [Python-Dev] binary trees. Review obmalloc.c

2006-05-04 Thread Vladimir 'Yu' Stepanov

Tim Peters wrote:

[Vladimir 'Yu' Stepanov]

* Adapt allocation of memory blocks to other alignments. Now the
alignment is rigidly set at 8 bytes. As a variant, it is possible to
use 4-byte alignment, and this value could be set at interpreter start
through arguments/environment variables/etc. In testing with 4- or
8-byte alignment, the difference in speed was not appreciable.


[Martin v. Löwis]

That depends on the hardware you use, of course. Some architectures
absolutely cannot stand mis-aligned accesses, and programs will just
crash if they try to perform such accesses.


Note that we _had_ to add a goofy "long double" to the PyGC_Head union
because some Python platform required 8-byte alignment for some types
... see rev 25454.  Any spelling of malloc() also needs to return
memory aligned for any legitimate purpose, and 8-byte alignment is the
strictest requirement we know of across current Python platforms.

May I get a reference for these tests? I have run some tests which show a
very small difference in speed between 4- and 8-byte alignment (a maximum
of 8%, and in the basic case less than one percent).

The tests used sequential lookup of elements in one piece of memory. I
understand that a test with a random choice of addresses is still needed,
to minimize the effect of the processor cache. And certainly not all
platforms will show the same result (I shall try not to forget to attach
the source code of the test and its results :) ).

On the other hand, reducing the memory size of each separate instance of an
object can raise speed by the same percentage that is lost by changing the
alignment from 8 bytes to 4 :) Besides, the extra allocation superstructure
for some objects, such as int, might become unnecessary.



So Python should err on the safe side, and only use a smaller alignment
when it is known not to hurt.

OTOH, I don't see the *advantage* in reducing the alignment.


It could cut wasted bytes.  There is no per-object memory overhead in
a release-build obmalloc:  the allocatable part of a pool is entirely
used for user-visible object memory, except when the alignment
requirement ends up forcing unused (by both the user and by obmalloc)
pad bytes.  For example, Python ints consume 12 bytes on 32-bit boxes,
but if space were allocated for them by obmalloc (it's not), obmalloc
would have to use 16 bytes per int to preserve 8-byte alignment.

Good note.
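Tim's 12-versus-16-byte figure above is just alignment rounding; a one-line check of the arithmetic (modern Python sketch):

```python
def aligned(size, align):
    # Round size up to a multiple of align, as an allocator must.
    return (size + align - 1) // align * align

print(aligned(12, 8))  # 16: a 12-byte int object wastes 4 pad bytes
print(aligned(12, 4))  # 12: no padding with 4-byte alignment
```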


OTOH, obmalloc (unlike PyGC_Head) has always used 8-byte alignment,
because that's what worked best for Vladimir Marangozov during his
extensive timing tests.  It's not an isolated decision -- e.g., it's
also influenced by, and influences, "the best" pool size, and (of
course) cutting alignment to 4 would double the number of "size
classes" obmalloc has to juggle.

Yes, the maximum quantity of size classes will really be doubled. But this
should not affect the performance of the system or the total amount of
consumed memory. The change in memory fragmentation cannot be calculated,
since it will depend on the concrete application.


* Expand the maximum size which can be allocated by means of the
given module. Now the maximum size is 256 bytes.



Right. This is somewhat deliberate, though; the expectation is that
fragmentation will increase dramatically if even larger size classes
are supported.


It's entirely deliberate ;-)  obmalloc is no way trying to be a
general-purpose allocator.  It was designed specifically for the
common memory patterns in Python, aiming at large numbers of small
objects of the same size.  That's extremely common in Python, and
allocations larger than 256 bytes are not. The maximum size was
actually 64 bytes at first.After that, dicts grew an embedded
8-element table, which vastly increased the size of the dict struct. 
obmalloc's limit was boosted to 256 then, although it could have
stopped at the size of a dict (in the rough ballpark of 150 bytes). 
There was no reason (then or now) to go beyond 256.

See below.

* At allocation sizes close to the maximum, the allocation of blocks
becomes inefficient. For example, for the block sizes 248 and 256
(blocksize), the quantity of elements (PAGESIZE/blocksize) equals 16
in both cases. I.e. the same page could be used for block size 248 as
for block size 256.



Good idea; that shouldn't be too difficult to implement: for sizes above
215, pools need to be spaced apart only at 16 bytes.


I'd rather drop the limit to 215 then <0.3 wink>.  Allocations that
large probably still aren't important to obmalloc, but it is important
that finding a requested allocation's "size index" be as cheap as
possible.  Uniformity helps that.

The existing memory allocation module really is not suited to
general-purpose aims. And the speed of work
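The 248-versus-256 observation quoted above can be checked with quick arithmetic (assumption: one obmalloc pool is a 4 KiB page here):

```python
POOL_SIZE = 4096  # assumed pool size for this check
for blocksize in (248, 256):
    print(blocksize, POOL_SIZE // blocksize)
# both size classes fit exactly 16 blocks per pool,
# so a separate 248-byte class gains nothing over 256
```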

Re: [Python-Dev] binary trees. Review obmalloc.c

2006-05-05 Thread Vladimir 'Yu' Stepanov

Josiah Carlson wrote:

"Vladimir 'Yu' Stepanov" <[EMAIL PROTECTED]> wrote:

  
Comparison of sorting functions and binary trees is not absolutely correct.
I think that the sort function will lose considerably on larger lists,
especially after an insert or removal of every single element.



Generally speaking, people who understand at least some of Python's
internals (list internals specifically), will not be *removing* entries
from lists one at a time (at least not using list.pop(0) ), because that
is slow.  If they need to remove items one at a time from the smallest
to the largest, they will usually use list.reverse(), then repeatedly
list.pop(), as this is quite fast (in general).
  

Yes. I understood it when the example was given.

However, as I just said, people usually don't remove items from
just-sorted lists, they tend to iterate over them via 'for i in list:' .
  

Such a problem arises when creating a list of timers, and this list is in
constant change: addition/removal of elements in the list.
collections.deque does not fit here: if an addition to a big list is made,
or a search for the nearest value, on the average N/2 checks are necessary.
For a binary tree the quantity of necessary checks is still log2(N).
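The timer case can be sketched with the stdlib's binary search over a sorted list (modern Python; the deadlines are hypothetical sample data): finding the nearest deadline costs O(log N) comparisons, but list insertion still shifts elements, which is exactly the cost a balanced tree avoids.

```python
import bisect

deadlines = [5, 12, 27, 42, 99]  # sorted timer deadlines (sample data)

def next_deadline(now):
    # O(log N) search for the nearest deadline at or after `now`
    i = bisect.bisect_left(deadlines, now)
    return deadlines[i] if i < len(deadlines) else None

print(next_deadline(13))  # the deadline after 13 is 27
# inserting a new deadline keeps order, but shifts the tail: O(N)
bisect.insort(deadlines, 30)
```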

Another variant of use of binary trees: search for a match against ranges.
Such ranges can be blocks of IP addresses. This kind of dictionary can also
be used for quickly finding whether a given point falls into one of the
segments. These problems can be realized by means of binary search, but
binary search over lists has the same drawbacks as the example above: insert
and removal for lists are carried out slowly, and after an insert the list
must be sorted again. Besides that, a binary search function is absent from
the standard Python package. It is possible to write its analogue in pure
Python, but it will be more than twice slower.
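The range-membership idea can be sketched over a sorted list with the stdlib's binary search (modern Python; the ranges are hypothetical sample data, e.g. IP blocks as integers). The slow-insert/slow-remove drawback described above is what the tree would fix; the lookup itself is already O(log N):

```python
import bisect

# Non-overlapping (start, end, label) ranges, kept sorted by start
ranges = [(10, 19, "net-a"), (30, 39, "net-b"), (50, 59, "net-c")]
starts = [r[0] for r in ranges]

def lookup(point):
    # Find the last range starting at or before `point`, then test membership.
    i = bisect.bisect_right(starts, point) - 1
    if i >= 0 and ranges[i][0] <= point <= ranges[i][1]:
        return ranges[i][2]
    return None

print(lookup(35), lookup(25))  # 35 falls in net-b; 25 falls in a gap
```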

 - Josiah

As an aside, I have personally implemented trees a few times for
different reasons.  One of the issues I have with most tree
implementations is that it is generally difficult to do things like
"give me the kth smallest/largest item".  Of course the standard
solution is what is generally called a "partial order" or "order
statistic" tree, but then you end up with yet another field in your
structure.  One nice thing about Red-Black trees combined with
order-statistic trees, is that you can usually use the uppermost bit of
the order statistics for red/black information.  Of course, this is
really only interesting from an "implement this in C" perspective;
because if one were implementing it in Python, one may as well be really
lazy and not bother implementing guaranteed balanced trees and be
satisfied with randomized-balancing via Treaps.
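(For illustration, a minimal sketch of the treap-with-order-statistics
idea Josiah mentions: each node keeps a subtree size, so the k-th smallest
key is found in expected O(log N) time. The names and layout are my own,
not from any proposed CPython patch:)

```python
import random

class Node:
    __slots__ = ("key", "prio", "size", "left", "right")
    def __init__(self, key):
        self.key = key
        self.prio = random.random()  # random priority keeps expected height O(log N)
        self.size = 1
        self.left = self.right = None

def _size(n):
    return n.size if n else 0

def _update(n):
    n.size = 1 + _size(n.left) + _size(n.right)
    return n

def insert(root, key):
    if root is None:
        return Node(key)
    if key < root.key:
        root.left = insert(root.left, key)
        if root.left.prio < root.prio:        # rotate right
            l, root.left = root.left, root.left.right
            l.right = _update(root)
            return _update(l)
    else:
        root.right = insert(root.right, key)
        if root.right.prio < root.prio:       # rotate left
            r, root.right = root.right, root.right.left
            r.left = _update(root)
            return _update(r)
    return _update(root)

def select(root, k):
    """k-th smallest key, counting from 0."""
    left = _size(root.left)
    if k < left:
        return select(root.left, k)
    if k == left:
        return root.key
    return select(root.right, k - left - 1)

t = None
for k in [5, 1, 9, 3, 7]:
    t = insert(t, k)
print(select(t, 0), select(t, 2), select(t, 4))   # 1 5 9
```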
Here is what I found via a Google search for "order statistic binary
tree":


   http://www.cs.mcgill.ca/~cs251/OldCourses/1997/topic20/

I realized that I have already implemented something similar :). The test
shows the worst-case result, which is, naturally, worse than for an
ordinary tree. Unfortunately I have not tested this implementation to the
end, but I am 90 percent sure that it works.

There is a question: which API should be used? When the object is accessed
as a mapping, __getitem__ should return one result; when it is accessed as
a list, another.

Here the list of already existing attributes and methods:

   class xtree(__builtin__.object)
|  X-Tree. Binary tree with AVL balance mechanism.
|
|  Methods defined here:
|
|  __contains__(...)
|  x.__contains__(y) <==> y in x
|
|  __delitem__(...)
|  x.__delitem__(y) <==> del x[y]
|
|  __getitem__(...)
|  x.__getitem__(y) <==> x[y]
|
|  __iter__(...)
|  x.__iter__() <==> iter(x)
|
|  __len__(...)
|  x.__len__() <==> len(x)
|
|  __repr__(...)
|  x.__repr__() <==> repr(x)
|
|  __setitem__(...)
|  x.__setitem__(i, y) <==> x[i]=y
|
|  append(...)
|  D.append((k, v)) -> None, insert a new pair element into the
|  tree, keeping it sorted.
|  Return the existing pair element if the key is already present.
|
|  clear(...)
|  D.clear() -> None. Remove all items from D.
|
|  copy(...)
|  D.copy() -> a shallow copy of D.
|
|  get(...)
|  D.get(k[,d]) -> D[k] if k in D, else d. d defaults to None.
|
|  has_key(...)
|  D.has_key(k) -> True if D has a key k, else False.
|
|  items(...)
|  D.items() -> list of D's (key, value) pairs.
|
|  iteritems(...)
|  D.iteritems() -> an iterator over the (key, value) items of D.
|
|  iterkeys(...)
|  D.iterkeys() -> an iterator over the keys of D.
|
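(One way to answer the API question is to keep __getitem__ purely
mapping-style and expose positional access through a separate method, as
order-statistic trees usually do. A toy pure-Python stand-in for such an
interface; the real xtree is the author's C extension, and everything
below is illustrative:)

```python
import bisect

class SortedMap:
    """Toy stand-in for a balanced-tree mapping: keys kept sorted."""
    def __init__(self):
        self._keys = []
        self._vals = {}
    def __setitem__(self, key, value):
        if key not in self._vals:
            bisect.insort(self._keys, key)   # keep keys in sorted order
        self._vals[key] = value
    def __getitem__(self, key):              # mapping access, like dict
        return self._vals[key]
    def select(self, i):                     # positional access, like a list
        return self._keys[i]
    def __len__(self):
        return len(self._keys)

m = SortedMap()
for k, v in [("b", 2), ("a", 1), ("c", 3)]:
    m[k] = v
print(m["a"], m.select(0), len(m))   # 1 a 3
```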

Re: [Python-Dev] binary trees.

2006-05-06 Thread Vladimir 'Yu' Stepanov
Josiah Carlson wrote:
> "Vladimir 'Yu' Stepanov" <[EMAIL PROTECTED]> wrote:
>> Josiah Carlson wrote:
>>> However, as I just said, people usually don't remove items from
>>> just-sorted lists, they tend to iterate over them via 'for i in list:' .
>>>   
>> This problem arises when maintaining a list of timers.
>
> I've never seen that particular use-case.
>
> Please understand something.  I believe that trees can be useful in some
> cases. However, I don't believe that they are generally useful
> enough in Python, given that we already have key,value dictionaries and
> sortable lists.  They become even less worthwhile in Python, given the
> total ordering problem that I describe later in this message.

I know I am being tiresome. Sorry for that.

>> Here that I have found through Google on a line " order statistic binary 
>> tree ":
>>
>> http://www.cs.mcgill.ca/~cs251/OldCourses/1997/topic20/
>
> Yes, that link does describe what an 'order statistic tree' is.  One
> thing to note is that you can add order statistics to any tree with one
> more field on each tree node.  That is, you can have your AVL or
> Red-Black tree, and add order statistics to it.  Allowing tree['x'] to
> return the associated value for the key 'x' (the same as a dictionary),
> as well as offer the ability to do tree.select(10) to get the 10th (or
> 11th if one counts from 0) smallest key (or key,value), or get the 'rank'
> of a key with tree.rank('x').

Thanks.

> This problem has nothing to do with dictionaries and hashing, it has to
> do with the fact that there may not be a total ordering on the elements
> of a sequence.

That is unfortunate. I did not know that, which is why I did not
understand.

I searched Google for "python dev total ordering" but found nothing about
intentions to fix this problem. Has it been discussed here before?

--
Vladimir Stepanov
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] binary trees.

2006-05-06 Thread Vladimir 'Yu' Stepanov
Martin v. Löwis wrote:
> Vladimir 'Yu' Stepanov wrote:
>   
>> Yes. I understood that when the set example was given.
>> 
>>> However, as I just said, people usually don't remove items from
>>> just-sorted lists, they tend to iterate over them via 'for i in list:' .
>>>   
>>>   
>> This problem arises when maintaining a list of timers.
>> 
>
> For a list of timers/timeout events, a priority queue is often the best
> data structure, which is already available through the heapq module.
>   

In most cases that is indeed so; heapq suits these purposes very well. The
only problem is with removing an arbitrary element from the list.
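(The usual workaround for heapq's lack of arbitrary removal is lazy
deletion: a cancelled entry is only marked, and skipped later when it is
popped. A sketch for a timer list; all names here are illustrative:)

```python
import heapq

heap = []       # entries are [deadline, timer_id, cancelled_flag]
entries = {}    # timer_id -> entry, for O(1) cancellation

def add_timer(deadline, timer_id):
    entry = [deadline, timer_id, False]
    entries[timer_id] = entry
    heapq.heappush(heap, entry)

def cancel_timer(timer_id):
    # O(1): mark the entry instead of an O(N) removal from the heap.
    entries.pop(timer_id)[2] = True

def pop_timer():
    # Skip entries that were lazily cancelled.
    while heap:
        deadline, timer_id, cancelled = heapq.heappop(heap)
        if not cancelled:
            del entries[timer_id]
            return deadline, timer_id
    return None

add_timer(5.0, "a")
add_timer(1.0, "b")
cancel_timer("a")
print(pop_timer())   # (1.0, 'b')
print(pop_timer())   # None
```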

--
Vladimir Stepanov


Re: [Python-Dev] total ordering.

2006-05-07 Thread Vladimir Yu. Stepanov
On Sat, May 06, 2006 at 03:12:11AM -0700, Josiah Carlson wrote:
> 
> "Vladimir 'Yu' Stepanov" <[EMAIL PROTECTED]> wrote:
> > Josiah Carlson wrote:
> > > This problem has nothing to do with dictionaries and hashing, it has to
> > > do with the fact that there may not be a total ordering on the elements
> > > of a sequence.
> > 
> > It is sad. I did not know it. Therefore and have not understood.
> > 
> > I have looked in Google on "python dev total ordering". On intentions to
> > correct this problem I have not found anything. It already earlier was
> > discussed ?
> 
> Not really.  It has to do with how non-comparable instances compare to
> each other.  Generally speaking, when instances aren't comparable,
> Python compares their type, which is actually comparing the names of
> types. This wouldn't be a big deal, except that:
> 
> >>> str < tuple < unicode
> True
> 
> And you can actually compare str and unicode, so, if you have a str that
> is greater than the unicode, you run into this issue.  With unicode
> becoming str in Py3k, we may not run into this issue much then, unless
> bytes are comparable to str, in which case we end up with the same
> problems.
> 
> Actually, according to my google of "python dev total ordering", it
> gives me...
> 
> http://mail.python.org/pipermail/python-dev/2003-March/034169.html
> http://mail.python.org/pipermail/python-list/2003-March/154142.html
> 
> Which were earlier discussions on this topic, which are quite topical. 
> The ultimate solution is to choose a total ordering on types and
> consider the problem solved.  Then list.sort() and binary trees will
> work properly.
> 

Thanks for so detailed answer.

The discussion at those links describes "how it is", rather than "how to
fix it". So I would like to raise the question of how to fix it.

When types are incomparable, it may make sense to compare identifiers
assigned to each concrete type; more precisely, to compute a weight from
them, which would determine the ordering whenever the types themselves
cannot be compared.

Below is one possible solution to this problem.

Add three fields to each Python type:
..
int tp_id;
int tp_qualifier;
int tp_weight;
..

Assign fixed identifiers (tp_id) to certain built-in and external types;
the rest receive free identifiers at initialization time. In addition,
types should be classified by their basic properties via tp_qualifier. A
type could be classified as:
0 = None
1 = integer (bool, int, long, etc..)
2 = float (decimal, float, etc..)
3 = math (complex, matrix may be ?? )
4 = string (str, unicode, splice, etc..)
5 = sequence (iterators, deque, xrange, enumerate, etc..)
6 = array (tuple, list, bytes, array, etc..)
7 = dictionary (dict, defdict, etc..)
8 = set (set, etc..)
9 = file (file, socket.socket, cStringIO.StringIO)

..
127 = no qualifier (sre.*, datetime.*, ClassType, InstanceType, etc..)

This is an approximate list by which it would, in my opinion, be
convenient to classify types.

The weight can be set explicitly in the structure or, if it equals -1,
computed by the formula:

ob_type->tp_weight = (ob_type->tp_qualifier<<24) + (ob_type->tp_id<<8)

In this way objects with similar properties will sort next to one
another. External types could pose a problem, but if the classification
is developed further, no problems should arise.
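(The effect of the formula can be sketched in a few lines of Python; the
qualifier and id values below are the hypothetical ones from the table
above, not anything CPython actually defines:)

```python
# Hypothetical (qualifier, id) assignments following the proposal.
QUALIFIER = {"NoneType": 0, "int": 1, "float": 2, "complex": 3, "str": 4}
TYPE_ID   = {"NoneType": 0, "int": 1, "float": 2, "complex": 3, "str": 4}

def weight(type_name):
    # tp_weight = (tp_qualifier << 24) + (tp_id << 8)
    return (QUALIFIER[type_name] << 24) + (TYPE_ID[type_name] << 8)

names = ["str", "complex", "NoneType", "float", "int"]
print(sorted(names, key=weight))
# ['NoneType', 'int', 'float', 'complex', 'str']
```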


-- 
SY. Vladimir Stepanov



Re: [Python-Dev] total ordering.

2006-05-11 Thread Vladimir 'Yu' Stepanov
Guido van Rossum wrote:
> On 5/6/06, Vladimir Yu. Stepanov <[EMAIL PROTECTED]> wrote:
> [proposing a total ordering between types]
>
> It Ain't Gonna Happen. (From now on, I'll write this as IAGH.)
>
> In Python 3000, we'll actually *remove* ordering between arbitrary
> types as a feature; only types that explicitly care to be ordered with
> respect to one another will be ordered. Equality tests are unaffected,
> x==y will simply return False if x and y are of incomparable types;
> but x<y will raise an exception.
> --
> --Guido van Rossum (home page: http://www.python.org/~guido/)
When a comparison such as ("something" < 123) is made, raising an
exception is the right thing to do. With sorting, or with binary trees, it
is not so obvious: the desired comparison behavior depends on the user's
needs, and at present plugging in a redefined comparison function is
awkward.

Let me give an example. Calling a list's sort method can raise completely
different exceptions depending on the order of the values being sorted
(thanks to David Mertz and Josiah Carlson):
-
>>> [chr(255),1j,1,u't'].sort()
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
TypeError: no ordering relation is defined for complex numbers
>>> [chr(255),1j,u't',1].sort()
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
UnicodeDecodeError: 'ascii' codec can't decode byte 0xff in position 0:
ordinal not in range(128)
-

If Python-3000 shows similar behavior for the types str(), int(),
complex() and so on, with the type of the raised exception varying widely,
it will make redefining the behavior of the sort function problematic.

It would be good to create one more class of exceptions that would unify
the errors raised when comparing non-comparable types or data.
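(For comparison, Python 3 eventually did unify this: ordering unrelated
built-in types raises TypeError consistently, whichever order the
elements appear in. A small demonstration, runnable on Python 3 unlike
the 2.x session above:)

```python
# In Python 3, ordering unrelated types raises TypeError uniformly,
# rather than a mix of exception types depending on element order.
def sort_error(lst):
    try:
        sorted(lst)
        return None
    except TypeError:
        return "TypeError"

print(sort_error([chr(255), 1j, 1, "t"]))   # TypeError
print(sort_error([chr(255), 1j, "t", 1]))   # TypeError (same, not UnicodeDecodeError)
```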



Re: [Python-Dev] binary trees.

2006-05-11 Thread Vladimir 'Yu' Stepanov
Josiah Carlson wrote:
> And you can actually compare str and unicode, so, if you have a str that
> is greater than the unicode, you run into this issue.  With unicode
> becoming str in Py3k, we may not run into this issue much then, unless
> bytes are comparable to str, in which case we end up with the same
> problems.
>
> Actually, according to my google of "python dev total ordering", it
> gives me...
>
> http://mail.python.org/pipermail/python-dev/2003-March/034169.html
> http://mail.python.org/pipermail/python-list/2003-March/154142.html
>
> Which were earlier discussions on this topic, which are quite topical. 
> The ultimate solution is to choose a total ordering on types and
> consider the problem solved.  Then list.sort() and binary trees will
> work properly.

At the second link a question came up: can complex be partially comparable
with int, long and float? After all, 1 == 1+0j.


Re: [Python-Dev] total ordering.

2006-05-16 Thread Vladimir 'Yu' Stepanov
Guido van Rossum wrote:
> On 5/11/06, Vladimir 'Yu' Stepanov <[EMAIL PROTECTED]> wrote:
>> If for Python-3000 similar it will be shown concerning types
>> str(), int(), complex() and so on, and the type of exceptions
>> will strongly vary, it will make problematic redefinition of
>> behavior of function of sorting.
>
> Not really. We'll just document that sort() should only be used on a
> list of objects that implement a total ordering. The behavior
> otherwise will simply be undefined; it will raise whatever exception
> is first raised by an unsupported comparison (most likely TypeError).
> In practice this won't be a problem.
>

In my opinion there is no dedicated kind of exception for comparison
functions. When the operation being performed is not a type coercion, a
TypeError most often signals a programming mistake. With an exception
specific to comparison, the `.__r(eq|ne|le|lt|ge|gt|cmp)__()' methods
could be invoked without the fear of hiding a TypeError programming
mistake (sorry for the tautology).

It could also be conveniently used as a control-flow exception, to
indicate that the `.__r(eq|ne|le|lt|ge|gt|cmp)__()' method should be
tried. Such a class would play a role similar to StopIteration for
`.next()'. When a comparison fails, the user could then, in the `cmp'
function of the .sort() method, easily define whatever behavior is
needed, including the ability to sort heterogeneous elements.

At present a similar role is played by NotImplemented, which is produced
by a number of mathematical operations. For this reason I ask that the
creation of a new class be considered.
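(The control-flow idea can be sketched in pure Python: a dedicated
exception, here the hypothetical CompareFallback, signals only "these two
values are incomparable", and the user's cmp function falls back to some
total order just in that case, without swallowing genuine TypeErrors:)

```python
import functools

class CompareFallback(Exception):
    """Hypothetical signal: 'no ordering defined', akin to StopIteration."""

def strict_cmp(a, b):
    # Stand-in for a comparison that refuses cross-type ordering.
    if type(a) is not type(b):
        raise CompareFallback
    return (a > b) - (a < b)

def lenient_cmp(a, b):
    try:
        return strict_cmp(a, b)
    except CompareFallback:
        # User-chosen fallback: order by type name, as Python 2 did.
        an, bn = type(a).__name__, type(b).__name__
        return (an > bn) - (an < bn)

data = [3, "b", 1, "a", 2]
print(sorted(data, key=functools.cmp_to_key(lenient_cmp)))
# [1, 2, 3, 'a', 'b']
```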


Re: [Python-Dev] total ordering.

2006-05-17 Thread Vladimir 'Yu' Stepanov

Jason Orendorff wrote:

On 5/11/06, Vladimir 'Yu' Stepanov <[EMAIL PROTECTED]> wrote:

If for Python-3000 similar it will be shown concerning types
str(), int(), complex() and so on, and the type of exceptions
will strongly vary, it will make problematic redefinition of
behavior of function of sorting.


I don't see what you mean by "redefinition of behavior of function of
sorting".  Is this something a Python programmer might want to do?
Can you give an example?



It can arise with external types, where each of them is comparable with
some built-in type, but not with the others. This concerns functions and
methods implemented through the CPython API: for example, the big-integer
types of various mathematical libraries, or several different
representations of character arrays.

A more realistic example involves working with a database. In Python-3000
only compatible types will be comparable, so sorting data whose types are
incompatible for comparison can become a problem.

conn = MySQLdb.connect(...)
cu = conn.cursor()
cu.execute("""
CREATE TABLE bigtable (
   id INT NOT NULL AUTO_INCREMENT,
   group_id INT NULL,
   value VARCHAR(255) NOT NULL,
   PRIMARY KEY (id)
)""")
for query in (
"INSERT INTO bigtable VALUES (NULL, 50, 'i am lazy')",
"INSERT INTO bigtable VALUES (NULL, 20, 'i am very lazy')",
"INSERT INTO bigtable VALUES (NULL, 19, 'i am very very lazy')",
"INSERT INTO bigtable VALUES (NULL, 40, 'i am very very very lazy :)')",
"INSERT INTO bigtable VALUES (NULL, NULL, 'Something different')"
):
   cu.execute(query)
cu.execute("SELECT * FROM bigtable ORDER BY id")
lst = [ r for r in cu ]
... do something in given order ...
lst.sort(cmp=lambda a, b: cmp(a[1], b[1]))


If we need to sort the records on the group_id column, that will hardly
be possible in python-3000, because the numbers in group_id are
interleaved with None values. Of course the list could be filtered
beforehand, but that would not be efficient, especially if we assume that
the sorted values may have several mutually incompatible types (which is
already beyond the scope of this example).
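(For what it is worth, the mixed None/int column can be sorted without
pre-filtering by moving the fallback into the key function; the data below
mirrors the bigtable example, while the key itself is my own sketch:)

```python
# Rows as (id, group_id, value), with group_id sometimes None,
# mirroring the bigtable example above.
rows = [
    (1, 50, "i am lazy"),
    (2, 20, "i am very lazy"),
    (3, 19, "i am very very lazy"),
    (4, 40, "i am very very very lazy :)"),
    (5, None, "Something different"),
]

# Sort on group_id, placing None rows first: the key maps each value to a
# pair whose first element separates None from real numbers.
rows.sort(key=lambda r: (r[1] is not None, r[1] if r[1] is not None else 0))
print([r[1] for r in rows])   # [None, 19, 20, 40, 50]
```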

The essence of my proposal is to let the type check go through CPython's
fast method mechanisms first, and only when the data types prove
incompatible to fall back to a spare comparison model, and only when that
is necessary. A single uniform exception would make it easy to build a
try-except construct around this.

The TypeError exception class is used very widely; comparison functions
and methods need a specialized class. After all, nobody uses
StopIteration to report error messages :)

Another example is when classes from different modules are used and need
to be sorted (see the attached "example1.py").


On 5/16/06, Vladimir 'Yu' Stepanov <[EMAIL PROTECTED]> wrote:

It will be possible it conveniently to use as exception of
management by a stream, for indication of necessity to involve
`.__r(eq|ne|le|lt|ge|gt|cmp)__()' a method. This kind of a class
can carry out function, similarly to StopIteration for `.next()'.


There are no .__r(eq|ne|le|lt|ge|gt|cmp)__() methods, for a logical
reason which you might enjoy deducing yourself...


Oops :) I was simply too hasty. I read about __rcmp__ once, and it only
resurfaced in my memory here, after I had already forgotten all about it.
How inopportune.




At present time similar function is carried out with exception
NotImplemented. This exception is generated in a number of
mathematical operations. For this reason I ask to consider an
opportunity of creation of a new class.


Can you explain this?  NotImplemented isn't an exception.
(NotImplementedError is, but that's something quite different.)


I was mistaken, sorry. Indeed, it is not an exception class.

NotImplemented has exactly one purpose in Python, as far as I can
tell.  What mathematical operations do you mean?


In multiplication/division/addition/subtraction/binary operations/etc,
when the types are incompatible (binary_op1 in abstract.c). It is a very
universal element: the CPython API converts a returned NotImplemented
into a raised TypeError. Is that good?

In my opinion a programming mistake can be hidden this way. See the
attached file example2.py for an example.
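(The conversion being described is easy to demonstrate: a binary operator
method returns the NotImplemented sentinel, and once the reflected
attempt also fails, the interpreter raises TypeError; this is the silent
translation that, as argued above, can mask a programming mistake. The
Meters class is purely illustrative:)

```python
class Meters:
    def __init__(self, n):
        self.n = n
    def __add__(self, other):
        if not isinstance(other, Meters):
            return NotImplemented   # not an exception, just a sentinel
        return Meters(self.n + other.n)

print((Meters(2) + Meters(3)).n)    # 5
try:
    Meters(2) + "oops"
except TypeError:
    print("TypeError")              # NotImplemented was converted to TypeError
```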

Thanks.

#!/usr/local/bin/python
import types

class CompareError:
pass

# class in module1
class A:
uniform_global_sort_criterion = 1
def __init__(self, val):
self.val = val
def __cmp__(self, other):
if type(self) is type(other) and self.__class__ == 

[Python-Dev] xrange vs. int.__getslice__

2006-06-13 Thread Vladimir 'Yu' Stepanov
Aren't you tired of the xrange function yet? :) I suggest replacing it.

-
for i in xrange(100): pass
vs.
for i in int[:100]: pass
-

-
for i in xrange(1000, 1020): pass
vs.
for i in int[1000:1020]: pass
-

-
for i in xrange(200, 100, -2): pass
vs.
for i in int[200:100:-2]: pass
-
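(The idea can be prototyped today with a metaclass that defines
__getitem__ on an int subclass; making the real built-in int sliceable
would need C-level changes, and `rint` below is only a hypothetical
stand-in:)

```python
class SliceableIntMeta(type):
    def __getitem__(cls, item):
        # Translate int[start:stop:step] into the xrange/range call it replaces.
        if isinstance(item, slice):
            start = item.start if item.start is not None else 0
            step = item.step if item.step is not None else 1
            return range(start, item.stop, step)
        raise TypeError("expected a slice")

class rint(int, metaclass=SliceableIntMeta):
    """Hypothetical stand-in for a sliceable built-in int."""

print(list(rint[:5]))          # [0, 1, 2, 3, 4]
print(list(rint[1000:1004]))   # [1000, 1001, 1002, 1003]
print(list(rint[10:4:-2]))     # [10, 8, 6]
```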


Re: [Python-Dev] xrange vs. int.__getslice__

2006-06-13 Thread Vladimir 'Yu' Stepanov
Thomas Wouters wrote:
> http://www.python.org/dev/peps/pep-0204/
>
> (If you must really discuss this, which would probably be futile and 
> senseless, please do it on python-3000 only.)

It certainly looks very similar. But PEP 204 required a parser change and
proposed new syntax as a replacement for the range function. My suggestion
can be seen as a replacement for the xrange function; no change to the
syntax of the language is needed.

Thanks.