[Python-Dev] Complexity documentation request

2008-03-08 Thread Dimitrios Apostolou
Hello all,

Is it possible to include algorithm complexity information for the various 
built-in types (strings, sets, lists, dictionaries...)? This would ease 
the decision for choosing the correct type. The information could simply 
be added as a new column in the tables found on pages as the following:

http://docs.python.org/lib/types-set.html
http://docs.python.org/lib/typesseq.html


It took me a while to find some information for my purposes, however I'm 
not sure whether it's outdated or incomplete. The best sources I found are 
python-list archive and a PEP:

http://www.python.org/dev/peps/pep-3128/
http://mail.python.org/pipermail/python-list/2007-June/446877.html


Nevertheless, algorithm complexity is not always the answer. I remember 
some years ago I preferred using "x in list" rather than "x in set" as 
member checking in a tight loop, because the list was rather small (10-20 
elements) and the overhead for the set was far greater than the better 
algorithm complexity advantage. So if this information is available, it 
would be nice to document constant time factors too. :-)


Thanks in advance,
Dimitris

___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Complexity documentation request

2008-03-10 Thread Dimitrios Apostolou
Hello again,

Guido van Rossum wrote:
> Well, there you have hit the nail on the head -- should we document
> the actual or the guaranteed O() expression? I think this is a can of
> worms better left unopened. At best we should include some hints to

I will open this can and say that average case complexity is the most 
important. For example sorting O(nlogn) and dict lookup O(1). But 
because worst case is important too, it would be nice (but not 
necessary) if there was an extra column on the table with that 
information, or blank if not applicable.

> clarify that random access to list and tuple elements is constant time
> in CPython, and that dicts and sets are implemented using a hash table
> with open hashing -- readers can draw their own conclusions from that.

Notes are nice and already exist at random places in the *huge* 
documentation archive. But I still believe that the best place for this 
are the already existing tables in the docs (I pointed them at my 
initial post). One should trivially be able to find this information.

On the other hand notes could be added for various oddities according to 
experience. For example that for sets up to 10(?) elements, a list would 
probably be better because of the hashing overhead.

> What other implementations do should be up to them -- it becomes a
> Quality of Implementation issue.

I agree that CPython docs should document CPython behaviour. This would 
be the most helpful for someone writing a program in CPython. People who 
use other implementations should consult the appropriate docs. 
Implementors with worst algorithms should try to reach CPython efficiency.

> 
> Regarding the OP's need for this kind of information (deciding between
> the various standard data types), I would recommend a different
> approach -- pick the data type that produces the shortest code. In all
> likelihood it's also the most efficient.

Hmmm, the first thing that comes to mind is prepending an item in a list 
which is small and seems nice, but is O(n) I think! And besides that, 
for someone who *cares* about his code good looks is not enough...


Thanks for your answers,
Dimitris
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Complexity documentation request

2008-03-10 Thread Dimitrios Apostolou
Fred Drake wrote:
> On Mar 9, 2008, at 10:22 AM, Aahz wrote:
>> This has been discussed before and rejected for two reasons:
>>
>> * Other Python implementations (Jython, PyPy, IronPython) may not be
>> able to provide the same type implementations
>>
>> * Algorithmic information does sometimes change between versions, and
>> keeping the docs updated is not trivial
> 
> Also, given the significance of the constant factors and the fact that  
> these are significantly dependent, especially for containers, on the  
> objects passed in (either to be contained or tested), it's not clear  
> that it often makes sense to worry about at too detailed a level.   
> Common sense, knowledge of the interpreter, and experience are often  
> more valuable the easily-outdated documentation.

Fair enough but the fact is that this documentation already exists, at 
random locations unfortunately. Who would expect to find such valuable 
info in a rejected PEP (3128)! I will agree that experience and 
interpreter inside knowledge is most valuable for choosing the right 
structure, but isn't this too much for occasional python programmers?


Thanks,
Dimitris


> 
> 
>-Fred
> 

___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Complexity documentation request

2008-03-11 Thread Dimitrios Apostolou
Daniel Stutzbach wrote:
> On Sun, Mar 9, 2008 at 9:22 AM, Aahz <[EMAIL PROTECTED]> wrote:
>>  There probably would be some value in a wiki page on python.org that
>>  provides this information, particularly across versions.  You may be
>>  able to find volunteers to help on comp.lang.python.
> 
> I just created a very basic one at
> http://wiki.python.org/moin/TimeComplexity?action=show
> 
> I'm not that familiar with the Wiki syntax, so the tables are kind of
> ugly at the moment.
> 
> I wasn't sure about many of the set() operations, so I didn't include those.
> 

Thanks! I'm interested too in some operations I don't know about, I 
think I'll just add them with blank fields so that eventually people who 
know fill them out.


Dimitris

P.S. This wiki is really bad in tables: I can't figure out how to draw a 
border for the tables and every table element contains a  tag, making 
the cell spanning 2-3 lines of height... :-@


___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Complexity documentation request

2008-03-12 Thread Dimitrios Apostolou
Daniel Stutzbach wrote:
> I just created a very basic one at
> http://wiki.python.org/moin/TimeComplexity?action=show

Hi,

Just one quick note. What exactly do you mean by "Amortized worst case"? 
Shouldn't it just be "Worst case"? I think that the word "amortized" 
better describes the time complexity of specific operations.

For example  I think that the insertion in a dictionary should be noted 
as "O(1) amortized" for the average case meaning that when doing 
infinite random insertions, the time asymptotically tends to be 
constant. And worst case is simply O(n), not amortized. Am I missing 
something?


Thanks,
Dimitris


___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Complexity documentation request

2008-03-13 Thread Dimitrios Apostolou
Daniel Stutzbach wrote:
> On Wed, Mar 12, 2008 at 2:52 PM, Dimitrios Apostolou <[EMAIL PROTECTED]> 
> wrote:
>>  Just one quick note. What exactly do you mean by "Amortized worst case"?
>>  Shouldn't it just be "Worst case"? I think that the word "amortized"
>>  better describes the time complexity of specific operations.
> 
> http://en.wikipedia.org/wiki/Amortized_analysis
> 

Thanks for this, I understand now what it means. However given that for 
the list and deque types both columns have exactly the same values, 
wouldn't it be more useful if we simply mentioned the worst case in the 
second, or another, column?

On another note which sorting algorithm is python using? Perhaps we can 
add this as a footnote. I always thought it was quicksort, with a worst 
case of O(n^2).


Thanks,
Dimitris
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Complexity documentation request

2008-03-15 Thread Dimitrios Apostolou
Hi,

I just dug into the source code looking for complexity of set 
operations. In the wiki page I documented an interesting finding, that 
it is different to do s-t and s.difference(t). It is also interesting 
that you can do the first only for sets, but the second for every 
iterable in t.

Are these portable characteristics of the python language or just 
implementation specific details? In addition, can someone explain me the 
usefulness of the loop starting with 'if (PyDict_CheckExact(other))' in 
set_difference()? As I understand it set_difference() is always called 
with two sets as arguments (set_sub() does the actual call).

I'm just trying to figure out the complexity of the other set 
operations, but things get more complicated. I'd appreciate your help.


Thanks,
Dimitris
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Complexity documentation request

2008-03-15 Thread Dimitrios Apostolou
Correcting myself:

Dimitrios Apostolou wrote:
> Hi,
> 
> I just dug into the source code looking for complexity of set 
> operations. In the wiki page I documented an interesting finding, that 
> it is different to do s-t and s.difference(t). It is also interesting 

it is different to do s-t than s.difference_update(t), as fas as 
complexity is involved. The first one is O(len(s)) while the second is 
O(len(t)) (I *think so, I may have missed lots of things in the source 
code).

> that you can do the first only for sets, but the second for every 
> iterable in t.
> 
> Are these portable characteristics of the python language or just 
> implementation specific details? In addition, can someone explain me the 

I just found it documented in the library reference, that s.method() can 
accept any iterable while s-t can't. So I guess it is a language 
characteristic.

> usefulness of the loop starting with 'if (PyDict_CheckExact(other))' in 
> set_difference()? As I understand it set_difference() is always called 
> with two sets as arguments (set_sub() does the actual call).
> 
> I'm just trying to figure out the complexity of the other set 
> operations, but things get more complicated. I'd appreciate your help.
> 
> 
> Thanks,
> Dimitris


P.S. Who is the wiki admin? I'm desperately trying to improve the looks 
of tables (Add border, remove the  element from every cell) but I 
can't. I think that the page stylesheet needs to be modified, for 
starters...

___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com