[Python-Dev] Complexity documentation request
Hello all, Is it possible to include algorithm complexity information for the various built-in types (strings, sets, lists, dictionaries...)? This would ease the decision for choosing the correct type. The information could simply be added as a new column in the tables found on pages as the following: http://docs.python.org/lib/types-set.html http://docs.python.org/lib/typesseq.html It took me a while to find some information for my purposes, however I'm not sure whether it's outdated or incomplete. The best sources I found are python-list archive and a PEP: http://www.python.org/dev/peps/pep-3128/ http://mail.python.org/pipermail/python-list/2007-June/446877.html Nevertheless, algorithm complexity is not always the answer. I remember some years ago I preferred using "x in list" rather than "x in set" as member checking in a tight loop, because the list was rather small (10-20 elements) and the overhead for the set was far greater than the better algorithm complexity advantage. So if this information is available, it would be nice to document constant time factors too. :-) Thanks in advance, Dimitris ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Complexity documentation request
Hello again, Guido van Rossum wrote: > Well, there you have hit the nail on the head -- should we document > the actual or the guaranteed O() expression? I think this is a can of > worms better left unopened. At best we should include some hints to I will open this can and say that average case complexity is the most important. For example sorting O(nlogn) and dict lookup O(1). But because worst case is important too, it would be nice (but not necessary) if there was an extra column on the table with that information, or blank if not applicable. > clarify that random access to list and tuple elements is constant time > in CPython, and that dicts and sets are implemented using a hash table > with open hashing -- readers can draw their own conclusions from that. Notes are nice and already exist at random places in the *huge* documentation archive. But I still believe that the best place for this are the already existing tables in the docs (I pointed them at my initial post). One should trivially be able to find this information. On the other hand notes could be added for various oddities according to experience. For example that for sets up to 10(?) elements, a list would probably be better because of the hashing overhead. > What other implementations do should be up to them -- it becomes a > Quality of Implementation issue. I agree that CPython docs should document CPython behaviour. This would be the most helpful for someone writing a program in CPython. People who use other implementations should consult the appropriate docs. Implementors with worst algorithms should try to reach CPython efficiency. > > Regarding the OP's need for this kind of information (deciding between > the various standard data types), I would recommend a different > approach -- pick the data type that produces the shortest code. In all > likelihood it's also the most efficient. Hmmm, the first thing that comes to mind is prepending an item in a list which is small and seems nice, but is O(n) I think! And besides that, for someone who *cares* about his code good looks is not enough... Thanks for your answers, Dimitris ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Complexity documentation request
Fred Drake wrote: > On Mar 9, 2008, at 10:22 AM, Aahz wrote: >> This has been discussed before and rejected for two reasons: >> >> * Other Python implementations (Jython, PyPy, IronPython) may not be >> able to provide the same type implementations >> >> * Algorithmic information does sometimes change between versions, and >> keeping the docs updated is not trivial > > Also, given the significance of the constant factors and the fact that > these are significantly dependent, especially for containers, on the > objects passed in (either to be contained or tested), it's not clear > that it often makes sense to worry about at too detailed a level. > Common sense, knowledge of the interpreter, and experience are often > more valuable the easily-outdated documentation. Fair enough but the fact is that this documentation already exists, at random locations unfortunately. Who would expect to find such valuable info in a rejected PEP (3128)! I will agree that experience and interpreter inside knowledge is most valuable for choosing the right structure, but isn't this too much for occasional python programmers? Thanks, Dimitris > > >-Fred > ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Complexity documentation request
Daniel Stutzbach wrote: > On Sun, Mar 9, 2008 at 9:22 AM, Aahz <[EMAIL PROTECTED]> wrote: >> There probably would be some value in a wiki page on python.org that >> provides this information, particularly across versions. You may be >> able to find volunteers to help on comp.lang.python. > > I just created a very basic one at > http://wiki.python.org/moin/TimeComplexity?action=show > > I'm not that familiar with the Wiki syntax, so the tables are kind of > ugly at the moment. > > I wasn't sure about many of the set() operations, so I didn't include those. > Thanks! I'm interested too in some operations I don't know about, I think I'll just add them with blank fields so that eventually people who know fill them out. Dimitris P.S. This wiki is really bad in tables: I can't figure out how to draw a border for the tables and every table element contains a tag, making the cell spanning 2-3 lines of height... :-@ ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Complexity documentation request
Daniel Stutzbach wrote: > I just created a very basic one at > http://wiki.python.org/moin/TimeComplexity?action=show Hi, Just one quick note. What exactly do you mean by "Amortized worst case"? Shouldn't it just be "Worst case"? I think that the word "amortized" better describes the time complexity of specific operations. For example I think that the insertion in a dictionary should be noted as "O(1) amortized" for the average case meaning that when doing infinite random insertions, the time asymptotically tends to be constant. And worst case is simply O(n), not amortized. Am I missing something? Thanks, Dimitris ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Complexity documentation request
Daniel Stutzbach wrote: > On Wed, Mar 12, 2008 at 2:52 PM, Dimitrios Apostolou <[EMAIL PROTECTED]> > wrote: >> Just one quick note. What exactly do you mean by "Amortized worst case"? >> Shouldn't it just be "Worst case"? I think that the word "amortized" >> better describes the time complexity of specific operations. > > http://en.wikipedia.org/wiki/Amortized_analysis > Thanks for this, I understand now what it means. However given that for the list and deque types both columns have exactly the same values, wouldn't it be more useful if we simply mentioned the worst case in the second, or another, column? On another note which sorting algorithm is python using? Perhaps we can add this as a footnote. I always thought it was quicksort, with a worst case of O(n^2). Thanks, Dimitris ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Complexity documentation request
Hi, I just dug into the source code looking for complexity of set operations. In the wiki page I documented an interesting finding, that it is different to do s-t and s.difference(t). It is also interesting that you can do the first only for sets, but the second for every iterable in t. Are these portable characteristics of the python language or just implementation specific details? In addition, can someone explain me the usefulness of the loop starting with 'if (PyDict_CheckExact(other))' in set_difference()? As I understand it set_difference() is always called with two sets as arguments (set_sub() does the actual call). I'm just trying to figure out the complexity of the other set operations, but things get more complicated. I'd appreciate your help. Thanks, Dimitris ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Complexity documentation request
Correcting myself: Dimitrios Apostolou wrote: > Hi, > > I just dug into the source code looking for complexity of set > operations. In the wiki page I documented an interesting finding, that > it is different to do s-t and s.difference(t). It is also interesting it is different to do s-t than s.difference_update(t), as fas as complexity is involved. The first one is O(len(s)) while the second is O(len(t)) (I *think so, I may have missed lots of things in the source code). > that you can do the first only for sets, but the second for every > iterable in t. > > Are these portable characteristics of the python language or just > implementation specific details? In addition, can someone explain me the I just found it documented in the library reference, that s.method() can accept any iterable while s-t can't. So I guess it is a language characteristic. > usefulness of the loop starting with 'if (PyDict_CheckExact(other))' in > set_difference()? As I understand it set_difference() is always called > with two sets as arguments (set_sub() does the actual call). > > I'm just trying to figure out the complexity of the other set > operations, but things get more complicated. I'd appreciate your help. > > > Thanks, > Dimitris P.S. Who is the wiki admin? I'm desperately trying to improve the looks of tables (Add border, remove the element from every cell) but I can't. I think that the page stylesheet needs to be modified, for starters... ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com