[issue32846] Deletion of large sets of strings is extra slow

2019-06-15 Thread Tim Peters
Tim Peters added the comment: Thanks, Terry! Based on your latest results, "quadratic time" isn't plausible here anymore, so I'm closing this. Nasty cache effects certainly played a role, but they were just a flea on the dog ;-) -- resolution: -> fixed stage: commit review -> reso

[issue32846] Deletion of large sets of strings is extra slow

2019-06-14 Thread Terry J. Reedy
Terry J. Reedy added the comment: I reran the code in msg312188. Ints as before, string deletion +- linear up to 100 million, much better than before. millions old stringsnew strings of items create delete create delete 4 1.55.36 1.7.38 8 3.18.76

[issue32846] Deletion of large sets of strings is extra slow

2019-06-14 Thread Tim Peters
Tim Peters added the comment: Raymond, please read my very recent comment one above yours. A (overall) quadratic-time algorithm (O(A**2) where A is the number of arenas) in obmalloc.c is (to my eyes) probably the _primary_ cause of the sloth here. That's been fixed for 3.8, but I don't hav

[issue32846] Deletion of large sets of strings is extra slow

2019-06-14 Thread Raymond Hettinger
Raymond Hettinger added the comment: Can we close this now? ISTM the issue has less to do with sets and more to do with memory allocation quirks and that on modern CPUs random memory accesses are slower than sequential memory accesses. It is not a bug that sets are unordered collections th

[issue32846] Deletion of large sets of strings is extra slow

2019-06-14 Thread Tim Peters
Change by Tim Peters : -- stage: resolved -> commit review ___ Python tracker ___ ___ Python-bugs-list mailing list Unsubscribe: ht

[issue32846] Deletion of large sets of strings is extra slow

2019-06-14 Thread Tim Peters
Tim Peters added the comment: Looks likely that the _major_ cause of the quadratic-time delete behavior was due to that obmalloc used a linear-time method to keep its linked list of usable arenas sorted in order of number of free pools. When a pool became unused, its arena's count of free p

[issue32846] Deletion of large sets of strings is extra slow

2019-06-14 Thread Inada Naoki
Change by Inada Naoki : -- resolution: wont fix -> stage: resolved -> status: closed -> open versions: +Python 3.9 -Python 3.7, Python 3.8 ___ Python tracker ___

[issue32846] Deletion of large sets of strings is extra slow

2019-03-11 Thread Inada Naoki
Inada Naoki added the comment: I thought compact set implementation similar to dict sicne Python 3.6 may fix this issue. But as discussion with Raymond on Python-Dev ML, I conclude the merits of compact implementation is not significant enough. I abandoned compact set branch. Now I don't hav

[issue32846] Deletion of large sets of strings is extra slow

2018-02-19 Thread INADA Naoki
INADA Naoki added the comment: @Luis, would you try dict instead of set? It's little larger than set, but delete elements by insertion order. But I don't think builtin data structure can be optimized for such workload. Maybe, LMBD[1] or some other KVS can help you. [1]: https://lmdb.readthed

[issue32846] Deletion of large sets of strings is extra slow

2018-02-17 Thread Marc-Andre Lemburg
Marc-Andre Lemburg added the comment: Reminds of some experiments someone did a while ago as part of the GIL removal attempts where the ref count integers are all kept in a separate array. The intent there was to be able to do locking on a single array rather than on individual decref cells. Th

[issue32846] Deletion of large sets of strings is extra slow

2018-02-17 Thread Luis Pedro Coelho
Luis Pedro Coelho added the comment: I think some of this conversation is going off-topic, but there is no disk-swapping in my case. I realize ours is not a typical setup, but our normal machines have 256GB of RAM and the "big memory" compute nodes are >=1TB. Normally, swap is outright disab

[issue32846] Deletion of large sets of strings is extra slow

2018-02-16 Thread INADA Naoki
INADA Naoki added the comment: > Dict is (surprisingly) little smaller than set. I'm sorry, I was wrong. dict is smaller than set only when len(d) is small. (typical case for namespace dict) In case of massive keys, dict is larger than set. -- ___

[issue32846] Deletion of large sets of strings is extra slow

2018-02-16 Thread INADA Naoki
INADA Naoki added the comment: One possible advice; try dict instead of set. Dict is (surprisingly) little smaller than set. And dict cleans items in insertion order when the dict is deleted. -- ___ Python tracker

[issue32846] Deletion of large sets of strings is extra slow

2018-02-16 Thread Tim Peters
Tim Peters added the comment: > Surprisingly, deleting a very large set takes much longer than creating it. Luis, that's not surprising ;-) When you create it, it's mostly the case that there's a vast chunk of raw memory from which many pieces are passed out in address order (to hold all the

[issue32846] Deletion of large sets of strings is extra slow

2018-02-16 Thread Terry J. Reedy
Terry J. Reedy added the comment: Luís, what you really need for the problem you outline is an immutable set with one operation, 'in', aside from create and delete. If specialized to strings only, it could internally stores only character sequences, rather than Python objects. I suspect som

[issue32846] Deletion of large sets of strings is extra slow

2018-02-16 Thread Raymond Hettinger
Raymond Hettinger added the comment: Thanks for the info. Here is what I surmise from the data. * The shape of the curve matches the expected memory latency curve for random memory accesses for a given working set size. The graph of your measurements looks similar in shape to this one:

[issue32846] Deletion of large sets of strings is extra slow

2018-02-16 Thread Luís Pedro Coelho
Luís Pedro Coelho added the comment: Original poster here. The benchmark is artificial, but the problem setting is not. I did have a problem that is roughly: interesting = set(line.strip() for line in open(...)) for line in open(...): key,rest = line.split('\t', 1) if

[issue32846] Deletion of large sets of strings is extra slow

2018-02-15 Thread Raymond Hettinger
Raymond Hettinger added the comment: Random idea: For some range of array sizes (bigger than L3 cache), there might be a net benefit to sorting L1-sized clumps of pointers before making the Py_DECREF calls. Possibly, the cost of sorting would be offset by improved memory access patterns. O

[issue32846] Deletion of large sets of strings is extra slow

2018-02-15 Thread Raymond Hettinger
Raymond Hettinger added the comment: The 4th graph in this article may be showing the reason for the growth in random access times as the size gets bigger: https://www.extremetech.com/extreme/188776-how-l1-and-l2-cpu-caches-work-and-why-theyre-an-essential-part-of-modern-chips -- ___

[issue32846] Deletion of large sets of strings is extra slow

2018-02-15 Thread Raymond Hettinger
Raymond Hettinger added the comment: > In former examples (list, ordered dict) objects are iterated and deleted >in order of creation. But in the set of strings items are deleted in > non-predicable random order. I suppose the non-linear effect is > related to memory management. That makes se

[issue32846] Deletion of large sets of strings is extra slow

2018-02-15 Thread Serhiy Storchaka
Change by Serhiy Storchaka : Added file: https://bugs.python.org/file47447/bench_del.py ___ Python tracker ___ ___ Python-bugs-list mailing l

[issue32846] Deletion of large sets of strings is extra slow

2018-02-15 Thread INADA Naoki
Change by INADA Naoki : -- nosy: +inada.naoki ___ Python tracker ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.pyt

[issue32846] Deletion of large sets of strings is extra slow

2018-02-15 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: For lists and dicts the time of deletion is virtually linear. 1 int 5.2 9.8 2 int 5.8 9.7 4 int 5.6 9.8 8 int 5.8 10.0 16 int 5.5 9.7 32 int 5.4 9.6 64 int 5.6 9.0 128 int 5.6 8.7 1 str 7.6 13.3 2 str 8.0 13.8 4 str 7

[issue32846] Deletion of large sets of strings is extra slow

2018-02-15 Thread Serhiy Storchaka
Change by Serhiy Storchaka : -- nosy: +serhiy.storchaka ___ Python tracker ___ ___ Python-bugs-list mailing list Unsubscribe: https:

[issue32846] Deletion of large sets of strings is extra slow

2018-02-14 Thread Terry J. Reedy
New submission from Terry J. Reedy : https://metarabbit.wordpress.com/2018/02/05/pythons-weak-performance-matters/, a blog post on cpython speed, clains "deleting a set of 1 billion strings takes >12 hours". (No other details provided.) I don't have the 100+ gigabytes of ram needed to confirm