[issue42368] Make set ordered

2020-11-16 Thread Dan Gheorghe Haiduc


New submission from Dan Gheorghe Haiduc :

Now that dicts are ordered[1], would it by any chance make sense to also order 
sets?

A use case that I ran into is trying to reproducibly get a random.choice from a 
set (after calling random.seed). 

At first I did not need reproducibility, and I just called 
random.choice(list(my_set)). 

Later when I did need it, it was difficult to find out what was wrong. Then I 
realized that sets are unordered, and that order is not dependent on 
random.seed.

It seems there are also some other confused newbies out there.[2][3]

Thank you for the powerful language that is Python!

[1] https://www.python.org/dev/peps/pep-0468
[2] https://stackoverflow.com/q/11929701/235463
[3] https://stackoverflow.com/q/36317520/235463

--
components: Interpreter Core
messages: 381088
nosy: danuker
priority: normal
severity: normal
status: open
title: Make set ordered
type: enhancement

___
Python tracker 
<https://bugs.python.org/issue42368>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue42368] Make set ordered

2020-11-16 Thread Dan Gheorghe Haiduc


Dan Gheorghe Haiduc  added the comment:

Oops. The reference number [2] in the previous message should instead be 
https://stackoverflow.com/q/6614447/235463 .

--

___
Python tracker 
<https://bugs.python.org/issue42368>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue42368] Make set ordered

2020-11-16 Thread Dan Gheorghe Haiduc


Dan Gheorghe Haiduc  added the comment:

rhettinger wrote [1]:

> The existing setobject code has been finely tuned and micro-optimized over 
> the years, giving it excellent performance on workloads we care about.

This worries me also. But I suppose the tuning should make itself visible in 
benchmarks. Does Python have "canonical" benchmarks or  speed tests like 
PyPy[2]?

I am not sure how useful this is, but I made a naive and synthetic 
line_profiler benchmark of a naive implementation of set through a dict (see 
attached file).

It resulted in roughly the following differences:

* Initialization from a 1M-sized list: 41-66% slower
* Inserting an item until doubling size: about the same (perhaps faster, due to 
the size I've chosen not triggering rebuilding of a dict hash table)
* Deletion: 7-11% slower
* Membership test: 2-15% slower

Running them in the opposite order (first dict, then set) gave me the ranges.

I have not considered memory usage (I have not profiled memory before). But I 
suspect this would be larger, since a dict would keep values in addition to 
keys.

Additionally, initializing smaller structures (length = 100) seems to be 
slower; the initialization takes 2x longer (100% slower), but the O(1) 
operations take about the same.

I suspect methane's implementation linked by xtreak is better (but I have not 
tried it).

Profiled with:
kernprof -l set_test.py
python -m line_profiler set_test.py.lprof

[1] https://mail.python.org/pipermail/python-dev/2019-February/156475.html

[2] https://speed.pypy.org/

--
Added file: https://bugs.python.org/file49602/set_test.py

___
Python tracker 
<https://bugs.python.org/issue42368>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com