[Python-Dev] Re: Should set objects maintain insertion order too?
On Tue, 24 Dec 2019 12:08:33 +0900
"Stephen J. Turnbull" wrote:
> David Mertz writes:
>
>  > Even though I was the first person in this thread to suggest
>  > collections.OrderedSet, I'm "meh" about it now. As I read more and played
>  > with the sortedcollections package, it seemed to me that while I might want
>  > a set that iterated in a determinate and meaningful order moderately often,
>  > insertion order would make up a small share of those use cases.
>
> On the other hand, insertion order is one of the most prominent of the
> determinate meaningful orders where you would have to do ugly things
> to use "sorted" to get that order. Any application where you have an
> unreliable message bus feeding a queue (so that you might get
> duplicate objects but it's bad to process the same object twice) would
> be a potential application of insertion-ordered sets.

In that case you probably want a separate persistent "seen" set,
because your queue may have been drained by the time a duplicate object
arrives (which means you probably want something more efficient, such
as a sequence number).

Regards

Antoine.
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/QANSVP7LU4OVMGKIGM25CJ4W24YYYVBD/
Code of Conduct: http://python.org/psf/codeofconduct/
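Antoine's two suggestions can be sketched roughly as follows. This is a minimal illustration, not code from the thread: the `(msg_id, payload)` message shape and the names `drain` and `drain_by_sequence` are assumptions, the "seen" set lives only in memory here (a truly persistent one would need storage that survives restarts), and the sequence-number variant additionally assumes ids arrive in increasing order.

```python
from collections import deque


def drain(queue, seen, handle):
    """Process every queued message once, remembering ids across drains.

    `seen` outlives the queue contents, so a duplicate that shows up after
    the queue was emptied is still recognised.
    """
    while queue:
        msg_id, payload = queue.popleft()
        if msg_id in seen:  # already handled, possibly long ago
            continue
        seen.add(msg_id)
        handle(payload)


def drain_by_sequence(queue, last_seq, handle):
    """Variant assuming ids are sequence numbers delivered in increasing order.

    A single integer (the highest number handled so far) replaces the set.
    """
    while queue:
        seq, payload = queue.popleft()
        if seq <= last_seq:  # old or repeated message
            continue
        last_seq = seq
        handle(payload)
    return last_seq


# Hypothetical usage: a duplicate arrives after the queue has been drained.
processed = []
seen = set()
queue = deque([(1, "a"), (2, "b"), (1, "a")])
drain(queue, seen, processed.append)
queue.extend([(2, "b"), (3, "c")])
drain(queue, seen, processed.append)

by_seq = []
last = drain_by_sequence(deque([(1, "a"), (1, "a"), (2, "b")]), 0, by_seq.append)
last = drain_by_sequence(deque([(2, "b"), (3, "c")]), last, by_seq.append)
```

An insertion-ordered set that doubles as the queue would forget `(2, "b")` once it was popped, which is the situation the separate `seen` set is meant to cover.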
[Python-Dev] Summary of Python tracker Issues
ACTIVITY SUMMARY (2019-12-20 - 2019-12-27)
Python tracker at https://bugs.python.org/

To view or respond to any of the issues listed below, click on the issue.
Do NOT respond to this message.

Issues counts and deltas:
  open    7203 ( +3)
  closed 43688 (+26)
  total  50891 (+29)

Open issues with patches: 2824


Issues opened (21)
==================

#39111: Misleading documentation for NotImplemented
https://bugs.python.org/issue39111  reopened by steven.daprano

#39112: Misleading documentation for tuple
https://bugs.python.org/issue39112  opened by sberens

#39114: Python 3.9.0a2 changed how finally/return is traced
https://bugs.python.org/issue39114  opened by nedbat

#39116: StreamReader.readexactly() raises GeneratorExit on ProactorEve
https://bugs.python.org/issue39116  opened by twisteroid ambassador

#39117: Performance regression for making bound methods
https://bugs.python.org/issue39117  opened by rhettinger

#39121: gzip header write OS field
https://bugs.python.org/issue39121  opened by wungad

#39122: Environment variable PYTHONUSERBASE is not set during customiz
https://bugs.python.org/issue39122  opened by sarmar11

#39123: PyThread_xxx() not available when using limited API
https://bugs.python.org/issue39123  opened by VZ

#39125: Type signature of @property not shown in help()
https://bugs.python.org/issue39125  opened by McSinyx

#39126: Non-bmp (astral) unicode characters confuse the editor
https://bugs.python.org/issue39126  opened by dmaxime

#39127: _Py_HashPointer's void * argument should be const
https://bugs.python.org/issue39127  opened by petdance

#39128: Document happy eyeball parameters in loop.create_connection si
https://bugs.python.org/issue39128  opened by xtreak

#39129: Incorrect import of TimeoutError while creating happy eyeballs
https://bugs.python.org/issue39129  opened by xtreak

#39130: Dict is reversable from v3.8 and should say that in the doc
https://bugs.python.org/issue39130  opened by khalidmammadov

#39131: signing needs two serialisation passes
https://bugs.python.org/issue39131  opened by jap

#39133: threading lib. working improperly on idle window
https://bugs.python.org/issue39133  opened by Pyjeet

#39134: can't construct dataclass as ABC (or runtime check as data pro
https://bugs.python.org/issue39134  opened by cybertreiber

#39136: Typos in whatsnew file and docs
https://bugs.python.org/issue39136  opened by xtreak

#39137: create_unicode_buffer() gives different results on Windows vs
https://bugs.python.org/issue39137  opened by lazka

#39138: import a pycapsule object that's attached on many modules
https://bugs.python.org/issue39138  opened by yorkie

#39139: Reference to depricated collections.abc class in collections i
https://bugs.python.org/issue39139  opened by khalidmammadov


Most recent 15 issues with no replies (15)
==========================================

#39137: create_unicode_buffer() gives different results on Windows vs
https://bugs.python.org/issue39137

#39136: Typos in whatsnew file and docs
https://bugs.python.org/issue39136

#39130: Dict is reversable from v3.8 and should say that in the doc
https://bugs.python.org/issue39130

#39127: _Py_HashPointer's void * argument should be const
https://bugs.python.org/issue39127

#39123: PyThread_xxx() not available when using limited API
https://bugs.python.org/issue39123

#39122: Environment variable PYTHONUSERBASE is not set during customiz
https://bugs.python.org/issue39122

#39116: StreamReader.readexactly() raises GeneratorExit on ProactorEve
https://bugs.python.org/issue39116

#39104: ProcessPoolExecutor hangs on shutdown nowait with pickling fai
https://bugs.python.org/issue39104

#39101: IsolatedAsyncioTestCase freezes when exception is raised
https://bugs.python.org/issue39101

#39100: email.policy.SMTP throws AttributeError on invalid header
https://bugs.python.org/issue39100

#39098: OSError: handle closed, ProcessPoolExecutor shutdown(wait=Fals
https://bugs.python.org/issue39098

#39092: Csv sniffer doesn't attempt to determine and set escape charac
https://bugs.python.org/issue39092

#39089: Update IDLE's credits
https://bugs.python.org/issue39089

#39088: test_concurrent_futures crashed with python.core core dump on
https://bugs.python.org/issue39088

#39072: Azure Pipelines: git clone failed with: OpenSSL SSL_read: Conn
https://bugs.python.org/issue39072


Most recent 15 issues waiting for review (15)
=============================================

#39139: Reference to depricated collections.abc class in collections i
https://bugs.python.org/issue39139

#39131: signing needs two serialisation passes
https://bugs.python.org/issue39131

#39130: Dict is reversable from v3.8 and should say that in the doc
https://bugs.python.org/issue39130

#39129: Incorrect import of TimeoutError while creating happy eyeballs
https://bugs.python.org/issue39129

#39127: _Py_HashPointer's void * argument should be const
https://bugs.python.org/issue39127

#39121: gzip header writ
[Python-Dev] Re: Should set objects maintain insertion order too?
[Nick Coghlan]
> I took Larry's request a slightly different way: he has a use case where
> he wants order preservation (so built in sets aren't good), but combined
> with low cost duplicate identification and elimination and removal of
> arbitrary elements (so lists and collections.deque aren't good). Organising
> a work queue that way seems common enough ...

Is it? I confess I haven't thought of a plausible use case. Larry
didn't really explain his problem, just suggested that an ordered set
would be "a solution" to it.

The problem: whether there are duplicates "in the queue" is a
question about an implementation detail, hard for me to translate to a
question about the _task_ to be solved.

For example, suppose "the task" is to make a deep copy of a web site.
A "job" consists of following a URL, sucking down the page text, and
adding new jobs for contained URLs on the page.

We probably don't want to suck down the page text multiple times for a
given URL, but checking whether a URL is currently already in the job
queue is asking a question about an implementation detail that misses
the point: we want to know whether that URL has already been chased,
period. Whether the URL is currently in the queue is irrelevant to
that. The only logical relationship is that a job that has finished
_was_ in the queue at some time before the job finished.

So, ya, I've seen and implemented lots of work queues along these
lines - but an OrderedSet would be an "attractive nuisance"
(offhandedly appearing to solve a problem it doesn't actually
address):

    jobs = some_kind_of_queue()
    finished_jobs = set()
    ...
    while jobs:
        job = jobs.get()
        if job in finished_jobs:
            continue
        try:
            work on the job
            possibly adding (sub)jobs to the `jobs` queue
        except TransientExceptions:
            jobs.put(job)  # try again later
        else:
            finished_jobs.add(job)

Clear? There is a queue here, and a set, but they can't be combined
in a useful way: the set contents have nothing to do with what's
currently in the queue - the set consists of jobs that have been
successfully completed; the queue doesn't care whether it contains
duplicates, and merely skips over jobs that have been completed by the
time the queue spits them out (which strikes me as more robust design
than duplicating logic everywhere a job is added to a queue to try to
prevent adding already-done jobs to begin with).

Similarly, in other places I've used a "finished" flag on the object
instead of a set, or a dict mapping jobs to info about job status and
results. But in all these cases, the vital info associated with a job
really has little to do with whether the job is currently sitting on a
queue.

If "is it currently on the queue?" _is_ a common question to ask,
perhaps someone could give a semi-realistic example?
___
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/2XDEVSO5S5ZLVZTLX43UHBOJWMXYJIIB/
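For readers who want to run the pattern from Tim's message, here is one possible concrete version applied to his web-site example. It is a sketch under assumptions, not his code: `collections.deque` stands in for `some_kind_of_queue()`, `TransientError` stands in for `TransientExceptions`, the `max_attempts` cap is an addition so a permanently failing job cannot loop forever, and `SITE` and `fetch` are a made-up in-memory site.

```python
from collections import deque


class TransientError(Exception):
    """Hypothetical stand-in for a retryable failure."""


def run_jobs(initial_jobs, work, max_attempts=3):
    """Run jobs until the queue is empty; return the set of finished jobs.

    The queue may hold duplicates. The set records only jobs that completed
    successfully, whatever the queue happens to contain at the moment.
    """
    jobs = deque(initial_jobs)
    finished = set()
    attempts = {}  # job -> number of transient failures so far
    while jobs:
        job = jobs.popleft()
        if job in finished:
            continue  # done earlier; skip the stale queue entry
        try:
            new_jobs = work(job)
        except TransientError:
            attempts[job] = attempts.get(job, 0) + 1
            if attempts[job] < max_attempts:
                jobs.append(job)  # try again later
        else:
            finished.add(job)
            jobs.extend(new_jobs)  # no duplicate check needed on the way in
    return finished


# Hypothetical site: each URL maps to the URLs found on its page.
SITE = {"/": ["/a", "/b"], "/a": ["/", "/b"], "/b": ["/a"]}
fetched = []
flaky = {"/b"}  # URLs that fail once before succeeding


def fetch(url):
    if url in flaky:
        flaky.discard(url)
        raise TransientError(url)
    fetched.append(url)
    return SITE[url]


done = run_jobs(["/"], fetch)
```

In this run `/b` sits in the queue several times, yet each page is fetched once, because the only question asked is whether the job has finished.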
[Python-Dev] Re: Should set objects maintain insertion order too?
> On Dec 27, 2019, at 19:48, Tim Peters wrote:
>
> So, ya, I've seen and implemented lots of work queues along these
> lines - but an OrderedSet would be an "attractive nuisance"
> (offhandedly appearing to solve a problem it doesn't actually
> address):
>
>     jobs = some_kind_of_queue()
>     finished_jobs = set()
>     ...
>     while jobs:
>         job = jobs.get()
>         if job in finished_jobs:
>             continue
>         try:
>             work on the job
>             possibly adding (sub)jobs to the `jobs` queue
>         except TransientExceptions:
>             jobs.put(job)  # try again later
>         else:
>             finished_jobs.add(job)

Well, if an OrderedSet were designed to gracefully handle resizes
during iteration, something like this may make sense:

    jobs = OrderedSet(initial_jobs)
    for job in jobs:
        new_jobs = process(job)
        jobs |= new_jobs
    ...  # jobs is now a set of every job processed

A dictionary with None values comes close if you replace the union
line with a jobs.update(new_jobs) call (and ignore resizing issues),
but it breaks because repeated jobs are shuffled to the end of the
sequence and would be processed again.

Brandt
___
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/SXGD5G5EY4YMXDG42AU7OSNVCUUU25DI/
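The dict comparison in Brandt's message is worth checking against a current interpreter. In CPython 3.7 and later, assigning to an existing key (including through `update`) leaves that key in its original position, so the built-in dict does not appear to move repeated jobs to the end. The obstacle that remains is the resizing one: adding keys while iterating raises RuntimeError. The sketch below shows both behaviours, followed by one way to get "every distinct job once, in first-insertion order, while the collection grows" from a plain list and set. The helper name `process_all` and the sample `graph` are hypothetical, and this is only an approximation of what a resize-tolerant OrderedSet might offer.

```python
# Re-inserting an existing key keeps its original position.
d = dict.fromkeys(["a", "b", "c"])
d.update({"a": None})
order_after_update = list(d)

# Adding new keys during iteration is what actually fails.
try:
    for key in d:
        d[key + "x"] = None
    resize_error = None
except RuntimeError as exc:
    resize_error = str(exc)


def process_all(initial_jobs, process):
    """Visit each distinct job once, in first-insertion order.

    A list gives the order and tolerates growth during the loop; a set gives
    cheap duplicate detection. `process(job)` returns an iterable of new jobs.
    """
    order = []
    seen = set()

    def add(job):
        if job not in seen:
            seen.add(job)
            order.append(job)

    for job in initial_jobs:
        add(job)
    i = 0
    while i < len(order):  # len(order) may grow as jobs are added
        for new_job in process(order[i]):
            add(new_job)
        i += 1
    return order  # every job processed, in insertion order


# Hypothetical job graph: each job maps to the jobs it spawns.
graph = {1: [2, 3], 2: [3, 1], 3: [4], 4: []}
visited = process_all([1], graph.__getitem__)
```

Unlike the queue-plus-finished-set loop quoted above, this version has no retry path; it only illustrates the ordering and deduplication behaviour.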