[Python-Dev] Re: Should set objects maintain insertion order too?

2019-12-27 Thread Antoine Pitrou
On Tue, 24 Dec 2019 12:08:33 +0900
"Stephen J. Turnbull"  wrote:
> David Mertz writes:
> 
>  > Even though I was the first person in this thread to suggest
>  > collections.OrderedSet, I'm "meh" about it now. As I read more and played
>  > with the sortedcollections package, it seemed to me that while I might want
>  > a set that iterated in a determinate and meaningful order moderately often,
>  > insertion order would make up a small share of those use cases.  
> 
> On the other hand, insertion order is one of the most prominent of the
> determinate meaningful orders where you would have to do ugly things
> to use "sorted" to get that order.  Any application where you have an
> unreliable message bus feeding a queue (so that you might get
> duplicate objects but it's bad to process the same object twice) would
> be a potential application of insertion-ordered sets.

In that case you probably want a separate persistent "seen" set,
because your queue may already have been drained by the time a
duplicate object arrives.

(Which means you probably want something more efficient, such as a
sequence number.)
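
A minimal sketch of the sequence-number idea, under assumptions that are
not in the message above: every message carries a `(sender, seq)` pair,
each sender numbers its messages in increasing order, and the bus may
repeat messages but does not reorder them. The names `Deduplicator` and
`drain` are made up for illustration.

```python
class Deduplicator:
    """Drop re-delivered messages using a per-sender high-water mark."""

    def __init__(self):
        # sender -> highest sequence number accepted so far (assumed layout)
        self._last_seen = {}

    def accept(self, sender, seq):
        """Return True if (sender, seq) is new, False if it is a duplicate."""
        if seq <= self._last_seen.get(sender, -1):
            return False
        self._last_seen[sender] = seq
        return True


def drain(messages):
    """Keep only first-time payloads; `messages` yields (sender, seq, payload)."""
    dedup = Deduplicator()
    accepted = []
    for sender, seq, payload in messages:
        if dedup.accept(sender, seq):
            accepted.append(payload)
    return accepted


incoming = [("a", 0, "x"), ("a", 0, "x"), ("a", 1, "y"), ("b", 0, "z"), ("a", 1, "y")]
print(drain(incoming))  # ['x', 'y', 'z']
```

The state kept here is one integer per sender, and it does not depend on
what is currently sitting in the queue, so a duplicate is still caught
after the queue has been drained.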

Regards

Antoine.

___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/QANSVP7LU4OVMGKIGM25CJ4W24YYYVBD/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Summary of Python tracker Issues

2019-12-27 Thread Python tracker


ACTIVITY SUMMARY (2019-12-20 - 2019-12-27)
Python tracker at https://bugs.python.org/

To view or respond to any of the issues listed below, click on the issue.
Do NOT respond to this message.

Issues counts and deltas:
  open    7203 ( +3)
  closed 43688 (+26)
  total  50891 (+29)

Open issues with patches: 2824 


Issues opened (21)
==================

#39111: Misleading documentation for NotImplemented
https://bugs.python.org/issue39111  reopened by steven.daprano

#39112: Misleading documentation for tuple
https://bugs.python.org/issue39112  opened by sberens

#39114: Python 3.9.0a2 changed how finally/return is traced
https://bugs.python.org/issue39114  opened by nedbat

#39116: StreamReader.readexactly() raises GeneratorExit on ProactorEve
https://bugs.python.org/issue39116  opened by twisteroid ambassador

#39117: Performance regression for making bound methods
https://bugs.python.org/issue39117  opened by rhettinger

#39121: gzip header write OS field
https://bugs.python.org/issue39121  opened by wungad

#39122: Environment variable PYTHONUSERBASE is not set during customiz
https://bugs.python.org/issue39122  opened by sarmar11

#39123: PyThread_xxx() not available when using limited API
https://bugs.python.org/issue39123  opened by VZ

#39125: Type signature of @property not shown in help()
https://bugs.python.org/issue39125  opened by McSinyx

#39126: Non-bmp (astral) unicode characters confuse the editor
https://bugs.python.org/issue39126  opened by dmaxime

#39127: _Py_HashPointer's void * argument should be const
https://bugs.python.org/issue39127  opened by petdance

#39128: Document happy eyeball parameters in loop.create_connection si
https://bugs.python.org/issue39128  opened by xtreak

#39129: Incorrect import of TimeoutError while creating happy eyeballs
https://bugs.python.org/issue39129  opened by xtreak

#39130: Dict is reversable from v3.8 and should say that in the doc
https://bugs.python.org/issue39130  opened by khalidmammadov

#39131: signing needs two serialisation passes
https://bugs.python.org/issue39131  opened by jap

#39133: threading lib. working improperly on idle window
https://bugs.python.org/issue39133  opened by Pyjeet

#39134: can't construct dataclass as ABC (or runtime check as data pro
https://bugs.python.org/issue39134  opened by cybertreiber

#39136: Typos in whatsnew file and docs
https://bugs.python.org/issue39136  opened by xtreak

#39137: create_unicode_buffer() gives different results on Windows vs 
https://bugs.python.org/issue39137  opened by lazka

#39138: import a pycapsule object that's attached on many modules
https://bugs.python.org/issue39138  opened by yorkie

#39139: Reference to depricated collections.abc class in collections i
https://bugs.python.org/issue39139  opened by khalidmammadov



Most recent 15 issues with no replies (15)
==========================================

#39137: create_unicode_buffer() gives different results on Windows vs 
https://bugs.python.org/issue39137

#39136: Typos in whatsnew file and docs
https://bugs.python.org/issue39136

#39130: Dict is reversable from v3.8 and should say that in the doc
https://bugs.python.org/issue39130

#39127: _Py_HashPointer's void * argument should be const
https://bugs.python.org/issue39127

#39123: PyThread_xxx() not available when using limited API
https://bugs.python.org/issue39123

#39122: Environment variable PYTHONUSERBASE is not set during customiz
https://bugs.python.org/issue39122

#39116: StreamReader.readexactly() raises GeneratorExit on ProactorEve
https://bugs.python.org/issue39116

#39104: ProcessPoolExecutor hangs on shutdown nowait with pickling fai
https://bugs.python.org/issue39104

#39101: IsolatedAsyncioTestCase freezes when exception is raised
https://bugs.python.org/issue39101

#39100: email.policy.SMTP throws AttributeError on invalid header
https://bugs.python.org/issue39100

#39098: OSError: handle closed, ProcessPoolExecutor shutdown(wait=Fals
https://bugs.python.org/issue39098

#39092: Csv sniffer doesn't attempt to determine and set escape charac
https://bugs.python.org/issue39092

#39089: Update IDLE's credits
https://bugs.python.org/issue39089

#39088: test_concurrent_futures crashed with python.core core dump on 
https://bugs.python.org/issue39088

#39072: Azure Pipelines: git clone failed with: OpenSSL SSL_read: Conn
https://bugs.python.org/issue39072



Most recent 15 issues waiting for review (15)
=============================================

#39139: Reference to depricated collections.abc class in collections i
https://bugs.python.org/issue39139

#39131: signing needs two serialisation passes
https://bugs.python.org/issue39131

#39130: Dict is reversable from v3.8 and should say that in the doc
https://bugs.python.org/issue39130

#39129: Incorrect import of TimeoutError while creating happy eyeballs
https://bugs.python.org/issue39129

#39127: _Py_HashPointer's void * argument should be const
https://bugs.python.org/issue39127

#39121: gzip header writ

[Python-Dev] Re: Should set objects maintain insertion order too?

2019-12-27 Thread Tim Peters
[Nick Coghlan]
> I took Larry's request a slightly different way: he has a use case where
> he wants order preservation (so built in sets aren't good), but combined
> with low cost duplicate identification and elimination and removal of
> arbitrary elements (so lists and collections.deque aren't good). Organising
> a work queue that way seems common enough ...

Is it?  I confess I haven't thought of a plausible use case.  Larry
didn't really explain his problem, just suggested that an ordered set
would be "a solution" to it.

The problem:  whether there are duplicates "in the queue" is a
question about an implementation detail, hard for me to translate to a
question about the _task_ to be solved.

For example, suppose "the task" is to make a deep copy of a web site.
A "job" consists of following a URL, sucking down the page text, and
adding new jobs for contained URLs on the page.

We probably don't want to suck down the page text multiple times for a
given URL, but checking whether a URL is currently already in the job
queue is asking a question about an implementation detail that misses
the point:  we want to know whether that URL has already been chased,
period.  Whether the URL is currently in the queue is irrelevant to
that.  The only logical relationship is that a job that has finished
_was_ in the queue at some time before the job finished.

So, ya, I've seen and implemented lots of work queues along these
lines - but an OrderedSet would be an "attractive nuisance"
(offhandedly appearing to solve a problem it doesn't actually
address):

    jobs = some_kind_of_queue()
    finished_jobs = set()
    ...
    while jobs:
        job = jobs.get()
        if job in finished_jobs:
            continue
        try:
            work on the job
            possibly adding (sub)jobs to the `jobs` queue
        except TransientExceptions:
            jobs.put(job)  # try again later
        else:
            finished_jobs.add(job)

Clear?  There is a queue here, and a set, but they can't be combined
in a useful way:  the set contents have nothing to do with what's
currently in the queue - the set consists of jobs that have been
successfully completed; the queue doesn't care whether it contains
duplicates, and merely skips over jobs that have been completed by the
time the queue spits them out (which strikes me as more robust design
than duplicating logic everywhere a job is added to a queue to try to
prevent adding already-done jobs to begin with).
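
A runnable sketch of the pattern described above, with stand-ins that are
assumptions rather than anything from the message: `collections.deque`
plays the queue, `TransientError` is a hypothetical retryable exception,
and `process(job)` is a caller-supplied function returning new sub-jobs.
Unlike the outline above, sub-jobs are queued only after the job succeeds.

```python
from collections import deque


class TransientError(Exception):
    """Hypothetical stand-in for retryable failures (timeouts and the like)."""


def run(initial_jobs, process):
    """Run jobs until the queue is empty; return the set of finished jobs."""
    jobs = deque(initial_jobs)
    finished_jobs = set()
    while jobs:
        job = jobs.popleft()
        if job in finished_jobs:
            continue  # duplicate that was queued before the job completed
        try:
            new_jobs = process(job)
        except TransientError:
            jobs.append(job)  # try again later
        else:
            finished_jobs.add(job)
            jobs.extend(new_jobs)  # duplicates are fine; they get skipped
    return finished_jobs


# Toy "web site": each URL maps to the URLs found on that page.
LINKS = {"/": ["/a", "/b"], "/a": ["/b", "/"], "/b": ["/a"]}
failed_once = set()


def fetch(url):
    """Fail the first time "/b" is requested, to exercise the retry path."""
    if url == "/b" and url not in failed_once:
        failed_once.add(url)
        raise TransientError(url)
    return LINKS[url]


print(sorted(run(["/"], fetch)))  # ['/', '/a', '/b']
```

The queue ends up holding "/b" twice and "/" a second time, and nothing
checks for that when jobs are added; the `finished_jobs` test on the way
out is the only deduplication.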

Similarly, in other places I've used a "finished" flag on the object
instead of a set, or a dict mapping jobs to info about job status and
results.  But in all these cases, the vital info associated with a job
really has little to do with whether the job is currently sitting on a
queue.
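
The dict variant can be sketched the same way. This is an illustration
only: `run_with_results` and `measure` are made-up names, and
`process(job)` is assumed to return a `(result, new_jobs)` pair.

```python
from collections import deque


def run_with_results(initial_jobs, process):
    """Map each finished job to its result; membership means "finished"."""
    jobs = deque(initial_jobs)
    results = {}
    while jobs:
        job = jobs.popleft()
        if job in results:
            continue  # already done, whatever the queue still holds
        result, new_jobs = process(job)
        results[job] = result
        jobs.extend(new_jobs)
    return results


def measure(n):
    """Toy job: square n and schedule n - 1 while n is positive."""
    new_jobs = [n - 1] if n > 0 else []
    return n * n, new_jobs


print(run_with_results([3, 2], measure))  # {3: 9, 2: 4, 1: 1, 0: 0}
```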

If "is it currently on the queue?" _is_ a common question to ask,
perhaps someone could give a semi-realistic example?
___
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/2XDEVSO5S5ZLVZTLX43UHBOJWMXYJIIB/


[Python-Dev] Re: Should set objects maintain insertion order too?

2019-12-27 Thread Brandt Bucher
> On Dec 27, 2019, at 19:48, Tim Peters wrote:
> 
> So, ya, I've seen and implemented lots of work queues along these
> lines - but an OrderedSet would be an "attractive nuisance"
> (offhandedly appearing to solve a problem it doesn't actually
> address):
> 
>     jobs = some_kind_of_queue()
>     finished_jobs = set()
>     ...
>     while jobs:
>         job = jobs.get()
>         if job in finished_jobs:
>             continue
>         try:
>             work on the job
>             possibly adding (sub)jobs to the `jobs` queue
>         except TransientExceptions:
>             jobs.put(job)  # try again later
>         else:
>             finished_jobs.add(job)

Well, if an OrderedSet were designed to gracefully handle resizes during 
iteration, something like this may make sense:

jobs = OrderedSet(initial_jobs)
for job in jobs:
  new_jobs = process(job)
  jobs |= new_jobs
... # jobs is now a set of every job processed

A dictionary with None values comes close if you replace the union line with a 
jobs.update(new_jobs) call (and ignore resizing issues), but it breaks because 
repeated jobs are shuffled to the end of the sequence and would be processed 
again.
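
For concreteness, here is one way such a container could look. It is a
sketch only: `GrowableOrderedSet` is a hypothetical class, not an existing
or proposed stdlib type, and it supports growth but not removal. It keeps
a list for order and a set for membership, and iterates by index so that
items added during the loop are still reached.

```python
class GrowableOrderedSet:
    """Hypothetical insertion-ordered set that may grow while iterated."""

    def __init__(self, items=()):
        self._order = []    # items in first-insertion order
        self._seen = set()  # same items, for O(1) membership tests
        self.update(items)

    def update(self, items):
        """Add each item not already present, keeping first-seen order."""
        for item in items:
            if item not in self._seen:
                self._seen.add(item)
                self._order.append(item)

    def __ior__(self, items):
        self.update(items)
        return self

    def __contains__(self, item):
        return item in self._seen

    def __len__(self):
        return len(self._order)

    def __iter__(self):
        # Index-based walk: items appended mid-loop are visited later on.
        i = 0
        while i < len(self._order):
            yield self._order[i]
            i += 1


# Toy "web site": each URL maps to the URLs found on that page.
LINKS = {"/": ["/a", "/b"], "/a": ["/b", "/"], "/b": ["/a"]}

jobs = GrowableOrderedSet(["/"])
for job in jobs:
    jobs |= LINKS[job]  # stands in for process(job)

print(list(jobs))  # ['/', '/a', '/b']
```

Each job is visited exactly once because a repeated job keeps its original
position instead of being appended again. There is no retry path in this
sketch; a failed job would need separate handling.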

Brandt
___
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/SXGD5G5EY4YMXDG42AU7OSNVCUUU25DI/