[issue37935] Improve performance of pathlib.scandir()

2019-08-23 Thread Shai


New submission from Shai:

I recently took a look at the source code of the pathlib module, and I 
saw something weird there:

when the module used the scandir function, it first converted its iterator into 
a list and then used it in a for loop. The list wasn't used anywhere else, so I 
think the conversion to list is just a waste of performance.

In addition, I noticed that the scandir iterator is never closed (it's not used 
in a with statement and its close method isn't called). I know that the 
iterator is closed automatically when it's garbage collected, but according to 
the docs, it's advisable to close it explicitly.

I've created a pull request that fixes these issues:
PR 15331

In the PR, I changed the code so the scandir iterator is used directly instead 
of being converted into a list and I wrapped its usage in a with statement to 
close resources properly.
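
For readers following along, the pattern described above can be sketched as 
follows. This is a minimal illustration, not the actual pathlib code or the 
contents of the PR; the helper name `subdirectory_names` is hypothetical.

```python
import os


def subdirectory_names(path):
    """Return the names of the directories directly inside `path`.

    Hypothetical helper, for illustration only.
    """
    names = []
    # Iterate over the scandir iterator directly (no intermediate list) and
    # let the with statement close it, even if the loop body raises.
    with os.scandir(path) as entries:
        for entry in entries:
            if entry.is_dir():
                names.append(entry.name)
    return names
```

The with statement relies on the scandir iterator supporting the context 
manager protocol, which it does in Python 3.6 and later.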

--
components: Library (Lib)
messages: 350354
nosy: Shai
priority: normal
pull_requests: 15142
severity: normal
status: open
title: Improve performance of pathlib.scandir()
type: performance
versions: Python 3.9

___
Python tracker 
<https://bugs.python.org/issue37935>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue37935] Improve performance of pathlib.scandir()

2019-08-26 Thread Shai


Shai added the comment:

From the docs (https://docs.python.org/3/library/os.html#os.scandir.close):

"This is called automatically when the iterator is exhausted or garbage 
collected, or when an error happens during iterating. However it is advisable 
to call it explicitly or use the with statement.".

The iterator is indeed closed properly, but the docs state that it's still 
advisable to close it explicitly, which is why I wrapped it in a with statement.

However, the more important change is that the iterator is no longer converted 
into a list, which should reduce the iterations from 2N to N, where N is the 
number of entries in the directory (N when converting to a list and another 
N when iterating over it). This should enhance the performance of the 
functions that use scandir.

--

___
Python tracker 
<https://bugs.python.org/issue37935>
___



[issue37935] Improve performance of pathlib.scandir()

2019-08-26 Thread Shai


Shai added the comment:

I'm new to contributing here. I've never done benchmarking before.

I'd appreciate it if you could provide a guide to benchmarking.
You could look at the changes I made in the pull request (PR 15331). They're 
easy to follow and I think that removing a useless call to list() should 
enhance the performance, but I'd like to have benchmarking to back this up, so 
if someone more experienced could do this or at least provide a link to a 
guide, I'd really appreciate it.
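
One possible starting point is the standard library's timeit module. The 
sketch below is only an assumption about how such a comparison could be set 
up; the function names are hypothetical and it does not measure the pathlib 
code itself, only the two iteration styles in isolation.

```python
import os
import timeit


def count_with_list(path):
    """Old style: materialize the entries in a list, then loop over it."""
    entries = list(os.scandir(path))
    count = 0
    for _entry in entries:
        count += 1
    return count


def count_direct(path):
    """New style: loop over the iterator directly and close it explicitly."""
    count = 0
    with os.scandir(path) as entries:
        for _entry in entries:
            count += 1
    return count


def compare(path, repeat=5, number=20):
    """Return the best observed time per call, in seconds, for each style."""
    results = {}
    for func in (count_with_list, count_direct):
        timings = timeit.repeat(lambda: func(path), repeat=repeat, number=number)
        # The minimum is the measurement least disturbed by other activity.
        results[func.__name__] = min(timings) / number
    return results
```

Calling `compare()` on a directory with many entries gives a rough idea of 
the difference; results will vary with the file system and OS caching, so 
several runs on different directory sizes are advisable.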

--

___
Python tracker 
<https://bugs.python.org/issue37935>
___



[issue31946] mailbox.MH.add loses status info from other formats

2017-11-04 Thread Shai Berger

New submission from Shai Berger:

In mailbox.py in the stdlib, the functions MH.add and MH.__setitem__ take a 
message object and dump it to a file in the MH folder, which is well and good. 
However, they only call self._dump_sequences() if the message was already an 
MHMessage.

Since in the MH format, status details (whether the message was read, replied 
to, or flagged) are saved in these sequences, this effectively loses this 
information.

This means that, if "folder" is an MH folder and "message" is a message of any 
class other than MHMessage, 

   folder.add(message)

loses the information, while

   folder.add(MHMessage(message))

retains it. This seems surprising and suboptimal.
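
A minimal sketch of the wrapping approach, assuming a Maildir-style source 
message. The helper name `add_preserving_status` is hypothetical, and the 
folder lives in a temporary directory created only for the demonstration.

```python
import mailbox
import os
import tempfile


def add_preserving_status(folder, message):
    """Add `message` to an MH folder, converting it to MHMessage first.

    Hypothetical helper: the conversion maps the status flags of other
    mailbox formats onto MH sequences before the message is stored.
    """
    return folder.add(mailbox.MHMessage(message))


# Build a Maildir-style message marked as seen (S), replied (R) and flagged (F).
source = mailbox.MaildirMessage()
source["Subject"] = "status demo"
source.set_payload("body\n")
source.set_flags("SRF")

with tempfile.TemporaryDirectory() as tmp:
    # create=True makes the folder together with its .mh_sequences file.
    folder = mailbox.MH(os.path.join(tmp, "inbox"), create=True)
    key = add_preserving_status(folder, source)
    # get_sequences() maps each sequence name to the keys it contains.
    sequences = folder.get_sequences()
    print(key, sequences)
```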

--
components: Library (Lib), email
messages: 305572
nosy: barry, r.david.murray, shai
priority: normal
severity: normal
status: open
title: mailbox.MH.add loses status info from other formats
type: behavior
versions: Python 3.6

___
Python tracker 
<https://bugs.python.org/issue31946>
___



[issue17108] import silently prefers package over module when both available

2013-02-02 Thread Shai Berger

New submission from Shai Berger:

Consider the following directory structure:

a/
  __init__.py
  b.py
  b/
    __init__.py

Now, in Python (I checked 2.7.3 and 3.2.3, haven't seen the issue mentioned 
anywhere so I suspect it is also in later Pythons), if you import a.b, you 
always get the package (that is, the b folder), and the module (b.py) is 
silently ignored. I tested by putting the line """print("I'm a package")""" in 
a/b/__init__.py and """print("I'm a module")""" in a/b.py.
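
The test can be reproduced with a sketch along these lines. It builds the 
layout in a temporary directory and uses a KIND marker instead of print 
calls; the package name `issue17108_demo` and the function name `which_wins` 
are hypothetical placeholders.

```python
import importlib
import os
import sys
import tempfile


def which_wins(package_name="issue17108_demo"):
    """Create <pkg>/b.py and <pkg>/b/__init__.py, then import <pkg>.b.

    Returns the KIND marker of whichever one the import system picked.
    """
    with tempfile.TemporaryDirectory() as root:
        pkg = os.path.join(root, package_name)
        os.makedirs(os.path.join(pkg, "b"))
        files = {
            os.path.join(pkg, "__init__.py"): "",
            os.path.join(pkg, "b.py"): "KIND = 'module'\n",
            os.path.join(pkg, "b", "__init__.py"): "KIND = 'package'\n",
        }
        for filename, source in files.items():
            with open(filename, "w") as f:
                f.write(source)

        sys.path.insert(0, root)
        importlib.invalidate_caches()
        try:
            imported = importlib.import_module(package_name + ".b")
            return imported.KIND
        finally:
            sys.path.remove(root)
            # Forget the demo modules so repeated calls start fresh.
            for name in (package_name, package_name + ".b"):
                sys.modules.pop(name, None)


print(which_wins())
```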

This becomes a real problem with tools which find modules dynamically, like 
test harnesses.

I'd expect that in such cases, Python should "avoid the temptation to guess", 
and raise an ImportError.

Thanks, Shai.

--
components: Interpreter Core
messages: 181225
nosy: shai
priority: normal
severity: normal
status: open
title: import silently prefers package over module when both available
type: behavior
versions: Python 2.7, Python 3.2

___
Python tracker 
<http://bugs.python.org/issue17108>
___



[issue17108] import silently prefers package over module when both available

2013-02-02 Thread Shai Berger

Shai Berger added the comment:

Thanks for the quick response.

If this isn't changing, I'd definitely want better documentation. In 
particular, the rationale behind this should be explained.

I submitted the bug because a co-worker unintentionally caused a whole suite of 
tests to be ignored.

Thanks again,
Shai.

--

___
Python tracker 
<http://bugs.python.org/issue17108>
___



[issue17108] import silently prefers package over module when both available

2013-02-03 Thread Shai Berger

Shai Berger added the comment:

Hi,

> the reason this stuff can't change [... is] backward compatibility.

Thanks, but this is still unclear to me. The fix required for code that would 
break because of the change I propose is the removal of dead code which looks 
misleadingly alive.

Is the backward-compatibility requirement really that strict?

--

___
Python tracker 
<http://bugs.python.org/issue17108>
___



[issue17108] import silently prefers package over module when both available

2013-02-03 Thread Shai Berger

Shai Berger added the comment:

Oh, sure, I was unclear. I thought you were talking about Python 3.4. 
I wasn't really expecting this to be fixed in the stable branches.

Thanks,
Shai.

--

___
Python tracker 
<http://bugs.python.org/issue17108>
___



[issue13936] datetime.time(0, 0, 0) evaluates to False despite being a valid time

2014-03-04 Thread Shai Berger

Shai Berger added the comment:

Just got bit by this.

Tim Peters said: """
It is odd, but really no odder than "zero values" of other types evaluating to 
false in Boolean contexts.
"""

I disagree. Midnight is not a "zero value", it is just a value. It does not 
have any special qualities analogous to those of 0, "", or the empty set. Time 
values cannot be added or multiplied. Midnight evaluating to false makes as much 
sense as date(1,1,1) -- the minimal valid date value -- evaluating to false 
(and it doesn't).

It makes perfect sense for timedelta(0) to evaluate to false, and it does. time 
is different.
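
To make the comparison concrete, here is a small sketch. The helper 
`has_time` is hypothetical and shows one way to avoid depending on the 
truthiness of a time value; the result printed for midnight depends on the 
Python version, so no output is assumed for that line.

```python
from datetime import date, time, timedelta


def has_time(value):
    """Return True if `value` is an actual time, midnight included.

    Hypothetical helper: compares against None explicitly instead of
    relying on the truthiness of the time object.
    """
    return value is not None


# timedelta(0) is a genuine zero value and is false in a Boolean context.
print(bool(timedelta(0)))
# date(1, 1, 1) is the minimal valid date, yet it is true.
print(bool(date(1, 1, 1)))
# Midnight: false on the interpreters discussed in this issue; the result
# depends on the Python version in use.
print(bool(time(0, 0, 0)))
# The explicit check treats midnight like any other time.
print(has_time(time(0, 0, 0)), has_time(None))
```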

Also, while I appreciate this will never be fixed for Python2, the same 
behavior exists in Python3, where there may still be room for improvement.

I second Danilo Bargen's request. Please reopen.

--
nosy: +shai

___
Python tracker 
<http://bugs.python.org/issue13936>
___



[issue27286] str object got multiple values for keyword argument

2016-12-01 Thread Shai Berger

Shai Berger added the comment:

Following the last comment, and just as clarification for anyone else running 
into this and thinking like me: The bumped code is not included in v3.5.2, and 
v3.5.3 hasn't been released yet. Should it be undone?

No, because the bump which was encountered by John Ehresman on Debian Testing 
has also made it into Ubuntu 16.04LTS. Undoing it, at this point, is liable to 
bring even worse breakage than the original change caused.

http://changelogs.ubuntu.com/changelogs/pool/main/p/python3.5/python3.5_3.5.2-2ubuntu0~16.04.1/changelog

--
nosy: +shai

___
Python tracker 
<http://bugs.python.org/issue27286>
___



[issue18009] os.write.__doc__ is misleading

2013-05-18 Thread Shai Berger

New submission from Shai Berger:

At least on posix systems, os.write says it takes a string, but in fact it 
barfs on strings -- it needs bytes.

$ python
Python 3.3.1 (default, May  6 2013, 16:18:33) 
[GCC 4.7.2] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import os
>>> print(os.write.__doc__)
write(fd, string) -> byteswritten

Write a string to a file descriptor.
>>> os.write(1, "hello")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'str' does not support the buffer interface
>>> os.write(1, b"hello")
hello5
>>>
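
The same behavior can be checked without writing to the terminal. The sketch 
below uses a pipe, and the helper name `write_text` is hypothetical; it 
simply encodes the string before handing it to os.write.

```python
import os


def write_text(fd, text, encoding="utf-8"):
    """Encode `text` and write it to the file descriptor `fd`.

    Hypothetical helper; returns the number of bytes written by this call.
    """
    return os.write(fd, text.encode(encoding))


read_end, write_end = os.pipe()
try:
    try:
        os.write(write_end, "hello")  # a str argument is rejected
    except TypeError as exc:
        print("TypeError:", exc)
    written = write_text(write_end, "hello")
    data = os.read(read_end, written)
    print(written, data)
finally:
    os.close(read_end)
    os.close(write_end)
```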

--
messages: 189535
nosy: shai
priority: normal
severity: normal
status: open
title: os.write.__doc__ is misleading
type: behavior
versions: Python 3.3

___
Python tracker 
<http://bugs.python.org/issue18009>
___