Re: [Python-Dev] Fuzzing the Python standard library

2018-07-18 Thread Steve Holden
On Tue, Jul 17, 2018 at 11:44 PM, Paul G wrote:

> In many languages numeric types can't hold arbitrarily large values, and I
> for one hadn't really previously recognized that if you read in a numeric
> value with an exponent that it would be represented *exactly* in memory
> (and thus one object with a very compact representation can take up huge
> amounts of memory). It's also not *inconceivable* that under the hood
> Python would represent fractions.Fraction("1.64E664644") "lazily" in
> some fashion so that it did not consume all the memory on disk.
>

Sooner or later you are going to need the digits of the number to perform
a computation. Exactly when would you propose the deferred evaluation
should take place?

There are already occasional inquiries about the creation of such large
numbers and their unexpected effects, so they aren't completely unknown.
At the same time, this isn't exactly a mainstream "bug", as evidenced by
the fact that such issues are relatively rare.

> It seems to me that "Hey by the way the size of this thing is unbounded
> and because of exponents small strings can expand to huge objects" is a
> good tip.
>

Not an unreasonable suggestion. Perhaps you could draft a documentation
change - personally I'm not even sure where the best place for the warning
would be.
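
To make the point about exponents concrete, here is a minimal sketch. It is
not a standard library API: the helper name bounded_fraction and the
max_exponent limit are made up for illustration. It looks at the decimal
exponent before handing the text to fractions.Fraction, so that a short
string cannot expand into a huge integer.

```python
import re
from fractions import Fraction

# Hypothetical guard, not a stdlib API: the pattern only extracts the
# trailing decimal exponent; Fraction itself still validates the rest.
_EXPONENT = re.compile(r"[eE]([-+]?\d+)\s*$")


def bounded_fraction(text, max_exponent=1000):
    """Parse text with Fraction, rejecting very large decimal exponents."""
    match = _EXPONENT.search(text)
    if match and abs(int(match.group(1))) > max_exponent:
        raise ValueError("exponent too large: %r" % text)
    return Fraction(text)


# An eight-character string already yields a 301-digit numerator.
value = Fraction("1.64E300")
print(len(str(value.numerator)))
```

The limit of 1000 is arbitrary; a real caller would pick a bound that matches
the values it actually expects.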
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Const access to CPython objects outside of GIL?

2018-07-18 Thread Radim Řehůřek
Thanks for your feedback everyone. Given the overwhelmingly negative
response, we'll drop this line of investigation.

If more people bring up the same request in the future (unlikely), feel
free to reach out to us for an extra set of hands. Given the initial
poking, I still think a "reasonable subset" might be "reasonably easy";
IMHO it is more a process/maintenance/ROI question than a strictly
technical one.

On Tue, Jul 17, 2018 at 10:09 PM, Tim Peters wrote:

> Note:  the kind of people who find the GIL extremely intrusive tend
> instead to work on ways to eliminate the GIL entirely.  They typically give
> up after a few years of intense pain ;-)
>

If you mean "writing an alternative Python interpreter", that's not of any
interest. If you mean "eliminating GIL from mission-critical parts of the
code" -- we've done that many times, with good success and only moderate
pain. The current "const" question was a probe about the cost of bringing
the worlds a little closer.

Radim


Re: [Python-Dev] Const access to CPython objects outside of GIL?

2018-07-18 Thread Chris Angelico
On Wed, Jul 18, 2018 at 8:18 PM, Radim Řehůřek wrote:
> Thanks for your feedback everyone. Given the overwhelmingly negative
> response, we'll drop this line of investigation.
>
> If more people bring up the same request in the future (unlikely), feel free
> to reach out to us for some extra set of hands. Given the initial poking, I
> still think a "reasonable subset" might be "reasonably easy"; IMHO more a
> process/maintenance/ROI question than a strictly technical one.

The trouble would be defining that "reasonable subset", which would
end up having a very large number of words in it. For example,
accessing a Python list after any sort of size change could crash the
interpreter hard (as the buffer will have been reallocated). I'm
fairly sure you can safely read from a tuple so long as you retain a
ref to the tuple itself, though, so you may find that there are
options there.

Maybe, depending on your needs, the best solution might be to NOT
access Python objects at all. Instead, have an API for changing info
that is referenced outside of the GIL, and then the key info gets
grabbed in a form that doesn't require Python. That would require some
changes in the Python code (function calls rather than list
manipulation), but would be 100% guaranteed safe. But you've probably
already thought of that, so this is a case where that doesn't work :)
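
A rough sketch of the shape such an API could take, in pure Python. The
class and method names are hypothetical, and the part where native code
reads the buffer without the GIL is not shown; it would still have to keep
a reference to the bytes object alive for as long as it reads from it.

```python
import struct


class WeightTable:
    """Hypothetical container: Python code changes it only through methods."""

    def __init__(self, weights):
        self._weights = [float(w) for w in weights]
        self._snapshot = self._pack()

    def _pack(self):
        # Flat little-endian doubles: a plain C-style buffer with no
        # PyObject pointers inside.
        return struct.pack("<%dd" % len(self._weights), *self._weights)

    def set_weight(self, index, value):
        # A function call instead of direct list manipulation; every
        # change publishes a fresh immutable snapshot.
        self._weights[index] = float(value)
        self._snapshot = self._pack()

    def snapshot(self):
        """Return the bytes that native code would be handed."""
        return self._snapshot


table = WeightTable([0.5, 1.5])
old = table.snapshot()
table.set_weight(1, 2.0)
print(struct.unpack("<2d", old), struct.unpack("<2d", table.snapshot()))
```

Repacking on every change is the simplest choice, not the cheapest one; the
point is only that a reader holding an old snapshot never sees it resized
or modified.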

ChrisA


Re: [Python-Dev] Fuzzing the Python standard library

2018-07-18 Thread Brett Cannon
On Tue, 17 Jul 2018 at 15:41 Nathaniel Smith wrote:

> On Tue, Jul 17, 2018 at 9:44 AM, Jussi Judin wrote:
> > * Exceptions that are something else than the documented ones. These
> > usually indicate an internal implementation issue. For example one would
> > not expect an UnicodeDecodeError from netrc.netrc() function when the
> > documentation[3] promises netrc.NetrcParseError and there is no way to pass
> > properly sanitized file object to the netrc.netrc().
>
> My main advice would be, before mass-filing bugs make sure that you
> and the maintainers agree on what a bug is :-). For example, I can see
> the argument that invalid encodings in netrc should be reported as
> NetrcParseError, but there are also many APIs where it's normal to get
> something like a TypeError even if that's not a documented exception,
> and it's very annoying as a maintainer to suddenly get 20 bugs where
> you don't even agree that they're bugs and just have to wade through
> and close them all. Any assistance you can give with triaging and
> prioritizing the bugs is also super helpful, since volunteer
> maintainers aren't necessarily prepared for sudden influxes of issues.
>

That was my initial reaction to that first bullet point as well. If the
exception isn't at least explicitly raised then it shouldn't be considered
a documentation problem, and even then I don't know if I would expect an
explicit mention of ValueError if the docs say e.g. "only values within
this range are expected", as that implicitly suggests ValueError will be
used.


>
> In general this sounds like cool work, and help improving Python's
> quality is always welcome. Just be careful that it's actually helpful
> :-).
>

It's definitely a balancing act. :)


Re: [Python-Dev] Fuzzing the Python standard library

2018-07-18 Thread Ivan Pozdeev via Python-Dev

On 17.07.2018 19:44, Jussi Judin wrote:

Hi,

I have been fuzzing[1] various parts of Python standard library for Python 3.7 
with python-afl[2] to find out internal implementation issues that exist in the 
library. What I have been looking for are mainly following:

* Exceptions that are something else than the documented ones. These usually 
indicate an internal implementation issue. For example one would not expect an 
UnicodeDecodeError from netrc.netrc() function when the documentation[3] 
promises netrc.NetrcParseError and there is no way to pass properly sanitized 
file object to the netrc.netrc().
* Differences between values returned by C and Python versions of some 
functions. quopri module may have these.
* Unexpected performance and memory allocation issues. These can be somewhat 
controversial to fix, if at all, but at least in some cases from end-user perspective it 
can be really nasty if for example fractions.Fraction("1.64E664644") 
results in hundreds of megabytes of memory allocated and takes very long to calculate. I 
gave up waiting for that function call to finish after 5 minutes.
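
The first category above (exceptions other than the documented ones) can be
sketched with a small harness. Everything here is illustrative: classify,
ParseError and parse_port are made-up names, and the toy parser only stands
in for a library function whose documentation promises a single exception
type. It does not use python-afl.

```python
class ParseError(Exception):
    """The only exception the toy parser documents."""


def parse_port(data):
    """Toy parser: the documentation promises ParseError on bad input."""
    text = data.decode("utf-8")  # can leak UnicodeDecodeError
    try:
        return int(text)
    except ValueError:
        raise ParseError("not a port number: %r" % text) from None


def classify(target, data, documented):
    """Run target(data) and report whether any exception was documented."""
    try:
        target(data)
    except documented:
        return "documented"
    except Exception as exc:
        return "undocumented: " + type(exc).__name__
    return "ok"


for sample in (b"8080", b"http", b"\xff"):
    print(sample, classify(parse_port, sample, ParseError))
```

Note that UnicodeDecodeError is a subclass of ValueError, so a function that
documents ValueError would already cover it; the mismatch only shows up when
the documented exception is unrelated, as in this sketch.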

As this is going to result in a decent amount of bug reports (currently I only filed 
one[4], although that audio processing area has much more issues to file), I would 
like to ask your opinion on filing these bug reports. Should I report all issues 
regarding some specific module in one bug report, or try to further split them into 
more fine grained reports that may be related? These different types of errors are 
specifically noticeable in zipfile module that includes a lot of different exception 
and behavioral types on invalid data.
And in case of sndhdr module, there are multiple modules with issues (aifc, sunau, 
wave) that then show up also in sndhdr when they are used. Or are some of you willing 
to go through the crashes that pop up and help with the report filing?


I'm not from the core team, so will recite best practices from my own 
experience.


Bugs should be reported "one per root cause", i.e. 1 bug report = 1 fix.
It's permissible to report separately, especially if you're not sure
whether they are the same bug (then add a prominent link), but since this
is a volunteer project, you really should be doing any duplicate checks
_before_ reporting. Since you'll be checking existing tickets before
reporting each new one anyway, that'll automatically include _your own_
previous tickets ;-)
For the same bug in multiple places, it's better to err on the side of
fewer tickets -- this will both be less work for everyone and give a
more complete picture. If something proves to warrant a separate ticket,
it can be split off later.
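
One possible way to approximate "one per root cause" before filing is to
group crashing inputs by a crude signature. This is only a heuristic sketch
with made-up helper names: two inputs with the same exception type and
innermost frame may still have different root causes, and vice versa.

```python
import traceback
from collections import defaultdict


def crash_signature(exc):
    """Exception type plus the innermost Python frame where it surfaced."""
    frame = traceback.extract_tb(exc.__traceback__)[-1]
    return (type(exc).__name__, frame.filename, frame.lineno)


def group_crashes(target, inputs):
    """Map each signature to the list of inputs that trigger it."""
    groups = defaultdict(list)
    for data in inputs:
        try:
            target(data)
        except Exception as exc:
            groups[crash_signature(exc)].append(data)
    return dict(groups)


def toy_target(data):
    """Stand-in for a fuzzed function with two distinct failure sites."""
    if data.startswith(b"A"):
        return data[10]          # IndexError on short input
    return data.decode("ascii")  # UnicodeDecodeError on bytes >= 0x80


groups = group_crashes(toy_target, [b"A", b"AB", b"\xff", b"ok"])
for signature, samples in groups.items():
    print(signature[0], signature[2], samples)
```

Each group is then a candidate for a single ticket, with the other inputs
attached as extra reproducers.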



The code and more verbose description for this is available from 
. It works by default on some 
GNU/Linux systems only (I use Debian testing), as it relies on /dev/shm/ being 
available and uses shell scripts as wrappers that rely on various tools that may not 
be installed on all systems by default.

As a bonus, as this uses coverage based fuzzing, it also opens up the possibility of 
automatically creating a regression test suite for each of the fuzzed modules to ensure 
that the existing functionality (input files under /corpus/ directory) 
does not suddenly result in additional exceptions and that it is more easy to test 
potential bug fixes (crash inducing files under /crashes/ directory).
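
The regression-suite idea above could look roughly like this. The helper
name replay_directory is hypothetical and the directory is a temporary
stand-in for a corpus or crashes directory; this is a sketch of the idea,
not how afl or python-afl replay inputs.

```python
import pathlib
import tempfile


def replay_directory(target, directory, documented=()):
    """Return names of files whose contents raise an undocumented exception."""
    failures = []
    for path in sorted(pathlib.Path(directory).iterdir()):
        if not path.is_file():
            continue
        try:
            target(path.read_bytes())
        except documented:
            pass  # documented behaviour for invalid input
        except Exception:
            failures.append(path.name)
    return failures


with tempfile.TemporaryDirectory() as corpus:
    pathlib.Path(corpus, "valid").write_bytes(b"42")
    pathlib.Path(corpus, "crash").write_bytes(b"\xff")
    failing = replay_directory(lambda data: int(data.decode("utf-8")), corpus)
print(failing)
```

Run against a corpus directory, an empty result means existing inputs still
behave; run against a crashes directory after a fix, it lists the inputs
that still fail.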

As a downside, this uses two quite specific tools (afl, python-afl) that have 
further dependencies (Cython) inside them, I doubt the viability of integrating 
this type of testing as part of normal Python verification process. As a 
difference to libFuzzer based fuzzing that is already integrated in Python[5], 
this instruments the actual (and only the) Python code and not the actions that 
the interpreter does in the background. So this should result in better fuzzer 
coverage for Python code that is used with the downside that when C functions 
are called, they are complete black boxes to the fuzzer.

I have mainly run these fuzzer instances at most for several hours per module 
with 4 instances and stopped running no-issue modules after there have been no 
new coverage discovered after more than 10 minutes. Also I have not really 
created high quality initial input files, so I wouldn't be surprised if there 
are more issues lurking around that could be found with throwing more CPU and 
higher quality fuzzers at the problem.

[1]: https://en.wikipedia.org/wiki/Fuzzing
[2]: https://github.com/jwilk/python-afl
[3]: https://docs.python.org/3/library/netrc.html
[4]: https://bugs.python.org/issue34088
[5]: https://github.com/python/cpython/tree/3.7/Modules/_xxtestfuzz



--
Regards,
Ivan


[Python-Dev] Status of Python CIs (buildbots, Travis CI, AppVeyor): july 2018

2018-07-18 Thread Victor Stinner
Hi,

It seems my last status report on Python CIs was already a year ago!

   https://mail.python.org/pipermail/python-dev/2017-June/148511.html

Since last year, Zachary Ware (with the help of others, but I forgot
names, sorry!) migrated our buildbot server from buildbot 0.8 (Python
2.7) to buildbot 0.9 (Python 3.4). The new buildbot version has a very
different web UI:

   https://buildbot.python.org/all/#/

It took me time to get used to it, but now I prefer the new UI
especially to see the result of a single build. The page loads faster
and it's easier to access data. I also like the readable list of all
builders:

   https://buildbot.python.org/all/#/builders


The buildbot "warnings" step now contains test failures and test
errors for a quick overview of bugs. Example:

FAIL: test_threads (test.test_gdb.PyBtTests)
Re-running failed tests in verbose mode
Re-running test 'test_gdb' in verbose mode
FAIL: test_threads (test.test_gdb.PyBtTests)

I also modified libregrtest (our test runner: python3 -m test) to
display a better test summary at the end, especially when there is at
least one failure. Truncated example:
---
== Tests result: FAILURE then SUCCESS ==

378 tests OK.

10 slowest tests:
- test_multiprocessing_spawn: 1 min 57 sec
- test_concurrent_futures: 1 min 36 sec
- test_nntplib: 30 sec 275 ms
- (...)

28 tests skipped:
test_crypt (...)

1 re-run test:
test_threading

Total duration: 4 min 59 sec
Tests result: FAILURE then SUCCESS
---

"FAILURE then SUCCESS" means that at least one test failed, but then
all re-run tests succeeded. "1 re-run test: test_threading" is the
list of tests that failed previously. That's also a new feature.


Last May, we worked hard to fix many random test failures on all CIs
before the Python 3.7 final release. Today, the number of tests which
fail randomly is *very* low. Since the beginning of the year, I fixed
bugs in more than 35 test files. The most complex issues were in
multiprocessing tests: the most common random failures should now be
fixed.

Many memory and reference leaks have been fixed. I also started to fix
leaks of Windows handles:

https://github.com/python/cpython/pull/7827

I added new keys to test.pythoninfo: Py_DEBUG, C compiler version,
gdbm version, memory allocator, etc.

The test.bisect tool has been optimized to be usable on test_asyncio,
one of the tests with the most test cases and methods.


I spent a lot of time fixing each test failure, even when a test only
failed once on one specific CI on a specific pull request. I increased
many timeouts to make fragile tests more "reliable" (reduce the risk
of failures on slow buildbots). Some timeouts were just too strict for
no good reason.

Python CIs are not perfect, but random failures should now be rarer.


Mailing list for email notifications when a buildbot fails. That's my
main source to detect regressions and tests which fail randomly:
https://mail.python.org/mm3/mailman3/lists/buildbot-status.python.org/

Buildbot builders:
http://buildbot.python.org/all/#/builders

Travis CI build history:
https://travis-ci.org/python/cpython/builds

AppVeyor build history:
https://ci.appveyor.com/project/python/cpython/history

My notes on Python CIs:
http://pythondev.readthedocs.io/ci.html


Thanks Zachary Ware for maintaining our buildbot servers, thanks Pablo
Galindo Salgado who helped me to triage buildbot failures (on the
buildbot-status mailing list), thanks all other developers who helped
me to fix random test failures and make our test suite more stable!

Victor


Re: [Python-Dev] Status of Python CIs (buildbots, Travis CI, AppVeyor): july 2018

2018-07-18 Thread Barry Warsaw
On Jul 18, 2018, at 13:54, Victor Stinner wrote:

> Last May, we worked hard to fix many random test failures on all CIs
> before Python 3.7 final release.

Thank you thank you thank you Victor for work on keeping the buildbots happy!

-Barry





Re: [Python-Dev] Status of Python CIs (buildbots, Travis CI, AppVeyor): july 2018

2018-07-18 Thread Eric Snow
On Wed, Jul 18, 2018 at 3:16 PM Barry Warsaw wrote:
> On Jul 18, 2018, at 13:54, Victor Stinner wrote:
> > Last May, we worked hard to fix many random test failures on all CIs
> > before Python 3.7 final release.
>
> Thank you thank you thank you Victor for work on keeping the buildbots happy!

Yes, thank you Victor (and friends).  Your work on this makes a
concrete difference.

-eric


Re: [Python-Dev] 2to3 for python annotations

2018-07-18 Thread Philippe Fremy
On 17/07/2018 at 22:34, Guido van Rossum wrote:
> On Tue, Jul 17, 2018 at 1:17 PM, Jelle Zijlstra wrote:
>
> > 2018-07-17 12:37 GMT-07:00 Philippe Fremy:
> >
> > > Hi,
> > >
> > > While contributing to pyannotate, I became familiar enough with 2to3
> > > fixers to be able to convert Python 2 style annotations to Python 3.
> > >
> > > Is it something that would be interesting to put into python 2to3? If
> > > so I would propose a PR for this.
>
> [...]
>
> I think as an optional fixer it would be a fine contribution.
I'll work on something then.

Out of curiosity, is anybody else interested in this?

>
> Also I apologize for not yet reviewing
> https://github.com/dropbox/pyannotate/pull/74 (which I presume is yours?).
This is mine indeed. You said that it would take time to review and I
said that I would be patient, so ... I am patient, no worries.

Cheers,

Philippe


