[issue41299] Python3 threading.Event().wait time is twice as large as Python27

2021-06-06 Thread Ryan Hileman


Ryan Hileman  added the comment:

I just ran into this. GetTickCount64() is a bad choice even without improving 
the Windows timer resolution, as it gives every mutex wait up to 16 ms of 
jitter. Here are some lock.acquire(timeout=0.001) times measured with 
time.perf_counter():

elapsed=21.215ms
elapsed=30.960ms
elapsed=21.686ms
elapsed=30.998ms
elapsed=30.794ms

Here's the same lock.acquire(timeout=0.001) with CPython patched to use 
QueryPerformanceCounter() instead of GetTickCount64(). Notice this is less 
overhead than even the original post's Python 2.x times.

elapsed=9.554ms
elapsed=14.516ms
elapsed=13.985ms
elapsed=13.434ms
elapsed=13.724ms

Here's the QueryPerformanceCounter() test in a timeBeginPeriod(1) block:

elapsed=1.135ms
elapsed=1.204ms
elapsed=1.189ms
elapsed=1.052ms
elapsed=1.052ms
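
For reference, here's a minimal sketch of the kind of measurement loop behind 
the numbers above (an illustration, not the exact script; timeBeginPeriod() and 
timeEndPeriod() live in winmm.dll):

    import ctypes
    import threading
    import time

    def measure(samples=5, timeout=0.001):
        lock = threading.Lock()
        lock.acquire()  # keep the lock held so every timed acquire below times out
        for _ in range(samples):
            start = time.perf_counter()
            lock.acquire(timeout=timeout)
            print("elapsed=%.3fms" % ((time.perf_counter() - start) * 1000))

    measure()  # default timer resolution

    winmm = ctypes.WinDLL("winmm")
    winmm.timeBeginPeriod(1)  # request 1 ms timer resolution
    try:
        measure()
    finally:
        winmm.timeEndPeriod(1)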

I'd like to submit a PR to fix the underlying issue by switching to 
QueryPerformanceCounter() in EnterNonRecursiveMutex().

QueryInterruptTime() is a bad candidate because it's only supported on Windows 
10, and CPython still supports Windows 8. Improvements based on 
QueryPerformanceCounter() can be backported to at least 3.8 (3.8 dropped 
Windows XP support, which was the last Windows version where 
QueryPerformanceCounter() could fail).

I checked and the only other use of GetTickCount64() seems to be in 
time.monotonic(). Honestly I would vote to change time.monotonic() to 
QueryPerformanceCounter() as well, as QueryPerformanceCounter() can no longer 
fail on any Windows newer than XP (which is no longer supported by Python), but 
that's probably a topic for a new BPO.

--
nosy: +lunixbochs2
versions:  -Python 3.10, Python 3.9


[issue41299] Python3 threading.Event().wait time is twice as large as Python27

2021-06-06 Thread Ryan Hileman


Change by Ryan Hileman :


--
versions: +Python 3.10, Python 3.11, Python 3.9


[issue44328] time.monotonic() should use QueryPerformanceCounter() on Windows

2021-06-06 Thread Ryan Hileman


New submission from Ryan Hileman :

Related to https://bugs.python.org/issue41299#msg395220

Presumably `time.monotonic()` on Windows historically used GetTickCount64() 
because QueryPerformanceCounter() could fail. However, that hasn't been the 
case since Windows XP: 
https://docs.microsoft.com/en-us/windows/win32/api/profileapi/nf-profileapi-queryperformancecounter

> On systems that run Windows XP or later, the function will always succeed and 
> will thus never return zero

I've run into issues with this when porting Python-based applications to 
Windows. On other platforms, time.monotonic() had decent precision, so I used 
it. When I ported to Windows, I had to replace all of my time.monotonic() calls 
with time.perf_counter(). I would essentially never call time.monotonic() if I 
knew ahead of time it could be quantized to 16 ms.
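
For what it's worth, the quantization is easy to observe. A quick sketch 
(assuming a build where time.monotonic() is still backed by GetTickCount64()):

    import time

    def min_step(clock, samples=200_000):
        # Smallest nonzero increment the clock reports across back-to-back reads.
        last = clock()
        best = float("inf")
        for _ in range(samples):
            now = clock()
            if now != last:
                best = min(best, now - last)
                last = now
        return best

    print("monotonic    min step: %.6f s" % min_step(time.monotonic))
    print("perf_counter min step: %.6f s" % min_step(time.perf_counter))

On such a build the first number comes out around the 15-16 ms tick, while the 
perf_counter() steps are well under a microsecond.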

My opinion is that the GetTickCount64() monotonic time code in CPython should 
be removed entirely and only the QueryPerformanceCounter() path should be used.

I also think some of the failure checks could be removed from 
QueryPerformanceCounter() / QueryPerformanceFrequency(), as they're documented 
to never fail in modern Windows and CPython has been dropping support for older 
versions of Windows, but that's less of a firm opinion.

--
components: Library (Lib), Windows
messages: 395221
nosy: lunixbochs2, paul.moore, steve.dower, tim.golden, zach.ware
priority: normal
severity: normal
status: open
title: time.monotonic() should use QueryPerformanceCounter() on Windows
type: performance
versions: Python 3.10, Python 3.11, Python 3.8, Python 3.9


[issue41299] Python3 threading.Event().wait time is twice as large as Python27

2021-06-06 Thread Ryan Hileman


Change by Ryan Hileman :


--
keywords: +patch
pull_requests: +25157
stage:  -> patch review
pull_request: https://github.com/python/cpython/pull/26568


[issue41299] Python3 threading.Event().wait time is twice as large as Python27

2021-06-06 Thread Ryan Hileman


Ryan Hileman  added the comment:

Ok, I filed a PR for this. I used pytime's interface to avoid duplicating the 
QueryPerformanceFrequency() code.

I found a StackOverflow answer that says QueryPerformance functions will only 
fail if you pass in an unaligned pointer: https://stackoverflow.com/a/27258700

Per that, I used Py_FatalError to catch this case, as there is probably 
something wrong with the compiler at that point, and the other recovery options 
would be likely to result in incorrect program behavior (e.g. deadlock).

--


[issue44328] time.monotonic() should use a different clock source on Windows

2021-06-06 Thread Ryan Hileman


Ryan Hileman  added the comment:

I found these two references:
- 
https://stackoverflow.com/questions/35601880/windows-timing-drift-of-performancecounter-c
- https://bugs.python.org/issue10278#msg143209

These suggest QueryPerformanceCounter() may be problematic because it can 
drift. However, those posts are fairly old, and the StackOverflow post also 
says the drift is small on newer hardware / Windows.

Microsoft's current stance is that QueryPerformanceCounter() is good: 
https://docs.microsoft.com/en-us/windows/win32/sysinfo/acquiring-high-resolution-time-stamps

> Guidance for acquiring time stamps
> Windows has and will continue to invest in providing a reliable and efficient 
> performance counter. When you need time stamps with a resolution of 1 
> microsecond or better and you don't need the time stamps to be synchronized 
> to an external time reference, choose QueryPerformanceCounter

I looked into how a few other languages provide monotonic time on Windows:

Golang seems to read the interrupt time (presumably equivalent to 
QueryInterruptTime) directly by address. 
https://github.com/golang/go/blob/a3868028ac8470d1ab7782614707bb90925e7fe3/src/runtime/sys_windows_amd64.s#L499

Rust uses QueryPerformanceCounter: 
https://github.com/rust-lang/rust/blob/38ec87c1885c62ed8c66320ad24c7e535535e4bd/library/std/src/time.rs#L91

V8 uses QueryPerformanceCounter after checking for old CPUs: 
https://github.com/v8/v8/blob/dc712da548c7fb433caed56af9a021d964952728/src/base/platform/time.cc#L672

Ruby uses QueryPerformanceCounter: 
https://github.com/ruby/ruby/blob/44cff500a0ad565952e84935bc98523c36a91b06/win32/win32.c#L4712

C# implements QueryPerformanceCounter on other platforms using CLOCK_MONOTONIC, 
indicating that they should be roughly equivalent: 
https://github.com/dotnet/runtime/blob/01b7e73cd378145264a7cb7a09365b41ed42b240/src/coreclr/pal/src/misc/time.cpp#L175

Swift originally used QueryPerformanceCounter, but switched to 
QueryUnbiasedInterruptTime() because they didn't want to count time the system 
spent asleep: 
https://github.com/apple/swift-corelibs-libdispatch/commit/766d64719cfdd07f97841092bec596669261a16f

--

Note that none of these languages use GetTickCount64(). Swift is an interesting 
counterpoint, and I noticed QueryUnbiasedInterruptTime() is available on 
Windows 8 while QueryInterruptTime() is new as of Windows 10. The "Unbiased" 
just refers to whether it advances during sleep.

I'm not actually sure whether time.monotonic() in Python counts time spent 
asleep, or whether that's desirable. Some kinds of timers using monotonic time 
should definitely freeze during sleep so they don't cause a flurry of activity 
on wake, but others definitely need to roughly track wall clock time, even 
during sleep.

Perhaps the long term answer would be to introduce separate "asleep" and 
"awake" monotonic clocks in Python, and possibly deprecate perf_counter() if 
it's redundant after this (as I think it's aliased to monotonic() on 
non-Windows platforms anyway).
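
One quick way to check which implementation backs each clock on a given 
platform and build is time.get_clock_info(); a minimal example:

    import time

    for name in ("monotonic", "perf_counter"):
        info = time.get_clock_info(name)
        print(name, "->", info.implementation, "resolution:", info.resolution)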

--
title: time.monotonic() should use QueryPerformanceCounter() on Windows -> 
time.monotonic() should use a different clock source on Windows


[issue44328] time.monotonic() should use a different clock source on Windows

2021-06-09 Thread Ryan Hileman


Ryan Hileman  added the comment:

Great information, thanks!

> Windows 10 also provides QueryInterruptTimePrecise(), which is a hybrid 
> solution. It uses the performance counter to interpolate a timestamp between 
> interrupts. I'd prefer to use this for time.monotonic() instead of QPC, if 
> it's available via GetProcAddress()

My personal vote is to use the currently most common clock source (QPC) for now 
for monotonic(), because it's the same across Windows versions and the most 
likely to produce portable monotonic timestamps between apps/languages on the 
same system. It's also the easiest patch, as there's already a code path for 
QPC.

(As someone building multi-app experiences around Python, I don't want to check 
the Windows version to see which time base Python is using. I'd feel better 
about switching to QITP() if/when Python drops Windows 8 support.)

A later extension of this idea (maybe behind a PEP) could be to survey the 
existing timers available on each platform, consider whether it's worth 
extending `time` to expose them all, and unify, across platforms, the ones that 
are exposed (e.g. better formalize/document which clocks advance while the 
machine is asleep on each platform).

--


[issue44328] time.monotonic() should use a different clock source on Windows

2021-06-12 Thread Ryan Hileman


Ryan Hileman  added the comment:

I think a lot of that is based on very outdated information. It's worth reading 
this article: 
https://docs.microsoft.com/en-us/windows/win32/sysinfo/acquiring-high-resolution-time-stamps

I will repeat Microsoft's current recommendation (from that article):

> Windows has and will continue to invest in providing a reliable and efficient 
> performance counter. When you need time stamps with a resolution of 1 
> microsecond or better and you don't need the time stamps to be synchronized 
> to an external time reference, choose QueryPerformanceCounter, 
> KeQueryPerformanceCounter, or KeQueryInterruptTimePrecise. When you need 
> UTC-synchronized time stamps with a resolution of 1 microsecond or better, 
> choose GetSystemTimePreciseAsFileTime or KeQuerySystemTimePrecise.

(Based on that, it may also be worth replacing time.time()'s 
GetSystemTimeAsFileTime with GetSystemTimePreciseAsFileTime in CPython, as 
GetSystemTimePreciseAsFileTime is available in Windows 8 and newer)
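
For anyone who wants to compare the two without patching CPython, here is a 
rough ctypes sketch (assuming Windows 8+; the 11644473600 constant is the usual 
1601-to-1970 epoch offset in seconds):

    import ctypes
    from ctypes import wintypes
    import time

    kernel32 = ctypes.WinDLL("kernel32")
    kernel32.GetSystemTimePreciseAsFileTime.restype = None

    def precise_time():
        # FILETIME is 100 ns ticks since 1601-01-01; convert to Unix-epoch seconds.
        ft = wintypes.FILETIME()
        kernel32.GetSystemTimePreciseAsFileTime(ctypes.byref(ft))
        ticks = (ft.dwHighDateTime << 32) | ft.dwLowDateTime
        return ticks / 10_000_000 - 11644473600

    print(time.time(), precise_time())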

PEP 418:

> It has a much higher resolution, but has lower long term precision than 
> GetTickCount() and timeGetTime() clocks. For example, it will drift compared 
> to the low precision clocks.

Microsoft on drift (from the article above):

> To reduce the adverse effects of this frequency offset error, recent versions 
> of Windows, particularly Windows 8, use multiple hardware timers to detect 
> the frequency offset and compensate for it to the extent possible. This 
> calibration process is performed when Windows is started.

Modern Windows also automatically detects and works around stoppable TSC, as 
well as several other issues:

> Some processors can vary the frequency of the TSC clock or stop the 
> advancement of the TSC register, which makes the TSC unsuitable for timing 
> purposes on these processors. These processors are said to have non-invariant 
> TSC registers. (Windows will automatically detect this, and select an 
> alternative time source for QPC).

It seems like Microsoft considers QPC to be a significantly better time source 
now, than when PEP 418 was written.

Another related conversation is whether Python can just expose all of the 
Windows clocks directly (through clock_gettime enums?), as that gives anyone 
who really wants full control over their timestamps a good escape hatch.

--


[issue44328] time.monotonic() should use a different clock source on Windows

2021-06-13 Thread Ryan Hileman


Ryan Hileman  added the comment:

> The monotonic clock should thus be based on QueryUnbiasedInterruptTime

My primary complaint here is that Windows is the only major platform with a low 
resolution monotonic clock. Using QueryUnbiasedInterruptTime() on older OS 
versions wouldn't entirely help that, so we have the same consistency issue 
(just on a smaller scale). I would personally need to still use 
time.perf_counter() instead of time.monotonic() due to this, but I'm not 
totally against it.

> For consistency, an external deadline (e.g. for SIGINT support) should work 
> the same way.

Have there been any issues filed about the deadline behaviors across system 
suspend?

> which I presume includes most users of Python 3.9+

Seems like Windows 7 may need to be considered as well, as per vstinner's 
bpo-32592 mention?

> starting with Windows 8, WaitForSingleObject() and WaitForMultipleObjects() 
> exclude time when the system is suspended

Looks like Linux (CLOCK_MONOTONIC) and macOS (mach_absolute_time()) already 
don't track suspend time in time.monotonic(). I think that's enough to suggest 
that long-term Windows shouldn't either, but I don't know how to reconcile that 
with my desire for Windows not to be the only platform with low resolution 
monotonic time by default.

> then the change to use QueryPerformanceCounter() to resolve bpo-41299 should 
> be reverted. The deadline should instead be computed with 
> QueryUnbiasedInterruptTime()

I don't agree with this, as it would regress the fix. This is more of a topic 
for bpo-41299, but I tested QueryUnbiasedInterruptTime() and it exhibits the 
same 16ms jitter as GetTickCount64() (which I expected), so non-precise 
interrupt time can't solve this issue. I do think 
QueryUnbiasedInterruptTimePrecise() would be a good fit. I think making this 
particular timeout unbiased (which would be a new behavior) should be a lower 
priority than making it not jitter.

> For Windows we could implement the following clocks:

I think that list is great and making those enums work with clock_gettime on 
Windows sounds like a very clear improvement to the timing options available. 
Having the ability to query each clock source directly would also reduce the 
impact if time.monotonic() does not perfectly suit a specific application.

---

I think my current positions after writing all of this are:

- I would probably be in support of a 3.11+ change for time.monotonic() to use 
QueryUnbiasedInterruptTime() pre-Windows 10, and dynamically use 
QueryUnbiasedInterruptTimePrecise() on Windows 10+. Ideally the Windows 
clock_gettime() code lands in the same release, so users can directly pick 
their time source if necessary. This approach also helps my goal of making 
time.monotonic()'s suspend behavior more uniform across platforms.

- Please don't revert bpo-41299 (especially the backports), as it does fix the 
issue and tracking suspend time is the same (not a regression) as the previous 
GetTickCount64() code. I think the lock timeouts should stick with QPC 
pre-Windows-10 to fix the jitter, but could use 
QueryUnbiasedInterruptTimePrecise() on Windows 10+ (which needs the same 
runtime check as the time.monotonic() change, thus could probably land in the 
same patch set).

- I'm honestly left with more questions than I started with after diving into 
the GetSystemTimePreciseAsFileTime() rabbit hole. I assume it's not a 
catastrophic issue? Maybe it's a situation where adding the clock_gettime() 
enums would
sufficiently help anyone who cares about the exact behavior during clock 
modification. I don't have strong opinions about it, besides it being a shame 
that Windows currently has lower precision timestamps in general. Could be 
worth doing a survey of other languages' choices, but any further discussion 
can probably go to bpo-19007.

--


[issue41299] Python3 threading.Event().wait time is twice as large as Python27

2021-06-14 Thread Ryan Hileman


Ryan Hileman  added the comment:

I agree with not throwing fatal errors, but that check is unlikely to actually 
be hit, and you removed the startup checks covering the underlying clocks here: 
https://github.com/python/cpython/commit/ae6cd7cfdab0599139002c526953d907696d9eef

I think if the time source is broken, a user would probably prefer an exception 
or fatal error to the deadlock they will otherwise get (since time not 
advancing means it's impossible to time out), so it doesn't make sense to 
remove the check without doing something else about it.

There are three places win_perf_counter_frequency() can fail: 
https://github.com/python/cpython/blob/bb3e0c240bc60fe08d332ff5955d54197f79751c/Python/pytime.c#L974

I mention the QueryPerformanceFrequency error case here (stack misalignment): 
https://bugs.python.org/issue41299#msg395237

The other option, besides startup checks, would be to better parameterize the 
timer used (mentioned in bpo-44328): prefer QueryUnbiasedInterruptTimePrecise 
if available (Windows 10+ via GetProcAddress), then QPC, then 
QueryUnbiasedInterruptTime (which has the original jitter bug, but should 
essentially never be reached, since QPC is unlikely to be broken).
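
As a rough illustration of that availability check, here's a ctypes sketch (the 
real change would be C using GetProcAddress; per MSDN, 
QueryUnbiasedInterruptTimePrecise is exported from KernelBase.dll on Windows 
10+):

    import ctypes

    def pick_win_timer():
        # Hypothetical helper: return the name of the preferred clock source.
        try:
            kernelbase = ctypes.WinDLL("kernelbase")
            # Attribute lookup does a GetProcAddress and raises AttributeError
            # if the function isn't exported (i.e. pre-Windows 10).
            kernelbase.QueryUnbiasedInterruptTimePrecise
            return "QueryUnbiasedInterruptTimePrecise"
        except (OSError, AttributeError):
            return "QueryPerformanceCounter"  # effectively always available post-XP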

--


[issue41299] Python3 threading.Event().wait time is twice as large as Python27

2021-06-14 Thread Ryan Hileman


Ryan Hileman  added the comment:

Perhaps the simplest initial fix would be to move that check down to 
PyThread__init_thread() in the same file. I'm not sure what the CPython 
convention is for that kind of init error; would it just be the same 
Py_FatalError block, or is there a better pattern?

--


[issue44328] time.monotonic() should use a different clock source on Windows

2021-06-14 Thread Ryan Hileman


Ryan Hileman  added the comment:

> It shouldn't behave drastically different just because the user closed the 
> laptop lid for an hour

I talked to someone who's been helping with the Go time APIs, and it seems like 
that holds pretty well for interactive timeouts but makes no sense for 
network-related code. If you lost a network connection (with, say, a 30 second 
timeout) due to the lid being closed, you don't want to wait 30 seconds after 
opening the lid for the application to realize it needs to reconnect. (However, 
there's probably no good way to design Python's locking system around both 
cases, so it's sufficient to say "lock timers won't advance during suspend" and 
make the application layer work around that on its own in the network case.)

> Try changing EnterNonRecursiveMutex() to break out of the loop in this case

This does work, but unfortunately a little too well: in a single test I saw 
several instances where that approach returned _earlier_ than the timeout.

I assume the reason for this loop is that the call can get interrupted with a 
"needs retry" state. If so, you'd still see 16 ms of jitter any time that 
happens, as long as it's backed by a quantized time source.
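
To illustrate the pattern (a generic sketch, not the actual 
EnterNonRecursiveMutex() code): the remaining time is re-derived from the clock 
on every retry, so any clock granularity shows up directly in the observed wait.

    import time

    def wait_with_deadline(wait_once, timeout, clock=time.monotonic):
        # Generic deadline loop: each retry recomputes the remaining time from
        # `clock`, so a clock quantized to ~16 ms adds up to that much jitter
        # whenever a wait is interrupted and retried.
        deadline = clock() + timeout
        while True:
            remaining = deadline - clock()
            if remaining <= 0:
                return False   # timed out
            if wait_once(remaining):
                return True    # acquired / signaled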

--


[issue41299] Python3 threading.Event().wait time is twice as large as Python27

2021-06-14 Thread Ryan Hileman


Ryan Hileman  added the comment:

> Do you think that pytime.c has the bug? I don't think so.

No, a misaligned stack would be an issue in the caller or compiler, not 
pytime.c. I have hit misaligned stacks in practice, but it should be rare 
enough that checking only at init is sufficient.

> In theory yes, in practice we got zero error reports. So it sounds like it 
> cannot happen.
> I don't think that it's a good practice to start to add checks in all 
> functions using a clock "just in case" the clock might fail.

My read is that as long as we're not confident enough to remove those checks 
from pytime.c, a caller should assume they're reachable. If the pytime checks 
need to stay, adding a Windows-only pytime init check to make sure that locks 
won't deadlock sounds fine to me.

--


[issue42800] Traceback objects allow accessing frame objects without triggering audit hooks

2021-01-08 Thread Ryan Hileman


Ryan Hileman  added the comment:

traceback's `tb_code` attribute also allows you to bypass the 
`object.__getattr__` audit event for `__code__`.

Perhaps accessing a traceback object's `tb_code` and `tb_frame` should both 
raise an `object.__getattr__` event?

--
nosy: +lunixbochs2


[issue42800] Traceback objects allow accessing frame objects without triggering audit hooks

2021-01-08 Thread Ryan Hileman


Ryan Hileman  added the comment:

I'm definitely not proposing to hook all of object.__getattr__, as my intuition 
says that would be very slow. I'm simply referring to "object.__getattr__" as 
the event name used by a couple of rare-event audit hooks. This is how getting 
__code__ is emitted: 
https://github.com/python/cpython/blob/7301979b23406220510dd2c7934a21b41b647119/Objects/funcobject.c#L250

However, there's not much point in the sys._getframe and func.__code__ family 
of audit hooks right now, as tracebacks expose the same information (and may 
even do so accidentally). I am personally interested in these hooks for 
non-sandbox reasons, in a production application that cares about performance, 
FWIW.

I think this would be implemented by extending the traceback object's getters 
to include tb_code and tb_frame: 
https://github.com/python/cpython/blob/7301979b23406220510dd2c7934a21b41b647119/Python/traceback.c#L156-L159

I expect it won't have any noticeable performance impact (especially if the 
audit hook is written in C), as most reasons to inspect a traceback object are 
exceptional and not in critical paths.

I'd be happy to write a proposed patch if that would help.

--


[issue42800] Traceback objects allow accessing frame objects without triggering audit hooks

2021-01-08 Thread Ryan Hileman


Ryan Hileman  added the comment:

Oops, by tb_code I meant traceback.tb_frame.f_code. So you can get to a frame 
from traceback.tb_frame (without triggering audit) or sys._getframe (which has 
an audit hook already), and you can get to __code__ from a frame via 
frame.f_code (without triggering audit).
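
A tiny repro of the gap (with an audit hook installed, neither traceback access 
below produces an event on an unpatched interpreter):

    import sys

    def hook(event, args):
        if event in ("sys._getframe", "object.__getattr__"):
            print("AUDIT:", event, args)

    sys.addaudithook(hook)

    sys._getframe()            # audited ("sys._getframe")
    try:
        raise ValueError("demo")
    except ValueError as e:
        tb = e.__traceback__
        frame = tb.tb_frame    # no event before the patch
        code = frame.f_code    # likewise no event before the patch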

Here's a patch for both frame.f_code and traceback.tb_frame:
https://github.com/lunixbochs/cpython/commit/2334a00c833874b7a2427e88abc9b51315bb010c

---

Benchmarks follow this section, made using the commit I linked (and the parent 
commit without the patch for comparison). My takeaways from playing around:

1. You probably shouldn't install a Python audit hook if you care about 
performance.
2. C audit hook performance impact shows up in microbenchmarking but only has a 
small impact on real workloads (see the traceback.format_tb benchmark at the 
end).
3. Performance impact of this change when you _don't_ have an audit hook 
installed is very small.
4. This seems to mostly impact debugging and test code. A quick check of the 
stdlib shows:
- traceback.tb_frame usage seems to be entirely in debugger, traceback, and 
testing code: https://github.com/python/cpython/search?l=Python&p=3&q=tb_frame
- frame.f_code primarily has similar debug use (dis, warnings, profiling, 
inspect): https://github.com/python/cpython/search?l=Python&p=3&q=f_code

Attached (c_audit_ext.zip) is the empty C audit hook I used for the benchmarks. 
`python3 setup.py build_ext` builds a `c_audit` module which registers an empty 
audit hook on import.


 frame.f_code object.__getattr__ audit hook

# Testing frame.f_code impact (no audit hook installed):
./python.exe -m timeit -s 'import sys; frame = sys._getframe()' -- 'frame.f_code'

with patch 2334a00c833874b7a2427e88abc9b51315bb010c
2000 loops, best of 5: 19.1 nsec per loop
2000 loops, best of 5: 18.7 nsec per loop
2000 loops, best of 5: 19.1 nsec per loop

without patch 2334a00c833874b7a2427e88abc9b51315bb010c
2000 loops, best of 5: 17 nsec per loop
2000 loops, best of 5: 16.7 nsec per loop
2000 loops, best of 5: 17 nsec per loop

# Testing frame.f_code impact (C audit hook installed):
python.exe -m timeit -s 'import sys, c_audit; frame = sys._getframe()' -- 
'frame.f_code'

with patch 2334a00c833874b7a2427e88abc9b51315bb010c
500 loops, best of 5: 66.1 nsec per loop
500 loops, best of 5: 66.1 nsec per loop
500 loops, best of 5: 66.5 nsec per loop

without patch 2334a00c833874b7a2427e88abc9b51315bb010c
2000 loops, best of 5: 16.9 nsec per loop
2000 loops, best of 5: 16.9 nsec per loop
2000 loops, best of 5: 16.8 nsec per loop

# Testing frame.f_code impact (pure Python audit hook installed):
./python.exe -m timeit -s 'import sys; frame = sys._getframe(); 
sys.addaudithook(lambda *a: None)' -- 'frame.f_code'

with patch 2334a00c833874b7a2427e88abc9b51315bb010c
50 loops, best of 5: 1.02 usec per loop
50 loops, best of 5: 1.04 usec per loop
50 loops, best of 5: 1.02 usec per loop

without patch 2334a00c833874b7a2427e88abc9b51315bb010c
2000 loops, best of 5: 16.8 nsec per loop
2000 loops, best of 5: 17.1 nsec per loop
2000 loops, best of 5: 16.8 nsec per loop


 tb.tb_frame object.__getattr__ audit hook

# Testing tb.tb_frame impact (no audit hook installed)
./python.exe -m timeit -s "$(echo -e "try: a\nexcept Exception as e: tb = 
e.__traceback__")" -- 'tb.tb_frame'

with patch 2334a00c833874b7a2427e88abc9b51315bb010c
2000 loops, best of 5: 19.2 nsec per loop
2000 loops, best of 5: 18.9 nsec per loop
2000 loops, best of 5: 18.9 nsec per loop

without patch 2334a00c833874b7a2427e88abc9b51315bb010c
2000 loops, best of 5: 17 nsec per loop
2000 loops, best of 5: 16.7 nsec per loop
2000 loops, best of 5: 16.8 nsec per loop

# Testing tb.tb_frame impact (C audit hook installed)
./python.exe -m timeit -s "$(echo -e "import c_audit\ntry: a\nexcept Exception 
as e: tb = e.__traceback__")" -- 'tb.tb_frame'

with patch 2334a00c833874b7a2427e88abc9b51315bb010c
500 loops, best of 5: 64.8 nsec per loop
500 loops, best of 5: 64.8 nsec per loop
500 loops, best of 5: 64.8 nsec per loop

without patch 2334a00c833874b7a2427e88abc9b51315bb010c
2000 loops, best of 5: 16.7 nsec per loop
2000 loops, best of 5: 16.9 nsec per loop
2000 loops, best of 5: 16.9 nsec per loop

# Testing tb.tb_frame impact (pure Python audit hook installed)
./python.exe -m timeit -s "$(echo -e "sys.addaudithook(lambda *a: None)\ntry: 
a\nexcept Exception as e: tb = e.__traceback__")" -- 'tb.tb_frame'

with patch 2334a00c833874b7a2427e88abc9b51315bb010c
50 loops, best of 5: 1.04 usec per loop

[issue42800] Traceback objects allow accessing frame objects without triggering audit hooks

2021-01-09 Thread Ryan Hileman


Change by Ryan Hileman :


--
keywords: +patch
pull_requests: +23010
stage:  -> patch review
pull_request: https://github.com/python/cpython/pull/24182


[issue42800] Traceback objects allow accessing frame objects without triggering audit hooks

2021-01-09 Thread Ryan Hileman


Ryan Hileman  added the comment:

PR submitted, waiting on CLA process.

I added documentation at the field sites, but the audit event table generation 
does not handle attributes or object.__getattr__ very well at all, so I'm not 
updating the audit table for now.

The `.. audit-event:: object.__getattr__ obj,name frame-objects` sphinx 
directive right now just inserts a canned string """Raises an :ref:`auditing 
event ` object.__getattr__ with arguments obj,name.""", which would 
need additional boilerplate to describe these attributes properly. It also only 
adds a footnote style link to the audit table under __getattr__, and even moves 
object.__getattribute__ from the first [1] link position to a later number 
which is IMO is more confusing than not even linking them.

I think to make attributes look good in the table we would need a special 
sphinx directive for audited object.__getattr__ attributes, for example by 
modifying the template generator to fit each attribute on its own line under  
object.__getattr__ in the table.

For now I did not use the audit-event sphinx directive and manually inserted 
strings like this near the original attribute description in the docs: 
"""Accessing ``f_code`` raises an :ref:`auditing event ` 
``object.__getattr__`` with arguments ``obj`` and ``"f_code"``."""

I think audit table improvements should be handled in a separate issue, and by 
someone more familiar with that part of the doc generator, as cleaning it up 
looks like maybe a bigger scope than the current contribution.

--


[issue42800] Traceback objects allow accessing frame objects without triggering audit hooks

2021-01-21 Thread Ryan Hileman


Ryan Hileman  added the comment:

I just found out that generator object variants have their own code attributes. 
I investigated the stdlib usage and it seems to be for debug / dis only, so 
adding these attributes shouldn't impact performance.

I updated the PR to now cover the following attributes:

PyTracebackObject.tb_frame
PyFrameObject.f_code
PyGenObject.gi_code
PyCoroObject.cr_code
PyAsyncGenObject.ag_code

I have also attached a `check_hooks.py` file which allows quick visual 
inspection that all of the hooks are working (it prints each attribute name, 
then accesses it; the expected output is an AUDIT line after each printed 
attribute).

--
Added file: https://bugs.python.org/file49755/check_hooks.py


[issue42800] Traceback objects allow accessing frame objects without triggering audit hooks

2021-01-21 Thread Ryan Hileman


Ryan Hileman  added the comment:

My personal motivation is not to unilaterally prevent access to globals, but to 
close a simpler gap in the audit system that affects a currently deployed high 
performance production system (which is not trying to be a sandbox). I am also 
already using a C audit hook for my purposes.

If you are referencing vstinner's first message, please remember to read their 
follow-up https://bugs.python.org/msg384988 where they seem to have changed 
their mind in support of the patch.

The audit attributes I'm chasing here are fairly small in scope, and 
overwhelmingly only used in debug code. I believe adding them is in the spirit 
of the original PEP. I have also done extensive testing and CPython C and 
stdlib code analysis as part of this effort.

If you agree with the original PEP authors that __code__ and sys._getframe() 
are worth auditing, then I believe this is a natural extension of that concept. 
My patch improves upon the PEP by increasing the audit coverage to every way I 
can see of getting a frame and code object from basic CPython types.

This is a simple patch with clear performance metrics. I don't see any reason 
to expand the scope of this in the future unless CPython adds another basic 
object type along the same lines (e.g. a new async function type, a new 
traceback type, or a new frame type).

--


[issue42800] Traceback objects allow accessing frame objects without triggering audit hooks

2021-01-21 Thread Ryan Hileman


Ryan Hileman  added the comment:

My understanding, as per the outlines in PEP 551 and PEP 578, is that the audit 
system is meant primarily to observe the behavior of code rather than to 
provide good sandbox coverage / directly prevent behavior.

I am using audit hooks to observe the behavior of third party Python, and I 
identified an indicator of shady behavior which includes code and frame object 
access (which includes sys._getframe(), and __code__, both of which are part of 
the original PEP 578).

I looked into it further and realized that CPython's auditing for those 
attributes/objects is superficial. I understand that auditing isn't perfect, 
and I'm not trying to change that. This patch just seems to me like a really 
basic and obvious extension of the existing __code__ and getframe audit points.



I ask that if your main hesitation is the impact of future audit hooks, we use 
this opportunity to establish a basic written precedent we can reference in the 
future about which kind of audit hook modifications are likely to be accepted 
without, say, another PEP.

One possible set of criteria:
 - The added hooks should be justified as functionally identical to something 
the existing PEP(s) suggested.
 - Performance should be measured and it should have very little impact on 
stdlib or general code.
 - The requester should be expected to justify the change, e.g. how it closes 
an obvious gap in an existing PEP 578 hook.

And my answers for those criteria:
 - These are functionally equivalent to the existing PEP 578 hooks for 
sys._getframe() and function.__code__ - they operate on similar types of 
objects and are used for accessing the exact same information.
 - Performance impact here appears to be only for debugging code, and 
performance impact on debugging code is infinitesimal when no audit hook is 
active.
 - I am auditing code for trivial usage of Python frames and code objects, and 
I can't do that sufficiently with the existing hooks (especially so now that 
I'm publicly documenting this gap).



If the primary complaint is maintenance burden, would it be preferable to add 
an attribute audit flag to PyMemberDef instead of using one-off PyGetSetDef 
functions? e.g.:

static PyMemberDef frame_memberlist[] = {
    {"f_code", T_OBJECT, OFF(f_code), READONLY|AUDIT_ACCESS},
};

That would definitely simplify the implementation.

If these additions aren't worth it, I would almost recommend removing or 
deprecating the existing __code__ and sys._getframe() audit hooks instead, as I 
find them to be not very useful without this patch.

--


[issue42800] Traceback objects allow accessing frame objects without triggering audit hooks

2021-01-21 Thread Ryan Hileman


Ryan Hileman  added the comment:

How's this for maintainable?

https://github.com/lunixbochs/cpython/commit/2bf1cc93d19a49cbed09b45f7dbb00212229f0a1

--


[issue42800] Traceback objects allow accessing frame objects without triggering audit hooks

2021-01-22 Thread Ryan Hileman


Ryan Hileman  added the comment:

I agree that READ_RESTRICTED would work, and I'm strongly in support of 
refactoring my patch around that kind of flag, as it simplifies it quite a bit 
and the if statement is already there.

However, using the seemingly legacy RESTRICTED flag names for audit is 
confusing in my opinion:

- The audit subsystem does something entirely different from the long-deprecated 
"Restricted execution" feature (removed in 3.0?)
- Nothing in the stdlib uses RESTRICTED that I can see.
- The documentation for RESTRICTED flags (Doc/extending/newtypes.rst) doesn't 
say anything about the audit system for READ_RESTRICTED, and talks about 
restricted mode as though it still exists.
- RESTRICTED only supports __getattr__ (PY_WRITE_RESTRICTED does nothing at 
all, and there is no delattr equivalent). This doesn't actually matter for this 
patch; it's just confusing in the context of audit, as there are 
`object.__setattr__` and `object.__delattr__` audit points but no corresponding 
flags.

I think it could make sense to:
1. Alias READ_RESTRICTED to a new READ_AUDIT flag and use the latter instead, 
as it is more clear.
2. Update the newtype docs to mention READ_AUDIT and remove documentation for 
the unused RESTRICTED flags.
3. Deprecate the non-functional RESTRICTED flags if that's possible?
4. Only cross the setattr/delattr audit flag bridge if a future refactor calls 
for it.

--


[issue42800] Traceback objects allow accessing frame objects without triggering audit hooks

2021-01-22 Thread Ryan Hileman


Ryan Hileman  added the comment:

Just updated the PR with another much simpler attempt, using a new READ_AUDIT 
flag (aliased to READ_RESTRICTED, and newtypes documentation updated).

I re-ran timings for the new build, and in all cases they match or slightly 
beat my previously reported timings.

--


[issue42800] Traceback objects allow accessing frame objects without triggering audit hooks

2021-02-22 Thread Ryan Hileman


Ryan Hileman  added the comment:

> Sounds good to me. We can deprecate RESTRICTED with no intention to remove 
> it, since it's documented.
> Do you want to prepare a PR for this?

In case you missed it, the attached PR 24182 as of commit d3e998b is based on 
the steps I listed - I moved all of the proposed audited properties over to a 
new AUDIT_READ flag that is much simpler.

--
