[issue41299] Python3 threading.Event().wait time is twice as large as Python27
Ryan Hileman added the comment: I just ran into this. GetTickCount64() is a bad choice even without improving the Windows timer resolution, as every mutex wait will have 16ms of jitter. Here are some lock.acquire(timeout=0.001) times measured with time.perf_counter(): elapsed=21.215ms elapsed=30.960ms elapsed=21.686ms elapsed=30.998ms elapsed=30.794ms Here's the same lock.acquire(timeout=0.001) with CPython patched to use QueryPerformanceCounter() instead of GetTickCount64(). Notice this is less overhead than even the original post's Python 2.x times. elapsed=9.554ms elapsed=14.516ms elapsed=13.985ms elapsed=13.434ms elapsed=13.724ms Here's the QueryPerformanceCounter() test in a timeBeginPeriod(1) block: elapsed=1.135ms elapsed=1.204ms elapsed=1.189ms elapsed=1.052ms elapsed=1.052ms I'd like to submit a PR to fix the underlying issue by switching to QueryPerformanceCounter() in EnterNonRecursiveMutex(). QueryInterruptTime() is a bad candidate because it's only supported on Windows 10, and CPython still supports Windows 8. Improvements based on QueryPerformanceCounter() can be backported to at least 3.8 (3.8 dropped Windows XP support, which was the last Windows version where QueryPerformanceCounter() could fail). I checked and the only other use of GetTickCount64() seems to be in time.monotonic(). Honestly I would vote to change time.monotonic() to QueryPerformanceCounter() as well, as QueryPerformanceCounter() can no longer fail on any Windows newer than XP (which is no longer supported by Python), but that's probably a topic for a new BPO. -- nosy: +lunixbochs2 versions: -Python 3.10, Python 3.9 ___ Python tracker <https://bugs.python.org/issue41299> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue41299] Python3 threading.Event().wait time is twice as large as Python27
Change by Ryan Hileman : -- versions: +Python 3.10, Python 3.11, Python 3.9 ___ Python tracker <https://bugs.python.org/issue41299> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue44328] time.monotonic() should use QueryPerformanceCounter() on Windows
New submission from Ryan Hileman : Related to https://bugs.python.org/issue41299#msg395220 Presumably `time.monotonic()` on Windows historically used GetTickCount64() because QueryPerformanceCounter() could fail. However, that hasn't been the case since Windows XP: https://docs.microsoft.com/en-us/windows/win32/api/profileapi/nf-profileapi-queryperformancecounter > On systems that run Windows XP or later, the function will always succeed and > will thus never return zero I've run into issues with this when porting python-based applications to Windows. On other platforms, time.monotonic() was a decent precision so I used it. When I ported to Windows, I had to replace all of my time.monotonic() calls with time.perf_counter(). I would pretty much never knowingly call time.monotonic() if I knew ahead of time it could be quantized to 16ms. My opinion is that the GetTickCount64() monotonic time code in CPython should be removed entirely and only the QueryPerformanceCounter() path should be used. I also think some of the failure checks could be removed from QueryPerformanceCounter() / QueryPerformanceFrequency(), as they're documented to never fail in modern Windows and CPython has been dropping support for older versions of Windows, but that's less of a firm opinion. -- components: Library (Lib), Windows messages: 395221 nosy: lunixbochs2, paul.moore, steve.dower, tim.golden, zach.ware priority: normal severity: normal status: open title: time.monotonic() should use QueryPerformanceCounter() on Windows type: performance versions: Python 3.10, Python 3.11, Python 3.8, Python 3.9 ___ Python tracker <https://bugs.python.org/issue44328> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue41299] Python3 threading.Event().wait time is twice as large as Python27
Change by Ryan Hileman : -- keywords: +patch pull_requests: +25157 stage: -> patch review pull_request: https://github.com/python/cpython/pull/26568 ___ Python tracker <https://bugs.python.org/issue41299> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue41299] Python3 threading.Event().wait time is twice as large as Python27
Ryan Hileman added the comment: Ok, I filed a PR for this. I used pytime's interface to avoid duplicating the QueryPerformanceFrequency() code. I found a StackOverflow answer that says QueryPerformance functions will only fail if you pass in an unaligned pointer: https://stackoverflow.com/a/27258700 Per that, I used Py_FatalError to catch this case, as there is probably something wrong with the compiler at that point, and the other recovery options would be likely to result in incorrect program behavior (e.g. dead lock). -- ___ Python tracker <https://bugs.python.org/issue41299> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue44328] time.monotonic() should use a different clock source on Windows
Ryan Hileman added the comment: I found these two references: - https://stackoverflow.com/questions/35601880/windows-timing-drift-of-performancecounter-c - https://bugs.python.org/issue10278#msg143209 Which suggest QueryPerformanceCounter() may be bad because it can drift. However, these posts are fairly old and the StackOverflow post also says the drift is small on newer hardware / Windows. Microsoft's current stance is that QueryPerformanceCounter() is good: https://docs.microsoft.com/en-us/windows/win32/sysinfo/acquiring-high-resolution-time-stamps > Guidance for acquiring time stamps > Windows has and will continue to invest in providing a reliable and efficient > performance counter. When you need time stamps with a resolution of 1 > microsecond or better and you don't need the time stamps to be synchronized > to an external time reference, choose QueryPerformanceCounter I looked into how a few other languages provide monotonic time on Windows: Golang seems to read the interrupt time (presumably equivalent to QueryInterruptTime) directly by address. https://github.com/golang/go/blob/a3868028ac8470d1ab7782614707bb90925e7fe3/src/runtime/sys_windows_amd64.s#L499 Rust uses QueryPerformanceCounter: https://github.com/rust-lang/rust/blob/38ec87c1885c62ed8c66320ad24c7e535535e4bd/library/std/src/time.rs#L91 V8 uses QueryPerformanceCounter after checking for old CPUs: https://github.com/v8/v8/blob/dc712da548c7fb433caed56af9a021d964952728/src/base/platform/time.cc#L672 Ruby uses QueryPerformanceCounter: https://github.com/ruby/ruby/blob/44cff500a0ad565952e84935bc98523c36a91b06/win32/win32.c#L4712 C# implements QueryPerformanceCounter on other platforms using CLOCK_MONOTONIC, indicating that they should be roughly equivalent: https://github.com/dotnet/runtime/blob/01b7e73cd378145264a7cb7a09365b41ed42b240/src/coreclr/pal/src/misc/time.cpp#L175 Swift originally used QueryPerformanceCounter, but switched to QueryUnbiasedInterruptTime() because they didn't want to count time the system spent asleep: https://github.com/apple/swift-corelibs-libdispatch/commit/766d64719cfdd07f97841092bec596669261a16f -- Note that none of these languages use GetTickCount64(). Swift is an interesting counter point, and I noticed QueryUnbiasedInterruptTime() is available on Windows 8 while QueryInterruptTime() is new as of Windows 10. The "Unbiased" just refers to whether it advances during sleep. I'm not actually sure whether time.monotonic() in Python counts time spent asleep, or whether that's desirable. Some kinds of timers using monotonic time should definitely freeze during sleep so they don't cause a flurry of activity on wake, but others definitely need to roughly track wall clock time, even during sleep. Perhaps the long term answer would be to introduce separate "asleep" and "awake" monotonic clocks in Python, and possibly deprecate perf_counter() if it's redundant after this (as I think it's aliased to monotonic() on non-Windows platforms anyway). -- title: time.monotonic() should use QueryPerformanceCounter() on Windows -> time.monotonic() should use a different clock source on Windows ___ Python tracker <https://bugs.python.org/issue44328> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue44328] time.monotonic() should use a different clock source on Windows
Ryan Hileman added the comment: Great information, thanks! > Windows 10 also provides QueryInterruptTimePrecise(), which is a hybrid > solution. It uses the performance counter to interpolate a timestamp between > interrupts. I'd prefer to use this for time.monotonic() instead of QPC, if > it's available via GetProcAddress() My personal vote is to use the currently most common clock source (QPC) for now for monotonic(), because it's the same across Windows versions and the most likely to produce portable monotonic timestamps between apps/languages on the same system. It's also the easiest patch, as there's already a code path for QPC. (As someone building multi-app experiences around Python, I don't want to check the Windows version to see which time base Python is using. I'd feel better about switching to QITP() if/when Python drops Windows 8 support.) A later extension of this idea (maybe behind a PEP) could be to survey the existing timers available on each platform and consider whether it's worth extending `time` to expose them all, and unify cross-platform the ones that are exposed (e.g. better formalize/document which clocks will advance while the machine is asleep on each platform). -- ___ Python tracker <https://bugs.python.org/issue44328> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue44328] time.monotonic() should use a different clock source on Windows
Ryan Hileman added the comment: I think a lot of that is based on very outdated information. It's worth reading this article: https://docs.microsoft.com/en-us/windows/win32/sysinfo/acquiring-high-resolution-time-stamps I will repeat Microsoft's current recommendation (from that article): > Windows has and will continue to invest in providing a reliable and efficient > performance counter. When you need time stamps with a resolution of 1 > microsecond or better and you don't need the time stamps to be synchronized > to an external time reference, choose QueryPerformanceCounter, > KeQueryPerformanceCounter, or KeQueryInterruptTimePrecise. When you need > UTC-synchronized time stamps with a resolution of 1 microsecond or better, > choose GetSystemTimePreciseAsFileTime or KeQuerySystemTimePrecise. (Based on that, it may also be worth replacing time.time()'s GetSystemTimeAsFileTime with GetSystemTimePreciseAsFileTime in CPython, as GetSystemTimePreciseAsFileTime is available in Windows 8 and newer) PEP 418: > It has a much higher resolution, but has lower long term precision than > GetTickCount() and timeGetTime() clocks. For example, it will drift compared > to the low precision clocks. Microsoft on drift (from the article above): > To reduce the adverse effects of this frequency offset error, recent versions > of Windows, particularly Windows 8, use multiple hardware timers to detect > the frequency offset and compensate for it to the extent possible. This > calibration process is performed when Windows is started. Modern Windows also automatically detects and works around stoppable TSC, as well as several other issues: > Some processors can vary the frequency of the TSC clock or stop the > advancement of the TSC register, which makes the TSC unsuitable for timing > purposes on these processors. These processors are said to have non-invariant > TSC registers. (Windows will automatically detect this, and select an > alternative time source for QPC). It seems like Microsoft considers QPC to be a significantly better time source now, than when PEP 418 was written. Another related conversation is whether Python can just expose all of the Windows clocks directly (through clock_gettime enums?), as that gives anyone who really wants full control over their timestamps a good escape hatch. -- ___ Python tracker <https://bugs.python.org/issue44328> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue44328] time.monotonic() should use a different clock source on Windows
Ryan Hileman added the comment: > The monotonic clock should thus be based on QueryUnbiasedInterruptTime My primary complaint here is that Windows is the only major platform with a low resolution monotonic clock. Using QueryUnbiasedInterruptTime() on older OS versions wouldn't entirely help that, so we have the same consistency issue (just on a smaller scale). I would personally need to still use time.perf_counter() instead of time.monotonic() due to this, but I'm not totally against it. > For consistency, an external deadline (e.g. for SIGINT support) should work > the same way. Have there been any issues filed about the deadline behaviors across system suspend? > which I presume includes most users of Python 3.9+ Seems like Windows 7 may need to be considered as well, as per vstinner's bpo-32592 mention? > starting with Windows 8, WaitForSingleObject() and WaitForMultipleObjects() > exclude time when the system is suspended Looks like Linux (CLOCK_MONOTONIC) and macOS (mach_absolute_time()) already don't track suspend time in time.monotonic(). I think that's enough to suggest that long-term Windows shouldn't either, but I don't know how to reconcile that with my desire for Windows not to be the only platform with low resolution monotonic time by default. > then the change to use QueryPerformanceCounter() to resolve bpo-41299 should > be reverted. The deadline should instead be computed with > QueryUnbiasedInterruptTime() I don't agree with this, as it would regress the fix. This is more of a topic for bpo-41299, but I tested QueryUnbiasedInterruptTime() and it exhibits the same 16ms jitter as GetTickCount64() (which I expected), so non-precise interrupt time can't solve this issue. I do think QueryUnbiasedInterruptTimePrecise() would be a good fit. I think making this particular timeout unbiased (which would be a new behavior) should be a lower priority than making it not jitter. > For Windows we could implement the following clocks: I think that list is great and making those enums work with clock_gettime on Windows sounds like a very clear improvement to the timing options available. Having the ability to query each clock source directly would also reduce the impact if time.monotonic() does not perfectly suit a specific application. --- I think my current positions after writing all of this are: - I would probably be in support of a 3.11+ change for time.monotonic() to use QueryUnbiasedInterruptTime() pre-Windows 10, and dynamically use QueryUnbiasedInterruptTimePrecise() on Windows 10+. Ideally the Windows clock_gettime() code lands in the same release, so users can directly pick their time source if necessary. This approach also helps my goal of making time.monotonic()'s suspend behavior more uniform across platforms. - Please don't revert bpo-41299 (especially the backports), as it does fix the issue and tracking suspend time is the same (not a regression) as the previous GetTickCount64() code. I think the lock timeouts should stick with QPC pre-Windows-10 to fix the jitter, but could use QueryUnbiasedInterruptTimePrecise() on Windows 10+ (which needs the same runtime check as the time.monotonic() change, thus could probably land in the same patch set). - I'm honestly left with more questions than I started after diving into the GetSystemTimePreciseAsFileTime() rabbit hole. I assume it's not a catastrophic issue? Maybe it's a situation where adding the clock_gettime() enums would sufficiently help anyone who cares about the exact behavior during clock modification. I don't have strong opinions about it, besides it being a shame that Windows currently has lower precision timestamps in general. Could be worth doing a survey of other languages' choices, but any further discussion can probably go to bpo-19007. -- ___ Python tracker <https://bugs.python.org/issue44328> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue41299] Python3 threading.Event().wait time is twice as large as Python27
Ryan Hileman added the comment: I agree with not throwing fatal errors, but that check is unlikely to actually be hit, and you removed the startup checks covering the underlying clocks here: https://github.com/python/cpython/commit/ae6cd7cfdab0599139002c526953d907696d9eef I think if the time source is broken, a user would probably prefer an exception or fatal error to the deadlock they will otherwise get (as time not advancing means it's impossible to timeout), so it doesn't make sense to remove the check without doing something else about it. There are three places win_perf_counter_frequency() can fail: https://github.com/python/cpython/blob/bb3e0c240bc60fe08d332ff5955d54197f79751c/Python/pytime.c#L974 I mention the QueryPerformanceFrequency error case here (stack misalignment): https://bugs.python.org/issue41299#msg395237 The other option, besides startup checks, would be to better parameterize the timer used (mentioned in bpo-44328): Prefer QueryUnbiasedInterruptTimePrecise if available (Win 10+ via GetProcAddress), then QPC, then QueryUnbiasedInterruptTime (which has the original bug wrt jitter but should never be used as QPC is unlikely to be broken). -- ___ Python tracker <https://bugs.python.org/issue41299> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue41299] Python3 threading.Event().wait time is twice as large as Python27
Ryan Hileman added the comment: Perhaps the simplest initial fix would be to move that check down to PyThread__init_thread() in the same file. I'm not sure what the cpython convention for that kind of init error is, would it just be the same Py_FatalError block or is there a better pattern? -- ___ Python tracker <https://bugs.python.org/issue41299> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue44328] time.monotonic() should use a different clock source on Windows
Ryan Hileman added the comment: > It shouldn't behave drastically different just because the user closed the > laptop lid for an hour I talked to someone who's been helping with the Go time APIs and it seems like that holds pretty well for interactive timeouts, but makes no sense for network related code. If you lost a network connection (with, say, a 30 second timeout) due to the lid being closed, you don't want to wait 30 seconds after opening the lid for the application to realize it needs to reconnect. (However there's probably no good way to design Python's locking system around both cases, so it's sufficient to say "lock timers won't advance during suspend" and make the application layer work around that on its own in the case of network code) > Try changing EnterNonRecursiveMutex() to break out of the loop in this case This does work, but unfortunately a little too well - in a single test I saw several instances where that approach returned _earlier_ than the timeout. I assume the reason for this loop is the call can get interrupted with a "needs retry" state. If so, you'd still see 16ms of jitter anytime that happens as long as it's backed by a quantized time source. -- ___ Python tracker <https://bugs.python.org/issue44328> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue41299] Python3 threading.Event().wait time is twice as large as Python27
Ryan Hileman added the comment: > Do you think that pytime.c has the bug? I don't think so. No, a misaligned stack would be an issue in the caller or compiler, not pytime.c. I have hit misaligned stack in practice, but it should be rare enough to check on init only. > In theory yes, in practice we got zero error reports. So it sounds like it > cannot happen. > I don't think that it's a good practice to start to add checks in all > functions using a clock "just in case" the clock might fail. My read is that as long as we're not confident enough to remove those checks from pytime.c, a caller should assume they're reachable. If the pytime checks need to stay, adding a Windows only pytime init check to make sure that locks won't deadlock sounds fine to me. -- ___ Python tracker <https://bugs.python.org/issue41299> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue42800] Traceback objects allow accessing frame objects without triggering audit hooks
Ryan Hileman added the comment: traceback's `tb_code` attribute also allows you to bypass the `object.__getattr__` audit event for `__code__`. Perhaps accessing a traceback object's `tb_code` and `tb_frame` should both raise an `object.__getattr__` event? -- nosy: +lunixbochs2 ___ Python tracker <https://bugs.python.org/issue42800> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue42800] Traceback objects allow accessing frame objects without triggering audit hooks
Ryan Hileman added the comment: I'm definitely not proposing to hook all of object.__getattr__, as my intuition says that would be very slow. I simply refer to "object.__getattr__" as the event name used by a couple of rare event audit hooks. This is how getting __code__ is emitted: https://github.com/python/cpython/blob/7301979b23406220510dd2c7934a21b41b647119/Objects/funcobject.c#L250 However, there's not much point in the sys._getframe and func.__code__ family of audit hooks right now as tracebacks expose the same information (and may even do so accidentally). I am personally interested in these hooks for non sandbox reasons in a production application that cares about perf, FWIW. I think this would be implemented by extending the traceback object's getters to include tb_code and tb_frame: https://github.com/python/cpython/blob/7301979b23406220510dd2c7934a21b41b647119/Python/traceback.c#L156-L159 I project it won't have any noticeable perf impact (especially if the audit hook is written in C), as most reasons to inspect a traceback object will be exceptional and not in critical paths. I'd be happy to write a proposed patch if that would help. -- ___ Python tracker <https://bugs.python.org/issue42800> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue42800] Traceback objects allow accessing frame objects without triggering audit hooks
Ryan Hileman added the comment: Oops, by tb_code I meant traceback.tb_frame.f_code. So you can get to a frame from traceback.tb_frame (without triggering audit) or sys._getframe (which has an audit hook already), and you can get to __code__ from a frame via frame.f_code (without triggering audit). Here's a patch for both frame.f_code and traceback.tb_frame: https://github.com/lunixbochs/cpython/commit/2334a00c833874b7a2427e88abc9b51315bb010c --- Benchmarks follow this section, made using the commit I linked (and the parent commit without the patch for comparison). My takeaways from playing around: 1. You probably shouldn't install a Python audit hook if you care about performance. 2. C audit hook performance impact shows up in microbenchmarking but only have a small impact on real workloads (see the traceback.format_tb benchmark at the end). 3. Performance impact of this change when you _don't_ have an audit hook installed is very small. 4. This seems to mostly impact debugging and test code. A quick check of the stdlib shows: - traceback.tb_frame usage seems to be entirely in debugger, traceback, and testing code: https://github.com/python/cpython/search?l=Python&p=3&q=tb_frame - frame.f_code primarily has similar debug use (dis, warnings, profiling, inspect): https://github.com/python/cpython/search?l=Python&p=3&q=f_code Attached (c_audit_ext.zip) is the empty C audit hook I used for the benchmarks. `python3 setup.py build_ext` builds a `c_audit` module which registers an empty audit hook on import. frame.f_code object.__getattr__ audit hook # Testing frame.f_code impact (no audit hook installed): ./python.exe -m timeit -s 'frame = sys._getframe()' -- 'frame.f_code' with patch 2334a00c833874b7a2427e88abc9b51315bb010c 2000 loops, best of 5: 19.1 nsec per loop 2000 loops, best of 5: 18.7 nsec per loop 2000 loops, best of 5: 19.1 nsec per loop without patch 2334a00c833874b7a2427e88abc9b51315bb010c 2000 loops, best of 5: 17 nsec per loop 2000 loops, best of 5: 16.7 nsec per loop 2000 loops, best of 5: 17 nsec per loop # Testing frame.f_code impact (C audit hook installed): python.exe -m timeit -s 'import c_audit; frame = sys._getframe()' -- 'frame.f_code' with patch 2334a00c833874b7a2427e88abc9b51315bb010c 500 loops, best of 5: 66.1 nsec per loop 500 loops, best of 5: 66.1 nsec per loop 500 loops, best of 5: 66.5 nsec per loop without patch 2334a00c833874b7a2427e88abc9b51315bb010c 2000 loops, best of 5: 16.9 nsec per loop 2000 loops, best of 5: 16.9 nsec per loop 2000 loops, best of 5: 16.8 nsec per loop # Testing frame.f_code impact (pure Python audit hook installed): ./python.exe -m timeit -s 'frame = sys._getframe(); sys.addaudithook(lambda *a: None)' -- 'frame.f_code' with patch 2334a00c833874b7a2427e88abc9b51315bb010c 50 loops, best of 5: 1.02 usec per loop 50 loops, best of 5: 1.04 usec per loop 50 loops, best of 5: 1.02 usec per loop without patch 2334a00c833874b7a2427e88abc9b51315bb010c 2000 loops, best of 5: 16.8 nsec per loop 2000 loops, best of 5: 17.1 nsec per loop 2000 loops, best of 5: 16.8 nsec per loop tb.tb_frame object.__getattr__ audit hook # Testing tb.tb_frame impact (no audit hook installed) ./python.exe -m timeit -s "$(echo -e "try: a\nexcept Exception as e: tb = e.__traceback__")" -- 'tb.tb_frame' with patch 2334a00c833874b7a2427e88abc9b51315bb010c 2000 loops, best of 5: 19.2 nsec per loop 2000 loops, best of 5: 18.9 nsec per loop 2000 loops, best of 5: 18.9 nsec per loop without patch 2334a00c833874b7a2427e88abc9b51315bb010c 2000 loops, best of 5: 17 nsec per loop 2000 loops, best of 5: 16.7 nsec per loop 2000 loops, best of 5: 16.8 nsec per loop # Testing tb.tb_frame impact (C audit hook installed) ./python.exe -m timeit -s "$(echo -e "import c_audit\ntry: a\nexcept Exception as e: tb = e.__traceback__")" -- 'tb.tb_frame' with patch 2334a00c833874b7a2427e88abc9b51315bb010c 500 loops, best of 5: 64.8 nsec per loop 500 loops, best of 5: 64.8 nsec per loop 500 loops, best of 5: 64.8 nsec per loop without patch 2334a00c833874b7a2427e88abc9b51315bb010c 2000 loops, best of 5: 16.7 nsec per loop 2000 loops, best of 5: 16.9 nsec per loop 2000 loops, best of 5: 16.9 nsec per loop # Testing tb.tb_frame impact (pure Python audit hook installed) ./python.exe -m timeit -s "$(echo -e "sys.addaudithook(lambda *a: None)\ntry: a\nexcept Exception as e: tb = e.__traceback__")" -- 'tb.tb_frame' with patch 2334a00c833874b7a2427e88abc9b51315bb010c 50 loops, best of 5: 1.04 usec per loop
[issue42800] Traceback objects allow accessing frame objects without triggering audit hooks
Change by Ryan Hileman : -- keywords: +patch pull_requests: +23010 stage: -> patch review pull_request: https://github.com/python/cpython/pull/24182 ___ Python tracker <https://bugs.python.org/issue42800> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue42800] Traceback objects allow accessing frame objects without triggering audit hooks
Ryan Hileman added the comment: PR submitted, waiting on CLA process. I added documentation at the field sites, but the audit event table generation does not handle attributes or object.__getattr__ very well at all, so I'm not updating the audit table for now. The `.. audit-event:: object.__getattr__ obj,name frame-objects` sphinx directive right now just inserts a canned string """Raises an :ref:`auditing event ` object.__getattr__ with arguments obj,name.""", which would need additional boilerplate to describe these attributes properly. It also only adds a footnote style link to the audit table under __getattr__, and even moves object.__getattribute__ from the first [1] link position to a later number which is IMO is more confusing than not even linking them. I think to make attributes look good in the table we would need a special sphinx directive for audited object.__getattr__ attributes, for example by modifying the template generator to fit each attribute on its own line under object.__getattr__ in the table. For now I did not use the audit-event sphinx directive and manually inserted strings like this near the original attribute description in the docs: """Accessing ``f_code`` raises an :ref:`auditing event ` ``object.__getattr__`` with arguments ``obj`` and ``"f_code"``.""" I think audit table improvements should be handled in a separate issue, and by someone more familiar with that part of the doc generator, as cleaning it up looks like maybe a bigger scope than the current contribution. -- ___ Python tracker <https://bugs.python.org/issue42800> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue42800] Traceback objects allow accessing frame objects without triggering audit hooks
Ryan Hileman added the comment: I just found out that generator object variants have their own code attributes. I investigated the stdlib usage and it seems to be for debug / dis only, so adding these attributes shouldn't impact performance. I updated the PR to now cover the following attributes: PyTracebackObject.tb_frame PyFrameObject.f_code PyGenObject.gi_code PyCoroObject.cr_code PyAsyncGenObject.ag_code I have also attached a `check_hooks.py` file which allows for quick visual inspection that all of the hooks are working (It prints each attribute name, then accesses it. Expected output is an AUDIT line after each attribute printed.) -- Added file: https://bugs.python.org/file49755/check_hooks.py ___ Python tracker <https://bugs.python.org/issue42800> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue42800] Traceback objects allow accessing frame objects without triggering audit hooks
Ryan Hileman added the comment: My personal motivation is not to unilaterally prevent access to globals, but to close a simpler gap in the audit system that affects a currently deployed high performance production system (which is not trying to be a sandbox). I am also already using a C audit hook for my purposes. If you are referencing vstinner's first message, please remember to read their follow up https://bugs.python.org/msg384988 where they seem to have changed their mind in support of the patch. The audit attributes I'm chasing here are fairly small in scope, and overwhelmingly only used in debug code. I believe adding them is in the spirit of the original PEP. I have also done extensive testing and CPython C and stdlib code analysis as part of this effort. If you agree with the original PEP authors that __code__ and sys._getframe() are worth auditing, then I believe this is a natural extension of that concept. My patch improves upon the PEP by increasing the audit coverage to every way I can see of getting a frame and code object from basic CPython types. This is a simple patch with clear performance metrics. I don't see any reason to expand the scope of this in the future unless CPython adds another basic object type along the same lines (e.g. a new async function type, a new traceback type, or a new frame type). -- ___ Python tracker <https://bugs.python.org/issue42800> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue42800] Traceback objects allow accessing frame objects without triggering audit hooks
Ryan Hileman added the comment: My understanding as per the outline in PEP 551 as well as PEP 578, is that the audit system is meant primarily to observe the behavior of code rather than to have good sandbox coverage / directly prevent behavior. I am using audit hooks to observe the behavior of third party Python, and I identified an indicator of shady behavior which includes code and frame object access (which includes sys._getframe(), and __code__, both of which are part of the original PEP 578). I looked into it further and realized the CPython's auditing for those attributes/objects is superficial. I understand that auditing isn't perfect, and I'm not trying to change that. This patch just seems to me like a really basic and obvious extension of the existing __code__ and getframe audit points. I ask that if your main hesitation is the impact of future audit hooks, we use this opportunity to establish a basic written precedent we can reference in the future about which kind of audit hook modifications are likely to be accepted without, say, another PEP. One possible set of criteria: - The added hooks should be justified as functionally identical to something the existing PEP(s) suggested. - Performance should be measured and it should have very little impact on stdlib or general code. - The requester should be expected to justify the change, e.g. how it closes an obvious gap in an existing PEP 578 hook. And my answers for those criteria: - These are functionally equivalent to the existing PEP 578 hooks for sys._getframe() and function.__code__ - they operate on similar types of objects and are used for accessing the exact same information. - Performance impact here appears to be only for debugging code, and performance impact on debugging code is infinitesimal when no audit hook is active. - I am auditing code for trivial usage of Python frames and code objects, and I can't do that sufficiently with the existing hooks (especially so now that I'm publicly documenting this gap). If the primary complaint is maintenance burden, would it be preferable to add an attribute audit flag to PyMemberDef instead of using one-off PyGetSetDef functions? e.g.: static PyMemberDef frame_memberlist[] = { {"f_code", T_OBJECT, OFF(f_code), READONLY|AUDIT_ACCESS}, } That would definitely simplify the implementation. If these additions aren't worth it, I would almost recommend removing or deprecating the existing __code__ and sys._getframe() audit hooks instead, as I find them to be not very useful without this patch. -- ___ Python tracker <https://bugs.python.org/issue42800> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue42800] Traceback objects allow accessing frame objects without triggering audit hooks
Ryan Hileman added the comment: How's this for maintainable? https://github.com/lunixbochs/cpython/commit/2bf1cc93d19a49cbed09b45f7dbb00212229f0a1 -- ___ Python tracker <https://bugs.python.org/issue42800> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue42800] Traceback objects allow accessing frame objects without triggering audit hooks
Ryan Hileman added the comment: I agree that READ_RESTRICTED would work, and I'm strongly in support of refactoring my patch around that kind of flag, as it simplifies it quite a bit and the if statement is already there. However, using the seemingly legacy RESTRICTED flag names for audit is confusing in my opinion: - The audit subsystem does something entirely different from the long deprecated "Restricted execution" feature (removed in 3.0?) - Nothing in the stdlib uses RESTRICTED that I can see. - The documentation for RESTRICTED flags (Doc/extending/newtypes.rst) doesn't say anything about the audit system for READ_RESTRICTED, and talks about restricted mode as though it still exists. - RESTRICTED only supports __getattr__ (PY_WRITE_RESTRICTED does nothing at all, and there is no delattr equivalent). This doesn't actually matter for this patch, it's just confusing in the context of audit, as there are `object.__setattr__` and `object.__delattr__` audit points but no corresponding flags. I think it could make sense to: 1. Alias READ_RESTRICTED to a new READ_AUDIT flag and use the latter instead, as it is more clear. 2. Update the newtype docs to mention READ_AUDIT and remove documentation for the the unused RESTRICTED flags. 3. Deprecate the non-functional RESTRICTED flags if that's possible? 4. Only cross the setattr/delattr audit flag bridge if a future refactor calls for it. -- ___ Python tracker <https://bugs.python.org/issue42800> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue42800] Traceback objects allow accessing frame objects without triggering audit hooks
Ryan Hileman added the comment: Just updated the PR with another much simpler attempt, using a new READ_AUDIT flag (aliased to READ_RESTRICTED, and newtypes documentation updated). I re-ran timings for the new build, and in all cases they match or slightly beat my previous reported timings. -- ___ Python tracker <https://bugs.python.org/issue42800> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue42800] Traceback objects allow accessing frame objects without triggering audit hooks
Ryan Hileman added the comment: > Sounds good to me. We can deprecate RESTRICTED with no intention to remove it, since it's documented. > Do you want to prepare a PR for this? In case you missed it, the attached PR 24182 as of commit d3e998b is based on the steps I listed - I moved all of the proposed audited properties over to a new AUDIT_READ flag that is much simpler. -- ___ Python tracker <https://bugs.python.org/issue42800> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com