[Python-Dev] Re: Proliferation of tstate arguments.
On 17/03/2020 7:00 pm, Steve Dower wrote: On 17Mar2020 1803, Chris Angelico wrote: On Wed, Mar 18, 2020 at 3:50 AM Mark Shannon wrote: The accessibility of a thread-local variable is a strict superset of that of a function-local variable. Therefore storing the thread state in a thread-local variable is at least as capable as passing thread-state as a parameter. And by that logic, globals are even more capable. I don't understand your point. Isn't the purpose of the tstate parameters to avoid the problem of being unable to have multiple tstates within the same OS thread? I think I've missed something here. You haven't. Separating the Python thread from the "physical" thread is indeed the point. That seems like a strawman argument; maybe I'm misunderstanding Steve. It seems to me that Steve is implying that using thread-local variables somehow prevents that separation. It does not. The point I'm making is that adding `tstate` parameters everywhere is unnecessary. Using a thread local variable is much less intrusive and is at least as capable. Cheers, Mark. ___ Python-Dev mailing list -- python-dev@python.org To unsubscribe send an email to python-dev-le...@python.org https://mail.python.org/mailman3/lists/python-dev.python.org/ Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/CLRCMGUC5LO3JOFGB2QG6DCJRTVVC3A4/ Code of Conduct: http://python.org/psf/codeofconduct/
[Python-Dev] Re: Proliferation of tstate arguments.
Chris Angelico [mailto:ros...@gmail.com]: > And by that logic, globals are even more capable. I don't understand your > point. Isn't the purpose of the tstate parameters to avoid the problem of > being unable to have multiple tstates within the same OS thread? I think I've > missed something here. The point is that because threads can't preempt themselves, this: /*1*/ { Py_something(other_tstate); } and this: /*2*/ { PyInterpreterState* old_tstate = tstate; tstate = other_state; Py_something(); tstate = old_tstate; } is effectively equivalent, provided tstate is thread-local. Both will work equally well from the hypothetical C callback that wants to use a different subinterpreter. That wouldn't be true if tstate was process-wide, because that would be a race condition, some other thread might change tstate. But if tstate is thread-local, there's no race condition. Obviously /*1*/ is cleaner code than /*2*/, but /*2*/ is perfectly sound all the same. regards, Anders ___ Python-Dev mailing list -- python-dev@python.org To unsubscribe send an email to python-dev-le...@python.org https://mail.python.org/mailman3/lists/python-dev.python.org/ Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/NAD6DEOU2EYLBKOK3QACSP3MUMJ4NTOH/ Code of Conduct: http://python.org/psf/codeofconduct/
[Python-Dev] Re: Proliferation of tstate arguments.
On Wed, 18 Mar 2020 13:35:16 + Anders Munch wrote: > Chris Angelico [mailto:ros...@gmail.com]: > > And by that logic, globals are even more capable. I don't understand your > > point. Isn't the purpose of the tstate parameters to avoid the problem of > > being unable to have multiple tstates within the same OS thread? I think > > I've > > missed something here. > > The point is that because threads can't preempt themselves, this: > > /*1*/ > { > Py_something(other_tstate); > } > > and this: > > /*2*/ > { >PyInterpreterState* old_tstate = tstate; >tstate = other_state; >Py_something(); >tstate = old_tstate; > } > > is effectively equivalent, provided tstate is thread-local. Both will work > equally well from the hypothetical C callback that wants to use a different > subinterpreter. This example is mixing up the notion of interpreter state and thread state. Here is a more realistic example: struct UserData { PyInterpreterState* interp; }; // Callback given to a third-party C library void my_c_callback(void* opaque) { struct UserData* user_data = (struct UserData*) opaque; PyGILState_STATE gstate; gstate = PyInterpreter_GILStateEnsure(user_data->interp); // ... PyInterpreter_GILStateEnsure(user_data->interp, gstate); } Of course this implies the existence of a PyInterpreter_GILState API that currently doesn't exist. In this model, the "thread-local" thread state is a *per-interpreter* thread-local. A *process-wide* thread-local thread state cannot work if we want -- in some distant future -- to be able to run several subinterpreters concurrently. In any case, the amount of disagreement and/or misunderstanding in this discussion is a strong hint that it needs a PEP to hash things out and explain them clearly, IMHO. Regards Antoine. ___ Python-Dev mailing list -- python-dev@python.org To unsubscribe send an email to python-dev-le...@python.org https://mail.python.org/mailman3/lists/python-dev.python.org/ Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/EKZJYO75QBDKZA4A2RR43PKDYEJRD7HC/ Code of Conduct: http://python.org/psf/codeofconduct/
[Python-Dev] Re: Proliferation of tstate arguments.
Antoine Pitrou [mailto:solip...@pitrou.net]: > This example is mixing up the notion of interpreter state and thread state. Sorry about that, I was making a more general point and not paying so much attention to the specific names. The general point is this: A thread-local variable can work as an implied parameter without creating synchronisation problems. You do not have to worry about painting yourself into an architectural corner by using thread-local variables, because anything you can do with an explicit parameter, you could also have done with an implied one in thread-local memory. Correctness is not at stake. Either approach will work. regards, Anders ___ Python-Dev mailing list -- python-dev@python.org To unsubscribe send an email to python-dev-le...@python.org https://mail.python.org/mailman3/lists/python-dev.python.org/ Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/CUZQ4NYBPY25M5ISJ5JT66XMW7KWP4DJ/ Code of Conduct: http://python.org/psf/codeofconduct/
[Python-Dev] Re: Proliferation of tstate arguments.
> On 17 Mar 2020, at 16:43, Mark Shannon wrote: > > > > On 17/03/2020 3:38 pm, Steve Dower wrote: >> On 17Mar2020 1447, Mark Shannon wrote: >>> On 16/03/2020 3:04 pm, Victor Stinner wrote: In short, the answer is yes. >>> >>> I said "no" then and gave reasons. AFAICT no one has faulted my reasoning. >> I said "yes" then and was also not faulted. > > I'll do that now then ;) > > The accessibility of a thread-local variable is a strict superset of that of > a function-local variable. > > Therefore storing the thread state in a thread-local variable is at least as > capable as passing thread-state as a parameter. > >>> Let me reiterate why using a thread-local variable is better than passing >>> the thread state down the C stack. >>> >>> 1. Using a thread-local variable for the thread state requires much smaller >>> changes to the code base. >> Using thread-local variables enforces a threading model on the host >> application, rather than working with the existing threading model. So >> anyone embedding is forced into *significantly* more code as a result. > > Putting a value in a function-local variable enforces stronger restrictions > than putting it in a thread-local variable. > > I am proposing that we *don't* change the API. How does that make more work > for anyone using the API? Are you saying that all interpreters can use the same thread-local-variable for tstate? Or do you need N thread-local-variables for N interpreters? Barry > >> We can (and should) maintain a public-facing API that uses TLS to pass the >> current thread state around - we have compatibility constraints. But we can >> also add private versions that take the thread state (once you've started >> trying/struggling to really embed CPython, you'll happily take a dependency >> on "private" APIs). > > Again, I am requesting that we *don't* change the API. > Not changing the API maintains backwards compatibility better than changing > it, surely. > >> If the only available API requires TLS, then you're likely to see the caller >> wrap it all up in a function that updates TLS before calling. Or >> alternatively, introduce dedicated threads for running Python snippets on, >> and all the (dead)locking that results (yes, I've done both). > > All the platforms that we support have thread-local storage. > If a platform doesn't have threads at all, then thread-local just degenerates > to a global. > >> Our goal as core CPython developers should be to sacrifice our own effort to >> reduce the effort needed by our users, not to do things that make our own >> lives easier but harm them. > > Indeed. We might want to speed Python up a bit as well :) > >>> 2. Using a thread-local variable is less error prone. When passing tstate >>> as a parameter, what happens if the tstate argument is from a different >>> thread or is NULL? Are you adding checks for those cases? >>> What are the performance implications of adding those checks? >> Undefined behaviour is totally acceptable here. We can assert in debug >> builds - developers who make use of this can test with debug builds. > > I'm not sure what your point is about undefined behaviour. > >>> 3. Using a thread-local variable is likely to be a little bit faster. >>> Passing an argument down the stack increases register pressure and spills. >>> Accessing a thread-local is slower at the point of access, but the cost is >>> incurred only when it is needed, so is cheaper overall. >> Compilers can optimise parameters/locals in ways that are far more efficient >> than they can do for anything stored outside the call stack. Especially for >> internal calls. Going through public/exported functions is a little more >> restricted in terms of optimisations, but if we identify an issue here then >> we can work on that then. > > Please skip the patronizing "how compilers work" stuff. > I know how register allocators work. > >> [OTHER POST] >>> Just to be clear, this is what I mean by a thread local variable: >>> https://godbolt.org/z/dpSo-Q >> Showing what one particular compiler generates for one particular situation >> is terrible information (I won't bother calling it "evidence"). > > The particular situation is the use of a thread-local variable, which is the > point under discussion. > > Here's the links for clang and MSVC: > > https://godbolt.org/z/YnbbqD > https://www.godbolt.ms/z/9nQEqf > One motivation is to ease the implementation of subinterpreters (PEP 554). But PEP 554 describes more than public API than the implementation. >>> >>> I don't see how this eases the implementation of subinterpreters. >>> Surely it makes it harder by causing merge conflicts. >> That's a very selfish point-of-view :) > > Why? Merge conflicts are a problem for everyone. > >> It eases it because many more operations need to know the current Python >> "thread" in order to access things that used to be globals, such as >> PyTypeObject instance
[Python-Dev] Re: Proliferation of tstate arguments.
Mark Shannon wrote: > The point I'm making is that adding `tstate` parameters everywhere is > unnecessary. Using a thread local variable is much less intrusive and is > at least as capable. Objectively speaking, what would be the specific difference(s) in behavior and performance between using a thread-local variable vs passing a tstate parameter? A more in-depth "pros vs cons" analysis of each approach might help to make the argument points more clear, as it seems some of the parties involved in the discussion are talking past each other to some degree, or at least aren't completely on the same page. It might also help to find a specific C-API function to benchmark, to show how substantial the supposed performance differences could be. Intuitively, it seems like passing around an extra parameter would add some penalty, but it's not clear to me as to how much that would *realistically* add in performance cost compared to accessing the threadstate through a thread-local variable. It seems like it depends on exactly how often `_ThreadState_GET()` would have to be called, but that's not at all obvious considering the scope of the changes. > Using a thread local variable is much less intrusive and is > at least as capable. I'm not sure that I'm clear on how the tstate parameter would be additionally intrusive, considering that the changes only affect the internal parts of the C-API. I would agree if it were to be arbitrarily added to public API since it would break backwards compatibility, but that doesn't seem to be the case here. From the perspective of being intrusive, it seems like it's adding some code churn and a potential merge conflict to resolve locally. That seems like an inconvenience to us rather than something that would be intrusive to users. Also, regarding the thread-local variable example demonstrated w/ godbolt (thanks for providing those :-) ), it seems like it would be very clear to access the *current* threadstate that was created by the *main* interpreter, but how would you go about resolving the following?: 1) A different threadstate from another interpreter and OS thread 2) A different threadstate created from the same OS thread, but from a different interpreter (as Antoine mentioned earlier) >From my understanding, those scenarios seem to be a significant concern in the context of subinterpreters. If we could also see example(s) which address those scenarios with a thread-local variable instead of a tstate parameter, I think it would allow for more objective comparison between them. Regards, Kyle Stanley On Wed, Mar 18, 2020 at 6:36 AM Mark Shannon wrote: > > > On 17/03/2020 7:00 pm, Steve Dower wrote: > > On 17Mar2020 1803, Chris Angelico wrote: > >> On Wed, Mar 18, 2020 at 3:50 AM Mark Shannon wrote: > >>> The accessibility of a thread-local variable is a strict superset of > >>> that of a function-local variable. > >>> > >>> Therefore storing the thread state in a thread-local variable is at > >>> least as capable as passing thread-state as a parameter. > >>> > >> > >> And by that logic, globals are even more capable. I don't understand > >> your point. Isn't the purpose of the tstate parameters to avoid the > >> problem of being unable to have multiple tstates within the same OS > >> thread? I think I've missed something here. > > > > You haven't. Separating the Python thread from the "physical" thread is > > indeed the point. > > That seems like a strawman argument; maybe I'm misunderstanding Steve. > It seems to me that Steve is implying that using thread-local variables > somehow prevents that separation. It does not. > > The point I'm making is that adding `tstate` parameters everywhere is > unnecessary. Using a thread local variable is much less intrusive and is > at least as capable. > > Cheers, > Mark. > ___ > Python-Dev mailing list -- python-dev@python.org > To unsubscribe send an email to python-dev-le...@python.org > https://mail.python.org/mailman3/lists/python-dev.python.org/ > Message archived at > https://mail.python.org/archives/list/python-dev@python.org/message/CLRCMGUC5LO3JOFGB2QG6DCJRTVVC3A4/ > Code of Conduct: http://python.org/psf/codeofconduct/ > ___ Python-Dev mailing list -- python-dev@python.org To unsubscribe send an email to python-dev-le...@python.org https://mail.python.org/mailman3/lists/python-dev.python.org/ Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/LI7BD63VNAUTNC4JQ7VU7UZLH7F4LEOC/ Code of Conduct: http://python.org/psf/codeofconduct/
[Python-Dev] Re: Proliferation of tstate arguments.
Antoine Pitrou wrote: > In any case, the amount of disagreement and/or misunderstanding in this > discussion is a strong hint that it needs a PEP to hash things out and > explain them clearly, IMHO. Agreed; a PEP (even if it's just informational) would go a long way in helping to clear up some misunderstandings. Perhaps something along the lines of "Supporting concurrent subinterpreters in the C-API" that provides both a high-level overview and low-level information on some of the different structures involved and any needed changes [1], such as to PyThreadState, PyInterpreterState, PyGILState_STATE, etc. Optimally, it would explain some changes made and needed changes to their public and private APIs, as well as to the struct members. I'd be happy to help with something like this (particularly in reviewing and providing feedback on any unclear parts), but my own current understanding is likely not strong enough to lead the efforts. In fact, I very likely am in the camp of having some significant misunderstandings about the low-level implementation details. My experience with subinterpreters [2] is mostly limited to attempting to help with some tricky bugs. I'd call it an "area of interest", but it's very far from an "area of expertise". ;-) In the meantime for anyone looking for a basic overview on thread state and interpreter state, there's some general information documented in https://docs.python.org/3.9/c-api/init.html#high-level-api. --- [1] - By "needed changes", I am specifically referring to any changes to the C-API or internals that were made or are still needed to support concurrent subinterpreters. [2] - I have much more experience in the existing areas of concurrency in the standard library, such as with asyncio and concurrent.futures. Although it's relevant, it's very different in terms to implementation details. Also, I'm substantially more interested in the Python parts of the stdlib rather than the C internals or extension modules, but I certainly have an interest in learning more. Regards, Kyle Stanley On Wed, Mar 18, 2020 at 11:53 AM Antoine Pitrou wrote: > On Wed, 18 Mar 2020 13:35:16 + > Anders Munch wrote: > > Chris Angelico [mailto:ros...@gmail.com]: > > > And by that logic, globals are even more capable. I don't understand > your > > > point. Isn't the purpose of the tstate parameters to avoid the problem > of > > > being unable to have multiple tstates within the same OS thread? I > think I've > > > missed something here. > > > > The point is that because threads can't preempt themselves, this: > > > > /*1*/ > > { > > Py_something(other_tstate); > > } > > > > and this: > > > > /*2*/ > > { > >PyInterpreterState* old_tstate = tstate; > >tstate = other_state; > >Py_something(); > >tstate = old_tstate; > > } > > > > is effectively equivalent, provided tstate is thread-local. Both will > work > > equally well from the hypothetical C callback that wants to use a > different > > subinterpreter. > > This example is mixing up the notion of interpreter state and thread > state. > > Here is a more realistic example: > > struct UserData { > PyInterpreterState* interp; > }; > > // Callback given to a third-party C library > void my_c_callback(void* opaque) { > struct UserData* user_data = (struct UserData*) opaque; > PyGILState_STATE gstate; > gstate = PyInterpreter_GILStateEnsure(user_data->interp); > > // ... > > PyInterpreter_GILStateEnsure(user_data->interp, gstate); > } > > > Of course this implies the existence of a PyInterpreter_GILState API > that currently doesn't exist. In this model, the "thread-local" thread > state is a *per-interpreter* thread-local. A *process-wide* > thread-local thread state cannot work if we want -- in some distant > future -- to be able to run several subinterpreters concurrently. > > > In any case, the amount of disagreement and/or misunderstanding in this > discussion is a strong hint that it needs a PEP to hash things out and > explain them clearly, IMHO. > > Regards > > Antoine. > > ___ > Python-Dev mailing list -- python-dev@python.org > To unsubscribe send an email to python-dev-le...@python.org > https://mail.python.org/mailman3/lists/python-dev.python.org/ > Message archived at > https://mail.python.org/archives/list/python-dev@python.org/message/EKZJYO75QBDKZA4A2RR43PKDYEJRD7HC/ > Code of Conduct: http://python.org/psf/codeofconduct/ > ___ Python-Dev mailing list -- python-dev@python.org To unsubscribe send an email to python-dev-le...@python.org https://mail.python.org/mailman3/lists/python-dev.python.org/ Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/T4EC2LXRCGTIN222HNILLVKUSAOZ3BGA/ Code of Conduct: http://python.org/psf/codeofconduct/