[Python-Dev] Re: Pass the Python thread state to internal C functions

2019-11-14 Thread Victor Stinner
On Thu, Nov 14, 2019 at 04:55, Larry Hastings wrote:
> I'm pretty sure you understand the sentence "Pulling it out of TLS was too 
> slow".  At the time CPython used the POSIX APIs for accessing thread local 
> storage, and I didn't know about and therefore did not try this "__thread" 
> GCC extension.  I do remember trying some other API that was purported to be 
> faster--maybe a GCC library function for faster TLS access?--but I didn't get 
> that to work either before I gave up on it out of frustration.

I asked for confirmation because I was surprised. But when I looked at
the assembly with my friend, we were experimenting with __thread, not
with pthread_getspecific().

So thanks for confirming that "getting tstate" can be a performance
bottleneck: that's a very good reason to pass it explicitly.
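
To make the three access paths discussed here concrete, the following is a minimal sketch, not CPython code. The type demo_tstate and every demo_* / depth_via_* name are hypothetical stand-ins invented for illustration; only the pthread calls and the __thread keyword are real. It assumes GCC or Clang on a POSIX system.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for PyThreadState; not the CPython struct. */
typedef struct {
    int recursion_depth;
} demo_tstate;

static pthread_key_t demo_key;                       /* POSIX TLS slot */
static pthread_once_t demo_key_once = PTHREAD_ONCE_INIT;
static __thread demo_tstate *demo_current;           /* GCC/Clang extension */

/* Create the POSIX TLS key exactly once per process. */
static void demo_make_key(void)
{
    int err = pthread_key_create(&demo_key, NULL);
    assert(err == 0);
    (void)err;
}

/* Register tstate as the calling thread's state in both TLS mechanisms. */
static void demo_bind(demo_tstate *tstate)
{
    pthread_once(&demo_key_once, demo_make_key);
    pthread_setspecific(demo_key, tstate);
    demo_current = tstate;
}

/* 1. POSIX API: a library function call on every access. */
static int depth_via_posix_tls(void)
{
    demo_tstate *tstate = pthread_getspecific(demo_key);
    return tstate->recursion_depth;
}

/* 2. __thread: the compiler addresses the variable relative to the
 *    thread's TLS block; the exact cost depends on the TLS model. */
static int depth_via_thread_keyword(void)
{
    return demo_current->recursion_depth;
}

/* 3. Explicit parameter: no lookup at all, the caller already has it. */
static int depth_via_parameter(demo_tstate *tstate)
{
    return tstate->recursion_depth;
}
```

A caller would bind a state once per thread with demo_bind(&ts) and could then use any of the three accessors. Only the third one needs no hidden lookup, which is the property the explicit-tstate approach relies on.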

> I also took the opportunity to pass my "reference count manager" data as a 
> separate parameter, which again was per-thread and again was a major win at 
> the time.

Another approach would be to pass a "PyContext*" pointer that
contains tstate plus additional fields. But I chose to start with
a direct "PyThreadState* tstate" to avoid an extra indirection on
every tstate access. For now, tstate alone seems to be enough for the
code base.
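
The indirection trade-off can be sketched as follows. This is an illustration under stated assumptions, not the actual design: demo_tstate, demo_context and both enter_* functions are hypothetical names, and the extra_per_thread_data field merely stands in for whatever additional fields a context structure might carry.

```c
#include <assert.h>

/* Hypothetical stand-in for PyThreadState. */
typedef struct {
    int recursion_depth;
} demo_tstate;

/* Hypothetical "PyContext"-style wrapper: tstate plus additional fields. */
typedef struct {
    demo_tstate *tstate;
    long extra_per_thread_data;   /* placeholder for additional fields */
} demo_context;

/* Context variant: load ctx->tstate first, then touch the field. */
static int enter_with_context(demo_context *ctx)
{
    return ++ctx->tstate->recursion_depth;
}

/* Direct variant: the field is one dereference away from the argument. */
static int enter_with_tstate(demo_tstate *tstate)
{
    return ++tstate->recursion_depth;
}
```

Both functions update the same counter; the context variant simply pays one more pointer load per access in exchange for room to carry other per-thread data.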

Victor
-- 
Night gathers, and now my watch begins. It shall not end until my death.
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/SHPPBIERUHCAH5UFW6WAVOQ2Z2NEKAH3/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: Pass the Python thread state to internal C functions

2019-11-14 Thread Antoine Pitrou
On Wed, 13 Nov 2019 14:52:32 +0100
Victor Stinner wrote:
> 
> #define _PyRuntimeState_GetThreadState(runtime) \
>     ((PyThreadState*)_Py_atomic_load_relaxed(&(runtime)->gilstate.tstate_current))
> #define _PyThreadState_GET() _PyRuntimeState_GetThreadState(&_PyRuntime)
> 
> _PyThreadState_GET() uses "_Py_atomic_load_relaxed". I'm not used to
> C11 atomic conventions. The "memory_order_relaxed" documentation says:
> 
> "Relaxed operation: there are no synchronization or ordering
> constraints imposed on other reads or writes, only this operation's
> atomicity is guaranteed (see Relaxed ordering below)"
> 
> Note: I'm not even sure why Python currently uses an atomic operation.

Is it protected by a lock?  If not, you need to use an atomic.
Since it's theoretically possible to read the current thread state
without holding the GIL (though that is not very useful), an atomic is
required.
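
As a rough illustration of that point, here is a minimal sketch using the standard C11 <stdatomic.h> API rather than CPython's internal _Py_atomic_* wrappers. The names demo_tstate, demo_tstate_current, demo_set_current and demo_get_current are hypothetical; the variable only plays the role of runtime->gilstate.tstate_current.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for PyThreadState. */
typedef struct {
    int id;
} demo_tstate;

/* Shared slot naming the thread state that currently holds the lock.
 * Static atomic objects are zero-initialized, so this starts as NULL. */
static _Atomic(demo_tstate *) demo_tstate_current;

/* Written by whichever thread takes ownership in this sketch. */
static void demo_set_current(demo_tstate *tstate)
{
    atomic_store_explicit(&demo_tstate_current, tstate,
                          memory_order_relaxed);
}

/* Safe to call from any thread, lock held or not: the relaxed load
 * guarantees an untorn pointer value but orders no other memory. */
static demo_tstate *demo_get_current(void)
{
    return atomic_load_explicit(&demo_tstate_current,
                                memory_order_relaxed);
}
```

Without the atomic, a read racing with a write from another thread would be a data race and therefore undefined behavior in C11. The relaxed ordering keeps the cost low, and on common platforms such a load is typically compiled to an ordinary pointer-sized load.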

Regards

Antoine.

Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/7VL3QKACQLDL3QCWKCTUHUCIERFNE6R7/


[Python-Dev] Re: Pass the Python thread state to internal C functions

2019-11-14 Thread Random832
On Thu, Nov 14, 2019, at 07:43, Antoine Pitrou wrote:
> On Wed, 13 Nov 2019 14:52:32 +0100
> Victor Stinner wrote:
> > 
> > #define _PyRuntimeState_GetThreadState(runtime) \
> >     ((PyThreadState*)_Py_atomic_load_relaxed(&(runtime)->gilstate.tstate_current))
> > #define _PyThreadState_GET() _PyRuntimeState_GetThreadState(&_PyRuntime)
> > 
> > _PyThreadState_GET() uses "_Py_atomic_load_relaxed". I'm not used to
> > C11 atomic conventions. The "memory_order_relaxed" documentation says:
> > 
> > "Relaxed operation: there are no synchronization or ordering
> > constraints imposed on other reads or writes, only this operation's
> > atomicity is guaranteed (see Relaxed ordering below)"
> > 
> > Note: I'm not even sure why Python currently uses an atomic operation.
> 
> Is it protected by a lock?  If not, you need to use an atomic.
> Since it's theoretically possible to read the current thread state
> without the GIL held (though not very useful), then an atomic is
> required.

It sounds like you are saying _PyRuntimeState_GetThreadState has two duties,
then: "get this thread's thread state" (called by the GIL holder; how do other
threads get their own thread state?), and "get the GIL-holding thread's thread
state" (called by a thread that does not hold the GIL).

The former shouldn't need atomics or locking overhead (unless the thread state
can be written from other threads), even if the latter does.
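
One way to picture that split is the sketch below. It is purely hypothetical and does not describe how CPython is structured: every demo_* name is invented, and it assumes a C11 compiler with _Thread_local and <stdatomic.h>.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for PyThreadState. */
typedef struct {
    int id;
} demo_tstate;

/* Duty 1: "my own thread state". Only the owning thread reads or
 * writes this variable, so plain (non-atomic) access is enough. */
static _Thread_local demo_tstate *demo_my_tstate;

/* Duty 2: "the lock holder's thread state". Any thread may read it,
 * so accesses go through atomics. */
static _Atomic(demo_tstate *) demo_gil_holder;

static void demo_attach(demo_tstate *tstate)
{
    demo_my_tstate = tstate;
}

static void demo_take_gil(void)
{
    atomic_store_explicit(&demo_gil_holder, demo_my_tstate,
                          memory_order_relaxed);
}

static void demo_drop_gil(void)
{
    atomic_store_explicit(&demo_gil_holder, NULL, memory_order_relaxed);
}

static demo_tstate *demo_own_tstate(void)
{
    return demo_my_tstate;
}

static demo_tstate *demo_gil_holder_tstate(void)
{
    return atomic_load_explicit(&demo_gil_holder, memory_order_relaxed);
}
```

In this arrangement the common case (a thread asking for its own state) avoids atomics entirely, while the rarer cross-thread query still gets a well-defined answer.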
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/4BNRSO47Z54MRR3ZS32W6DXYRVZ7U53W/


[Python-Dev] Re: Pass the Python thread state to internal C functions

2019-11-14 Thread Steve Dower

On 13Nov2019 1954, Larry Hastings wrote:
> On 11/13/19 5:52 AM, Victor Stinner wrote:
> > On Wed, Nov 13, 2019 at 14:28, Larry Hastings wrote:
> > > I did exactly that in the Gilectomy prototype.  Pulling it out of
> > > TLS was too slow,
> >
> > What do you mean? Getting tstate from a TLS was a performance
> > bottleneck by itself? Reading a TLS variable seems to be quite
> > efficient.
>
> I'm pretty sure you understand the sentence "Pulling it out of TLS was
> too slow".  At the time CPython used the POSIX APIs for accessing thread
> local storage, and I didn't know about and therefore did not try this
> "__thread" GCC extension.  I do remember trying some other API that was
> purported to be faster--maybe a GCC library function for faster TLS
> access?--but I didn't get that to work either before I gave up on it out
> of frustration.
>
> Also, I dimly recall that I moved several things from globals into the
> ThreadState structure, and probably added one or two of my own.  So
> nearly every function call was referencing ThreadState at one point or
> another.  Passing it as a parameter was a definite win over calling the
> POSIX TLS APIs.

Passing it as a parameter is also a huge win for embedders, as it gets
very complicated to merge locking/threading models when the host
application has its own requirements.

Overall, I'm very supportive of passing context through parameters
rather than implicitly through TLS.

(Though we've got a long way to go before it'll be possible for
embedders to not be held hostage by CPython's threading model... one
step at a time! :) )

Cheers,
Steve
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/TLMDK7JZQIUWQUUKFHOPNEFQCJKFL5JM/