Ping^2. On Tue, Oct 28, 2014 at 6:17 PM, Nathaniel Smith <n...@pobox.com> wrote: > Ping. > > On 19 Oct 2014 23:44, "Nathaniel Smith" <n...@pobox.com> wrote: >> >> Hi Jakub, >> >> Thanks for your feedback! See below. >> >> On Thu, Oct 16, 2014 at 4:52 PM, Jakub Jelinek <ja...@redhat.com> wrote: >> > On Mon, Oct 13, 2014 at 10:16:19PM +0100, Nathaniel Smith wrote: >> >> Got total silence the last 4 times I posted this, and users have been >> >> bugging me about it offline, so trying again. >> >> >> >> This patch fixes a showstopper problem preventing the transparent use >> >> of OpenMP in scientific libraries, esp. with Python. Specifically, it >> >> is currently not possible to use GNU OpenMP -- even in a limited, >> >> temporary manner -- in any program that uses (or might use) fork() for >> >> parallelism, even if the fork() and the use of OpenMP occur at totally >> >> different times. This limitation is unique to GNU OpenMP -- every >> >> competing OpenMP implementation already contains something like this >> >> patch. While technically not fully POSIX-compliant (because POSIX >> >> gives much much weaker guarantees around fork() than any real Unix), >> >> the approach used in this patch (a) performs only POSIX-compliant >> >> operations when the host program is itself fully POSIX-compliant, and >> >> (b) actually works perfectly reliably in practice on all commonly used >> >> platforms I'm aware of. >> > >> > 1) gomp_we_are_forked in your patch will attempt to free the pool >> > of the thread that encounters it, which is racy; consider a program >> > after fork calling pthread_create several times, each thread >> > thusly created then ~ at the same time doing #pragma omp parallel >> > and the initial thread too. You really should clean up the pool >> > data structure only in the initial thread and nowhere else; >> > for native TLS (non-emulated, IE model) the best would be to have a >> > flag >> > in the gomp_thread_pool structure, >> > struct gomp_thread *thr = gomp_thread (); >> > if (thr && thr->thread_pool) >> > thr->thread_pool->after_fork = true; >> > should in that case be safe in the atfork child handler. >> > For !HAVE_TLS or emulated TLS not sure if it is completely safe, >> > it would call pthread_getspecific. Perhaps just don't register >> > atfork handler on those targets at all? >> >> Good point. The updated patch below takes a slightly different >> approach. I moved we_are_forked to the per-thread struct, and then I >> moved the setting of it into the *parent* process's fork handlers -- >> the before-fork handler toggles it to true, then the child spawns off >> and inherits this setting, and then the parent after-fork handler >> toggles it back again. (Since it's per-thread, there's no race >> condition here.) This lets us remove the child after-fork handler >> entirely, and -- since the parent handlers aren't subject to any >> restrictions on what they can call -- it works on all platforms >> regardless of the TLS implementation. >> >> > 2) can you explain why are you removing the cleanups from >> > gomp_free_pool_helper ? >> >> They aren't removed, but rather moved from the helper function (which >> runs in the helper threads) into gomp_free_thread_pool (which runs in >> the main thread) -- which makes it easier to run the appropriate >> cleanups even in the case where the helper threads aren't running. >> (But see below -- we might prefer to drop this part of the patch >> entirely.) >> >> > 3) you can call pthread_atfork many times (once for each pthread >> > that creates a thread pool), that is undesirable, you want to do that >> > only if the initial thread creates thread pool >> >> Good point. I've moved the pthread_atfork call to initialize_team, >> which is an __attribute__((constructor)). >> >> I am a little uncertain whether this is the best approach, though, >> because of the comment in team_destructor about wanting to correctly >> handle dlopen/dlclose. One of pthread_atfork's many (many) limitations >> is that there's no way to unregister handlers, so if dlopen/dlclose is >> important (is it?) then we can't call pthread_atfork from >> initialize_team. >> >> If this is a problem, then we could delay the pthread_atfork until >> e.g. the first thread pool is spawned -- would this be preferred? >> >> > 4) the testcase is clearly not portable enough, should be probably >> > limited >> > to *-*-linux* only, fork etc. will likely not work on many targets. >> >> I think it should work on pretty much any target that has fork(); we >> definitely care about having this functionality on e.g. OS X. I've >> added some genericish target specifications. >> >> > In any case, even with the patch, are you aware that you'll leak >> > megabytes >> > of thread stacks etc.? >> >> Well, err, I wasn't, no :-). Thanks for pointing that out. >> >> To me this does clinch the argument that a better approach would be >> the one I suggested in >> https://gcc.gnu.org/ml/gcc-patches/2014-02/msg00979.html >> i.e., of tracking whether any threadprivate variables were present, >> and if not then simply shutting down the thread pools before forking. >> But this would be a much more invasive change to gomp (I wouldn't know >> where to start). >> >> In the mean time, the current patch is still worthwhile. The cost is >> not that bad: I wouldn't think of it as "leaking" so much as "overhead >> of supporting OMP->fork->OMP". No-one forks a child which forks a >> child which forks a child etc., so the cost is pretty much bounded in >> practice. The most common use case is probably using fork() to spawn a >> set of worker processes, which will end up COW-sharing the thread >> stacks (which will just end up resting peacefully in swap). And when >> doing computational work where working set sizes are often in the >> gigabytes, spending a few megabytes is small change -- esp. compared >> to the current cost, which involves first wasting hours of programmer >> time figuring out why things are just locking up, and then (in many >> cases) having to rewrite the code entirely because there's no fix for >> this, you just have to redesign you parallelization architecture to >> either avoid OMP to avoid fork(). >> >> However, the thread stack issue does make me wonder if it's worth >> spending so much effort on cleaning up a few semaphores and mutexes. >> So I split the patch into two parts. The first enables the basic >> functionality and passes the test, but it doesn't even try to clean up >> the thread pool -- it just forgets that it existed and moves on. The >> second patch goes on top of the first, and adds the best-effort >> cleanup of synchronization objects and easily free-able heap. So patch >> #1 alone will do the job, and patch #2 is optional -- applying means >> we leak a bit yes, but does increase the chance of portability >> gremlins cropping up. >> >> Changelog for patch #1: >> >> 2014-10-19 Nathaniel J. Smith <n...@pobox.com> >> >> * libgomp.h (struct gomp_thread): New member we_are_forked. >> * team.c (gomp_thread_start): Add we_are_forked to gomp_thread >> initialization. >> (gomp_before_fork_callback) >> (gomp_after_fork_parent_callback): New functions. >> (initialize_team): Register atfork handlers. >> (gomp_team_start): Check for fork on entry, and clear thread state >> if found. >> * testsuite/libgomp.c/fork-1.c: New test. >> >> Changelog for patch #2: >> >> 2014-10-19 Nathaniel J. Smith <n...@pobox.com> >> >> * team.c (gomp_free_pool_helper): Move per-thread cleanup to main >> thread. >> (gomp_free_thread): Delegate implementation to... >> (gomp_free_thread_pool): ...this new function. Like old >> gomp_free_thread, but does per-thread cleanup, and has option to >> skip everything that involves interacting with actual threads, >> which is useful when called after fork. >> (gomp_team_start): Call gomp_free_thread_pool to release (some) >> resources after fork. >> >> -n >> >> -- >> Nathaniel J. Smith >> Postdoctoral researcher - Informatics - University of Edinburgh >> http://vorpus.org
-- Nathaniel J. Smith Postdoctoral researcher - Informatics - University of Edinburgh http://vorpus.org