Brad King wrote:
> Markus Duft wrote:
>> cmakes implementation of how child processes are handled doesn't work
>> reliably on multicore interix. it seems that every other SIGCHLD is lost
>
> Is this a known problem on that platform, independent of CMake?

it is independent of cmake, yes. it is not widely known, as (i guess) i'm
one of the very few people really _using_ this platform for something
productive (cross compiling to native win32 - hahaha - i know - don't tell
me that cmake supports win32, we have a huge bunch of auto* based stuff
that needs a POSIX env...). doing the bits to make cmake cross compile from
interix to win32 using parity (parity.sf.net) is next on my agenda...

> The ProcessUNIX.c implementation is for POSIX platforms, which clearly
> define SIGCHLD semantics.

yeah - interix is (supposed to be) POSIX compliant, and hey - it works
_most of the time_. what causes my headaches is the few times it
doesn't ... and all this (both of my problems) is only on multi-core
machines. i am in the process of reporting those issues currently, but M$
support is something soo .... you know the deal ;)

>> somewhere on the way. i (printf-)debugged cmake a little during
>> bootstrap, and it seems that at random points in time, SIGCHLD is lost,
>
> Can you print out the state of signal masks?

how can i do that? i'm not really into that topic that much :) but i'll
read some man pages to figure it out.

>> and cmake locks up in a select() call on the signal pipe (SIGCHLD is
>> lost, so nobody will write on the signal pipe).
>
> The "signal pipe" approach is a standard way to implement race-free
> handling of SIGCHLD while blocking in select().
>
>> i thought of introducing some lame timeout when select()ing the signal
>> pipe, then checking whether the process is still alive (wait()), and
>> again selecting if it is. what do you think?
>
> If select() is broken (your second problem) then there is no point
> in pursuing this code path further. Instead modify the polling
> code path to use a non-blocking waitpid() instead of looking at
> the signal pipe.

it seems that i'm not hit by the select problem, as there is already a
"select has lied" path somewhere in that code path that catches exactly my
select() problem. but yes, maybe it would be easier to implement the
waitpid() stuff in the non-blocking code path. i'll have a look at that.

>> the second problem i have is regarding a broken select(). i tried to
>> work around it by setting KWSYSPE_USE_SELECT, which initially didn't
>> work, because the code seems b0rked. it seems that there is a wrong
>> timeout check in that code path.
>
> IIRC that path was contributed for BeOS support which AFAIK is not
> really tested anymore. However, it looks correct at a quick glance.
>
>> first kwsysProcessGetTimeoutLeft is
>> called, like in the select() code path, but directly after that, the
>> timeoutLength members are checked separately once more.
>
> The call to GetTimeoutLeft fills the members of timeoutLength.
> It also returns whether or not the timeout has already expired.
> The caller is supposed to use timeoutLength after the call.
>
>> with this check it seems that all sub-processes "time out" immediately.
>
> At process start time we store an absolute TimeoutTime using the
> starting wall clock time plus the process timeout length. Later
> the GetTimeoutLeft subtracts the current time from the TimeoutTime.
> Print out the starting time, the computed TimeoutTime, and the
> timeoutLength that gets computed for each poll.

i'll have a look at that one too. thanks for all the work :)

Cheers, Markus

> -Brad
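
The timeout bookkeeping described above can be sketched roughly as follows. This only illustrates the arithmetic (absolute deadline = start time + timeout length, time left = deadline - current time) and is not the kwsys code. The names deadline_from(), timeout_left() and print_timeout_state() and their signatures are assumptions made up for this sketch, as is the choice to count exactly zero time left as expired.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/time.h>

/* Hypothetical helper: absolute deadline = start + timeout_sec. */
static struct timeval deadline_from(struct timeval start, double timeout_sec)
{
  struct timeval d = start;
  long whole = (long)timeout_sec;
  d.tv_sec += whole;
  d.tv_usec += (long)((timeout_sec - (double)whole) * 1e6);
  if (d.tv_usec >= 1000000) { /* carry microseconds into seconds */
    d.tv_sec += 1;
    d.tv_usec -= 1000000;
  }
  return d;
}

/* Hypothetical helper: fills *left with deadline - now. Returns 1 if
   the deadline has passed (left is then zeroed), 0 otherwise. The
   caller uses *left, for example as a select() timeout, only when 0
   is returned. */
static int timeout_left(struct timeval deadline, struct timeval now,
                        struct timeval* left)
{
  assert(left != NULL);
  left->tv_sec = deadline.tv_sec - now.tv_sec;
  left->tv_usec = deadline.tv_usec - now.tv_usec;
  if (left->tv_usec < 0) { /* borrow one second */
    left->tv_sec -= 1;
    left->tv_usec += 1000000;
  }
  if (left->tv_sec < 0 || (left->tv_sec == 0 && left->tv_usec == 0)) {
    left->tv_sec = 0;
    left->tv_usec = 0;
    return 1;
  }
  return 0;
}

/* Debug print of the three values worth inspecting on each poll. */
static void print_timeout_state(struct timeval start, struct timeval deadline,
                                struct timeval left)
{
  fprintf(stderr, "start=%ld.%06ld deadline=%ld.%06ld left=%ld.%06ld\n",
          (long)start.tv_sec, (long)start.tv_usec,
          (long)deadline.tv_sec, (long)deadline.tv_usec,
          (long)left.tv_sec, (long)left.tv_usec);
}
```

If every sub-process appears to time out immediately, a printout like this on the first poll would show whether the deadline itself is wrong (for example, computed from an unset start time) or whether the remaining time is being misread after the call.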