I've seen this kind of problem before in other programs, but usually only on NFS-mounted filesystems. On local UFS partitions these system calls generally aren't interrupted by signals. It would be simpler if we could use sigaction() and set the SA_RESTART flag for these signals, but the Solaris man pages don't mention stat() as being one of the restartable system calls. (But I'd bet that it is...)
  -- Howard Chu
  Chief Architect, Symas Corp.       Director, Highland Sun
  http://www.symas.com               http://highlandsun.com/hyc
  Symas: Premier OpenSource Development and Support

> -----Original Message-----
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]On Behalf Of Kevin Nomura
> Sent: Monday, November 19, 2001 1:07 PM
> To: [EMAIL PROTECTED]
> Subject: make-3.79 on solaris8 broken
>
>
> Using make-3.79 under solaris 6 and solaris 8, I have been seeing
> two intermittent problems. It seems to get worse with higher values
> of -j. One is "No rule to make target xxx" when there is, in fact,
> a rule to make target xxx. As befits an intermittent problem, the
> make succeeds if rerun with no changes.
>
> The second problem is more insidious: make *quietly* fails to rebuild
> some of its targets that are out of date. The symptom is link errors
> with unsat symbols owing to the incomplete build. Again, rerunning
> make picks these up and succeeds. Since this is a chronic problem for
> us I spent this past weekend debugging it with make -d and have some
> theories to offer.
>
> The first problem seems due to the stat() in remake.c not being protected
> by a retry loop for EINTR. stat() on solaris is documented as failing
> with EINTR. So, I fixed this, actually implementing the "safe_stat()"
> function that has a prototype in make.h but no definition (!?). This
> cleared up the "No rule" errors but not the unsat link problems.
>
> For the second problem with failed links, the -d trace surrounding one of
> the files that should have been remade (but was not) looked like:
>
> Considering target file `../netcache/server/obj/td/wccp2.o'.
> Looking for an implicit rule for `../netcache/server/obj/td/wccp2.o'.
> Trying pattern rule with stem `wccp2'.
> Trying implicit prerequisite `../netcache/server/obj/td/wccp2.r'.
> Got a SIGCHLD; 1 unreaped children.
> Got a SIGCHLD; 2 unreaped children.
> Trying pattern rule with stem `wccp2'.
> Trying implicit prerequisite `../netcache/server/obj/td/wccp2.f'.
> Trying pattern rule with stem `wccp2'.
> Trying implicit prerequisite `../netcache/server/wccp2.c'.
> Got a SIGCHLD; 3 unreaped children.
> Trying pattern rule with stem `wccp2'.
> Trying implicit prerequisite `../netcache/server/wccp2.cpp'.
> Trying pattern rule with stem `wccp2'.
> Trying implicit prerequisite `../netcache/server/wccp2.c'.
> Trying pattern rule with stem `wccp2'.
> Trying implicit prerequisite `../netcache/server/wccp2.c'.
> Trying pattern rule with stem `wccp2'.
> Trying implicit prerequisite `../netcache/server/obj/td/wccp2.c'.
> Trying pattern rule with stem `wccp2'.
> Trying implicit prerequisite `../netcache/server/obj/td/wccp2.cc'.
> Trying pattern rule with stem `wccp2'.
> ...
> No implicit rule found for `../netcache/server/obj/td/wccp2.o'.
> ...
> No commands for `../netcache/server/obj/td/wccp2.o' and no prerequisites actually changed.
> No need to remake target `../netcache/server/obj/td/wccp2.o'.
>
> Seeing that a signal happened right about the time it was checking
> the prerequisite `../netcache/server/wccp2.c' (the source file, which
> does exist), I zeroed in on the readdir() in
> dir.c:dir_contents_file_exists_p().
> Now, readdir() is not documented in solaris 6 or solaris 8 to
> fail on EINTR.
> But I put in a retry loop anyway and CAUGHT readdir failing on
> EINTR, dozens of times in the build in fact. So with stat() and
> readdir() (and opendir() and some others for good measure) guarded
> by retry loops, the problems have now subsided.
>
> So assuming these are in fact the causes of the problems I saw, I am
> wondering whether solaris is in error for returning EINTR (e.g. is this
> broken with respect to POSIX or some standard that Solaris claims
> adherence to)? Should either or both of these be solved within make,
> at least as a practical issue?
>
> Kevin Nomura
> Network Appliance
>
> _______________________________________________
> Bug-make mailing list
> [EMAIL PROTECTED]
> http://mail.gnu.org/mailman/listinfo/bug-make