On Thu, 28 Jan 2016 16:10:52 -0800 Kay Schenk wrote:
>
> On 01/14/2016 09:48 AM, Kay Schenk wrote:
>> On Thu, Jan 14, 2016 at 4:04 AM, [email protected]
>> <mailto:[email protected]> <[email protected]
>> <mailto:[email protected]>> wrote:
>>
>> Hello,
>>
>> some may have noticed our linux-32 buildbot fails quite often. [1]
>> Here is an analysis: (tl;dr jump to solutions)
>> * always fails in the first buildbot step: svn updating
>> * a failed step takes around 6 minutes, a successful step uses ~37
>>   minutes to complete
>> * the commands in the step take much time and often a timeout
>>   triggers
>>
>> The commands and their timeouts (seconds) are:
>> 1) svn --version (1200)
>> 2) rm -rf
>>    /home/buildslave20/slave20/openoffice-linux32-nightly/build (120)
>> 3) chmod -Rf u+rwx
>>    /home/buildslave20/slave20/openoffice-linux32-nightly/build
>>    (120) ah, why?
>> 4) rm -rf
>>    /home/buildslave20/slave20/openoffice-linux32-nightly/build
>>    (120) huh, again?
>> 5) svn info --xml --non-interactive --no-auth-cache (1200)
>> 6) svn update --non-interactive --no-auth-cache (1200)
>> 7) cp -R -P -p -v
>>    /home/buildslave20/slave20/openoffice-linux32-nightly/source
>>    /home/buildslave20/slave20/openoffice-linux32-nightly/build (120)
>> 8) svn info --xml (1200)
>>
>> Their results:
>> 1) Always finishes in ~15 seconds
>> 2) No output, almost always fails with command timed out: 120
>>    seconds without output, attempting to kill
>> 3) No output, almost always fails with command timed out: 120
>>    seconds without output, attempting to kill
>> 4) No output, finishes sometimes.
>>    *if we fail here the build process is stopped, and this is the
>>    reason for the frequent failures*
>> 5) Local command completes in a sec.
>> 6) Can take a while depending on source changes. Gives tons of
>>    output, so the timeout never triggers.
>> 7) Takes *very* long (over 20 minutes) but never triggers the timeout
>>    as the '-v' output spams the log.
>> 8) Local command again takes a sec.
>>
>> Conclusions:
>> *file operations don't have enough time to finish*
>>
>> Solutions:
>> Edit 'svn updating' buildstep
>> a) Remove rm and chmod commands and replace cp with
>>    'rsync -q -t -p -r --delete
>>    /home/buildslave20/slave20/openoffice-linux32-nightly/source
>>    /home/buildslave20/slave20/openoffice-linux32-nightly/build'
>>    This is much faster as very few copies are needed and its delete
>>    is faster than the rm command. But increase the timeout anyway
>>    just in case.
>>    (*preferred* solution but needs rsync on the box)
>> b) increase the timeouts and shut up cp by removing '-v'
>> c) remove unversioned files when updating and build in this folder
>> d) Make rm and chmod verbose by adding '-v' (or '-c' for chmod).
>>    Spam the log even more, but the timeouts won't trigger.
>>    Who doesn't like 50MB logfiles? Yes, the log for this step of
>>    every successful build is over 50MB currently! Starting build
>>    #127 [1] (before this build there was only a build folder but no
>>    source
>>    Not a serious solution!
>>
>> *I suggest we fix this soon because the huge log files will blow
>> up a server sooner or later.*
>>
>> Regards Jochen
>>
>> [1] https://ci.apache.org/builders/openoffice-linux32-nightly
>>
>> note: on the linux64 buildbot the file operations are *much* faster.
>> cp takes 90 secs; it isn't verbose but stays within the 120 sec
>> timeout limit.
>>
>>
>> Thanks for the suggestions, I will look into this.
>>
>>
> I just wanted to give a short update on this.
>
> * our Linux-32 and linux-64 buildbots use the same mechanisms for an
> svn pull -- a "copy" -- so I left the 32-bit instructions as is

the 'copy' instructions differ in one detail

Linux-32:
cp -R -P -p -v
   /home/buildslave20/slave20/openoffice-linux32-nightly/source
   /home/buildslave20/slave20/openoffice-linux32-nightly/build

Linux-64:
cp -R -P -p
   /home/buildslave19/slave19/openofficeorg-nightly/source
   /home/buildslave19/slave19/openofficeorg-nightly/build

*-v* needs to go to reduce the log size, but we have to increase the
timeout further before we do this or the copy will always fail

https://ci.apache.org/builders/openoffice-linux32-nightly/builds/162/steps/svn/logs/stdio :
> cp -R -P -p -v
> /home/buildslave20/slave20/openoffice-linux32-nightly/source
> /home/buildslave20/slave20/openoffice-linux32-nightly/build in dir
> /home/buildslave20/slave20/openoffice-linux32-nightly (timeout 120 secs)
... humongous log ...
> elapsedTime=1370.929525 program finished with exit code 0

seems 1200 won't be enough; note that the timeout for cp was still 120

On Thu, 28 Jan 2016 16:10:52 -0800 Kay Schenk wrote:
> * I recently updated the timeout for the svn pull for linux-32 to
> 1200 secs. To me it looked like this was set to 120 though it IS
> supposed to default to 1200, but...

the timeouts in 'svn update' of build #162 (Jan 29 02:05) haven't
changed from older builds

>
> * there are some other extra steps -- some removes -- that seem to
> be tacked onto the svn step that are outside of our config commands
> that ARE timing out and seem to NOT be governed by the total timeout
> for this step, yet they time out in successful builds also.

well, the removes get another try after a chmod, so the first remove
can time out without consequence.
when both removes fail the build fails, but it succeeds the next day
because most files are removed already

> * there are some buildbot setup instructions that differ for our
> linux-64 and linux-32 builds.

maybe our instructions don't reach the buildbots or aren't updated?

>
> Detailed in:
> My INFRA ticket to track Linux-32 buildbot problems:
>
> https://issues.apache.org/jira/browse/INFRA-10997
>
> So, still a mystery to me at this point.

checking the time frame for other tasks is a good idea
the difference between the same cp on Linux-32 and Linux-64 looks too big
Linux-32: elapsedTime=1370.929525
Linux-64: elapsedTime=117.262038
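
To check the time frame of the individual file operations, the commands could be timed by hand outside of buildbot. Below is a minimal sketch in Python, assuming a hypothetical helper named `timed_run` (it is not part of buildbot or of our config). Note that its timeout is a wall-clock limit on the whole command, whereas the buildbot timeout seen in the logs triggers after a period without output, so the two are not directly equivalent.

```python
import subprocess
import sys
import time


def timed_run(cmd, timeout):
    """Run cmd and return (exit_code, elapsed_seconds, timed_out).

    Hypothetical helper for manual timing only. `timeout` limits the
    total run time of the command; buildbot instead kills a command
    after that many seconds without output.
    """
    start = time.monotonic()
    try:
        proc = subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,  # discard output, like a non-verbose cp
            stderr=subprocess.DEVNULL,
            timeout=timeout,
        )
        return proc.returncode, time.monotonic() - start, False
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising TimeoutExpired
        return None, time.monotonic() - start, True


if __name__ == "__main__":
    # Placeholder command; on the build machine this would be the rm,
    # chmod or cp command from the step.
    code, elapsed, timed_out = timed_run([sys.executable, "-c", "pass"], timeout=120)
    print(f"exit={code} elapsed={elapsed:.1f}s timed_out={timed_out}")
```

Running the same command through this helper on both Linux-32 and Linux-64 would give elapsed times that can be compared directly, independent of how much output the command writes.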
