On Thu, 28 Jan 2016 16:10:52 -0800 Kay Schenk wrote:
>
> On 01/14/2016 09:48 AM, Kay Schenk wrote:
>> On Thu, Jan 14, 2016 at 4:04 AM, [email protected] wrote:
>>
>>     Hello,
>>
>>     some may have noticed our linux-32 buildbot fails quite often. [1]
>>     Here is an analysis: (tl;dr jump to solutions)
>>     * always fails in first buildbot step: svn updating
>>     * failed step takes around 6 minutes, a successful step uses ~37
>>     minutes to complete
>>     * the commands in the step take a long time and often a timeout
>>     triggers
>>
>>     The commands and their timeouts (seconds) are:
>>     1) svn --version (1200)
>>     2) rm -rf
>>     /home/buildslave20/slave20/openoffice-linux32-nightly/build (120)
>>     3) chmod -Rf u+rwx
>>     /home/buildslave20/slave20/openoffice-linux32-nightly/build
>>     (120) ah, why?
>>     4) rm -rf
>>     /home/buildslave20/slave20/openoffice-linux32-nightly/build
>>     (120) huh, again?
>>     5) svn info --xml --non-interactive --no-auth-cache (1200)
>>     6) svn update --non-interactive --no-auth-cache (1200)
>>     7) cp -R -P -p -v
>>     /home/buildslave20/slave20/openoffice-linux32-nightly/source
>>     /home/buildslave20/slave20/openoffice-linux32-nightly/build (120)
>>     8) svn info --xml (1200)
>>
>>     Their results:
>>     1) Always finishes in ~15 seconds
>>     2) No output, almost always fails with command timed out: 120
>>     seconds
>>     without output, attempting to kill
>>     3) No output, almost always fails with command timed out: 120
>>     seconds
>>     without output, attempting to kill
>>     4) No output, finishes sometimes.
>>     *if we fail here the build process is stopped, and this is the
>>     reason for the frequent failures*
>>     5) Local command completes in a sec.
>>     6) Can take a while depending on source changes. Gives tons of
>>     output,
>>     so timeout never triggers.
>>     7) Takes *very* long (over 20 minutes) but never triggers timeout
>>     because with '-v' the output spams the log.
>>     8) Local command again takes a sec.
>>
>>     Conclusions:
>>     *file operations don't have enough time to finish*
>>
>>     Solutions:
>>     Edit 'svn updating' buildstep
>>     a) Remove rm and chmod commands and replace cp with
>>     'rsync -q -t -p -r --delete
>>     /home/buildslave20/slave20/openoffice-linux32-nightly/source
>>     /home/buildslave20/slave20/openoffice-linux32-nightly/build'
>>       This is much faster as very few copies are needed and its delete is
>>     faster than the rm command. But increase the timeout anyway just in
>>     case.
>>     (*preferred* solution but needs rsync on the box)
>>     b) increase the timeouts and shut up cp by removing '-v'
>>     c) remove unversioned files when updating and build in this folder
>>     d) Make rm and chmod verbose by adding '-v' (or '-c' for chmod).
>>     Spam the
>>     log even more, but the timeouts won't trigger.
>>       Who doesn't like 50MB logfiles? Yes, the log for this step of
>>     every
>>     successful build is over 50MB currently! Starting build #127 [1]
>>     (before
>>     this build there was only a build folder but no source)
>>       Not a serious solution!
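
To illustrate option a), here is a minimal sketch on throwaway directories
(the paths under $work are made up for the demonstration, they are not the
buildbot paths). It assumes rsync is installed. Note the trailing slashes:
without them rsync would create build/source instead of mirroring the
contents into build. Also, unlike 'cp -R -P', plain '-r' does not copy
symlinks; adding '-l' would be needed for that, which goes beyond the
command proposed above and should be checked against the source tree.

```shell
# Throwaway stand-ins for the real source/ and build/ folders (hypothetical).
work=$(mktemp -d)
mkdir -p "$work/source/dir" "$work/build"
echo "new" > "$work/source/dir/a.txt"
echo "stale" > "$work/build/leftover.o"   # exists only in build/

# Same flags as proposed in a); trailing slashes mean "contents of source/".
rsync -q -t -p -r --delete "$work/source/" "$work/build/"

# build/ now mirrors source/: a.txt was copied, leftover.o was deleted.
ls -R "$work/build"
```

On a second run with an unchanged source nothing is copied again, which is
where the time saving over rm + cp would come from.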
>>
>>     *I suggest we fix this soon because the huge log files will blow
>>     up a
>>     server sooner or later.*
>>
>>     Regards Jochen
>>
>>     [1] https://ci.apache.org/builders/openoffice-linux32-nightly
>>
>>     note: on the linux64 buildbot the file operations are *much* faster.
>>     cp takes 90 secs; it isn't verbose but stays within the 120 sec
>>     timeout limit.
>>
>>
>> Thanks for the suggestions, I will look into this.
>>
>>
> I just wanted to give a short update on this.
>
> * our Linux-32 and linux-64 buildbots use the same mechanisms for an
> svn pull -- a "copy" -- so I left the 32-bit instructions as is
The 'copy' instructions differ in one detail:
Linux-32: cp -R -P -p -v
/home/buildslave20/slave20/openoffice-linux32-nightly/source
/home/buildslave20/slave20/openoffice-linux32-nightly/build
Linux-64: cp -R -P -p
/home/buildslave19/slave19/openofficeorg-nightly/source
/home/buildslave19/slave19/openofficeorg-nightly/build

*-v* needs to go to reduce the log size,
but we have to increase the timeout further before we do this, or the copy
will always fail.

https://ci.apache.org/builders/openoffice-linux32-nightly/builds/162/steps/svn/logs/stdio
:
> cp -R -P -p -v
> /home/buildslave20/slave20/openoffice-linux32-nightly/source
> /home/buildslave20/slave20/openoffice-linux32-nightly/build in dir
> /home/buildslave20/slave20/openoffice-linux32-nightly (timeout 120 secs)
... humongous log ...
> elapsedTime=1370.929525 program finished with exit code 0
It seems even 1200 won't be enough (the copy took about 1371 seconds).
Note that the timeout for cp was still 120.
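
The "timeout 120 secs" in the log is a limit on silence, not on total run
time (compare "command timed out: 120 seconds without output" quoted
above). A rough bash sketch of that behaviour follows. The helper name
run_with_idle_timeout is made up, this is not Buildbot code, it needs
bash 4 or newer for the exit status of 'read -t', and unlike Buildbot it
only reports the timeout and does not kill the command.

```shell
#!/usr/bin/env bash
# run_with_idle_timeout LIMIT CMD...: run CMD and report whether it ever
# stayed silent for more than LIMIT seconds (hypothetical helper).
run_with_idle_timeout() {
    local limit=$1; shift
    "$@" 2>&1 | {
        while true; do
            IFS= read -r -t "$limit" _line
            status=$?
            # status 0: a line arrived, so the silence clock starts over
            if [ "$status" -eq 0 ]; then continue; fi
            # bash reports a read timeout as a status above 128, EOF as 1
            if [ "$status" -gt 128 ]; then echo "timed out"; else echo "finished"; fi
            break
        done
    }
}

# Stand-in for 'cp -v': prints a line every second for 3 seconds.
chatty() { for i in 1 2 3; do echo "copied file $i"; sleep 1; done; }

silent_result=$(run_with_idle_timeout 1 sleep 3)   # 3 s without output, limit 1 s
chatty_result=$(run_with_idle_timeout 3 chatty)    # never silent for 3 s
echo "silent: $silent_result, chatty: $chatty_result"
```

So a cp without -v that needs about 1371 seconds would require a no-output
timeout above that, while the -v variant gets through with 120.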

On Thu, 28 Jan 2016 16:10:52 -0800 Kay Schenk wrote:
> * I recently updated the timeout for the svn pull for linux-32 to
> 1200 secs. To me it looked like this was set to 120 though it IS
> supposed to default to 1200, but...
The timeouts in 'svn update' of build #162 (Jan 29 02:05) haven't changed
from older builds.
>
> * there are some other extra steps -- some removes -- that seem to
> be tacked onto the svn step that are outside of our config commands
> that ARE timing out and seem to NOT be governed by the total timeout
> for this step, yet they time out in successful builds also.
Well, the removes get another try after a chmod,
so the first remove can time out without consequence.

When both removes fail, the build fails, but it succeeds the next day
because most files have been removed already.
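
The rm / chmod / rm sequence can be sketched like this. remove_tree is a
made-up name, and running chmod only after a failed rm is my assumption;
the sketch mirrors the order of commands seen in the log, not the actual
buildslave code.

```shell
# remove_tree DIR: try rm, and on failure fix permissions and try once more.
remove_tree() {
    rm -rf "$1" 2>/dev/null && return 0
    chmod -Rf u+rwx "$1"    # make every entry writable and traversable again
    rm -rf "$1"             # second and last attempt
}

# Throwaway read-only tree (hypothetical): plain rm -rf fails here for non-root.
demo=$(mktemp -d)
mkdir -p "$demo/build/sub"
touch "$demo/build/sub/file"
chmod -R a-w "$demo/build"

if remove_tree "$demo/build"; then outcome="removed"; else outcome="failed"; fi
echo "$outcome"
```

This would explain the chmod between the two removes: it is only there to
give the second rm a chance on files the first one could not delete.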
> * there are some buildbot setup instructions that differ for our
> linux-64 and linux-32 builds.
Maybe our instructions don't reach the buildbots, or aren't updated?
>
> Detailed in:
> My INFRA ticket to track Linux-32 buildbot problems:
>
> https://issues.apache.org/jira/browse/INFRA-10997
>
> So, still a mystery to me at this point.
Checking the time taken by other tasks is a good idea.
The difference between the same cp on Linux-32 and Linux-64 looks too big:
Linux-32: elapsedTime=1370.929525
Linux-64: elapsedTime=117.262038
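
For scale, a quick calculation with the two elapsedTime values above (awk
is used only because plain shell arithmetic has no floating point):

```shell
linux32=1370.929525   # elapsedTime of cp on Linux-32 (build #162)
linux64=117.262038    # elapsedTime of cp on Linux-64

# How many times slower is Linux-32, and by how much does it exceed 1200 s?
ratio=$(LC_ALL=C awk -v a="$linux32" -v b="$linux64" 'BEGIN { printf "%.1f", a / b }')
over=$(LC_ALL=C awk -v a="$linux32" 'BEGIN { printf "%.0f", a - 1200 }')
echo "Linux-32 cp is ${ratio}x slower and exceeds 1200 s by ${over} s"
```

That is a factor of roughly 11.7, and about 171 seconds more than a
1200 second limit would allow for a silent copy.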
