Re: Failsafe: Killing self fork JVM. PING timeout elapsed.

Tibor Digana Fri, 22 Mar 2019 03:44:57 -0700

The base images are developed in https://github.com/carlossg/docker-maven,
right?
Who creates "/etc/mavenrc"?


On Thu, Mar 21, 2019 at 12:05 AM Jason Young <[email protected]>
wrote:

> Mikael, sorry I do not appear to have permission to view the link.
>
> I did some digging in the last couple of days. I see that the parent
> process reads from stdin. I could not find anywhere that we are using
> stdin. FWIW the failures nearly always happen at least 15m into a ~20m test
> run, so perf is a likely culprit.
>
> I see also that ForkedBooter reads commands from stdin in one thread, and
> uses an executor service to check for a past ping in
> ForkedBooter.listenToShutdownCommands(..). When it checks, it also sets
> pingDone to false. The executor is configured to run up to 2 threads
> concurrently to handle the workload, and is set to run at a fixed rate (not
> a fixed delay). If the test suite is busy with testing and GC and has lots
> of threads running, it's entirely possible that a thread won't have a
> chance to run for a long time (e.g. 5s). Maybe instead of a 30s delay, the
> VM gets around to checking for a ping every 35s over a long span of time.
> Because we're running at a "fixed rate" and not a "fixed delay", then after
> a couple of minutes we might be a full 30s behind schedule. It's possible
> the executor will create another thread to run the scheduled task because
> it's running behind schedule. This new thread checks for a ping, finds it,
> and sets pingDone to false. But then the original thread also runs, say, 2
> seconds afterwards, checks pingDone, and finds it is false.
>
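For illustration only, here is a minimal Java sketch of the fixed-rate versus fixed-delay difference described above. It is not surefire's code: the class name `PingScheduleSketch`, the 100 ms period and the 350 ms stall are made up for the demo, standing in for the 30 s check interval and a GC pause. Per the `ScheduledExecutorService` Javadoc, runs of one periodic task never overlap, but at a fixed rate the runs that fell behind during a stall are executed back-to-back to catch up, so a second check can follow the first almost immediately. With a fixed delay, every run waits the full period after the previous one finishes.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

public class PingScheduleSketch {

    static final long PERIOD_MS = 100; // hypothetical stand-in for the 30 s check interval
    static final long STALL_MS = 350;  // hypothetical stand-in for a GC pause or CPU starvation

    /** Returns the smallest gap (in ms) between the starts of two consecutive runs. */
    public static long minGapMillis(boolean fixedRate) {
        ScheduledExecutorService executor = Executors.newScheduledThreadPool(1);
        List<Long> starts = Collections.synchronizedList(new ArrayList<>());
        AtomicBoolean firstRun = new AtomicBoolean(true);

        // The "check" task: record when it started; only the first run stalls.
        Runnable check = () -> {
            starts.add(System.nanoTime());
            if (firstRun.getAndSet(false)) {
                try {
                    Thread.sleep(STALL_MS);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        };

        if (fixedRate) {
            executor.scheduleAtFixedRate(check, 0, PERIOD_MS, TimeUnit.MILLISECONDS);
        } else {
            executor.scheduleWithFixedDelay(check, 0, PERIOD_MS, TimeUnit.MILLISECONDS);
        }

        try {
            // Let the stall pass and a few more runs happen, then stop.
            Thread.sleep(STALL_MS + 4 * PERIOD_MS);
            executor.shutdownNow();
            executor.awaitTermination(1, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while measuring", e);
        }

        long min = Long.MAX_VALUE;
        synchronized (starts) {
            for (int i = 1; i < starts.size(); i++) {
                long gapMs = (starts.get(i) - starts.get(i - 1)) / 1_000_000;
                min = Math.min(min, gapMs);
            }
        }
        return min;
    }

    public static void main(String[] args) {
        System.out.println("fixed rate,  min gap ms: " + minGapMillis(true));
        System.out.println("fixed delay, min gap ms: " + minGapMillis(false));
    }
}
```

At a fixed rate the smallest gap should come out close to zero, while with a fixed delay it should not drop below the period. A check that resets pingDone and is then followed almost immediately by a catch-up run would therefore see `false` even though pings are arriving normally.
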
> So to mitigate the problem, can we a) make the executor run only 1 thread
> and b) schedule the task with a fixed delay rather than at a fixed rate?
> For that matter, is there another scheduled executor we can reuse? I
> understand why checking for ping requires a separate executor. Should I
> ask on GitHub?
>
> Regarding a previous question, I found out that Alpine's Maven package
> comes with an /etc/mavenrc that sets `MAVEN_OPTS="$MAVEN_OPTS -Xmx512m"`
> which cannot be undone by setting `MAVEN_OPTS` at the command line; you end
> up with e.g. `-Xmx1g -Xmx512m`. (Note this applies to the Maven (parent)
> process, not the surefire/failsafe (child) process.)
>
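To check which heap setting actually took effect, a small diagnostic class can print the arguments the JVM was started with. This is only a sketch: `EffectiveHeapCheck` and `lastXmx` are hypothetical names, and the assumption that the last `-Xmx` on the command line is the one HotSpot applies should be confirmed against the printed max heap on your own JVM.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

public class EffectiveHeapCheck {

    /** Returns the last -Xmx argument in the list, or null if there is none. */
    public static String lastXmx(List<String> jvmArgs) {
        String last = null;
        for (String arg : jvmArgs) {
            if (arg.startsWith("-Xmx")) {
                last = arg; // keep overwriting so the final occurrence remains
            }
        }
        return last;
    }

    public static void main(String[] args) {
        // Arguments passed to this JVM (not the program arguments in args).
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println("JVM arguments:  " + jvmArgs);
        System.out.println("Last -Xmx:      " + lastXmx(jvmArgs));
        // The heap limit the JVM is actually running with.
        System.out.println("Max heap (MiB): " + Runtime.getRuntime().maxMemory() / (1024 * 1024));
    }
}
```

Running it as `java -Xmx1g -Xmx512m EffectiveHeapCheck` mimics the doubled option produced by /etc/mavenrc. It reports on its own JVM only, so it has to run inside the process you want to inspect.
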
> On Wed, Mar 20, 2019 at 3:46 AM Bernd Eckenfels <[email protected]>
> wrote:
>
> > I guess a timeout caused by FullGC can happen with TCP as well. Increasing
> > the timeout might not be nice but does look like it would help in both
> > cases. (Problems with stdout are more related to unexpected JVM messages, I
> > guess.)
> >
> > Regards
> > Bernd
> > --
> > http://bernd.eckenfels.net
> >
> > ________________________________
> > From: Mikael Åsberg <[email protected]>
> > Sent: Wednesday, March 20, 2019 9:40 AM
> > To: Maven Users List
> > Subject: Re: Failsafe: Killing self fork JVM. PING timeout elapsed.
> >
> > These issues regarding communication with forked JVMs, won't they be
> > resolved once surefire moves to interprocess communication using
> > tcp/ip sockets? This happens to be the target feature to be included
> > in the next surefire 3.0.0 milestone:
> > https://issues.apache.org/jira/projects/SUREFIRE/versions/12344668
> >
> > There are soooo many issues relating to surefire reading stdout of
> > forked processes (which, as I understand it, is what it currently
> > does). Many of us are really looking forward to the next milestone.
> >
> > On Tue, Mar 19, 2019 at 8:59 PM Jason Young <[email protected]>
> > wrote:
> > >
> > > Getting back to my original questions, I know that "ping" means to see if a
> > > process is there, and "NOOP" implies it's not a command to do anything. But
> > > what do the terms "ping" and "NOOP" mean in this context, i.e. how do the
> > > processes communicate? I assume they don't sonar. Do other processes also
> > > ping NOOPs? Can I PING Chrome with a NOOP from bash? Is it with TCP?
> > >
> > > I'm confused about what I should do regarding GC pauses. Previously I had
> > > code that would write the amount of remaining heap space (or something like
> > > that) to stdout after every test to troubleshoot OOMEs. Can writing to
> > > stdout cause the communication failure somehow?
> > >
> > > On Wed, Mar 13, 2019 at 5:57 PM Jason Young <[email protected]>
> > > wrote:
> > >
> > > > I upgraded failsafe and surefire to 3.0.0-M3 as advised; we encountered
> > > > the same exception. (Still using -Xmx5g, will switch to OpenJ9 soon in case
> > > > that helps.)
> > > >
> > > > BTW I also asked on StackOverflow previously, for anyone interested:
> > > > https://stackoverflow.com/questions/54755846/killing-self-fork-jvm-ping-timeout-elapsed
> > > >
> > > > On Tue, Feb 26, 2019 at 6:40 PM Jason Young <[email protected]>
> > > > wrote:
> > > >
> > > >> Thanks again for the information.
> > > >>
> > > >> We had increased the RAM to 3g some time ago to prevent OOMEs. More
> > > >> recently, I increased the RAM again to 5g for extra headroom since we had
> > > >> more memory available; the problem hasn't happened since, but it hasn't
> > > >> been very long.
> > > >>
> > > >> We use a more customized image based on Alpine 3.8.2. The JDK and Maven
> > > >> are obtained via apk.
> > > >>
> > > >> I will try upgrading failsafe (and surefire while I'm at it) sooner, and
> > > >> probably do some experimentation with JVMs another time (not pressing for
> > > >> me ATM).
> > > >>
> > > >> On Tue, Feb 26, 2019 at 12:20 PM Tibor Digana <[email protected]>
> > > >> wrote:
> > > >>
> > > >>> >> I'll try to enable some logging about GC pauses to see what's up
> > > >>>
> > > >>> Please do not keep such a setting after tuning the GC, because it may
> > > >>> sometimes break the interprocess communication between the Maven process
> > > >>> and the surefire process.
> > > >>> It's worth writing the GC information to a file rather than to the console
> > > >>> logs. This can be configured, I guess.
> > > >>>
> > > >>> >> Do you think the value is simply too low?
> > > >>>
> > > >>> GCing many objects may take some time, and I remember we had a user who had
> > > >>> this problem a year or two ago.
> > > >>> As a fix, we check every third NOOP (which is 3 x 10 sec) instead of every
> > > >>> NOOP. So 30 seconds looked satisfactory.
> > > >>> I think you use an old version, 2.20 or something like that. Fixes for
> > > >>> Docker have been made since then, so please use the latest version, 3.0.0-M3.
> > > >>> See this page:
> > > >>> https://maven.apache.org/surefire/maven-surefire-plugin/docker.html
> > > >>> We used maven:3.5.3-jdk-8-alpine in this test. Which base image did you use?
> > > >>>
> > > >>> Cheers
> > > >>> Tibor
> > > >>>
> > > >>> On Tue, Feb 26, 2019 at 5:24 PM Jason Young <[email protected]>
> > > >>> wrote:
> > > >>>
> > > >>> > Thanks for the information. It's good to see someone understands a little
> > > >>> > about this.
> > > >>> >
> > > >>> > Incidentally, we have been looking at other GCs and VMs for the application
> > > >>> > in production environments, so I'll look into how these affect tests as
> > > >>> > well. I'll try to enable some logging about GC pauses to see what's up.
> > > >>> >
> > > >>> > How would `-Xmx3g` cause long GC cycles? Do you think the value is simply
> > > >>> > too low?
> > > >>> >
> > > >>> > FWIW we're running the Maven build in an Alpine-based Docker container.
> > > >>> >
> > > >>> > On Sat, Feb 23, 2019 at 6:36 AM Tibor Digana <[email protected]>
> > > >>> > wrote:
> > > >>> >
> > > >>> > > Hi Jason,
> > > >>> > >
> > > >>> > > We spoke about this issue on our chat in ASF Slack:
> > > >>> > > "I think his tests were paused for long GC periods and timed out after
> > > >>> > > 3x the PING period = 30 seconds. After this period the forked JVM assumed
> > > >>> > > the Maven process had been killed by Jenkins CI, and therefore all surefire
> > > >>> > > processes are killed as well and all the file handles and memory are
> > > >>> > > freed."
> > > >>> > >
> > > >>> > > "But I have to say that `-Xmx3g` may cause long GC cycles, see
> > > >>> > > https://maven.apache.org/surefire/maven-surefire-plugin/examples/shutdown.html"
> > > >>> > >
> > > >>> > > You are using java-1.8-openjdk. I guess you should use Shenandoah GC, which
> > > >>> > > is an experimental algorithm in JVM 1.8. This would significantly shorten
> > > >>> > > the GC cycles.
> > > >>> > >
> > > >>> > > We should of course provide a new configuration parameter to give you a
> > > >>> > > chance to prolong the PING timeout.
> > > >>> > >
> > > >>> > > Cheers
> > > >>> > > Tibor
> > > >>> > >
> > > >>> >
> > > >>> >
> > > >>> > --
> > > >>> >
> > > >>> > Jason Young
> > > >>> >
> > > >>>
> > > >>
> > > >>
> > >
> > > --
> > > Jason Young
> > > Software Engineer | PROCENTIVE
> >
> >
> >
>
