The base images are developed in https://github.com/carlossg/docker-maven, right? Who creates "/etc/mavenrc"?
On Thu, Mar 21, 2019 at 12:05 AM Jason Young wrote:

> Mikael, sorry I do not appear to have permission to view the link.
>
> I did some digging in the last couple of days. I see that the parent
> process reads from stdin. I could not find anywhere that we are using
> stdin. FWIW the failures nearly always happen at least 15m into a ~20m test
> run, so perf is a likely culprit.
>
> I see also that ForkedBooter reads commands from stdin in one thread, and
> uses an executor service to check for a past ping in
> ForkedBooter.listenToShutdownCommands(..). When it checks, it also sets
> pingDone to false. The executor is configured to run up to 2 threads
> concurrently to handle the workload, and is set to run at a fixed rate (not
> a fixed delay). If the test suite is busy with testing and GC and has lots
> of threads running, it's entirely possible that a thread won't have a
> chance to run for a long time (e.g. 5s). Maybe instead of a 30s delay, the
> VM gets around to checking for a ping every 35s over a long span of time.
> Because we're running at a "fixed rate" and not a "fixed delay", then after
> a couple of minutes we might be a full 30s behind schedule. It's possible
> the executor will create another thread to run the scheduled task because
> it's running behind schedule. This new thread checks for a ping, finds it,
> and sets pingDone to false. But then the original thread also runs, say, 2
> seconds afterwards, checks pingDone, and finds it is false.
>
> So to mitigate the problem, can we a) make the executor run only 1 thread
> and b) schedule the task at a fixed delay? For that matter, is there another
> scheduled executor we can reuse? I understand why checking for ping
> requires a separate executor. Should I ask in github?
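
On the fixed-rate point: here is a minimal sketch of the difference, not surefire's actual code. `PingScheduleSketch`, the 50 ms period and the 300 ms stall are made-up stand-ins for the 30 s ping check and a GC pause or starved thread. It uses only `java.util.concurrent` and reports the smallest gap between consecutive starts of a periodic task after the first run stalls.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical demo class; not part of surefire.
final class PingScheduleSketch {
    static final long PERIOD_MS = 50;  // assumed stand-in for the 30 s ping check
    static final long STALL_MS = 300;  // assumed stand-in for a GC pause or starved thread
    static final int RUNS = 5;         // task starts to record per experiment

    /** Smallest gap in ms between consecutive task starts after the first run stalls. */
    static long minGapMillis(boolean fixedRate) {
        ScheduledExecutorService executor = Executors.newScheduledThreadPool(1);
        List<Long> starts = new CopyOnWriteArrayList<>();
        CountDownLatch done = new CountDownLatch(RUNS);
        Runnable check = () -> {
            if (done.getCount() == 0) {
                return; // enough samples collected
            }
            boolean first = starts.isEmpty();
            starts.add(System.nanoTime());
            if (first) {
                try {
                    Thread.sleep(STALL_MS); // only the first run is slow
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
            done.countDown();
        };
        try {
            if (fixedRate) {
                executor.scheduleAtFixedRate(check, 0, PERIOD_MS, TimeUnit.MILLISECONDS);
            } else {
                executor.scheduleWithFixedDelay(check, 0, PERIOD_MS, TimeUnit.MILLISECONDS);
            }
            done.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            executor.shutdownNow();
        }
        long min = Long.MAX_VALUE;
        for (int i = 1; i < starts.size(); i++) {
            min = Math.min(min, starts.get(i) - starts.get(i - 1));
        }
        return TimeUnit.NANOSECONDS.toMillis(min);
    }

    public static void main(String[] args) {
        System.out.println("fixed rate,  min gap ms: " + minGapMillis(true));
        System.out.println("fixed delay, min gap ms: " + minGapMillis(false));
    }
}
```

With `scheduleAtFixedRate` the overdue runs fire back to back, so one check can follow the previous one almost immediately. With `scheduleWithFixedDelay` each check starts at least one full period after the previous one finished. The JDK documents that late runs of a single fixed-rate task do not execute concurrently, so the catch-up shows up as a burst of runs rather than two threads running the task in parallel. Whether such a burst is what clears pingDone too early in ForkedBooter is a guess I have not verified.
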
>
> Regarding a previous question, I found out that Alpine's Maven package
> comes with an /etc/mavenrc that sets `MAVEN_OPTS="$MAVEN_OPTS -Xmx512m"`
> which cannot be undone by setting `MAVEN_OPTS` at the command line; you end
> up with e.g. `-Xmx1g -Xmx512m`. (Note this applies to the Maven (parent)
> process, not the surefire/failsafe (child) process.)
>
> On Wed, Mar 20, 2019 at 3:46 AM Bernd Eckenfels wrote:
>
> > I guess a timeout caused by FullGC can happen with TCP as well. Increasing
> > the timeout might not be nice but does look like it would help in both
> > cases. (Problems with stdout are more related to unexpected JVM messages, I
> > guess.)
> >
> > Regards
> > Bernd
> > --
> > http://bernd.eckenfels.net
> >
> > ________________________________
> > From: Mikael Åsberg
> > Sent: Wednesday, March 20, 2019 9:40 AM
> > To: Maven Users List
> > Subject: Re: Failsafe: Killing self fork JVM. PING timeout elapsed.
> >
> > These issues regarding communication with forked JVMs, won't they be
> > resolved once surefire moves to interprocess communication using
> > tcp/ip sockets? This happens to be the target feature to be included
> > in the next surefire 3.0.0 milestone:
> > https://issues.apache.org/jira/projects/SUREFIRE/versions/12344668
> >
> > There are soooo many issues relating to surefire reading stdout of
> > forked processes (which, as I understand it, is what it currently
> > does). Many of us are really looking forward to the next milestone.
> >
> > On Tue, Mar 19, 2019 at 8:59 PM Jason Young wrote:
> > >
> > > Getting back to my original questions, I know that "ping" means to see if a
> > > process is there, and "NOOP" implies it's not a command to do anything. But
> > > what do the terms "ping" and "NOOP" mean in this context, i.e. how do the
> > > processes communicate? I assume they don't sonar. Do other processes also
> > > ping NOOPs?
> > > Can I PING Chrome with a NOOP from bash? Is it with TCP?
> > >
> > > I'm confused about what I should do regarding GC pauses. Previously I had
> > > code that would write the amount of remaining heap space (or something like
> > > that) to stdout after every test to troubleshoot OOMEs. Can writing to
> > > stdout cause the communication failure somehow?
> > >
> > > On Wed, Mar 13, 2019 at 5:57 PM Jason Young wrote:
> > >
> > > > I upgraded failsafe and surefire to 3.0.0-M3 as advised; we encountered
> > > > the same exception. (Still using -Xmx5g, will switch to OpenJ9 soon in case
> > > > that helps.)
> > > >
> > > > BTW I also asked on StackOverflow previously, for anyone interested:
> > > > https://stackoverflow.com/questions/54755846/killing-self-fork-jvm-ping-timeout-elapsed
> > > >
> > > > On Tue, Feb 26, 2019 at 6:40 PM Jason Young wrote:
> > > >
> > > >> Thanks again for the information.
> > > >>
> > > >> We had increased the RAM to 3g some time ago to prevent OOMEs. More
> > > >> recently, I increased the RAM again to 5g for extra headroom since we had
> > > >> more headroom available; the problem hasn't happened since, but it hasn't
> > > >> been very long.
> > > >>
> > > >> We use a more customized image based on Alpine 3.8.2. The JDK and Maven
> > > >> are obtained via apk.
> > > >>
> > > >> I will try upgrading failsafe (and surefire while I'm at it) sooner, and
> > > >> probably do some experimentation with JVMs another time (not pressing for
> > > >> me ATM).
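
On the heap-space printing: the advice further down the thread is to keep GC information in a file rather than on the console. Here is a small sketch that appends heap figures to a file instead of stdout. `HeapLogSketch`, the `heap.log` default and the line format are invented for illustration; only standard JDK calls (`Runtime`, `java.nio.file.Files`) are used, and it compiles on Java 8.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

// Hypothetical helper; the class name and output format are assumptions.
final class HeapLogSketch {
    private static final long MIB = 1024L * 1024L;

    /** Formats the current heap figures (in MiB) as one log line. */
    static String heapLine(String label) {
        Runtime rt = Runtime.getRuntime();
        long used = (rt.totalMemory() - rt.freeMemory()) / MIB;
        long max = rt.maxMemory() / MIB; // effective heap limit of this JVM
        return label + " usedMiB=" + used + " maxMiB=" + max + System.lineSeparator();
    }

    /** Appends one heap line to the given file; nothing is written to stdout. */
    static void logHeap(Path file, String label) {
        try {
            Files.write(file, heapLine(label).getBytes(StandardCharsets.UTF_8),
                    StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path file = Paths.get(args.length > 0 ? args[0] : "heap.log");
        logHeap(file, "manual-run");
    }
}
```

A call such as `HeapLogSketch.logHeap(Paths.get("heap.log"), testName)` from a per-test teardown method would replace the stdout print. I can't say for sure that stdout output was the cause here, but this keeps the forked JVM's console free of extra output either way.
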
> > > >>
> > > >> On Tue, Feb 26, 2019 at 12:20 PM Tibor Digana wrote:
> > > >>
> > > >>> >> I'll try to enable some logging about GC pauses to see what's up
> > > >>>
> > > >>> Please do not keep such a setting after tuning the GC, because it may
> > > >>> sometimes break the interprocess communication between the Maven process
> > > >>> and the surefire process.
> > > >>> It's worth writing GC information to a file and not to the console logs.
> > > >>> This can be configured, I guess.
> > > >>>
> > > >>> >> Do you think the value is simply too low?
> > > >>>
> > > >>> GCing many objects may take some time, and I remember we had a user who
> > > >>> had this problem a year or two ago.
> > > >>> As a fix, we check every third NOOP (which is 3 x 10 sec) instead of every
> > > >>> NOOP. So 30 seconds looked satisfactory.
> > > >>> I think you use an old version, 2.20 or something like that. The fixes for
> > > >>> docker have been made by now, so please use the latest version, 3.0.0-M3.
> > > >>> See this page
> > > >>> https://maven.apache.org/surefire/maven-surefire-plugin/docker.html; we
> > > >>> used maven:3.5.3-jdk-8-alpine in this test. Which base image did you use?
> > > >>>
> > > >>> Cheers
> > > >>> Tibor
> > > >>>
> > > >>> On Tue, Feb 26, 2019 at 5:24 PM Jason Young wrote:
> > > >>>
> > > >>> > Thanks for the information. It's good to see someone understands a little
> > > >>> > about this.
> > > >>> >
> > > >>> > Incidentally, we have been looking at other GCs and VMs for the application
> > > >>> > in production environments, so I'll look into how these affect tests as
> > > >>> > well. I'll try to enable some logging about GC pauses to see what's up.
> > > >>> >
> > > >>> > How would `-Xmx3g` cause long GC cycles? Do you think the value is simply
> > > >>> > too low?
> > > >>> >
> > > >>> > FWIW we're running the Maven build in an Alpine-based Docker container.
> > > >>> >
> > > >>> > On Sat, Feb 23, 2019 at 6:36 AM Tibor Digana wrote:
> > > >>> >
> > > >>> > > Hi Jason,
> > > >>> > >
> > > >>> > > We spoke about this issue on our chat in ASF Slack:
> > > >>> > > "I think his tests have been paused for long GC periods and timed out
> > > >>> > > after 3x PING period = 30 seconds. After this period the forked JVM assumed
> > > >>> > > the Maven process was killed by JenkinsCI and therefore all surefire
> > > >>> > > processes are killed as well and all the file handles and memory
> > > >>> > > consumption are freed."
> > > >>> > >
> > > >>> > > "But I have to say that `-Xmx3g` may cause long GC cycles, see
> > > >>> > > https://maven.apache.org/surefire/maven-surefire-plugin/examples/shutdown.html"
> > > >>> > >
> > > >>> > > You are using java-1.8-openjdk. I guess you should use Shenandoah GC,
> > > >>> > > which is an experimental algorithm in JVM 1.8. This would significantly
> > > >>> > > shorten the GC cycles.
> > > >>> > >
> > > >>> > > We should of course provide a new configuration parameter to give you a
> > > >>> > > chance to prolong the PING.
> > > >>> > >
> > > >>> > > Cheers
> > > >>> > > Tibor
> > > >>> >
> > > >>> > --
> > > >>> > Jason Young
> > >
> > > --
> > > Jason Young
> > > Software Engineer | PROCENTIVE
