[ 
https://issues.apache.org/jira/browse/SUREFIRE-1302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tibor Digana updated SUREFIRE-1302:
-----------------------------------

Hi Olivier,

I think a configurable ping timeout will not necessarily save us.
If you consider the accuracy of the timers in the Maven process and the
forked JVM, plus GC, this would lead to a pretty long timeout, e.g. 30 or
60 seconds.
The Jenkins scheduler is faster, circa 5 seconds, and git clone is fast as
well.
If we could synchronize both timers, we would do a much better job. That
includes JMX, which would prolong the checking period by the GC duration
when the Maven process is supposed to be idle or dead.

My problem with adding a configuration parameter is a conflict with Version
3.0.
In 3.0 we have an ambition to customize the parameters with a programmatic
extension per parameter.
Adding more and more parameters would delay the 3.0 release and mean
unhappy users.
Version 3.0 has these ambitions: extensions, requested by the user group,
compatibility with Maven 3.0.x, and the JUnit 5 provider.
All of these are in progress in our branches.

Our PING code is simple. It is located in ForkedBooter.java.
There are two timers. Our ping timer uses a fixed-rate scheduler with a
period of 20 seconds. I think we can write much better code if we try to
synchronize the Maven process ping with the forked JVM at a rate of 1
second. The timer should be tolerant of GC pauses.
Where do the long GC pauses occur? Is it in the forked JVM or in the Maven
process?
If it is the Maven process, then I am afraid we cannot do anything about it.
If it is the forked JVM, killing the JVM is up to ForkedBooter running in
the forked JVM, and that code can be made tolerant.
What happens if we have e.g. a 1 second timer and GC pauses this timer as
well? Would it invoke all elapsed events right after the GC has finished?
We should record the last tick of the timer and the GC counter, and
retrieve the delay from JMX if the counter has been incremented. I would
say the 20 second idle period should be prolonged internally, and counting
the period should restart after a PING has been received from the Maven
process, which is the synchronization.
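
To make the idea concrete, here is a rough sketch of how the idle check
could discount GC time. It is not Surefire code: the class
GcTolerantPingWatchdog and its methods are hypothetical names made up for
illustration. Only the JMX calls (ManagementFactory and
GarbageCollectorMXBean) are standard JDK API. Note that
getCollectionTime() is only an approximate accumulated time and, for
concurrent collectors, is not the same thing as pause time, so this is an
assumption to validate rather than a finished design.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical sketch, not Surefire code: an idle check that discounts GC time.
final class GcTolerantPingWatchdog {
    private final long timeoutMillis;
    private long lastPingNanos;
    private long gcMillisAtLastPing;

    GcTolerantPingWatchdog(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
        onPing(); // start counting from construction time
    }

    /** Sums the approximate accumulated collection time of all collectors in this JVM. */
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long millis = gc.getCollectionTime(); // -1 if undefined for this collector
            if (millis > 0) {
                total += millis;
            }
        }
        return total;
    }

    /** Decision rule: the idle time, minus the time spent in GC, must exceed the timeout. */
    static boolean isExpired(long idleMillis, long gcMillisSincePing, long timeoutMillis) {
        long effectiveIdle = idleMillis - Math.max(0, gcMillisSincePing);
        return effectiveIdle > timeoutMillis;
    }

    /** Called when a PING arrives from the Maven process; restarts the idle period. */
    synchronized void onPing() {
        lastPingNanos = System.nanoTime();
        gcMillisAtLastPing = totalGcMillis();
    }

    /** Called on every timer tick (e.g. once per second) to decide whether to exit. */
    synchronized boolean shouldKill() {
        long idleMillis = (System.nanoTime() - lastPingNanos) / 1_000_000L;
        long gcMillis = totalGcMillis() - gcMillisAtLastPing;
        return isExpired(idleMillis, gcMillis, timeoutMillis);
    }
}
```

With a 20 second timeout, 25 seconds of silence that include 10 seconds of
GC would count as only 15 seconds of idle time, so the forked JVM would not
be killed. The tick rate and the timeout stay independent, which is why a 1
second timer would not by itself make the check stricter.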





On Wed, Apr 19, 2017 at 10:46 AM, Olivier Peyrusse (JIRA) <j...@apache.org>
wrote:



> Surefire does not wait long enough for the forked VM and assumes it to be dead
> ------------------------------------------------------------------------------
>
>                 Key: SUREFIRE-1302
>                 URL: https://issues.apache.org/jira/browse/SUREFIRE-1302
>             Project: Maven Surefire
>          Issue Type: Request
>          Components: Maven Surefire Plugin
>    Affects Versions: 2.19.1
>            Reporter: Yuriy Zaplavnov
>            Assignee: Tibor Digana
>             Fix For: Backlog
>
>         Attachments: 
> surefire-tests-terminated-master-aa9330316038f6b46316ce36ff40714ffc7cf299.zip,
>  tests_log_01.txt, tests_log_02.txt
>
>
> This issue happens because surefire kills the forked container if it times 
> out waiting for the 'ping'.
> In org.apache.maven.surefire.booter.ForkedBooter class there is hardcoded 
> constant PING_TIMEOUT_IN_SECONDS  = 20 which is used in the following method:
> {code}
> private static ScheduledFuture<?> listenToShutdownCommands( CommandReader reader )
>     {
>         reader.addShutdownListener( createExitHandler( reader ) );
>         AtomicBoolean pingDone = new AtomicBoolean( true );
>         reader.addNoopListener( createPingHandler( pingDone ) );
>         return JVM_TERMINATOR.scheduleAtFixedRate( createPingJob( pingDone, reader ),
>                                                    0, PING_TIMEOUT_IN_SECONDS, SECONDS );
>     }
> {code}
> to create ScheduledFuture.
> In some cases the forked container might respond a bit later than
> expected, and surefire kills it:
> {code}
> private static Runnable createPingJob( final AtomicBoolean pingDone, final CommandReader reader )
>     {
>         return new Runnable()
>         {
>             public void run()
>             {
>                 boolean hasPing = pingDone.getAndSet( false );
>                 if ( !hasPing )
>                 {
>                     exit( 1, KILL, reader, true );
>                 }
>             }
>         };
>     }
> {code}
> As long as we need to terminate it anyway, it would be really helpful if the
> problem could be solved by making PING_TIMEOUT_IN_SECONDS configurable, with
> the ability to specify the value from maven-surefire-plugin.
> It would help to configure this timeout based on needs and factors of the 
> projects where surefire runs.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
