[ 
https://issues.apache.org/jira/browse/SUREFIRE-1264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16171544#comment-16171544
 ] 

Christian Beikov commented on SUREFIRE-1264:
--------------------------------------------

I am trying to build Apache Calcite on Travis CI with a parallel configuration 
and am consistently running into a hang. I tried various versions of the 
Surefire plugin(1.18, 1.18.1, 1.19.1, 1.20.1) and all seem to have the same 
problem. I did a thread dump after a timeout, but I am not sure what the 
problem is. It seems to be related to this issue. For testing purposes I 
disabled every test but the seemingly problematic parameterized test and in 
fact, it hangs consistently: 
https://travis-ci.org/beikov/calcite/builds/277258873
Without the parallel configuration or when running with a single CPU, the 
problem doesn't seem to happen.

You can run this on a custom VM too. I configured a VirtualBox VM with 2 CPUs 
and 4GB RAM and ran the following commands to execute this locally

{code}
docker run --name travis-debug -dit travisci/ci-garnet:packer-1503972846 
/sbin/init
docker exec -it travis-debug bash -l
su - travis
git clone --branch=travis-test https://github.com/beikov/calcite.git
cd calcite
mvn -Dsurefire.useFile=false -Dcheckstyle.skip=true test
{code}

The exact configuration is

{code}
<threadCount>1</threadCount>
<perCoreThreadCount>true</perCoreThreadCount>
<parallel>both</parallel>
<forkMode>once</forkMode>
{code}

Here the stack traces..

{code}
"pool-1-thread-2" #13 daemon prio=5 os_prio=0 tid=0x00007f0694ff1000 nid=0xc97 
waiting on condition [0x00007f067c934000]
   java.lang.Thread.State: WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for  <0x0000000080082c78> (a 
java.util.concurrent.Semaphore$FairSync)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
        at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
        at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
        at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
        at java.util.concurrent.Semaphore.acquire(Semaphore.java:312)
        at 
org.apache.maven.surefire.junitcore.pc.ThreadResourcesBalancer.acquirePermit(ThreadResourcesBalancer.java:76)
        at 
org.apache.maven.surefire.junitcore.pc.Scheduler.schedule(Scheduler.java:363)
        at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
        at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
        at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
        at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
        at org.junit.runners.Suite.runChild(Suite.java:128)
        at org.junit.runners.Suite.runChild(Suite.java:27)
        at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
        at 
org.apache.maven.surefire.junitcore.pc.Scheduler$1.run(Scheduler.java:410)
        at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
"pool-1-thread-1" #12 daemon prio=5 os_prio=0 tid=0x00007f0694fef800 nid=0xc96 
waiting on condition [0x00007f067ca36000]
   java.lang.Thread.State: WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for  <0x0000000080702e60> (a 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
        at 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
        at 
java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
        at 
java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1074)
        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)

"process reaper" #10 daemon prio=10 os_prio=0 tid=0x00007f06946c9800 nid=0xc94 
waiting on condition [0x00007f067ce6f000]
   java.lang.Thread.State: TIMED_WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for  <0x000000008064cb98> (a 
java.util.concurrent.SynchronousQueue$TransferStack)
        at 
java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
        at 
java.util.concurrent.SynchronousQueue$TransferStack.awaitFulfill(SynchronousQueue.java:460)
        at 
java.util.concurrent.SynchronousQueue$TransferStack.transfer(SynchronousQueue.java:362)
        at java.util.concurrent.SynchronousQueue.poll(SynchronousQueue.java:941)
        at 
java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1073)
        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
"ping-30s" #9 daemon prio=5 os_prio=0 tid=0x00007f0694348000 nid=0xc92 waiting 
on condition [0x00007f067d055000]
   java.lang.Thread.State: TIMED_WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for  <0x00000000805de2e8> (a 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
        at 
java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
        at 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)
        at 
java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:1093)
        at 
java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:809)
        at 
java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1074)
        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
"surefire-forkedjvm-command-thread" #8 daemon prio=5 os_prio=0 
tid=0x00007f0694338800 nid=0xc91 runnable [0x00007f067d35f000]
   java.lang.Thread.State: RUNNABLE
        at java.io.FileInputStream.readBytes(Native Method)
        at java.io.FileInputStream.read(FileInputStream.java:255)
        at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
        at java.io.BufferedInputStream.read(BufferedInputStream.java:265)
        - locked <0x000000008064ee88> (a java.io.BufferedInputStream)
        at java.io.DataInputStream.readInt(DataInputStream.java:387)
        at 
org.apache.maven.surefire.booter.MasterProcessCommand.decode(MasterProcessCommand.java:115)
        at 
org.apache.maven.surefire.booter.CommandReader$CommandRunnable.run(CommandReader.java:391)
        at java.lang.Thread.run(Thread.java:748)
"Service Thread" #7 daemon prio=9 os_prio=0 tid=0x00007f0694146800 nid=0xc8f 
runnable [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE
"C1 CompilerThread1" #6 daemon prio=9 os_prio=0 tid=0x00007f0694144000 
nid=0xc8e waiting on condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE
"C2 CompilerThread0" #5 daemon prio=9 os_prio=0 tid=0x00007f0694141000 
nid=0xc8d waiting on condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE
"Signal Dispatcher" #4 daemon prio=9 os_prio=0 tid=0x00007f069413f800 nid=0xc8c 
runnable [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE
"Finalizer" #3 daemon prio=8 os_prio=0 tid=0x00007f0694107000 nid=0xc8b in 
Object.wait() [0x00007f067e57a000]
   java.lang.Thread.State: WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:143)
        - locked <0x000000008064f980> (a java.lang.ref.ReferenceQueue$Lock)
        at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:164)
        at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:209)
"Reference Handler" #2 daemon prio=10 os_prio=0 tid=0x00007f0694102800 
nid=0xc8a in Object.wait() [0x00007f067e67b000]
   java.lang.Thread.State: WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        at java.lang.Object.wait(Object.java:502)
        at java.lang.ref.Reference.tryHandlePending(Reference.java:191)
        - locked <0x000000008064fbb0> (a java.lang.ref.Reference$Lock)
        at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:153)
"main" #1 prio=5 os_prio=0 tid=0x00007f069400a000 nid=0xc85 waiting on 
condition [0x00007f069ab0b000]
   java.lang.Thread.State: WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for  <0x0000000080082c78> (a 
java.util.concurrent.Semaphore$FairSync)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
        at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
        at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
        at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
        at java.util.concurrent.Semaphore.acquire(Semaphore.java:312)
        at 
org.apache.maven.surefire.junitcore.pc.ThreadResourcesBalancer.acquirePermit(ThreadResourcesBalancer.java:76)
        at 
org.apache.maven.surefire.junitcore.pc.Scheduler.schedule(Scheduler.java:363)
        at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
        at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
        at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
        at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
        at org.junit.runners.Suite.runChild(Suite.java:128)
        at org.junit.runners.Suite.runChild(Suite.java:27)
        at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
        at 
org.apache.maven.surefire.junitcore.pc.Scheduler$1.run(Scheduler.java:410)
        at 
org.apache.maven.surefire.junitcore.pc.InvokerStrategy.schedule(InvokerStrategy.java:54)
        at 
org.apache.maven.surefire.junitcore.pc.Scheduler.schedule(Scheduler.java:367)
        at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
        at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
        at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
        at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
        at org.junit.runners.Suite.runChild(Suite.java:128)
        at org.junit.runners.Suite.runChild(Suite.java:27)
        at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
        at 
org.apache.maven.surefire.junitcore.pc.Scheduler$1.run(Scheduler.java:410)
        at 
org.apache.maven.surefire.junitcore.pc.InvokerStrategy.schedule(InvokerStrategy.java:54)
        at 
org.apache.maven.surefire.junitcore.pc.Scheduler.schedule(Scheduler.java:367)
        at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
        at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
        at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
        at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
        at 
org.apache.maven.surefire.junitcore.pc.ParallelComputerBuilder$PC$1.run(ParallelComputerBuilder.java:593)
        at org.apache.maven.surefire.junitcore.JUnitCore.run(JUnitCore.java:55)
        at 
org.apache.maven.surefire.junitcore.JUnitCoreWrapper.createRequestAndRun(JUnitCoreWrapper.java:137)
        at 
org.apache.maven.surefire.junitcore.JUnitCoreWrapper.executeEager(JUnitCoreWrapper.java:107)
        at 
org.apache.maven.surefire.junitcore.JUnitCoreWrapper.execute(JUnitCoreWrapper.java:83)
        at 
org.apache.maven.surefire.junitcore.JUnitCoreWrapper.execute(JUnitCoreWrapper.java:75)
        at 
org.apache.maven.surefire.junitcore.JUnitCoreProvider.invoke(JUnitCoreProvider.java:159)
        at 
org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:373)
        at 
org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:334)
        at 
org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:119)
        at 
org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:407)
{code}

> Some tests can be lost when running in parallel with parameterized tests
> ------------------------------------------------------------------------
>
>                 Key: SUREFIRE-1264
>                 URL: https://issues.apache.org/jira/browse/SUREFIRE-1264
>             Project: Maven Surefire
>          Issue Type: Bug
>          Components: Junit 4.7+ (parallel) support
>    Affects Versions: 2.19.1
>         Environment: Maven 3.3.9
> Java 1.8.0_45 (Oracle)
> System: OS X
> Reproduced on Linux too, with both OpenJDK 7 and Java 7 from Oracle.
>            Reporter: Jean-Luc Derrien
>            Assignee: Tibor Digana
>             Fix For: 2.20.1
>
>         Attachments: failure-2.21-snapshot-SUREFIRE-1264.zip, pom.xml
>
>          Time Spent: 144h
>  Remaining Estimate: 24h
>
> Hello,
> It appears some tests can be lost when using the parallel mode with 
> parameterized tests. Here is a [small 
> project|https://github.com/jderrien/surefire-junit-tests/tree/simple-1] I've 
> used to try to reproduce the issue we have seen.
> This is not something that happens on every run, but it's quite frequent.
> With this loop, the problem should appear in less than 2 minutes, maybe on 
> the first run when (un)lucky:
> {code}
> time while true; do mvn clean test > last.log ; tail -25 last.log ; if [ 
> "$(grep -c 'Tests run: 12' last.log)" == "0" ]; then break; fi ; done
> {code}
> Normal run:
> {noformat}
> -------------------------------------------------------
>  T E S T S
> -------------------------------------------------------
> Running [p2]
> com.appnexus.viewability.core.surefireJunitTests.ATest.methodA1[p2] - p2 - 
> sleeptime = 1 => start
> com.appnexus.viewability.core.surefireJunitTests.ATest.methodA1[p2] - p2 - 
> sleeptime = 1 => stop
> com.appnexus.viewability.core.surefireJunitTests.ATest.methodA1[p1] - p1 - 
> sleeptime = 2 => start
> com.appnexus.viewability.core.surefireJunitTests.ATest.methodA1[p1] - p1 - 
> sleeptime = 2 => stop
> com.appnexus.viewability.core.surefireJunitTests.ATest.methodA1[p0] - p0 - 
> sleeptime = 5 => start
> com.appnexus.viewability.core.surefireJunitTests.ATest.methodA1[p0] - p0 - 
> sleeptime = 5 => stop
> Tests run: 6, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 5.006 sec - 
> in [p2]
> Running [p2]
> Tests run: 6, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.011 sec - 
> in [p2]
> Results :
> Tests run: 12, Failures: 0, Errors: 0, Skipped: 0
> {noformat}
> When tests have been lost (here one test lost according to the output):
> {noformat}
> -------------------------------------------------------
>  T E S T S
> -------------------------------------------------------
> Running [p2]
> com.appnexus.viewability.core.surefireJunitTests.ATest.methodA1[p0] - p0 - 
> sleeptime = 5 => start
> com.appnexus.viewability.core.surefireJunitTests.ATest.methodA1[p2] - p2 - 
> sleeptime = 1 => start
> com.appnexus.viewability.core.surefireJunitTests.ATest.methodA1[p1] - p1 - 
> sleeptime = 2 => start
> Tests run: 5, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.007 sec - 
> in [p2]
> Running [p2]
> Tests run: 6, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.012 sec - 
> in [p2]
> Results :
> Tests run: 11, Failures: 0, Errors: 0, Skipped: 0
> {noformat}
> Note: there are 3 test classes and 18 tests on the [master 
> branch|https://github.com/jderrien/surefire-junit-tests/tree/master] and it's 
> even more frequent/easy to reproduce.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

Reply via email to