I did another run for "ant test" for 7.0.35 and when using APR I got the
following error the first time (never saw that before):

    [junit] Running org.apache.jasper.compiler.TestParser

...

    [junit] Jan 18, 2013 4:37:56 PM
org.apache.catalina.core.AprLifecycleListener init
    [junit] INFO: Loaded APR based Apache Tomcat Native library 1.1.24
using APR version 1.4.6.
    [junit] Jan 18, 2013 4:37:56 PM
org.apache.catalina.core.AprLifecycleListener init
    [junit] INFO: APR capabilities: IPv6 [true], sendfile [true], accept
filters [false], random [true].
    [junit] Jan 18, 2013 4:37:57 PM
org.apache.catalina.core.AprLifecycleListener initializeSSL
    [junit] INFO: OpenSSL successfully initialized (OpenSSL 1.0.1c 10
May 2012)
    [junit] Jan 18, 2013 4:37:57 PM org.apache.coyote.AbstractProtocol init
    [junit] INFO: Initializing ProtocolHandler ["http-apr-127.0.0.1-auto-2"]
    [junit] Jan 18, 2013 4:37:57 PM
org.apache.catalina.core.StandardService startInternal
    [junit] INFO: Starting service Tomcat
    [junit] Jan 18, 2013 4:37:57 PM
org.apache.catalina.core.StandardEngine startInternal
    [junit] INFO: Starting Servlet Engine: Apache Tomcat/7.0.35
    [junit] Jan 18, 2013 4:37:57 PM
org.apache.catalina.startup.ContextConfig getDefaultWebXmlFragment
    [junit] INFO: No global web.xml found
    [junit] Jan 18, 2013 4:37:57 PM org.apache.coyote.AbstractProtocol start
    [junit] INFO: Starting ProtocolHandler
["http-apr-127.0.0.1-auto-2-42652"]
    [junit] Jan 18, 2013 4:37:59 PM org.apache.coyote.AbstractProtocol pause
    [junit] INFO: Pausing ProtocolHandler
["http-apr-127.0.0.1-auto-2-42652"]
    [junit] Jan 18, 2013 4:37:59 PM
org.apache.catalina.core.StandardService stopInternal
    [junit] INFO: Stopping service Tomcat
    [junit] Jan 18, 2013 4:37:59 PM org.apache.coyote.AbstractProtocol stop
    [junit] INFO: Stopping ProtocolHandler
["http-apr-127.0.0.1-auto-2-42652"]
    [junit] Jan 18, 2013 4:37:59 PM
org.apache.tomcat.util.net.AprEndpoint$Poller doPoll
    [junit] SEVERE: Critical poller failure (restarting poller): [9] Bad
file number

            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^


OS is Solaris 10, using Java 1.6.0_37.

The ant and Tomcat processes are still there to investigate, Tomcat is
looping on CPU in Thread 26. Stack is:

-----------------  lwp# 26 / thread# 26  --------------------
 fe024ccc apr_pool_destroy (8a6da0, b667f830, 0, 0, 16ed48, 13040d8) + 184
 fc00d700 * org/apache/tomcat/jni/Pool.destroy(J)V+1478
 fc00d6ac * org/apache/tomcat/jni/Pool.destroy(J)V+0
 fc005ab0 * org/apache/tomcat/util/net/AprEndpoint$Poller.destroy()V+68
(line 2347)
 fc005ab0 * org/apache/tomcat/util/net/AprEndpoint$Poller.doPoll(J)Z+292
(line 2688)
 fc005868 * org/apache/tomcat/util/net/AprEndpoint$Poller.run()V+345
(line 2629)

In the java world it is:

"http-apr-127.0.0.1-auto-2-Poller-0" daemon prio=3 tid=0x00645400
nid=0x1a runnable [0xb667f000]
   java.lang.Thread.State: RUNNABLE
        at org.apache.tomcat.jni.Pool.destroy(Native Method)
        at
org.apache.tomcat.util.net.AprEndpoint$Poller.destroy(AprEndpoint.java:1177)
        at
org.apache.tomcat.util.net.AprEndpoint$Poller.doPoll(AprEndpoint.java:1362)
        - locked <0xe662d6a0> (a
org.apache.tomcat.util.net.AprEndpoint$Poller)
        at
org.apache.tomcat.util.net.AprEndpoint$Poller.run(AprEndpoint.java:1310)


main thread is at:

"main" prio=3 tid=0x00031400 nid=0x2 waiting for monitor entry [0xfe37e000]
   java.lang.Thread.State: BLOCKED (on object monitor)
        at java.lang.Object.wait(Native Method)
        - waiting on <0xe662d6a0> (a
org.apache.tomcat.util.net.AprEndpoint$Poller)
        at java.lang.Thread.join(Thread.java:1194)
        - locked <0xe662d6a0> (a
org.apache.tomcat.util.net.AprEndpoint$Poller)
        at
org.apache.tomcat.util.net.AprEndpoint$Poller.destroy(AprEndpoint.java:1183)
        at
org.apache.tomcat.util.net.AprEndpoint.stopInternal(AprEndpoint.java:685)
        at
org.apache.tomcat.util.net.AbstractEndpoint.stop(AbstractEndpoint.java:664)
        at
org.apache.coyote.AbstractProtocol.stop(AbstractProtocol.java:488)
        at
org.apache.catalina.connector.Connector.stopInternal(Connector.java:1037)
        at
org.apache.catalina.util.LifecycleBase.stop(LifecycleBase.java:232)


The code is looping in allocator_free, because it tries to traverse a
pointer chain that is cyclic:

(gdb) print *node
$5 = {next = 0x13040d8, ref = 0x678748, index = 1, free_index = 1,
first_avail = 0x8a6e50 "", endp = 0x8a8d88 ""}
(gdb) print *node->next
$6 = {next = 0x8a6d88, ref = 0x13040d8, index = 1, free_index = 109,
first_avail = 0x1304418 "\001\060D ", endp = 0x13060d8 ""}
(gdb) print *node->next->next
$7 = {next = 0x13040d8, ref = 0x678748, index = 1, free_index = 1,
first_avail = 0x8a6e50 "", endp = 0x8a8d88 ""}

So node->next->next == node.

This is an indication for a synchronization problem in the code using
the pool.

The allocator is:

(gdb) print *allocator
$8 = {max_index = 1, max_free_index = 0, current_free_index = 0, mutex =
0x32f5b8, owner = 0x32f578, free = {0x678748, 0x8a6d88, 0x0 <repeats 18
times>}}


and the pool is:

(gdb) print *pool
$9 = {parent = 0x49c478, child = 0x0, sibling = 0x0, ref = 0xe78d20,
cleanups = 0x0, free_cleanups = 0x0, allocator = 0x16ed48, subprocesses
= 0x0, abort_fn = 0,
  user_data = 0x0, tag = 0x0, active = 0x678748, self = 0x8a6d88,
self_first_avail = 0x8a6de0 "", pre_cleanups = 0x0}

It should be the "serverSockPool" in AprEndpoint$Poller.

Walking up the pool hierarchy:

(gdb) print *pool->parent
$11 = {parent = 0x479830, child = 0x1265170, sibling = 0x0, ref =
0x1018370, cleanups = 0x0, free_cleanups = 0x0, allocator = 0x16ed48,
subprocesses = 0x0, abort_fn = 0,
  user_data = 0x0, tag = 0x0, active = 0x49c460, self = 0x49c460,
self_first_avail = 0x49c4b8 "", pre_cleanups = 0x0}

(gdb) print *pool->parent->parent
$12 = {parent = 0x54ac40, child = 0x1018368, sibling = 0x0, ref =
0x54ac44, cleanups = 0x0, free_cleanups = 0x0, allocator = 0x16ed48,
subprocesses = 0x0, abort_fn = 0,
  user_data = 0x0, tag = 0x0, active = 0x479818, self = 0x479818,
self_first_avail = 0x479870 "", pre_cleanups = 0x0}

(gdb) print *pool->parent->parent->parent
$13 = {parent = 0x32f578, child = 0x479830, sibling = 0xb61a90, ref =
0x32f57c, cleanups = 0x54b4f8, free_cleanups = 0x0, allocator =
0x16ed48, subprocesses = 0x0,
  abort_fn = 0, user_data = 0x0, tag = 0x0, active = 0x54ac28, self =
0x54ac28, self_first_avail = 0x54ac80 "/dev/urandom", pre_cleanups = 0x0}

(gdb) print *pool->parent->parent->parent->parent
$14 = {parent = 0x0, child = 0x54ac40, sibling = 0x0, ref = 0x0,
cleanups = 0x32f5d8, free_cleanups = 0x0, allocator = 0x16ed48,
subprocesses = 0x0, abort_fn = 0,
  user_data = 0x0, tag = 0xfe032a60 "apr_global_pool", active =
0x32f560, self = 0x32f560, self_first_avail = 0x32f5b8 "", pre_cleanups
= 0x0}


Regards,

Rainer

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@tomcat.apache.org
For additional commands, e-mail: dev-h...@tomcat.apache.org

Reply via email to