On Fri, Jun 24, 2016 at 5:31 PM, Mark Thomas <ma...@apache.org> wrote:
> On 24/06/2016 21:52, therealnewo...@gmail.com wrote:
>
>>> I'm wondering if this is a problem that builds up over time. If I
>>> understood your previous posts correctly, running the big tests
>>> immediately gave ~700MB/s, whereas running the small tests and then the
>>> big tests resulted in ~350MB/s during the big tests. Are you able to
>>> experiment with this a little bit? For example, if you do big tests, 1M
>>> (~20%) small tests, big tests, 1M small tests, big tests, etc., what is
>>> the data rate for the big tests after 0, 1M, 2M, 3M, 4M and 5M little tests?
>>
>> Sure, I can try that. For the in-between tests, do you want me to run
>> those for a set amount of time or a set number of files? For example,
>> each smaller batch for 20 min, then 10 min of large, and then the next
>> smaller size?
>
> I was thinking a set number of files.
>
> It would also be useful to know how many threads the executor has created
> at each point. (JMX should tell you that. You might need to adjust the
> executor so it doesn't stop idle threads.)

I saw your message about not stopping idle threads after I already
started things.

1st 100M test:
    851.348 MB/s
    Executor:
        largestPoolSize: 25
        poolSize: 25
1st 4k test:
    Executor:
        largestPoolSize: 401
        poolSize: 401
2nd 100M test:
    460.147 MB/s
    Executor:
        largestPoolSize: 401
        poolSize: 79
2nd 4k test:
    Executor:
        largestPoolSize: 414
        poolSize: 414
3rd 100M test:
    429.127 MB/s
    Executor:
        largestPoolSize: 414
        poolSize: 80
3rd 4k test:
    Executor:
        largestPoolSize: 414
        poolSize: 401
4th 100M test:
    437.918 MB/s
    Executor:
        largestPoolSize: 414
        poolSize: 86
4th 4k test:
    Executor:
        largestPoolSize: 414
        poolSize: 401
5th 100M test:
    464.837 MB/s
    Executor:
        largestPoolSize: 414
        poolSize: 87

It looks like the problem occurs right after the first set of 4k puts
and doesn't get any worse, so whatever causes the issue happens early.
This is getting stranger, and I really cannot explain why calling
ERR_remove_thread_state reliably improves performance.
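
For reference, my understanding is that on OpenSSL 1.0.x
ERR_remove_thread_state(NULL) simply deletes the calling thread's ERR_STATE
entry from the same global error-state hash that ERR_get_state() consults.
The pattern would be roughly the sketch below; do_ssl_work() and worker()
are made-up placeholders, not the actual connector code:

#include <pthread.h>
#include <openssl/err.h>

/* Made-up placeholder for whatever SSL work a connector thread does. */
static void do_ssl_work(void)
{
    ERR_clear_error();
    /* ... SSL_read()/SSL_write() etc. ... */
}

static void *worker(void *arg)
{
    (void) arg;
    do_ssl_work();
    /* OpenSSL 1.0.x: drop this thread's ERR_STATE entry from the global
     * error-state hash before the thread goes idle or exits. */
    ERR_remove_thread_state(NULL);
    return NULL;
}

int main(void)
{
    pthread_t t;
    pthread_create(&t, NULL, worker, NULL);
    pthread_join(t, NULL);
    return 0;
}

As far as I can tell, 1.1.0 moves that per-thread state into thread-local
storage and deprecates ERR_remove_thread_state(), so this should only matter
on 1.0.x.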

> Going back to your original description, you said you saw blocking
> during the call to ERR_clear_err(). Did you mean ERR_clear_error()?
> Either way, could you provide the full stack trace of an example blocked
> thread? And, ideally, the stack trace of the thread currently holding
> the lock? I'm still trying to understand what is going on here since
> based on my understanding of the code so far, the HashMap is bounded (to
> the number of threads) and should reach that limit fairly quickly.

Sorry, yes I did mean ERR_clear_error().

#0  __lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
#1  0x00007f0f61ab989d in __GI___pthread_mutex_lock (mutex=0x2b49c58)
at ../nptl/pthread_mutex_lock.c:80
#2  0x00007f0f3205f183 in int_thread_get (create=0) at err.c:446
#3  0x00007f0f3205f68d in int_thread_get_item (d=0x7f0ca89c7ce0) at err.c:491
#4  0x00007f0f32060094 in ERR_get_state () at err.c:1014
#5  0x00007f0f320602cf in ERR_clear_error () at err.c:747
#6  0x00007f0f325f3579 in ssl_socket_recv (sock=0x7f0dcc391980,
    buf=0x7f0eec067820
"lock->199808-Source_filename->rhino_perf_https_lt_100g_a-Loop->1-Count->11089487-11089488-11089489-11089490-11089491-11089492-11089493-11089494-11089495-11089496-11089497-11089498-11089499-11089500-11"...,
    len=0x7f0ca89c7ff0) at src/sslnetwork.c:401
#7  0x00007f0f325ece99 in Java_org_apache_tomcat_jni_Socket_recvbb
(e=<optimized out>, o=<optimized out>, sock=<optimized out>,
offset=<optimized out>, len=<optimized out>) at src/network.c:957

I tried getting more data, but the JVM tends to dump core when gdb is
attached, or runs too slowly to actually cause the lock contention. I
can reliably see a thread waiting on this lock if I attach to a single
thread, randomly interrupt it, and look at the backtrace. When I look
at the mutex it has a different owner each time, so different threads
are getting the lock.
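
If it helps, the contention ought to be reproducible outside the JVM with
something like the standalone sketch below. It just hammers ERR_clear_error()
from a bunch of threads after installing a 1.0.x-style locking callback
(which I believe tomcat-native also does); the thread and iteration counts
are arbitrary. Build with: gcc repro.c -lcrypto -lpthread

#include <stdlib.h>
#include <pthread.h>
#include <openssl/crypto.h>
#include <openssl/err.h>

#define NTHREADS   32         /* arbitrary */
#define ITERATIONS 5000000L   /* arbitrary */

static pthread_mutex_t *locks;

/* 1.0.x-style locking callback; without one, OpenSSL's internal
 * CRYPTO_w_lock() calls are no-ops and there is nothing to contend on. */
static void locking_cb(int mode, int n, const char *file, int line)
{
    (void) file; (void) line;
    if (mode & CRYPTO_LOCK)
        pthread_mutex_lock(&locks[n]);
    else
        pthread_mutex_unlock(&locks[n]);
}

static void *worker(void *arg)
{
    long i;
    (void) arg;
    for (i = 0; i < ITERATIONS; i++) {
        /* Each call goes ERR_clear_error() -> ERR_get_state() ->
         * int_thread_get(), which takes the single global error-state lock. */
        ERR_clear_error();
    }
    return NULL;
}

int main(void)
{
    int i, n = CRYPTO_num_locks();
    pthread_t t[NTHREADS];

    locks = malloc(n * sizeof(*locks));
    for (i = 0; i < n; i++)
        pthread_mutex_init(&locks[i], NULL);
    CRYPTO_set_locking_callback(locking_cb);

    for (i = 0; i < NTHREADS; i++)
        pthread_create(&t[i], NULL, worker, NULL);
    for (i = 0; i < NTHREADS; i++)
        pthread_join(t[i], NULL);
    return 0;
}

Profiling that (perf, or just interrupting it in gdb) should show the same
pthread_mutex_lock()/int_thread_get() hot spot without a JVM in the way.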

I will play with this a bit more on Monday.

-nate
