Re: Bug that spans tomcat and tomcat-native
On Fri, Jun 24, 2016 at 6:17 AM, Rémy Maucherat wrote:
> 2016-06-24 12:08 GMT+02:00 Mark Thomas :
>
>> Thanks.
>>
>> I'm going to start some local performance testing to confirm I see
>> similar results and, assuming I do, I'll start looking at fixing this
>> for 1.2.x/9.0.x and back-porting.
>
> Hum, the fix that was submitted doesn't make sense IMO since writes can be
> async, so I don't see a way besides adding the "error clear" thing after
> each operation [and we'll remove it once OpenSSL 1.1 is there if it
> actually fixes it]. That's assuming this issue is real [I actually never
> noticed anything during my many ab runs and they use a lot of threads, so I
> have a hard time believing it is significant enough ;) ].

One thing about the system on which this is running is that it has a 10G
NIC. The slow case is about 350MB/s and the fast one is 700MB/s, so you
would need a 10G interface (or loopback) to even notice the issue,
assuming the CPU on the system can push that much encrypted data.

-nate
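For scale: 700MB/s is roughly 5.6Gbit/s, more than half the ~1.25GB/s
ceiling of a 10GbE link, while plain gigabit Ethernet tops out around
125MB/s. On typical test hardware the network is the bottleneck long
before OpenSSL is, which is consistent with the regression not showing
up in ordinary ab runs.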
Re: Bug that spans tomcat and tomcat-native
On Fri, Jun 24, 2016 at 11:18 AM, Mark Thomas wrote:
> On 24/06/2016 11:17, Rémy Maucherat wrote:
> [...]
>
> I haven't been able to reproduce anything like this yet. So far I have
> only been testing with tc-native 1.2.x and Tomcat 9.0.x. I might need to
> test with 1.1.x and Tomcat 7.0.x, the versions used by the OP.
>
> I'm having trouble understanding how this is happening. I could imagine
> that HashMap becoming a problem if there was a high churn in Threads.
> I'm thinking of something like bursty traffic levels and an executor
> aggressively halting spare threads. I need to experiment with that as well.
>
> Nate,
>
> We need as much information as you can provide on how to reproduce this.
> As a minimum we need to know:
> - Connector configuration from server.xml
> - Operating system
> - How tc-native was built
> - Exact versions for everything
>
> We need enough information to recreate the test and the results
> that you obtained.

Connector configuration: keepalive is enabled.
OS: Fedora 22
tc-native: tomcat-native-1.1.34-1.fc22.x86_64
tomcat: tomcat-7.0.68-3.fc22.noarch

This issue was also seen in older versions of tomcat:
tomcat-native-1.1.30-2.fc21 and tomcat-7.0.54-3.fc21.

All of the builds are the RPMs released by Fedora from their build
machines.

The test I ran performed about 5 million 4k requests and then did the
large 100M requests, and was able to see the issue immediately.

-nate
Re: Bug that spans tomcat and tomcat-native
On Fri, Jun 24, 2016 at 11:37 AM, wrote:
> [...]
> OS: Fedora 22
> tc-native: tomcat-native-1.1.34-1.fc22.x86_64
> tomcat: tomcat-7.0.68-3.fc22.noarch
>
> This issue was seen in older versions of tomcat:
> tomcat-native-1.1.30-2.fc21 and tomcat-7.0.54-3.fc21

I forgot to give you the OpenSSL version:
openssl-1.0.1k-15.fc22.x86_64

-nate
Re: Bug that spans tomcat and tomcat-native
On Fri, Jun 24, 2016 at 6:17 AM, Rémy Maucherat wrote:
> [...]
> Hum, the fix that was submitted doesn't make sense IMO since writes can be
> async, so I don't see a way besides adding the "error clear" thing after
> each operation [and we'll remove it once OpenSSL 1.1 is there if it
> actually fixes it].

I was not using async I/O, so I did not account for that in my patch. It
was more a case of seeing whether I could resolve this issue for my use
case.

-nate
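The submitted patch itself is not shown in this thread. As a rough,
hypothetical sketch of the idea under discussion, clearing a worker
thread's OpenSSL error state so its entry leaves the per-thread hash map:

#include <openssl/err.h>

/* Hypothetical sketch (the actual patch is not shown in this thread):
 * before a connection-processing thread goes back to the pool, drop
 * its entry from OpenSSL's per-thread error-state hash map.  Passing
 * NULL means "the calling thread" (OpenSSL 1.0.x API, superseded in
 * 1.1.0). */
static void release_openssl_error_state(void)
{
    ERR_remove_thread_state(NULL);
}

ERR_remove_thread_state(NULL) acts on the calling thread, so it only
helps if it runs on the worker thread itself, not on some separate
cleanup thread.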
Re: Bug that spans tomcat and tomcat-native
On Fri, Jun 24, 2016 at 11:18 AM, Mark Thomas wrote:
> [...]
> I'm having trouble understanding how this is happening. I could imagine
> that HashMap becoming a problem if there was a high churn in Threads.
> I'm thinking of something like bursty traffic levels and an executor
> aggressively halting spare threads. I need to experiment with that as well.

I do not understand it either. With the thread pool there is not much
thread churn, so I am not sure why the problem gets as bad as it does.

I didn't look into what the hash table actually had in it. I just
noticed that the majority of a read thread's time was spent waiting for
the lock to access this hash table. Once I added the call to
ERR_remove_thread_state() the waiting basically disappeared.

For this test the traffic is constant. Each client thread creates one
connection and just keeps pushing requests for a set number of requests,
so we aren't even creating new connections.

-nate
Re: Bug that spans tomcat and tomcat-native
On Fri, Jun 24, 2016 at 2:07 PM, Mark Thomas wrote:
> On 24/06/2016 18:41, Nate Clark wrote:
>> On Fri, Jun 24, 2016 at 1:37 PM, Nate Clark wrote:
>>> On Fri, Jun 24, 2016 at 1:27 PM, Mark Thomas wrote:
>>>> On 24/06/2016 18:25, Mark Thomas wrote:
>>>>> Can you provide the settings you are using for the Executor as well
>>>>> please?
>>>
>>> <Executor ... maxThreads="500" minSpareThreads="4"/>
>>>
>>>> And how long do the initial 5,000,000 4k requests take to process?
>>>
>>> 40 minutes.
>>>
>> Not sure this matters but I just double checked and there are actually
>> 400 threads in total doing the 4k PUTs. Two clients each doing 200
>> threads. The 100MB test is 24 threads total, 12 per client machine.
>>
>> Sorry for the misinformation earlier.
>
> No problem. Thanks for the information. One last question (for now).
> How many processors / cores / threads does the server support? I'm
> trying to get a handle on what the concurrency looks like.

The machine has two physical chips, each with 6 cores and
hyper-threading enabled, so 24 logical CPUs exposed to the OS.

cpuinfo for the first core:

processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 63
model name      : Intel(R) Xeon(R) CPU E5-2620 v3 @ 2.40GHz
stepping        : 2
microcode       : 0x2e
cpu MHz         : 1212.656
cache size      : 15360 KB
physical id     : 0
siblings        : 12
core id         : 0
cpu cores       : 6
apicid          : 0
initial apicid  : 0
fpu             : yes
fpu_exception   : yes
cpuid level     : 15
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm ida arat epb pln pts dtherm tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid cqm xsaveopt cqm_llc cqm_occup_llc
bugs            :
bogomips        : 4788.98
clflush size    : 64
cache_alignment : 64
address sizes   : 46 bits physical, 48 bits virtual
power management:

If it matters, the system also has 256GB of memory.

-nate
Re: Bug that spans tomcat and tomcat-native
On Fri, Jun 24, 2016 at 3:21 PM, Mark Thomas wrote:
> On 24/06/2016 20:01, therealnewo...@gmail.com wrote:
> [...]
>> The machine has two physical chips each with 6 cores and
>> hyper-threading enabled, so 24 cores exposed to the OS.
>
> Thanks.
>
>> If it matters the system also has 256GB of memory.
>
> I don't think RAM is playing a role here but it is still good to know.
>
> In terms of next steps, I want to see if I can come up with a theory
> that matches what you are observing. From that we can then assess
> whether the proposed patch can be improved.
>
> Apologies for the drip-feeding of questions. As I learn a bit more, a
> few more questions come to mind.
>
> I'm wondering if this is a problem that builds up over time. If I
> understood your previous posts correctly, running the big tests
> immediately gave ~700MB/s whereas running the small tests then the big
> tests resulted in ~350MB/s during the big tests. Are you able to
> experiment with this a little bit? For example, if you do big tests, 1M
> (~20%) small tests, big tests, 1M small tests, big tests etc. What is
> the data rate for the big tests after 0, 1M, 2M, 3M, 4M and 5M little
> tests?

Sure, I can try that. For the in-between tests do you want me to run
those for a set amount of time or a number of files? Like each smaller
batch for 20min, then 10min of large, and then the next smaller batch?

> What I am trying to pin down is how quickly does this problem build up.
>
> Also, do you see any failed requests or do they all succeed?

All successes.

-nate
Re: Bug that spans tomcat and tomcat-native
On Fri, Jun 24, 2016 at 4:52 PM, wrote:
> [...]
>> I'm wondering if this is a problem that builds up over time. [...] For
>> example, if you do big tests, 1M (~20%) small tests, big tests, 1M
>> small tests, big tests etc. What is the data rate for the big tests
>> after 0, 1M, 2M, 3M, 4M and 5M little tests.
>
> Sure I can try that. For the in between tests do you want me to run
> those for a set amount of time or number of files? Like each smaller
> batch like 20min and then 10min of large and then next smaller size?

Ignore that question. I misinterpreted your 1M to be 1MB.

-nate
Re: Bug that spans tomcat and tomcat-native
On Fri, Jun 24, 2016 at 5:31 PM, Mark Thomas wrote:
> On 24/06/2016 21:52, therealnewo...@gmail.com wrote:
> [...]
>> Sure I can try that. For the in between tests do you want me to run
>> those for a set amount of time or number of files? Like each smaller
>> batch like 20min and then 10min of large and then next smaller size?
>
> I was thinking a set number of files.
>
> It would also be useful to know how many threads the executor has
> created at each point as well. (JMX should tell you that. You might
> need to adjust the executor so it doesn't stop idle threads.)

I saw your message about not stopping idle threads after I had already
started things.

1st 100M test: 851.348MB/s
Executor: largestPoolSize: 25, poolSize: 25

1st 4k test
Executor: largestPoolSize: 401, poolSize: 401

2nd 100M test: 460.147MB/s
Executor: largestPoolSize: 401, poolSize: 79

2nd 4k test
Executor: largestPoolSize: 414, poolSize: 414

3rd 100M test: 429.127MB/s
Executor: largestPoolSize: 414, poolSize: 80

3rd 4k test
Executor: largestPoolSize: 414, poolSize: 401

4th 100M test: 437.918MB/s
Executor: largestPoolSize: 414, poolSize: 86

4th 4k test
Executor: largestPoolSize: 414, poolSize: 401

5th 100M test: 464.837MB/s
Executor: largestPoolSize: 414, poolSize: 87

It looks like the problem occurs right after the first set of 4k PUTs
and doesn't get any worse, so whatever causes the issue happens early.
This is getting stranger and I really cannot explain why calling
ERR_remove_thread_state() reliably improves performance.

> Going back to your original description, you said you saw blocking
> during the call to ERR_clear_err(). Did you mean ERR_clear_error()?
> Either way, could you provide the full stack trace of an example blocked
> thread? And, ideally, the stack trace of the thread currently holding
> the lock? I'm still trying to understand what is going on here since,
> based on my understanding of the code so far, the HashMap is bounded (to
> the number of threads) and should reach that limit fairly quickly.

Sorry, yes I did mean ERR_clear_error().

#0 __lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
#1 0x7f0f61ab989d in __GI___pthread_mutex_lock (mutex=0x2b49c58) at ../nptl/pthread_mutex_lock.c:80
#2 0x7f0f3205f183 in int_thread_get (create=0) at err.c:446
#3 0x7f0f3205f68d in int_thread_get_item (d=0x7f0ca89c7ce0) at err.c:491
#4 0x7f0f32060094 in ERR_get_state () at err.c:1014
#5 0x7f0f320602cf in ERR_clear_error () at err.c:747
#6 0x7f0f325f3579 in ssl_socket_recv (sock=0x7f0dcc391980, buf=0x7f0eec067820 "lock->199808-Source_filename->rhino_perf_https_lt_100g_a-Loop->1-Count->11089487-11089488-11089489-11089490-11089491-11089492-11089493-11089494-11089495-11089496-11089497-11089498-11089499-11089500-11"..., len=0x7f0ca89c7ff0) at src/sslnetwork.c:401
#7 0x7f0f325ece99 in Java_org_apache_tomcat_jni_Socket_recvbb (e=<optimized out>, o=<optimized out>, sock=<optimized out>, offset=<optimized out>, len=<optimized out>) at src/network.c:957

I tried getting more data but the JVM tends to dump core when gdb is
attached, or is going too slowly to actually cause the lock contention.

I can reliably see a thread waiting on this lock if I attach to a single
thread and randomly interrupt it and look at the back trace. When I look
at the mutex it has a different owner each time, so different threads
are getting the lock. I will play with this a bit more on Monday.

-nate
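For reference, the pattern at frames #5 and #6 of that trace is
tc-native clearing the OpenSSL error queue before each SSL call. A
simplified sketch of that read path (not the verbatim src/sslnetwork.c
source):

#include <openssl/err.h>
#include <openssl/ssl.h>

/* Simplified sketch of the ssl_socket_recv() pattern: every read first
 * clears the per-thread error queue.  ERR_clear_error() calls
 * ERR_get_state(), which is where the profiled lock contention
 * (int_thread_get in err.c) occurs. */
static int ssl_recv_sketch(SSL *ssl, char *buf, int len)
{
    int rv;

    ERR_clear_error();                    /* takes the err-state lock */
    rv = SSL_read(ssl, buf, len);
    if (rv <= 0) {
        /* Map the failure; this may consult the error queue again. */
        int err = SSL_get_error(ssl, rv);
        (void)err;  /* ... retry on SSL_ERROR_WANT_READ, etc. ... */
    }
    return rv;
}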
Re: Bug that spans tomcat and tomcat-native
On Mon, Jun 27, 2016 at 11:54 AM, Rainer Jung wrote:
> Hi Mark,
>
> On 27.06.2016 at 15:11, Mark Thomas wrote:
>> I believe I have an explanation for what is going on that fits both the
>> reported behaviour and the proposed fix.
>>
>> Background
>> ==========
>>
>> OpenSSL tracks a list of the most recent errors for each thread in a
>> hash map keyed on the thread (int_thread_hash in err.c). Reading and
>> writing to this hash map is protected by a lock. The hash map is
>> created and populated lazily.
>>
>> tc-native calls ERR_clear_error() before every call to
>> SSL_do_handshake(), SSL_read() and SSL_write(). The call to
>> ERR_clear_error() either clears the error list for the current thread
>> or inserts a new empty list into the hash map if the thread is not
>> already present.
>>
>> The performance problem was tracked down to threads waiting in
>> ERR_clear_error() to obtain the write lock for the hash map.
>>
>> The proposed solution was to call ERR_remove_thread_state() just before
>> the current Tomcat thread processing the connection is returned to the
>> thread pool. This method removes the current thread and its associated
>> error list from the hash map.
>>
>> Analysis
>> ========
>>
>> The proposed solution, calling ERR_remove_thread_state(), adds a call
>> that also obtains the write lock for the hash map. This indicates that
>> the problem is not delays in obtaining the lock but contention for the
>> lock because one or more operations taking place within the lock are
>> taking a long time.
>>
>> Removing unused threads from the hash map removes the bottleneck. This
>> points towards the hash map being the source of the problem.
>>
>> Testing by the OP showed that as soon as a test had been run that
>> required ~400 concurrent threads, performance dropped significantly. It
>> did not get noticeably worse if the same 400 thread test was run
>> repeatedly.
>>
>> My testing indicated, on OSX at least, that the thread IDs used in the
>> hash map were stable and that uncontrolled growth of the hash map was
>> unlikely to be the cause.
>>
>> The manner in which thread IDs are generated varies by platform. On
>> Linux, where this problem was observed, the thread ID is derived from
>> (is normally equal to) the memory address of the per-thread errno
>> variable. This means that thread IDs tend to be concentrated in a
>> relatively narrow range of values. For example, in a simple 10 thread
>> test on OSX, thread IDs ranged from 123145344839680 to 123145354387455.
>>
>> Thread IDs therefore fall within a 10^7 range inside a possible range
>> of 1.8x10^19, i.e. a very small, contiguous range.
>>
>> Hash maps use hashing functions to ensure that entries are (roughly)
>> evenly distributed between the available buckets. The hash function,
>> err_state_hash, used for the thread IDs in OpenSSL is threadID * 13.
>>
>> Supposition
>> ===========
>>
>> The hash function used (multiply by 13) is insufficient to distribute
>> the resulting values across multiple buckets because they will still
>> fall in a relatively narrow band. Therefore all the threads end up in a
>> single bucket, which makes the performance of the hash map poor. This
>> in turn makes calls to int_thread_get_item() slow because it does a
>> hash map lookup. This lookup is performed with the read lock held for
>> the hash map, which in turn will slow down the calls that require the
>> write lock.
>>
>> Proposal
>> ========
>>
>> The analysis and supposition above need to be checked by someone with a
>> better understanding of C than me. Assuming my work is correct, the
>> next step is to look at possible fixes. I do not believe that patching
>> OpenSSL is a viable option.
>>
>> The OpenSSL API needs to be reviewed to see if there is a way to avoid
>> the calls that require the write lock.
>>
>> If the write lock cannot be avoided then we need to see if there is a
>> better place to call ERR_remove_thread_state(). I'd like to fix this
>> entirely in tc-native but that may mean calling
>> ERR_remove_thread_state() more frequently, which could create its own
>> performance problems.
>>
>> Nate - I may have some patches for you to test in the next few days.
>>
>> Mark
>
> Great analysis. I was really wondering what could make the hash map so
> huge and hadn't thought about the hash function as the problem.
>
> Before OpenSSL 1.1.0 there's a callback for applications to provide
> their own thread IDs:
>
> https://www.openssl.org/docs/man1.0.2/crypto/CRYPTO_THREADID_set_callback.html
>
> So we could probably work around the problem of the poor hashing
> function by passing in IDs that work for hashing (pre-hashed IDs?). Of
> course we then lose the direct association of the OpenSSL thread ID
> with the real platform thread ID.
>
> Currently our callback in tcnative is ssl_set_thread_id() which refers
> to ssl_thread_id(), which on Linux gets the ID from the APR function
> apr_os_thread_current(). So we could add some hashing formula in
> ssl_thread_id().
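Mark's supposition is easy to check numerically. The following
standalone program applies err_state_hash (threadID * 13) and the
128-bucket mask discussed later in the thread; the base address and the
8 MiB spacing are invented but representative of pointer-derived IDs:

#include <stdio.h>

/* Demonstration, not OpenSSL source: hash pointer-derived thread IDs
 * with (id * 13) and select one of 128 buckets.  The base address and
 * the 8 MiB spacing are hypothetical stand-ins for per-thread errno
 * addresses on Linux. */
int main(void)
{
    unsigned long base = 0x7f0f61ab0000UL;         /* hypothetical */
    int i;

    for (i = 0; i < 8; i++) {
        unsigned long id = base + i * 0x800000UL;  /* 8 MiB apart */
        printf("id=%#lx  bucket=%lu\n", id, (id * 13) & 0x7F);
    }
    return 0;
}

Every line prints bucket=0: once the IDs share 128-byte alignment,
multiplying by 13 cannot spread them, so all threads chain into one
bucket. Small sequential IDs (1, 2, 3, ...) would hit all 128 buckets,
since gcd(13, 128) = 1.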
Re: Bug that spans tomcat and tomcat-native
On Mon, Jun 27, 2016 at 12:04 PM, wrote:
> On Mon, Jun 27, 2016 at 11:54 AM, Rainer Jung wrote:
> [...]
>> Currently our callback in tcnative is ssl_set_thread_id() which refers
>> to ssl_thread_id(), which on Linux gets the ID from the APR function
>> apr_os_thread_current(). So we could add some hashing formula in
>> ssl_thread_id().
>
> I think just using the real thread id would work, since right now it
> isn't using the real thread id; instead it is using the address
> location of errno. If this were the real thread id I think the hash
> algorithm and bucket selection they have now would work much better,
> since the thread ids are basically numerically increasing and aren't
> aligned to powers of 2.

Sorry, I need to read the code a bit closer. I misread
OPENSSL_VERSION_NUMBER < 0x1010L as version < 1.0.1 and not 1.1.0. I
would still expect real thread ids to provide enough of a distribution
in a map where the hash bucket ends up being ((ID * 13) & 0x7F).

-nate
Re: Bug that spans tomcat and tomcat-native
On Tue, Jun 28, 2016 at 11:51 AM, Rainer Jung wrote:
> On 28.06.2016 at 16:07, Mark Thomas wrote:
>> On 28/06/2016 12:28, Mark Thomas wrote:
>>> On 28/06/2016 11:34, Rainer Jung wrote:
>>>> On 28.06.2016 at 11:15, Mark Thomas wrote:
>>>>> Index: src/ssl.c
>>>>> ===================================================================
>>>>> --- src/ssl.c    (revision 1750259)
>>>>> +++ src/ssl.c    (working copy)
>>>>> @@ -420,6 +420,10 @@
>>>>>          return psaptr->PSATOLD;
>>>>>  #elif defined(WIN32)
>>>>>          return (unsigned long)GetCurrentThreadId();
>>>>> +#elif defined(DARWIN)
>>>>> +        uint64_t tid;
>>>>> +        pthread_threadid_np(NULL, &tid);
>>>>> +        return (unsigned long)tid;
>>>>>  #else
>>>>>          return (unsigned long)(apr_os_thread_current());
>>>>>  #endif
>>>>>
>>>>> I want to do some similar testing for Linux before adding what I
>>>>> suspect will be a very similar block using gettid().
>>>>
>>>> We could either add something to configure.in. Untested:
>>>>
>>>> Index: native/configure.in
>>>> ===================================================================
>>>> --- native/configure.in (revision 1750462)
>>>> +++ native/configure.in (working copy)
>>>> @@ -218,6 +218,9 @@
>>>>      *-solaris2*)
>>>>          APR_ADDTO(TCNATIVE_LIBS, -lkstat)
>>>>          ;;
>>>> +    *linux*)
>>>> +        APR_ADDTO(CFLAGS, -DTCNATIVE_LINUX)
>>>> +        ;;
>>>>      *)
>>>>          ;;
>>>>      esac
>>>>
>>>> and then use a #ifdef TCNATIVE_LINUX, or we copy some other more
>>>> direct Linux check from e.g. APR: #ifdef __linux__
>>>>
>>>> The latter looks simpler, but the version above is based on all the
>>>> logic put into config.guess.
>>>
>>> I'd go with the __linux__ option as that is consistent with what we
>>> already use in os/unix/system.c
>>>
>>> I'm not against the change to configure.in, I just think we should be
>>> consistent with how we do this throughout the code base.
>>
>> I've confirmed that the same problem occurs with hash bucket selection
>> on Linux and that switching to gettid() fixes that problem.
>>
>> I'm going to go ahead with the 1.2.8 release shortly. We can continue
>> to refine this as necessary and have a more complete fix in 1.2.9.
>
> I did a quick check on Solaris. apr_os_thread_current() uses
> pthread_self on Solaris like on Linux (actually on any Unix type OS),
> but unlike Linux, where this returns an address which is either 32 or
> 64 bit aligned depending on address size, on Solaris you get an
> increasing number starting with 1 for the first thread, incremented by
> one for each following thread. Thread IDs do not get reused in the same
> process, even if the thread finished, but thread IDs are common between
> different processes, because they always start with 1. So Solaris
> should be fine as-is.

Does the value have a cap? If not then Solaris will just continue to use
more and more memory as threads are created over the lifetime of the
server.

-nate
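The Linux branch Mark alludes to is not shown in the thread; presumably
it looks something like the sketch below. At the time glibc had no
gettid() wrapper, so the raw syscall is the usual route (an assumption
on my part, not the committed code):

#include <unistd.h>
#include <sys/syscall.h>

/* Presumed shape of the matching Linux branch (an assumption, not the
 * committed code).  Kernel thread IDs are small, densely allocated
 * integers, which distribute far better through (id * 13) & 0x7F than
 * pointer-derived pthread_self() values. */
static unsigned long ssl_thread_id_linux(void)
{
    return (unsigned long)syscall(SYS_gettid);
}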
Re: svn commit: r1781952 - in /tomcat/native/trunk/native: include/ssl_private.h src/ssl.c
Mark,

If there is anything I can do to help work on the patch I will; however,
as I mentioned in the bug, I don't have a Windows environment, so I am
basically useless if that is where the issues exist.

-nate

On Mon, Feb 6, 2017 at 4:03 PM, Mark Thomas wrote:
> On 06/02/17 21:01, ma...@apache.org wrote:
>> Author: markt
>> Date: Mon Feb 6 21:01:09 2017
>> New Revision: 1781952
>>
>> URL: http://svn.apache.org/viewvc?rev=1781952&view=rev
>> Log:
>> Follow-up to r1781943
>> Fix build errors on Windows
>> Confirmed that terminated threads are removed from the hash
>
> The patch isn't quite there yet. It triggers a JVM crash on shutdown
> that I'm currently looking at.
>
> Mark
Re: svn commit: r1781952 - in /tomcat/native/trunk/native: include/ssl_private.h src/ssl.c
On Mon, Feb 6, 2017 at 6:08 PM, Mark Thomas wrote:
> On 06/02/17 22:55, Mark Thomas wrote:
>> On 06/02/17 21:20, therealnewo...@gmail.com wrote:
>>> Mark,
>>>
>>> If there is anything I can do to help work on the patch I will,
>>> however as I mentioned in the bug I don't have a windows environment
>>> so I am basically useless if that is where the issues exist.
>>
>> Yes, this is Windows.
>>
>> The problem with the original approach was that DLL_THREAD_DETACH was
>> being called for all threads - including JVM threads stopping after
>> the native library had effectively closed down. Hence the crash.
>>
>> I'm currently trying to use the same approach as used for Linux but
>> I'm not seeing the thread local being destroyed when the associated
>> thread exits. I'm still debugging why.
>>
>> Any hints, suggestions etc. welcome.
>
> Looking at the APR docs and source, the destructor function is only
> called when apr_threadkey_private_delete is called and I don't see that
> being called anywhere. How is this working on Linux? I suspect it
> isn't, but I haven't set up a build env to confirm that at this point.

It should work on Linux because pthreads guarantees that the destructor
will be called on thread exit if the value is not NULL. In fact, if you
call pthread_key_delete the destructor is explicitly not called, and it
is up to the caller to handle any cleanup.

I used OpenSSL's approach, which obviously does not use APR but uses
pthreads directly, and did a rough mapping onto APR's approach to thread
locals. Windows does not have the concept of a destructor for its normal
thread locals, so that is why OpenSSL used the thread-detach mechanism
and I did too. I do know that if you use Windows fibers instead of
threads there is a destructor, but I didn't think that was an option
with how tomcat-native was being used. I am not an expert, though.

-nate
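A minimal sketch of the pthreads mechanism Nate describes (assumed
names, not the actual tc-native patch):

#include <pthread.h>
#include <openssl/err.h>

static pthread_key_t err_state_key;

/* pthreads guarantees this destructor runs at thread exit for every
 * thread whose value for the key is non-NULL; pthread_key_delete()
 * would NOT run it. */
static void err_state_destructor(void *value)
{
    (void)value;
    ERR_remove_thread_state(NULL);  /* drop this thread's error state */
}

static void err_state_key_create(void)
{
    /* Must happen exactly once per process; see the apr_thread_once
     * discussion later in the thread. */
    pthread_key_create(&err_state_key, err_state_destructor);
}

static void err_state_register_thread(void)
{
    /* Any non-NULL value arms the destructor for the calling thread. */
    pthread_setspecific(err_state_key, (void *)1);
}

The destructor only fires for threads that have stored a non-NULL value,
so each worker must call err_state_register_thread() (or equivalent) at
least once before exiting.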
Re: svn commit: r1781952 - in /tomcat/native/trunk/native: include/ssl_private.h src/ssl.c
On Mon, Feb 6, 2017 at 6:08 PM, Mark Thomas wrote:
> [...]
> Looking at the APR docs and source, the destructor function is only
> called when apr_threadkey_private_delete is called and I don't see that
> being called anywhere. How is this working on Linux? I suspect it
> isn't, but I haven't set up a build env to confirm that at this point.

Looking at the APR source (threadproc/win32/threadpriv.c), the
destructor is actually thrown away on Windows, so that approach will not
work there. I think what you are looking at is the TlsAlloc in create
and the TlsFree in delete, which I don't think relate to the value
stored but to the key structure itself.

-nate
Re: [VOTE] Release Apache Tomcat Native 1.2.11
On Tue, Feb 7, 2017 at 3:30 PM, Mark Thomas wrote:
> Version 1.2.11 includes the following changes:
>
> - Update minimum recommended OpenSSL version to 1.0.2k
> - Windows binaries built with OpenSSL 1.0.2k
> - Better documentation for building on Windows
>   (including with FIPS enabled OpenSSL)
>
> The proposed release artefacts can be found at [1],
> and the build was done using tag [2].
>
> The Apache Tomcat Native 1.2.11 is
> [X] Stable, go ahead and release
> [ ] Broken because of ...
>
> Thanks,
>
> Mark
>
> [1] https://dist.apache.org/repos/dist/dev/tomcat/tomcat-connectors/native/1.2.11/
> [2] https://svn.apache.org/repos/asf/tomcat/native/tags/TOMCAT_NATIVE_1_2_11

Tested on Linux and works well for me.

-nate
Re: [VOTE] Release Apache Tomcat Native 1.2.11
On Wed, Feb 8, 2017 at 6:17 PM, Mark Thomas wrote:
> On 07/02/17 20:30, Mark Thomas wrote:
> [...]
>> The Apache Tomcat Native 1.2.11 is
>> [ ] Stable, go ahead and release
>> [X] Broken because of ...
>
> I'm seeing intermittent crashes in the unit tests on Windows.
>
> As far as I can tell it is caused by the following:
> - test 1 ends
> - test 1 shuts down APR
> - not all test 1 threads complete
> - test 2 starts
> - remaining test 1 threads complete
> - remaining test 1 threads try to clean up thread-local memory
> - crash as this memory was cleaned up when APR for test 1 was shut down
>
> Liberal use of #if to remove all references to APR thread locals for
> the Windows code and calling the OpenSSL clean-up directly seems to
> fix it.
>
> I am therefore cancelling this release vote.
>
> I should have a new RC ready in ~ 12 hours.
>
> Sorry for not spotting the problem sooner.

I think the issue might have been in my original patch, independent of
OS. Looking at it again, I think the initialization of the thread local
should have been wrapped in something like apr_thread_once. However, I
am not exactly sure how to use the APR thread-once API correctly. It
looks like you must first initialize the control using
apr_thread_once_init, but that has no safety preventing it from being
called multiple times, so I am not sure where it could be safely invoked
so that it is called once during the life of the process.

On Linux I think it is not causing a major problem because the pthread
thread local is just being initialized to a new slot, so it is using
extra thread-local slots but doesn't cause crashes.

Sorry about that oversight.

-nate
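For reference, the APR once API Nate mentions is used roughly like this.
A minimal sketch, assuming initialization can be confined to
single-threaded startup such as JNI library load, which is exactly the
constraint he identifies:

#include <apr_thread_proc.h>
#include <apr_pools.h>

static apr_thread_once_t *once_control;

/* Runs at most once per process, guarding the thread-local key setup
 * described above.  Hypothetical names throughout. */
static void init_tls_key(void)
{
    /* ... apr_threadkey_private_create(...) would go here ... */
}

static apr_status_t library_init(apr_pool_t *pool)
{
    apr_status_t rv;

    /* apr_thread_once_init() itself is not guarded against concurrent
     * callers, so it must run from single-threaded startup code. */
    rv = apr_thread_once_init(&once_control, pool);
    if (rv != APR_SUCCESS)
        return rv;
    return apr_thread_once(once_control, init_tls_key);
}

apr_thread_once() serializes concurrent callers, but
apr_thread_once_init() does not, which is the gap Nate points out.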