Launchpad has imported 19 comments from the remote bug at https://bugs.openldap.org/show_bug.cgi?id=8650.
If you reply to an imported comment from within Launchpad, your comment will be sent to the remote bug automatically. Read more about Launchpad's inter-bugtracker facilities at https://help.launchpad.net/InterBugTracking. ------------------------------------------------------------------------ On 2017-05-06T22:32:26+00:00 Ryan Tandy wrote: Full_Name: Ryan Tandy Version: RE24 OS: Debian URL: Submission from: (NULL) (24.68.41.160) Submitted by: ryan https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=861838 That bug's submitter seems to have unintentionally configured their slapd with the entire list of system CAs. They're fixing it, but we have a bug here too. When the ServerHello is larger than 16kb, gnutls_handshake can return GNUTLS_E_AGAIN. In theory this was always possible, but I'm only seeing it happen with gnutls 3.x and haven't the exact change responsible. We need to loop gnutls_handshake until it completes, like we do already in the re-handshake case. Reply at: https://bugs.launchpad.net/ubuntu/+source/openldap/+bug/1921562/comments/0 ------------------------------------------------------------------------ On 2017-05-06T22:52:25+00:00 Ryan Tandy wrote: changed notes changed state Open to Test moved from Incoming to Software Bugs Reply at: https://bugs.launchpad.net/ubuntu/+source/openldap/+bug/1921562/comments/1 ------------------------------------------------------------------------ On 2017-05-06T22:58:54+00:00 Ryan Tandy wrote: Committed the fix, and pinged the submitter to test it. Reply at: https://bugs.launchpad.net/ubuntu/+source/openldap/+bug/1921562/comments/2 ------------------------------------------------------------------------ On 2018-02-09T17:22:50+00:00 Quanah-x wrote: changed notes changed state Test to Release Reply at: https://bugs.launchpad.net/ubuntu/+source/openldap/+bug/1921562/comments/3 ------------------------------------------------------------------------ On 2018-03-22T19:25:42+00:00 Quanah-x wrote: changed notes changed state Release to Closed Reply at: https://bugs.launchpad.net/ubuntu/+source/openldap/+bug/1921562/comments/4 ------------------------------------------------------------------------ On 2018-08-03T15:19:06+00:00 Kartik Subbarao wrote: Hi Ryan, I'm running into a problem with slapd 2.4.46 hanging on Ubuntu 18.04, which seems to be a side effect of the ITS#8650 patch: https://github.com/openldap/openldap/commit/7b5181da8cdd47a13041f9ee36fa9590a0fa6e48 slapd will run fine for a while, but during some periods of high-traffic, it'll hang. It'll peg the CPU at 100% and won't respond to any new LDAP connections. After some time, it'll resume working again, but overall it's fairly unreliable. strace on slapd during the hang shows that it's constantly making read() calls that return EAGAIN. After doing a gdb stack trace on slapd, I realized that these read() calls are happening as part of the busywait for loop in tlsg_session_accept() that repeatedly calls gnutls_handshake() when it gets EAGAIN. When slapd recovers from this hang state, the first message it prints is a TLS negotiation failure error on the culprit file descriptor. If I back out the ITS#8650 patch, the problem goes away. If I insert sleep(1) in the for loop, slapd no longer pegs the CPU at 100%, but it still becomes unresponsive during these high-traffic periods. I don't know what's happening during these high-traffic periods that causes the TLS negotiation to go astray. Unfortunately it's not easy to reproduce this problem outside of this production environment, given the diversity of clients running different OSes with various versions of SSL libraries. I'm wondering if there is a better way to handle EAGAIN returned from gnutls_handshake(), instead of doing a busywait as in ITS#8650, or my simplistic attempt at inserting a sleep() call which doesn't really seem to help. I'm wondering how the GnuTLS developers intend for people to use gnutls_handshake() properly, so as to gracefully handle sessions that involve long packets on the one hand, without opening up a vulnerability to chew up lots of system resources on the other hand. Regards, -Kartik Reply at: https://bugs.launchpad.net/ubuntu/+source/openldap/+bug/1921562/comments/5 ------------------------------------------------------------------------ On 2018-08-03T16:09:47+00:00 Ryan Tandy wrote: Hi Kartik, On Fri, Aug 03, 2018 at 11:19:06AM -0400, Kartik Subbarao wrote: >I'm running into a problem with slapd 2.4.46 hanging on Ubuntu 18.04, >which seems to be a side effect of the ITS#8650 patch: > >https://github.com/openldap/openldap/commit/7b5181da8cdd47a13041f9ee36fa9590a0fa6e48 > >slapd will run fine for a while, but during some periods of >high-traffic, it'll hang. It'll peg the CPU at 100% and won't respond >to any new LDAP connections. After some time, it'll resume working >again, but overall it's fairly unreliable. Thanks for letting me know about this. This patch is running on quite a few systems by now, I'm sorry the problem wasn't caught sooner. :/ >I'm wondering if there is a better way to handle EAGAIN returned from >gnutls_handshake(), instead of doing a busywait as in ITS#8650, or my >simplistic attempt at inserting a sleep() call which doesn't really >seem to help. I'm wondering how the GnuTLS developers intend for >people to use gnutls_handshake() properly, so as to gracefully handle >sessions that involve long packets on the one hand, without opening up >a vulnerability to chew up lots of system resources on the other hand. Right. I mean, this is how GnuTLS' own example shows to do it: https://gitlab.com/gnutls/gnutls/blob/master/doc/examples/ex-client- dtls.c#L73-77 We could place a limit on the number of iterations, though any such limit would have to be arbitrary. There might be an asynchronous GnuTLS API that could be used to avoid tying up slapd while this is going on. I will look at how some other GnuTLS servers deal with this... Reply at: https://bugs.launchpad.net/ubuntu/+source/openldap/+bug/1921562/comments/6 ------------------------------------------------------------------------ On 2018-08-03T18:35:02+00:00 Kartik Subbarao wrote: On 08/03/2018 12:09 PM, Ryan Tandy wrote: > Thanks for letting me know about this. This patch is running on quite > a few systems by now, I'm sorry the problem wasn't caught sooner. :/ No worries, thanks for responding so quickly on this! >> I'm wondering if there is a better way to handle EAGAIN returned from >> gnutls_handshake(), instead of doing a busywait as in ITS#8650, or my >> simplistic attempt at inserting a sleep() call which doesn't really >> seem to help. I'm wondering how the GnuTLS developers intend for >> people to use gnutls_handshake() properly, so as to gracefully handle >> sessions that involve long packets on the one hand, without opening >> up a vulnerability to chew up lots of system resources on the other >> hand. > > Right. I mean, this is how GnuTLS' own example shows to do it: > > https://gitlab.com/gnutls/gnutls/blob/master/doc/examples/ex-client-dtls.c#L73-77 > > Hmm, that's a head-scratcher. It doesn't seem very effective to have a non-blocking I/O interface and then recommend wrapping it in a busywait loop :-) > We could place a limit on the number of iterations, though any such > limit would have to be arbitrary. > > There might be an asynchronous GnuTLS API that could be used to avoid > tying up slapd while this is going on. > > I will look at how some other GnuTLS servers deal with this... Cool, thanks Ryan. Regards, -Kartik Reply at: https://bugs.launchpad.net/ubuntu/+source/openldap/+bug/1921562/comments/7 ------------------------------------------------------------------------ On 2018-08-26T19:07:20+00:00 Ryan Tandy wrote: Hi Karthik, Sorry about the lack of updates on this one. It looks clear that my patch for this ITS was wrong and needs to be reverted. Looking again at the original issue, after reverting the patch, I've found that the behaviour varies with GnuTLS version. I need to figure out why this is, which probably means spending some time bisecting GnuTLS changes. If I create a patch to log some additional debug info about the GnuTLS setup, would you be willing to run it in your environment? For example I'm curious whether EAGAIN is being returned from the read side or write side (guessing read, but would be nice to confirm). thanks Ryan Reply at: https://bugs.launchpad.net/ubuntu/+source/openldap/+bug/1921562/comments/8 ------------------------------------------------------------------------ On 2018-08-27T17:22:12+00:00 Kartik Subbarao wrote: On 08/26/2018 03:07 PM, Ryan Tandy wrote: > If I create a patch to log some additional debug info about the GnuTLS > setup, would you be willing to run it in your environment? For example > I'm curious whether EAGAIN is being returned from the read side or > write side (guessing read, but would be nice to confirm). Sure, send me the patch. If the patch just does some additional GnuTLS logging, then I don't see a problem with running it in the production environment and sending you the results. Regards, -Kartik Reply at: https://bugs.launchpad.net/ubuntu/+source/openldap/+bug/1921562/comments/9 ------------------------------------------------------------------------ On 2018-09-06T15:49:51+00:00 O-bugs wrote: fixed in master fixed in RE24 (2.4.46) Needs to be fixed further Reply at: https://bugs.launchpad.net/ubuntu/+source/openldap/+bug/1921562/comments/10 ------------------------------------------------------------------------ On 2018-09-06T15:49:51+00:00 Quanah-x wrote: changed notes changed state Closed to Open Reply at: https://bugs.launchpad.net/ubuntu/+source/openldap/+bug/1921562/comments/11 ------------------------------------------------------------------------ On 2018-09-19T05:55:50+00:00 Ryan Tandy wrote: Made some good progress on this one this evening. The original issue this ITS is about is that gnutls_handshake() can, in some versions of GnuTLS, return GNUTLS_E_AGAIN even when the socket is blocking. Specifically, this happens in the case I described with a large CA list sent by the server. For slapd, the patch I committed is unfortunately completely wrong. It has been using non-blocking sockets forever, EAGAIN is expected and handled robustly -- or it was, until I introduced the busy-loop. For clients I'm still working on figuring out the right path forward. There is some EAGAIN handling conditional on LDAP_USE_NON_BLOCKING_TLS which itself is behind LDAP_DEVEL. However this code is meant for non-blocking sockets, and in my case it ends up stuck in poll() waiting for a notification that never arrives. In 2.4, ret == 1 simply falls into the success case and proceeds to send data without completing the handshake first. It's possible that what I actually want here is a (ret > 0) case in ldap_int_tls_start for when LDAP_USE_NON_BLOCKING_TLS is absent and ldap_int_tls_connect returns 1. (I'd also need to adapt the non-blocking path to be able to handle a blocking socket as well.) But it's also possible that gnutls_handshake() returning GNUTLS_E_AGAIN with a blocking socket is simply a GnuTLS bug that was introduced at some point. I still need to determine exactly when and why its behaviour changed. (It is still happening with 3.5.19.) In any case, my patch has to be reverted, as its impact (making slapd busy-loop) is obviously worse than the status quo (misbehaving clients in a specific case). I have pushed that revert now and will continue digging as time permits. Reply at: https://bugs.launchpad.net/ubuntu/+source/openldap/+bug/1921562/comments/12 ------------------------------------------------------------------------ On 2018-10-05T05:33:21+00:00 Ryan Tandy wrote: On Tue, Sep 18, 2018 at 10:55:50PM -0700, Ryan Tandy wrote: >There is some EAGAIN handling conditional on LDAP_USE_NON_BLOCKING_TLS >which itself is behind LDAP_DEVEL. However this code is meant for >non-blocking sockets, and in my case it ends up stuck in poll() >waiting for a notification that never arrives. This turned out to be because the fd and timeout fields are only initialized when timeout is configured and the non-blocking behaviour was triggered because of it. Otherwise the code simply doesn't anticipate EAGAIN could be returned and the behaviour is more or less undefined; it ends up calling poll() with fd = -1 and a garbage timeout. >It's possible that what I actually want here is a (ret > 0) case in >ldap_int_tls_start for when LDAP_USE_NON_BLOCKING_TLS is absent and >ldap_int_tls_connect returns 1. (I'd also need to adapt the >non-blocking path to be able to handle a blocking socket as well.) More precisely, what I actually want is a (ret > 0) case that is used unless both USE_NON_BLOCKING_TLS is true and a timeout is configured. If we added to ber_sockbuf_ctrl() the ability to query whether the socket is non-blocking, for GnuTLS at least we could bypass poll() and go straight back into ldap_int_tls_connect(). However I don't see a lot of benefit to this as long as calling poll() on a blocking socket has no downside. Reply at: https://bugs.launchpad.net/ubuntu/+source/openldap/+bug/1921562/comments/13 ------------------------------------------------------------------------ On 2020-04-12T18:58:42+00:00 Ryan Tandy wrote: *** Issue 9210 has been marked as a duplicate of this issue. *** Reply at: https://bugs.launchpad.net/ubuntu/+source/openldap/+bug/1921562/comments/14 ------------------------------------------------------------------------ On 2020-04-12T19:04:47+00:00 Ryan Tandy wrote: The other way we can get a non-blocking socket is if the client set one up itself and gave it to us via ldap_init_fd(). sssd does this, or used to: bug 9210. Reply at: https://bugs.launchpad.net/ubuntu/+source/openldap/+bug/1921562/comments/15 ------------------------------------------------------------------------ On 2020-04-12T20:46:44+00:00 Ryan Tandy wrote: *** Issue 9210 has been marked as a duplicate of this issue. *** Reply at: https://bugs.launchpad.net/ubuntu/+source/openldap/+bug/1921562/comments/16 ------------------------------------------------------------------------ On 2020-04-12T21:02:40+00:00 Ryan Tandy wrote: Created attachment 708 test program with non-blocking socket Here's a test program that exercises the scenario with a non-blocking socket, similar to the case described in bug 9210. Currently it fails on 2.4 with LDAP_SERVER_DOWN and on 2.5 with LDAP_TIMEOUT, but succeeds if you comment out the fcntl(). Any patch needs to correct that as well as the scenario described here with a blocking socket. Reply at: https://bugs.launchpad.net/ubuntu/+source/openldap/+bug/1921562/comments/17 ------------------------------------------------------------------------ On 2020-04-12T23:50:10+00:00 Howard Chu wrote: Commits: • 735e1ab by Howard Chu at 2020-04-12T22:18:51+00:00 ITS#8650 loop on incomplete TLS handshake Reply at: https://bugs.launchpad.net/ubuntu/+source/openldap/+bug/1921562/comments/18 ** Changed in: openldap Status: Unknown => Fix Released ** Changed in: openldap Importance: Unknown => Medium ** Bug watch added: Debian Bug tracker #861838 https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=861838 -- You received this bug notification because you are a member of Ubuntu Bugs, which is subscribed to Ubuntu. https://bugs.launchpad.net/bugs/1921562 Title: Intermittent hangs during ldap_search_ext when TLS enabled To manage notifications about this bug go to: https://bugs.launchpad.net/openldap/+bug/1921562/+subscriptions -- ubuntu-bugs mailing list ubuntu-bugs@lists.ubuntu.com https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs