On Sun, 2010-09-12 at 22:30 +0200, Vincent Danjean wrote:
> On 12/09/2010 12:43, Arthur de Jong wrote:
> > First of all, it is recommended to use an IP address for your LDAP
> > server or at least something that can be locally resolved.
> > Otherwise, if your DNS server is unavailable your LDAP server will
> > also be unavailable.
>
> It is a workaround and perhaps a better configuration. But, as all my
> servers are VM on one host, it is very infrequent that one works but
> not the others...
Another workaround is putting the LDAP server in /etc/hosts. It is
slightly nicer and it also speeds things up if DNS is not available,
because the OpenLDAP library does address-to-hostname lookups in some
cases.

> You miss my point: I got the two kind of answers several times, in
> any order. It is not because I got the answer one time that the next
> one will be ok (and even when I got the answer, it is *very* long to
> get (several seconds)). And I suspect (not tested) that if all nslcd
> threads try to answer when resolv.conf is wrong, then I would never
> got any answer once resolv.conf is good.

I think I understand now: you get failures mixed with successful results
right after resolv.conf becomes valid. Also, once resolv.conf is
correct, about one in five requests keeps returning failures.

I can reproduce this in my test environment, which is good. The bad part
is that I haven't narrowed it down yet. It could be a bug in OpenLDAP
that is caching some hostname information somewhere, but I think it is
more likely a bug in glibc that doesn't correctly reload
/etc/resolv.conf in threaded applications.

From what I gathered with strace, some checks seem to be in place to
reload /etc/resolv.conf if the file has changed (at least a stat() is
done). The file is loaded twice the first time, and twice again if it
was modified, but I don't think it is reloaded by the thread that
originally failed if it was already reloaded by another thread. My guess
is that the cached contents of /etc/resolv.conf are thread-local but the
timestamp that is compared against the stat() result is global.

Anyway, I will try to make a test application to reproduce this and file
a bug report against glibc if the above is confirmed.
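
A test application along these lines might look like the sketch below.
It is written in Python rather than C and assumes CPython on a glibc
system, where socket.getaddrinfo() calls through to the C library
resolver from each thread. The function names, the host name
ldap.example.com and port 389 are placeholders for illustration, not
anything taken from nslcd.

```python
import socket
import threading
import time


def probe(hostname, attempts, delay, results, index):
    """Resolve hostname repeatedly from one thread and record each outcome."""
    outcomes = []
    for _ in range(attempts):
        try:
            # Port 389 (LDAP) is only a placeholder; any port would do.
            socket.getaddrinfo(hostname, 389, type=socket.SOCK_STREAM)
            outcomes.append(True)
        except socket.gaierror:
            outcomes.append(False)
        time.sleep(delay)
    results[index] = outcomes


def run_probes(hostname, threads=5, attempts=10, delay=1.0):
    """Run several long-lived resolver threads; return per-thread outcomes."""
    results = [None] * threads
    workers = [
        threading.Thread(target=probe,
                         args=(hostname, attempts, delay, results, i))
        for i in range(threads)
    ]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
    return results


if __name__ == "__main__":
    # Hypothetical host name: replace with one that needs a DNS lookup.
    for i, outcomes in enumerate(run_probes("ldap.example.com")):
        line = "".join("." if ok else "F" for ok in outcomes)
        print("thread %d: %s" % (i, line))
```

The idea would be to start it with a broken /etc/resolv.conf, fix the
file while the threads are still running, and check whether some
threads keep printing F after others have recovered. Whether this
actually triggers the suspected behaviour is untested.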

> The problem is that there is *no* timeout for bad DNS resolution: I got
> "no available LDAP server found, sleeping 1 seconds" during about
> 10 hours (ie between my restart of the host that trigger the problem
> to my wakeup next week when I tried to logging and look at the logs
> sent by logcheck: cron is regularly using nslcd).

The problem is that once a bad thread is triggered, the retry mechanism
kicks in and prevents other threads from even trying a request. If more
than a couple of requests came in with the bad /etc/resolv.conf, this
would eventually leave most threads dysfunctional.

Anyway, thanks for the clarification.

-- 
-- arthur - adej...@debian.org - http://people.debian.org/~adejong --