On 12/09/2010 12:43, Arthur de Jong wrote:
> On Sun, 2010-09-12 at 08:46 +0200, Vincent Danjean wrote:
>> I have an LDAP client that is not my DNS server and that gets its IP
>> (and gateway and DNS server) with DHCP. When nslcd is started and a
>> first request to nslcd is done before /etc/resolv.conf is correctly
>> filled, then this request fails (normal) but so do any future
>> requests (even after /etc/resolv.conf is correct).
>
> First of all, it is recommended to use an IP address for your LDAP
> server or at least something that can be locally resolved. Otherwise, if
> your DNS server is unavailable your LDAP server will also be
> unavailable.

It is a workaround and perhaps a better configuration. But, as all my
servers are VMs on one host, it is very infrequent that one works but
not the others...

>> Steps to reproduce on my system:
>> ifdown eth0 ; sleep 2 ; (sleep 10 ; ifup eth0 ) & (sleep 5 ; id vdanjean ) &
>> nslcd -d
>>
>> In this case, command "id vdanjean" gives:
>> aya:~# id vdanjean
>> id: vdanjean : utilisateur inexistant
>> aya:~# id vdanjean
>> uid=2001 gid=2001(vdanjean)
>> groupes=4294967295,4(adm),20(dialout),24(cdrom),25(floppy),29(audio),44(video),46(plugdev),100(users),122(kvm),116(libvirt),125(freevo),10000(photos),3000
>> aya:~#
>> [when I got the correct answer, it is after several seconds]
>
> When nslcd finds that the LDAP server is unavailable it first does a
> number of retries (once every second). If nslcd has determined that the
> LDAP server is unavailable for 10 seconds it will only retry once every
> 10 seconds. This mechanism is in place to avoid getting the whole system
> locked up while retrying connections to the LDAP server.

You miss my point: I got both kinds of answers several times, in any
order. Getting the answer once does not mean that the next request will
be OK (and even when I get the answer, it takes *very* long: several
seconds). And I suspect (not tested) that if all nslcd threads try to
answer while resolv.conf is wrong, then I would never get any answer
once resolv.conf is good.

>> Getting the good answer or the bad one depends on which thread/process
>> (I do not know precisely how nslcd works) handles the request. If this
>> is a thread launched before /etc/resolv.conf is correct, I get in the
>> log:
>
> The availability of the LDAP server is shared between the threads but
> each thread (there are 5 by default) has its own connection.
>
>> =========
>> nslcd: [1bd7b7] DEBUG: ldap_initialize(ldap://ldap.danjean.fr/)
> [...]
>> nslcd: [1bd7b7] DEBUG: ldap_simple_bind_s(NULL,NULL)
>> (uri="ldap://ldap.danjean.fr/")
>> nslcd: [1bd7b7] failed to bind to LDAP server ldap://ldap.danjean.fr/: Can't
>> contact LDAP server: No such file or directory
>> nslcd: [1bd7b7] no available LDAP server found
>> =========
>> This is repeated several times.
>
> The "No such file or directory" part is a bit weird. I only reproduce
> this if there is no /etc/resolv.conf at all.

I have one, but it is mainly empty (just a line with "start", no
nameserver) when nslcd starts.

> You should also first get a
> couple of lines saying "no available LDAP server found, sleeping 1
> seconds".

Yes, I got these lines too.

>> When I got an answer, I have the same kind of log, but I also have other
>> threads logging successful ldap requests such as:
>> ==========
>> nslcd: [8c895d] DEBUG: connection from pid=7998 uid=0 gid=0
>> nslcd: [8c895d] DEBUG: nslcd_group_bygid(2001)
>> nslcd: [8c895d] DEBUG: myldap_search(base="dc=danjean,dc=fr",
>> filter="(&(objectClass=posixGroup)(gidNumber=2001))")
>> nslcd: [8c895d] DEBUG: ldap_result(): end of results
>> ==========
>
> This happens if there is already a working connection. When using a
> hostname instead of an IP address you are also dependent on what nscd
> returns (if you're using that). It may be that nscd also caches negative
> host name lookups.
>
>> My guess is that, when a thread fails to resolve a name with the DNS
>> due to a bad /etc/resolv.conf file, something is cached and later
>> ldap_simple_bind_s calls still fail.
>
> This is more or less correct. If nslcd has determined that the LDAP
> server is unavailable for more than 10 seconds it will "cache" that
> state for 10 seconds.
>
>> The correct fix for this bug would be to find where the info is cached
>> and discard it in case of a failed connection.
>
> The whole point of having that information cached is to not have the
> whole system hang if the LDAP server becomes unavailable.
> You could increase the reconnect_retrytime option in nslcd.conf if you
> think that the period should be longer.

The problem is that there is *no* timeout for bad DNS resolution: I got
"no available LDAP server found, sleeping 1 seconds" for about 10 hours
(i.e. from my restart of the host that triggered the problem to my
wakeup next week, when I tried to log in and looked at the logs sent by
logcheck: cron is regularly using nslcd).
If the retry to the LDAP server really retried resolving the hostname
with the current resolv.conf, it would be OK (perhaps a few windows
where it does not work, but a few minutes later all should be fine). The
problem is that this never happens (at least not during nearly 10
hours).

>> However, this is perhaps too intrusive for squeeze. For squeeze, you
>> should be able to, at least, put a script
>> in /etc/resolvconf/update-libc.d to restart nslcd when dns changes.
>
> Do you think /etc/resolvconf/update-libc.d is the best place?

This is the place where scripts are run when resolv.conf changes.

> What
> about /etc/network/if-up.d? That should also catch the case where the
> network goes up. Then again the init script should have started after
> hostname lookups are available ($remote_fs which implies working
> networking and $named which implies working hostname lookups).

My DNS server is slow at boot (in fact, the VM with the DNS server is
generally not yet started when this host tries to acquire an IP). But
this strange config shows that nslcd does not correctly handle a
resolv.conf change. If I recall some discussion on debian-devel, I saw
some configs where LDAP is used on a laptop with a cache, to be able to
resolve requests when the network is not there. This is also a case
where not taking a new resolv.conf into account would lead to problems.

About /etc/network/if-up.d, I think that nslcd already handles network
problems correctly (i.e. it retries a few seconds later and has timeouts
for cached answers).
Adding scripts here can improve the time needed to get a fully working
system (it "flushes" all timeouts) but I do not think it is really
needed.

>> Something as simple as:
>> if [ -x /etc/init.d/nslcd ]; then
>>   /etc/init.d/nslcd restart
>> fi
>
> I'm not sure it's that simple.

It works for me ;-)

> First, you also need to check if nslcd is running in the first place.

/etc/init.d/nslcd has a "status" command, so it should be easy.

> Also when you are shutting down
> update-libc.d is likely also run (not sure about this though). I'm not
> sure you want to restart nslcd then.

Why not? If you restart it only when it is running (using the "status"
command of your script for example), then you will have an nslcd that
always uses the current resolv.conf to resolve hostnames.

> In short, I don't see what good solution (short term or otherwise) is
> available for this problem. There is an idea that the behaviour of nslcd
> should be different during booting (e.g. immediately start testing
> whether the LDAP server is available and only become available for the
> NSS module when the LDAP server is determined to be up) but it has not
> been implemented at this point.
>
> Anyway, thanks for the bugreport. I'll see what I can do about this but
> I'm not sure if I can get this into squeeze.

Just to be sure, my point is that there is *no* timeout when a thread of
nslcd uses an old resolv.conf. When a new resolv.conf appears, nslcd
will continue to resolve hostnames as if resolv.conf were the previous
one, forever. If there were a timeout (a few seconds or a few minutes)
as there is for broken network connections, then I would not complain.

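For reference, the hook discussed above (restart nslcd from
/etc/resolvconf/update-libc.d, but only if it is already running) could
look roughly like the sketch below. It is untested against the real
package and rests on assumptions: the function name
restart_nslcd_if_running and the NSLCD_INIT variable are made up for
this example, and I assume that the "status" command of the init script
exits with 0 when the daemon is running (the usual LSB convention).

```shell
#!/bin/sh
# Sketch of a resolvconf hook, e.g. installed as a file in
# /etc/resolvconf/update-libc.d/ (the file name is up to the packager).

# Path of the init script; overridable only to make the sketch testable.
NSLCD_INIT="${NSLCD_INIT:-/etc/init.d/nslcd}"

# Restart nslcd through the given init script, but only if the script
# exists and reports that the daemon is currently running.
restart_nslcd_if_running() {
    init_script="$1"

    # Nothing to do if nslcd is not installed.
    [ -x "$init_script" ] || return 0

    # ASSUMPTION: "status" exits with 0 when nslcd is running and with a
    # non-zero code otherwise.
    if "$init_script" status >/dev/null 2>&1; then
        "$init_script" restart
    fi
}

restart_nslcd_if_running "$NSLCD_INIT"
```

With the status check, a stopped nslcd (for example during shutdown) is
left alone, and a running one is restarted so that it resolves hostnames
with the current resolv.conf.
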
Regards,
  Vincent

-- 
Vincent Danjean       GPG key ID 0x9D025E87         vdanj...@debian.org
GPG key fingerprint: FC95 08A6 854D DB48 4B9A  8A94 0BF7 7867 9D02 5E87
Unofficial packages: http://moais.imag.fr/membres/vincent.danjean/deb.html
APT repo:  deb http://people.debian.org/~vdanjean/debian unstable main