Launchpad has imported 29 comments from the remote bug at http://bugs.ntp.org/show_bug.cgi?id=2831.
If you reply to an imported comment from within Launchpad, your comment will be sent to the remote bug automatically. Read more about Launchpad's inter-bugtracker facilities at https://help.launchpad.net/InterBugTracking.

------------------------------------------------------------------------
On 2015-05-14T08:49:08+00:00 H-murray wrote:

It's not solid, but I've seen three of these so far. It crashes in the ballpark of 1 in 5 tries.

FreeBSD 10.1-RELEASE amd64

I haven't seen any troubles like this before 4.3.33.

It crashes before it writes anything to the post-switching log file.

May 14 01:32:20 ted3 ntpd[79529]: switching logging to file /var/log/ntp/ntpd.log
May 14 01:32:20 ted3 kernel: pid 79529 (ntpd), uid 0: exited on signal 11 (core dumped)

Core was generated by `ntpd'.
Program terminated with signal 11, Segmentation fault.
Reading symbols from /lib/libgcc_s.so.1...done.
Loaded symbols for /lib/libgcc_s.so.1
Reading symbols from /lib/libmd.so.6...done.
Loaded symbols for /lib/libmd.so.6
Reading symbols from /lib/libm.so.5...done.
Loaded symbols for /lib/libm.so.5
Reading symbols from /lib/libthr.so.3...done.
Loaded symbols for /lib/libthr.so.3
Reading symbols from /lib/libc.so.7...done.
Loaded symbols for /lib/libc.so.7
Reading symbols from /libexec/ld-elf.so.1...done.
Loaded symbols for /libexec/ld-elf.so.1
#0  0x000000080119db43 in sbrk () from /lib/libc.so.7
[New Thread 801c06c00 (LWP 100287/ntpd)]
[New Thread 801c06400 (LWP 100169/ntpd)]
(gdb) bt
#0  0x000000080119db43 in sbrk () from /lib/libc.so.7
#1  0x0000000801199aaf in sbrk () from /lib/libc.so.7
#2  0x0000000801184593 in syscall () from /lib/libc.so.7
#3  0x00000008011a5283 in realloc () from /lib/libc.so.7
#4  0x0000000000437285 in ereallocz (ptr=0x80180a140, newsz=32, priorsz=0, zero_init=1) at ../../libntp/emalloc.c:43
#5  0x00000000004399c7 in get_worker_context (c=0x801c42100, idx=0) at ../../libntp/ntp_intres.c:982
#6  0x0000000000439665 in blocking_getaddrinfo (c=0x801c42100, req=0x801c1b0c0) at ../../libntp/ntp_intres.c:327
#7  0x000000000043a5d0 in blocking_child_common (c=<value optimized out>) at ../../libntp/ntp_worker.c:288
#8  0x000000000043c619 in blocking_thread (ThreadArg=0x80180a140) at ../../libntp/work_thread.c:663
#9  0x0000000800ed74f5 in pthread_create () from /lib/libthr.so.3
#10 0x0000000000000000 in ?? ()
(gdb)

Reply at: https://bugs.launchpad.net/ubuntu/+source/ntp/+bug/1567540/comments/0

------------------------------------------------------------------------
On 2015-05-14T09:32:52+00:00 Stenn wrote:

Hal,

This should also duplicate using -stable, I hope...

Anyway, if you could nose around in the stack frames to try and hone in on this, that would be great. You might need to compile without optimization, not sure.

I haven't seen this on my FreeBSD boxes...

Reply at: https://bugs.launchpad.net/ubuntu/+source/ntp/+bug/1567540/comments/1

------------------------------------------------------------------------
On 2015-05-16T10:06:03+00:00 H-murray wrote:

I'm up to 6 core dumps now. All identical.

I've poked around. It doesn't fail in gdb (or maybe I just haven't figured out how to make it fail).

I don't have any good ideas.

It could be:
  - a bug in ntpd that just happens to get triggered in this case
  - a bug in the hardware
  - a bug in the OS
  - a bug in the tool chain
  - an operator error

I recompiled things. It gets the same error, and objdump of both versions is identical.

Here is something fishy:

#4  0x0000000000437285 in ereallocz (ptr=0x80180a140, newsz=32, priorsz=0, zero_init=1) at ../../libntp/emalloc.c:43

get_worker_context is growing the array of pointers to worker contexts. I think it's growing it from empty. If so, ptr should be NULL. The version in memory is NULL. That address comes from several layers back up the call stack:

#8  0x000000000043c619 in blocking_thread (ThreadArg=0x80180a140) at ../../libntp/work_thread.c:663

I'll look carefully at the compiled code after some sleep.

Reply at: https://bugs.launchpad.net/ubuntu/+source/ntp/+bug/1567540/comments/2

------------------------------------------------------------------------
On 2015-06-11T04:07:41+00:00 Stenn wrote:

Hal,

Have you learned anything new about this?

Reply at: https://bugs.launchpad.net/ubuntu/+source/ntp/+bug/1567540/comments/3

------------------------------------------------------------------------
On 2015-06-11T06:41:48+00:00 H-murray wrote:

> Hal,
> Have you learned anything new about this?

Nope. I'm stumped.

Reply at: https://bugs.launchpad.net/ubuntu/+source/ntp/+bug/1567540/comments/4

------------------------------------------------------------------------
On 2015-08-17T23:55:19+00:00 john.marsh...@riverwillow.com.au wrote:

FreeBSD 10.2-RC3
ntpd 4.3.68

Hal, thanks for mentioning this on the mailing list. I should have spoken up sooner. I've been seeing this for a LONG time (2-3 years?) but I work around it by replacing hostnames with IP addresses in the config file 'server' statements and then forget. Every several months, I look at the config, scratch my head, put the domain names back in, and then remember!

I have been seeing this ONLY on an Intel Xeon E5-2603 (the biggest of our machines).
It has two CPUs, each with 4 cores, and, to me, this smells like a thread problem.

This server is now running FreeBSD 10.2-RC3, but I have seen this same problem on this server on earlier versions as well (definitely FreeBSD 10.1 and 9, not sure about 8).

Just now I edited the config file to use domain names for server config and produced this dump. As with Hal, this doesn't happen EVERY time ntpd starts but, for me, it is the rule rather than the exception.

rwsrv08# gdb /usr/sbin/ntpd /ntpd.core
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
...
Core was generated by `ntpd'.
...
#0  0x0000000800678fe2 in _rtld_atfork_post () from /libexec/ld-elf.so.1
[New Thread 801c07400 (LWP 101340/<unknown>)]
[New Thread 801c06400 (LWP 100229/<unknown>)]
(gdb) bt
#0  0x0000000800678fe2 in _rtld_atfork_post () from /libexec/ld-elf.so.1
#1  0x0000000800679349 in _rtld_atfork_post () from /libexec/ld-elf.so.1
#2  0x00000008006749f4 in _rtld_is_dlopened () from /libexec/ld-elf.so.1
#3  0x0000000800673e3a in _rtld_is_dlopened () from /libexec/ld-elf.so.1
#4  0x0000000800670ea0 in dlopen () from /libexec/ld-elf.so.1
#5  0x00000008013e9025 in _nsdbtaddsrc () from /lib/libc.so.7
#6  0x00000008013e37e4 in _nsyyparse () from /lib/libc.so.7
#7  0x00000008013e96a1 in nsdispatch () from /lib/libc.so.7
#8  0x00000008013cd011 in getservbyname () from /lib/libc.so.7
#9  0x00000008013ccf19 in getservbyname () from /lib/libc.so.7
#10 0x00000008013c9a33 in getaddrinfo () from /lib/libc.so.7
#11 0x00000008013c7358 in getaddrinfo () from /lib/libc.so.7
#12 0x00000000004382c6 in blocking_getaddrinfo ()
#13 0x0000000000439190 in blocking_child_common ()
#14 0x000000000043b3b9 in blocking_thread ()
#15 0x00000008010bd7d5 in pthread_create () from /lib/libthr.so.3
#16 0x0000000000000000 in ?? ()
(gdb) q

When this happens, syslog shows...

Aug 18 09:30:24 rwsrv08 ntpd[17587]: ntpd 4.3.68@1.2483-o Fri Aug 7 02:03:11 UTC 2015 (1): Starting
Aug 18 09:30:24 rwsrv08 ntpd[17587]: Command line: /usr/sbin/ntpd -g -w 120 -N -c /data/ntpd/ntp.conf -p /var/run/ntpd.pid
Aug 18 09:30:25 rwsrv08 ntpd[17588]: proto: precision = 1.118 usec (-20)
Aug 18 09:30:25 rwsrv08 ntpd[17588]: Listen and drop on 0 v6wildcard [::]:123
Aug 18 09:30:25 rwsrv08 ntpd[17588]: Listen and drop on 1 v4wildcard 0.0.0.0:123
Aug 18 09:30:25 rwsrv08 ntpd[17588]: Listen normally on 2 GFNX 203.58.93.40:123
Aug 18 09:30:25 rwsrv08 ntpd[17588]: Listen normally on 3 GFNX [2001:8000:1000:1801::5001]:123
Aug 18 09:30:25 rwsrv08 ntpd[17588]: Listen normally on 4 lo0 [::1]:123
Aug 18 09:30:25 rwsrv08 ntpd[17588]: Listen normally on 5 lo0 127.0.0.1:123
Aug 18 09:30:25 rwsrv08 ntpd[17588]: Listening on routing socket on fd #26 for interface updates
Aug 18 09:30:25 rwsrv08 ntpd[17588]: ff08::101 8811 81 mobilize assoc 36615
Aug 18 09:30:25 rwsrv08 kernel: pid 17588 (ntpd), uid 0: exited on signal 11 (core dumped)

Please let me know if I can be of any assistance.

Reply at: https://bugs.launchpad.net/ubuntu/+source/ntp/+bug/1567540/comments/5

------------------------------------------------------------------------
On 2015-08-19T15:01:45+00:00 Burnicki wrote:

I've just set up a machine with FreeBSD 10.2-RELEASE from scratch, and I don't see any problem with the version of ntpd shipped with FreeBSD. It's labelled "4.2.8p3-a", but I don't know what the "-a" stands for.

The only problem I encountered is that I had to remove "nopeer" from the "restrict" lines in the shipped ntp.conf file if I wanted to use the "pool" directive, since otherwise no pool servers were added. However, the comments in bug 2152 say this is OK.

Digging through bugzilla I found a few issues where ntpd didn't work correctly due to memory restrictions:

  Bug 2362 - mlockall() breaks DNS resolution when using the "files" service in nsswitch.conf
  http://bugs.ntp.org/show_bug.cgi?id=2362

  Bug 2643 - Server crash with pool directive
  http://bugs.ntp.org/show_bug.cgi?id=2643

  Bug 2817 - Stop locking ntpd into memory by default
  http://bugs.ntp.org/show_bug.cgi?id=2817

To me it smells like all this is somehow related. Can you try whether the problem still persists if you add an "rlimit memlock 128" line (or even a higher number) to ntp.conf?

If I find some time I'll try to build ntp-dev on my FreeBSD machine and see if I can duplicate the problem.

On the other hand, Hal reported on the hackers@ list that he also saw this on Fedora 22. I also installed that Linux version a few days ago and played a bit with it, but didn't encounter any problems with the shipped ntpd, either.

Reply at: https://bugs.launchpad.net/ubuntu/+source/ntp/+bug/1567540/comments/6

------------------------------------------------------------------------
On 2015-08-20T01:28:30+00:00 john.marsh...@riverwillow.com.au wrote:

Martin,

Thanks for looking at this. I'd like to stress that I'm only seeing this on a system with more memory (16GB) and more cores (8) than we have anywhere else.

As you suggested, I tried adding "rlimit memlock 128" to ntp.conf but it made no difference. I then tried "rlimit memlock 256" and it also made no difference.

I am now using:

  FreeBSD 10.2-RELEASE-p1
  ntpd 4.3.70

When ntpd fails, the dump backtrace looks like what I pasted in Comment #5 or like the following. The three backtraces (Hal's + my two) diverge after blocking_getaddrinfo().

(gdb) bt
#0  0x00000008013ed631 in __h_errno_set () from /lib/libc.so.7
#1  0x00000008013bf90e in __res_vinit () from /lib/libc.so.7
#2  0x00000008013c33b0 in getaddrinfo () from /lib/libc.so.7
#3  0x00000008013e39ef in nsdispatch () from /lib/libc.so.7
#4  0x00000008013c20ec in getaddrinfo () from /lib/libc.so.7
#5  0x000000000043435a in blocking_getaddrinfo ()
#6  0x00000000004352f0 in blocking_child_common ()
#7  0x0000000000437159 in blocking_thread ()
#8  0x00000008010b77d5 in pthread_create () from /lib/libthr.so.3
#9  0x0000000000000000 in ?? ()

Since you mentioned nsswitch.conf in Comment #6, I note that all our servers have "hosts: dns" in nsswitch.conf.

Reply at: https://bugs.launchpad.net/ubuntu/+source/ntp/+bug/1567540/comments/7

------------------------------------------------------------------------
On 2015-08-20T08:52:46+00:00 H-murray wrote:

It (or something very similar) also happens on Linux.

I tried mailing hackers@, but the discussion ended up here, so I'll copy the data from that message.
http://lists.ntp.org/pipermail/hackers/2015-August/007156.html

A few days ago, I tried to add a pool line to a server and got a strange error message.

  14 Aug 13:58:07 ntpd[12618]: error resolving pool 0.fedora.pool.ntp.org: System error (-11)

It tries again in a few minutes and gets the same error.
...
EAI_SYSTEM (System Error) says look in errno. I added some debugging printout. errno is always EAGAIN.

More printout says it's taking ~15 ms, which is reasonable for a packet exchange over my DSL line.

I added a loop to try a few times. It always gets the same error.

I changed the server lines of local systems from names to IP addresses. Now I get:

  16 Aug 02:37:41 ntpd[21377]: fatal out of memory (32 bytes)

That's from the DNS thread creation code getting ready to look up the pool info.

That's on a 64-bit Fedora 22 system. I got the same sort of thing on another Fedora box and a Debian box, so I'm pretty sure it isn't a simple flaky hardware box.
(But all the problems have been on the same type of hardware, so it might be a design bug. Dell Optiplex FX 160, Intel Atom 330.)

Reply at: https://bugs.launchpad.net/ubuntu/+source/ntp/+bug/1567540/comments/8

------------------------------------------------------------------------
On 2015-09-08T09:57:25+00:00 H-murray wrote:

Harlan pointed me at a wonderful blog post:
  https://blog.crashed.org/dont-backout
Thanks.

Quick summary: Bug in FreeBSD page fault handler

That solves the FreeBSD half of this bug. I'll submit a new one for the Linux variant.

Harlan: I don't see anything like UPSTREAM in the resolved-at options. I'll let you sort out how to mark this as no-longer-open.

Reply at: https://bugs.launchpad.net/ubuntu/+source/ntp/+bug/1567540/comments/9

------------------------------------------------------------------------
On 2015-09-08T21:16:30+00:00 john.marsh...@riverwillow.com.au wrote:

(In reply to comment #9)
> Quick summary: Bug in FreeBSD page fault handler
>
> That solves the FreeBSD half of this bug.

Thanks for posting this, Hal, but you don't reference a patch, I can't find any reference in that blog post to a patch, and my attempts at trawling FreeBSD commit logs have yielded no results (my fault, no doubt). It would be great to close off this bug with a pointer to the FreeBSD pager patch that fixes this. Any clues?

Reply at: https://bugs.launchpad.net/ubuntu/+source/ntp/+bug/1567540/comments/10

------------------------------------------------------------------------
On 2015-09-08T21:53:31+00:00 H-murray wrote:

> Thanks for posting this Hal but you don't reference a patch,
> I can't find any reference in that blog post to a patch
...
> Any clues?

Nope. I'm not plugged into the FreeBSD ecosystem. I expect there would be something in their bug database or mailing lists.

Reply at: https://bugs.launchpad.net/ubuntu/+source/ntp/+bug/1567540/comments/11

------------------------------------------------------------------------
On 2015-09-08T23:29:31+00:00 john.marsh...@riverwillow.com.au wrote:

(In reply to comment #11)

Hal, I've sent email to the author of the blog post to which you referred in Comment #9 and plan to post details of any response here. If I can get a pointer to a FreeBSD patch, I'll apply that, test and report.

I think it's premature to suggest that this bug be closed without seeing if there is, actually, a fix for this problem. Peter may even have hit a different crash to the one we are seeing.

Reply at: https://bugs.launchpad.net/ubuntu/+source/ntp/+bug/1567540/comments/12

------------------------------------------------------------------------
On 2015-09-09T08:16:28+00:00 john.marsh...@riverwillow.com.au wrote:

Created attachment 1325
Do mlockall before threads

I exchanged email with the author of the blog post referred to in Comment #9. He suggested that I build ntpd with HAVE_MLOCKALL disabled and test. I had no problem at all with mlockall() disabled.

He also suggested that, notwithstanding potential problems with FreeBSD's mlockall(), running mlockall() in one thread while allocating memory in another thread is probably unwise anyway, and that calling mlockall() before starting any threads may be preferable.

In the attached patch (against 4.3.70), I moved the "if (do_memlock)" block in ntpd.c up to an earlier point, after the fork() and just after the RLIMITs are set.

WARNING: I do not *know* ntpd.c, so this needs careful scrutiny by someone who does, but... "It works for me"!
(on FreeBSD 10.2-RELEASE with a patched 4.3.70)

Reply at: https://bugs.launchpad.net/ubuntu/+source/ntp/+bug/1567540/comments/13

------------------------------------------------------------------------
On 2015-09-21T14:45:17+00:00 H-max-3 wrote:

I think I ran into the same issue (realloc() returning an error when being asked for 32 bytes somewhere down the call stack from blocking_getaddrinfo()) with 4.2.8p3 on SUSE, but with slightly different behaviour: The error in realloc() only happens when using ntpq to add a server to a running ntpd that does not have any servers yet. When a server is given on the command line or in ntp.conf, ntpd starts fine and more servers can be added at runtime.

I can confirm that disabling mlockall() as suggested in comment 13 prevents the call.

Applying the patch from comment 13 makes it even worse: Now the error also happens when a server is specified at startup on the command line or in ntp.conf.

Reply at: https://bugs.launchpad.net/ubuntu/+source/ntp/+bug/1567540/comments/14

------------------------------------------------------------------------
On 2015-09-22T12:06:09+00:00 H-max-3 wrote:

It looks like I am rather suffering from bug 2817. Sorry for the noise here.

But while I was there, I found that the proposed patch from comment 13 is at least incomplete, because it places the block that depends on do_memlock above getconfig(), which is the only place where do_memlock can get changed from 1 to 0, so at the new location it will always be 1.

So, if the do_memlock block needs to be moved up, at least the getconfig() line should be moved with it, but I have not checked whether there are other cross-dependencies on all the init_* stuff that happens between those two locations.
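
The ordering constraint that comes out of comments 13 and 15 can be sketched in C: read the configuration first (in ntpd, getconfig() is the only place do_memlock can change), call mlockall() while the process is still single-threaded, and only then start resolver threads. This is a minimal illustration under stated assumptions, not ntpd.c: `read_config()`, `resolver_thread()` and `start_daemon()` are hypothetical names, and the `do_memlock` variable here only mimics the flag of that name. It demonstrates the ordering point alone; nothing in this thread shows that reordering by itself cures the crash.

```c
#include <errno.h>
#include <pthread.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

/* Hypothetical stand-in for ntpd's do_memlock: 1 by default and only
 * changed while the configuration is read (getconfig() in ntpd). */
static int do_memlock = 1;

/* Hypothetical config reader; the argument stands in for whatever the
 * config file says about memory locking. */
static void read_config(int config_wants_memlock)
{
    do_memlock = config_wants_memlock;
}

/* Hypothetical worker standing in for the resolver thread. */
static void *resolver_thread(void *arg)
{
    (void)arg;
    return NULL;
}

/* Startup order: config, then mlockall(), then threads. Returns 0 on success. */
int start_daemon(int config_wants_memlock)
{
    pthread_t tid;

    /* 1. Read the config first, so do_memlock has its final value. */
    read_config(config_wants_memlock);

    /* 2. Lock memory while the process is still single-threaded. */
    if (do_memlock && mlockall(MCL_CURRENT | MCL_FUTURE) != 0) {
        /* Treated as non-fatal in this sketch. */
        fprintf(stderr, "mlockall: %s\n", strerror(errno));
    }

    /* 3. Only now start any worker threads. */
    if (pthread_create(&tid, NULL, resolver_thread, NULL) != 0)
        return -1;
    return pthread_join(tid, NULL) == 0 ? 0 : -1;
}
```

`start_daemon(0)` models a configuration that turns memory locking off, so mlockall() is never called. With `start_daemon(1)`, an unprivileged process may well see mlockall() fail, which the sketch reports but does not treat as fatal.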

Reply at: https://bugs.launchpad.net/ubuntu/+source/ntp/+bug/1567540/comments/15

------------------------------------------------------------------------
On 2016-04-11T19:01:35+00:00 Smallm wrote:

Created attachment 1399
protect dnsworker_contexts with a mutex

Reply at: https://bugs.launchpad.net/ubuntu/+source/ntp/+bug/1567540/comments/24

------------------------------------------------------------------------
On 2016-04-11T19:03:06+00:00 Smallm wrote:

I believe this (at least the original problem as described) is caused by the dynamic array pointed to by dnsworker_contexts in ntp_intres.c being potentially realloced from multiple threads with no synchronization objects used.

I encountered an almost identical stack trace from a core file created by a segmentation fault when running ntps (actually, ntpdig from the ntpsec fork, but your code here has not diverged). I was able to get the seg fault twice in 40 runs passing ntpdate two server names on the command line.

Here was my stack trace:

#0  alloc_dnsworker_context (idx=<optimized out>) at /usr/include/x86_64-linux-gnu/bits/string3.h:85
#1  get_worker_context (c=0x11a0750, idx=2) at ../../libntp/ntp_intres.c:911
#2  0x000000000040987d in blocking_getaddrinfo (c=0x11a0750, req=0x11a0ae0) at ../../libntp/ntp_intres.c:286
#3  0x000000000040a413 in blocking_child_common (c=0x11a0750) at ../../libntp/ntp_worker.c:283
#4  0x000000000040b319 in blocking_thread (ThreadArg=<optimized out>) at ../../libntp/work_thread.c:667
#5  0x00007fcedfe99e9a in ?? ()
#6  0x0000000000000000 in ?? ()

Looking at the instructions in frame #0, I saw that the register representing dnsworker_contexts had a 0 (NULL) value.

883             dnsworker_contexts[idx] = emalloc_zero(worker_context_sz);
   0x0000000000408d13 <+67>:    mov    $0x1,%ecx
   0x0000000000408d18 <+72>:    xor    %edx,%edx
   0x0000000000408d1a <+74>:    mov    $0x18,%esi
   0x0000000000408d1f <+79>:    xor    %edi,%edi
   0x0000000000408d21 <+81>:    callq  0x4079e0 <ereallocz>
   0x0000000000408d26 <+86>:    mov    %rax,(%r12)
   0x0000000000408d2a <+90>:    mov    0x20ec17(%rip),%rax        # 0x617948 <dnsworker_contexts>
   0x0000000000408d31 <+97>:    mov    (%rax,%rbx,8),%rax

(gdb) p $rbx
$16 = 2
(gdb) p $rax
$17 = 0

Couldn't figure out how that could come about, but noticed that I got here from a worker thread and that dnsworker_contexts is realloced in get_worker_context, so it potentially pointed somewhere else. That should have some kind of lock, shouldn't it? When I run with the attached patch protecting that path with a mutex, I no longer see the seg faults.

Sorry, I did my testing with ntpsec because of work, but I think it applies equally to you. I redid the patch off your master branch so it would apply cleanly for you in case you want to test with this.

Reply at: https://bugs.launchpad.net/ubuntu/+source/ntp/+bug/1567540/comments/25

------------------------------------------------------------------------
On 2016-04-12T03:56:28+00:00 Stenn wrote:

Comment on attachment 1325
Do mlockall before threads

Pearly, thoughts?

Reply at: https://bugs.launchpad.net/ubuntu/+source/ntp/+bug/1567540/comments/26

------------------------------------------------------------------------
On 2016-04-12T03:56:48+00:00 Stenn wrote:

Comment on attachment 1399
protect dnsworker_contexts with a mutex

Pearly, thoughts?

Reply at: https://bugs.launchpad.net/ubuntu/+source/ntp/+bug/1567540/comments/27

------------------------------------------------------------------------
On 2016-04-12T03:57:47+00:00 Stenn wrote:

Mike,

Thanks for the patch - I hope we can get it reviewed soon.

Reply at: https://bugs.launchpad.net/ubuntu/+source/ntp/+bug/1567540/comments/28

------------------------------------------------------------------------
On 2016-04-12T05:05:09+00:00 H-murray wrote:

Mike: Thanks for tracking this down. I think this explains all the problems.

I don't think the patch is good enough. You also need a lock on read references to dnsworker_contexts. The only other reference is a few lines below and in a subroutine called from there.

I suggest moving the lock to the top of get_worker_context and the unlock to the bottom (and adding an assumes-lock comment to the top of alloc_dnsworker_context).

Reply at: https://bugs.launchpad.net/ubuntu/+source/ntp/+bug/1567540/comments/29

------------------------------------------------------------------------
On 2016-04-12T18:21:42+00:00 Smallm wrote:

Ack, that was careless of me. I started fixing it the way you suggested, but I'm wondering if something more major is needed. Even if I spread out the locks within get_worker_context() to top and bottom, it would still be giving out a pointer into the array that realloc can relocate. Return a copy of the struct?

Does anyone else have ideas for this module (I thought I saw a comment in another CR to that effect)? I'm really very bad at multi-threaded coding.

Also, I guess a real patch needs to consider Windows.

Reply at: https://bugs.launchpad.net/ubuntu/+source/ntp/+bug/1567540/comments/35

------------------------------------------------------------------------
On 2016-04-13T20:04:21+00:00 H-murray wrote:

There is probably another copy of this problem in the other direction.

There are two places where info gets queued up and passed from thread to thread. One is when the main thread tells the worker thread(s) what to do. The other is when a worker thread is telling the main thread an answer.

Looks like the other one is reserve_dnschild_ctx.

I suggest folding alloc_dnsworker_context into get_worker_context. It's only called from one place, and it will be easier to make sure the locks are right without that extra layer. It's only a few lines of code. The abstraction layer isn't helping anything.

It might be cleaner to move the definition of dnsworker_contexts and dnsworker_contexts_alloc into get_worker_context. The idea is to make sure the lock covers all uses.
  static xxx
I think all C compilers support that.

> Even if I spread out the locks within get_worker_context()
> to top and bottom, it would still be giving out a pointer
> into the array that realloc can relocate.

The thing that is getting realloc-ed is the array holding pointers to blocks. The individual block never gets realloc-ed. The lock only needs to protect the array. It's only referenced within that routine (aside from the alloc, which I suggested moving).

Reply at: https://bugs.launchpad.net/ubuntu/+source/ntp/+bug/1567540/comments/36

------------------------------------------------------------------------
On 2016-04-17T05:41:53+00:00 Stenn wrote:

*** This bug has been marked as a duplicate of bug 2954 ***

Reply at: https://bugs.launchpad.net/ubuntu/+source/ntp/+bug/1567540/comments/48

------------------------------------------------------------------------
On 2016-04-17T14:21:14+00:00 Perlinger wrote:

(In reply to comment #24)
>
> *** This bug has been marked as a duplicate of bug 2954 ***

That was a bit early -- my fault. It is *not* exactly a dup of 2954, but related -- that is, it is also a race condition in the async/threaded resolver code.

I think the lock in the latest patch does not protect all data races here, but I'm still digging.
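
Hal's suggestions in comments 29 and 36 (fold the allocation into get_worker_context, make the array and its size function-local statics, and hold one lock from the top of the function to the bottom) can be sketched with POSIX threads as below. This is a hypothetical illustration, not the ntp code: `struct worker_ctx`, `get_worker_ctx()`, `request_slots()` and the growth policy are all assumptions. It also illustrates Hal's answer to Mike's relocation worry: realloc() may move the array of pointers, but each block is allocated once and never moves, so handing out the block pointer (never `&slots[idx]`) stays valid after the lock is released.

```c
#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical per-worker state. Each block is allocated once and never
 * realloc'ed, so a pointer to it stays valid after the array grows. */
struct worker_ctx {
    size_t idx;
    int    busy;
};

/*
 * Return the context for slot idx, creating it on first use.
 * The pointer array and its size are function-local statics, so nothing
 * outside this function can touch them, and one lock is held from the top
 * of the function to the bottom, covering the realloc() AND every read.
 */
struct worker_ctx *get_worker_ctx(size_t idx)
{
    static struct worker_ctx **slots;   /* array of pointers; may move */
    static size_t slots_alloc;          /* number of entries in slots */
    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
    struct worker_ctx *result = NULL;

    pthread_mutex_lock(&lock);
    if (idx >= slots_alloc) {
        size_t n = 2 * idx + 1;
        struct worker_ctx **grown = realloc(slots, n * sizeof(*grown));
        if (grown == NULL)
            goto done;
        /* Zero the new tail so unused slots read as "not created yet". */
        memset(grown + slots_alloc, 0, (n - slots_alloc) * sizeof(*grown));
        slots = grown;
        slots_alloc = n;
    }
    if (slots[idx] == NULL) {
        slots[idx] = calloc(1, sizeof(**slots));
        if (slots[idx] != NULL)
            slots[idx]->idx = idx;
    }
    result = slots[idx];   /* hand out the block, never &slots[idx] */
done:
    pthread_mutex_unlock(&lock);
    return result;
}

/* Usage sketch: a thread body that requests 64 consecutive slots starting
 * at the index passed in arg; returns non-NULL on any failure. */
void *request_slots(void *arg)
{
    size_t start = (size_t)(uintptr_t)arg;
    for (size_t i = start; i < start + 64; i++) {
        struct worker_ctx *c = get_worker_ctx(i);
        if (c == NULL || c->idx != i)
            return (void *)1;
    }
    return NULL;
}
```

Several threads can run `request_slots()` over overlapping ranges; each slot is created exactly once because the existence check and the creation happen under the same lock. The sketch does not address Windows, which comment 35 notes a real patch has to handle.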

Reply at: https://bugs.launchpad.net/ubuntu/+source/ntp/+bug/1567540/comments/49

------------------------------------------------------------------------
On 2016-04-18T04:42:02+00:00 Perlinger wrote:

Harlan, the repo is in psp.ntp.org:~perlinger/ntp-stable-2831

compiled and run with
  linux/x64 --with-threads (threading resolver)
  linux/x64 --without-threads (forking resolver)
  Windows7/x64/VS2008 (threading resolver)

Hal, Mike, good catch. Only the proposed lock falls a bit short. You have to interlock all access to the global table, not just the realloc() call. And using pthread_mutex_t is not so easy with Windows, but we all knew that ;) I used a semaphore (again) since there is already a suitable wrapper.

Reply at: https://bugs.launchpad.net/ubuntu/+source/ntp/+bug/1567540/comments/50

------------------------------------------------------------------------
On 2016-04-18T05:44:42+00:00 Stenn wrote:

Hal,

Thanks for the report.

John, Pearly, et al, thanks for your work on this.

Pearly's fix is STAGED for 4.2.8p7.

Reply at: https://bugs.launchpad.net/ubuntu/+source/ntp/+bug/1567540/comments/51

------------------------------------------------------------------------
On 2016-04-27T04:01:07+00:00 Stenn wrote:

Hal,

Thanks - please mark this bug as VERIFIED or IN_PROGRESS, as appropriate.

Reply at: https://bugs.launchpad.net/ubuntu/+source/ntp/+bug/1567540/comments/62

** Changed in: ntp
       Status: Unknown => Fix Released

** Changed in: ntp
   Importance: Unknown => High

** Bug watch added: bugs.ntp.org/ #2362
   http://bugs.ntp.org/show_bug.cgi?id=2362

** Bug watch added: bugs.ntp.org/ #2643
   http://bugs.ntp.org/show_bug.cgi?id=2643

** Bug watch added: bugs.ntp.org/ #2817
   http://bugs.ntp.org/show_bug.cgi?id=2817

--
You received this bug notification because you are a member of Ubuntu Touch seeded packages, which is subscribed to ntp in Ubuntu.
https://bugs.launchpad.net/bugs/1567540

Title:
  ntpd crashed with SIGABRT (was: ntp crashes everytime the network goes up or down.)

Status in NTP:
  Fix Released
Status in ntp package in Ubuntu:
  Triaged

Bug description:
  ntp crashes every time the network goes up or down while the system is running and also crashes after booting up without network.

  ---
  ApportVersion: 2.20.1-0ubuntu1
  Architecture: amd64
  CurrentDesktop: XFCE
  DistroRelease: Ubuntu 16.04
  InstallationDate: Installed on 2016-03-12 (26 days ago)
  InstallationMedia: Xubuntu 16.04 LTS "Xenial Xerus" - Alpha amd64 (20160224)
  NtpStatus: ntpq: read: Connection refused
  Package: ntp 1:4.2.8p4+dfsg-3ubuntu4
  PackageArchitecture: amd64
  ProcCmdline: BOOT_IMAGE=/boot/vmlinuz-4.4.0-17-generic root=UUID=306314bc-efcb-4c2d-b0e9-e05ec92ed0f0 ro
  ProcVersionSignature: Ubuntu 4.4.0-17.33-generic 4.4.6
  Tags: xenial
  Uname: Linux 4.4.0-17-generic x86_64
  UpgradeStatus: No upgrade log present (probably fresh install)
  UserGroups: adm cdrom dip lpadmin plugdev sambashare sudo
  _MarkForUpload: True

  ---
  ApportVersion: 2.20.1-0ubuntu1
  Architecture: amd64
  CurrentDesktop: XFCE
  DistroRelease: Ubuntu 16.04
  InstallationDate: Installed on 2016-03-12 (31 days ago)
  InstallationMedia: Xubuntu 16.04 LTS "Xenial Xerus" - Alpha amd64 (20160224)
  NtpStatus: ntpq: read: Connection refused
  Package: ntp 1:4.2.8p4+dfsg-3ubuntu5
  PackageArchitecture: amd64
  ProcCmdline: BOOT_IMAGE=/boot/vmlinuz-4.4.0-18-generic root=UUID=306314bc-efcb-4c2d-b0e9-e05ec92ed0f0 ro
  ProcVersionSignature: Ubuntu 4.4.0-18.34-generic 4.4.6
  Tags: xenial
  Uname: Linux 4.4.0-18-generic x86_64
  UpgradeStatus: No upgrade log present (probably fresh install)
  UserGroups: adm cdrom dip lpadmin plugdev sambashare sudo
  _MarkForUpload: True

  ---
  ApportVersion: 2.20.1-0ubuntu1
  Architecture: amd64
  CurrentDesktop: XFCE
  DistroRelease: Ubuntu 16.04
  InstallationDate: Installed on 2016-04-13 (0 days ago)
  InstallationMedia: Xubuntu 16.04 LTS "Xenial Xerus" - Beta amd64 (20160412)
  NtpStatus: ntpq: read: Connection refused
  Package: ntp 1:4.2.8p4+dfsg-3ubuntu5
  PackageArchitecture: amd64
  ProcCmdline: BOOT_IMAGE=/boot/vmlinuz-4.4.0-18-generic root=UUID=13f57794-2e19-4a56-836a-94185bba5ec5 ro quiet splash
  ProcVersionSignature: Ubuntu 4.4.0-18.34-generic 4.4.6
  Tags: xenial
  Uname: Linux 4.4.0-18-generic x86_64
  UpgradeStatus: No upgrade log present (probably fresh install)
  UserGroups: adm cdrom dip lpadmin plugdev sambashare sudo
  _MarkForUpload: True

  ---
  ApportVersion: 2.20.1-0ubuntu2
  Architecture: amd64
  CurrentDesktop: XFCE
  DistroRelease: Ubuntu 16.04
  InstallationDate: Installed on 2016-04-14 (3 days ago)
  InstallationMedia: Xubuntu 16.04 LTS "Xenial Xerus" - Beta amd64 (20160412)
  NtpStatus: ntpq: read: Connection refused
  Package: ntp 1:4.2.8p4+dfsg-3ubuntu5
  PackageArchitecture: amd64
  ProcCmdline: BOOT_IMAGE=/boot/vmlinuz-4.4.0-20-generic root=UUID=b9c0528f-e81f-4b08-9b31-032f14f72ccd ro quiet splash vt.handoff=7
  ProcVersionSignature: Ubuntu 4.4.0-20.36-generic 4.4.6
  Tags: xenial
  Uname: Linux 4.4.0-20-generic x86_64
  UpgradeStatus: No upgrade log present (probably fresh install)
  UserGroups: adm cdrom dip lpadmin plugdev sambashare sudo
  _MarkForUpload: True

  ---
  ApportVersion: 2.20.1-0ubuntu2.1
  Architecture: amd64
  CurrentDesktop: XFCE
  DistroRelease: Ubuntu 16.04
  InstallationDate: Installed on 2016-04-14 (63 days ago)
  InstallationMedia: Xubuntu 16.04 LTS "Xenial Xerus" - Beta amd64 (20160412)
  NtpStatus: ntpq: read: Connection refused
  Package: ntp 1:4.2.8p4+dfsg-3ubuntu5
  PackageArchitecture: amd64
  ProcCmdline: BOOT_IMAGE=/boot/vmlinuz-4.4.0-25-generic root=UUID=3aea4570-4011-4247-9636-68317385324d ro
  ProcVersionSignature: Ubuntu 4.4.0-25.44-generic 4.4.13
  Tags: xenial third-party-packages
  Uname: Linux 4.4.0-25-generic x86_64
  UpgradeStatus: No upgrade log present (probably fresh install)
  UserGroups: adm cdrom dialout dip lpadmin mail netdev plugdev sambashare sudo
  _MarkForUpload: True