[Touch-packages] [Bug 1567540] Re: ntpd crashed with SIGABRT (was: ntp crashes everytime the network goes up or down.)

Bug Watch Updater Mon, 27 Jun 2016 04:51:25 -0700

Launchpad has imported 29 comments from the remote bug at
http://bugs.ntp.org/show_bug.cgi?id=2831.

If you reply to an imported comment from within Launchpad, your comment
will be sent to the remote bug automatically. Read more about
Launchpad's inter-bugtracker facilities at
https://help.launchpad.net/InterBugTracking.

------------------------------------------------------------------------
On 2015-05-14T08:49:08+00:00 H-murray wrote:

It's not solid, but I've seen three of these so far.
It crashes ballpark of 1 in 5 tries.

FreeBSD 10.1-RELEASE amd64

I haven't seen any troubles like this before 4.3.33
It crashes before it writes anything to the post-switching log file.

May 14 01:32:20 ted3 ntpd[79529]: switching logging to file /var/log/ntp/ntpd.log
May 14 01:32:20 ted3 kernel: pid 79529 (ntpd), uid 0: exited on signal 11 (core dumped)

Core was generated by `ntpd'.
Program terminated with signal 11, Segmentation fault.
Reading symbols from /lib/libgcc_s.so.1...done.
Loaded symbols for /lib/libgcc_s.so.1
Reading symbols from /lib/libmd.so.6...done.
Loaded symbols for /lib/libmd.so.6
Reading symbols from /lib/libm.so.5...done.
Loaded symbols for /lib/libm.so.5
Reading symbols from /lib/libthr.so.3...done.
Loaded symbols for /lib/libthr.so.3
Reading symbols from /lib/libc.so.7...done.
Loaded symbols for /lib/libc.so.7
Reading symbols from /libexec/ld-elf.so.1...done.
Loaded symbols for /libexec/ld-elf.so.1
#0  0x000000080119db43 in sbrk () from /lib/libc.so.7
[New Thread 801c06c00 (LWP 100287/ntpd)]
[New Thread 801c06400 (LWP 100169/ntpd)]
(gdb) bt
#0  0x000000080119db43 in sbrk () from /lib/libc.so.7
#1  0x0000000801199aaf in sbrk () from /lib/libc.so.7
#2  0x0000000801184593 in syscall () from /lib/libc.so.7
#3  0x00000008011a5283 in realloc () from /lib/libc.so.7
#4  0x0000000000437285 in ereallocz (ptr=0x80180a140, newsz=32, priorsz=0, 
    zero_init=1) at ../../libntp/emalloc.c:43
#5  0x00000000004399c7 in get_worker_context (c=0x801c42100, idx=0)
    at ../../libntp/ntp_intres.c:982
#6  0x0000000000439665 in blocking_getaddrinfo (c=0x801c42100, req=0x801c1b0c0)
    at ../../libntp/ntp_intres.c:327
#7  0x000000000043a5d0 in blocking_child_common (c=<value optimized out>)
    at ../../libntp/ntp_worker.c:288
#8  0x000000000043c619 in blocking_thread (ThreadArg=0x80180a140)
    at ../../libntp/work_thread.c:663
#9  0x0000000800ed74f5 in pthread_create () from /lib/libthr.so.3
#10 0x0000000000000000 in ?? ()
(gdb)

Reply at:
https://bugs.launchpad.net/ubuntu/+source/ntp/+bug/1567540/comments/0

------------------------------------------------------------------------
On 2015-05-14T09:32:52+00:00 Stenn wrote:

Hal,

This should also duplicate using -stable, I hope...

Anyway, if you could nose around in the stack frames to try to home in
on this, that would be great.  You might need to compile without
optimization; I'm not sure.

I haven't seen this on my freebsd boxes...

Reply at:
https://bugs.launchpad.net/ubuntu/+source/ntp/+bug/1567540/comments/1

------------------------------------------------------------------------
On 2015-05-16T10:06:03+00:00 H-murray wrote:

I'm up to 6 core dumps now.  All identical.

I've poked around.  It doesn't fail in gdb.  (or maybe I just haven't
figured out how to make it fail)

I don't have any good ideas.  It could be:
  a bug in ntpd that just happens to get triggered in this case
  a bug in the hardware
  a bug in the OS
  a bug in the tool chain
  an operator error

I recompiled things.  It gets the same error and objdump of both
versions is identical.

Here is something fishy:
#4  0x0000000000437285 in ereallocz (ptr=0x80180a140, newsz=32, priorsz=0,
    zero_init=1) at ../../libntp/emalloc.c:43
get_worker_context is growing the array of pointers to worker contexts.
I think it's growing it from empty.  If so, ptr should be NULL.
The version in memory is NULL.
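
For readers following the frame #4 arguments: growing an array from empty normally means passing NULL as the old pointer, in which case realloc() behaves like malloc(). The sketch below shows what a zero-initializing realloc wrapper of this shape might look like. `grow_zeroed` is a hypothetical stand-in that only mirrors the argument order visible in the backtrace (ptr, newsz, priorsz, zero_init); it is not the ntpd `ereallocz` source.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical stand-in for a zero-initializing realloc wrapper.
 * The argument order mirrors the backtrace; the body is an assumption,
 * not the ntpd implementation.
 */
static void *
grow_zeroed(void *ptr, size_t newsz, size_t priorsz, int zero_init)
{
	/* realloc(NULL, n) is defined to behave like malloc(n). */
	char *mem = realloc(ptr, newsz);

	if (mem == NULL)
		return NULL;	/* on failure the old block is still valid */
	if (zero_init && newsz > priorsz)
		memset(mem + priorsz, 0, newsz - priorsz);	/* clear only the new tail */
	return mem;
}
```

With an empty table the expected call is `grow_zeroed(NULL, 32, 0, 1)`, which is why a non-NULL `ptr` combined with `priorsz=0` looks suspicious in the dump.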

That address comes from several layers back the call stack:
#8  0x000000000043c619 in blocking_thread (ThreadArg=0x80180a140)
    at ../../libntp/work_thread.c:663

I'll look carefully at the compiled code after some sleep.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/ntp/+bug/1567540/comments/2

------------------------------------------------------------------------
On 2015-06-11T04:07:41+00:00 Stenn wrote:

Hal,

Have you learned anything new about this?

Reply at:
https://bugs.launchpad.net/ubuntu/+source/ntp/+bug/1567540/comments/3

------------------------------------------------------------------------
On 2015-06-11T06:41:48+00:00 H-murray wrote:

> Hal,
> Have you learned anything new about this?

Nope.  I'm stumped.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/ntp/+bug/1567540/comments/4

------------------------------------------------------------------------
On 2015-08-17T23:55:19+00:00 john.marsh...@riverwillow.com.au wrote:

FreeBSD 10.2-RC3
   ntpd 4.3.68

Hal, thanks for mentioning this on the mailing list. I should have
spoken up sooner. I've been seeing this for a LONG time (2-3 years?) but
I work around it by replacing hostnames with IP addresses in the config
file 'server' statements and then forget about it. Every several months,
I look at the config, scratch my head, put the domain names back in, and
then remember!

I have been seeing this ONLY on an Intel Xeon E5-2603 (the biggest of
our machines). It has two CPUs, each with 4 cores, and to me this
smells like a thread problem. This server is now running FreeBSD
10.2-RC3 but I have seen this same problem on this server on earlier
versions as well (definitely FreeBSD 10.1 and 9, not sure about 8).

Just now I edited the config file to use domain names for server config
and produced this dump. Like Hal, this doesn't happen EVERY time ntpd
starts but, for me, it is the rule rather than the exception.

rwsrv08# gdb /usr/sbin/ntpd /ntpd.core
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
...
Core was generated by `ntpd'.
...
#0  0x0000000800678fe2 in _rtld_atfork_post () from /libexec/ld-elf.so.1
[New Thread 801c07400 (LWP 101340/<unknown>)]
[New Thread 801c06400 (LWP 100229/<unknown>)]
(gdb) bt
#0  0x0000000800678fe2 in _rtld_atfork_post () from /libexec/ld-elf.so.1
#1  0x0000000800679349 in _rtld_atfork_post () from /libexec/ld-elf.so.1
#2  0x00000008006749f4 in _rtld_is_dlopened () from /libexec/ld-elf.so.1
#3  0x0000000800673e3a in _rtld_is_dlopened () from /libexec/ld-elf.so.1
#4  0x0000000800670ea0 in dlopen () from /libexec/ld-elf.so.1
#5  0x00000008013e9025 in _nsdbtaddsrc () from /lib/libc.so.7
#6  0x00000008013e37e4 in _nsyyparse () from /lib/libc.so.7
#7  0x00000008013e96a1 in nsdispatch () from /lib/libc.so.7
#8  0x00000008013cd011 in getservbyname () from /lib/libc.so.7
#9  0x00000008013ccf19 in getservbyname () from /lib/libc.so.7
#10 0x00000008013c9a33 in getaddrinfo () from /lib/libc.so.7
#11 0x00000008013c7358 in getaddrinfo () from /lib/libc.so.7
#12 0x00000000004382c6 in blocking_getaddrinfo ()
#13 0x0000000000439190 in blocking_child_common ()
#14 0x000000000043b3b9 in blocking_thread ()
#15 0x00000008010bd7d5 in pthread_create () from /lib/libthr.so.3
#16 0x0000000000000000 in ?? ()
(gdb) q

When this happens, syslog shows...

Aug 18 09:30:24 rwsrv08 ntpd[17587]: ntpd 4.3.68@1.2483-o Fri Aug  7 02:03:11 UTC 2015 (1): Starting
Aug 18 09:30:24 rwsrv08 ntpd[17587]: Command line: /usr/sbin/ntpd -g -w 120 -N -c /data/ntpd/ntp.conf -p /var/run/ntpd.pid
Aug 18 09:30:25 rwsrv08 ntpd[17588]: proto: precision = 1.118 usec (-20)
Aug 18 09:30:25 rwsrv08 ntpd[17588]: Listen and drop on 0 v6wildcard [::]:123
Aug 18 09:30:25 rwsrv08 ntpd[17588]: Listen and drop on 1 v4wildcard 0.0.0.0:123
Aug 18 09:30:25 rwsrv08 ntpd[17588]: Listen normally on 2 GFNX 203.58.93.40:123
Aug 18 09:30:25 rwsrv08 ntpd[17588]: Listen normally on 3 GFNX [2001:8000:1000:1801::5001]:123
Aug 18 09:30:25 rwsrv08 ntpd[17588]: Listen normally on 4 lo0 [::1]:123
Aug 18 09:30:25 rwsrv08 ntpd[17588]: Listen normally on 5 lo0 127.0.0.1:123
Aug 18 09:30:25 rwsrv08 ntpd[17588]: Listening on routing socket on fd #26 for interface updates
Aug 18 09:30:25 rwsrv08 ntpd[17588]: ff08::101 8811 81 mobilize assoc 36615
Aug 18 09:30:25 rwsrv08 kernel: pid 17588 (ntpd), uid 0: exited on signal 11 (core dumped)

Please let me know if I can be of any assistance.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/ntp/+bug/1567540/comments/5

------------------------------------------------------------------------
On 2015-08-19T15:01:45+00:00 Burnicki wrote:

I've just set up a machine with FreeBSD 10.2-RELEASE from scratch, and I
don't see any problem with the version of ntpd shipped with FreeBSD.

It's labelled "4.2.8p3-a", but I don't know what the "-a" stands for.

The only problem I encountered is that I had to remove "nopeer" from the
"restrict" lines in the shipped ntp.conf file if I wanted to use the
"pool" directive, since otherwise no pool servers were added. However,
the comments in bug 2152 say this is OK.

Digging through bugzilla I found a few issues where ntpd didn't work
correctly due to memory restrictions:

Bug 2362 - mlockall() breaks DNS resolution when using the "files" service in nsswitch.conf
http://bugs.ntp.org/show_bug.cgi?id=2362

Bug 2643 - Server crash with pool directive
http://bugs.ntp.org/show_bug.cgi?id=2643

Bug 2817 - Stop locking ntpd into memory by default
http://bugs.ntp.org/show_bug.cgi?id=2817

To me it smells like all of this is somehow related. Can you check
whether the problem persists if you add an "rlimit memlock 128" line (or
even a higher number) to ntp.conf?

If I find some time I'll try to build ntp-dev on my FreeBSD machine and
see if I can duplicate the problem.

On the other hand, Hal reported on the hackers@ list that he also saw
this on Fedora 22. I've also installed that Linux version a few days ago
and played a bit with it, but didn't encounter any problems with the
shipped ntpd, either.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/ntp/+bug/1567540/comments/6

------------------------------------------------------------------------
On 2015-08-20T01:28:30+00:00 john.marsh...@riverwillow.com.au wrote:

Martin,

Thanks for looking at this. I'd like to stress that I'm only seeing this
on a system with more memory (16GB) and more cores (8) than we have
anywhere else.

As you suggested, I tried adding "rlimit memlock 128" to ntp.conf but it
made no difference. I then tried "rlimit memlock 256" and it also made
no difference.

I am now using:
  FreeBSD 10.2-RELEASE-p1
     ntpd 4.3.70

When ntpd fails, the dump backtrace looks like what I pasted in Comment
#5 or like the following. The three backtraces (Hal's + my two) diverge
after the blocking_getaddrinfo().

(gdb) bt
#0  0x00000008013ed631 in __h_errno_set () from /lib/libc.so.7
#1  0x00000008013bf90e in __res_vinit () from /lib/libc.so.7
#2  0x00000008013c33b0 in getaddrinfo () from /lib/libc.so.7
#3  0x00000008013e39ef in nsdispatch () from /lib/libc.so.7
#4  0x00000008013c20ec in getaddrinfo () from /lib/libc.so.7
#5  0x000000000043435a in blocking_getaddrinfo ()
#6  0x00000000004352f0 in blocking_child_common ()
#7  0x0000000000437159 in blocking_thread ()
#8  0x00000008010b77d5 in pthread_create () from /lib/libthr.so.3
#9  0x0000000000000000 in ?? ()

Since you mentioned nsswitch.conf in Comment #6, I note that all our
servers have "hosts: dns" in nsswitch.conf.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/ntp/+bug/1567540/comments/7

------------------------------------------------------------------------
On 2015-08-20T08:52:46+00:00 H-murray wrote:

It (or something very similar) also happens on Linux.

I tried mail to hackers, but the discussion ended up here, so I'll copy the 
data from that message.
  http://lists.ntp.org/pipermail/hackers/2015-August/007156.html

A few days ago, I tried to add a pool line to a server and got a strange 
error message.

14 Aug 13:58:07 ntpd[12618]: error resolving pool 0.fedora.pool.ntp.org: System error (-11)

It tries again in a few minutes and gets the same error. ...

EAI_SYSTEM (System Error) says look in errno.  I added some debugging 
printout.  errno is always EAGAIN.  More printout says it's taking ~15 ms 
which is reasonable for a packet exchange over my DSL line.  I added a loop 
to try a few times.  It always gets the same error.
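
As an illustration of the kind of debugging printout described above, here is a minimal sketch of reporting a getaddrinfo() failure so that the errno behind EAI_SYSTEM is visible. The helper names `describe_gai_failure` and `resolve_and_describe` are hypothetical and are not taken from ntpd; the service string "123" and the hint values are likewise assumptions chosen for the example.

```c
#include <errno.h>
#include <netdb.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>

/*
 * Format a getaddrinfo() failure. rc is the return value, saved_errno is
 * errno captured immediately after the call. Hypothetical helper name.
 */
static void
describe_gai_failure(int rc, int saved_errno, char *buf, size_t len)
{
	if (rc == EAI_SYSTEM)
		/* gai_strerror() only says "System error"; errno has the cause. */
		snprintf(buf, len, "%s: %s", gai_strerror(rc),
		    strerror(saved_errno));
	else
		snprintf(buf, len, "%s", gai_strerror(rc));
}

/* Resolve a name and describe the outcome in buf; returns the raw rc. */
static int
resolve_and_describe(const char *name, char *buf, size_t len)
{
	struct addrinfo hints, *res = NULL;
	int rc, saved_errno;

	memset(&hints, 0, sizeof(hints));
	hints.ai_family = AF_UNSPEC;
	hints.ai_socktype = SOCK_DGRAM;

	rc = getaddrinfo(name, "123", &hints, &res);
	saved_errno = errno;	/* capture before another call can clobber it */
	if (rc != 0) {
		describe_gai_failure(rc, saved_errno, buf, len);
		return rc;
	}
	snprintf(buf, len, "ok");
	freeaddrinfo(res);
	return 0;
}
```

An EAGAIN reported this way points at a resource problem inside the resolver (for example a failed allocation or thread creation) rather than at the DNS exchange itself, which is consistent with the out-of-memory message below.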

I changed the server lines of local systems from names to IP Addresses.  Now 
I get:
  16 Aug 02:37:41 ntpd[21377]: fatal out of memory (32 bytes)
That's from the DNS thread creation code getting ready to look up the pool 
info.

That's on a 64 bit Fedora 22 system.  I got the same sort of thing on another 
Fedora box and a Debian box so I'm pretty sure it isn't a simple flaky 
hardware box.  (But all the problems have been on the same type of hardware, 
so it might be a design bug.  Dell Optiplex FX 160, Intel Atom 330.)

Reply at:
https://bugs.launchpad.net/ubuntu/+source/ntp/+bug/1567540/comments/8

------------------------------------------------------------------------
On 2015-09-08T09:57:25+00:00 H-murray wrote:

Harlan pointed me at a wonderful blog post:
  https://blog.crashed.org/dont-backout
Thanks.

Quick summary: Bug in FreeBSD page fault handler

That solves the FreeBSD half of this bug.  I'll submit a new one
for the Linux variant.

Harlan:
  I don't see anything like UPSTREAM in the resolved-at options.
  I'll let you sort out how to mark this as no-longer-open.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/ntp/+bug/1567540/comments/9

------------------------------------------------------------------------
On 2015-09-08T21:16:30+00:00 john.marsh...@riverwillow.com.au wrote:

(In reply to comment #9)
> Quick summary: Bug in FreeBSD page fault handler
> 
> That solves the FreeBSD half of this bug.

Thanks for posting this Hal but you don't reference a patch, I can't
find any reference in that blog post to a patch, and my attempts at
trawling FreeBSD commit logs have yielded no results (my fault, no
doubt). It would be great to close off this bug with a pointer to the
FreeBSD pager patch that fixes this. Any clues?

Reply at:
https://bugs.launchpad.net/ubuntu/+source/ntp/+bug/1567540/comments/10

------------------------------------------------------------------------
On 2015-09-08T21:53:31+00:00 H-murray wrote:

> Thanks for posting this Hal but you don't reference a patch,
> I can't find any reference in that blog post to a patch ...
> Any clues?

Nope.  I'm not plugged into the FreeBSD ecosystem.

I expect there would be something in their bug database
or mailing lists.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/ntp/+bug/1567540/comments/11

------------------------------------------------------------------------
On 2015-09-08T23:29:31+00:00 john.marsh...@riverwillow.com.au wrote:

(In reply to comment #11)
Hal, I've sent email to the author of the blog post to which you referred in 
Comment #9 and plan to post details of any response here. If I can get a 
pointer to a FreeBSD patch, I'll apply that, test and report.

I think it's premature to suggest that this bug be closed without seeing
if there is, actually, a fix for this problem. Peter may even have hit a
different crash to the one we are seeing.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/ntp/+bug/1567540/comments/12

------------------------------------------------------------------------
On 2015-09-09T08:16:28+00:00 john.marsh...@riverwillow.com.au wrote:

Created attachment 1325
Do mlockall before threads

I exchanged email with the author of the blog post referred to in
Comment #9. He suggested that I build ntpd with HAVE_MLOCKALL disabled
and test. I had no problem at all with mlockall() disabled.

He also suggested that, notwithstanding potential problems with
FreeBSD's mlockall(), running mlockall() in one thread while allocating
memory in another thread is probably unwise anyway; and that calling
mlockall() before starting any threads may be preferable.

In the attached patch (against 4.3.70), I moved the "if (do_memlock)"
block in ntpd.c up to an earlier point, after the fork() and just after
the RLIMITs are set.
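
The ordering argued for here (lock memory while the process is still single-threaded, then start worker threads) can be sketched as follows. This is not the attached patch: `start_with_memlock` and `worker` are hypothetical names, and the `MCL_CURRENT | MCL_FUTURE` flags are an assumption about what a daemon like ntpd would pass. The retry after munlockall() is also an addition of this sketch, covering the case where a low RLIMIT_MEMLOCK makes the thread stack unlockable.

```c
#include <errno.h>
#include <pthread.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

static int lock_attempted;	/* set by the main thread before any worker starts */
static int worker_saw_lock;	/* what the worker observed on entry */

static void *
worker(void *arg)
{
	(void)arg;
	/* pthread_create() orders the earlier store before this read. */
	worker_saw_lock = lock_attempted;
	return NULL;
}

/*
 * Hypothetical startup sequence: lock memory first, then start threads.
 * Returns 0 if the worker ran, otherwise a pthread error number.
 */
static int
start_with_memlock(void)
{
	pthread_t tid;
	int rc;

	/* Still single-threaded, so no other thread can be allocating now. */
	if (mlockall(MCL_CURRENT | MCL_FUTURE) != 0)
		fprintf(stderr, "mlockall: %s (continuing unlocked)\n",
		    strerror(errno));
	lock_attempted = 1;

	rc = pthread_create(&tid, NULL, worker, NULL);
	if (rc != 0) {
		/*
		 * With MCL_FUTURE and a low RLIMIT_MEMLOCK the new thread's
		 * stack may not be lockable. Drop the lock and retry once.
		 */
		munlockall();
		rc = pthread_create(&tid, NULL, worker, NULL);
	}
	if (rc != 0)
		return rc;
	return pthread_join(tid, NULL);
}
```

Whether the lock call can really move this early in ntpd depends on what else must run first, which is exactly the scrutiny asked for below.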

WARNING: I do not *know* ntpd.c, so this needs careful scrutiny by
someone who does but..."It works for me"! (on FreeBSD 10.2-RELEASE with
a patched 4.3.70)

Reply at:
https://bugs.launchpad.net/ubuntu/+source/ntp/+bug/1567540/comments/13

------------------------------------------------------------------------
On 2015-09-21T14:45:17+00:00 H-max-3 wrote:

I think I ran into the same issue (realloc() returning an error when
being asked for 32 bytes somewhere down the callstack from
blocking_getaddrinfo()) with 4.2.8p3 on SUSE, but with a slightly
different behaviour:

The error in realloc() only happens when using ntpq to add a server to a
running ntpd that does not have any servers yet. When a server is given
on the command line or in ntp.conf, ntpd starts fine and more servers
can be added at runtime.

I can confirm that disabling mlockall() as suggested in comment 13
prevents the call.

Applying the patch from comment 13 makes it even worse: Now the error
also happens when a server is specified at startup on the command line
or in ntp.conf.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/ntp/+bug/1567540/comments/14

------------------------------------------------------------------------
On 2015-09-22T12:06:09+00:00 H-max-3 wrote:

It looks like I am rather suffering from bug 2817. Sorry for the noise
here.

But while being there, I found that the proposed patch from comment 13
is at least incomplete, because it places the block that depends on
do_memlock above getconfig(), which is the only place where it can get
changed from 1 to 0, so at the new location it will always be 1.

So, if the do_memlock block needs to be moved up, at least the
getconfig() line should be moved with it, but I have not checked whether
there are other cross-dependencies to all the init_* stuff that happens
between those two locations.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/ntp/+bug/1567540/comments/15

------------------------------------------------------------------------
On 2016-04-11T19:01:35+00:00 Smallm wrote:

Created attachment 1399
protect dnsworker_contexts with a mutex

Reply at:
https://bugs.launchpad.net/ubuntu/+source/ntp/+bug/1567540/comments/24

------------------------------------------------------------------------
On 2016-04-11T19:03:06+00:00 Smallm wrote:

I believe this (at least the original problem as described) is caused by
the dynamic array pointed to by dnsworker_contexts in ntp_intres.c being
potentially realloced from multiple threads with no synchronization
objects used.

I encounter an almost identical stack trace from a core file created by
a segmentation fault when running ntps (actually, ntpdig from the ntpsec
fork but your code here has not diverged). I was able to get the seg
fault twice in 40 runs passing ntpdate two server names on the command
line. Here was my stack trace:

#0  alloc_dnsworker_context (idx=<optimized out>)
    at /usr/include/x86_64-linux-gnu/bits/string3.h:85
#1  get_worker_context (c=0x11a0750, idx=2) at ../../libntp/ntp_intres.c:911
#2  0x000000000040987d in blocking_getaddrinfo (c=0x11a0750, req=0x11a0ae0)
    at ../../libntp/ntp_intres.c:286
#3  0x000000000040a413 in blocking_child_common (c=0x11a0750)
    at ../../libntp/ntp_worker.c:283
#4  0x000000000040b319 in blocking_thread (ThreadArg=<optimized out>)
    at ../../libntp/work_thread.c:667
#5  0x00007fcedfe99e9a in ?? ()
#6  0x0000000000000000 in ?? ()

Looking at the instructions in frame #0 I saw that the register
representing dnsworker_contexts had a 0 (NULL) value.

883             dnsworker_contexts[idx] = emalloc_zero(worker_context_sz);
   0x0000000000408d13 <+67>:    mov    $0x1,%ecx
   0x0000000000408d18 <+72>:    xor    %edx,%edx
   0x0000000000408d1a <+74>:    mov    $0x18,%esi
   0x0000000000408d1f <+79>:    xor    %edi,%edi
   0x0000000000408d21 <+81>:    callq  0x4079e0 <ereallocz>
   0x0000000000408d26 <+86>:    mov    %rax,(%r12)
   0x0000000000408d2a <+90>:    mov    0x20ec17(%rip),%rax        # 0x617948 <dnsworker_contexts>
   0x0000000000408d31 <+97>:    mov    (%rax,%rbx,8),%rax
(gdb) p $rbx
$16 = 2
(gdb) p $rax
$17 = 0

Couldn't figure out how that could come about but noticed that I got
here from a worker thread and that dnsworker_contexts is realloced in
get_worker_context, so potentially pointed somewhere else.  That should
have some kind of lock shouldn't it?

When I run with the attached patch protecting that path with a mutex I
no longer see the seg faults. Sorry, I did my testing with ntpsec
because of work but I think it applies equally to you. I redid the patch
off your master branch so it would apply cleanly for you in case you
want to test with this.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/ntp/+bug/1567540/comments/25

------------------------------------------------------------------------
On 2016-04-12T03:56:28+00:00 Stenn wrote:

Comment on attachment 1325
Do mlockall before threads

Pearly, thoughts?

Reply at:
https://bugs.launchpad.net/ubuntu/+source/ntp/+bug/1567540/comments/26

------------------------------------------------------------------------
On 2016-04-12T03:56:48+00:00 Stenn wrote:

Comment on attachment 1399
protect dnsworker_contexts with a mutex

Pearly, thoughts?

Reply at:
https://bugs.launchpad.net/ubuntu/+source/ntp/+bug/1567540/comments/27

------------------------------------------------------------------------
On 2016-04-12T03:57:47+00:00 Stenn wrote:

Mike,

Thanks for the patch - I hope we can get it reviewed soon.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/ntp/+bug/1567540/comments/28

------------------------------------------------------------------------
On 2016-04-12T05:05:09+00:00 H-murray wrote:

Mike: Thanks for tracking this down.  I think this explains all the
problems.

I don't think the patch is good enough.

You also need a lock on read references to dnsworker_contexts.  The only
other reference is a few lines below and in a subroutine called from there.
I suggest moving the lock to the top of get_worker_context and the
unlock to the bottom.  (and adding an assumes-lock comment to the top of
alloc_dnsworker_context)
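
The suggestion above (take the lock at the top of the lookup routine and release it at the bottom, so reads and the realloc are both covered) can be sketched like this. Everything here is a hypothetical analogue: `get_ctx`, `struct ctx`, `table` and `table_lock` are illustration names, and a plain POSIX mutex is assumed, so this is not the ntpd code or either attached patch.

```c
#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>

struct ctx { int idx; };	/* placeholder for the per-worker block */

static pthread_mutex_t table_lock = PTHREAD_MUTEX_INITIALIZER;
static struct ctx **table;	/* array of pointers, grown on demand */
static size_t table_alloc;	/* number of slots in table */

/*
 * Lock at the top, unlock at the bottom: the realloc() and every read of
 * the array happen under the same lock. Returns NULL if allocation fails.
 */
static struct ctx *
get_ctx(size_t idx)
{
	struct ctx *c = NULL;

	pthread_mutex_lock(&table_lock);
	if (idx >= table_alloc) {
		size_t want = idx + 1;
		struct ctx **grown = realloc(table, want * sizeof(*grown));

		if (grown == NULL)
			goto done;
		for (size_t i = table_alloc; i < want; i++)
			grown[i] = NULL;	/* mark new slots empty */
		table = grown;
		table_alloc = want;
	}
	if (table[idx] == NULL) {
		table[idx] = calloc(1, sizeof(*table[idx]));
		if (table[idx] != NULL)
			table[idx]->idx = (int)idx;
	}
	c = table[idx];
done:
	pthread_mutex_unlock(&table_lock);
	return c;
}

/* Thread entry for a quick stress test: fetch the slot named by arg. */
static void *
ctx_thread(void *arg)
{
	return get_ctx((size_t)(uintptr_t)arg);
}
```

Starting several threads on `ctx_thread` with overlapping indices should hand every caller of the same index the same block, regardless of which thread happened to grow the array.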

Reply at:
https://bugs.launchpad.net/ubuntu/+source/ntp/+bug/1567540/comments/29

------------------------------------------------------------------------
On 2016-04-12T18:21:42+00:00 Smallm wrote:

Ack, that was careless of me. Started fixing it the way you suggested,
but I'm wondering if something more major is needed. Even if I spread
out the locks within get_worker_context() to top and bottom, it would
still be giving out a pointer into the array that realloc can relocate.
Return a copy of the struct? Does anyone else have ideas for this module
(I thought I saw a comment in another CR to that effect)? I'm really
very bad at multi-threaded coding.

Also, I guess a real patch needs to consider Windows.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/ntp/+bug/1567540/comments/35

------------------------------------------------------------------------
On 2016-04-13T20:04:21+00:00 H-murray wrote:

There is probably another copy of this problem in the other direction.

There are two places where info gets queued up and passed from thread
to thread.  One is when the main thread tells the worker thread(s) what
to do.  The other is when a worker thread is telling the main thread
an answer.

Looks like the other one is reserve_dnschild_ctx

I suggest folding alloc_dnsworker_context into get_worker_context
It's only called from one place and it will be easier to make
sure the locks are right without that extra layer.  It's only a few
lines of code.  The abstraction layer isn't helping anything.

It might be cleaner to move the definition of dnsworker_contexts
and dnsworker_contexts_alloc into get_worker_context.  The idea
is to make sure the lock covers all uses.
  static xxx
I think all C compilers support that.

> Even if I spread out the locks within get_worker_context()
> to top and bottom, it would still be giving out a pointer
> into the array that realloc can relocate.

The thing that is getting realloc-ed is the array holding pointers
to blocks.  The individual block never gets realloc-ed.  The lock
only needs to protect the array.  It's only referenced within
that routine.  (aside from the alloc which I suggested moving)
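
To make the last point concrete, here is a small sketch showing that growing an array of pointers does not move the blocks the pointers refer to. The table lives in function-local statics, as suggested above. `block_for` and `struct block` are hypothetical names, and locking is deliberately left out to keep the focus on pointer stability; a threaded version would hold a lock for the whole body.

```c
#include <stdlib.h>

struct block { int id; };	/* placeholder for a worker context */

/*
 * Grow-only table of pointers kept in function-local statics, so nothing
 * outside this routine can reach the array. No locking in this sketch.
 */
static struct block *
block_for(size_t idx)
{
	static struct block **slots;	/* the only thing realloc() may move */
	static size_t nslots;

	if (idx >= nslots) {
		struct block **grown = realloc(slots, (idx + 1) * sizeof(*grown));

		if (grown == NULL)
			return NULL;
		for (size_t i = nslots; i <= idx; i++)
			grown[i] = NULL;	/* mark new slots empty */
		slots = grown;
		nslots = idx + 1;
	}
	if (slots[idx] == NULL) {
		/* Each block is its own allocation and is never realloc-ed. */
		slots[idx] = malloc(sizeof(*slots[idx]));
		if (slots[idx] != NULL)
			slots[idx]->id = (int)idx;
	}
	return slots[idx];
}
```

A pointer returned for index 0 stays valid after the table has been grown many times, because only `slots` itself is ever passed to realloc().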

Reply at:
https://bugs.launchpad.net/ubuntu/+source/ntp/+bug/1567540/comments/36

------------------------------------------------------------------------
On 2016-04-17T05:41:53+00:00 Stenn wrote:

*** This bug has been marked as a duplicate of bug 2954 ***

Reply at:
https://bugs.launchpad.net/ubuntu/+source/ntp/+bug/1567540/comments/48

------------------------------------------------------------------------
On 2016-04-17T14:21:14+00:00 Perlinger wrote:

(In reply to comment #24)
> 
> *** This bug has been marked as a duplicate of bug 2954 ***

That was a bit early -- my fault. It is *not* exactly a dup of 2954, but
related -- that is, it is also a race condition in the async/threaded
resolver code.

I think the lock in the latest patch does not protect all data races
here, but I'm still digging.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/ntp/+bug/1567540/comments/49

------------------------------------------------------------------------
On 2016-04-18T04:42:02+00:00 Perlinger wrote:

Harlan, the repo is in

  psp.ntp.org:~perlinger/ntp-stable-2831

compiled and run with

  linux/x64 --with-threads (threading resolver)
  linux/x64 --without-threads (forking resolver)
  Windows7/x64/VS2008 (threading resolver)

Hal, Mike, good catch. Only the proposed lock falls a bit short. You
have to interlock all access to the global table, not just the realloc()
call.

And using pthread_mutex_t is not so easy with Windows, but we all knew
that ;) I used a semaphore (again) since there is already a suitable
wrapper.
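
For readers unfamiliar with using a semaphore as a lock, the idea can be sketched with a POSIX unnamed semaphore initialised to 1. This is only an illustration: the `table_gate` type and the `gate_*` functions are hypothetical names, ntpd's own semaphore wrapper is not shown here, and POSIX unnamed semaphores are not available on every platform (a Windows build would sit on a different primitive behind the same kind of wrapper).

```c
#include <errno.h>
#include <pthread.h>
#include <semaphore.h>

/* Hypothetical wrapper: a semaphore with count 1 acts as a lock. */
typedef struct { sem_t sem; } table_gate;

static int
gate_init(table_gate *g)
{
	return sem_init(&g->sem, 0, 1);	/* not shared between processes */
}

static void
gate_enter(table_gate *g)
{
	/* Retry only when interrupted by a signal. */
	while (sem_wait(&g->sem) != 0 && errno == EINTR)
		;
}

static void
gate_leave(table_gate *g)
{
	sem_post(&g->sem);
}

static table_gate gate;
static long shared_counter;	/* stands in for the shared table */

/* Thread entry: every access to the shared state goes through the gate. */
static void *
bump(void *arg)
{
	(void)arg;
	for (int i = 0; i < 10000; i++) {
		gate_enter(&gate);
		shared_counter++;
		gate_leave(&gate);
	}
	return NULL;
}
```

The point made above still applies: the gate has to bracket every access to the shared table, reads included, not just the realloc() call.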

Reply at:
https://bugs.launchpad.net/ubuntu/+source/ntp/+bug/1567540/comments/50

------------------------------------------------------------------------
On 2016-04-18T05:44:42+00:00 Stenn wrote:

Hal,

Thanks for the report.  John, Pearly, et al, thanks for your work on
this.

Pearly's fix is STAGED for 4.2.8p7.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/ntp/+bug/1567540/comments/51

------------------------------------------------------------------------
On 2016-04-27T04:01:07+00:00 Stenn wrote:

Hal,

Thanks - please mark this bug as VERIFIED or IN_PROGRESS, as
appropriate.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/ntp/+bug/1567540/comments/62

** Changed in: ntp
       Status: Unknown => Fix Released

** Changed in: ntp
   Importance: Unknown => High

** Bug watch added: bugs.ntp.org/ #2362
   http://bugs.ntp.org/show_bug.cgi?id=2362

** Bug watch added: bugs.ntp.org/ #2643
   http://bugs.ntp.org/show_bug.cgi?id=2643

** Bug watch added: bugs.ntp.org/ #2817
   http://bugs.ntp.org/show_bug.cgi?id=2817

-- 
You received this bug notification because you are a member of Ubuntu
Touch seeded packages, which is subscribed to ntp in Ubuntu.
https://bugs.launchpad.net/bugs/1567540

Title:
  ntpd crashed with SIGABRT (was: ntp crashes everytime the network goes
  up or down.)

Status in NTP:
  Fix Released
Status in ntp package in Ubuntu:
  Triaged

Bug description:
  ntp crashes every time the network goes up or down while the system is
  running, and also crashes after booting up without network.
  --- 
  ApportVersion: 2.20.1-0ubuntu1
  Architecture: amd64
  CurrentDesktop: XFCE
  DistroRelease: Ubuntu 16.04
  InstallationDate: Installed on 2016-03-12 (26 days ago)
  InstallationMedia: Xubuntu 16.04 LTS "Xenial Xerus" - Alpha amd64 (20160224)
  NtpStatus: ntpq: read: Connection refused
  Package: ntp 1:4.2.8p4+dfsg-3ubuntu4
  PackageArchitecture: amd64
  ProcCmdline: BOOT_IMAGE=/boot/vmlinuz-4.4.0-17-generic root=UUID=306314bc-efcb-4c2d-b0e9-e05ec92ed0f0 ro
  ProcVersionSignature: Ubuntu 4.4.0-17.33-generic 4.4.6
  Tags:  xenial
  Uname: Linux 4.4.0-17-generic x86_64
  UpgradeStatus: No upgrade log present (probably fresh install)
  UserGroups: adm cdrom dip lpadmin plugdev sambashare sudo
  _MarkForUpload: True
  --- 
  ApportVersion: 2.20.1-0ubuntu1
  Architecture: amd64
  CurrentDesktop: XFCE
  DistroRelease: Ubuntu 16.04
  InstallationDate: Installed on 2016-03-12 (31 days ago)
  InstallationMedia: Xubuntu 16.04 LTS "Xenial Xerus" - Alpha amd64 (20160224)
  NtpStatus: ntpq: read: Connection refused
  Package: ntp 1:4.2.8p4+dfsg-3ubuntu5
  PackageArchitecture: amd64
  ProcCmdline: BOOT_IMAGE=/boot/vmlinuz-4.4.0-18-generic root=UUID=306314bc-efcb-4c2d-b0e9-e05ec92ed0f0 ro
  ProcVersionSignature: Ubuntu 4.4.0-18.34-generic 4.4.6
  Tags:  xenial
  Uname: Linux 4.4.0-18-generic x86_64
  UpgradeStatus: No upgrade log present (probably fresh install)
  UserGroups: adm cdrom dip lpadmin plugdev sambashare sudo
  _MarkForUpload: True
  --- 
  ApportVersion: 2.20.1-0ubuntu1
  Architecture: amd64
  CurrentDesktop: XFCE
  DistroRelease: Ubuntu 16.04
  InstallationDate: Installed on 2016-04-13 (0 days ago)
  InstallationMedia: Xubuntu 16.04 LTS "Xenial Xerus" - Beta amd64 (20160412)
  NtpStatus: ntpq: read: Connection refused
  Package: ntp 1:4.2.8p4+dfsg-3ubuntu5
  PackageArchitecture: amd64
  ProcCmdline: BOOT_IMAGE=/boot/vmlinuz-4.4.0-18-generic root=UUID=13f57794-2e19-4a56-836a-94185bba5ec5 ro quiet splash
  ProcVersionSignature: Ubuntu 4.4.0-18.34-generic 4.4.6
  Tags:  xenial
  Uname: Linux 4.4.0-18-generic x86_64
  UpgradeStatus: No upgrade log present (probably fresh install)
  UserGroups: adm cdrom dip lpadmin plugdev sambashare sudo
  _MarkForUpload: True
  --- 
  ApportVersion: 2.20.1-0ubuntu2
  Architecture: amd64
  CurrentDesktop: XFCE
  DistroRelease: Ubuntu 16.04
  InstallationDate: Installed on 2016-04-14 (3 days ago)
  InstallationMedia: Xubuntu 16.04 LTS "Xenial Xerus" - Beta amd64 (20160412)
  NtpStatus: ntpq: read: Connection refused
  Package: ntp 1:4.2.8p4+dfsg-3ubuntu5
  PackageArchitecture: amd64
  ProcCmdline: BOOT_IMAGE=/boot/vmlinuz-4.4.0-20-generic root=UUID=b9c0528f-e81f-4b08-9b31-032f14f72ccd ro quiet splash vt.handoff=7
  ProcVersionSignature: Ubuntu 4.4.0-20.36-generic 4.4.6
  Tags:  xenial
  Uname: Linux 4.4.0-20-generic x86_64
  UpgradeStatus: No upgrade log present (probably fresh install)
  UserGroups: adm cdrom dip lpadmin plugdev sambashare sudo
  _MarkForUpload: True
  --- 
  ApportVersion: 2.20.1-0ubuntu2.1
  Architecture: amd64
  CurrentDesktop: XFCE
  DistroRelease: Ubuntu 16.04
  InstallationDate: Installed on 2016-04-14 (63 days ago)
  InstallationMedia: Xubuntu 16.04 LTS "Xenial Xerus" - Beta amd64 (20160412)
  NtpStatus: ntpq: read: Connection refused
  Package: ntp 1:4.2.8p4+dfsg-3ubuntu5
  PackageArchitecture: amd64
  ProcCmdline: BOOT_IMAGE=/boot/vmlinuz-4.4.0-25-generic root=UUID=3aea4570-4011-4247-9636-68317385324d ro
  ProcVersionSignature: Ubuntu 4.4.0-25.44-generic 4.4.13
  Tags: xenial third-party-packages
  Uname: Linux 4.4.0-25-generic x86_64
  UpgradeStatus: No upgrade log present (probably fresh install)
  UserGroups: adm cdrom dialout dip lpadmin mail netdev plugdev sambashare sudo
  _MarkForUpload: True

To manage notifications about this bug go to:
https://bugs.launchpad.net/ntp/+bug/1567540/+subscriptions

-- 
Mailing list: https://launchpad.net/~touch-packages
Post to     : touch-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~touch-packages
More help   : https://help.launchpad.net/ListHelp
