Re: INTERNAL: Exiting with 2 jobserver tokens available; should be 5!

2016-11-12 Thread Jaak Ristioja
On 11.11.2016 19:41, Jaak Ristioja wrote:
> On 10.11.2016 09:55, Jaak Ristioja wrote:
>> On 09.11.2016 22:58, Paul Smith wrote:
>>> On Wed, 2016-11-09 at 22:42 +0200, Jaak Ristioja wrote:
>>>> I have no ARM experience myself. I don't even know where to look for
>>>> ABI documentation. This is the best I can currently get from the core:
>>>>
>>>> (gdb) thread apply all bt full
>>>>
>>>> Thread 1 (LWP 15210):
>>>> #0  0x0d33b0bc in ?? ()
>>>> No symbol table info available.
>>>> #1  
>>>> No symbol table info available.
>>>> #2  0x64a2a8b0 in strlen () from /lib/libc.so.6
>>>> No symbol table info available.
>>>> #3  0x0d340370 in concat ()
>>>> No symbol table info available.
>>>> #4  0x0d680d34 in ?? ()
>>>> No symbol table info available.
>>>> Backtrace stopped: previous frame identical to this frame (corrupt
>>>> stack?)
>>>
>>> You won't need any ABI docs.  This is a good first step, but if you can
>>> rebuild GNU make with debugging (-g) and without optimization (-O0) you
>>> will hopefully get a much more interesting and usable core.  At least I
>>> assume so, although I'm not familiar with working on ARM.
>>>
>>> It looks like somewhere in the GNU make code we're passing an invalid
>>> pointer to strlen(); either NULL or pointing to invalid memory of some
>>> kind.
>>
>> After re-compiling make 4.2.1 with "-O0 -pipe -mcpu=cortex-a7
>> -mfpu=neon-vfpv4 -mfloat-abi=hard -ggdb" instead of the regular "-O2
>> -pipe -mcpu=cortex-a7 -mfpu=neon-vfpv4 -mfloat-abi=hard" I got:
>>
>> Thread 1 (LWP 20416):
>> #0  0x0c5cbd74 in child_error (child=0xbf78e700, exit_code=1900259124,
>> exit_sig=-1082595584, coredump=1900259184, ignored=0) at job.c:519
>> #1  0x0c5cbd8e in child_handler (sig=1935828325) at job.c:537
>> #2  0x0008 in ?? ()
>> Backtrace stopped: previous frame identical to this frame (corrupt stack?)
>>
>> Which looks even more weird. I'm not even sure it's the same crash.
>> Something seems to be seriously corrupting the stack in both cases. As
>> far as I can tell, child_handler() does not call child_error() directly
>> or indirectly.
> 
> After examining about 10 more core files, these all point to job.c:519
> and job.c:537, similarly to the above:
> 
>   #0  0x00c2bd74 in child_error (child=0x0, exit_code=0, exit_sig=0,
> coredump=0, ignored=0) at job.c:519
>   pre = 0x0
>   post = 0x0
>   dump = 0x0
>   f = 0x0
>   flocp = 0x0
>   nm = 0x0
>   l = 0
>   #1  0x00c2bd8e in child_handler (sig=0) at job.c:537
>   No locals.
>   #2  0x00c67cc0 in ?? ()
>   No symbol table info available.
>   Backtrace stopped: previous frame identical to this frame (corrupt stack?)
> 
> Any ideas?

Looking at the code, job.c (and other source files in GNU Make) uses
alloca (3) to allocate memory!? This definitely looks like one possible
source of stack overflows! Quoting `man 3 alloca`:

BUGS:
There is no error indication if the stack frame cannot
be extended. (However, after a failed allocation, the
program is likely to receive a SIGSEGV signal if it
attempts to access the unallocated space.)

Personally, I would never use this function in regular code. In
child_error() it is used to allocate enough space for a filename:

char *a = alloca (strlen (flocp->filenm) + 1 + 11 + 1);

Assuming that flocp->filenm points to a full path and not just the name
of a single file, one could easily overflow the stack, IMHO. I'm
guessing that PATH_MAX is 4096 on most Linux systems, while the stack is
8192.

Are you sure this is safe?

J


___
Bug-make mailing list
Bug-make@gnu.org
https://lists.gnu.org/mailman/listinfo/bug-make


Re: INTERNAL: Exiting with 2 jobserver tokens available; should be 5!

2016-11-12 Thread Paul Smith
On Sat, 2016-11-12 at 13:06 +0200, Jaak Ristioja wrote:
> I'm guessing that PATH_MAX is 4096 on most Linux systems, while the stack is
> 8192.

There's no way the stack is so small.  Virtually no userspace program
can run with an 8k stack, regardless of whether it uses alloca() or
not.

I think you might be misled by the output of ulimit -s as "8192";
however, the doc says:

> Values are in 1024-byte increments

so really the default is an 8M stack, not an 8K stack.
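
To put rough numbers on this, here is a minimal C sketch (not code from
GNU make). It converts a "ulimit -s" style value to bytes and compares the
soft stack limit with the worst-case size of the buffer quoted above. The
function names are made up for illustration, and the worst case assumes
that flocp->filenm holds a path of PATH_MAX - 1 characters, which is only
an assumption about how long that string can get.

```c
#include <limits.h>        /* PATH_MAX on Linux */
#include <stdio.h>
#include <sys/resource.h>  /* getrlimit, RLIMIT_STACK */

#ifndef PATH_MAX
#define PATH_MAX 4096      /* fallback; an assumption for this sketch */
#endif

/* Convert a "ulimit -s" style value (1024-byte units) to bytes. */
unsigned long kib_to_bytes(unsigned long kib)
{
    return kib * 1024UL;
}

/* Worst-case size of strlen (filenm) + 1 + 11 + 1, assuming (hypothetically)
   that filenm is a path of PATH_MAX - 1 characters. */
unsigned long worst_case_alloca_bytes(void)
{
    return (unsigned long)(PATH_MAX - 1) + 1 + 11 + 1;
}

/* Print the soft stack limit next to the worst-case buffer size.
   Returns 0 on success, -1 if getrlimit() fails. */
int report_stack_limit(void)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_STACK, &rl) != 0)
        return -1;

    if (rl.rlim_cur == RLIM_INFINITY)
        printf("soft stack limit:  unlimited\n");
    else
        printf("soft stack limit:  %llu bytes\n",
               (unsigned long long)rl.rlim_cur);

    printf("worst-case alloca: %lu bytes\n", worst_case_alloca_bytes());
    return 0;
}
```

Calling report_stack_limit() from main() on a system where "ulimit -s"
prints 8192 should show a limit of kib_to_bytes(8192) = 8388608 bytes,
against a buffer of a few kilobytes at most.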

Also, traditional Linux systems set the hard limit on the stack size to
"unlimited" (run 'ulimit -H -s' to see it), and GNU make will reset its
own stack size to the maximum when it starts.
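
For illustration, raising the soft stack limit to the hard limit at
startup can be sketched roughly as below. This is a minimal sketch of the
general getrlimit()/setrlimit() technique, not a copy of what GNU make
does; the function name is hypothetical.

```c
#include <sys/resource.h>  /* getrlimit, setrlimit, RLIMIT_STACK */

/* Raise the soft stack limit to the hard limit.
   Returns 0 on success (including when nothing needs to change),
   -1 if either system call fails. */
int raise_stack_limit_to_max(void)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_STACK, &rl) != 0)
        return -1;

    /* Already at the maximum: nothing to do. */
    if (rl.rlim_cur == rl.rlim_max)
        return 0;

    /* An unprivileged process may raise its soft limit up to the hard one. */
    rl.rlim_cur = rl.rlim_max;
    return setrlimit(RLIMIT_STACK, &rl) == 0 ? 0 : -1;
}
```

A program would call this once, early in main(), before any deep
recursion or large alloca() use.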

I don't know if there are special features of ARM which make alloca()
more problematic than other systems, but I've certainly never heard of
any issues like this on ARM.

I suspect this is a red herring.



Re: INTERNAL: Exiting with 2 jobserver tokens available; should be 5!

2016-11-12 Thread Paul Smith
On Fri, 2016-11-11 at 19:41 +0200, Jaak Ristioja wrote:
> After examining about 10 more core files, these all point to job.c:519
> and job.c:537, similarly to the above:
> 
>   #0  0x00c2bd74 in child_error (child=0x0, exit_code=0, exit_sig=0,
> coredump=0, ignored=0) at job.c:519
>   pre = 0x0
>   post = 0x0
>   dump = 0x0
>   f = 0x0
>   flocp = 0x0
>   nm = 0x0
>   l = 0
>   #1  0x00c2bd8e in child_handler (sig=0) at job.c:537
>   No locals.
>   #2  0x00c67cc0 in ?? ()
>   No symbol table info available.
>   Backtrace stopped: previous frame identical to this frame (corrupt stack?)
> 
> Any ideas?

Unfortunately this doesn't help much.  It's pretty clear that either ARM
debug information is very limited, or GDB is problematic on ARM, or else
the core file is very corrupted: there's no way that all the values in
the argument lists to these functions are really 0/NULL.  Also, as you
point out, in no way does child_handler() (which is a signal handler)
ever call child_error().

In fact, if your config.h from GNU make has HAVE_PSELECT set (which I
would expect, since this is Linux), the child_handler() function should
never be invoked at all.



Re: INTERNAL: Exiting with 2 jobserver tokens available; should be 5!

2016-11-12 Thread Tim Murphy
Something like Valgrind might spot some initial problem that doesn't
immediately crash but eventually spirals out of control. It seems to
support ARM linux now:

"20 October 2016: valgrind-3.12.0 is available. This release supports:
X86/Linux, AMD64/Linux, ARM32/Linux, ARM64/Linux, PPC32/Linux,
PPC64BE/Linux, PPC64LE/Linux, S390X/Linux, MIPS32/Linux, MIPS64/Linux,
ARM/Android, ARM64/Android, MIPS32/Android, X86/Android, X86/Solaris,
AMD64/Solaris, X86/MacOSX 10.10 and AMD64/MacOSX 10.10. There is also
preliminary support for X86/MacOSX 10.11/12, AMD64/MacOSX 10.11/12 and
TILEGX/Linux. For more details see the release notes."

I don't know what the gcc version is on your Pi, but if you have a recent
enough one you might manage to use the address sanitiser option
(-fsanitize=address) to get a similar result.

Regards,

Tim
