Weird startup issue with -fsplit-stack

2014-05-20 Thread Dmitry Antipov

Hello,

I'm trying to support -fsplit-stack in GNU Emacs. The most important problem is
that the GC scans the C stack conservatively, so I need to iterate over the
stack segments. I'm doing this with __splitstack_find, as described in
libgcc/generic-morestack.c, but now I'm facing a weird issue at startup:

Core was generated by `./temacs --batch --load loadup bootstrap'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  __morestack () at ../../../gcc-4.9.0/libgcc/config/i386/morestack.S:486
486 pushq   %rax
(gdb) bt 10
#0  __morestack () at ../../../gcc-4.9.0/libgcc/config/i386/morestack.S:486
#1  0x005f15df in __morestack () at ../../../gcc-4.9.0/libgcc/config/i386/morestack.S:502
#2  0x005f15df in __morestack () at ../../../gcc-4.9.0/libgcc/config/i386/morestack.S:502
#3  0x005f15df in __morestack () at ../../../gcc-4.9.0/libgcc/config/i386/morestack.S:502
#4  0x005f15df in __morestack () at ../../../gcc-4.9.0/libgcc/config/i386/morestack.S:502
#5  0x005f15df in __morestack () at ../../../gcc-4.9.0/libgcc/config/i386/morestack.S:502
#6  0x005f15df in __morestack () at ../../../gcc-4.9.0/libgcc/config/i386/morestack.S:502
#7  0x005f15df in __morestack () at ../../../gcc-4.9.0/libgcc/config/i386/morestack.S:502
#8  0x005f15df in __morestack () at ../../../gcc-4.9.0/libgcc/config/i386/morestack.S:502
#9  0x005f15df in __morestack () at ../../../gcc-4.9.0/libgcc/config/i386/morestack.S:502
(More stack frames follow...)
(gdb) bt -10
#87310 0x005f15df in __morestack () at ../../../gcc-4.9.0/libgcc/config/i386/morestack.S:502
#87311 0x005f15df in __morestack () at ../../../gcc-4.9.0/libgcc/config/i386/morestack.S:502
#87312 0x005f15df in __morestack () at ../../../gcc-4.9.0/libgcc/config/i386/morestack.S:502
#87313 0x005f15df in __morestack () at ../../../gcc-4.9.0/libgcc/config/i386/morestack.S:502
#87314 0x005f15df in __morestack () at ../../../gcc-4.9.0/libgcc/config/i386/morestack.S:502
#87315 0x005f15df in __morestack () at ../../../gcc-4.9.0/libgcc/config/i386/morestack.S:502
#87316 0x005f15df in __morestack () at ../../../gcc-4.9.0/libgcc/config/i386/morestack.S:502
#87317 0x005f15df in __morestack () at ../../../gcc-4.9.0/libgcc/config/i386/morestack.S:502
#87318 0x003791a21d65 in __libc_start_main (main=0x4d111d , argc=5, argv=0x7fffacc868d8, init=, fini=, rtld_fini=, stack_end=0x7fffacc868c8) at libc-start.c:285
#87319 0x00405f69 in _start ()
(gdb)

Unfortunately I was unable to reproduce this issue with small test programs, so
there is no simple and easy-to-use recipe. Anyway, if someone would like to try:

bzr branch bzr://bzr.savannah.gnu.org/emacs/trunk
cd trunk
cat /path/to/emacs_split_stack.patch | patch -p0
# 'configure' options for 'smallest possible' configuration
CPPFLAGS='-DSPLIT_STACK=1' CFLAGS='-O0 -g3 -fsplit-stack' ./configure \
  --prefix=/some/dir --without-all --without-x --disable-acl
make

I'm using (homebrew) GCC 4.9.0 and (stock) gold 2.24 on a Fedora 20 system.

Dmitry

=== modified file 'src/alloc.c'
--- src/alloc.c	2014-05-19 19:19:05 +
+++ src/alloc.c	2014-05-20 14:01:56 +
@@ -4932,11 +4932,28 @@
 #endif /* not GC_SAVE_REGISTERS_ON_STACK */
 #endif /* not HAVE___BUILTIN_UNWIND_INIT */
 
-  /* This assumes that the stack is a contiguous region in memory.  If
-     that's not the case, something has to be done here to iterate
-     over the stack segments.  */
+#ifdef SPLIT_STACK
+
+  /* This assumes gcc >= 4.6.0 with -fsplit-stack
+     and corresponding support in libgcc.  */
+  {
+    size_t stack_size;
+    extern void * __splitstack_find (void *, void *, size_t *,
+                                     void **, void **, void **);
+    void *next_segment = NULL, *next_sp = NULL, *initial_sp = NULL, *stack;
+
+    while ((stack = __splitstack_find (next_segment, next_sp, &stack_size,
+                                       &next_segment, &next_sp, &initial_sp)))
+      mark_memory (stack, (char *) stack + stack_size);
+  }
+
+#else /* not SPLIT_STACK */
+
+  /* This assumes that the stack is a contiguous region in memory.  */
   mark_memory (stack_base, end);
 
+#endif /* SPLIT_STACK */
+
   /* Allow for marking a secondary stack, like the register stack on the
      ia64.  */
 #ifdef GC_MARK_SECONDARY_STACK



Re: Weird startup issue with -fsplit-stack

2014-05-20 Thread Dmitry Antipov

On 05/20/2014 10:16 PM, Ian Lance Taylor wrote:


This is the call to __morestack_block_signals in morestack.S.  It
should only be possible if __morestack_block_signals or something it
calls directly has a split stack.  __morestack_block_signals has the
no_split_stack attribute, meaning that it should never call
__morestack.  __morestack_block_signals only calls pthread_sigmask or
sigprocmask, neither of which should be compiled with -fsplit-stack.
So something has gone wrong, but I don't know what.


Thanks - that was an application's own copy of pthread_sigmask (compiled
with -fsplit-stack) linked into the binary due to a subtle configuration
issue.

The next major problem is that -fsplit-stack code randomly crashes with a
useless gdb backtrace, usually pointing to the very beginning of a function
(plus occasional "Cannot access memory at..." messages), e.g.:

(gdb) bt 1
#0  0x005a615b in mark_object (arg=0) at ../../trunk/src/alloc.c:6039

 6037  void
 6038  mark_object (Lisp_Object arg)
==>  6039  {

IIUC this usually (with a traditional stack) happens due to stack overflow.
But what may be the cause with -fsplit-stack? I do not receive any error
messages from libgcc, and there is a lot of free heap memory. If that matters,
mark_object is recursive, and the recursion depth may be very high, up to a few
tens of thousands of calls.

Dmitry


Re: Weird startup issue with -fsplit-stack

2014-06-10 Thread Dmitry Antipov

On 05/21/2014 06:10 PM, Ian Lance Taylor wrote:


I'm sorry, I have nothing useful to suggest.  I agree that that sounds
like a stack overflow, which should in general be impossible with
-fsplit-stack when using the gold linker.  I don't know what is
happening here.  I've tested with massive recursion so I don't think
that is the problem by itself.


Hm...did you test with a lot of longjmps? I'm just curious about this
comment in libgcc/generic-morestack.c:

/* The stack segment that we think we are currently using.  This will
   be correct in normal usage, but will be incorrect if an exception
   unwinds into a different stack segment or if longjmp jumps to a
   different stack segment.  */

So, what happens if longjmp jumps to a different segment? Is the result
undefined? Is it possible to detect such a jump?
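
To make the question concrete, here is a minimal sketch of the pattern being
asked about: a deep recursion that leaves via longjmp to a frame near the
bottom of the stack. The names (unwind_point, recurse_then_jump,
jump_out_of_deep_recursion) and the 4096-byte per-frame padding are
illustrative assumptions, not Emacs code. When built with -fsplit-stack, the
deep frames may live in later stack segments, so the jump may land in an
earlier one; whether libgcc's bookkeeping copes with that is what the comment
above leaves open.

```c
#include <setjmp.h>

/* Jump target set up by the shallow frame.  */
static jmp_buf unwind_point;

/* Recurse DEPTH levels, each frame using about 4 KiB of stack,
   then jump straight back to the shallow frame.  */
static void
recurse_then_jump (int depth)
{
  volatile char pad[4096];      /* force each frame to occupy real stack */

  pad[0] = (char) depth;
  if (depth == 0)
    longjmp (unwind_point, 1);  /* leave all the deep frames at once */
  recurse_then_jump (depth - 1);
  pad[1] = pad[0];              /* keeps the call from becoming a tail call */
}

/* Returns 1 if control came back through longjmp, 0 otherwise.  */
int
jump_out_of_deep_recursion (int depth)
{
  if (setjmp (unwind_point) == 0)
    {
      recurse_then_jump (depth);
      return 0;                 /* not reached: the recursion never returns */
    }
  return 1;
}
```

Calling jump_out_of_deep_recursion repeatedly with a large depth in a
-fsplit-stack build is one way to check whether such jumps alone trigger the
crash.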

Dmitry



Warning about variable optimized away?

2014-08-05 Thread Dmitry Antipov

Hello,

is it possible to get some kind of diagnostic if a variable
is totally optimized away?  For example, in:

void foo (struct some_type *obj) {
  ... some code where 'obj' is not used ...
  bar (obj->some_member);
  ... some code where 'obj' is not used again ...
  baz (obj->some_member);
}

'obj' is likely to be optimized away so only 'obj->some_member' really
exists (in a register or stack location).  Getting diagnostics
or preserving 'obj' may be important if there is a GC which scans
C stack and registers conservatively - if there is no direct reference
to 'obj', it's likely to be reclaimed and so 'obj->some_member'
becomes garbage.
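
For the "preserving 'obj'" half of the question, one known workaround is an
empty extended-asm statement that takes the pointer as an input, so the
compiler has to keep its value available up to that point. This is only a
sketch: keep_alive is a hypothetical helper, it relies on the GCC/Clang
extended asm extension, bar and baz are stubs invented so the fragment is
self-contained, and it does not provide the diagnostic asked about.

```c
struct some_type
{
  int some_member;
};

/* Hypothetical helper: the empty asm "uses" P, so the value of P must
   still be held in a register or stack slot when this point is reached.  */
static inline void
keep_alive (const void *p)
{
  __asm__ volatile ("" : : "g" (p) : "memory");
}

/* Stub consumers standing in for the real bar () and baz ().  */
static int total;
static void bar (int v) { total += v; }
static void baz (int v) { total += 2 * v; }

int
foo (struct some_type *obj)
{
  bar (obj->some_member);
  baz (obj->some_member);
  keep_alive (obj);             /* 'obj' itself stays live until here */
  return total;
}
```

The cost is one annotation per function that must keep a base pointer
visible, which is the kind of manual work a compiler diagnostic would help
to target.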

Dmitry


-Wstack-usage and alloca in loops

2014-09-22 Thread Dmitry Antipov

For the following translation unit:

#include <stdlib.h>

int
foo (unsigned n)
{
  int *p;

  if (n < 1024)
    p = alloca (n * sizeof (int));
  else
    p = malloc (n * sizeof (int));

  return g (p, n);
}

int
bar (unsigned n)
{
  int x, i, *p;

  for (x = 0, i = 0; i < n; i++)
    {
      if (n < 1024)
        p = alloca (n * sizeof (int));
      else
        p = malloc (n * sizeof (int));

      x += h (p, n);

      if (n >= 1024)
        free (p);
    }

  return x;
}

compiling with -Wstack-usage=32 produces (as of 4.9.1):

test.c: In function 'foo':
test.c:14:1: warning: stack usage might be unbounded [-Wstack-usage=]
 }
 ^
test.c: In function 'bar':
test.c:35:1: warning: stack usage might be unbounded [-Wstack-usage=]
 }
 ^

1) I'm just curious why it's unbounded for foo().  It shouldn't be too
hard to find that alloca() is never requested to allocate more than
1024 * sizeof (int), and never called more than once, isn't it?

2) In bar(), stack usage is unbounded unless bar() is always inlined with
a compile-time constant argument N.

IIUC good detection of 2) is much harder to implement, but is it
reasonable/possible to make -Wstack-usage more accurate in 1)?
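
Until the analysis gets smarter, a common way to sidestep the warning in case
1) is to replace alloca() with a fixed-size local buffer, which makes the
frame size a compile-time constant. The sketch below is an assumption-laden
rewrite, not a proposed GCC change: foo_bounded and SMALL_MAX are hypothetical
names, g is a stub that sums the array, and the buffer is filled only so the
example has a checkable result.

```c
#include <stdlib.h>

enum { SMALL_MAX = 1024 };      /* same threshold as in foo () */

/* Stub consumer standing in for the real g ().  */
static int
g (const int *p, unsigned n)
{
  int sum = 0;
  unsigned i;

  for (i = 0; i < n; i++)
    sum += p[i];
  return sum;
}

int
foo_bounded (unsigned n)
{
  int small[SMALL_MAX];         /* fixed-size frame instead of alloca */
  int *p = small;
  unsigned i;
  int result;

  if (n >= SMALL_MAX)
    {
      p = malloc (n * sizeof *p);
      if (p == NULL)
        return -1;
    }

  for (i = 0; i < n; i++)
    p[i] = (int) i;

  result = g (p, n);

  if (p != small)
    free (p);                   /* only the heap path needs freeing */
  return result;
}
```

The trade-off is that the full SMALL_MAX * sizeof (int) bytes are reserved on
every call, even for small n.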

Dmitry


[ARM] unexpected sizeof() of a complex packed type

2023-11-16 Thread Dmitry Antipov

(The following sample is taken from my LKML post at https://lkml.org/lkml/2023/11/15/213)

$ cat t-build-bug.c

struct vring_tx_mac {
unsigned int d[3];
unsigned int ucode_cmd;
} __attribute__((packed));

struct vring_rx_mac {
unsigned int d0;
unsigned int d1;
unsigned short w4;
union { struct { unsigned short pn_15_0; unsigned int pn_47_16; } __attribute__((packed));
struct { unsigned short pn_15_0; unsigned int pn_47_16; } __attribute__((packed)) pn;
};
} __attribute__((packed));

struct wil_ring_dma_addr {
unsigned int addr_low;
unsigned short addr_high;
} __attribute__((packed));

struct vring_tx_dma {
unsigned int d0;
struct wil_ring_dma_addr addr;
unsigned char ip_length;
unsigned char b11;
unsigned char error;
unsigned char status;
unsigned short length;
} __attribute__((packed));

struct vring_tx_desc {
struct vring_tx_mac mac;
struct vring_tx_dma dma;
} __attribute__((packed));

struct wil_ring_tx_enhanced_mac {
unsigned int d[3];
unsigned short tso_mss;
unsigned short scratchpad;
} __attribute__((packed));

struct wil_ring_tx_enhanced_dma {
unsigned char l4_hdr_len;
unsigned char cmd;
unsigned short w1;
struct wil_ring_dma_addr addr;
unsigned char ip_length;
unsigned char b11;
unsigned short addr_high_high;
unsigned short length;
} __attribute__((packed));

struct wil_tx_enhanced_desc {
struct wil_ring_tx_enhanced_mac mac;
struct wil_ring_tx_enhanced_dma dma;
} __attribute__((packed));

union wil_tx_desc {
struct vring_tx_desc legacy;
struct wil_tx_enhanced_desc enhanced;
} __attribute__((packed));

struct vring_rx_dma {
unsigned int d0;
struct wil_ring_dma_addr addr;
unsigned char ip_length;
unsigned char b11;
unsigned char error;
unsigned char status;
unsigned short length;
} __attribute__((packed));

struct vring_rx_desc {
struct vring_rx_mac mac;
struct vring_rx_dma dma;
} __attribute__((packed));

struct wil_ring_rx_enhanced_mac {
unsigned int d[3];
unsigned short buff_id;
unsigned short reserved;
} __attribute((packed));

struct wil_ring_rx_enhanced_dma {
unsigned int d0;
struct wil_ring_dma_addr addr;
unsigned short w5;
unsigned short addr_high_high;
unsigned short length;
} __attribute((packed));

struct wil_rx_enhanced_desc {
struct wil_ring_rx_enhanced_mac mac;
struct wil_ring_rx_enhanced_dma dma;
} __attribute((packed));

union wil_rx_desc {
struct vring_rx_desc legacy;
struct wil_rx_enhanced_desc enhanced;
} __attribute__((packed));

union wil_ring_desc {
union wil_tx_desc tx;
union wil_rx_desc rx;
} __attribute__((packed));

int f (void) {
return sizeof(union wil_ring_desc);
}

$ arm-linux-gnu-gcc -v
Using built-in specs.
COLLECT_GCC=arm-linux-gnu-gcc
COLLECT_LTO_WRAPPER=/usr/libexec/gcc/arm-linux-gnueabi/13/lto-wrapper
Target: arm-linux-gnueabi
Configured with: ../gcc-13.2.1-20230728/configure --bindir=/usr/bin --build=x86_64-redhat-linux-gnu --datadir=/usr/share --disable-decimal-float --disable-dependency-tracking --disable-gold 
--disable-libgcj --disable-libgomp --disable-libmpx --disable-libquadmath --disable-libssp --disable-libunwind-exceptions --disable-shared --disable-silent-rules --disable-sjlj-exceptions 
--disable-threads --with-ld=/usr/bin/arm-linux-gnu-ld --enable-__cxa_atexit --enable-checking=release --enable-gnu-unique-object --enable-initfini-array --enable-languages=c,c++ 
--enable-linker-build-id --enable-lto --enable-nls --enable-obsolete --enable-plugin --enable-targets=all --exec-prefix=/usr --host=x86_64-redhat-linux-gnu --includedir=/usr/include 
--infodir=/usr/share/info --libexecdir=/usr/libexec --localstatedir=/var --mandir=/usr/share/man --prefix=/usr --program-prefix=arm-linux-gnu- --sbindir=/usr/sbin --sharedstatedir=/var/lib 
--sysconfdir=/etc --target=arm-linux-gnueabi --with-bugurl=http://bugzilla.redhat.com/bugzilla/ --with-gcc-major-version-only --with-isl --with-newlib --with-plugin-ld=/usr/bin/arm-linux-gnu-ld 
--with-sysroot=/usr/arm-linux-gnu/sys-root --with-system-libunwind --with-system-zlib --without-headers --with-tune=generic-armv7-a --with-arch=armv7-a --with-float=hard --with-fpu=vfpv3-d16 
--with-abi=aapcs-linux --enable-gnu-indirect-function --with-linker-hash-style=gnu

Thread model: single
Supported LTO compression algorithms: zlib zstd
gcc version 13.2.1 20230728 (Red Hat Cross 13.2.1-1) (GCC)

$ arm-linux-gnu-gcc -Os -c t-build-bug.c
$ arm-linux-gnu-objdump -j .text -D t-build-bug.o

t-build-bug.o: file format elf32-littlearm

Disassembly of section .text:

 :
   0:   e3a00020    mov r0, #32 ;; As expected
   4:   e12ff
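
One way to turn a size surprise of this kind into a build failure is a
compile-time assertion next to each on-the-wire structure. The sketch below
assumes C11 _Static_assert and uses only the smallest structure from the
sample; the expected size is written as the sum of the member sizes, and
wil_ring_dma_addr_size is a hypothetical helper added so the size can also be
inspected at run time.

```c
#include <stddef.h>

struct wil_ring_dma_addr
{
  unsigned int addr_low;
  unsigned short addr_high;
} __attribute__((packed));

/* Fails the build if the target ABI adds padding to the packed struct.  */
_Static_assert (sizeof (struct wil_ring_dma_addr)
                == sizeof (unsigned int) + sizeof (unsigned short),
                "wil_ring_dma_addr has unexpected padding");

size_t
wil_ring_dma_addr_size (void)
{
  return sizeof (struct wil_ring_dma_addr);
}
```

Repeating the assertion for each nested struct and union narrows down which
level of nesting first deviates from the expected layout.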

Compiler support for write barrier insertion ?

2007-07-23 Thread Dmitry Antipov

Hello all,

I have a question about possible cooperation between the compiler and a
hypothetical garbage collector. Unfortunately, my experience with GCC internals
is too limited, so I would like to ask compiler specialists before re-inventing
an ugly bicycle...

Most non-trivial garbage collection methods, including, but not limited to,
the most frequently used generational and incremental techniques, require a
'write barrier' or 'store barrier' - a piece of code which is executed when a
pointer to another object is written (usually into a field of some object).
The following methods are widely used:
 1) call barrier code explicitly when it's needed, determining such places
by static analysis performed by the programmer;
 2) allocate objects from the heaps with OS 'page-aware' structures, then use OS
memory protection for the underlying pages and handle protection faults;
 3) rely on the compiler support, which means the compiler should emit some
code when the pointer store is generated.

Each of these methods has its own pitfalls. In short, 1) is very error-prone -
one missed barrier may break everything; 2) is system-dependent and slow due to
signal handling; and 3) inserts a write barrier at each pointer store, which is
obviously redundant and will introduce an enormous overhead for any real
program.

So I'm investigating the possibility (and usability) of a hybrid scheme which
is based on both 1) and 3). The idea is to use the GCC attribute machinery to
inform the compiler about special treatment of some pointers.

As an example, consider the following structure, with a hypothetical attribute
attached to one of its members:

struct obj {
  int value;
  char *name;
  struct obj *next __attribute__((trapped));
};

Here 'next' is a 'write-barriered' pointer. During compilation, the compiler
should see that at least one member of 'struct obj' is trapped, and emit a call
to a special function, for example '__builtin_obj_trap', when it sees a write to
'next' via a pointer of type 'struct obj *', for example:

struct obj *prev, *curr = alloc_obj ();
...
curr = alloc_obj ();
curr->value = 1234; /* Works as usual. */
curr->name = name;  /* Works as usual. */
curr->next = prev;  /* A call to __builtin_obj_trap() is arranged here,
                       for example, immediately after the store instruction */
...

The special function may have the following prototype:

void __builtin_obj_trap (void *obj, int offset)

where 'obj' is a pointer to the structure containing the trapped member ('curr'
in this example) and 'offset' is equal to 'offsetof (struct obj, next)'.

This function must be provided by the programmer (which is similar to providing
__cyg_profile_func_* when '-finstrument-functions' is used).
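
To illustrate what the compiler would be asked to generate, here is a
hand-written equivalent. It is only a sketch under stated assumptions: the
'trapped' attribute does not exist, so the barrier call is written out
manually; obj_trap stands in for the proposed '__builtin_obj_trap'; and
barrier_log, LOG_MAX and link_objects are hypothetical names used to make
the effect observable.

```c
#include <stddef.h>

struct obj
{
  int value;
  char *name;
  struct obj *next;             /* would carry the 'trapped' attribute */
};

enum { LOG_MAX = 16 };

struct barrier_event
{
  void *obj;
  int offset;
};

static struct barrier_event barrier_log[LOG_MAX];
static int barrier_count;

/* Programmer-provided barrier: here it just records which object and
   which field were written.  Events beyond LOG_MAX are dropped.  */
void
obj_trap (void *obj, int offset)
{
  if (barrier_count < LOG_MAX)
    {
      barrier_log[barrier_count].obj = obj;
      barrier_log[barrier_count].offset = offset;
      barrier_count++;
    }
}

void
link_objects (struct obj *curr, struct obj *prev)
{
  curr->value = 1234;           /* ordinary store, no barrier */
  curr->next = prev;            /* store to the trapped member ... */
  obj_trap (curr, (int) offsetof (struct obj, next)); /* ... then barrier */
}
```

A real collector would replace the log with its remembered set or card
table update.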

This method has an obvious pitfall: 'memset (curr, 0, sizeof (struct obj))' or
'memcpy (curr, otherobj, sizeof (struct obj))' probably can't be caught by the
compiler.

Another interesting situation is:

struct obj **opp, *prev, *curr = alloc_obj ();
...
opp = &curr->next;
*opp = prev;

For this case, taking the address of a trapped pointer may issue a warning since
it creates a way to rewrite the trapped pointer bypassing the write barrier.
(The real hack is to treat the 'trapped' attribute as 'promoted by assignment',
i.e. 'opp' becomes 'trapped' automagically after initialization and writes
through 'opp' are also surrounded by a write barrier).

Of course, you may ask: why not just have

'void set_obj_next (struct obj *obj, struct obj *next)'

with a write barrier inside? The short answers are:
 1) If 'struct obj' has 100 trapped members, having 100 set_XXX functions
or macros just to set the fields is ugly;
 2) Migration from explicit memory management to garbage collection - if you
have 1M lines of code which use 'XXX->next = ...', it's quite hard
to rewrite everything even with the help of modern refactoring tools.
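
For comparison, point 1) can be partly addressed without compiler changes by
one generic macro instead of one setter per field, although this does nothing
for the migration problem in point 2). The following is a sketch only:
SET_TRAPPED and write_barrier are hypothetical names, the macro relies on the
GNU '__typeof__' extension, and the barrier merely records its last
arguments.

```c
#include <stddef.h>

struct obj
{
  int value;
  char *name;
  struct obj *next;
};

static int barrier_hits;
static void *last_obj;
static size_t last_offset;

/* Stand-in barrier: remembers the most recent (object, offset) pair.  */
static void
write_barrier (void *obj, size_t offset)
{
  barrier_hits++;
  last_obj = obj;
  last_offset = offset;
}

/* Store VAL into OBJP->MEMBER, then run the barrier.  OBJP is evaluated
   once; the member name is turned into an offset at compile time.  */
#define SET_TRAPPED(objp, member, val)                                  \
  do                                                                    \
    {                                                                   \
      __typeof__ (objp) o_ = (objp);                                    \
      o_->member = (val);                                               \
      write_barrier (o_, offsetof (__typeof__ (*o_), member));          \
    }                                                                   \
  while (0)
```

Existing 'XXX->next = ...' stores would still have to be found and rewritten
by hand, which is the part the attribute proposal aims to avoid.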

I realize that the whole thing is very specific and probably will never be used
by most compiler users. But, anyway, is it technically possible to
implement such a thing? How much overhead might it introduce?

Thanks,
Dmitry


Re: On -Wmaybe-uninitialized

2025-01-30 Thread Dmitry Antipov

On 1/30/25 4:29 PM, David Malcolm wrote:


Arguably the state-merging code could be smarter here; I haven't
investigated the details, but have filed it as PR analyzer/118702
here:
   https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118702


Thanks. You might also be interested in
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118707, which I've just created.

Dmitry



On -Wmaybe-uninitialized

2025-01-30 Thread Dmitry Antipov

With (probably) -Wmaybe-uninitialized and/or -Wextra, shouldn't the compiler
emit a warning about the possibly uninitialized 'y' passed to 'ddd()' in the
example below?

struct T {
  int a;
  int b;
};

extern int bbb (struct T *, int *);
extern int ccc (struct T *, int *);
extern int ddd (struct T *, int);

int
aaa (struct T *t)
{
  int x = 0, y; /* 'y' is uninitialized */

  if (t->a) /* if this condition is true */
    goto l;

  x += bbb (t, &y);

 l:
  if (t->b) /* and this condition is false */
    x += ccc (t, &y);

  x += ddd (t, y);  /* then 'y' is passed to ddd() uninitialized */

  return x;
}
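
Independently of whether the compiler warns, a defensive rewrite is to give
'y' a defined value and drop the goto. The sketch below makes several
assumptions: bbb, ccc and ddd are stub definitions invented here so the
fragment is self-contained (the sample above only declares them), aaa_fixed
is a hypothetical name, and 0 is an arbitrary default that may not be what
ddd() expects.

```c
struct T
{
  int a;
  int b;
};

/* Stubs standing in for the external bbb (), ccc () and ddd ().  */
static int bbb (struct T *t, int *out) { *out = t->a + 1; return 1; }
static int ccc (struct T *t, int *out) { *out = t->b + 2; return 2; }
static int ddd (struct T *t, int v) { (void) t; return v; }

int
aaa_fixed (struct T *t)
{
  int x = 0, y = 0;             /* 'y' now has a defined default */

  if (!t->a)                    /* same control flow as the goto version */
    x += bbb (t, &y);

  if (t->b)
    x += ccc (t, &y);

  x += ddd (t, y);              /* 'y' is initialized on every path */

  return x;
}
```

With t->a nonzero and t->b zero, the path that was problematic in the
original, ddd() now receives the default instead of an indeterminate value.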

Dmitry