Weird startup issue with -fsplit-stack
Hello,

I'm trying to support -fsplit-stack in GNU Emacs. The most important problem is that GC uses conservative scanning of a C stack, so I need to iterate over stack segments. I'm doing this by using __splitstack_find, as described in libgcc/generic-morestack.c; but now I'm facing a weird issue at startup:

Core was generated by `./temacs --batch --load loadup bootstrap'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 __morestack () at ../../../gcc-4.9.0/libgcc/config/i386/morestack.S:486
486 pushq %rax
(gdb) bt 10
#0 __morestack () at ../../../gcc-4.9.0/libgcc/config/i386/morestack.S:486
#1 0x005f15df in __morestack () at ../../../gcc-4.9.0/libgcc/config/i386/morestack.S:502
#2 0x005f15df in __morestack () at ../../../gcc-4.9.0/libgcc/config/i386/morestack.S:502
#3 0x005f15df in __morestack () at ../../../gcc-4.9.0/libgcc/config/i386/morestack.S:502
#4 0x005f15df in __morestack () at ../../../gcc-4.9.0/libgcc/config/i386/morestack.S:502
#5 0x005f15df in __morestack () at ../../../gcc-4.9.0/libgcc/config/i386/morestack.S:502
#6 0x005f15df in __morestack () at ../../../gcc-4.9.0/libgcc/config/i386/morestack.S:502
#7 0x005f15df in __morestack () at ../../../gcc-4.9.0/libgcc/config/i386/morestack.S:502
#8 0x005f15df in __morestack () at ../../../gcc-4.9.0/libgcc/config/i386/morestack.S:502
#9 0x005f15df in __morestack () at ../../../gcc-4.9.0/libgcc/config/i386/morestack.S:502
(More stack frames follow...)
(gdb) bt -10
#87310 0x005f15df in __morestack () at ../../../gcc-4.9.0/libgcc/config/i386/morestack.S:502
#87311 0x005f15df in __morestack () at ../../../gcc-4.9.0/libgcc/config/i386/morestack.S:502
#87312 0x005f15df in __morestack () at ../../../gcc-4.9.0/libgcc/config/i386/morestack.S:502
#87313 0x005f15df in __morestack () at ../../../gcc-4.9.0/libgcc/config/i386/morestack.S:502
#87314 0x005f15df in __morestack () at ../../../gcc-4.9.0/libgcc/config/i386/morestack.S:502
#87315 0x005f15df in __morestack () at ../../../gcc-4.9.0/libgcc/config/i386/morestack.S:502
#87316 0x005f15df in __morestack () at ../../../gcc-4.9.0/libgcc/config/i386/morestack.S:502
#87317 0x005f15df in __morestack () at ../../../gcc-4.9.0/libgcc/config/i386/morestack.S:502
#87318 0x003791a21d65 in __libc_start_main (main=0x4d111d , argc=5, argv=0x7fffacc868d8, init=, fini=, rtld_fini=, stack_end=0x7fffacc868c8) at libc-start.c:285
#87319 0x00405f69 in _start ()
(gdb)

Unfortunately I was unable to reproduce this issue with small test programs, so there is no simple and easy-to-use recipe. Anyway, if someone would like to try:

bzr branch bzr://bzr.savannah.gnu.org/emacs/trunk
cd trunk
cat /path/to/emacs_split_stack.patch | patch -p0
# 'configure' options for 'smallest possible' configuration
CPPFLAGS='-DSPLIT_STACK=1' CFLAGS='-O0 -g3 -fsplit-stack' ./configure --prefix=/some/dir --without-all --without-x --disable-acl
make

I'm using (homebrew) GCC 4.9.0 and (stock) gold 2.24 on a Fedora 20 system.

Dmitry

=== modified file 'src/alloc.c'
--- src/alloc.c 2014-05-19 19:19:05 +
+++ src/alloc.c 2014-05-20 14:01:56 +
@@ -4932,11 +4932,28 @@
 #endif /* not GC_SAVE_REGISTERS_ON_STACK */
 #endif /* not HAVE___BUILTIN_UNWIND_INIT */
 
-  /* This assumes that the stack is a contiguous region in memory.  If
-     that's not the case, something has to be done here to iterate
-     over the stack segments.  */
+#ifdef SPLIT_STACK
+
+  /* This assumes gcc >= 4.6.0 with -fsplit-stack
+     and corresponding support in libgcc.  */
+  {
+    size_t stack_size;
+    extern void * __splitstack_find (void *, void *, size_t *,
+                                     void **, void **, void **);
+    void *next_segment = NULL, *next_sp = NULL, *initial_sp = NULL, *stack;
+
+    while ((stack = __splitstack_find (next_segment, next_sp, &stack_size,
+                                       &next_segment, &next_sp, &initial_sp)))
+      mark_memory (stack, (char *) stack + stack_size);
+  }
+
+#else /* not SPLIT_STACK */
+
+  /* This assumes that the stack is a contiguous region in memory.  */
   mark_memory (stack_base, end);
 
+#endif /* SPLIT_STACK */
+
   /* Allow for marking a secondary stack, like the register stack on the
      ia64.  */
 #ifdef GC_MARK_SECONDARY_STACK
Re: Weird startup issue with -fsplit-stack
On 05/20/2014 10:16 PM, Ian Lance Taylor wrote:

> This is the call to __morestack_block_signals in morestack.S. It should only be possible if __morestack_block_signals or something it calls directly has a split stack. __morestack_block_signals has the no_split_stack attribute, meaning that it should never call __morestack. __morestack_block_signals only calls pthread_sigmask or sigprocmask, neither of which should be compiled with -fsplit-stack. So something has gone wrong, but I don't know what.

Thanks - that was the application's own copy of pthread_sigmask (compiled with -fsplit-stack), linked into the binary due to a subtle configuration issue.

The next major problem is that -fsplit-stack code randomly crashes with a useless gdb backtrace, usually pointing to the very beginning of a function (plus occasional "Cannot access memory at..." messages), e.g.:

(gdb) bt 1
#0 0x005a615b in mark_object (arg=0) at ../../trunk/src/alloc.c:6039

    6037 void
    6038 mark_object (Lisp_Object arg)
==> 6039 {

IIUC, with a traditional stack this usually happens due to stack overflow. But what may be the cause with -fsplit-stack? I do not receive any error messages from libgcc, and there is a lot of free heap memory. If that matters, mark_object is recursive, and the recursion depth may be very high, up to a few tens of thousands of calls.

Dmitry
Re: Weird startup issue with -fsplit-stack
On 05/21/2014 06:10 PM, Ian Lance Taylor wrote:

> I'm sorry, I have nothing useful to suggest. I agree that that sounds like a stack overflow, which should in general be impossible with -fsplit-stack when using the gold linker. I don't know what is happening here. I've tested with massive recursion so I don't think that is the problem by itself.

Hm...did you test with a lot of longjmps? I'm just curious about this comment in libgcc/generic-morestack.c:

/* The stack segment that we think we are currently using.  This will
   be correct in normal usage, but will be incorrect if an exception
   unwinds into a different stack segment or if longjmp jumps to a
   different stack segment.  */

So, what happens if longjmp jumps to a different segment? Is the result undefined? Is it possible to detect such a jump?

Dmitry
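A stress test along the lines asked about above might look like the following sketch. It recurses deeply and then longjmps back to the outermost frame, so that in a -fsplit-stack build the jump would most likely cross stack segments. The names descend and longjmp_from_depth are made up for illustration, and it is not known whether this reproduces the Emacs crash.

```c
#include <setjmp.h>

static jmp_buf env;

/* Recurse `depth` frames down, then jump back to the setjmp point.
   Hypothetical helper, for illustration only. */
static void descend(unsigned depth)
{
    volatile char pad[256];     /* enlarge each frame so the stack grows faster */

    pad[0] = (char) depth;
    if (depth == 0)
        longjmp(env, 1);        /* unwind all frames in a single jump */
    descend(depth - 1);
    pad[255] = pad[0];          /* work after the call, so it is not a tail call */
}

/* Returns 1 when control came back through longjmp, 0 otherwise. */
int longjmp_from_depth(unsigned depth)
{
    if (setjmp(env) == 0) {
        descend(depth);
        return 0;               /* not reached: descend always ends in longjmp */
    }
    return 1;
}
```

Calling longjmp_from_depth (20000) in a loop, in a program built with -fsplit-stack, would exercise many such jumps in a row.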
Warning about variable optimized away?
Hello,

is it possible to get some kind of diagnostic if a variable is totally optimized away? For example, in:

void foo (struct some_type *obj)
{
  ... some code where 'obj' is not used ...
  bar (obj->some_member);
  ... some code where 'obj' is not used again ...
  baz (obj->some_member);
}

'obj' is likely to be optimized away, so only 'obj->some_member' really exists (in a register or stack location). Getting a diagnostic or preserving 'obj' may be important if there is a GC which scans the C stack and registers conservatively: if there is no direct reference to 'obj', it's likely to be reclaimed, and so 'obj->some_member' becomes garbage.

Dmitry
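One workaround sometimes used for the "preserving 'obj'" half of the question is an empty asm statement that names the pointer as an input operand after its last real use. The sketch below assumes GCC-style extended asm; KEEP_ALIVE is a made-up macro, not something GCC provides, and bar and baz are stand-ins. It forces the compiler to have the pointer value available at that point, which should be enough for a conservative scan to find it, but it does not produce any diagnostic.

```c
/* Hypothetical helper, not part of GCC: the empty asm takes the pointer
   as an input, so its value must still exist (in a register or in
   memory) when execution reaches this point. */
#define KEEP_ALIVE(ptr) __asm__ volatile ("" : : "g" (ptr) : "memory")

struct some_type { int some_member; };

/* Stand-ins for the bar() and baz() of the example above. */
static int bar (int v) { return v + 1; }
static int baz (int v) { return v * 2; }

int foo (struct some_type *obj)
{
  int r = bar (obj->some_member);

  r += baz (obj->some_member);
  KEEP_ALIVE (obj);   /* 'obj' itself stays live up to here */
  return r;
}
```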
-Wstack-usage and alloca in loops
For the following translation unit:

#include

int
foo (unsigned n)
{
  int *p;

  if (n < 1024)
    p = alloca (n * sizeof (int));
  else
    p = malloc (n * sizeof (int));
  return g (p, n);
}

int
bar (unsigned n)
{
  int x, i, *p;

  for (x = 0, i = 0; i < n; i++)
    {
      if (n < 1024)
        p = alloca (n * sizeof (int));
      else
        p = malloc (n * sizeof (int));
      x += h (p, n);
      if (n >= 1024)
        free (p);
    }
  return x;
}

compiling with -Wstack-usage=32 produces (as of 4.9.1):

test.c: In function 'foo':
test.c:14:1: warning: stack usage might be unbounded [-Wstack-usage=]
 }
 ^
test.c: In function 'bar':
test.c:35:1: warning: stack usage might be unbounded [-Wstack-usage=]
 }
 ^

1) I'm just curious why it's unbounded for foo(). It shouldn't be too hard to find that alloca() is never requested to allocate more than 1024 * sizeof (int), and is never called more than once, should it?

2) In bar(), stack usage is unbounded unless bar() is always inlined with a compile-time constant argument N.

IIUC good detection of 2) is much harder to implement, but is it reasonable/possible to make -Wstack-usage more accurate in 1)?

Dmitry
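For comparison, here is a sketch of foo() rewritten so that the small case uses a fixed-size local array instead of alloca(). The frame size is then a compile-time constant, so I would expect -Wstack-usage to report a definite number rather than "might be unbounded", though I have not checked this against 4.9.1. The names foo_fixed, SMALL_LIMIT and sum_ints (a stand-in for g) are made up, and unlike the original this version fills the buffer and frees the heap allocation itself.

```c
#include <stdlib.h>

#define SMALL_LIMIT 1024

/* Stand-in for g() from the example above: adds up the n elements. */
static int sum_ints (const int *p, unsigned n)
{
  int s = 0;
  unsigned i;

  for (i = 0; i < n; i++)
    s += p[i];
  return s;
}

int foo_fixed (unsigned n)
{
  int small[SMALL_LIMIT];   /* fixed-size stack buffer, always reserved */
  int *p = small;
  int result;
  unsigned i;

  if (n >= SMALL_LIMIT)
    {
      p = malloc (n * sizeof (int));
      if (p == NULL)
        return -1;
    }
  for (i = 0; i < n; i++)
    p[i] = (int) i;
  result = sum_ints (p, n);
  if (p != small)
    free (p);
  return result;
}
```

The cost is that the full SMALL_LIMIT * sizeof (int) bytes are reserved on every call, even when n is tiny.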
[ARM] unexpected sizeof() of a complex packed type
(The following sample is taken from my LKML post at https://lkml.org/lkml/2023/11/15/213)

$ cat t-build-bug.c
struct vring_tx_mac {
  unsigned int d[3];
  unsigned int ucode_cmd;
} __attribute__((packed));

struct vring_rx_mac {
  unsigned int d0;
  unsigned int d1;
  unsigned short w4;
  union {
    struct {
      unsigned short pn_15_0;
      unsigned int pn_47_16;
    } __attribute__((packed));
    struct {
      unsigned short pn_15_0;
      unsigned int pn_47_16;
    } __attribute__((packed)) pn;
  };
} __attribute__((packed));

struct wil_ring_dma_addr {
  unsigned int addr_low;
  unsigned short addr_high;
} __attribute__((packed));

struct vring_tx_dma {
  unsigned int d0;
  struct wil_ring_dma_addr addr;
  unsigned char ip_length;
  unsigned char b11;
  unsigned char error;
  unsigned char status;
  unsigned short length;
} __attribute__((packed));

struct vring_tx_desc {
  struct vring_tx_mac mac;
  struct vring_tx_dma dma;
} __attribute__((packed));

struct wil_ring_tx_enhanced_mac {
  unsigned int d[3];
  unsigned short tso_mss;
  unsigned short scratchpad;
} __attribute__((packed));

struct wil_ring_tx_enhanced_dma {
  unsigned char l4_hdr_len;
  unsigned char cmd;
  unsigned short w1;
  struct wil_ring_dma_addr addr;
  unsigned char ip_length;
  unsigned char b11;
  unsigned short addr_high_high;
  unsigned short length;
} __attribute__((packed));

struct wil_tx_enhanced_desc {
  struct wil_ring_tx_enhanced_mac mac;
  struct wil_ring_tx_enhanced_dma dma;
} __attribute__((packed));

union wil_tx_desc {
  struct vring_tx_desc legacy;
  struct wil_tx_enhanced_desc enhanced;
} __attribute__((packed));

struct vring_rx_dma {
  unsigned int d0;
  struct wil_ring_dma_addr addr;
  unsigned char ip_length;
  unsigned char b11;
  unsigned char error;
  unsigned char status;
  unsigned short length;
} __attribute__((packed));

struct vring_rx_desc {
  struct vring_rx_mac mac;
  struct vring_rx_dma dma;
} __attribute__((packed));

struct wil_ring_rx_enhanced_mac {
  unsigned int d[3];
  unsigned short buff_id;
  unsigned short reserved;
} __attribute((packed));

struct wil_ring_rx_enhanced_dma {
  unsigned int d0;
  struct wil_ring_dma_addr addr;
  unsigned short w5;
  unsigned short addr_high_high;
  unsigned short length;
} __attribute((packed));

struct wil_rx_enhanced_desc {
  struct wil_ring_rx_enhanced_mac mac;
  struct wil_ring_rx_enhanced_dma dma;
} __attribute((packed));

union wil_rx_desc {
  struct vring_rx_desc legacy;
  struct wil_rx_enhanced_desc enhanced;
} __attribute__((packed));

union wil_ring_desc {
  union wil_tx_desc tx;
  union wil_rx_desc rx;
} __attribute__((packed));

int f (void)
{
  return sizeof(union wil_ring_desc);
}

$ arm-linux-gnu-gcc -v
Using built-in specs.
COLLECT_GCC=arm-linux-gnu-gcc
COLLECT_LTO_WRAPPER=/usr/libexec/gcc/arm-linux-gnueabi/13/lto-wrapper
Target: arm-linux-gnueabi
Configured with: ../gcc-13.2.1-20230728/configure --bindir=/usr/bin --build=x86_64-redhat-linux-gnu --datadir=/usr/share --disable-decimal-float --disable-dependency-tracking --disable-gold --disable-libgcj --disable-libgomp --disable-libmpx --disable-libquadmath --disable-libssp --disable-libunwind-exceptions --disable-shared --disable-silent-rules --disable-sjlj-exceptions --disable-threads --with-ld=/usr/bin/arm-linux-gnu-ld --enable-__cxa_atexit --enable-checking=release --enable-gnu-unique-object --enable-initfini-array --enable-languages=c,c++ --enable-linker-build-id --enable-lto --enable-nls --enable-obsolete --enable-plugin --enable-targets=all --exec-prefix=/usr --host=x86_64-redhat-linux-gnu --includedir=/usr/include --infodir=/usr/share/info --libexecdir=/usr/libexec --localstatedir=/var --mandir=/usr/share/man --prefix=/usr --program-prefix=arm-linux-gnu- --sbindir=/usr/sbin --sharedstatedir=/var/lib --sysconfdir=/etc --target=arm-linux-gnueabi --with-bugurl=http://bugzilla.redhat.com/bugzilla/ --with-gcc-major-version-only --with-isl --with-newlib --with-plugin-ld=/usr/bin/arm-linux-gnu-ld --with-sysroot=/usr/arm-linux-gnu/sys-root --with-system-libunwind --with-system-zlib --without-headers --with-tune=generic-armv7-a --with-arch=armv7-a --with-float=hard --with-fpu=vfpv3-d16 --with-abi=aapcs-linux --enable-gnu-indirect-function --with-linker-hash-style=gnu
Thread model: single
Supported LTO compression algorithms: zlib zstd
gcc version 13.2.1 20230728 (Red Hat Cross 13.2.1-1) (GCC)

$ arm-linux-gnu-gcc -Os -c t-build-bug.c
$ arm-linux-gnu-objdump -j .text -D t-build-bug.o

t-build-bug.o: file format elf32-littlearm

Disassembly of section .text:

:
   0: e3a00020  mov r0, #32 ;; As expected
   4: e12ff
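Instead of reading the disassembly, the sizes can also be checked at compile time. The sketch below reuses three of the structures from the sample and asserts the sizes obtained by simply adding up the member sizes (6, 16 and 32 bytes). These values are my own arithmetic and hold with GCC on x86-64; whether they hold for a particular ARM configuration is exactly what such assertions would reveal, and this sketch does not explain any discrepancy. The accessor tx_desc_size is made up for illustration.

```c
struct vring_tx_mac {
  unsigned int d[3];
  unsigned int ucode_cmd;
} __attribute__((packed));

struct wil_ring_dma_addr {
  unsigned int addr_low;
  unsigned short addr_high;
} __attribute__((packed));

struct vring_tx_dma {
  unsigned int d0;
  struct wil_ring_dma_addr addr;
  unsigned char ip_length;
  unsigned char b11;
  unsigned char error;
  unsigned char status;
  unsigned short length;
} __attribute__((packed));

struct vring_tx_desc {
  struct vring_tx_mac mac;
  struct vring_tx_dma dma;
} __attribute__((packed));

/* C11 compile-time checks: the build fails if a size differs. */
_Static_assert (sizeof (struct wil_ring_dma_addr) == 6, "dma_addr is 6 bytes");
_Static_assert (sizeof (struct vring_tx_dma) == 16, "tx_dma is 16 bytes");
_Static_assert (sizeof (struct vring_tx_desc) == 32, "tx_desc is 32 bytes");

/* Hypothetical accessor so the size can also be checked at run time. */
unsigned long tx_desc_size (void)
{
  return sizeof (struct vring_tx_desc);
}
```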
Compiler support for write barrier insertion ?
Hello all,

I have a question about possible cooperation between the compiler and a hypothetical garbage collector. Unfortunately, my experience with GCC internals is too small, so I would like to ask compiler specialists before re-inventing an ugly bicycle...

Most non-trivial garbage collection methods, including, but not limited to, the widely used generational and incremental techniques, require a 'write barrier' or 'store barrier': a piece of code which is executed when a pointer (within one object, usually) to another object is written. The following methods are widely used:

1) call the barrier code explicitly where it's needed, determining such places by static analysis performed by the programmer;
2) allocate objects from heaps with OS 'page-aware' structures, then use OS memory protection for the underlying pages and handle protection faults;
3) rely on compiler support, which means the compiler should emit some code when a pointer store is generated.

Each of these methods has its own pitfalls. In short, 1) is very error-prone: one missed barrier may break everything. 2) is system-dependent and slow due to signal handling. For 3), inserting a write barrier at each pointer store is obviously redundant and will introduce an enormous overhead for any real program.

So I'm investigating the possibility (and usability) of a hybrid scheme which is based on both 1) and 3). The idea is to use the GCC attribute machinery to inform the compiler about special treatment of some pointers. As an example, consider the following structure, with a hypothetical attribute attached to one of its members:

struct obj {
  int value;
  char *name;
  struct obj *next __attribute__((trapped));
};

Here 'next' is a 'write-barriered' pointer. During compilation, the compiler should see that at least one member of 'struct obj' is trapped, and emit a call to a special function, for example '__builtin_obj_trap', when seeing a write to 'next' via a pointer of type 'struct obj *', for example:

struct obj *prev, *curr = alloc_obj ();
...
curr = alloc_obj ();
curr->value = 1234; /* Works as usual. */
curr->name = name;  /* Works as usual. */
curr->next = prev;  /* A call of __builtin_obj_trap() is arranged here,
                       for example, immediately after store instruction */
...

The special function may have the following prototype:

void __builtin_obj_trap (void *obj, int offset)

where 'obj' is a pointer to the structure containing the trapped member ('curr' in this example) and 'offset' is equal to 'offsetof (struct obj, next)'. This function must be provided by the programmer (which is similar to providing __cyg_profile_func_* when '-finstrument-functions' is used).

This method has an obvious pitfall: 'memset (curr, 0, sizeof (struct obj))' or 'memcpy (curr, otherobj, sizeof (struct obj))' probably can't be caught by the compiler. Another interesting situation is:

struct obj **opp, *prev, *curr = alloc_obj ();
...
opp = &curr->next;
*opp = prev;

For this case, taking the address of a trapped pointer may issue a warning, since it creates a way to rewrite the trapped pointer bypassing the write barrier. (The real hack is to treat the 'trapped' attribute as 'promoted by assignment', i.e. 'opp' becomes 'trapped' automagically after initialization, and writes through 'opp' are also surrounded by the write barrier.)

Of course, you may ask: why not just have 'void set_obj_next (struct obj *obj, struct obj *next)' with the write barrier inside? The short answers are:

1) If 'struct obj' has 100 trapped members, having 100 set_XXX functions or macros just to set the fields is ugly;
2) Migration from explicit memory management to garbage collection: if you have 1M lines of code which use 'XXX->next = ...', it's quite hard to rewrite everything even with the help of modern refactoring tools.

I realize that the whole thing is very specific and probably will never be used by most compiler users. But, anyway, is it technically possible to implement such a thing? How much overhead may it introduce?

Thanks,
Dmitry
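To make the intended effect concrete, here is a sketch that emulates by hand what the compiler would emit for a store to a trapped member. Nothing in it is an existing GCC feature: there is no 'trapped' attribute, obj_trap stands in for the user-provided '__builtin_obj_trap', and SET_TRAPPED, link_obj and the bookkeeping variables are made up for illustration. The macro relies on the GNU __typeof__ extension and evaluates its first argument twice.

```c
#include <stddef.h>

struct obj {
  int value;
  char *name;
  struct obj *next;   /* would carry the hypothetical 'trapped' attribute */
};

/* Bookkeeping so the effect of the barrier can be observed. */
int trap_calls;
void *last_obj;
int last_offset;

/* Stand-in for the user-provided barrier function. */
void obj_trap (void *obj, int offset)
{
  trap_calls++;
  last_obj = obj;
  last_offset = offset;
}

/* Manual emulation of the proposed code generation: do the store, then
   call the barrier with the object and the offset of the member. */
#define SET_TRAPPED(objp, member, val)                                  \
  do {                                                                  \
    (objp)->member = (val);                                             \
    obj_trap ((objp), (int) offsetof (__typeof__ (*(objp)), member));   \
  } while (0)

void link_obj (struct obj *curr, struct obj *prev)
{
  curr->value = 1234;              /* Works as usual. */
  SET_TRAPPED (curr, next, prev);  /* Store followed by the barrier call. */
}
```

With the proposed attribute, the last statement would be written as a plain 'curr->next = prev;' and the barrier call would be inserted by the compiler.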
Re: On -Wmaybe-uninitialized
On 1/30/25 4:29 PM, David Malcolm wrote:

> Arguably the state-merging code could be smarter here; I haven't investigated the details, but have filed it as PR analyzer/118702 here: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118702

Thanks. You might also be interested in https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118707, which I've just created.

Dmitry
On -Wmaybe-uninitialized
With (probably) -Wmaybe-uninitialized and/or -Wextra, shouldn't the compiler emit a warning about the possibly uninitialized 'y' passed to 'ddd()' in the example below?

struct T { int a; int b; };

extern int bbb (struct T *, int *);
extern int ccc (struct T *, int *);
extern int ddd (struct T *, int);

int
aaa (struct T *t)
{
  int x = 0, y;         /* 'y' is uninitialized */

  if (t->a)             /* if this condition is true */
    goto l;
  x += bbb (t, &y);
l:
  if (t->b)             /* and this condition is false */
    x += ccc (t, &y);
  x += ddd (t, y);      /* then 'y' is passed to ddd() uninitialized */
  return x;
}

Dmitry