On Tue, Oct 16, 2012 at 04:13:15PM +0200, Zbigniew Jędrzejewski-Szmek wrote:
> On Tue, Oct 16, 2012 at 01:01:04AM +0200, Lennart Poettering wrote:
> > On Mon, 15.10.12 23:02, Zbigniew Jędrzejewski-Szmek ([email protected]) wrote:
> >
> > >
> > > On Mon, Oct 15, 2012 at 10:01:31PM +0200, Lennart Poettering wrote:
> > > > On Sat, 13.10.12 17:59, Zbigniew Jędrzejewski-Szmek ([email protected]) wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > > I'm having trouble debugging the problem below. Maybe somebody has an
> > > > > idea... When I run journalctl, on a specific (large) set of journal
> > > > > logs, it segfaults. Always in the same place.
> > > >
> > > > Hmm, I have not seen this so far. Have you tried valgrind on this?
> > > Yeah, but it didn't say anything useful.
> > >
> > > ==32666== Invalid write of size 1
> > > ==32666==    at 0x5076B12: md_close (md.c:771)
> > > ==32666==    by 0x41282D: journal_file_close (journal-file.c:109)
> > > ==32666==    by 0x4110FE: sd_journal_close (sd-journal.c:1620)
> > > ==32666==    by 0x406DEA: main (journalctl.c:990)
> > > ==32666==  Address 0x402d000 is not stack'd, malloc'd or (recently) free'd
> > > ==32666==
> > > ==32666==
> > > ==32666== Process terminating with default action of signal 11 (SIGSEGV)
> > > ==32666==  Access not within mapped region at address 0x402D000
> > > ==32666==    at 0x5076B12: md_close (md.c:771)
> > > ==32666==    by 0x41282D: journal_file_close (journal-file.c:109)
> > > ==32666==    by 0x4110FE: sd_journal_close (sd-journal.c:1620)
> > > ==32666==    by 0x406DEA: main (journalctl.c:990)
> > > ==32666==  If you believe this happened as a result of a stack
> > > ==32666==  overflow in your program's main thread (unlikely but
> > > ==32666==  possible), you can try to increase the size of the
> > > ==32666==  main thread stack using the --main-stacksize= flag.
> > > ==32666==  The main thread stack size used in this run was 8388608.
> > > --32666-- Caught __NR_exit; running __libc_freeres()
> > > --32666-- Discarding syms at 0x33981230-0x3398887c in /usr/lib64/libnss_files-2.16.90.so due to munmap()
> > > ==32666==
> > > ==32666== HEAP SUMMARY:
> > > ==32666==     in use at exit: 17,860,820 bytes in 57 blocks
> > > ==32666==   total heap usage: 73,740 allocs, 73,683 frees, 33,511,230,790 bytes allocated
> > >
> > > When I compile with --disable-gcrypt, everything seems to work fine
> > > (no valgrind warnings). So the problem seems related to gcrypt,
> > > but I can't see anything wrong by looking at the code.
> >
> > I wonder if valgrind actually tracks mmap()s properly. I wonder if the
> > mmap_cache is mistakenly unmapping a map it shouldn't. It might be
> > worth looking for munmap() invocations in mmap-cache.c and printing the
> > range unmapped and comparing that with the address valgrind mentions as
> > freed.
> Seems that mmaps/munmaps are not the problem:
>
> mmap 0x7f7c38c03000 2f5000
> mmap 0x7f7c3890e000 2f5000
> mmap 0x7f7c38618000 2f5000
> ...
> mmap 0x7f7c0c53f000 2f5000
> mmap 0x7f7c0bd3e000 800000
> mmap 0x7f7c0b53d000 800000
> mmap 0x7f7c0b247000 2f5000
> mmap 0x7f7c0af51000 2f5000
> munmap 0x7f7c38c03000 2f5000
> ...
> munmap 0x7f7c0c53f000 2f5000
> Segmentation fault (core dumped)
>
> and
> a->ctx->macpads
> $2 = (byte *) 0x7f7c3a3b2f88 ""
>
> So, no overlap, everything mmapped and munmapped in order.

BTW, this is on top of Colin's patches: (journal: Set the last_unused pointer... and journal: Properly track the number of allocated windows).

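For anyone repeating the comparison, the "no overlap" check on the logged ranges could be automated roughly as below. This is only a sketch: the struct and the helper names (map_range, range_contains, ranges_overlap, find_containing) are made up for illustration and are not part of mmap-cache.c or of any systemd API. It assumes each log entry is a start address plus a length in bytes, as in the mmap/munmap output quoted above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One logged mapping (hypothetical type): start address and length in bytes. */
struct map_range {
    uint64_t start;
    uint64_t len;
};

/* True if addr lies in the half-open interval [start, start + len). */
static bool range_contains(const struct map_range *r, uint64_t addr) {
    return addr >= r->start && addr - r->start < r->len;
}

/* True if the two ranges share at least one byte. */
static bool ranges_overlap(const struct map_range *a, const struct map_range *b) {
    if (a->len == 0 || b->len == 0)
        return false;
    if (a->start <= b->start)
        return b->start - a->start < a->len;
    return a->start - b->start < b->len;
}

/* Index of the first range containing addr, or -1 if none does. */
static int find_containing(const struct map_range *ranges, size_t n, uint64_t addr) {
    assert(ranges != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        if (range_contains(&ranges[i], addr))
            return (int) i;
    return -1;
}
```

Feeding it the ranges from the log and the pointer under suspicion (for example the macpads address printed by gdb) shows whether that pointer ever fell inside a window that was later unmapped; a result of -1 from find_containing means it did not.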
Zbyszek
