Hi, (This is a follow-up for Debian bug #498183 - see [1] for details. Please keep [EMAIL PROTECTED] Cc'ed when replying.)
[1] http://bugs.debian.org/498183 On Sun, Sep 07, 2008 at 09:51:16PM +0100, Jurij Smakov wrote: > I don't have any problem reproducing it on sparc, so reopening. The > segfault occurs in rrd_open() function in librrd4, as following gdb > session illustrates (rebuilt rrd with debugging symbols to get it): [...] > (gdb) list > 363 rra_start += > 364 rrd->rra_def[i].row_cnt * rrd->stat_head->ds_cnt * > 365 sizeof(rrd_value_t); > 366 } > 367 #ifdef USE_MADVISE > 368 madvise(rrd_file->file_start + dontneed_start, > 369 rrd_file->file_len - dontneed_start, MADV_DONTNEED); > 370 #endif > 371 #ifdef HAVE_POSIX_FADVISE > 372 posix_fadvise(rrd_file->fd, dontneed_start, [...] > (gdb) print dontneed_start > $16 = 8192 > (gdb) print rrd_file->file_len > $17 = 972 > (gdb) print rrd_file->file_len - dontneed_start > $18 = 4294960076 [...] > (gdb) n > > Program received signal SIGSEGV, Segmentation fault. > rrd_dontneed (rrd_file=Cannot access memory at address 0x44) at rrd_open.c:372 > 372 posix_fadvise(rrd_file->fd, dontneed_start, > Disabling display 5 to avoid infinite recursion. > 5: i = Cannot access memory at address 0xffffffe8 (See [2] for the full session dump.) Jurij, thanks a lot for the detailed information - that was very helpful. [2] http://bugs.debian.org/498183#25 > I guess that the problem here is passing negative second argument to > madvise() which makes it very unhappy and smashes the stack, but I did > not grok the code yet to understand what's going on here. Yes, that seems to be the problem. Roughly, what's going on here is that we're stepping through all RRAs of the RRD file and mark "cold" blocks as unused (using madvise() und posix_fadvise()). dontneed_start is used as an offset into the file in that search. After we've stepped through all RRAs, the last call to madvise() (which will then trigger the segfault) marks the remainder of the file as unused as well. Now, we might have already passed the end of the file as dontneed_start is increased by multiples of the page size only. E.g. this happens if the page size is larger than the file size as in this case. For some reasons that I don't know, amd64 and i386 don't seem to care about that. I was not able to reproduce the problem but I could verify the same situation in the debugger. The attached patch should solve this issue. I've simply added a check if we're already passed the end of the file. Since I do not have access to a sparc box, I'd like to get some feedback if that really solves the issue. Also, I'd like Tobi (or anyone else involved in that specific code) to comment on that just to make sure that I did not miss some important fact. Thanks in advance! Cheers, Sebastian -- Sebastian "tokkee" Harl +++ GnuPG-ID: 0x8501C7FC +++ http://tokkee.org/ Those who would give up Essential Liberty to purchase a little Temporary Safety, deserve neither Liberty nor Safety. -- Benjamin Franklin
diff --git a/program/src/rrd_open.c b/program/src/rrd_open.c index 51b621f..af91c7b 100644 --- a/program/src/rrd_open.c +++ b/program/src/rrd_open.c @@ -363,14 +363,19 @@ void rrd_dontneed( rrd->rra_def[i].row_cnt * rrd->stat_head->ds_cnt * sizeof(rrd_value_t); } + + if (dontneed_start < rrd_file->file_len) { #ifdef USE_MADVISE - madvise(rrd_file->file_start + dontneed_start, - rrd_file->file_len - dontneed_start, MADV_DONTNEED); + madvise(rrd_file->file_start + dontneed_start, + rrd_file->file_len - dontneed_start, MADV_DONTNEED); #endif #ifdef HAVE_POSIX_FADVISE - posix_fadvise(rrd_file->fd, dontneed_start, - rrd_file->file_len - dontneed_start, POSIX_FADV_DONTNEED); + posix_fadvise(rrd_file->fd, dontneed_start, + rrd_file->file_len - dontneed_start, + POSIX_FADV_DONTNEED); #endif + } + #if defined DEBUG && DEBUG > 1 mincore_print(rrd_file, "after"); #endif
signature.asc
Description: Digital signature