bug#73022: 31.0.50; Crash in build_frame_matrix_from_leaf_window after C-x 2 and reducing terminal size
Continuing from this comment at https://debbugs.gnu.org/cgi/bugreport.cgi?bug=71289: > That's another problem. There seems to be some disconnect, time-wise, > in reallocating frame matrices and sub-allocating window matrices from > the frame matrices, and the crash happens when the check is done > in-between those two. In particular, I found a case in which an assert fails because the window row glyph memory is not contained in the frame row glyph memory. It happens in a recent build (99a03ddb2d4) built without X support. I'm running it in a urxvt terminal but it happens in xterm too. It crashes both with and without glyph debug. This is not related to garbage collection like bug 71289. To reproduce: 1. Open emacs -Q 2. Press C-x 2 to split the frame (top/bottom) 3. Make the terminal very small (I slowly resize the X window that's running urxvt, to the minimum size, 1 row and 2 columns in my case). This shrinking process alone can produce the crash when the window is around 5 lines high 4. It always crashes in my case. If it doesn't, make the terminal larger again, and repeat the resizing for some seconds until it crashes Note that the C-x 2 is required. The problem doesn't happen with a left/right split (C-x 3). But it happens after a C-x 3 C-x 2. Manually resizing is enough to trigger the problem. But it's also possible to automate the resizing with something like this (change the first part to find the window ID): EC=$(xdotool search --name '^\gdb src/emacs') && echo $EC && while :; do for height_px in `seq 275 -25 10`; do xdotool windowsize $EC $((height_px+5)) $height_px; sleep 0.001; done; for height_px in `seq 1 25 400`; do xdotool windowsize $EC $((height_px+5)) $height_px; sleep 0.1; done; sleep 0.3 && done (gdb) bt #0 __pthread_kill_implementation (threadid=, signo=signo@entry=6, no_tid=no_tid@entry=0) at ./nptl/pthread_kill.c:44 #1 0x7554ae8f in __pthread_kill_internal (signo=6, threadid=) at ./nptl/pthread_kill.c:78 #2 0x754fbfb2 in __GI_raise (sig=6) at ../sysdeps/posix/raise.c:26 #3 0x5568d0c2 in terminate_due_to_signal (sig=6, backtrace_limit=2147483647) at emacs.c:469 #4 0x5573de15 in die ( msg=0x55856798 "frame_size_change_delayed (XFRAME (w->frame)) || glyph_row_slice_p (window_row, frame_row)", file=0x55856231 "dispnew.c", line=2647) at alloc.c:8058 #5 0x5558b697 in build_frame_matrix_from_leaf_window (frame_matrix=0x560fed00, w=0x5603cca8) at dispnew.c:2647 #6 0x5558b154 in build_frame_matrix_from_window_tree (matrix=0x560fed00, w=0x5603cca8) at dispnew.c:2536 #7 0x5558b13f in build_frame_matrix_from_window_tree (matrix=0x560fed00, w=0x56260390) at dispnew.c:2534 #8 0x5558b0e3 in build_frame_matrix (f=0x5603ca88) at dispnew.c:2520 #9 0x5558d0e3 in update_frame (f=0x5603ca88, force_p=true, inhibit_hairy_id_p=false) at dispnew.c:3336 #10 0x555d0904 in redisplay_internal () at xdisp.c:17518 #11 0x555d1244 in redisplay_preserve_echo_area (from_where=11) at xdisp.c:17801 #12 0x557f5893 in wait_reading_process_output (time_limit=97, nsecs=0, read_kbd=-1, do_display=true, wait_for_cell=0x0, wait_proc=0x0, just_wait_proc=0) at process.c:5584 #13 0x555951d3 in sit_for (timeout=0x186, reading=true, display_option=1) at dispnew.c:6335 This is the failing assertion in frame 5, dispnew.c: (gdb) list 2642} 2643 2644#ifdef GLYPH_DEBUG 2645 /* Window row window_y must be a slice of frame row 2646 frame_y. */ 2647 eassert (frame_size_change_delayed (XFRAME (w->frame)) 2648 || glyph_row_slice_p (window_row, frame_row)); 2649 2650 /* If rows are in sync, we don't have to copy glyphs because 2651 frame and window share glyphs. */ Both conditions: frame_size_change_delayed (XFRAME (w->frame) glyph_row_slice_p (window_row, frame_row) are false. I don't know which one should be true in this case. This is the part about being delayed. (gdb) p delayed_size_change $15 = false glyph_row_slice_p contains this code: struct glyph *window_glyph_start = window_row->glyphs[0]; struct glyph *frame_glyph_start = frame_row->glyphs[0]; struct glyph *frame_glyph_end = frame_row->glyphs[LAST_AREA]; return (frame_glyph_start <= window_glyph_start && window_glyph_start < frame_glyph_end); The first part of the condition is false: (gdb) p frame_row->glyphs[0] $24 = (struct glyph *) 0x70f414c0 (gdb) p window_row->glyphs[0] $25 = (struct glyph *) 0x7088b430 (gdb) p frame_row->glyphs[0] <= window_row->glyphs[0] $26 = 0 (gdb) The second part is true: (gdb) p window_row->glyphs[0] $29 = (struct glyph *) 0x7088b430 (gdb) p frame_row->glyphs[LAST_AREA] $30 = (struct glyph *) 0x70f41970 (gdb) p window_row->glyphs[0] < frame_row->glyphs[LAST_AREA] $31 = 1 (gdb) Since the second part is true, I'm guessing that the first part shoul
bug#71289: 30.0.50; cmcheckmagic aborts when tty_write_glyphs writes "Garbage collecting..." in some cases
> > The build_frame_matrix_from_leaf_window crash still happens. > > That's another problem. There seems to be some disconnect, time-wise, > in reallocating frame matrices and sub-allocating window matrices from > the frame matrices, and the crash happens when the check is done > in-between those two. > > This will need more work. Patches and relevant data are welcome. I reported the build_frame_matrix_from_leaf_window problem separately to provide simpler instructions. (Can you mention the new bug number once it's approved?). The original problem mentioned here is about GC messages and it was fixed, so I think it can be closed. The build_frame_matrix_from_leaf_window error is easier to reproduce in comparison.
bug#71289: 30.0.50; cmcheckmagic aborts when tty_write_glyphs writes "Garbage collecting..." in some cases
> I reported the build_frame_matrix_from_leaf_window problem separately to > provide simpler instructions It's at https://debbugs.gnu.org/cgi/bugreport.cgi?bug=73022
bug#73022: Acknowledgement (31.0.50; Crash in build_frame_matrix_from_leaf_window after C-x 2 and reducing terminal size)
> Configured using: > […] > --without-x 'CFLAGS=-g3 -O3'' I sent another build's information by mistake. The backtraces are actually from a -O0 build, with this information: In GNU Emacs 31.0.50 (build 3, x86_64-pc-linux-gnu) of 2024-09-01 built on sonn Repository revision: 99a03ddb2d43d67577814b96e40ec069739b6421 Repository branch: master System Description: Devuan GNU/Linux 5 (daedalus) Configured using: 'configure --prefix=/opt/dc/emacs-dev/ --with-tiff=no --without-tiff --without-libsystemd --without-dbus --with-mailutils --without-modules --with-native-compilation --with-x-toolkit=no --without-imagemagick --without-xft --without-harfbuzz --without-freetype --without-libotf --without-xwidgets --without-xpm --without-jpeg --without-gif --without-png --without-webp --without-rsvg --without-cairo --without-x --without-sound --enable-checking=yes,glyphs --enable-profiling 'CFLAGS=-g3 -O0 ''
bug#73022: 31.0.50; Crash in build_frame_matrix_from_leaf_window after C-x 2 and reducing terminal size
> #0 terminate_due_to_signal (sig=6, backtrace_limit=40) at > ../../src/emacs.c:432 > #1 0x0061b51b in emacs_abort () at ../../src/sysdep.c:2391 > #2 0x00541fc2 in cmcheckmagic (tty=0x1ebe0d0) at ../../src/cm.c:122 > #3 0x00546564 in tty_write_glyphs (f=0x1e7bb30, string=0x1e9afe0, > len=80) at ../../src/term.c:819 > #4 0x005508c9 in write_glyphs (f=0x1e7bb30, string=0x1e9a0e0, > len=80) at ../../src/terminal.c:164 > #5 0x0042a6d7 in update_frame_line (f=0x1e7bb30, vpos=4, > updating_menu_p=false) at ../../src/dispnew.c:5326 > #6 0x004298c5 in update_frame_1 (f=0x1e7bb30, force_p=true, > inhibit_id_p=false, set_cursor_p=true, > […] > so the original assertion violation is gone here. > We're dealing with two type of assertion violations / crashes / backtraces: - in update_frame_line, that causes chcheckmagic to fail - in build_frame_matrix_from_leaf_window, assertion violation With the same build I mentioned in the top post (that is, without any code change), I can see both, after following the same instructions I posted. The update_frame_line happens much more frequently, therefore I thought that the update_frame_line was fixed in 71289; but it's not: I just reproduced the update_frame_line issue. So you can't be sure that the original assertion violation is gone. Your build may have crashed with the update_frame_line crash, before reaching the build_frame_matrix_from_leaf_window crash. The best would be to solve both crashes :-) I don't know whether both crashes are related.
bug#73022: 31.0.50; Crash in build_frame_matrix_from_leaf_window after C-x 2 and reducing terminal size
> I noticed that causing this assertion to fail is not very easy. For > example, if I drag the terminal emulator window one line at a time, I > can never cause it, even if I get to frame sizes that are much smaller > than the minimum we need for 2 windows. Somehow, I need to drag the > frame so it resizes by several lines and/or columns. Not sure why. > I can cause the build_frame_matrix_from_leaf_window failed assertion ( glyph_row_slice_p(window_row, frame_row) is false ) when slowly resizing row by row (7→6→5 rows). The number of columns doesn't matter (can be a normal one). You can also try maximizing/unmaximizing“the window if you window manager supports it. That's a way of suddenly changing the number of rows from a normal value (e.g. 20) to a dangerous value (e.g. 4). After unmaximizing it immediately crashes (if you did the C-x 2 split).
bug#73022: 31.0.50; Crash in build_frame_matrix_from_leaf_window after C-x 2 and reducing terminal size
> I attach the patch now, sorry for not doing it earlier. > Please attach patches if I can help trying out things. I don't know this code so I may break it if I change the low-level behavior. But I can safely add many breakpoints and debug messages to research the bugs. Apparently we could use more assertions in other parts of the code, or move the existing ones (under glyph debug) to catch the problem earlier. In cmcheckmagic we have: if (!MagicWrap (tty) || curY (tty) >= FrameRows (tty) - 1) emacs_abort (); Is this an assertion that needs to always hold? I have tried copy-pasting it at the beginning of tty_write_glyphs and I found that it fails continuously (e.g. it tests: 62 >= 63-1). Actually I used a breakpoint instead of emacs_abort. Are there useful places where I could trace that the assertion holds true?, earlier places than the current place.
bug#73022: 31.0.50; Crash in build_frame_matrix_from_leaf_window after C-x 2 and reducing terminal size
> Please try the attached diff (from my heavily edited copy of master, if > it doesn't apply cleanly, complain immediately rather than messing up > your Emacs). It should fix two silly bugs in window.el that make a > frame's safe minimum size much too large and includes the fix I proposed > earlier here. It applies well; thanks. By the way, (window-min-size) still answers 4 (with and without your patch). Emacs feels a bit "stronger" in the C-x 2 scenario, meaning that it survives shrinking+enlarging the terminal, but it still crashes sometimes. But it also crashes in a new situation: just by shrinking the normal/initial/unsplit frame, i.e. without needing to do C-x 2. It didn't crash before, so the patch doesn't make the whole Emacs more stable yet. Without doing C-x 2: #4 0x556c3016 in emacs_abort () at sysdep.c:2391 #5 0x5566d36f in cmcheckmagic (tty=0x5608fa80) at cm.c:122 #6 0x55671b1b in tty_write_glyphs (f=0x56012ff0, string=0x70f4ec10, len=80) at term.c:831 #7 0x5567c11d in write_glyphs (f=0x56012ff0, string=0x70f4dd10, len=80) at terminal.c:164 #8 0x55591b60 in update_frame_line (f=0x56012ff0, vpos=3, updating_menu_p=false) at dispnew.c:5326 126 if (tty->termscript) (gdb) p curY (tty) $1 = 3 (gdb) p FrameRows (tty) - 1 $2 = 3 (gdb) This one still happens (this is after C-x 2): #5 0x5558b697 in build_frame_matrix_from_leaf_window (frame_matrix=0x56050420, w=0x56013210) at dispnew.c:2647 > > And it moves the assignment of FrameRows to handle_window_change_signal > in dispnew.c. Doing it in adjust_frame_size was silly (as Gerd Möllmann > noticed earlier). FrameRows should be the height of the tty which can > be smaller than the height of our frame. Emacs is supposed to store it > but not to modify it according to our capabilities. > I don't have the expert opinion of how it should be. But I agree that a source of the problem is that the tty can be smaller than the frame. Frame and terminal may have different sizes and this creates inconsistencies. For instance, Eli recently added this code (dispnew.c): /* This should never happen, but evidently sometimes does if one resizes the frame quickly enough. Prevent aborts in cmcheckmagic. */ if (vpos >= FRAME_TOTAL_LINES (f)) return; But this is checking the *frame*. Later, the assertion in cmcheckmagic will be made about the *terminal*. I think the frame never gets smaller than 2 lines (there's code in handle_window_change_signal to prevent it), but the tty can. So if we're in a 1-line terminal, and updating the 2nd line, the above code sees nothing wrong (the frame still has 2 lines) but cmcheckmagick won't like it (the terminal doesn't have a 2nd line). (Usual disclaimer: I don't know very well how this works. I'm often brainstorming). > And it moves the assignment of FrameRows to handle_window_change_signal > in dispnew.c. In practice, that code (in the new place you put it) is still inside an if: the if (width > 5 && height > 2) in handle_window_change_signal. This means that in e.g. a 1-line terminal, you won't be setting FrameRows to the actual number of lines of terminal (1), as you wanted. It will still be the old value, e.g. 2, and it may make cmcheckmagic crash by trying to do things in line 2 (which exists in the frame, which has 2 lines) but not in the terminal (which doesn't have a 2nd line). One option seems to have FrameRows always match the amount of terminal lines. Can we remove the "if (… height >2)" limitation, and allow 1-line, 2-line frames etc.? (To really match the terminal size). Other changes may be needed in other places, to avoid working with very small frames.. Otherwise, if we let FrameRows be larger than the amount of terminal lines, another option could be: At the beginning of tty_write_glyphs, check whether we've been asked to write glyphs in a line which is higher than the amount of lines in the terminal. If so, return without writing anything. (Because the line would be invisible). Or maybe in some of the callers of tty_write_glyphs. Don't call tty_write_glyphs to write glyphs in a line which is not visible (it exists in the frame, but doesn't exist in the terminal). By the way, I saw that the cmcheckmagic code checks curY (tty) >= FrameRows (tty) - 1, note the -1, whereas the dispnew.c code I quote above doesn't use the -1. I hope this is ok. As mentioned they're checking different things so it may be ok, though I don't understand why the -1.