> On Jun 26, 2023, at 04:32, Mark Millard <mark...@yahoo.com> wrote:
>
> On Jun 24, 2023, at 17:25, Mark Millard <mark...@yahoo.com> wrote:
>
>> On Jun 24, 2023, at 14:26, John F Carr <j...@mit.edu> wrote:
>>
>>>
>>>> On Jun 24, 2023, at 13:00, Mark Millard <mark...@yahoo.com> wrote:
>>>>
>>>> The running system build is a non-debug build (but
>>>> with symbols not stripped).
>>>>
>>>> The HoneyComb's console log shows:
>>>>
>>>> . . .
>>>> GEOM_STRIPE: Device stripe.IMfBZr destroyed.
>>>> GEOM_NOP: Device md0.nop created.
>>>> g_vfs_done():md0.nop[READ(offset=5885952, length=8192)]error = 5
>>>> GEOM_NOP: Device md0.nop removed.
>>>> GEOM_NOP: Device md0.nop created.
>>>> g_vfs_done():md0.nop[READ(offset=5935104, length=4096)]error = 5
>>>> g_vfs_done():md0.nop[READ(offset=5935104, length=4096)]error = 5
>>>> GEOM_NOP: Device md0.nop removed.
>>>> GEOM_NOP: Device md0.nop created.
>>>> GEOM_NOP: Device md0.nop removed.
>>>> Fatal data abort:
>>>> x0: ffffa02506e64400
>>>> x1: ffff0001ea401880 (g_raid3_post_sync + 3a145f8)
>>>> x2: 4b
>>>> x3: a343932b0b22fb30
>>>> x4: 0
>>>> x5: 3310b0d062d0e1d
>>>> x6: 1d0e2d060d0b3103
>>>> x7: 0
>>>> x8: ea325df8
>>>> x9: ffff0001eec946d0 ($d.6 + 0)
>>>> x10: ffff0001ea401880 (g_raid3_post_sync + 3a145f8)
>>>> x11: 0
>>>> x12: 0
>>>> x13: ffff000000cd8960 (lock_class_mtx_sleep + 0)
>>>> x14: 0
>>>> x15: ffffa02506e64405
>>>> x16: ffff0001eec94860 (_DYNAMIC + 160)
>>>> x17: ffff00000063a450 (ifc_attach_cloner + 0)
>>>> x18: ffff0001eb290400 (g_raid3_post_sync + 48a3178)
>>>> x19: ffff0001eec94600 (vnet_epair_init_vnet_init + 0)
>>>> x20: ffff000000fa5b68 (vnet_sysinit_sxlock + 18)
>>>> x21: ffff000000d8e000 (sdt_vfs_vop_vop_spare4_return + 0)
>>>> x22: ffff000000d8e000 (sdt_vfs_vop_vop_spare4_return + 0)
>>>> x23: ffffa0000042e500
>>>> x24: ffffa0000042e500
>>>> x25: ffff000000ce0788 (linker_lookup_set_desc + 0)
>>>> x26: ffffa0203cdef780
>>>> x27: ffff0001eec94698 (__set_sysinit_set_sym_if_epairmodule_sys_init + 0)
>>>> x28: ffff000000d8e000 (sdt_vfs_vop_vop_spare4_return + 0)
>>>> x29: ffff0001eb290430 (g_raid3_post_sync + 48a31a8)
>>>> sp: ffff0001eb290400
>>>> lr: ffff0001eec82a4c ($x.1 + 3c)
>>>> elr: ffff0001eec82a60 ($x.1 + 50)
>>>> spsr: 60000045
>>>> far: ffff0002d8fba4c8
>>>> esr: 96000046
>>>> panic: vm_fault failed: ffff0001eec82a60 error 1
>>>> cpuid = 14
>>>> time = 1687625470
>>>> KDB: stack backtrace:
>>>> db_trace_self() at db_trace_self
>>>> db_trace_self_wrapper() at db_trace_self_wrapper+0x30
>>>> vpanic() at vpanic+0x13c
>>>> panic() at panic+0x44
>>>> data_abort() at data_abort+0x2fc
>>>> handle_el1h_sync() at handle_el1h_sync+0x14
>>>> --- exception, esr 0x96000046
>>>> $x.1() at $x.1+0x50
>>>> vnet_register_sysinit() at vnet_register_sysinit+0x114
>>>> linker_load_module() at linker_load_module+0xae4
>>>> kern_kldload() at kern_kldload+0xfc
>>>> sys_kldload() at sys_kldload+0x60
>>>> do_el0_sync() at do_el0_sync+0x608
>>>> handle_el0_sync() at handle_el0_sync+0x44
>>>> --- exception, esr 0x56000000
>>>> KDB: enter: panic
>>>> [ thread pid 70419 tid 101003 ]
>>>> Stopped at kdb_enter+0x44: str xzr, [x19, #3200]
>>>> db>
>>>
>>> The failure appears to be initializing module if_epair.
>>
>> Yep: trying:
>>
>> # kldload if_epair.ko
>>
>> was enough to cause the crash. (Just a HoneyComb context at
>> that point.)
>>
>> I tried media dd'd from the recent main snapshot, booting the
>> same system. No crash. I moved my build boot media to some
>> other systems and tested them: crashes. I tried my boot media
>> built optimized for Cortex-A53 or Cortex-X1C/Cortex-A78C
>> instead of Cortex-A72: no crashes. (But only one system can
>> use the X1C/A78C code in that build.)
>>
>> So variation testing only gets the crashes for my builds
>> that are code-optimized for Cortex-A72's. The same source
>> tree vintage built for Cortex-A53 or Cortex-X1C/Cortex-A78C
>> optimization does not get the crashes.
>> But I also demonstrated an optimized for Cortex-A72 build
>> from 2023-Mar that gets the crash.
>>
>> The last time I ran into one of these "crashes tied to
>> cortex-a72 code optimization" examples it turned out to be
>> some missing memory-model management code in FreeBSD's USB
>> code. But being lucky enough to help identify a FreeBSD
>> source code problem again seems not that likely. It could
>> easily be a code generation error by clang for all I know.
>>
>> So, unless at some point I produce fairly solid evidence
>> that the code actually running is messed up by FreeBSD
>> source code, this should likely be treated as "blame the
>> operator" and should likely be largely ignored as things
>> are. (Just My Problem, as I want the Cortex-A72 optimized
>> builds.)
>
> Turns out that the source code in question is the
> assignment to V_epair_cloner below:
>
> static void
> vnet_epair_init(const void *unused __unused)
> {
>         struct if_clone_addreq req = {
>                 .match_f = epair_clone_match,
>                 .create_f = epair_clone_create,
>                 .destroy_f = epair_clone_destroy,
>         };
>         V_epair_cloner = ifc_attach_cloner(epairname, &req);
> }
> VNET_SYSINIT(vnet_epair_init, SI_SUB_PSEUDO, SI_ORDER_ANY,
>     vnet_epair_init, NULL);
>
> Example code when not optimizing for the Cortex-A72:
>
> 11a4c: d0000089 adrp x9, 0x23000
> 11a50: f9400248 ldr x8, [x18]
> 11a54: f942c508 ldr x8, [x8, #1416]
> 11a58: f943d929 ldr x9, [x9, #1968]
> 11a5c: a9437bfd ldp x29, x30, [sp, #48]
> 11a60: f9401508 ldr x8, [x8, #40]
> 11a64: f8296900 str x0, [x8, x9]
>
> The code when optimizing for the Cortex-A72:
>
> 11a4c: f9400248 ldr x8, [x18]
> 11a50: f942c508 ldr x8, [x8, #1416]
> 11a54: d503201f nop
> 11a58: 1008e3c9 adr x9, #72824
> 11a5c: f9401508 ldr x8, [x8, #40]
> 11a60: f8296900 str x0, [x8, x9]
> 11a64: a9437bfd ldp x29, x30, [sp, #48]
>
> It is the "str x0, [x8, x9]" that vm_fault's for
> the optimized code.
>
> So:
>
> 11a4c: d0000089 adrp x9, 0x23000
> 11a58: f943d929 ldr x9, [x9, #1968]
>
> was optimized via replacement by:
>
> 11a58: 1008e3c9 adr x9, #72824
>
> I.e., the optimization is based on the offset from
> the instruction being fixed in order to produce the
> value in x9, even if the instruction is relocated.
>
> This resulted in the specific x9 value shown in
> the x8/x9 pair:
>
> x8: ea325df8
> x9: ffff0001eec946d0
>
> which totals to the fault address (value
> in far):
>
> far: ffff0002d8fba4c8
>

Is this the same as bug 264094?
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=264094
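
For what it is worth, the numbers in the quoted analysis are self-consistent. Below is a quick sketch in Python that re-does the arithmetic. It is not from the original thread, and it assumes the addresses in the disassembly listing are offsets from the module's load base. The helper name decode_adr and the constant names are only illustrative. The register values and instruction words are copied from the quoted text.

```python
# Values copied from the quoted register dump and disassembly.
X8 = 0x00000000EA325DF8
X9 = 0xFFFF0001EEC946D0
FAR = 0xFFFF0002D8FBA4C8
ELR = 0xFFFF0001EEC82A60

ADR_INSN = 0x1008E3C9  # "adr x9, #72824" in the Cortex-A72 build
ADR_PC = 0x11A58       # listing address of that adr
STR_PC = 0x11A60       # listing address of the faulting "str x0, [x8, x9]"
MASK64 = (1 << 64) - 1


def decode_adr(insn):
    """Return (rd, signed byte offset) for an AArch64 ADR instruction word."""
    # ADR has bit 31 clear and bits 28..24 equal to 0b10000.
    assert insn & 0x9F000000 == 0x10000000, "not an ADR encoding"
    immlo = (insn >> 29) & 0x3
    immhi = (insn >> 5) & 0x7FFFF
    imm = (immhi << 2) | immlo
    if imm & (1 << 20):  # sign-extend the 21-bit immediate
        imm -= 1 << 21
    return insn & 0x1F, imm


rd, offset = decode_adr(ADR_INSN)
adr_target = ADR_PC + offset     # module-relative address that adr yields
slot_addr = 0x23000 + 1968       # slot the adrp/ldr pair loads from instead

# Assumption: listing addresses are offsets from the load base, so the base
# falls out of the faulting pc (elr) minus the listing address of the str.
module_base = ELR - STR_PC

print(f"adr x{rd}, #{offset} -> module offset {adr_target:#x}")
print(f"adrp/ldr pair reads the slot at module offset {slot_addr:#x}")
print(f"module base {module_base:#x}")
print(f"base + adr target == x9: {module_base + adr_target == X9}")
print(f"x8 + x9 == far: {(X8 + X9) & MASK64 == FAR}")
```

Under that assumption, x9 is just the run-time address of module offset 0x236d0, computed PC-relative, whereas the non-A72 code loads whatever value is stored in the slot at offset 0x237b0. What that slot holds after relocation is not shown in the thread, so the sketch does not try to guess it.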