Hello Tomas,

Thanks for the report.

I'm setting up an arm64 machine to try to reproduce the crash. Could you
tell me which steps are required to run the reproducer you quoted below?
I read the buildfarm wiki page, and I'm not interested in running a
periodic cron job...

I cloned the git repo and downloaded the latest build-farm.X.tgz, the
client and data. I installed gmake, bison and flex, and I'm now reading
the conf file I need to edit, but I'm not sure how to glue everything
together. Any example of a setup on OpenBSD would be appreciated.

Thanks,
Martin

On 05/12/22(Mon) 18:09, Tomas Vondra wrote:
> >Synopsis:      Regular crashes on rpi4 when running PostgreSQL tests
> >Category:      aarch64
> >Environment:
>       System      : OpenBSD 7.2
>       Details     : OpenBSD 7.2-current (GENERIC.MP) #1896: Sat Nov 19 21:38:32 MST 2022
>                     [email protected]:/usr/src/sys/arch/arm64/compile/GENERIC.MP
>
>       Architecture: OpenBSD.arm64
>       Machine     : arm64
> >Description:
>
> When running PostgreSQL regression tests (using the community buildfarm
> tooling) on a Raspberry Pi 4 machine, the system occasionally panics - this
> happens after a small number of hours. The system is significantly slower
> compared to rpi4 machines running linux (by a factor of ~5x) so the whole
> test suite would finish in about 24 hours, but I have never seen that
> happen due to a crash.
>
> I suspected perhaps this particular rpi4 is somehow broken, so I tried
> booting Linux and ran the same set of tests - and that worked just fine.
> In fact, it completed ~10 rounds of testing over ~2 days, while on OpenBSD
> I can't get a single complete run.
>
> Another thing I suspected is a faulty SD card, so I moved the work directory
> to a USB flash drive and then to a reliable SSD (connected using a USB/SATA
> adapter). The SSD did improve the performance somewhat (compared to running
> from the USB flash drive) but the panics are still there, unfortunately.
>
> I managed to collect a bunch of information following the ddb page for two
> crashes (I can try again, if more information is needed).
>
> For the first crash I have only the stuff from the console:
>
> Stopped at      panic+0x160     cmp     w21,#0x0
>     TID    PID    UID    PFFLAGS     PFLAGS  CPU  COMMAND
> *178534  88804   1000          0          0    2  postgres
>  464655  67171   1000          0          0    0  postgres
>  470045  34591   1000          0          0    3  postgres
>  326421  84018   1000          0          0    3K postgres
>
> db_enter() at panic+0x15c
> panic() at __assert+0x24
> panic() at uvm_fault_upper_lookup+0x258
> uvm_fault_upper() at uvm_fault+0xec
> uvm_fault() at udata_abort+0x128
> udata_abort() at do_el0_sync+0xdc
> do_el0_sync() at handle_el0_sync+0x74
>
> For the second crash, I have more:
>
> Stopped at      panic+0x160     cmp     w21,#0x0
>     TID    PID    UID    PFFLAGS     PFLAGS  CPU  COMMAND
> *315901  52422   1000          0          0    0  postgres
>  286288  16150   1000          0          0    3  postgres
>  235152  96037      0    0x14000      0x200    1  zerothread
>
> ddb{0}> bt
> db_enter() at panic+0x15c
> panic() at kdata_abort+0x168
> kdata_abort() at handle_el1h_sync+0x6c
> handle_el1h_sync() at pmap_copy_page+0x98
> pmap_copy_page() at pmap_copy_page+0x98
> pmap_copy_page() at uvm_fault_upper+0x13c
> uvm_fault_upper() at uvm_fault+0xb4
> uvm_fault() at udata_abort+0x128
> udata_abort() at do_el0_sync+0xdc
> do_el0_sync() at handle_el0_sync+0x74
> handle_el0_sync() at 0x1b02613208
>
> ddb{0}> show uvm
> Current UVM status:
>   pagesize=4096 (0x1000), pagemask=0xfff, pageshift=12
>   967776 VM pages: 44735 active, 183278 inactive, 1 wired, 344603 free (43089 zero)
>   min 10% (25) anon, 10% (25) vnode, 5% (12) vtext
>   freemin=32259, free-target=43012, inactive-target=74248, wired-max=322592
>   faults=87269298, traps=0, intrs=0, ctxswitch=19407000 fpuswitch=8
>   softint=20156649, syscalls=124374327, kmapent=21
>   fault counts:
>     noram=0, noanon=0, noamap=0, pgwait=0, pgrele=0
>     ok relocks(total)=332129(335474), anget(retries)=35295179(0), amapcopy=8887865
>     neighbor anon/obj pg=16711715/60577268, gets(lock/unlock)=18290527/335627
>     cases: anon=32635596, anoncow=2659583, obj=16018472, prcopy=2268557, przero=33701122
>   daemon and swap counts:
>     woke=10195, revs=27, scans=0, obscans=0, anscans=0
>     busy=0, freed=0, reactivate=0, deactivate=28030
>     pageouts=0, pending=0, nswget=0
>     nswapdev=1
>     swpages=517387, swpginuse=0, swpgonly=0 paging=0
>   kernel pointers:
>     objs(kern)=0xffffff80010d6f78
>
> ddb{0}> show bcstats
> Current Buffer Cache status:
> numbufs 88448 busymapped 0, delwri 1783
> kvaslots 2855 avail kva slots 2855
> bufpages 353757, dmapages 45613, dirtypages 7132
> pendingreads 0, pendingwrites 3
> highflips 817351, highflops 0, dmaflips 79223
>
> ddb{0}> show panic
> *cpu: uvm_fault failed: ffffff80008e5274 esr 96000007 far ffffff80022a2338
>
> # mount
> /dev/sd0a on / type ffs (local)
> /dev/sd0l on /home type ffs (local, nodev, nosuid)
> /dev/sd0d on /tmp type ffs (local, nodev, nosuid)
> /dev/sd0f on /usr type ffs (local, nodev)
> /dev/sd0g on /usr/X11R6 type ffs (local, nodev)
> /dev/sd0h on /usr/local type ffs (local, nodev, wxallowed)
> /dev/sd0k on /usr/obj type ffs (local, nodev, nosuid)
> /dev/sd0j on /usr/src type ffs (local, nodev, nosuid)
> /dev/sd0e on /var type ffs (local, nodev, nosuid)
> /dev/sd1c on /mnt/data type ffs (local, noatime, nodev, nosuid)
>
>
> >How-To-Repeat:
>
> Get the PostgreSQL buildfarm client installed and configured, per:
>
> https://buildfarm.postgresql.org/
>
> https://wiki.postgresql.org/wiki/PostgreSQL_Buildfarm_Howto
>
> I can provide more detailed instructions/config if needed. Then run the whole
> test suite using
>
> ./run_branches.pl --run-all --nosend --nostatus --verbose
>
> which runs tests on all supported PostgreSQL branches (10-HEAD). I've never
> seen the whole run complete.
>
> >Fix:
>
> No idea.
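
For reference, the esr value in the "show panic" output above can be decoded
by hand. The following is only a sketch, assuming the standard ARMv8-A
ESR_ELx layout (EC in bits 31:26, IL in bit 25, ISS in bits 24:0, and for
data aborts WnR in ISS bit 6 and DFSC in ISS bits 5:0). The function name
and the two lookup tables are made up for illustration and are deliberately
partial; this is not how the kernel does it.

```python
# Sketch only: decode an AArch64 ESR_ELx value by hand.
# Assumes the ARMv8-A layout: EC = bits 31:26, IL = bit 25, ISS = bits 24:0;
# for data aborts, WnR = ISS bit 6 and DFSC = ISS bits 5:0.
# decode_esr() and the partial tables below are illustrative, not kernel code.

EC_NAMES = {
    0x20: "instruction abort from a lower EL",
    0x21: "instruction abort from the same EL",
    0x24: "data abort from a lower EL",
    0x25: "data abort from the same EL",
}

# Upper four bits of DFSC select the fault kind, lower two the table level.
FAULT_KINDS = {
    0b0000: "address size fault",
    0b0001: "translation fault",
    0b0010: "access flag fault",
    0b0011: "permission fault",
}


def decode_esr(esr):
    """Return a dict describing the fields of a 32-bit ESR value."""
    ec = (esr >> 26) & 0x3F
    iss = esr & 0x1FFFFFF
    info = {
        "ec": ec,
        "ec_name": EC_NAMES.get(ec, "not in this table"),
        "il": (esr >> 25) & 1,
    }
    if ec in (0x24, 0x25):  # data aborts carry WnR and DFSC in the ISS
        dfsc = iss & 0x3F
        kind = FAULT_KINDS.get(dfsc >> 2)
        info["write"] = bool(iss & (1 << 6))
        if kind is not None:
            info["fault"] = f"{kind}, level {dfsc & 0x3}"
        else:
            info["fault"] = f"DFSC {dfsc:#x} (not in this table)"
    return info


if __name__ == "__main__":
    # esr value taken from the "show panic" line quoted above
    print(decode_esr(0x96000007))
```

With esr 96000007 this gives EC 0x25 (a data abort taken at the same
exception level), a read access, and a level 3 translation fault. That would
be consistent with the kdata_abort() frame in the second backtrace, but it
does not by itself say why the mapping was missing.
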
>
>
> dmesg:
> OpenBSD 7.2-current (GENERIC.MP) #1896: Sat Nov 19 21:38:32 MST 2022
>     [email protected]:/usr/src/sys/arch/arm64/compile/GENERIC.MP
> real mem  = 4124958720 (3933MB)
> avail mem = 3963834368 (3780MB)
> random: good seed from bootblocks
> mainbus0 at root: Raspberry Pi 4 Model B Rev 1.2
> cpu0 at mainbus0 mpidr 0: ARM Cortex-A72 r0p3
> cpu0: 48KB 64b/line 3-way L1 PIPT I-cache, 32KB 64b/line 2-way L1 D-cache
> cpu0: 1024KB 64b/line 16-way L2 cache
> cpu0: CRC32,ASID16
> cpu1 at mainbus0 mpidr 1: ARM Cortex-A72 r0p3
> cpu1: 48KB 64b/line 3-way L1 PIPT I-cache, 32KB 64b/line 2-way L1 D-cache
> cpu1: 1024KB 64b/line 16-way L2 cache
> cpu1: CRC32,ASID16
> cpu2 at mainbus0 mpidr 2: ARM Cortex-A72 r0p3
> cpu2: 48KB 64b/line 3-way L1 PIPT I-cache, 32KB 64b/line 2-way L1 D-cache
> cpu2: 1024KB 64b/line 16-way L2 cache
> cpu2: CRC32,ASID16
> cpu3 at mainbus0 mpidr 3: ARM Cortex-A72 r0p3
> cpu3: 48KB 64b/line 3-way L1 PIPT I-cache, 32KB 64b/line 2-way L1 D-cache
> cpu3: 1024KB 64b/line 16-way L2 cache
> cpu3: CRC32,ASID16
> efi0 at mainbus0: UEFI 2.8
> efi0: Das U-Boot rev 0x20211000
> smbios0 at efi0: SMBIOS 3.0
> smbios0: vendor U-Boot version "2021.10" date 10/01/2021
> smbios0: Unknown Unknown Product
> apm0 at mainbus0
> simplefb0 at mainbus0: 1920x1280, 32bpp
> wsdisplay0 at simplefb0 mux 1: console (std, vt100 emulation)
> wsdisplay0: screen 1-5 added (std, vt100 emulation)
> "system" at mainbus0 not configured
> "axi" at mainbus0 not configured
> simplebus0 at mainbus0: "soc"
> bcmclock0 at simplebus0
> bcmmbox0 at simplebus0
> bcmgpio0 at simplebus0
> bcmaux0 at simplebus0
> ampintc0 at simplebus0 nirq 256, ncpu 4 ipi: 0, 1, 2: "interrupt-controller"
> bcmtmon0 at simplebus0
> bcmdmac0 at simplebus0: DMA0 DMA2 DMA4 DMA5 DMA6 DMA7 DMA8 DMA9 DMA10
> "timer" at simplebus0 not configured
> pluart0 at simplebus0: rev 2, 16 byte fifo
> "local_intc" at simplebus0 not configured
> bcmdog0 at simplebus0
> bcmirng0 at simplebus0
> "firmware" at simplebus0 not configured
> "power" at simplebus0 not configured
> "mailbox" at simplebus0 not configured
> sdhc0 at simplebus0
> sdhc0: SDHC 3.0, 250 MHz base clock
> sdmmc0 at sdhc0: 4-bit, sd high-speed, mmc high-speed
> "gpiomem" at simplebus0 not configured
> "fb" at simplebus0 not configured
> "vcsm" at simplebus0 not configured
> "clocks" at mainbus0 not configured
> "phy" at mainbus0 not configured
> "clk-27M" at mainbus0 not configured
> "clk-108M" at mainbus0 not configured
> simplebus1 at mainbus0: "emmc2bus"
> sdhc1 at simplebus1
> sdhc1: SDHC 3.0, 100 MHz base clock
> sdmmc1 at sdhc1: 8-bit, sd high-speed, mmc high-speed, ddr52, dma
> "arm-pmu" at mainbus0 not configured
> agtimer0 at mainbus0: 54000 kHz
> simplebus2 at mainbus0: "scb"
> bcmpcie0 at simplebus2
> pci0 at bcmpcie0
> ppb0 at pci0 dev 0 function 0 "Broadcom BCM2711" rev 0x10
> pci1 at ppb0 bus 1
> xhci0 at pci1 dev 0 function 0 "VIA VL805 xHCI" rev 0x01: intx, xHCI 1.0
> usb0 at xhci0: USB revision 3.0
> uhub0 at usb0 configuration 1 interface 0 "VIA xHCI root hub" rev 3.00/1.00 addr 1
> bse0 at simplebus2: address dc:a6:32:74:f0:2b
> brgphy0 at bse0 phy 1: BCM54210E 10/100/1000baseT PHY, rev. 2
> "dma" at simplebus2 not configured
> "hevc-decoder" at simplebus2 not configured
> "rpivid-local-intc" at simplebus2 not configured
> "h264-decoder" at simplebus2 not configured
> "vp9-decoder" at simplebus2 not configured
> gpioleds0 at mainbus0: "led0", "led1"
> "sd_io_1v8_reg" at mainbus0 not configured
> "sd_vcc_reg" at mainbus0 not configured
> "fixedregulator_3v3" at mainbus0 not configured
> "fixedregulator_5v0" at mainbus0 not configured
> simplebus3 at mainbus0: "v3dbus"
> "bootloader" at mainbus0 not configured
> scsibus0 at sdmmc1: 2 targets, initiator 0
> sd0 at scsibus0 targ 1 lun 0: <SD/MMC, SN32G, 0080> removable
> sd0: 30424MB, 512 bytes/sector, 62309376 sectors
> uhub1 at uhub0 port 1 configuration 1 interface 0 "VIA Labs USB2.0 Hub" rev 2.10/4.21 addr 2
> bwfm0 at sdmmc0 function 1
> manufacturer 0x02d0, product 0xa9a6 at sdmmc0 function 2 not configured
> manufacturer 0x02d0, product 0xa9a6 at sdmmc0 function 3 not configured
> uhidev0 at uhub1 port 4 configuration 1 interface 0 "SINO WEALTH USB KEYBOARD" rev 1.10/1.00 addr 3
> uhidev0: iclass 3/1
> ukbd0 at uhidev0: 8 variable keys, 6 key codes
> wskbd0 at ukbd0: console keyboard, using wsdisplay0
> uhidev1 at uhub1 port 4 configuration 1 interface 1 "SINO WEALTH USB KEYBOARD" rev 1.10/1.00 addr 3
> uhidev1: iclass 3/0, 5 report ids
> uhid0 at uhidev1 reportid 2: input=1, output=0, feature=0
> ucc0 at uhidev1 reportid 3: 24 usages, 13 keys, enum
> wskbd1 at ucc0 mux 1
> wskbd1: connecting to wsdisplay0
> uhid1 at uhidev1 reportid 5: input=0, output=0, feature=5
> umass0 at uhub0 port 2 configuration 1 interface 0 "USB SanDisk 3.2Gen1" rev 3.20/1.00 addr 4
> umass0: using SCSI over Bulk-Only
> scsibus1 at umass0: 2 targets, initiator 0
> sd1 at scsibus1 targ 1 lun 0: <USB, SanDisk 3.2Gen1, 1.00> removable serial.078155838107b7ab67f3
> sd1: 29340MB, 512 bytes/sector, 60088320 sectors
> vscsi0 at root
> scsibus2 at vscsi0: 256 targets
> softraid0 at root
> scsibus3 at softraid0: 256 targets
> root on sd0a (9b2ba6937baf4c1f.a) swap on sd0b dump on sd0b
> WARNING: / was not properly unmounted
> WARNING: CHECK AND RESET THE DATE!
> gpio0 at bcmgpio0: 58 pins
> bwfm0: address dc:a6:32:74:f0:2c
>
> usbdevs:
> Controller /dev/usb0:
> addr 01: 1106:0000 VIA, xHCI root hub
>          super speed, self powered, config 1, rev 1.00
>          driver: uhub0
> addr 02: 2109:3431 VIA Labs, USB2.0 Hub
>          high speed, self powered, config 1, rev 4.21
>          driver: uhub1
> addr 03: 258a:0001 SINO WEALTH, USB KEYBOARD
>          low speed, power 100 mA, config 1, rev 1.00
>          driver: uhidev0
>          driver: uhidev1
> addr 04: 0781:5583 USB, SanDisk 3.2Gen1
>          super speed, power 224 mA, config 1, rev 1.00, iSerial 050160872597eba701145d5d63b2c86e87619ad5b7d72657a7bd19b6a916325ae60a00000000000000000000e5824fefff9c171083558107b7ab67f3
>          driver: umass0
>
> pcidump:
> Domain /dev/pci0:
>  0:0:0: Broadcom BCM2711
>         0x0000: Vendor ID: 14e4, Product ID: 2711
>         0x0004: Command: 0006, Status: 0010
>         0x0008: Class: 06 Bridge, Subclass: 04 PCI,
>                 Interface: 00, Revision: 10
>         0x000c: BIST: 00, Header Type: 01, Latency Timer: 00,
>                 Cache Line Size: 08
>         0x0010: BAR empty (00000000)
>         0x0014: BAR empty (00000000)
>         0x0018: Primary Bus: 0, Secondary Bus: 1, Subordinate Bus: 1,
>                 Secondary Latency Timer: 00
>         0x001c: I/O Base: 00, I/O Limit: 00, Secondary Status: 0000
>         0x0020: Memory Base: c000, Memory Limit: c000
>         0x0024: Prefetch Memory Base: 1001, Prefetch Memory Limit: 0001
>         0x0028: Prefetch Memory Base Upper 32 Bits: 00000000
>         0x002c: Prefetch Memory Limit Upper 32 Bits: 00000000
>         0x0030: I/O Base Upper 16 Bits: 0000, I/O Limit Upper 16 Bits: 0000
>         0x0038: Expansion ROM Base Address: 00000000
>         0x003c: Interrupt Pin: 01, Line: 00, Bridge Control: 0000
>         0x0048: Capability 0x01: Power Management
>                 State: D0
>         0x00ac: Capability 0x10: PCI Express
>                 Max Payload Size: 128 / 512 bytes
>                 Max Read Request Size: 512 bytes
>                 Link Speed: 5.0 / 5.0 GT/s
>                 Link Width: x1 / x1
>         0x0100: Enhanced Capability 0x01: Advanced Error Reporting
>         0x0180: Enhanced Capability 0x0b: Vendor-Specific
>         0x0240: Enhanced Capability 0x1e: L1 PM
>         0x0000: 271114e4 00100006 06040010 00010008
>         0x0010: 00000000 00000000 00010100 00000000
>         0x0020: c000c000 00011001 00000000 00000000
>         0x0030: 00000000 00000048 00000000 00000100
>         0x0040: 00000000 00000000 4813ac01 00002008
>         0x0050: 00000000 00000000 00000000 00000000
>         0x0060: 00000000 00000000 00000000 00000000
>         0x0070: 00000000 00000000 00000000 00000000
>         0x0080: 00000000 00000000 00000000 00000000
>         0x0090: 00000000 00000000 00000000 00000000
>         0x00a0: 00000000 00000000 00000000 00420010
>         0x00b0: 00008002 00002c10 00655c12 90120000
>         0x00c0: 00000000 00400000 00010000 00000000
>         0x00d0: 0008081f 00000000 80000006 00000002
>         0x00e0: 00000000 00000000 00000000 00000000
>         0x00f0: 00000000 00000000 00000000 00000000
>  1:0:0: VIA VL805 xHCI
>         0x0000: Vendor ID: 1106, Product ID: 3483
>         0x0004: Command: 0006, Status: 0010
>         0x0008: Class: 0c Serial Bus, Subclass: 03 USB,
>                 Interface: 30, Revision: 01
>         0x000c: BIST: 00, Header Type: 00, Latency Timer: 00,
>                 Cache Line Size: 08
>         0x0010: BAR mem 64bit addr: 0x00000000c0000000/0x00001000
>         0x0018: BAR empty (00000000)
>         0x001c: BAR empty (00000000)
>         0x0020: BAR empty (00000000)
>         0x0024: BAR empty (00000000)
>         0x0028: Cardbus CIS: 00000000
>         0x002c: Subsystem Vendor ID: 1106 Product ID: 3483
>         0x0030: Expansion ROM Base Address: 00000000
>         0x0038: 00000000
>         0x003c: Interrupt Pin: 01 Line: 00 Min Gnt: 00 Max Lat: 00
>         0x0080: Capability 0x01: Power Management
>                 State: D0
>         0x0090: Capability 0x05: Message Signalled Interrupts (MSI)
>                 Enabled: no
>         0x00c4: Capability 0x10: PCI Express
>                 Max Payload Size: 128 / 256 bytes
>                 Max Read Request Size: 512 bytes
>                 Link Speed: 5.0 / 5.0 GT/s
>                 Link Width: x1 / x1
>         0x0100: Enhanced Capability 0x01: Advanced Error Reporting
>         0x0000: 34831106 00100006 0c033001 00000008
>         0x0010: c0000004 00000000 00000000 00000000
>         0x0020: 00000000 00000000 00000000 34831106
>         0x0030: 00000000 00000080 00000000 00000100
>         0x0040: 00000000 00000100 39df4009 00000004
>         0x0050: 000138a1 00000000 00000000 34831106
>         0x0060: 00002030 00000000 00000000 00000000
>         0x0070: 00000000 00000000 00000000 00000000
>         0x0080: 89c39001 00000000 00000000 00000000
>         0x0090: 0084c405 00000000 00000000 00000000
>         0x00a0: 00000000 00000000 00000000 00000000
>         0x00b0: 00000000 00000000 00000000 00000000
>         0x00c0: 00002000 00020010 00008001 00192810
>         0x00d0: 00065c12 10120043 00000000 00000000
>         0x00e0: 00000000 00000000 00000012 00000000
>         0x00f0: 00000000 00010022 00000000 00000000
>
> acpidump:
>
