Synopsis: Regular crashes on rpi4 when running PostgreSQL tests
Category: aarch64
Environment:
System : OpenBSD 7.2
Details : OpenBSD 7.2-current (GENERIC.MP) #1896: Sat Nov 19
21:38:32 MST 2022
[email protected]:/usr/src/sys/arch/arm64/compile/GENERIC.MP
Architecture: OpenBSD.arm64
Machine : arm64
Description:
When running the PostgreSQL regression tests (using the community buildfarm
tooling) on a Raspberry Pi 4, the system occasionally panics, typically after a
few hours. The system is significantly slower than rpi4 machines running Linux
(by a factor of ~5x), so the whole test suite should finish in about 24 hours,
but I have never seen that happen because of the crashes.
I suspected this particular rpi4 might be broken, so I booted Linux on it and
ran the same set of tests, and that worked just fine. In fact, it completed
~10 rounds of testing over ~2 days, while on OpenBSD I can't get a single
complete run.
Another thing I suspected was a faulty SD card, so I moved the work directory
to a USB flash drive and then to a reliable SSD (connected over USB/SATA).
The SSD did improve performance somewhat (compared to running from the USB
flash drive), but the panics are still there, unfortunately.
I managed to collect some information for two crashes, following the ddb man
page (I can try again if more information is needed).
For the first crash I have only the output from the console:
Stopped at panic+0x160 cmp w21,#0x0
TID PID UID PFFLAGS PFLAGS CPU COMMAND
*178534 88804 1000 0 0 2 postgres
464655 67171 1000 0 0 0 postgres
470045 34591 1000 0 0 3 postgres
326421 84018 1000 0 0 3K postgres
db_enter() at panic+0x15c
panic() at __assert+0x24
panic() at uvm_fault_upper_lookup+0x258
uvm_fault_upper() at uvm_fault+0xec
uvm_fault() at udata_abort+0x128
udata_abort() at do_el0_sync+0xdc
do_el0_sync() at handle_el0_sync+0x74
For the second crash, I have more:
Stopped at panic+0x160 cmp w21,#0x0
TID PID UID PFFLAGS PFLAGS CPU COMMAND
*315901 52422 1000 0 0 0 postgres
286288 16150 1000 0 0 3 postgres
235152 96037 0 0x14000 0x200 1 zerothread
ddb{0}> bt
db_enter() at panic+0x15c
panic() at kdata_abort+0x168
kdata_abort() at handle_el1h_sync+0x6c
handle_el1h_sync() at pmap_copy_page+0x98
pmap_copy_page() at pmap_copy_page+0x98
pmap_copy_page() at uvm_fault_upper+0x13c
uvm_fault_upper() at uvm_fault+0xb4
uvm_fault() at udata_abort+0x128
udata_abort() at do_el0_sync+0xdc
do_el0_sync() at handle_el0_sync+0x74
handle_el0_sync() at 0x1b02613208
ddb{0}> show uvm
Current UVM status:
pagesize=4096 (0x1000), pagemask=0xfff, pageshift=12
967776 VM pages: 44735 active, 183278 inactive, 1 wired, 344603 free
(43089 zero)
min 10% (25) anon, 10% (25) vnode, 5% (12) vtext
freemin=32259, free-target=43012, inactive-target=74248, wired-max=322592
faults=87269298, traps=0, intrs=0, ctxswitch=19407000 fpuswitch=8
softint=20156649, syscalls=124374327, kmapent=21
fault counts:
noram=0, noanon=0, noamap=0, pgwait=0, pgrele=0
ok relocks(total)=332129(335474), anget(retries)=35295179(0),
amapcopy=8887865
neighbor anon/obj pg=16711715/60577268,
gets(lock/unlock)=18290527/335627
cases: anon=32635596, anoncow=2659583, obj=16018472, prcopy=2268557,
przero=33701122
daemon and swap counts:
woke=10195, revs=27, scans=0, obscans=0, anscans=0
busy=0, freed=0, reactivate=0, deactivate=28030
pageouts=0, pending=0, nswget=0
nswapdev=1
swpages=517387, swpginuse=0, swpgonly=0 paging=0
kernel pointers:
objs(kern)=0xffffff80010d6f78
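As a reading aid for the "show uvm" output above, the page counts can be
converted to sizes. The following is only a small Python sketch, not part of
the original ddb session; the numbers are copied from the output and the
names (counts, pages_to_mib) are made up for illustration.

```python
# Convert page counts from the "show uvm" output into MiB.
# Values are copied from the ddb output above; names are illustrative only.
PAGE_SIZE = 4096  # pagesize=4096 (0x1000)

counts = {
    "total": 967776,     # "967776 VM pages"
    "active": 44735,
    "inactive": 183278,
    "free": 344603,
    "swap_in_use": 0,    # swpginuse=0
}


def pages_to_mib(pages, page_size=PAGE_SIZE):
    """Return the size of `pages` pages in MiB."""
    return pages * page_size / (1024 * 1024)


sizes = {name: pages_to_mib(n) for name, n in counts.items()}
for name, mib in sizes.items():
    print(f"{name:12s} {counts[name]:8d} pages  {mib:8.1f} MiB")
```

The total comes out at about 3780 MiB, which agrees with the "avail mem"
line in the dmesg. With roughly 1346 MiB free and no swap in use, the machine
does not look like it was under memory pressure when it panicked, although
these counters may not tell the whole story.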
ddb{0}> show bcstats
Current Buffer Cache status:
numbufs 88448 busymapped 0, delwri 1783
kvaslots 2855 avail kva slots 2855
bufpages 353757, dmapages 45613, dirtypages 7132
pendingreads 0, pendingwrites 3
highflips 817351, highflops 0, dmaflips 79223
ddb{0}> show panic
*cpu: uvm_fault failed: ffffff80008e5274 esr 96000007 far ffffff80022a2338
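The esr value in the panic message can be split into its fields. The sketch
below is in Python and is not part of the ddb session; it assumes the standard
AArch64 ESR_ELx layout (EC in bits 31:26, IL in bit 25, ISS in bits 24:0, and
for data aborts WnR in ISS bit 6 and DFSC in ISS bits 5:0). The function and
table names are made up, and only a handful of codes are named.

```python
# Decode an AArch64 ESR_ELx value into its main fields.
# Assumed layout: EC = bits [31:26], IL = bit 25, ISS = bits [24:0].
# For data aborts: WnR = ISS bit 6, DFSC = ISS bits [5:0].
# Only a few codes are named here; anything else prints as "not listed".

EC_NAMES = {
    0x24: "data abort from a lower exception level",
    0x25: "data abort taken without a change in exception level",
}

DFSC_NAMES = {
    0x04: "translation fault, level 0",
    0x05: "translation fault, level 1",
    0x06: "translation fault, level 2",
    0x07: "translation fault, level 3",
}


def decode_esr(esr):
    """Split an ESR value into a dict of fields."""
    ec = (esr >> 26) & 0x3F
    fields = {"ec": ec, "il": (esr >> 25) & 0x1, "iss": esr & 0x1FFFFFF}
    if ec in (0x24, 0x25):  # data aborts carry WnR and DFSC in the ISS
        fields["wnr"] = (fields["iss"] >> 6) & 0x1
        fields["dfsc"] = fields["iss"] & 0x3F
    return fields


fields = decode_esr(0x96000007)  # esr value from "show panic"
ec, dfsc = fields["ec"], fields["dfsc"]
print(f"EC   = {ec:#x} ({EC_NAMES.get(ec, 'not listed')})")
print(f"WnR  = {fields['wnr']} (0 = read, 1 = write)")
print(f"DFSC = {dfsc:#x} ({DFSC_NAMES.get(dfsc, 'not listed')})")
```

If this decoding is right, the kernel itself took a read fault on an address
with no valid level 3 translation, which would fit far ffffff80022a2338 being
unmapped at the moment pmap_copy_page accessed it.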
# mount
/dev/sd0a on / type ffs (local)
/dev/sd0l on /home type ffs (local, nodev, nosuid)
/dev/sd0d on /tmp type ffs (local, nodev, nosuid)
/dev/sd0f on /usr type ffs (local, nodev)
/dev/sd0g on /usr/X11R6 type ffs (local, nodev)
/dev/sd0h on /usr/local type ffs (local, nodev, wxallowed)
/dev/sd0k on /usr/obj type ffs (local, nodev, nosuid)
/dev/sd0j on /usr/src type ffs (local, nodev, nosuid)
/dev/sd0e on /var type ffs (local, nodev, nosuid)
/dev/sd1c on /mnt/data type ffs (local, noatime, nodev, nosuid)
How-To-Repeat:
Get the PostgreSQL buildfarm client installed and configured, per:
https://buildfarm.postgresql.org/
https://wiki.postgresql.org/wiki/PostgreSQL_Buildfarm_Howto
I can provide more detailed instructions/config if needed. Then run the whole
test suite using
./run_branches.pl --run-all --nosend --nostatus --verbose
which runs tests on all supported PostgreSQL branches (10-HEAD). I've never seen
the whole run complete.
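To record how long a run lasts before a panic, the command can be started
through a small wrapper that writes timestamped records and syncs them to
disk. This is only a sketch in Python, not something the buildfarm client
provides; run_logged and the log file name are hypothetical, and fsync only
makes it more likely (not certain) that the last record survives a crash.

```python
import os
import subprocess
import time


def run_logged(cmd, log_path):
    """Run cmd, appending synced start/end records to log_path."""

    def record(fh, text):
        fh.write(f"{time.strftime('%Y-%m-%d %H:%M:%S')} {text}\n")
        fh.flush()
        os.fsync(fh.fileno())  # push the record to disk before continuing

    with open(log_path, "a") as fh:
        record(fh, "start " + " ".join(cmd))
        start = time.monotonic()
        result = subprocess.run(cmd)
        elapsed = time.monotonic() - start
        record(fh, f"end rc={result.returncode} elapsed={elapsed:.0f}s")
    return result.returncode


# Example call, using the command from this report (log name is arbitrary):
# run_logged(["./run_branches.pl", "--run-all", "--nosend",
#             "--nostatus", "--verbose"], "buildfarm-run.log")
```

After a panic and reboot, a "start" record with no matching "end" record
gives the start time of the run that crashed.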
Fix:
No idea.
dmesg:
OpenBSD 7.2-current (GENERIC.MP) #1896: Sat Nov 19 21:38:32 MST 2022
[email protected]:/usr/src/sys/arch/arm64/compile/GENERIC.MP
real mem = 4124958720 (3933MB)
avail mem = 3963834368 (3780MB)
random: good seed from bootblocks
mainbus0 at root: Raspberry Pi 4 Model B Rev 1.2
cpu0 at mainbus0 mpidr 0: ARM Cortex-A72 r0p3
cpu0: 48KB 64b/line 3-way L1 PIPT I-cache, 32KB 64b/line 2-way L1 D-cache
cpu0: 1024KB 64b/line 16-way L2 cache
cpu0: CRC32,ASID16
cpu1 at mainbus0 mpidr 1: ARM Cortex-A72 r0p3
cpu1: 48KB 64b/line 3-way L1 PIPT I-cache, 32KB 64b/line 2-way L1 D-cache
cpu1: 1024KB 64b/line 16-way L2 cache
cpu1: CRC32,ASID16
cpu2 at mainbus0 mpidr 2: ARM Cortex-A72 r0p3
cpu2: 48KB 64b/line 3-way L1 PIPT I-cache, 32KB 64b/line 2-way L1 D-cache
cpu2: 1024KB 64b/line 16-way L2 cache
cpu2: CRC32,ASID16
cpu3 at mainbus0 mpidr 3: ARM Cortex-A72 r0p3
cpu3: 48KB 64b/line 3-way L1 PIPT I-cache, 32KB 64b/line 2-way L1 D-cache
cpu3: 1024KB 64b/line 16-way L2 cache
cpu3: CRC32,ASID16
efi0 at mainbus0: UEFI 2.8
efi0: Das U-Boot rev 0x20211000
smbios0 at efi0: SMBIOS 3.0
smbios0: vendor U-Boot version "2021.10" date 10/01/2021
smbios0: Unknown Unknown Product
apm0 at mainbus0
simplefb0 at mainbus0: 1920x1280, 32bpp
wsdisplay0 at simplefb0 mux 1: console (std, vt100 emulation)
wsdisplay0: screen 1-5 added (std, vt100 emulation)
"system" at mainbus0 not configured
"axi" at mainbus0 not configured
simplebus0 at mainbus0: "soc"
bcmclock0 at simplebus0
bcmmbox0 at simplebus0
bcmgpio0 at simplebus0
bcmaux0 at simplebus0
ampintc0 at simplebus0 nirq 256, ncpu 4 ipi: 0, 1, 2: "interrupt-controller"
bcmtmon0 at simplebus0
bcmdmac0 at simplebus0: DMA0 DMA2 DMA4 DMA5 DMA6 DMA7 DMA8 DMA9 DMA10
"timer" at simplebus0 not configured
pluart0 at simplebus0: rev 2, 16 byte fifo
"local_intc" at simplebus0 not configured
bcmdog0 at simplebus0
bcmirng0 at simplebus0
"firmware" at simplebus0 not configured
"power" at simplebus0 not configured
"mailbox" at simplebus0 not configured
sdhc0 at simplebus0
sdhc0: SDHC 3.0, 250 MHz base clock
sdmmc0 at sdhc0: 4-bit, sd high-speed, mmc high-speed
"gpiomem" at simplebus0 not configured
"fb" at simplebus0 not configured
"vcsm" at simplebus0 not configured
"clocks" at mainbus0 not configured
"phy" at mainbus0 not configured
"clk-27M" at mainbus0 not configured
"clk-108M" at mainbus0 not configured
simplebus1 at mainbus0: "emmc2bus"
sdhc1 at simplebus1
sdhc1: SDHC 3.0, 100 MHz base clock
sdmmc1 at sdhc1: 8-bit, sd high-speed, mmc high-speed, ddr52, dma
"arm-pmu" at mainbus0 not configured
agtimer0 at mainbus0: 54000 kHz
simplebus2 at mainbus0: "scb"
bcmpcie0 at simplebus2
pci0 at bcmpcie0
ppb0 at pci0 dev 0 function 0 "Broadcom BCM2711" rev 0x10
pci1 at ppb0 bus 1
xhci0 at pci1 dev 0 function 0 "VIA VL805 xHCI" rev 0x01: intx, xHCI 1.0
usb0 at xhci0: USB revision 3.0
uhub0 at usb0 configuration 1 interface 0 "VIA xHCI root hub" rev 3.00/1.00
addr 1
bse0 at simplebus2: address dc:a6:32:74:f0:2b
brgphy0 at bse0 phy 1: BCM54210E 10/100/1000baseT PHY, rev. 2
"dma" at simplebus2 not configured
"hevc-decoder" at simplebus2 not configured
"rpivid-local-intc" at simplebus2 not configured
"h264-decoder" at simplebus2 not configured
"vp9-decoder" at simplebus2 not configured
gpioleds0 at mainbus0: "led0", "led1"
"sd_io_1v8_reg" at mainbus0 not configured
"sd_vcc_reg" at mainbus0 not configured
"fixedregulator_3v3" at mainbus0 not configured
"fixedregulator_5v0" at mainbus0 not configured
simplebus3 at mainbus0: "v3dbus"
"bootloader" at mainbus0 not configured
scsibus0 at sdmmc1: 2 targets, initiator 0
sd0 at scsibus0 targ 1 lun 0: <SD/MMC, SN32G, 0080> removable
sd0: 30424MB, 512 bytes/sector, 62309376 sectors
uhub1 at uhub0 port 1 configuration 1 interface 0 "VIA Labs USB2.0 Hub" rev
2.10/4.21 addr 2
bwfm0 at sdmmc0 function 1
manufacturer 0x02d0, product 0xa9a6 at sdmmc0 function 2 not configured
manufacturer 0x02d0, product 0xa9a6 at sdmmc0 function 3 not configured
uhidev0 at uhub1 port 4 configuration 1 interface 0 "SINO WEALTH USB KEYBOARD"
rev 1.10/1.00 addr 3
uhidev0: iclass 3/1
ukbd0 at uhidev0: 8 variable keys, 6 key codes
wskbd0 at ukbd0: console keyboard, using wsdisplay0
uhidev1 at uhub1 port 4 configuration 1 interface 1 "SINO WEALTH USB KEYBOARD"
rev 1.10/1.00 addr 3
uhidev1: iclass 3/0, 5 report ids
uhid0 at uhidev1 reportid 2: input=1, output=0, feature=0
ucc0 at uhidev1 reportid 3: 24 usages, 13 keys, enum
wskbd1 at ucc0 mux 1
wskbd1: connecting to wsdisplay0
uhid1 at uhidev1 reportid 5: input=0, output=0, feature=5
umass0 at uhub0 port 2 configuration 1 interface 0 "USB SanDisk 3.2Gen1" rev
3.20/1.00 addr 4
umass0: using SCSI over Bulk-Only
scsibus1 at umass0: 2 targets, initiator 0
sd1 at scsibus1 targ 1 lun 0: <USB, SanDisk 3.2Gen1, 1.00> removable
serial.078155838107b7ab67f3
sd1: 29340MB, 512 bytes/sector, 60088320 sectors
vscsi0 at root
scsibus2 at vscsi0: 256 targets
softraid0 at root
scsibus3 at softraid0: 256 targets
root on sd0a (9b2ba6937baf4c1f.a) swap on sd0b dump on sd0b
WARNING: / was not properly unmounted
WARNING: CHECK AND RESET THE DATE!
gpio0 at bcmgpio0: 58 pins
bwfm0: address dc:a6:32:74:f0:2c
usbdevs:
Controller /dev/usb0:
addr 01: 1106:0000 VIA, xHCI root hub
super speed, self powered, config 1, rev 1.00
driver: uhub0
addr 02: 2109:3431 VIA Labs, USB2.0 Hub
high speed, self powered, config 1, rev 4.21
driver: uhub1
addr 03: 258a:0001 SINO WEALTH, USB KEYBOARD
low speed, power 100 mA, config 1, rev 1.00
driver: uhidev0
driver: uhidev1
addr 04: 0781:5583 USB, SanDisk 3.2Gen1
super speed, power 224 mA, config 1, rev 1.00, iSerial
050160872597eba701145d5d63b2c86e87619ad5b7d72657a7bd19b6a916325ae60a00000000000000000000e5824fefff9c171083558107b7ab67f3
driver: umass0
pcidump:
Domain /dev/pci0:
0:0:0: Broadcom BCM2711
0x0000: Vendor ID: 14e4, Product ID: 2711
0x0004: Command: 0006, Status: 0010
0x0008: Class: 06 Bridge, Subclass: 04 PCI,
Interface: 00, Revision: 10
0x000c: BIST: 00, Header Type: 01, Latency Timer: 00,
Cache Line Size: 08
0x0010: BAR empty (00000000)
0x0014: BAR empty (00000000)
0x0018: Primary Bus: 0, Secondary Bus: 1, Subordinate Bus: 1,
Secondary Latency Timer: 00
0x001c: I/O Base: 00, I/O Limit: 00, Secondary Status: 0000
0x0020: Memory Base: c000, Memory Limit: c000
0x0024: Prefetch Memory Base: 1001, Prefetch Memory Limit: 0001
0x0028: Prefetch Memory Base Upper 32 Bits: 00000000
0x002c: Prefetch Memory Limit Upper 32 Bits: 00000000
0x0030: I/O Base Upper 16 Bits: 0000, I/O Limit Upper 16 Bits: 0000
0x0038: Expansion ROM Base Address: 00000000
0x003c: Interrupt Pin: 01, Line: 00, Bridge Control: 0000
0x0048: Capability 0x01: Power Management
State: D0
0x00ac: Capability 0x10: PCI Express
Max Payload Size: 128 / 512 bytes
Max Read Request Size: 512 bytes
Link Speed: 5.0 / 5.0 GT/s
Link Width: x1 / x1
0x0100: Enhanced Capability 0x01: Advanced Error Reporting
0x0180: Enhanced Capability 0x0b: Vendor-Specific
0x0240: Enhanced Capability 0x1e: L1 PM
0x0000: 271114e4 00100006 06040010 00010008
0x0010: 00000000 00000000 00010100 00000000
0x0020: c000c000 00011001 00000000 00000000
0x0030: 00000000 00000048 00000000 00000100
0x0040: 00000000 00000000 4813ac01 00002008
0x0050: 00000000 00000000 00000000 00000000
0x0060: 00000000 00000000 00000000 00000000
0x0070: 00000000 00000000 00000000 00000000
0x0080: 00000000 00000000 00000000 00000000
0x0090: 00000000 00000000 00000000 00000000
0x00a0: 00000000 00000000 00000000 00420010
0x00b0: 00008002 00002c10 00655c12 90120000
0x00c0: 00000000 00400000 00010000 00000000
0x00d0: 0008081f 00000000 80000006 00000002
0x00e0: 00000000 00000000 00000000 00000000
0x00f0: 00000000 00000000 00000000 00000000
1:0:0: VIA VL805 xHCI
0x0000: Vendor ID: 1106, Product ID: 3483
0x0004: Command: 0006, Status: 0010
0x0008: Class: 0c Serial Bus, Subclass: 03 USB,
Interface: 30, Revision: 01
0x000c: BIST: 00, Header Type: 00, Latency Timer: 00,
Cache Line Size: 08
0x0010: BAR mem 64bit addr: 0x00000000c0000000/0x00001000
0x0018: BAR empty (00000000)
0x001c: BAR empty (00000000)
0x0020: BAR empty (00000000)
0x0024: BAR empty (00000000)
0x0028: Cardbus CIS: 00000000
0x002c: Subsystem Vendor ID: 1106 Product ID: 3483
0x0030: Expansion ROM Base Address: 00000000
0x0038: 00000000
0x003c: Interrupt Pin: 01 Line: 00 Min Gnt: 00 Max Lat: 00
0x0080: Capability 0x01: Power Management
State: D0
0x0090: Capability 0x05: Message Signalled Interrupts (MSI)
Enabled: no
0x00c4: Capability 0x10: PCI Express
Max Payload Size: 128 / 256 bytes
Max Read Request Size: 512 bytes
Link Speed: 5.0 / 5.0 GT/s
Link Width: x1 / x1
0x0100: Enhanced Capability 0x01: Advanced Error Reporting
0x0000: 34831106 00100006 0c033001 00000008
0x0010: c0000004 00000000 00000000 00000000
0x0020: 00000000 00000000 00000000 34831106
0x0030: 00000000 00000080 00000000 00000100
0x0040: 00000000 00000100 39df4009 00000004
0x0050: 000138a1 00000000 00000000 34831106
0x0060: 00002030 00000000 00000000 00000000
0x0070: 00000000 00000000 00000000 00000000
0x0080: 89c39001 00000000 00000000 00000000
0x0090: 0084c405 00000000 00000000 00000000
0x00a0: 00000000 00000000 00000000 00000000
0x00b0: 00000000 00000000 00000000 00000000
0x00c0: 00002000 00020010 00008001 00192810
0x00d0: 00065c12 10120043 00000000 00000000
0x00e0: 00000000 00000000 00000012 00000000
0x00f0: 00000000 00010022 00000000 00000000
acpidump: