Hi Martin,

I think it's probably easier to build PostgreSQL and run the tests directly, without the buildfarm tooling. Ultimately that's what the buildfarm tooling does anyway, except that it tests multiple branches.

I'd try cloning e.g. https://github.com/postgres/postgres, and then something like this:


./configure --enable-cassert --enable-debug --enable-nls --with-perl \
        --with-python --with-tcl --with-openssl --with-libxml \
        --with-libxslt --enable-tap-tests --with-icu

# build (PostgreSQL needs GNU make, which is gmake on OpenBSD)
gmake -s -j4

# run tests in a loop
while true; do gmake check-world; done
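
If it helps to know how many rounds passed before the machine went down, a variant of the loop could print a line after each round. This is only a sketch: run_rounds is a made-up helper name, and the round limit is an arbitrary choice.

```shell
#!/bin/sh
# run_rounds MAX CMD...: hypothetical helper, not part of PostgreSQL.
# Runs CMD up to MAX times, prints a line after every successful round,
# and stops with status 1 at the first round that fails.
run_rounds() {
    max=$1; shift
    n=0
    while [ "$n" -lt "$max" ]; do
        "$@" || { echo "round $((n + 1)) FAILED"; return 1; }
        n=$((n + 1))
        echo "round $n ok"
    done
}

# Intended use (not executed here): run_rounds 10 gmake check-world
```

The last "round N ok" line left on the console then shows how far the run got, and a test failure can be told apart from a panic.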


The --enable-tap-tests option may require a couple of Perl packages to support the TAP tests. I don't have the list at hand, but I can share that tomorrow when I have access to the rpi4.
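
Until then, a quick check can at least tell whether the modules I'd expect to matter are installed. This is only a sketch: as far as I know configure looks for IPC::Run when TAP tests are enabled, and Test::More normally ships with perl itself. check_perl_module is a made-up helper name.

```shell
#!/bin/sh
# check_perl_module NAME: hypothetical helper, not part of PostgreSQL.
# Reports whether the perl found in PATH can load the given module.
check_perl_module() {
    if perl -M"$1" -e 1 >/dev/null 2>&1; then
        echo "ok: $1"
    else
        echo "missing: $1"
        return 1
    fi
}

# Assumed module list; the full set may differ between PostgreSQL versions.
status=0
for mod in IPC::Run Test::More; do
    check_perl_module "$mod" || status=1
done
```

Anything reported as missing would need to be installed before running configure with --enable-tap-tests.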


regards


On 2/19/23 22:33, Martin Pieuchot wrote:
Hello Tomas,

Thanks for the report.

I'm setting up an arm64 machine to try to reproduce the crash.

Could you tell me what are the steps required to run the reproducer you
quoted below?  I read the buildfarm wiki page and I'm not interested in
running a periodic cron job...

I cloned the git repo, downloaded the latest build-farm.X.tgz, the
client and data.  I installed gmake, bison and flex and I'm now reading
the conf file I need to edit but I'm not sure how to glue everything
together.  Any example of setup on OpenBSD would be appreciated.

Thanks,
Martin


On 05/12/22(Mon) 18:09, Tomas Vondra wrote:
Synopsis:       Regular crashes on rpi4 when running PostgreSQL tests
Category:       aarch64
Environment:
        System      : OpenBSD 7.2
        Details     : OpenBSD 7.2-current (GENERIC.MP) #1896: Sat Nov 19 21:38:32 MST 2022
                         [email protected]:/usr/src/sys/arch/arm64/compile/GENERIC.MP

        Architecture: OpenBSD.arm64
        Machine     : arm64
Description:

When running PostgreSQL regression tests (using the community buildfarm tooling)
on a Raspberry Pi 4 machine, the system occasionally panics, typically after a
few hours. The system is significantly slower than rpi4 machines running Linux
(by a factor of ~5x), so the whole test suite would take about 24 hours to
finish, but I have never seen it complete because the system crashes first.

I suspected that this particular rpi4 might be broken, so I booted Linux on it
and ran the same set of tests, and that worked just fine. In fact, it completed
~10 rounds of testing over ~2 days, while on OpenBSD I can't get a single
complete run.

Another thing I suspected was a faulty SD card, so I moved the work directory to
a USB flash drive and then to a reliable SSD (connected using a USB/SATA
adapter). The SSD did improve performance somewhat (compared to running from
the USB flash drive), but the panics are still there, unfortunately.

I managed to collect a bunch of information following the ddb page for two
crashes (I can try again, if more information is needed).

For the first crash I have only the stuff from the console:

     Stopped at           panic+0x160      cmp      w21,#0x0
         TID     PID     UID   PFFLAGS    PFLAGS   CPU   COMMAND
     *178534   88804    1000         0         0     2   postgres
      464655   67171    1000         0         0     0   postgres
      470045   34591    1000         0         0     3   postgres
      326421   84018    1000         0         0     3K  postgres
db_enter() at panic+0x15c
     panic() at __assert+0x24
     panic() at uvm_fault_upper_lookup+0x258
     uvm_fault_upper() at uvm_fault+0xec
     uvm_fault() at udata_abort+0x128
     udata_abort() at do_el0_sync+0xdc
     do_el0_sync() at handle_el0_sync+0x74

For the second crash, I have more:

     Stopped at           panic+0x160      cmp      w21,#0x0
         TID     PID     UID   PFFLAGS    PFLAGS   CPU   COMMAND
     *315901   52422    1000         0         0     0   postgres
      286288   16150    1000         0         0     3   postgres
      235152   96037       0   0x14000     0x200     1   zerothread
ddb{0}> bt
     db_enter() at panic+0x15c
     panic() at kdata_abort+0x168
     kdata_abort() at handle_el1h_sync+0x6c
     handle_el1h_sync() at pmap_copy_page+0x98
     pmap_copy_page() at pmap_copy_page+0x98
     pmap_copy_page() at uvm_fault_upper+0x13c
     uvm_fault_upper() at uvm_fault+0xb4
     uvm_fault() at udata_abort+0x128
     udata_abort() at do_el0_sync+0xdc
     do_el0_sync() at handle_el0_sync+0x74
     handle_el0_sync() at 0x1b02613208

     ddb{0}> show uvm
     Current UVM status:
       pagesize=4096 (0x1000), pagemask=0xfff, pageshift=12
       967776 VM pages: 44735 active, 183278 inactive, 1 wired, 344603 free (43089 zero)
       min 10% (25) anon, 10% (25) vnode, 5% (12) vtext
       freemin=32259, free-target=43012, inactive-target=74248, wired-max=322592
       faults=87269298, traps=0, intrs=0, ctxswitch=19407000 fpuswitch=8
       softint=20156649, syscalls=124374327, kmapent=21
       fault counts:
         noram=0, noanon=0, noamap=0, pgwait=0, pgrele=0
         ok relocks(total)=332129(335474), anget(retries)=35295179(0), amapcopy=8887865
         neighbor anon/obj pg=16711715/60577268, gets(lock/unlock)=18290527/335627
         cases: anon=32635596, anoncow=2659583, obj=16018472, prcopy=2268557, przero=33701122
       daemon and swap counts:
         woke=10195, revs=27, scans=0, obscans=0, anscans=0
         busy=0, freed=0, reactivate=0, deactivate=28030
         pageouts=0, pending=0, nswget=0
         nswapdev=1
         swpages=517387, swpginuse=0, swpgonly=0 paging=0
       kernel pointers:
         objs(kern)=0xffffff80010d6f78
ddb{0}> show bcstats
     Current Buffer Cache status:
     numbufs 88448 busymapped 0, delwri 1783
     kvaslots 2855 avail kva slots 2855
     bufpages 353757, dmapages 45613, dirtypages 7132
     pendingreads 0, pendingwrites 3
     highflips 817351, highflops 0, dmaflips 79223
ddb{0}> show panic
     *cpu: uvm_fault failed: ffffff80008e5274 esr 96000007 far ffffff80022a2338
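
As a side note, the esr value in the panic message can be split into its fields by hand. A minimal sketch, assuming the standard ESR_ELx layout from the Arm architecture manual (EC in bits 31:26, IL in bit 25, and for data aborts the fault status code in the low six bits); decode_esr is just an illustrative name.

```shell
#!/bin/sh
# decode_esr HEX: hypothetical helper that splits an AArch64 ESR_ELx value
# (given as hex digits without the 0x prefix) into three fields:
#   EC   = bits [31:26]  exception class
#   IL   = bit  [25]     instruction length
#   DFSC = bits [5:0]    fault status code (meaningful for data aborts)
decode_esr() {
    esr=$(( 0x$1 ))
    printf 'EC=0x%02x IL=%d DFSC=0x%02x\n' \
        $(( (esr >> 26) & 0x3f )) \
        $(( (esr >> 25) & 0x1 )) \
        $(( esr & 0x3f ))
}

decode_esr 96000007
```

For 96000007 this gives EC=0x25, a data abort taken without a change in exception level (consistent with kdata_abort in the trace), and DFSC=0x07, a level 3 translation fault.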

     # mount
     /dev/sd0a on / type ffs (local)
     /dev/sd0l on /home type ffs (local, nodev, nosuid)
     /dev/sd0d on /tmp type ffs (local, nodev, nosuid)
     /dev/sd0f on /usr type ffs (local, nodev)
     /dev/sd0g on /usr/X11R6 type ffs (local, nodev)
     /dev/sd0h on /usr/local type ffs (local, nodev, wxallowed)
     /dev/sd0k on /usr/obj type ffs (local, nodev, nosuid)
     /dev/sd0j on /usr/src type ffs (local, nodev, nosuid)
     /dev/sd0e on /var type ffs (local, nodev, nosuid)
     /dev/sd1c on /mnt/data type ffs (local, noatime, nodev, nosuid)


How-To-Repeat:

Get the PostgreSQL buildfarm client installed and configured, per:

     https://buildfarm.postgresql.org/

     https://wiki.postgresql.org/wiki/PostgreSQL_Buildfarm_Howto

I can provide more detailed instructions/config if needed. Then run the whole
test suite using

     ./run_branches.pl --run-all --nosend --nostatus --verbose

which runs tests on all supported PostgreSQL branches (10-HEAD). I've never seen
the whole run complete.

Fix:

No idea.


dmesg:
OpenBSD 7.2-current (GENERIC.MP) #1896: Sat Nov 19 21:38:32 MST 2022
     [email protected]:/usr/src/sys/arch/arm64/compile/GENERIC.MP
real mem  = 4124958720 (3933MB)
avail mem = 3963834368 (3780MB)
random: good seed from bootblocks
mainbus0 at root: Raspberry Pi 4 Model B Rev 1.2
cpu0 at mainbus0 mpidr 0: ARM Cortex-A72 r0p3
cpu0: 48KB 64b/line 3-way L1 PIPT I-cache, 32KB 64b/line 2-way L1 D-cache
cpu0: 1024KB 64b/line 16-way L2 cache
cpu0: CRC32,ASID16
cpu1 at mainbus0 mpidr 1: ARM Cortex-A72 r0p3
cpu1: 48KB 64b/line 3-way L1 PIPT I-cache, 32KB 64b/line 2-way L1 D-cache
cpu1: 1024KB 64b/line 16-way L2 cache
cpu1: CRC32,ASID16
cpu2 at mainbus0 mpidr 2: ARM Cortex-A72 r0p3
cpu2: 48KB 64b/line 3-way L1 PIPT I-cache, 32KB 64b/line 2-way L1 D-cache
cpu2: 1024KB 64b/line 16-way L2 cache
cpu2: CRC32,ASID16
cpu3 at mainbus0 mpidr 3: ARM Cortex-A72 r0p3
cpu3: 48KB 64b/line 3-way L1 PIPT I-cache, 32KB 64b/line 2-way L1 D-cache
cpu3: 1024KB 64b/line 16-way L2 cache
cpu3: CRC32,ASID16
efi0 at mainbus0: UEFI 2.8
efi0: Das U-Boot rev 0x20211000
smbios0 at efi0: SMBIOS 3.0
smbios0: vendor U-Boot version "2021.10" date 10/01/2021
smbios0: Unknown Unknown Product
apm0 at mainbus0
simplefb0 at mainbus0: 1920x1280, 32bpp
wsdisplay0 at simplefb0 mux 1: console (std, vt100 emulation)
wsdisplay0: screen 1-5 added (std, vt100 emulation)
"system" at mainbus0 not configured
"axi" at mainbus0 not configured
simplebus0 at mainbus0: "soc"
bcmclock0 at simplebus0
bcmmbox0 at simplebus0
bcmgpio0 at simplebus0
bcmaux0 at simplebus0
ampintc0 at simplebus0 nirq 256, ncpu 4 ipi: 0, 1, 2: "interrupt-controller"
bcmtmon0 at simplebus0
bcmdmac0 at simplebus0: DMA0 DMA2 DMA4 DMA5 DMA6 DMA7 DMA8 DMA9 DMA10
"timer" at simplebus0 not configured
pluart0 at simplebus0: rev 2, 16 byte fifo
"local_intc" at simplebus0 not configured
bcmdog0 at simplebus0
bcmirng0 at simplebus0
"firmware" at simplebus0 not configured
"power" at simplebus0 not configured
"mailbox" at simplebus0 not configured
sdhc0 at simplebus0
sdhc0: SDHC 3.0, 250 MHz base clock
sdmmc0 at sdhc0: 4-bit, sd high-speed, mmc high-speed
"gpiomem" at simplebus0 not configured
"fb" at simplebus0 not configured
"vcsm" at simplebus0 not configured
"clocks" at mainbus0 not configured
"phy" at mainbus0 not configured
"clk-27M" at mainbus0 not configured
"clk-108M" at mainbus0 not configured
simplebus1 at mainbus0: "emmc2bus"
sdhc1 at simplebus1
sdhc1: SDHC 3.0, 100 MHz base clock
sdmmc1 at sdhc1: 8-bit, sd high-speed, mmc high-speed, ddr52, dma
"arm-pmu" at mainbus0 not configured
agtimer0 at mainbus0: 54000 kHz
simplebus2 at mainbus0: "scb"
bcmpcie0 at simplebus2
pci0 at bcmpcie0
ppb0 at pci0 dev 0 function 0 "Broadcom BCM2711" rev 0x10
pci1 at ppb0 bus 1
xhci0 at pci1 dev 0 function 0 "VIA VL805 xHCI" rev 0x01: intx, xHCI 1.0
usb0 at xhci0: USB revision 3.0
uhub0 at usb0 configuration 1 interface 0 "VIA xHCI root hub" rev 3.00/1.00 addr 1
bse0 at simplebus2: address dc:a6:32:74:f0:2b
brgphy0 at bse0 phy 1: BCM54210E 10/100/1000baseT PHY, rev. 2
"dma" at simplebus2 not configured
"hevc-decoder" at simplebus2 not configured
"rpivid-local-intc" at simplebus2 not configured
"h264-decoder" at simplebus2 not configured
"vp9-decoder" at simplebus2 not configured
gpioleds0 at mainbus0: "led0", "led1"
"sd_io_1v8_reg" at mainbus0 not configured
"sd_vcc_reg" at mainbus0 not configured
"fixedregulator_3v3" at mainbus0 not configured
"fixedregulator_5v0" at mainbus0 not configured
simplebus3 at mainbus0: "v3dbus"
"bootloader" at mainbus0 not configured
scsibus0 at sdmmc1: 2 targets, initiator 0
sd0 at scsibus0 targ 1 lun 0: <SD/MMC, SN32G, 0080> removable
sd0: 30424MB, 512 bytes/sector, 62309376 sectors
uhub1 at uhub0 port 1 configuration 1 interface 0 "VIA Labs USB2.0 Hub" rev 2.10/4.21 addr 2
bwfm0 at sdmmc0 function 1
manufacturer 0x02d0, product 0xa9a6 at sdmmc0 function 2 not configured
manufacturer 0x02d0, product 0xa9a6 at sdmmc0 function 3 not configured
uhidev0 at uhub1 port 4 configuration 1 interface 0 "SINO WEALTH USB KEYBOARD" rev 1.10/1.00 addr 3
uhidev0: iclass 3/1
ukbd0 at uhidev0: 8 variable keys, 6 key codes
wskbd0 at ukbd0: console keyboard, using wsdisplay0
uhidev1 at uhub1 port 4 configuration 1 interface 1 "SINO WEALTH USB KEYBOARD" rev 1.10/1.00 addr 3
uhidev1: iclass 3/0, 5 report ids
uhid0 at uhidev1 reportid 2: input=1, output=0, feature=0
ucc0 at uhidev1 reportid 3: 24 usages, 13 keys, enum
wskbd1 at ucc0 mux 1
wskbd1: connecting to wsdisplay0
uhid1 at uhidev1 reportid 5: input=0, output=0, feature=5
umass0 at uhub0 port 2 configuration 1 interface 0 "USB SanDisk 3.2Gen1" rev 3.20/1.00 addr 4
umass0: using SCSI over Bulk-Only
scsibus1 at umass0: 2 targets, initiator 0
sd1 at scsibus1 targ 1 lun 0: <USB, SanDisk 3.2Gen1, 1.00> removable serial.078155838107b7ab67f3
sd1: 29340MB, 512 bytes/sector, 60088320 sectors
vscsi0 at root
scsibus2 at vscsi0: 256 targets
softraid0 at root
scsibus3 at softraid0: 256 targets
root on sd0a (9b2ba6937baf4c1f.a) swap on sd0b dump on sd0b
WARNING: / was not properly unmounted
WARNING: CHECK AND RESET THE DATE!
gpio0 at bcmgpio0: 58 pins
bwfm0: address dc:a6:32:74:f0:2c

usbdevs:
Controller /dev/usb0:
addr 01: 1106:0000 VIA, xHCI root hub
         super speed, self powered, config 1, rev 1.00
         driver: uhub0
addr 02: 2109:3431 VIA Labs, USB2.0 Hub
         high speed, self powered, config 1, rev 4.21
         driver: uhub1
addr 03: 258a:0001 SINO WEALTH, USB KEYBOARD
         low speed, power 100 mA, config 1, rev 1.00
         driver: uhidev0
         driver: uhidev1
addr 04: 0781:5583 USB, SanDisk 3.2Gen1
         super speed, power 224 mA, config 1, rev 1.00, iSerial 050160872597eba701145d5d63b2c86e87619ad5b7d72657a7bd19b6a916325ae60a00000000000000000000e5824fefff9c171083558107b7ab67f3
         driver: umass0

pcidump:
Domain /dev/pci0:
  0:0:0: Broadcom BCM2711
        0x0000: Vendor ID: 14e4, Product ID: 2711
        0x0004: Command: 0006, Status: 0010
        0x0008: Class: 06 Bridge, Subclass: 04 PCI,
                Interface: 00, Revision: 10
        0x000c: BIST: 00, Header Type: 01, Latency Timer: 00,
                Cache Line Size: 08
        0x0010: BAR empty (00000000)
        0x0014: BAR empty (00000000)
        0x0018: Primary Bus: 0, Secondary Bus: 1, Subordinate Bus: 1,
                Secondary Latency Timer: 00
        0x001c: I/O Base: 00, I/O Limit: 00, Secondary Status: 0000
        0x0020: Memory Base: c000, Memory Limit: c000
        0x0024: Prefetch Memory Base: 1001, Prefetch Memory Limit: 0001
        0x0028: Prefetch Memory Base Upper 32 Bits: 00000000
        0x002c: Prefetch Memory Limit Upper 32 Bits: 00000000
        0x0030: I/O Base Upper 16 Bits: 0000, I/O Limit Upper 16 Bits: 0000
        0x0038: Expansion ROM Base Address: 00000000
        0x003c: Interrupt Pin: 01, Line: 00, Bridge Control: 0000
        0x0048: Capability 0x01: Power Management
                State: D0
        0x00ac: Capability 0x10: PCI Express
                Max Payload Size: 128 / 512 bytes
                Max Read Request Size: 512 bytes
                Link Speed: 5.0 / 5.0 GT/s
                Link Width: x1 / x1
        0x0100: Enhanced Capability 0x01: Advanced Error Reporting
        0x0180: Enhanced Capability 0x0b: Vendor-Specific
        0x0240: Enhanced Capability 0x1e: L1 PM
        0x0000: 271114e4 00100006 06040010 00010008
        0x0010: 00000000 00000000 00010100 00000000
        0x0020: c000c000 00011001 00000000 00000000
        0x0030: 00000000 00000048 00000000 00000100
        0x0040: 00000000 00000000 4813ac01 00002008
        0x0050: 00000000 00000000 00000000 00000000
        0x0060: 00000000 00000000 00000000 00000000
        0x0070: 00000000 00000000 00000000 00000000
        0x0080: 00000000 00000000 00000000 00000000
        0x0090: 00000000 00000000 00000000 00000000
        0x00a0: 00000000 00000000 00000000 00420010
        0x00b0: 00008002 00002c10 00655c12 90120000
        0x00c0: 00000000 00400000 00010000 00000000
        0x00d0: 0008081f 00000000 80000006 00000002
        0x00e0: 00000000 00000000 00000000 00000000
        0x00f0: 00000000 00000000 00000000 00000000
  1:0:0: VIA VL805 xHCI
        0x0000: Vendor ID: 1106, Product ID: 3483
        0x0004: Command: 0006, Status: 0010
        0x0008: Class: 0c Serial Bus, Subclass: 03 USB,
                Interface: 30, Revision: 01
        0x000c: BIST: 00, Header Type: 00, Latency Timer: 00,
                Cache Line Size: 08
        0x0010: BAR mem 64bit addr: 0x00000000c0000000/0x00001000
        0x0018: BAR empty (00000000)
        0x001c: BAR empty (00000000)
        0x0020: BAR empty (00000000)
        0x0024: BAR empty (00000000)
        0x0028: Cardbus CIS: 00000000
        0x002c: Subsystem Vendor ID: 1106 Product ID: 3483
        0x0030: Expansion ROM Base Address: 00000000
        0x0038: 00000000
        0x003c: Interrupt Pin: 01 Line: 00 Min Gnt: 00 Max Lat: 00
        0x0080: Capability 0x01: Power Management
                State: D0
        0x0090: Capability 0x05: Message Signalled Interrupts (MSI)
                Enabled: no
        0x00c4: Capability 0x10: PCI Express
                Max Payload Size: 128 / 256 bytes
                Max Read Request Size: 512 bytes
                Link Speed: 5.0 / 5.0 GT/s
                Link Width: x1 / x1
        0x0100: Enhanced Capability 0x01: Advanced Error Reporting
        0x0000: 34831106 00100006 0c033001 00000008
        0x0010: c0000004 00000000 00000000 00000000
        0x0020: 00000000 00000000 00000000 34831106
        0x0030: 00000000 00000080 00000000 00000100
        0x0040: 00000000 00000100 39df4009 00000004
        0x0050: 000138a1 00000000 00000000 34831106
        0x0060: 00002030 00000000 00000000 00000000
        0x0070: 00000000 00000000 00000000 00000000
        0x0080: 89c39001 00000000 00000000 00000000
        0x0090: 0084c405 00000000 00000000 00000000
        0x00a0: 00000000 00000000 00000000 00000000
        0x00b0: 00000000 00000000 00000000 00000000
        0x00c0: 00002000 00020010 00008001 00192810
        0x00d0: 00065c12 10120043 00000000 00000000
        0x00e0: 00000000 00000000 00000012 00000000
        0x00f0: 00000000 00010022 00000000 00000000

acpidump:

