On 2021/03/02 00:09, Mark Schneider wrote:
> Hi,
>
> Thank you for your feedback.
>
> The OpenBSD 6.9beta snapshot also crashes when I set up RAID5 with three
> "Samsung PRO 860 1TB" SSDs.
> OpenBSD obsd69b.it-infra.org 6.9 GENERIC.MP#368 amd64
>
> obsd69b# dmesg | grep -i bios
> bios0 at mainbus0: SMBIOS rev. 2.7 @ 0xdc312018 (61 entries)
> bios0: vendor American Megatrends Inc. version "2201" date 03/23/2015
> bios0: ASUSTeK COMPUTER INC. CROSSHAIR V FORMULA-Z
> acpi0 at bios0: ACPI 5.0
Can you isolate softraid from the equation? Are the drives reliable with
this hardware configuration when not using softraid? I guess it would
need testing with simultaneous writes to the 3 drives to give a closer
match to the situation with softraid.
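
A minimal sketch of such a test, assuming plain dd writers started in parallel are a close enough approximation of the softraid write pattern. The function name parallel_write and the device names in the commented usage line are hypothetical; writing to raw disks destroys whatever is on them.

```shell
#!/bin/sh
# Start one dd writer per target at the same time, then wait for all of them.
# DESTRUCTIVE when the targets are raw disks.
parallel_write() {
    bs=$1
    count=$2
    shift 2
    pids=""
    for target in "$@"; do
        dd if=/dev/zero of="$target" bs="$bs" count="$count" 2>/dev/null &
        pids="$pids $!"
    done
    rc=0
    for pid in $pids; do
        # A failed writer makes the whole run fail.
        wait "$pid" || rc=1
    done
    return $rc
}

# Hypothetical usage; substitute the real raw devices of the three SSDs:
# parallel_write 10M 1024 /dev/rsd1c /dev/rsd2c /dev/rsd3c
```

If this runs cleanly several times while the softraid volume still panics, the drives and controller become less likely suspects.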
> > > bs=10M count=1024
> > >
> > > # Error messages
> > >
> > > uvm_fault(0xffffffff821f5490, 0x40, 0, 1) -> e
> > > kernel: page fault trap, code=0
> > > Stopped at sr_validate_io+0x44: cmpl $0,0x40(%r9)
> > > ddb{2}>
$ objdump -dlr softraid.o | less
...skipping...
0000000000009cc0 <sr_validate_io>:
sr_validate_io():
/usr/src/sys/dev/softraid.c:4569
9cc0: 4c 8b 1d 00 00 00 00 mov 0(%rip),%r11 # 9cc7 <sr_validate_io+0x7>
9cc3: R_X86_64_PC32 __retguard_3962+0xfffffffffffffffc
9cc7: 4c 33 1c 24 xor (%rsp),%r11
9ccb: 55 push %rbp
9ccc: 48 89 e5 mov %rsp,%rbp
9ccf: 57 push %rdi
9cd0: 56 push %rsi
9cd1: 52 push %rdx
9cd2: 57 push %rdi
9cd3: 41 53 push %r11
9cd5: 50 push %rax
/usr/src/sys/dev/softraid.c:4570
9cd6: 4c 8b 47 08 mov 0x8(%rdi),%r8
/usr/src/sys/dev/softraid.c:4577
9cda: 49 8b 88 70 09 00 00 mov 0x970(%r8),%rcx
9ce1: 83 b9 94 00 00 00 00 cmpl $0x0,0x94(%rcx)
9ce8: 0f 84 a2 01 00 00 je 9e90 <sr_validate_io+0x1d0>
9cee: b8 01 00 00 00 mov $0x1,%eax
/usr/src/sys/dev/softraid.c:4580
9cf3: 41 83 b8 20 0a 00 00 cmpl $0x1,0xa20(%r8)
9cfa: 01
9cfb: 0f 84 69 01 00 00 je 9e6a <sr_validate_io+0x1aa>
9d01: 4c 8b 0f mov (%rdi),%r9
/usr/src/sys/dev/softraid.c:4586
9d04: 41 83 79 40 00 cmpl $0x0,0x40(%r9)
9d09: 74 47 je 9d52 <sr_validate_io+0x92>
/usr/src/sys/dev/softraid.c:4592
That puts sr_validate_io+0x44 (0x9cc0 + 0x44 = 0x9d04) at the xs->datalen dereference:
4580 if (sd->sd_vol_status == BIOC_SVOFFLINE) {
4581 DNPRINTF(SR_D_DIS, "%s: %s device offline\n",
4582 DEVNAME(sd->sd_sc), func);
4583 goto bad;
4584 }
4585
4586 if (xs->datalen == 0) {
4587 printf("%s: %s: illegal block count for %s\n",
4588 DEVNAME(sd->sd_sc), func, sd->sd_meta->ssd_devname);
4589 goto bad;
4590 }
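...so a NULL or invalid xs? The fault address 0x40 in the uvm_fault line is what 0x40(%r9) gives with %r9 = 0, which would point to NULL.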
"trace" and "sh reg" from ddb would give more clues.
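
For anyone repeating the mapping from the ddb location to the disassembly, the arithmetic can be checked with a small helper. This is only a sketch: sym_addr is a hypothetical name, the base address is the sr_validate_io start from the objdump output above, and the offset is the one ddb printed.

```shell
#!/bin/sh
# Print base + offset in hex, to match the address column of objdump.
sym_addr() {
    printf '%x\n' $(( $1 + $2 ))
}

sym_addr 0x9cc0 0x44   # prints 9d04, the cmpl $0x0,0x40(%r9) line
```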