> Date: Wed, 24 Mar 2021 20:58:48 +0100
> From: Marcus Glocker <mar...@nazgul.ch>
> 
> On Tue, Mar 23, 2021 at 09:52:42AM -0600, Theo de Raadt wrote:
> 
> > Mark Kettenis <mark.kette...@xs4all.nl> wrote:
> > 
> > > > > Index: sys/dev/sdmmc/sdmmc_scsi.c
> > > > > ===================================================================
> > > > > RCS file: /cvs/src/sys/dev/sdmmc/sdmmc_scsi.c,v
> > > > > retrieving revision 1.59
> > > > > diff -u -p -u -p -r1.59 sdmmc_scsi.c
> > > > > --- sys/dev/sdmmc/sdmmc_scsi.c        15 Oct 2020 13:22:13 -0000      
> > > > > 1.59
> > > > > +++ sys/dev/sdmmc/sdmmc_scsi.c        23 Mar 2021 07:32:52 -0000
> > > > > @@ -365,9 +365,8 @@ sdmmc_scsi_cmd(struct scsi_xfer *xs)
> > > > >       /* A read or write operation. */
> > > > >       sdmmc_scsi_decode_rw(xs, &blockno, &blockcnt);
> > > > >  
> > > > > -     if (blockno >= tgt->card->csd.capacity ||
> > > > > -         blockno + blockcnt > tgt->card->csd.capacity) {
> > > > > -             DPRINTF(("%s: out of bounds %u-%u >= %u\n", DEVNAME(sc),
> > > > > +     if (blockno + blockcnt >= tgt->card->csd.capacity) {
> > > > > +             DPRINTF(("%s: out of bounds %u+%u >= %u\n", DEVNAME(sc),
> > > > >                   blockno, blockcnt, tgt->card->csd.capacity));
> > > > >               xs->error = XS_DRIVER_STUFFUP;
> > > > >               scsi_done(xs);
> > > 
> > > Apart from a potential overflow, the original condition looks correct
> > > to me.  If csd.capacity is 16 and blockno is 0 and blockcnt is 16, the
> > > operation should succeed, shouldn't it?
> 
> Yes, you're right.  In my case the patch suppressed the issue by
> skipping the last block of the disk.
>  
> > Hmm.  Maybe this is a specific SD card with a bug accessing the last sector.
> > 
> > A manual dd using
> > 
> >   skip='c partition size -1' count=1 bs=512
> > 
> > Is failing on that SD card, but it is working on other SD cards.  Which 
> > appear
> > to all be SD v2 cards.
> > 
> > Maybe his specific SD card has an off-by-one bug relating to the last 
> > sector,
> > and we need to determine a way to handle that, or skip the last sector on
> > all devices of this sub-class?
> > 
> > I don't see any issue in the controller code.   Marcus, can you move the 
> > card
> > to another machine (different controller) and use dd to read the last 
> > sector?
> > Though I suspect that might not provide enough clue either, the "fail to 
> > read"
> > behaviour might vary between controllers...
> > 
> > > > blockno and blockcnt are both 32-bit, so this feels dangerously close
> > > > to integer wrap, and I believe the first condition cannot be deleted.
> > > 
> > > But the second condition can still wrap.  So it probably needs to be
> > > something like:
> > > 
> > >    if (blockno >= tgt->card->csd.capacity ||
> > >        blockcnt > tgt->card->csd.capacity - blockno)
> > 
> > All 3 variables are int.  Not sure how moving the + to - on the other
> > side changes anything.
> 
> In the meantime I figured out how to trigger the SD controller command
> to fail.  It always happens if you read more than one block through one
> command, where the last block is accessing the last disk block.  If you
> only access the last disk block directly, the command doesn't fail.
> Some examples:
> 
>       # dd if=/dev/rsd0c of=/dev/null skip=15591935 bs=4096 count=1
>       nblks=8, datalen=4096, blklen=512, arg=0x76f4ff8, cmd=0x8052
>       CMD FAIL!
>       bcmsdhost0: transfer timeout!
> 
>       # dd if=/dev/rsd0c of=/dev/null skip=31183871 bs=2048 count=1
>       CMD FAIL!
>       bcmsdhost0: transfer timeout!
> 
>       # dd if=/dev/rsd0c of=/dev/null skip=62367743 bs=1024 count=1
>       CMD FAIL!
>       bcmsdhost0: transfer timeout!
> 
>       # dd if=/dev/rsd0c of=/dev/null skip=124735487 bs=512 count=1
>       < all good >
> 
> I don't think it's an issue with the microSD card since I have NetBSD
> installed on a different card of exactly the same type/size, and there I
> also can trigger the issue:
> 
>       # dd if=/dev/rld0c of=/dev/null skip=15591935 bs=4096 count=1
>       [ 179.4170221] ld0c: error reading fsbn 124735480 of 124735480-124735487
>       (ld0 bn 124735480; cn 60905 tn 63 sn 24), retrying
> 
> Also I have tried that on the Pine64/sximmc(4) with the same microSD
> card, and there all is hunky-dory.
> 
> It seems to be a behavior/bug of the BCM2835 SD controller.  But I'm
> not sure what's the right approach to work around that in the driver, or
> even if there is a real fix by changing the command code.
> 
> I mean one can just change the disklabel to skip the last block, but
> it's not really a solution since somebody lazy like me, who does a
> very simple disklabel where 'a'=root, and 'b'=swap is the rest of the
> disk can easily trigger that "bug" through e.g. savecore(8) which we
> run by default during boot.  At least in this case it's a very good
> debugger :-)
> 

So SD cards have different commands for reading (and writing) a single
block and multiple blocks.  The command to read a single block does
just that.  But the command that reads multiple blocks will make the
card continue to read blocks until you tell it to stop.  Most SD
controllers will automatically send a stop command when the number of
blocks you told it to read has been received.  Some controllers
require us to explicitly send a stop command.  Not sure what category
bcmsdhost(4) belongs to.

What I suspect is happening here is that the stop command isn't sent
at the right moment.  So the card returns too many blocks.  That
doesn't really matter since the DMA controller will only transfer the
date we requested to memory.  But if this happens when reading the
last block the card returns an error and this error is reported by the
driver even though all the data we are interested in has been
transferred to memory.

Reply via email to