On 27.11.2019 01:09, Kevin Wolf wrote: > Am 26.11.2019 um 22:24 hat Alexander Popov geschrieben: >> Hello Kevin, >> >> Thanks for your review, >> >> On 21.11.2019 18:03, Kevin Wolf wrote: >>> Am 14.11.2019 um 18:25 hat Alexander Popov geschrieben: >>>> The commit a718978ed58a from July 2015 introduced the assertion which >>>> implies that the size of successful DMA transfers handled in ide_dma_cb() >>>> should be multiple of 512 (the size of a sector). But guest systems can >>>> initiate DMA transfers that don't fit this requirement. >>>> >>>> PoC for Linux that uses SCSI_IOCTL_SEND_COMMAND to perform such an ATA >>>> command and crash qemu: >> ... >>> >>> It would be nicer to turn the reproducer into a test case for >>> tests/ide-test.c. >> >> Yes, I can do that. >> >>>> For fixing that let's check the number of bytes prepared for the transfer >>>> by the prepare_buf() handler. If it is not a multiple of 512 then end >>>> the DMA transfer with an error. >>>> >>>> That also fixes the I/O stall in guests after a DMA transfer request >>>> for less than the size of a sector. >>>> >>>> Signed-off-by: Alexander Popov <[email protected]> >>> >>> This patch makes ide-test fail: >>> >>> TEST check-qtest-x86_64: tests/ide-test >>> ** >>> ERROR:tests/ide-test.c:469:test_bmdma_short_prdt: assertion failed (status >>> == 0): (0x00000004 == 0x00000000) >>> ERROR - Bail out! ERROR:tests/ide-test.c:469:test_bmdma_short_prdt: >>> assertion failed (status == 0): (0x00000004 == 0x00000000) >> >> Thanks for the notice. >> Yes, I can reproduce it too with `make check-qtest-i386`. >> >>>> diff --git a/hw/ide/core.c b/hw/ide/core.c >>>> index 754ff4dc34..85aac614f0 100644 >>>> --- a/hw/ide/core.c >>>> +++ b/hw/ide/core.c >>>> @@ -849,6 +849,7 @@ static void ide_dma_cb(void *opaque, int ret) >>>> int64_t sector_num; >>>> uint64_t offset; >>>> bool stay_active = false; >>>> + int32_t prepared = 0; >>>> >>>> if (ret == -EINVAL) { >>>> ide_dma_error(s); >>>> @@ -892,12 +893,10 @@ static void ide_dma_cb(void *opaque, int ret) >>>> n = s->nsector; >>>> s->io_buffer_index = 0; >>>> s->io_buffer_size = n * 512; >>>> - if (s->bus->dma->ops->prepare_buf(s->bus->dma, s->io_buffer_size) < >>>> 512) { >>>> - /* The PRDs were too short. Reset the Active bit, but don't raise >>>> an >>>> - * interrupt. */ >>>> - s->status = READY_STAT | SEEK_STAT; >>>> - dma_buf_commit(s, 0); >>>> - goto eot; >>>> + prepared = s->bus->dma->ops->prepare_buf(s->bus->dma, >>>> s->io_buffer_size); >>>> + if (prepared % 512) { >>>> + ide_dma_error(s); >>> >>> Which I assume is because you changed the error mode here compared to >>> the old version. >> >> Yes, you are right. >> >>> I'm not sure offhand what the correct behaviour is for non-aligned >>> values > 512. I think we actually have two cases here: Either a short or >>> a long PRD. The commit message should explain this with spec references >>> and a test case should be added for both cases. >> >> I've found the "Programming Interface for Bus Master IDE Controller" >> (revision >> 1.0 5/16/94). The chapter 3.1 (Status Bit Interpretation) provides some >> answers. > > Yes, I think that's the same that I've used before. I assume it's the > relevant spec. > >> It says that: >> 1. If PRD's specified a smaller size than the IDE transfer size, then the >> Interrupt and Active bits in the Controller status register are not set. > >> 2. If the size of the physical memory regions was larger than the IDE >> device >> transfer size, the Interrupt and Active bits in the Controller status >> register >> are both set to 1. >> >> So my changing of the error mode in short PRD's case was wrong, and the >> test_bmdma_short_prdt() is correct. > > Yes, I think 1. is implemented correctly for PRDs that are too small and > smaller than a sector. > > I think the assumption may have been that if the PRDT contains at least > one more full sector, we'll do that one sector before coming back to the > same place and hitting the code path for a too short PRDT. > > However, the code neglects to actually use the return value of > .prepare_buf() to limit the number of sectors accessed. So if we ask for > a scatter/gather list for 5 sectors and we get 3 sectors, we still > assume we can write to all 5. This is obviously wrong. > >> Now let's think about the proper fix of the qemu crash. >> >> Currently I don't really understand how ide_dma_cb() emulates the logic >> described in Status Bit Interpretation chapter. I don't see any comparison >> between the DMA transfer size and PRD's size. >> >> We only have this check against the size of a sector (512 bytes), which >> doesn't >> catch all short PRD's cases (PRD in my PoC is 1337 bytes). > > I think for making the above assumption work, we'd have to check the > return value, which gets us something like: > > ret = s->bus->dma->ops->prepare_buf() > if (ret < 512) { > ... short PRDT code ... > } else if (ret < n * 512) { > n = ret / 512; > } > > Instead of doing the extra iteration and executing I/O for the first > part of the request, maybe this would work, too: > > ret = s->bus->dma->ops->prepare_buf() > if (ret < n * 512) { > ... short PRDT code ... > } > > We need to check in the spec whether we're supposed to actually do > partial I/O for short PRDTs. I couldn't find a clear answer with a quick > look, but I'm leaning towards doing the partial I/O (i.e. implementing > the first pseudo-code piece above). > > > As for handling long PRDTs, we have code that looks like it's meant to > handle the case: > > n = s->io_buffer_size >> 9; > if (n > s->nsector) { > /* The PRDs were longer than needed for this request. Shorten them so > * we don't get a negative remainder. The Active bit must remain set > * after the request completes. */ > n = s->nsector; > stay_active = true; > } > > bmdma_prepare_buf() does potentially set s->io_buffer_size to a value > larger than the passed limit, so maybe this is already correct. We have > a basic test for it in test_bmdma_long_prdt(), but I can't rule out that > there are more complicated cases where it fails. > > I'm pretty sure we must handle the long PRDT case only after doing I/O > (like we currently do) because the operation is supposed to have > completed by the time we signal that the PRDT was long, so the guest can > trust that a read has actually read something and a write has reached > the disk. The spec says "This is a valid completion case". > > > Does this make sense to you?
Thanks a lot, Kevin! First of all I'll improve the unit-tests to cover all cases. Then I'll try both approaches you described and return with the results. I'll also try to find more info about partial I/O behavior in other datasheets. Best regards, Alexander
