On Thu, 2010-12-02 at 08:51 +0100, Hannes Reinecke wrote:
> On 12/02/2010 01:14 AM, Nicholas A. Bellinger wrote:
> > On Wed, 2010-12-01 at 16:46 +0100, Hannes Reinecke wrote:
> >> On 12/01/2010 03:18 PM, Hannes Reinecke wrote:
<SNIP>
> >> Hmpf. Using a new vista x86 image (build 6002) with SP2 preloaded
> >> megasas works, too.
> >> Dodgy build I had, apparently.
> >>
> >
> > Thanks for the update.. After testing the latest megasas.v3 HEAD
> > at commit:
> >
> > * megasas.v3 978e61e megasas: Fixup PD query return value
> >
> > it appears that the same Win7 64-bit Build 7600 that is functioning with
> > v0.12.5 windows7-megasas-working will now BSOD the guest. After further
> > checking it appears that this is not megasas HBA specific, and is due to
> > your tree being slightly more out of date than mine. ;)
> >
> Yes, this is totally weird. AFAICS the MMIO register data is
> _exactly_ identical for both, the old working one and the new
> implementation. Yet Win7 is behaving differently in both cases.
> So it must be indeed the qemu base which is doing odd things here.
>

Ok, after spending the better part of the evening trying to identify
differences between the two revs, I am inclined to agree with you here.

After merging megasas.v3 into megasas-upstream-v1 and pushing into
qemu-kvm.git, I did finally run into a semi-meaningful BSOD with the
64-bit guest here:

http://linux-iscsi.org/builds/megasas-emulation-logs/win7-64bit-megasas-BSOD-12022010-1.png

which happens after the initial run of DCMDs completes successfully, on
the first 16-byte INQUIRY frame into megasas_handle_scsi()..

Here is a snippet from the log:

<SNIP past initial DCMDs completed successfully>

megasas: Enqueue frame 1 count 0 context 3e6 tail 0 busy 1
megasas: frame 1: MFI DCMD opcode 1040500
megasas: DCMD controller event wait
megasas: MFI DCMD wrote 0 bytes
megasas: Complete frame context 3e6

<Last DCMD before first MFI_CMD_PD_SCSI_IO frame>

megasas: writel mmio 0xa0: ffffffff
megasas: Update reply queue head 0 busy 0
megasas: writel mmio 0x34: 7ffffffb
megasas: writel mmio 0x40: 1fd76381
megasas: Received frame addr 1fd76380 count 0
megasas: MFI cmd 4 context 0 count 0
megasas: Return new frame 2 cmd 0x7fb7cebd53e0
megasas: Enqueue frame 2 count 0 context 0 tail 0 busy 1
megasas: PD SCSI physical dev 0 lun 0 sdev 0x139b9f0 xfer 16
megasas: 16 bytes of data available for reading
megasas: PD SCSI req 0x13bb1c0 cmd 0x7fb7cebd53e0 lun 0x139b9f0 command completed, arg 16
megasas: PD SCSI req 0x13bb1c0 cmd 0x7fb7cebd53e0 lun 0x139b9f0 read finished, len 16
megasas: PD SCSI req 0x13bb1c0 cmd 0x7fb7cebd53e0 lun 0x139b9f0 command completed, arg 0
megasas: PD SCSI req 0x13bb1c0 cmd 0x7fb7cebd53e0 lun 0x139b9f0 finished with status 0 len 16
megasas: Complete frame context 0 tail 0 busy 0 doorbell 0
megasas: readl mmio 0x30: 80000001
megasas: writel mmio 0xa0: 80000001
megasas: Update reply queue head 1 busy 0

.... and at this point a BSOD is triggered in the 64-bit Win7 guest with
DRIVER_IRQL_NOT_LESS_OR_EQUAL Stop Code = 0xD1, which seems from a quick
google search to indicate a problem wrt paging and DMA transfers.

So, after that I started to compare both versions w/ all megasas
debugging enabled and had another look at the code. I did notice a few
subtle differences between v0.12.5 and megasas.v3 in the
megasas_mmio_writel:MFI_IQP* code that you changed recently.

So your recent change here reverts back to the v0.12.5 logic, but the
frame_count assignment is still different:

    case MFI_IQP:
        /* Received MFI frame address */
        frame_addr = (val & ~0xF);
        /* Add possible 64 bit offset */
        frame_addr |= (uint64_t)s->frame_hi;
        s->frame_hi = 0;
        frame_count = (val & 0xF) >> 1;
        <SNIP>

and v0.12.5 windows7-megasas-working:

    case MFI_IQP:
        /* Received MFI frames; up to 8 contiguous frames */
        frame_addr = (val & ~0xF);
        /* Add possible 64 bit offset */
        frame_addr |= (uint64_t)s->frame_hi;
        s->frame_hi = 0;
        frame_count = (val >> 1) & 0x7;
        <SNIP>

Unfortunately this does not appear to make a difference when changing
megasas.v3 to follow the existing windows7-megasas-working code, and the
frame_addr assignment recently changed back in megasas.v3 now matches the
v0.12.5 code.

Both logs are attached for reference, and aside from the frame_count, the
only other thing that I am noticing is that the struct megasas_cmd_t %p
pointers in the working v0.12.5 are showing low memory addresses, for
example:

megasas: writel mmio 40: 1f15c381
megasas: Received frame addr 1f15c380 count 0
megasas: MFI cmd 4 context 0 count 0
megasas: Return new frame 2 cmd 0xf077a8
megasas: Enqueue frame context 0 tail 0 busy 1
megasas: PD SCSI dev 0 lun 0 sdev 0xf1e5a0 xfer 16
megasas: PD SCSI req 0xf38120 cmd 0xf077a8 lun 0xf1e5a0 finished with status 0 len 16
megasas: Complete frame context 0 tail 0 busy 0 doorbell 0

and the latest code is showing the same pointers for *cmd as:

megasas: writel mmio 0x40: 1fd76381
megasas: Received frame addr 1fd76380 count 0
megasas: MFI cmd 4 context 0 count 0
megasas: Return new frame 2 cmd 0x7fb7cebd53e0
megasas: Enqueue frame 2 count 0 context 0 tail 0 busy 1
megasas: PD SCSI physical dev 0 lun 0 sdev 0x139b9f0 xfer 16
megasas: 16 bytes of data available for reading
megasas: PD SCSI req 0x13bb1c0 cmd 0x7fb7cebd53e0 lun 0x139b9f0 command completed, arg 16
megasas: PD SCSI req 0x13bb1c0 cmd 0x7fb7cebd53e0 lun 0x139b9f0 read finished, len 16
megasas: PD SCSI req 0x13bb1c0 cmd 0x7fb7cebd53e0 lun 0x139b9f0 command completed, arg 0
megasas: PD SCSI req 0x13bb1c0 cmd 0x7fb7cebd53e0 lun 0x139b9f0 finished with status 0 len 16
megasas: Complete frame context 0 tail 0 busy 0 doorbell 0

I am not sure if this is related, but this seems like it could be
something worth investigating.

I also forced fw_sge=8 and fw_cmds=1000 in megasas_scsi_init() to follow
the defaults with the working v0.12.5, but then again, the MMIO writes and
reads up until the first 16-byte INQUIRY do appear to be identical AFAICT.

Here is the full log for the megasas.v3 -> megasas-upstream-v1 code:

http://linux-iscsi.org/builds/megasas-emulation-logs/win764-bit-megasas-v3.txt

and the working v0.12.5 boot:

http://linux-iscsi.org/builds/megasas-emulation-logs/win764-bit-megasas-v1.txt

> But that's a good hint, I'll be updating my tree and see how far
> I'll progress.
>
> > But the good news is that WinXP SP2 is now working via scsi-generic ->
> > TCM_Loop in megasas.v3, and even w/o the original sync ioctl patch we
> > required in v0.12.5 megasas code. Very excellent work Hannes!
> >
> > So, I will be merging the latest changes from megasas.v3 ->
> > megasas-upstream-v1 shortly and retesting with 64-bit Build 7600.
> >
> Cool. Thanks. I'll be rebasing my patches, too. I guess it's time
> for megasas.v4.
>

Sounds good, and please let me know if you have any other ideas or would
like me to test something else.

Thanks Hannes!

--nab
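
A note on the frame_count difference discussed above: judging from the two
expressions alone, (val & 0xF) >> 1 and (val >> 1) & 0x7 both select bits
1..3 of the value written to MFI_IQP, which would be consistent with the
observation that switching between them makes no difference. The following
is a minimal standalone sketch, not code from either tree. The helper names
are hypothetical; only the two expressions, the address mask, and the sample
value 0x1fd76381 come from the mail.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers: only the expressions are taken from the mail. */

/* frame_count as computed in megasas.v3 */
static uint32_t frame_count_v3(uint32_t val)
{
    return (val & 0xF) >> 1;
}

/* frame_count as computed in v0.12.5 windows7-megasas-working */
static uint32_t frame_count_v0125(uint32_t val)
{
    return (val >> 1) & 0x7;
}

/* Low part of the frame address: the written value with bits 0..3 cleared */
static uint64_t frame_addr_lo(uint32_t val)
{
    return (uint64_t)(val & ~0xFu);
}

/* Compare both expressions for every possible low nibble of the write. */
static void check_frame_count_equivalence(void)
{
    uint32_t low;

    for (low = 0; low < 16; low++) {
        uint32_t val = 0x1fd76380u | low;
        assert(frame_count_v3(val) == frame_count_v0125(val));
    }

    /* Sample value from the log: "writel mmio 0x40: 1fd76381" */
    assert(frame_addr_lo(0x1fd76381u) == 0x1fd76380u);
    assert(frame_count_v3(0x1fd76381u) == 0);
}
```

Calling check_frame_count_equivalence() passes for all 16 low-nibble values,
and 0x1fd76381 decodes to frame address 1fd76380 with count 0, matching the
"Received frame addr 1fd76380 count 0" line in the log. If that holds, the
frame_count assignment can probably be ruled out as the source of the BSOD.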
