> I'll submit a RFC V2 patch with a proposed fix. This will have to wait.
Recent commits have caused Solaris 10 to error out of booting much earlier than previously. Jasper Lowell. On Thu, 2020-02-27 at 05:10 +0000, jasper.low...@bt.com wrote: > I've since looked at a Ultra 5 board and can confirm that it shipped > with a CMD646U chip like the Ultra 10. > > > Since both the generic PCI IDE and CMD646 Linux drivers mention irq > > is > > cleared on reading status I think using ide_ioport_read in > > hw/ide/pci.c > > may be correct for the generic case. > > For clarity, the Linux drivers mention that for some chips reading > CFR > or ARTTIM23 will clear interrupts. Here in > /linux/drivers/ata/pata_cmd64x.c, for instance: > > static bool cmd64x_sff_irq_check(struct ata_port *ap) > { > struct pci_dev *pdev = to_pci_dev(ap->host->dev); > int irq_mask = ap->port_no ? ARTTIM23_INTR_CH1 : CFR_INTR_CH0; > int irq_reg = ap->port_no ? ARTTIM23 : CFR; > u8 irq_stat; > > /* NOTE: reading the register should clear the interrupt */ > pci_read_config_byte(pdev, irq_reg, &irq_stat); > > return irq_stat & irq_mask; > } > > I might be misunderstanding but isn't this a different side effect > than > clearing interrupts from the IDE device when the IDE device status > register is read? This is saying that reading ARTTIM23 or CFR will > clear INTA#. > > This code is also only used for the CMD643, CMD646, and CMD646 rev 1 > - > none of which I believe QEMU is attempting to emulate. This doesn't > look relevant to us. > > > I'm not sure either but from what I've seen so far I think CMD646 > > either > > refers to the whole family (i.e. all versions) or early versions > > depending > > on context while there are at least two more newer versions > > referred > > to as > > CMD646U and CMD646U2 but probably there are more revisions within > > these as > > U2 seems to be rev5. QEMU sets the revision field to 7 but I'm not > > sure > > that's the same Linux checks for. There's some more info on this > > here: > > > > https://ata.wiki.kernel.org/index.php/Pata_cmd64x > > I've seen CMD646 used to refer to the family as well. > > > QEMU sets the revision field to 7 > > In /linux/drivers/ide/cmd64x.c I found the following comment: > > * UltraDMA only supported on PCI646U and PCI646U2, which > * correspond to revisions 0x03, 0x05 and 0x07 respectively. > > I guess the PCI646U2 is both revisions 0x05 and 0x07. > > According to /linux/drivers/ata/pata_cmd64x.c, interrupt behaviour > doesn't change across the CMD646U, CMD646U2, CMD648, and the CMD649. > These chips set the interrupt bit explicitly and do not have the > comment you highlighted earlier about clearing interrupts when > reading > ARTTIM23 or CFR. > > > This may explain why > > Linux > > checks alt status and clears interrupt instead of reading status > > register. > > I think Linux does this when there is no PCI IDE controller. The code > in /linux/drivers/ide/ide-io.c acts directly on IDE devices. > > > I don't know but sounds plausible that reading the status reg > > clears > > irq > > but reading the pci config words that mirrors it won't clear it. > > But > > the > > traces you had show that ide_ioport_read was called so driver was > > likely > > reading status and not the config regs? > > When in PCI native mode, ide_ioport_read is called because of the > following code in /qemu/hw/ide/pci.c. > > static uint64_t pci_ide_data_read(void *opaque, hwaddr addr, unsigned > size) > { > IDEBus *bus = opaque; > > if (size == 1) { > return ide_ioport_read(bus, addr); > } else if (addr == 0) { > if (size == 2) { > return ide_data_readw(bus, addr); > } else { > return ide_data_readl(bus, addr); > } > } > return ((uint64_t)1 << (size * 8)) - 1; > } > > ide_ioport_read calls qemu_irq_lower when the IDE device status > register is read. This will propagate all the way to the root bus. I > think that when there is a PCI IDE controller inbetween and in PCI > native mode, this is not always propagated past the PCI IDE > controller. > In the case of the CMD646U2, I think the lowering of interrupts is > only > propagated when the controller is in legacy/compatibility mode. > > From what I can tell cmd646_set_irq is only ever called from IDE > device > code, ie. ide_ioport_read. If the controller is not supposed to > propagate the lowering of interrupts, I think a possible fix would be > changing cmd646_set_irq to do nothing when the provided level is 0. > Interrupts can still be cleared by writing to CFR, ARTTIM23, and > MRDMODE which leverage cmd646_update_irq instead. This fix is not > suitable if the emulated CMD646 can be used in compatibility mode but > I > don't think we have support for that anyways. > > I'll submit a RFC V2 patch with a proposed fix. > > Thanks, > Jasper Lowell. > > On Wed, 2020-02-26 at 12:07 +0100, BALATON Zoltan wrote: > > On Wed, 26 Feb 2020, jasper.low...@bt.com wrote: > > > > Problem with that patch is that it removes this clearing from > > > > the > > > > func > > > > that's also used to emulate ISA IDE ioports which according to > > > > their > > > > spec should clear irq on read so that function should be OK but > > > > maybe > > > > should not be called by PCI IDE code? > > > > > > This might be it. > > > > > > The patch I provided is definitely incorrect and deviates from > > > the > > > specification as Mark mentioned earlier. I misunderstood what > > > ide_ioport_read/write were for and haven't been thinking about > > > legacy > > > mode. > > > > > > The bug that I believe exists is present when the CMD646 is > > > operating > > > in PCI native mode. Yeah, I think a possible solution might be to > > > avoid > > > using the ioport_read/write functions from the PCI code if they > > > have > > > side effects that assume the device is in legacy mode. I'll have > > > to > > > spend more time reading through the code and documentation. > > > > Since both the generic PCI IDE and CMD646 Linux drivers mention irq > > is > > cleared on reading status I think using ide_ioport_read in > > hw/ide/pci.c > > may be correct for the generic case. Not sure if the CMD646 has > > some > > pecularity but maybe the difference in drivers is to avoid bugs > > not > > because of CMD646 not clearing irq. The wikipedia page of CMD640: > > > > https://en.wikipedia.org/wiki/CMD640 > > > > mentions some versions of it has a bug similar to RZ-1000 for > > which > > there's a doc referenced that says the problem is that it forgets > > last > > data word after raising (or clearing?) IRQ and a workaround is to > > avoid > > checking status until all data transferred. This may explain why > > Linux > > checks alt status and clears interrupt instead of reading status > > register. > > > > > According to the CMD646U2 specification: > > > "When an IDE port is in PCI IDE Legacy Mode, the PCI646U2 is > > > compatible > > > with standard ISA IDE. The IDE task file registers are mapped to > > > the > > > standard ISA port addresses, and IDE drive interrupts occur at > > > IRQ14 > > > (primary) or IRQ15 (secondary)." > > > > > > In legacy mode, IRQ14 and IRQ15 mirror the state of INTRQ on each > > > of > > > the selected IDE devices. QEMU appears to emulate this correctly. > > > > > > In PCI native mode, INTRQ is not mirrored or given a single IRQ. > > > Interrupts are provided by the PCI IDE controller depending on > > > the > > > controller's logic. For instance, an IDE device can raise an > > > interrupt > > > but the CMD646 may not propagate that interrupt if MRDMODE has > > > certain > > > bits set. I'm thinking that maybe the controller does not have > > > logic to > > > unset the interrupt bits in CFR and ARTTIM23 when the IDE device > > > lowers > > > INTRQ. This might mean that the controller will continue to > > > assert > > > an > > > interrupt while bits in CFR and ARTTIM23 remain set, even if the > > > IDE > > > device lowers INTRQ. This would explain why the CMD646 > > > documentation > > > instructs developers to lower them explicitly. > > > > I don't know but sounds plausible that reading the status reg > > clears > > irq > > but reading the pci config words that mirrors it won't clear it. > > But > > the > > traces you had show that ide_ioport_read was called so driver was > > likely > > reading status and not the config regs? > > > > I've found some further logs: > > > > https://forums.gentoo.org/viewtopic-t-270357.html > > https://www.redhat.com/archives/axp-list/2000-October/msg00070.html > > https://www.linuxtopia.org/online_books/linux_beginner_books/debian_linux_desktop_survival_guide/Docking_Station.shtml > > > > These show Linux messages for early CMD646 revisions that had bugs > > but > > what I've noticed is that they say something about not 100% native > > mode > > which seems to be similar to what I had with via-ide when it uses > > IRQ14-15 > > even in native mode. Could your problem be similar? Maybe you could > > search > > for more such logs for Linux booting on Sun Ultra machines and see > > what > > those say and check how it determines which IRQ number it should > > use. > > This > > may depend on some setting that's not emulated correctly. > > > > > The Linux driver code appears to be consistent with the behaviour > > > that > > > I'm seeing from Solaris 10. > > > > > > The following appears to be used to initialise the CMD646U. > > > > > > { /* CMD 646U with broken UDMA */ > > > .flags = ATA_FLAG_SLAVE_POSS, > > > .pio_mask = ATA_PIO4, > > > .mwdma_mask = ATA_MWDMA2, > > > .port_ops = &cmd646r3_port_ops > > > }, > > > > > > The port operations it uses are defined as so: > > > > > > static struct ata_port_operations cmd646r3_port_ops = { > > > .inherits = &cmd64x_base_ops, > > > .sff_irq_check = cmd648_sff_irq_check, > > > .sff_irq_clear = cmd648_sff_irq_clear, > > > .cable_detect = ata_cable_40wire, > > > } > > > > > > As you mention, cmd648_sff_irq_clear clears interrupts explicitly > > > by > > > setting bits in MRDMODE - consistent with the CMD646U2 > > > documentation. > > > This behaviour is very similar to Solaris 10. > > > > I think this may be to avoid bug with CMD646U. > > > > > > Although if I got > > > > that correctly Linux thinks revisions over 5 are OK and QEMU > > > > has > > > > 7. > > > > > > I'm not sure how revision numbers work with these chips. Do > > > CMD646 > > > and > > > CMD646U2 refer to different revisions of the CMD646 chip? > > > > I'm not sure either but from what I've seen so far I think CMD646 > > either > > refers to the whole family (i.e. all versions) or early versions > > depending > > on context while there are at least two more newer versions > > referred > > to as > > CMD646U and CMD646U2 but probably there are more revisions within > > these as > > U2 seems to be rev5. QEMU sets the revision field to 7 but I'm not > > sure > > that's the same Linux checks for. There's some more info on this > > here: > > > > https://ata.wiki.kernel.org/index.php/Pata_cmd64x > > > > Regards, > > BALATON Zoltan > >