I exchanged email with Tejun Heo about this; here is the thread:
> I am looking at a launchpad bug
> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/340014
>
> What I am doing is:
>
> 1. suspend/resume with AC plugged in
> 2. suspend/resume with AC unplugged
> 3. suspend/resume with AC plugged in
> 4. suspend plugged in, remove AC power, resume
> 5. suspend with AC removed, plug in AC power, resume
>
> Usually after 3 (or 4) I get:
>
> ata1: link is slow to respond, please be patient (ready=0)
> ata1: SRST failed (errno=-16)
> ata1: soft resetting link
>
> ata1: reset failed, giving up
> ata1.00: disabled
> ata1: EH complete
> sd 0:0:0:0: [sda] Result: hostbyte=DID_BAD_TARGET
> driverbyte=DRIVER_OK,SUGGEST_OK
> end_request: I/O error, dev sda, sector 152855535
> Aborting journal on device sda1.
>
> After some long hours of debugging I find that when the above happens
> the ata taskfile status register ap->ioaddr.status_addr is set to
> 0xd0, as returned by ata_sff_check_status(); as a result,
> ata_eh_reset() fails.
>
> Any idea what might be going on? Any pointers to help me debug/fix
> this further are really appreciated.

Looks like the device or HSM (likely the device) is locking up after
some disturbance at the link. Link problem isn't too surprising after
suspend/resume cycle or AC plug in/out. libata EH can handle those
events just fine. The problem here seems to be...

* The device needs hardreset to get back to operational state again.

* Unfortunately, ICH7 currently can't access SATA control registers in
  ata_piix mode, so it can only do softreset. From ICH8 on, hardreset
  works because SCRs are accessible via SIDPR. ICH7 has two mechanisms
  to access SCRs - via ABAR or SIRI/STRD. The former requires AHCI BAR
  to be mapped (which isn't true for most BIOSen) and SCRAE turned on.
  The SIRI/STRD pair provides windowed access into ABAR area which
  includes the SCRs but I've never got around to doing it and I don't
  know how it would actually work.

There's also another mechanism. Toggling port enable bits in PCS
might generate hardreset.

After the device is gone, if you unload and reload ata_piix, does the
device come back?

Thanks.

manoj.i...@canonical.com wrote:
>> There's also another mechanism. Toggling port enable bits in PCS
>> might generate hardreset.
>
> This is something I can try, can you point me in the right direction?

Well, there's no code yet. rmmoding and insmoding is the easiest it
gets at this point.

>> After the device is gone, if you unload and reload ata_piix, does the
>> device come back?
>
> in Jaunty kernel, ata_piix is not built as a module, rather, built into
> the kernel.

Can you roll a vanilla kernel and test it? I wanna make sure that
hardreset is what's required before proceeding further.

============

I installed jaunty on a USB stick, chrooted into it, and tried my
exercise. Interesting observations below:

1. I boot from HDD and remain chrooted into the USB stick while doing
   suspend/resume (with AC plugged in/unplugged): no ATA errors. I
   tried this exercise a few times and noticed that the system behaves
   well. I even ran suspend/resume multiple times as described in #3.

2. I boot from HDD,
   - chroot into USB drive, exit. suspend/resume with AC plugged in,
     chroot into USB drive quickly and run dmesg. OK
   - exit chroot, unplug AC, suspend/resume, quickly chroot into USB,
     run dmesg. OK
   - exit chroot, plug in AC, suspend/resume, quickly chroot into USB,
     run dmesg. ATA errors appear.

   I rmmod -f ata_piix; it waits for the softresets to time out, then
   removes the module. I see EXT3 errors now. Then I insmod
   ata_piix.ko. No change: I still see ATA errors, the HDD does not
   seem to reset, and I cannot use the filesystem. I rmmod and insmod a
   few times just in case; no luck, still seeing EXT3 errors and the
   filesystem on the HDD is unusable.
3. I boot from USB stick, mount the HDD as /jaunty, and go through my
   whole exercise of suspend/resume tests with AC plugged in and AC
   unplugged, even suspend/resume for 30 sec each with a 30 sec
   interval in between, repeated 60 times. OK. No ATA errors (using
   ata_piix as a module).

Now I am thoroughly confused.

Cheers
--- manjo

--
Samsung NC10 fails suspend/Resume tests
https://bugs.launchpad.net/bugs/340014
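
For reference, the 0xd0 status value quoted at the top of this thread can be decoded bit by bit. The sketch below is only an illustration and not kernel code: STATUS_BITS, decode_status() and is_busy() are hypothetical names, and the bit layout is the standard ATA status register layout that I assume matches the ATA_* masks in include/linux/ata.h.

```python
import errno

# Standard ATA status register bits, most significant first.
# Assumption: same layout as the ATA_* status masks in the kernel's
# include/linux/ata.h; the short names are for illustration only.
STATUS_BITS = [
    (0x80, "BSY"),   # device busy
    (0x40, "DRDY"),  # device ready
    (0x20, "DF"),    # device fault
    (0x10, "DSC"),   # device seek complete
    (0x08, "DRQ"),   # data request
    (0x04, "CORR"),  # corrected error
    (0x02, "IDX"),   # index
    (0x01, "ERR"),   # error
]


def decode_status(status):
    """Return the names of all bits set in an 8-bit ATA status value."""
    return [name for mask, name in STATUS_BITS if status & mask]


def is_busy(status):
    """True when the BSY bit is set, i.e. the device has not become ready."""
    return bool(status & 0x80)


if __name__ == "__main__":
    status = 0xD0  # value read back via ata_sff_check_status() in the report
    print(hex(status), decode_status(status))
    print("busy:", is_busy(status))
    # errno=-16 in the "SRST failed" message is the negated EBUSY code.
    print("EBUSY =", errno.EBUSY)
```

With 0xd0 the set bits are BSY, DRDY and DSC. BSY never clearing would be consistent with the softreset giving up with errno=-16 (EBUSY on Linux), though it does not by itself say why the device stays busy.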