This issue appears to be present again on kernels 3.13 (all), 3.16 (all), and 3.17 (all).
Upon shifting the SATA link power policy from min_power to max_performance, all of these kernels report some form of this error:

[ 45.200582] ata3.00: exception Emask 0x10 SAct 0x8000 SErr 0x50000 action 0xe frozen
[ 45.200586] ata3.00: irq_stat 0x00400000, PHY RDY changed
[ 45.200589] ata3: SError: { PHYRdyChg CommWake }
[ 45.200592] ata3.00: failed command: WRITE FPDMA QUEUED
[ 45.200596] ata3.00: cmd 61/e8:78:00:3f:48/00:00:04:00:00/40 tag 15 ncq 118784 out
[ 45.200596]          res 40/00:7c:00:3f:48/00:00:04:00:00/40 Emask 0x10 (ATA bus error)
[ 45.200597] ata3.00: status: { DRDY }
[ 45.200601] ata3: hard resetting link
[ 45.925051] ata3: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[ 45.925911] ata3.00: configured for UDMA/133
[ 45.941016] ahci 0000:00:1f.2: port does not support device sleep
[ 45.941029] ata3: EH complete

The current 3.13 kernel reports the most severe errors, including block write failures.

The machine this is being tested on is a Dell XPS13 (9333) with BIOS A05:

[ 2.288104] ata3.00: ATA-8: LITEONIT LMT-256L9M-11 MSATA 256GB, HM8110B, max UDMA/133
[ 2.288554] scsi 2:0:0:0: Direct-Access ATA LITEONIT LMT-256 10B PQ: 0 ANSI: 5

As this machine is brand new, it's possible that the hardware is actually failing; however, SMART doesn't indicate any problems with the block device:

smartctl 6.2 2013-07-26 r3841 [x86_64-linux-3.17.0-031700-generic] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Device Model:     LITEONIT LMT-256L9M-11 MSATA 256GB
Serial Number:    TW0N42H75508548P1854
Firmware Version: HM8110B
User Capacity:    256,060,514,304 bytes [256 GB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    Solid State Device
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   ATA8-ACS, ATA/ATAPI-7 T13/1532D revision 4a
SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Fri Oct 10 13:39:25 2014 MDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                 (  10) seconds.
Offline data collection
capabilities:                    (0x15) SMART execute Offline immediate.
                                        No Auto Offline data collection support.
                                        Abort Offline collection upon new
                                        command.
                                        No Offline surface scan supported.
                                        Self-test supported.
                                        No Conveyance Self-test supported.
                                        No Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   1) minutes.
Extended self-test routine
recommended polling time:        (  10) minutes.
SCT capabilities:              (0x003d) SCT Status supported.
                                        SCT Error Recovery Control supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 1
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0003   100   100   000    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0003   100   100   000    Pre-fail  Always       -       46
175 Program_Fail_Count_Chip 0x0003   100   100   000    Pre-fail  Always       -       0
176 Erase_Fail_Count_Chip   0x0003   100   100   000    Pre-fail  Always       -       0
177 Wear_Leveling_Count     0x0003   100   100   000    Pre-fail  Always       -       1946
178 Used_Rsvd_Blk_Cnt_Chip  0x0003   100   100   000    Pre-fail  Always       -       0
179 Used_Rsvd_Blk_Cnt_Tot   0x0003   100   100   000    Pre-fail  Always       -       0
180 Unused_Rsvd_Blk_Cnt_Tot 0x0033   100   100   000    Pre-fail  Always       -       1216
181 Program_Fail_Cnt_Total  0x0003   100   100   000    Pre-fail  Always       -       0
182 Erase_Fail_Count_Total  0x0003   100   100   000    Pre-fail  Always       -       0
187 Reported_Uncorrect      0x0003   100   100   000    Pre-fail  Always       -       0
195 Hardware_ECC_Recovered  0x0003   100   100   000    Pre-fail  Always       -       0
241 Total_LBAs_Written      0x0003   100   100   000    Pre-fail  Always       -       8704
242 Total_LBAs_Read         0x0003   100   100   000    Pre-fail  Always       -       1385

SMART Error Log Version: 0
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%         0         -
# 2  Short offline       Completed without error       00%         0         -

Selective Self-tests/Logging not supported

-- 
You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/539467

Title: SATA link power management causes disk errors and corruption

Status in The Linux Kernel: Expired
Status in “linux” package in Ubuntu: Invalid
Status in “pm-utils” package in Ubuntu: Fix Released
Status in “pm-utils-powersave-policy” package in Ubuntu: Invalid
Status in “linux” source package in Lucid: Won't Fix
Status in “pm-utils” source package in Lucid: Invalid
Status in “pm-utils-powersave-policy” source package in Lucid: Fix Released
Status in “linux” source package in Maverick: Invalid
Status in “pm-utils” source package in Maverick: Invalid
Status in “pm-utils-powersave-policy” source package in Maverick: Invalid
Status in “linux” source package in Natty: Invalid
Status in “pm-utils” source package in Natty: Fix Released
Status in “pm-utils-powersave-policy” source package in Natty: Invalid

Bug description:

SRU Justification for pm-utils-powersave-policy:

Impact: On certain hardware, enabling power saving for the SATA link can cause data corruption.

How Addressed: The proposed branch removes the sata link power policy script. This will cause the link to be maintained at the normal power usage instead of dropping when the power is removed from the machine.

Reproduction: On an affected machine, unplug and plug in the power a few times. Data corruption will result.

Regression Potential: Removing the script will cause the SATA link to stay fully powered at all times. This may cause an increase in the battery usage for some machines. There should be no functionality regressions or bugs introduced by this change.
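
For anyone checking whether a machine is still running with SATA link power saving enabled, the current policy can be read from sysfs. The sketch below is illustrative only. It assumes the kernel exposes the policy at /sys/class/scsi_host/host*/link_power_management_policy (the usual libata interface); the helper names read_link_policies and hosts_saving_power are hypothetical and are not part of pm-utils or the removed script.

```python
from pathlib import Path

# Assumption: libata exposes one policy file per SCSI host under this directory.
SYSFS_SCSI_HOST = Path("/sys/class/scsi_host")


def read_link_policies(base=SYSFS_SCSI_HOST):
    """Return {host_name: policy} for every host that exposes a policy file."""
    policies = {}
    for policy_file in sorted(Path(base).glob("host*/link_power_management_policy")):
        # The file holds a single word such as min_power or max_performance.
        policies[policy_file.parent.name] = policy_file.read_text().strip()
    return policies


def hosts_saving_power(policies):
    """Return the hosts whose policy is anything other than max_performance."""
    return sorted(host for host, policy in policies.items()
                  if policy != "max_performance")
```

Calling hosts_saving_power(read_link_policies()) on a live system lists the hosts that are not at max_performance; an empty list would suggest the link is being kept fully powered, which is the state the proposed change leaves the machine in.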
=====

Using Lucid on my laptop, I see errors like this in dmesg quite frequently (every few hours):

Mar 14 23:00:09 chris-laptop kernel: [42987.460608] ata1.00: exception Emask 0x10 SAct 0x1 SErr 0x50000 action 0xe frozen
Mar 14 23:00:09 chris-laptop kernel: [42987.460618] ata1.00: irq_stat 0x00400000, PHY RDY changed
Mar 14 23:00:09 chris-laptop kernel: [42987.460627] ata1: SError: { PHYRdyChg CommWake }
Mar 14 23:00:09 chris-laptop kernel: [42987.460635] ata1.00: failed command: READ FPDMA QUEUED
Mar 14 23:00:09 chris-laptop kernel: [42987.460649] ata1.00: cmd 60/08:00:97:23:44/00:00:01:00:00/40 tag 0 ncq 4096 in
Mar 14 23:00:09 chris-laptop kernel: [42987.460652]          res 40/00:04:97:23:44/00:00:01:00:00/40 Emask 0x10 (ATA bus error)
Mar 14 23:00:09 chris-laptop kernel: [42987.460669] ata1.00: status: { DRDY }
Mar 14 23:00:09 chris-laptop kernel: [42987.460681] ata1: hard resetting link
Mar 14 23:00:09 chris-laptop kernel: [42987.523336] ata2: exception Emask 0x10 SAct 0x0 SErr 0x50000 action 0xe frozen
Mar 14 23:00:09 chris-laptop kernel: [42987.523346] ata2: irq_stat 0x00400000, PHY RDY changed
Mar 14 23:00:09 chris-laptop kernel: [42987.523355] ata2: SError: { PHYRdyChg CommWake }
Mar 14 23:00:09 chris-laptop kernel: [42987.523368] ata2: hard resetting link
Mar 14 23:00:09 chris-laptop kernel: [42988.202586] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Mar 14 23:00:09 chris-laptop kernel: [42988.205443] ata1.00: configured for UDMA/133
Mar 14 23:00:09 chris-laptop kernel: [42988.205459] ata1: EH complete
Mar 14 23:00:09 chris-laptop kernel: [42988.280089] ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
Mar 14 23:00:09 chris-laptop kernel: [42988.285567] ata2.00: configured for UDMA/100
Mar 14 23:00:09 chris-laptop kernel: [42988.289370] ata2: EH complete

Every couple of days, this results in data corruption and my filesystem being remounted read-only:

[ 6148.305806] Aborting journal on device sda1-8.
[ 6148.325011] EXT4-fs error (device sda1): ext4_journal_start_sb: Detected aborted journal
[ 6148.325018] EXT4-fs (sda1): Remounting filesystem read-only
[ 6148.326702] journal commit I/O error
[ 6148.330975] EXT4-fs error (device sda1) in ext4_reserve_inode_write: Journal has aborted
[ 6148.462572] __ratelimit: 15 callbacks suppressed

Those messages generally appear at the end of dmesg after the event, just after the "hard resetting link" message. I then have to boot a live CD and manually run fsck, as I can no longer boot the laptop. This is happening every couple of days generally, although it happened 3 times in one day last Thursday.

I did contemplate it being a hardware issue, but I tried running the kernel from Karmic for a couple of days, and that worked ok without a single error message.

ProblemType: Bug
AlsaVersion: Advanced Linux Sound Architecture Driver Version 1.0.21.
Architecture: amd64
AudioDevicesInUse:
 USER PID ACCESS COMMAND
 /dev/snd/controlC0: chr1s 4010 F.... pulseaudio
 /dev/snd/controlC1: chr1s 4010 F.... pulseaudio
CRDA: Error: [Errno 2] No such file or directory
Card0.Amixer.info:
 Card hw:0 'Intel'/'HDA Intel at 0xf6afc000 irq 21'
   Mixer name : 'Intel G45 DEVCTG'
   Components : 'HDA:111d76b2,10280263,00100302 HDA:80862802,80860101,00100000'
   Controls : 22
   Simple ctrls : 11
Card1.Amixer.info:
 Card hw:1 'U0x46d0x9a4'/'USB Device 0x46d:0x9a4 at usb-0000:00:1a.7-3.3, high speed'
   Mixer name : 'USB Mixer'
   Components : 'USB046d:09a4'
   Controls : 2
   Simple ctrls : 1
Card1.Amixer.values:
 Simple mixer control 'Mic',0
   Capabilities: cvolume cvolume-joined cswitch cswitch-joined penum
   Capture channels: Mono
   Limits: Capture 0 - 14
   Mono: Capture 0 [0%] [23.75dB] [on]
Date: Tue Mar 16 10:07:41 2010
DistroRelease: Ubuntu 10.04
Frequency: Once a day.
HibernationDevice: RESUME=UUID=762f3439-67ac-4828-aa94-caf2a2ba0f9a
InstallationMedia: Ubuntu 9.10 "Karmic Koala" - Release amd64 (20091027)
LiveMediaBuild: Ubuntu 9.10 "Karmic Koala" - Release amd64 (20091027)
MachineType: Dell Inc. Latitude E5500
Package: linux-image-2.6.32-16-generic 2.6.32-16.25
PccardctlIdent: Socket 0: no product info available
PccardctlStatus: Socket 0: no card
ProcCmdLine: BOOT_IMAGE=/boot/vmlinuz-2.6.32-16-generic root=UUID=4ce5e12b-6e82-4fa4-90ff-7d9859d7504e ro quiet splash
ProcEnviron:
 LANG=en_GB.utf8
 SHELL=/bin/bash
ProcVersionSignature: Ubuntu 2.6.32-16.25-generic
Regression: Yes
RelatedPackageVersions: linux-firmware 1.32
Reproducible: No
SourcePackage: linux
TestedUpstream: No
Uname: Linux 2.6.32-16-generic x86_64
dmi.bios.date: 11/05/2009
dmi.bios.vendor: Dell Inc.
dmi.bios.version: A15
dmi.board.name: 0DW635
dmi.board.vendor: Dell Inc.
dmi.chassis.type: 8
dmi.chassis.vendor: Dell Inc.
dmi.modalias: dmi:bvnDellInc.:bvrA15:bd11/05/2009:svnDellInc.:pnLatitudeE5500:pvr:rvnDellInc.:rn0DW635:rvr:cvnDellInc.:ct8:cvr:
dmi.product.name: Latitude E5500
dmi.sys.vendor: Dell Inc.

To manage notifications about this bug go to:
https://bugs.launchpad.net/linux/+bug/539467/+subscriptions

-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp