[Kernel-packages] [Bug 263555] Re: [intrepid] 2.6.27 e1000e driver places Intel ICH8 and ICH9 gigE chipsets at risk

Bug Watch Updater Fri, 27 Oct 2017 06:17:35 -0700

Launchpad has imported 37 comments from the remote bug at
https://bugzilla.redhat.com/show_bug.cgi?id=459202.

If you reply to an imported comment from within Launchpad, your comment
will be sent to the remote bug automatically. Read more about
Launchpad's inter-bugtracker facilities at
https://help.launchpad.net/InterBugTracking.

------------------------------------------------------------------------
On 2008-08-14T22:17:43+00:00 Michal wrote:

Description of problem:
I am unable to use my Ethernet controller: Intel Corporation 82566DC Gigabit 
Network Connection (rev 03). The system does not see it. Please find the dmesg
output below.

e1000e: Intel(R) PRO/1000 Network Driver - 0.2.0
e1000e: Copyright (c) 1999-2007 Intel Corporation.
ACPI: PCI Interrupt 0000:00:19.0[A] -> GSI 22 (level, low) -> IRQ 22
PCI: Setting latency timer of device 0000:00:19.0 to 64
iTCO_vendor_support: vendor-support=0
0000:00:19.0: The NVM Checksum Is Not Valid
ACPI: PCI interrupt for device 0000:00:19.0 disabled
e1000e: probe of 0000:00:19.0 failed with error -5

Version-Release number of selected component (if applicable):
Driver version 0.2.0

How reproducible:
Happens every time

Steps to Reproduce:
1. Boot computer

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/263555/comments/0

------------------------------------------------------------------------
On 2008-08-15T00:44:06+00:00 Yanko wrote:

What kernel version is this? Has this adapter ever worked under Fedora?
If yes, when did it stop?

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/263555/comments/1

------------------------------------------------------------------------
On 2008-08-15T23:36:51+00:00 Michal wrote:

I am sorry, I totally forgot about these details.
Kernels which I have:
2.6.25.11-97.fc9.x86_64
2.6.25.14-108.fc9.x86_64

I guess it stopped shortly after I upgraded to F9. It must have been one
of the first kernel updates. I am not sure if it ever worked in F9.

Strangely, I cannot use it on Ubuntu either. I do not have the dmesg
output yet. I will try and see if it matches.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/263555/comments/2

------------------------------------------------------------------------
On 2008-08-22T01:42:03+00:00 Chuck wrote:

Can you post the output of 'lspci -nn -s 0000:00:19.0'?

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/263555/comments/3

------------------------------------------------------------------------
On 2008-08-22T06:41:39+00:00 Michal wrote:

Output you have requested:

00:19.0 Ethernet controller [0200]: Intel Corporation 82566DC Gigabit
Network Connection [8086:104b] (rev 03)

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/263555/comments/4

------------------------------------------------------------------------
On 2008-09-11T17:27:22+00:00 Jesse wrote:

The driver you have supports your hardware, but is erroring out on load.
The "NVM checksum is not valid" means that something corrupted your system BIOS 
flash.

Can you please give us details about the hardware in your system, attach the 
output of 
# lspci -vvv > lspci.txt

# dmidecode > dmiout.txt

we have some reports that Lenovo systems (a lot of them) are starting to
have this issue.

Please DO NOT run IABUTIL, as some sites on the web suggest, to try to fix
this issue.  It will likely cause you to have to replace your
motherboard to get LAN functionality back.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/263555/comments/10
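
For readers unfamiliar with the check behind "The NVM Checksum Is Not
Valid", here is a minimal Python sketch of it. It assumes the convention
used by the e1000-family drivers: the 16-bit words at NVM offsets 0x00
through 0x3F must sum, modulo 2^16, to 0xBABA. The function names are
illustrative only and are not taken from the driver, and the sketch works
on an in-memory list of words, never on real hardware.

```python
NVM_CHECKSUM_REG = 0x3F  # index of the word that stores the checksum
NVM_SUM = 0xBABA         # expected 16-bit sum of words 0x00..0x3F


def nvm_sum(words):
    """Return the 16-bit wrapping sum of words 0x00..0x3F inclusive."""
    if len(words) <= NVM_CHECKSUM_REG:
        raise ValueError("need at least 0x40 words")
    return sum(words[: NVM_CHECKSUM_REG + 1]) & 0xFFFF


def checksum_is_valid(words):
    """True when the image satisfies the checksum rule."""
    return nvm_sum(words) == NVM_SUM


def expected_checksum_word(words):
    """Value that word 0x3F must hold for the image to validate."""
    partial = sum(words[:NVM_CHECKSUM_REG]) & 0xFFFF
    return (NVM_SUM - partial) & 0xFFFF


# Build a made-up image, give it a valid checksum, then flip one bit.
image = [0x1234] * NVM_CHECKSUM_REG + [0x0000]
image[NVM_CHECKSUM_REG] = expected_checksum_word(image)
good = checksum_is_valid(image)
image[5] ^= 0x0001
bad = checksum_is_valid(image)
```

A single flipped bit anywhere in the first 0x40 words is enough to make the
check fail, which is why the driver refuses to bind once the NVM contents
have been disturbed.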

------------------------------------------------------------------------
On 2008-09-11T22:31:41+00:00 Michal wrote:

Created attachment 316491
dmiout.txt

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/263555/comments/11

------------------------------------------------------------------------
On 2008-09-11T22:32:07+00:00 Michal wrote:

Created attachment 316492
lspci.txt

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/263555/comments/12

------------------------------------------------------------------------
On 2008-09-12T06:28:13+00:00 Michal wrote:

I have experimented a little with my card. I just wanted to check some of
the suggestions pointed out here:
http://www.thinkwiki.org/wiki/Problem_with_e1000:_EEPROM_Checksum_Is_Not_Valid#Solutions

The little orange LED on my Ethernet port is constantly flashing; unloading 
the e1000e module did not change anything. When I plugged in a cable, it 
stopped and the green LED came on, meaning that the connection is OK, though 
the driver still failed to load.
If you need any other info, I will gladly help.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/263555/comments/13

------------------------------------------------------------------------
On 2008-09-12T16:28:34+00:00 Jesse wrote:

okay, so you have an HP machine with an ICH8 chipset.  I don't know what
the little orange LED flashing means, I will have to check on that.

can you get into the iAMT setup just after BIOS completes by pressing CTRL-p?
not sure if that might help you or not.

If I attach a debug driver here would you be willing to compile and run
it?

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/263555/comments/14

------------------------------------------------------------------------
On 2008-09-14T14:42:31+00:00 Michal wrote:

I am not able to open the iAMT setup. I believe I do not have that
option: from what I have found, enabling it requires turning it on in the
Power section of the BIOS settings, and I do not have it there.

Yes, please attach the driver.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/263555/comments/15

------------------------------------------------------------------------
On 2008-09-22T23:08:54+00:00 Jesse wrote:

Created attachment 317425
driver with csum check bypass

here is a driver that just prints the message but doesn't error out if
the checksum validation fails.

This should allow you to run ethtool -e ethX after loading the driver.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/263555/comments/18
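
The hex dump printed by `ethtool -e ethX` can be turned back into 16-bit
NVM words for inspection. The following Python sketch assumes the usual
text layout (an offset such as `0x0000:` followed by space-separated hex
bytes) and assumes the bytes are in little-endian word order; both are
assumptions about the output, and the helper names are made up for this
example.

```python
import re


def parse_ethtool_dump(text):
    """Collect the bytes from lines like '0x0010:  ff ff 00 1b ...'."""
    data = bytearray()
    for line in text.splitlines():
        match = re.match(r"\s*0x[0-9a-fA-F]+:\s+(.*)$", line)
        if match:
            data.extend(int(tok, 16) for tok in match.group(1).split())
    return bytes(data)


def bytes_to_words(data):
    """Pair bytes into 16-bit words, low byte first (assumed order)."""
    if len(data) % 2:
        raise ValueError("odd number of bytes")
    return [data[i] | (data[i + 1] << 8) for i in range(0, len(data), 2)]


# Made-up sample in the assumed layout, not taken from a real adapter.
sample = "Offset\t\tValues\n------\t\t------\n0x0000:\t\t00 1b 21 aa\n"
words = bytes_to_words(parse_ethtool_dump(sample))
```

Header lines that do not start with a hex offset are skipped, so the whole
command output can be passed in unchanged.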

------------------------------------------------------------------------
On 2008-09-22T23:10:02+00:00 Jesse wrote:

the difference in the driver I just attached is:
diff -rup e1000e-0.4.1.7.orig/src/netdev.c e1000e-0.4.1.7/src/netdev.c
--- e1000e-0.4.1.7.orig/src/netdev.c    2008-06-23 09:27:33.000000000 -0700
+++ e1000e-0.4.1.7/src/netdev.c 2008-09-22 16:06:59.000000000 -0700
@@ -56,7 +56,7 @@

 #define DRV_DEBUG

-#define DRV_VERSION "0.4.1.7" DRV_NAPI DRV_DEBUG
+#define DRV_VERSION "0.4.1.7_nocsum" DRV_NAPI DRV_DEBUG
 char e1000e_driver_name[] = "e1000e";
 const char e1000e_driver_version[] = DRV_VERSION;

@@ -5309,8 +5309,10 @@ static int __devinit e1000_probe(struct
                        break;
                if (i == 2) {
                        e_err("The NVM Checksum Is Not Valid\n");
+                       /* JJJ skip around error path
                        err = -EIO;
                        goto err_eeprom;
+                        JJJ end */
                }
        }

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/263555/comments/19
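
The `i == 2` test in the diff suggests that the probe path gives the
checksum validation a few attempts before giving up with -EIO, which
matches the "failed with error -5" line in the first dmesg output. Below is
a rough Python model of that behaviour. The loop shape, the three-attempt
limit and the `ignore_checksum_failure` flag are assumptions made for
illustration; the flag models the described intent of the debug driver
(print the message, do not error out), not its exact control flow.

```python
EIO = 5  # errno value behind "failed with error -5"
MAX_ATTEMPTS = 3  # assumed from the `i == 2` test in the diff


def probe(validate_checksum, ignore_checksum_failure=False, log=print):
    """Return 0 on success, or -EIO when validation keeps failing."""
    for _attempt in range(MAX_ATTEMPTS):
        if validate_checksum():
            return 0
    log("The NVM Checksum Is Not Valid")
    if ignore_checksum_failure:
        # Debug-driver intent: report the problem but keep going.
        return 0
    return -EIO


messages = []
stock_result = probe(lambda: False, log=messages.append)
debug_result = probe(lambda: False, ignore_checksum_failure=True,
                     log=messages.append)
```

With a validator that always fails, the stock path returns -5 and the
bypass path returns 0, while both log the same message.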

------------------------------------------------------------------------
On 2008-09-22T23:35:50+00:00 Jesse wrote:

also, whole piles of reports now starting to converge, many of them
linked here:

http://bugzilla.kernel.org/show_bug.cgi?id=11382

I'm trying to work a plan to help address this soonest.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/263555/comments/21

------------------------------------------------------------------------
On 2008-09-23T01:40:51+00:00 Chuck wrote:

Michal, have you ever booted a Fedora 10 Alpha or rawhide disk on that
system?

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/263555/comments/24

------------------------------------------------------------------------
On 2008-09-23T06:51:35+00:00 Michal wrote:

Yes, I have rawhide on my system.
The last two kernels I have:
2.6.27-0.226.rc1.git5.fc10.i686
2.6.27-0.244.rc2.git1.fc10.i686

I do not know which one killed my port. If you want me to run it or
something, note that I have no internet connection on those kernels:
wifi does not work, and Ethernet you already know about.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/263555/comments/28

------------------------------------------------------------------------
On 2008-09-23T14:16:33+00:00 Warren wrote:

Does this mean Fedora 9 is not to blame for killing e1000e?

Slashdot reported that Fedora 9 and 10 are affected, but it sounds like
only rawhide has the problem.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/263555/comments/33

------------------------------------------------------------------------
On 2008-09-23T15:16:21+00:00 Jon wrote:

FWIW, I've heard of similar problems with recent -RT kernels.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/263555/comments/34

------------------------------------------------------------------------
On 2008-09-23T20:37:36+00:00 Jesse wrote:

I suggest this is severity urgent now.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/263555/comments/48

------------------------------------------------------------------------
On 2008-09-24T00:15:30+00:00 John wrote:

Patches to the e1000e driver to protect the NVM were posted to netdev a
few hours ago.  They need to be tried on this problem.  Either they will
fix the problem or they should point to what is causing it.  The
patches are obviously for the 2.6.27-rc kernels.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/263555/comments/58

------------------------------------------------------------------------
On 2008-09-24T13:01:47+00:00 Renato wrote:

Has anyone tried these patches from Jeff Kirsher (Intel)?
http://lkml.org/lkml/2008/9/23/427
http://lkml.org/lkml/2008/9/23/431
http://lkml.org/lkml/2008/9/23/432

And I think it is a good idea to raise the priority and severity,
because this bug can DAMAGE hardware.

Best regards,
Renato

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/263555/comments/75

------------------------------------------------------------------------
On 2008-09-24T15:20:31+00:00 Will wrote:

kernel-2.6.27-0.352.rc7.git1.fc10
(http://koji.fedoraproject.org/koji/buildinfo?buildID=64060) includes a
fix for e1000 and (temporarily) disables e1000e.

This is probably sufficient for F10Beta (pending some regression
testing)

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/263555/comments/80

------------------------------------------------------------------------
On 2008-09-24T15:26:44+00:00 Andy wrote:

I guess that will work, but you've now killed the wired network on quite
a few hardware platforms.  Pulling the patches from comment #20 would
probably be better for F10Beta.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/263555/comments/81

------------------------------------------------------------------------
On 2008-09-25T19:33:01+00:00 Warren wrote:

> And I think it is a good idea to raise the priority and severity,
> because this bug can DAMAGE hardware.

Nobody is changing priority and severity because those fields are
meaningless.  We should really remove those fields from the interface.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/263555/comments/98

------------------------------------------------------------------------
On 2008-09-26T02:02:31+00:00 Jesse wrote:

please see my message on lkml titled "e1000e NVM corruption issue
status"

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/263555/comments/108

------------------------------------------------------------------------
On 2008-09-26T04:20:09+00:00 Warren wrote:

http://lkml.org/lkml/2008/9/25/510
This appears to be the post Jesse is referring to.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/263555/comments/111

------------------------------------------------------------------------
On 2008-09-29T13:47:36+00:00 Luis wrote:

Another message from Jesse Brandeburg on LKML is a list of the patches
being used to debug the issue and under test as possible fixes:

  http://lkml.org/lkml/2008/9/25/515

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/263555/comments/122

------------------------------------------------------------------------
On 2008-10-04T00:35:10+00:00 David wrote:

*** Bug 465127 has been marked as a duplicate of this bug. ***

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/263555/comments/172

------------------------------------------------------------------------
On 2008-10-04T13:11:59+00:00 Boricua wrote:

I was just hit by this bug after doing a preupgrade from F9 64-bit to F10 Beta 
64-bit. The system states "no network device available".  I'm including the 
output I got after running dmesg and other commands (hope it helps):
[Francisco@localhost ~]$ su -
Password: 
[root@localhost ~]# /sbin/ifconfig
lo        Link encap:Local Loopback  
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
          RX packets:124 errors:0 dropped:0 overruns:0 frame:0
          TX packets:124 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0 
          RX bytes:10080 (9.8 KiB)  TX bytes:10080 (9.8 KiB)

[root@localhost ~]# dmesg | grep eth
Driver 'sd' needs updating - please use bus_type methods
Driver 'sr' needs updating - please use bus_type methods
[root@localhost ~]# "dhclient eth0" //
-bash: dhclient eth0: command not found
[root@localhost ~]# dhclient eth0
Device "eth0" does not exist.
Cannot find device "eth0"
[root@localhost ~]# dhclient eth1
Device "eth1" does not exist.
Cannot find device "eth1"
[root@localhost ~]# lscpi -v|grep -i ethernet
-bash: lscpi: command not found
[root@localhost ~]# lspci -v|grep -i ethernet
00:19.0 Ethernet controller: Intel Corporation 82566DC Gigabit Network 
Connection (rev 02)
[root@localhost ~]# ifconfig -a
lo        Link encap:Local Loopback  
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
          RX packets:668 errors:0 dropped:0 overruns:0 frame:0
          TX packets:668 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0 
          RX bytes:55232 (53.9 KiB)  TX bytes:55232 (53.9 KiB)
[root@localhost ~]#

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/263555/comments/174

------------------------------------------------------------------------
On 2008-10-04T14:04:02+00:00 Boricua wrote:

I was able to solve this by manual installation of the latest available
kernel, 2.6.27-0.382.rc8.git4.fc10, along with the equivalent kernel-
firmware. Worked immediately.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/263555/comments/175

------------------------------------------------------------------------
On 2008-10-04T22:07:48+00:00 Renato wrote:

Fixed?
<http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=4a7703582836f55a1cbad0e2c1c6ebbee3f9b3a7>

Best regards,
Renato S. Yamane

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/263555/comments/177

------------------------------------------------------------------------
On 2008-10-11T12:34:43+00:00 Michal wrote:

I have tried the newest rawhide kernel and it does not help.
I have also tried the attached drivers. They did not change anything. Still no
Ethernet. This time I did not experiment with ethtool or any Intel software.

Output of dmesg:

e1000e: Intel(R) PRO/1000 Network Driver - 0.4.1.7_nocsum-NAPI                  
e1000e: Copyright (c) 1999-2008 Intel Corporation.                              
ACPI: PCI Interrupt 0000:00:19.0[A] -> GSI 22 (level, low) -> IRQ 22            
PCI: Setting latency timer of device 0000:00:19.0 to 64                         
0000:00:19.0: : Failed to initialize MSI interrupts.  Falling back to legacy 
interrupts.                                                                     

0000:00:19.0: 0000:00:19.0: The NVM Checksum Is Not Valid                       
BUG: soft lockup - CPU#0 stuck for 61s! [modprobe:3703]                         
Modules linked in: e1000e(+) rfkill_input bridge bnep rfcomm l2cap vboxdrv 
ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi 
scsi_transport_iscsi fuse sunrpc arc4 ecb crypto_blkcipher b43 ssb rfkill 
mac80211 cfg80211 input_polldev ipt_REJECT xt_tcpudp nf_conntrack_ipv4 xt_state 
nf_conntrack iptable_filter ip_tables x_tables cpufreq_ondemand acpi_cpufreq 
freq_table dm_mirror dm_log dm_multipath dm_mod ipv6 sr_mod cdrom pcspkr 
snd_hda_intel serio_raw joydev snd_seq_dummy sg snd_seq_oss snd_seq_midi_event 
i915 snd_seq ata_piix snd_seq_device pata_acpi snd_pcm_oss snd_mixer_oss video 
output ata_generic wmi battery ac drm hci_usb snd_pcm i2c_algo_bit i2c_core 
iTCO_wdt iTCO_vendor_support snd_timer snd_page_alloc bluetooth snd_hwdep snd 
soundcore ahci libata sd_mod scsi_mod ext3 jbd mbcache uhci_hcd ohci_hcd 
ehci_hcd [last unloaded: e1000e]        
CPU 0:                                                                          
Modules linked in: e1000e(+) rfkill_input bridge bnep rfcomm l2cap vboxdrv 
ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi 
scsi_transport_iscsi fuse sunrpc arc4 ecb crypto_blkcipher b43 ssb rfkill 
mac80211 cfg80211 input_polldev ipt_REJECT xt_tcpudp nf_conntrack_ipv4 xt_state 
nf_conntrack iptable_filter ip_tables x_tables cpufreq_ondemand acpi_cpufreq 
freq_table dm_mirror dm_log dm_multipath dm_mod ipv6 sr_mod cdrom pcspkr 
snd_hda_intel serio_raw joydev snd_seq_dummy sg snd_seq_oss snd_seq_midi_event 
i915 snd_seq ata_piix snd_seq_device pata_acpi snd_pcm_oss snd_mixer_oss video 
output ata_generic wmi battery ac drm hci_usb snd_pcm i2c_algo_bit i2c_core 
iTCO_wdt iTCO_vendor_support snd_timer snd_page_alloc bluetooth snd_hwdep snd 
soundcore ahci libata sd_mod scsi_mod ext3 jbd mbcache uhci_hcd ohci_hcd 
ehci_hcd [last unloaded: e1000e]        
Pid: 3703, comm: modprobe Not tainted 2.6.26.5-45.fc9.x86_64 #1                 
RIP: 0010:[<ffffffffa0649c24>]  [<ffffffffa0649c24>] 
:e1000e:e1000_flash_cycle_ich8lan+0x34/0x60                                     

RSP: 0018:ffff81003c0699d8  EFLAGS: 00000202                                    
RAX: 000000000000e028 RBX: ffff81003c0699f8 RCX: 000000005351a052               
RDX: 00000000000006e8 RSI: 00000000000001f4 RDI: 00000000000006c3               
RBP: 0000000000000018 R08: 0000000000000000 R09: 0000000000000001               
R10: 00000018bd3ebd94 R11: 0000000000000000 R12: ffff81003c069958               
R13: 0000000000000246 R14: 0000000000000010 R15: ffffffff810121a1               
FS:  00007f5cb44006f0(0000) GS:ffffffff81417000(0000) knlGS:0000000000000000    
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b                               
CR2: 0000000000cf6000 CR3: 0000000060d3e000 CR4: 00000000000006e0               
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000               
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400               

Call Trace:
 [<ffffffffa0649c34>] ? :e1000e:e1000_flash_cycle_ich8lan+0x44/0x60
 [<ffffffffa0649e55>] ? :e1000e:e1000_read_flash_data_ich8lan+0xa5/0x110
 [<ffffffffa064a0e7>] ? :e1000e:e1000_read_nvm_ich8lan+0x117/0x150      
 [<ffffffffa064dc21>] ? :e1000e:e1000_validate_nvm_checksum_generic+0x41/0x80
 [<ffffffffa064a600>] ? :e1000e:e1000_validate_nvm_checksum_ich8lan+0x80/0x90
 [<ffffffffa06517a7>] ? :e1000e:e1000_probe+0x5e7/0xd10                      
 [<ffffffff810f7869>] ? sysfs_addrm_finish+0x69/0x205                        
 [<ffffffff810f73c6>] ? sysfs_find_dirent+0x1c/0x31                          
 [<ffffffff81138628>] ? kobject_get+0x1a/0x22                                
 [<ffffffff8114932c>] ? pci_device_probe+0xb3/0x10a                          
 [<ffffffff811b8a2a>] ? driver_probe_device+0xc0/0x16e                       
 [<ffffffff811b8b27>] ? __driver_attach+0x4f/0x79                            
 [<ffffffff811b8ad8>] ? __driver_attach+0x0/0x79                             
 [<ffffffff811b82cb>] ? bus_for_each_dev+0x4f/0x89                           
 [<ffffffff811b8875>] ? driver_attach+0x1c/0x1e                              
 [<ffffffff811b7beb>] ? bus_add_driver+0xb7/0x201                            
 [<ffffffff811b8d18>] ? driver_register+0xa8/0x128                           
 [<ffffffff811495a3>] ? __pci_register_driver+0x53/0x8c                      
 [<ffffffffa00de054>] ? :e1000e:e1000_init_module+0x54/0x75                  
 [<ffffffff81059f14>] ? sys_init_module+0x199c/0x1af8                        
 [<ffffffff810ac2f4>] ? do_sync_read+0xe7/0x12d                              
 [<ffffffff8109eac4>] ? alloc_pages_current+0x0/0xc2                         
 [<ffffffff8100c291>] ? tracesys+0xd0/0xd5

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/263555/comments/203

------------------------------------------------------------------------
On 2008-10-11T12:57:46+00:00 Thomas wrote:

As far as I know, the current fixes in the newest kernel only prevent
this from happening to undamaged hardware. They do not repair hardware
that is already damaged.

Some people from Intel and Novell were talking about developing a tool
to repair it, if you have a backup of the original eeprom contents or
access to an identical system. However, I don't know if that tool is
already done or where you can get it from.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/263555/comments/204
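
On a machine whose adapter still works, a backup of the EEPROM contents
can be taken with ethtool's raw dump mode (`ethtool -e <iface> raw on`).
The Python sketch below wraps that command. It assumes ethtool is
installed, the driver is loaded and the caller has sufficient privileges;
the function names and the `run` parameter (which exists only so the
command can be substituted in tests) are made up for this example.

```python
import subprocess


def eeprom_dump_command(ifname):
    """Command line that makes ethtool write the raw EEPROM to stdout."""
    return ["ethtool", "-e", ifname, "raw", "on"]


def read_eeprom(ifname, run=subprocess.run):
    """Return the raw EEPROM bytes of ifname; raise if ethtool fails."""
    result = run(eeprom_dump_command(ifname), check=True,
                 capture_output=True)
    return result.stdout


def save_eeprom(ifname, path, run=subprocess.run):
    """Write the raw EEPROM of ifname to path; return the byte count."""
    data = read_eeprom(ifname, run=run)
    with open(path, "wb") as handle:
        handle.write(data)
    return len(data)
```

For example, `save_eeprom("eth0", "eth0-eeprom.bin")` would store the image
in the current directory. Keeping the file on a different machine is
sensible, since it is only useful once the adapter has stopped working.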

------------------------------------------------------------------------
On 2008-10-11T13:02:44+00:00 Michal wrote:

Well, I did not back up my EEPROM, but my laptop is a popular model, so I
may be able to get someone else's EEPROM image to restore it. I'll just
ask someone for an image.

The thing is, I had to disable e1000e loading (I am using the drivers
attached to this bug) because it constantly crashes with the message I
pasted above, and I cannot boot my kernel unless I blacklist the e1000e
module.

I hope you guys will find a way to fix it soon.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/263555/comments/205

------------------------------------------------------------------------
On 2008-10-16T01:03:21+00:00 John wrote:

It looks like the root cause of this problem has been found.  Included
here is the work-around for it as well as the reference to the 2.6.28-rc
fix for the problem.

>---------- Forwarded message ----------
>From: Steven Rostedt <rost...@goodmis.org>
>Date: Wed, Oct 15, 2008 at 3:21 PM
>Subject: [PATCH -stable] disable CONFIG_DYNAMIC_FTRACE due to possible
>memory corruption on module unload
>To: LKML <linux-ker...@vger.kernel.org>, sta...@kernel.org
>Cc: Linus Torvalds <torva...@linux-foundation.org>, Andrew Morton
><a...@linux-foundation.org>, Arjan van de Ven <ar...@infradead.org>,
>gre...@suse.de, jesse.brandeb...@intel.com, Thomas Gleixner
><t...@linutronix.de>, Ingo Molnar <mi...@elte.hu>
>
>
>
>While debugging the e1000e corruption bug with Intel, we discovered
>today that the dynamic ftrace code in mainline is the likely source of
>this bug.
>
>For the stable kernel we are providing the only viable fix patch:
>labeling CONFIG_DYNAMIC_FTRACE as broken. (see the patch below)
>
>We will follow up with a backport patch that contains the fixes. But
>since the fixes are not a one liner, the safest approach for now is to
>disable the code in question.
>
>The cause of the bug is due to the way the current code in mainline
>handles dynamic ftrace.  When dynamic ftrace is turned on, it also
>turns on CONFIG_FTRACE which enables the -pg config in gcc that places
>a call to mcount at every function call. With just CONFIG_FTRACE this
>causes a noticeable overhead.  CONFIG_DYNAMIC_FTRACE works to ease this
>overhead by dynamically updating the mcount call sites into nops.
>
>The problem arises when we trace functions and modules are unloaded.
>The first time a function is called, it will call mcount and the mcount
>call will call ftrace_record_ip. This records the calling site and
>stores it in a preallocated hash table. Later on a daemon will
>wake up and call kstop_machine and convert any mcount callers into
>nops.
>
>The evolution of this code first tried to do this without the
>kstop_machine and used cmpxchg to update the callers as they were
>called. But I was informed that this is dangerous to do on SMP machines
>if another CPU is running that same code. The solution was to do this
>with kstop_machine.
>
>We still used cmpxchg to test if the code that we are modifying is
>indeed code that we expect to be before updating it - as a final
>line of defense.
>
>But on 32bit machines, ioremapped memory and modules share the same
>address space. When a module would load its code into memory and
>execute some code, that would register the function.
>
>On module unload, ftrace incorrectly did not zap these functions from
>its hash (this was the bug). The cmpxchg could have saved us in most
>cases (via luck) - but with ioremap-ed memory that was exactly the
>wrong thing to do - the results of cmpxchg on device memory are
>undefined. (and will likely result in a write)
>
>The pending .28 ftrace tree does not have this bug anymore, as a
>general push towards more robustness of code patching, this is done
>differently: we do not use cmpxchg and we do a WARN_ON and turn the
>tracer off if anything deviates from its expected state. Furthermore,
>patch sites are statically identified during build time so there's no
>runtime discovery of dynamic code areas anymore, and no room for code
>unmaps to cause the hash to become out of date.
>
>We believe the fragility of dynamic patching has been sufficiently
>addressed in the development code via the static patching method, but
>further suggestions to make it more robust are welcome.
>
>Signed-off-by: Steven Rostedt <srost...@goodmis.org>
>Acked-by: Ingo Molnar <mi...@elte.hu>
>Acked-by: Thomas Gleixner <t...@linutronix.de>
>---
> kernel/trace/Kconfig |    3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
>Index: linux-compile.git/kernel/trace/Kconfig
>===================================================================
>--- linux-compile.git.orig/kernel/trace/Kconfig 2008-10-02 10:18:49.000000000 -0400
>+++ linux-compile.git/kernel/trace/Kconfig      2008-10-15 17:29:34.000000000 -0400
>@@ -103,7 +103,8 @@ config CONTEXT_SWITCH_TRACER
>         all switching of tasks.
>
> config DYNAMIC_FTRACE
>-       bool "enable/disable ftrace tracepoints dynamically"
>+       bool "enable/disable ftrace tracepoints dynamically (BROKEN)"
>+       depends on BROKEN
>       depends on FTRACE
>       depends on HAVE_DYNAMIC_FTRACE
>       default y
>

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/263555/comments/212
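
The failure mode described in the forwarded message can be illustrated
with a small Python model. This is a toy simulation, not kernel code: the
classes, the address and the `purge_on_unload` switch are all invented for
the example. `cmpxchg` is modelled as always issuing a write cycle to its
destination, in line with the message's remark that it "will likely result
in a write", and `purge_on_unload=True` models "zapping" the records on
unload rather than the build-time patch-site approach of the .28 tree.

```python
CALL_MCOUNT = "call mcount"
NOP = "nop"


class AddressSpace:
    def __init__(self):
        self.cells = {}          # address -> current contents
        self.device = set()      # addresses currently backed by ioremap
        self.device_writes = []  # write cycles that reached a device

    def store(self, addr, value):
        if addr in self.device:
            self.device_writes.append(addr)
        self.cells[addr] = value

    def cmpxchg(self, addr, expected, new):
        """Compare-and-exchange, modelled as always writing to addr."""
        old = self.cells[addr]
        # On a mismatch the old value is written back: still a write.
        self.store(addr, new if old == expected else old)
        return old


class Tracer:
    def __init__(self, mem, purge_on_unload):
        self.mem = mem
        self.purge_on_unload = purge_on_unload
        self.recorded = set()    # call sites seen by the mcount hook

    def record_ip(self, addr):
        self.recorded.add(addr)

    def module_unload(self, addrs):
        if self.purge_on_unload:
            self.recorded -= set(addrs)

    def convert_to_nops(self):
        for addr in sorted(self.recorded):
            self.mem.cmpxchg(addr, CALL_MCOUNT, NOP)


def scenario(purge_on_unload):
    mem = AddressSpace()
    tracer = Tracer(mem, purge_on_unload)
    site = 0xF8000000                  # hypothetical module text address
    mem.cells[site] = CALL_MCOUNT      # module is loaded
    tracer.record_ip(site)             # first call goes through mcount
    tracer.module_unload([site])       # module is removed
    mem.device.add(site)               # range is reused by ioremap
    mem.cells[site] = "device register"
    tracer.convert_to_nops()           # patching pass runs later
    return mem.device_writes
```

`scenario(False)` reports one write landing on the device mapping, while
`scenario(True)` reports none, which is the difference between a stale
record being patched and the record having been dropped at unload time.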

------------------------------------------------------------------------
On 2008-11-04T20:34:12+00:00 Jesse wrote:

that cpu-stuck bug was a problem in the way the e1000e driver loops to
read the NVM.

part of the threads on lkml covered a fix for that issue.

Please contact me directly for assistance restoring your eeprom image if
you need help.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/263555/comments/223

------------------------------------------------------------------------
On 2010-12-31T07:20:17+00:00 Neo wrote:

Can anybody provide me with the fix for the cpu-stuck bug?

I also need an EEPROM image to restore the Intel® 82573L Ethernet LAN
Controller supporting Gigabit Ethernet on the D5400XS motherboard.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/263555/comments/233

** Changed in: linux (Fedora)
   Importance: Unknown => Medium

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/263555

Title:
  [intrepid] 2.6.27 e1000e driver places Intel ICH8 and ICH9 gigE
  chipsets at risk

Status in Linux:
  Fix Released
Status in linux package in Ubuntu:
  Fix Released
Status in linux-lpia package in Ubuntu:
  Fix Released
Status in linux source package in Intrepid:
  Fix Released
Status in linux-lpia source package in Intrepid:
  Fix Released
Status in linux package in Fedora:
  Fix Released
Status in linux package in Gentoo Linux:
  Fix Released
Status in linux package in Mandriva:
  Fix Released
Status in linux package in Suse:
  Fix Released

Bug description:
  In some circumstances it appears possible for the 2.6.27-rc kernels to
  corrupt the NVRAM used by some Intel network parts to store data such as
  MAC addresses.
  This is limited to the new e1000e driver, and reports have only appeared
  from users of "82566 and 82567 based LAN parts (ich8 and ich9)" (to quote
  Intel). The reports seem to be isolated to laptops, but it is not clear if
  this is because desktop/server parts are not vulnerable, or if use cases
  simply increase the chances of laptop users being hit.

  Once this corruption has occurred, recovery may be possible via a BIOS
  update, but may well require replacement of the hardware. Use of
  Intel's IABUTIL.EXE is strongly discouraged, as it will worsen the
  problem to the point where the network part will no longer appear on
  the PCI bus.

  (This is a new description; the original one was based on too much
  guesswork. Below are the URLs originally referenced.)
  (The driver is blacklisted in Ubuntu for 2.6.27-rc in the latest releases,
  so if your network is not working, it is not necessarily damaged; it may
  just be disabled to prevent any accidents until this bug is solved. Don't
  worry!)
  http://www.blahonga.org/~art/rant.html (search for "em0")
  http://www.mail-archive.com/e1000-devel@lists.sourceforge.net/msg00360.html
  http://www.mail-archive.com/e1000-devel@lists.sourceforge.net/msg00398.html

To manage notifications about this bug go to:
https://bugs.launchpad.net/linux/+bug/263555/+subscriptions

-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp
