On 5/7/07, Florian Kulzer <[EMAIL PROTECTED]> wrote:
On Sat, May 05, 2007 at 14:01:45 -0700, jason.public wrote:
> Every once in a while, my eth1 interface dies with the following messages.
>
> # dmesg
> [...]
> APIC error on CPU0: 02(02)
> APIC error on CPU0: 02(02)
> APIC error on CPU0: 02(02)
> APIC error on CPU0: 02(02)
> APIC error on CPU0: 02(02)
> APIC error on CPU0: 02(02)

You might want to try booting with the option "noapic". Maybe the
network interface hang-ups are caused by a problem with the interrupts.

> NETDEV WATCHDOG: eth1_rename: transmit timed out
> eth1_rename: transmit timed out, tx_status 00 status 6000.
>  diagnostics: net 0cc0 media 8802 dma 005000a1 fifo 0000
>  Flags; bus-master 1, dirty 392071(7) current 392087(7)
>  Transmit list 1f030660 vs. df030660.
>  0: @df030200  length 80000042 status 00000042
>  1: @df0302a0  length 80000042 status 00000042
>  2: @df030340  length 8000002a status 0000002a
>  3: @df0303e0  length 8000002a status 0000002a
>  4: @df030480  length 8000004d status 0000004d
>  5: @df030520  length 8000002a status 8000002a
>  6: @df0305c0  length 8000002a status 8000002a
>  7: @df030660  length 800000ea status 000000ea
>  8: @df030700  length 800000ea status 000000ea
>  9: @df0307a0  length 8000004a status 0000004a
>  10: @df030840  length 8000004c status 0000004c
>  11: @df0308e0  length 8000004c status 0000004c
>  12: @df030980  length 8000004d status 0000004d
>  13: @df030a20  length 8000004d status 0000004d
>  14: @df030ac0  length 8000004a status 0000004a
>  15: @df030b60  length 8000004a status 0000004a
> eth1_rename: Resetting the Tx ring pointer.
>
> Either I have to restart it with "/etc/init.d/networking restart", or
> it restarts by itself after a minute or so.  Any idea what the problem
> is?

Which ethernet card are we talking about? Please post the relevant part
of the output of "lspci", as well as the output of "/sbin/ifconfig".

I am also not too happy about seeing "eth1_rename" up there. This may
not be related to your main problem, but we might as well fix it while
we are wrangling with the networking anyway. Therefore I would like to
see the content of your /etc/udev/rules.d/z25_persistent-net.rules file.


I'll try booting with the noapic option, but it might take a while to
see results, as this only happens a couple of times a week on average.
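
After rebooting, it is worth confirming that the option actually reached
the kernel. A minimal sketch in Python, assuming a Linux system where
/proc/cmdline holds the boot command line; the helper names
has_boot_option and read_cmdline are made up for illustration:

```python
def has_boot_option(cmdline, option):
    """Return True if `option` is present in a kernel command line.

    Matches either a bare flag ("noapic") or the key of a key=value
    pair, and only on whole tokens, so "nolapic" does not count as
    "noapic".
    """
    return any(token.split("=", 1)[0] == option for token in cmdline.split())


def read_cmdline(path="/proc/cmdline"):
    """Read the command line the running kernel was booted with."""
    with open(path) as f:
        return f.read()


if __name__ == "__main__":
    try:
        active = has_boot_option(read_cmdline(), "noapic")
        print("noapic active:", active)
    except OSError as exc:
        # /proc may be unavailable (non-Linux system, restricted sandbox).
        print("could not read /proc/cmdline:", exc)
```

If this reports False after a reboot, the option was probably added to
the wrong boot entry.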

There are two ethernet cards, but it's the 3Com that has been causing
problems.  When eth1_rename dies, it usually causes eth0 (the Realtek)
to stop working as well.


# lspci|grep Ethernet
00:08.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL-8169 Gigabit Ethernet (rev 10)
00:0a.0 Ethernet controller: 3Com Corporation 3c905 100BaseTX [Boomerang]


Here's the output of ifconfig:


# ifconfig
eth0      Link encap:Ethernet  HWaddr 00:08:54:B3:70:2E
         inet addr:192.168.1.5  Bcast:192.168.1.255  Mask:255.255.255.0
         inet6 addr: fe80::208:54ff:feb3:702e/64 Scope:Link
         UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
         RX packets:5213103 errors:0 dropped:0 overruns:0 frame:0
         TX packets:20599066 errors:0 dropped:46 overruns:0 carrier:0
         collisions:0 txqueuelen:1000
         RX bytes:429758527 (409.8 MiB)  TX bytes:363937475 (347.0 MiB)
         Interrupt:209 Base address:0x4000

eth1_rena Link encap:Ethernet  HWaddr 00:60:97:7D:43:37
         inet addr:192.168.0.3  Bcast:192.168.0.255  Mask:255.255.255.0
         inet6 addr: fe80::260:97ff:fe7d:4337/64 Scope:Link
         UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
         RX packets:1645586 errors:0 dropped:0 overruns:0 frame:0
         TX packets:1372484 errors:2 dropped:0 overruns:0 carrier:0
         collisions:0 txqueuelen:1000
         RX bytes:1575637794 (1.4 GiB)  TX bytes:165806431 (158.1 MiB)
         Interrupt:201 Base address:0xa000

lo        Link encap:Local Loopback
         inet addr:127.0.0.1  Mask:255.0.0.0
         inet6 addr: ::1/128 Scope:Host
         UP LOOPBACK RUNNING  MTU:16436  Metric:1
         RX packets:287 errors:0 dropped:0 overruns:0 frame:0
         TX packets:287 errors:0 dropped:0 overruns:0 carrier:0
         collisions:0 txqueuelen:0
         RX bytes:536268 (523.6 KiB)  TX bytes:536268 (523.6 KiB)


And the contents of /etc/udev/rules.d/z25_persistent-net.rules:


# This file was automatically generated by the /lib/udev/write_net_rules
# program, probably run by the persistent-net-generator.rules rules file.
#
# You can modify it, as long as you keep each rule on a single line.

# PCI device 10b7:9050 (3c59x)
SUBSYSTEM=="net", DRIVERS=="?*", SYSFS{address}=="00:60:97:7d:43:37", NAME="eth0"
# PCI device 0x10ec:0x8169 (r8169)
SUBSYSTEM=="net", DRIVERS=="?*", ATTRS{address}=="00:08:54:b3:70:2e", NAME="eth0"
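
Note that both rules quoted above assign NAME="eth0" to different MAC
addresses, which may be what leaves the 3Com card stuck with the
"eth1_rename" name. A minimal sketch in Python for spotting such clashes
in a rules file; the function name find_duplicate_names and the regular
expressions are illustrative assumptions, not part of udev, and the
sample text is the two rules from this message with each rule on a
single line:

```python
import re
from collections import defaultdict

# Match keys on the MAC address (old SYSFS{} or newer ATTR{}/ATTRS{} spelling).
ADDRESS_RE = re.compile(r'(?:SYSFS|ATTRS?)\{address\}=="([^"]+)"')
# Assignment uses a single "=", so this does not match a NAME=="..." comparison.
NAME_RE = re.compile(r'\bNAME="([^"]+)"')


def find_duplicate_names(rules_text):
    """Return {interface name: [MAC, ...]} for names assigned more than once."""
    by_name = defaultdict(list)
    for line in rules_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        address = ADDRESS_RE.search(line)
        name = NAME_RE.search(line)
        if address and name:
            by_name[name.group(1)].append(address.group(1).lower())
    return {name: macs for name, macs in by_name.items() if len(macs) > 1}


SAMPLE_RULES = '''\
# PCI device 10b7:9050 (3c59x)
SUBSYSTEM=="net", DRIVERS=="?*", SYSFS{address}=="00:60:97:7d:43:37", NAME="eth0"
# PCI device 0x10ec:0x8169 (r8169)
SUBSYSTEM=="net", DRIVERS=="?*", ATTRS{address}=="00:08:54:b3:70:2e", NAME="eth0"
'''

if __name__ == "__main__":
    for name, macs in find_duplicate_names(SAMPLE_RULES).items():
        print(f"{name} is assigned to {len(macs)} cards: {', '.join(macs)}")
```

The sketch only reports the clash; which card should keep which name is
still a decision to make by hand in the rules file.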


The "_rename" suffix used to bother me (it doesn't make sense for a
developer to assume that a user will want to figure out how to rename
a device), but I've gotten used to it.

I've searched Google for this error, and a lot of discussions about
bugs in a recent kernel come up.  Maybe the solution will be to
upgrade my kernel once the bug is fixed.  Any other ideas?

Thanks,
Jason

