arp-scan triggers via-velocity "eth0: excessive work at interrupt"
It kind of surprised me that sending 254 arp packets by using the arp-scan tool (http://www.nta-monitor.com/tools/arp-scan/) on a /24 consistently triggers a burst of "eth0: excessive work at interrupt." This is a 600 MHz PIII, 2.6.22-rc4, via-velocity driver.

model name : Pentium III (Katmai)
stepping : 3
cpu MHz : 601.406
cache size : 512 KB

00:09.0 Ethernet controller [0200]: VIA Technologies, Inc. VT6120/VT6121/VT6122 Gigabit Ethernet Adapter [1106:3119] (rev 11)

Just double-checking... the program actually sent 463 packets (256 + a retry to all those that didn't respond to the first one), and it triggered 11 copies of the kernel message.

Command line: arp-scan -I eth0 -l [-v]

- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: IC Plus Corp IC Plus IP1000
[EMAIL PROTECTED] wrote:
> I wonder if it at some time will be included in the standard Linux kernel?
> I am of course interested because my main board has it built in, so I
> would be willing to test it.

"Me, too!" This has been discussed sporadically for the last year, and I can confirm that the driver source from the manufacturer's web page is starting to suffer bit rot, but after patching the more egregious breakage (references to , UTS_RELEASE and pci_module_init() stop it from compiling), it works. It doesn't even spew "eth0: excessive work at interrupt" when running arp-scan, unlike certain in-tree drivers. :-)

I got a bit of a rude shock today after doing an emergency replacement on a socket 939 motherboard and blandly assuring a Windows-experienced co-worker that despite a change from nForce to VIA KT890 chipset, the system should "just work". One round of floppy shuffle and code-fixing later, my co-worker is not impressed by the Linux version of "Have driver disk". :-)

Is anyone able to push it to completion? I have a vague idea that the vendor lost interest. (Should I write to Greg K-H and tell him "Free Linux Driver Developed!"?) I can play testing guinea-pig if needed.

Thanks!
Re: IC Plus Corp IC Plus IP1000
> Use the 'sundance' driver that's been in the kernel for quite a while.

Er... that driver specifically does not list the IP1000's PCI device ID (13f0:1023), nor does it support anything over 100 Mbit/s. Are you *quite* sure that adding 13f0:1023 to the sundance_pci_tbl is all that's required?
Re: IC Plus Corp IC Plus IP1000
The following hacks to bring it up to date got the vendor-supplied driver working for me. This is just fixing the things the compiler complained about; there may be other issues, but they don't seem to interfere with basic functionality.

diff --git a/Makefile b/Makefile
index c91b384..31e4172 100644
--- a/Makefile
+++ b/Makefile
@@ -77,10 +77,10 @@ ifeq ($(kernelFlag26),kernel26x)
 EXTRA_CFLAGS+=$(MAPPING_MODE)
 all:
-	$(MAKE) -C $(KernelBuildDir) SUBDIRS=$(PWD) modules
+	$(MAKE) -C $(KernelBuildDir) M=$(PWD)
 install:
-	install -m 644 -c ipg.$(kernelExtension) $(kernelMisc)
+	$(MAKE) -C $(KernelBuildDir) M=$(PWD) modules_install
 ipg-objs:=$(OBJS)
 obj-m+=$(TARGET)
diff --git a/ipg.h b/ipg.h
index 2d184d4..cefe5c8 100644
--- a/ipg.h
+++ b/ipg.h
@@ -98,8 +98,8 @@
  */
-#include
 #include
+#include
 #include
 #if ((LINUX_VERSION_CODE < KERNEL_VERSION(2,3,0)) && defined(MODVERSIONS))
diff --git a/ipg_main.c b/ipg_main.c
index c39ff4a..3a0dfd4 100644
--- a/ipg_main.c
+++ b/ipg_main.c
@@ -172,9 +172,11 @@ int ipg_io_config(IPG_DEVICE_TYPE *ipg_ethernet_device);
 #if LINUX_VERSION_CODE < KERNEL_VERSION(2,5,0)
 void ipg_interrupt_handler(int ipg_irq, void *device_instance, struct pt_regs *regs);
-#else
+#elif LINUX_VERSION_CODE < KERNEL_VERSION(2,6,19)
 static irqreturn_t ipg_interrupt_handler(int ipg_irq, void *device_instance, struct pt_regs *regs);
+#else
+static irqreturn_t ipg_interrupt_handler(int ipg_irq, void *device_instance);
 #endif
 void ipg_nic_txcleanup(IPG_DEVICE_TYPE *ipg_ethernet_device);
@@ -1425,9 +1427,11 @@ int ipg_io_config(IPG_DEVICE_TYPE *ipg_ethernet_device)
 #if LINUX_VERSION_CODE < KERNEL_VERSION(2,5,0)
 void ipg_interrupt_handler(int ipg_irq, void *device_instance, struct pt_regs *regs)
-#else
+#elif LINUX_VERSION_CODE < KERNEL_VERSION(2,6,19)
 static irqreturn_t ipg_interrupt_handler(int ipg_irq, void *device_instance, struct pt_regs *regs)
+#else
+static irqreturn_t ipg_interrupt_handler(int ipg_irq, void *device_instance)
 #endif
 {
 	int error;
@@ -1957,7 +1961,7 @@ int ipg_nic_open(IPG_DEVICE_TYPE *ipg_ethernet_device)
 	 */
 	if ((error = request_irq(sp->ipg_pci_device->irq, &ipg_interrupt_handler,
-	                         SA_SHIRQ,
+	                         IRQF_SHARED,
 	                         ipg_ethernet_device->name,
 	                         ipg_ethernet_device)) < 0) {
@@ -4041,7 +4045,10 @@ int init_module(void)
 #endif
 	IPG_DEBUG_MSG("init_module\n");
-#if LINUX_VERSION_CODE > KERNEL_VERSION(2,5,0)
+
+#if LINUX_VERSION_CODE >= KERNEL_VERSION(2,6,12)
+	return pci_register_driver(&ipg_pci_driver);
+#elif LINUX_VERSION_CODE > KERNEL_VERSION(2,5,0)
 	return pci_module_init(&ipg_pci_driver);
 #else
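The patch picks one of three interrupt-handler prototypes by comparing LINUX_VERSION_CODE against KERNEL_VERSION() thresholds (2.5.0 and 2.6.19). A minimal user-space sketch of that selection, assuming the simple 2.6-era KERNEL_VERSION() encoding; the enum and pick_irq_abi() are hypothetical names used only to make the #if chain testable at run time:

```c
/* Same encoding as KERNEL_VERSION() in 2.6-era <linux/version.h>. */
#define KERNEL_VERSION(a, b, c) (((a) << 16) + ((b) << 8) + (c))

/* Hypothetical labels for the three prototypes in the patch. */
enum irq_handler_abi {
	IRQ_ABI_VOID_PTREGS,         /* < 2.5.0: void return, pt_regs argument   */
	IRQ_ABI_IRQRETURN_PTREGS,    /* < 2.6.19: irqreturn_t, pt_regs argument  */
	IRQ_ABI_IRQRETURN_NO_PTREGS  /* >= 2.6.19: irqreturn_t, no pt_regs       */
};

/* Run-time mirror of the #if / #elif / #else chain for a LINUX_VERSION_CODE. */
static enum irq_handler_abi pick_irq_abi(int version_code)
{
	if (version_code < KERNEL_VERSION(2, 5, 0))
		return IRQ_ABI_VOID_PTREGS;
	else if (version_code < KERNEL_VERSION(2, 6, 19))
		return IRQ_ABI_IRQRETURN_PTREGS;
	return IRQ_ABI_IRQRETURN_NO_PTREGS;
}
```

For example, pick_irq_abi(KERNEL_VERSION(2, 6, 22)) lands in the last branch, which is why the unpatched prototype with struct pt_regs stopped compiling on current kernels.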
Re: [PATCH] ipg: add IP1000A driver to kernel tree
(Resend to netdev; already sent to relevant individuals.)

Here's a possible fix for the p[] array issues akpm noticed. This replaces them with calls to a new mdio_write_bits function. Boot-tested, passes net traffic, and mii-tool and mii-diag produce sensible output (including noticing link status changes).

Also, regarding

>> +	for (i = 0; i < IPG_TFDLIST_LENGTH; i++) {
>> +		offset = (u32) &sp->txd[i].next_desc - (u32) sp->txd;
>> +		printk(KERN_INFO "%2.2x %4.4x TFDNextPtr = %16.16lx\n", i,
>> +		       offset, (unsigned long) sp->txd[i].next_desc);
>> +
>> +		offset = (u32) &sp->txd[i].tfc - (u32) sp->txd;
>
> Is the u32 cast safe here on all architectures?

IPG_TFDLIST_LENGTH is 256, and sp->txd is an array of struct ipg_tx, which are 24 bytes each, so the most it can be is 6K. The result fits into 32 bits, so the inputs can be safely truncated. A more awkward way to write it would be

	offset = i * sizeof(struct ipg_tx) + offsetof(struct ipg_tx, tfc);

This patch is placed in the public domain; copyright abandoned.

(The final hunk is a space-TAB whitespace repair that git complained about when I imported the patch.)

diff --git a/drivers/net/ipg.c b/drivers/net/ipg.c
index 87a674c..6267a34 100644
--- a/drivers/net/ipg.c
+++ b/drivers/net/ipg.c
@@ -180,12 +180,31 @@ static u16 read_phy_bit(void __iomem * ioaddr, u8 phyctrlpolarity)
 }
 /*
+ * Transmit the given bits, MSB-first, through the MgmtData bit (bit 1)
+ * of the PhyCtrl register. 1 <= len <= 32. "ioaddr" is the register
+ * address, and "otherbits" are the values of the other bits.
+ */
+static void mdio_write_bits(void __iomem *ioaddr, u8 otherbits, u32 data, unsigned len)
+{
+	otherbits |= IPG_PC_MGMTDIR;
+	do {
+		/* IPG_PC_MGMTDATA is a power of 2; compiler knows to shift */
+		u8 d = ((data >> --len) & 1) * IPG_PC_MGMTDATA;
+		/* + rather than | lets compiler microoptimize better */
+		ipg_drive_phy_ctl_low_high(ioaddr, d + otherbits);
+	} while (len);
+}
+
+/*
  * Read a register from the Physical Layer device located
  * on the IPG NIC, using the IPG PHYCTRL register.
  */
 static int mdio_read(struct net_device * dev, int phy_id, int phy_reg)
 {
 	void __iomem *ioaddr = ipg_ioaddr(dev);
+	u8 const polarity = ipg_r8(PHY_CTRL) &
+			(IPG_PC_DUPLEX_POLARITY | IPG_PC_LINK_POLARITY);
+	unsigned i, data = 0;
 	/*
 	 * The GMII mangement frame structure for a read is as follows:
 	 *
@@ -199,75 +218,30 @@ static int mdio_read(struct net_device * dev, int phy_id, int phy_reg)
 	 * D = bit of read data (MSB first)
 	 *
 	 * Transmission order is 'Preamble' field first, bits transmitted
-	 * left to right (first to last).
+	 * left to right (msbit-first).
 	 */
-	struct {
-		u32 field;
-		unsigned int len;
-	} p[] = {
-		{ GMII_PREAMBLE, 32 },	/* Preamble */
-		{ GMII_ST, 2 },		/* ST */
-		{ GMII_READ, 2 },	/* OP */
-		{ phy_id, 5 },		/* PHYAD */
-		{ phy_reg, 5 },		/* REGAD */
-		{ 0x, 2 },		/* TA */
-		{ 0x, 16 },		/* DATA */
-		{ 0x, 1 }		/* IDLE */
-	};
-	unsigned int i, j;
-	u8 polarity, data;
-
-	polarity = ipg_r8(PHY_CTRL);
-	polarity &= (IPG_PC_DUPLEX_POLARITY | IPG_PC_LINK_POLARITY);
-
-	/* Create the Preamble, ST, OP, PHYAD, and REGAD field. */
-	for (j = 0; j < 5; j++) {
-		for (i = 0; i < p[j].len; i++) {
-			/* For each variable length field, the MSB must be
-			 * transmitted first. Rotate through the field bits,
-			 * starting with the MSB, and move each bit into the
-			 * the 1st (2^1) bit position (this is the bit position
-			 * corresponding to the MgmtData bit of the PhyCtrl
-			 * register for the IPG).
-			 *
-			 * Example: ST = 01;
-			 *
-			 * First write a '0' to bit 1 of the PhyCtrl
-			 * register, then write a '1' to bit 1 of the
-			 * PhyCtrl register.
-			 *
-			 * To do this, right shift the MSB of ST by the value:
-			 * [field length - 1 - #ST bits already written]
-			 * then left shift this result by 1.
-			 */
-			data = (p[j].field >> (p[j].len - 1 - i)) << 1;
-			data &= IPG_PC_MGMTDATA;
-			data |= polarity | IPG_PC_MGMTDIR;
-
-
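The u32 argument above relies on unsigned subtraction being modulo 2^32: truncating both addresses before subtracting cannot change a difference that is itself well under 2^32. A small user-space check of that claim, assuming struct ipg_tx is three 64-bit fields (next_desc, tfc, frag_info), which matches the 24 bytes quoted; ipg_tx_sketch and both helper names are hypothetical:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct ipg_tx: three 64-bit fields, 24 bytes. */
struct ipg_tx_sketch {
	uint64_t next_desc;
	uint64_t tfc;
	uint64_t frag_info;
};

#define TFDLIST_LENGTH 256   /* IPG_TFDLIST_LENGTH, as quoted in the message */

/* Mirrors the driver's (u32) casts: truncate both addresses, then subtract.
 * uint32_t arithmetic wraps modulo 2^32, so a small true offset survives. */
static uint32_t offset_by_u32(const struct ipg_tx_sketch *txd, unsigned i)
{
	return (uint32_t)(uintptr_t)&txd[i].tfc - (uint32_t)(uintptr_t)txd;
}

/* The "more awkward" layout-based form suggested in the message. */
static size_t offset_by_layout(unsigned i)
{
	return i * sizeof(struct ipg_tx_sketch) +
	       offsetof(struct ipg_tx_sketch, tfc);
}
```

For the last descriptor (i = 255) both forms give 255 * 24 + 8 = 6128 bytes, comfortably inside the 6K bound mentioned above.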
2.6.23-rc8 network problem. Mem leak? ip1000a?
Uniprocessor Athlon 64, 64-bit kernel, 2G ECC RAM, 2.6.23-rc8 + linuxpps (5.0.0) + ip1000a driver (patch from http://marc.info/?l=linux-netdev&m=118980588419882).

After a few hours of operation, ntp loses the ability to send packets. sendto() returns -EAGAIN to everything, including the 24-byte UDP packet that is a response to ntpq.

-EAGAIN on a sendto() makes me think of memory problems, so here's meminfo at the time:

### FAILED state ###
# cat /proc/meminfo
MemTotal: 2059384 kB
MemFree: 15332 kB
Buffers: 665608 kB
Cached: 18212 kB
SwapCached: 0 kB
Active: 380384 kB
Inactive: 355020 kB
SwapTotal: 5855208 kB
SwapFree: 5854552 kB
Dirty: 28504 kB
Writeback: 0 kB
AnonPages: 51608 kB
Mapped: 11852 kB
Slab: 1285348 kB
SReclaimable: 152968 kB
SUnreclaim: 1132380 kB
PageTables: 3888 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
CommitLimit: 6884900 kB
Committed_AS: 590528 kB
VmallocTotal: 34359738367 kB
VmallocUsed: 265628 kB
VmallocChunk: 34359472059 kB

Killing and restarting ntpd gets it running again for a few hours. Here's the state after about two hours of successful operation. (I'll try to remember to run slabinfo before killing ntpd next time.)
### WORKING state ###
# cat /proc/meminfo
MemTotal: 2059384 kB
MemFree: 20252 kB
Buffers: 242688 kB
Cached: 41556 kB
SwapCached: 200 kB
Active: 285012 kB
Inactive: 147348 kB
SwapTotal: 5855208 kB
SwapFree: 5854212 kB
Dirty: 36 kB
Writeback: 0 kB
AnonPages: 148052 kB
Mapped: 12756 kB
Slab: 1582512 kB
SReclaimable: 134348 kB
SUnreclaim: 1448164 kB
PageTables: 4500 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
CommitLimit: 6884900 kB
Committed_AS: 689956 kB
VmallocTotal: 34359738367 kB
VmallocUsed: 265628 kB
VmallocChunk: 34359472059 kB

# /usr/src/linux/Documentation/vm/slabinfo
Name Objects Objsize Space Slabs/Part/Cpu O/S O %Fr %Ef Flg
:016 1478 16 24.5K 6/3/1 256 0 50 96 *
:024 170 24 4.0K 1/0/1 170 0 0 99 *
:032 1339 32 45.0K 11/2/1 128 0 18 95 *
:040 102 40 4.0K 1/0/1 102 0 0 99 *
:064 5937 64 413.6K 101/15/1 64 0 14 91 *
:072 56 72 4.0K 1/0/1 56 0 0 98 *
:088 6946 88 618.4K 151/0/1 46 0 0 98 *
:096 23851 96 2.5M 616/144/1 42 0 23 90 *
:128 730 128 114.6K 28/6/1 32 0 21 81 *
:136 232 136 36.8K 9/6/1 30 0 66 85 *
:192 474 192 98.3K 24/4/1 21 0 16 92 *
:256 1385376 256 354.6M 86587/0/1 16 0 0 99 *
:320 12 304 4.0K 1/0/1 12 0 0 89 *A
:384 359 384 180.2K 44/23/1 10 0 52 76 *A
:512 1384316 512 708.7M 173040/1/1 8 0 0 99 *
:640 72 616 53.2K 13/5/1 6 0 38 83 *A
:704 1870 696 1.3M 170/0/1 11 1 0 93 *A
:0001024 427 1024 454.6K 111/9/1 4 0 8 96 *
:0001472 150 1472 245.7K 30/0/1 5 1 0 89 *
:0002048 158991 2048 325.7M 39759/25/1 4 1 0 99 *
:0004096 51 4096 245.7K 30/9/1 2 1 30 85 *
Acpi-State 51 80 4.0K 1/0/1 51 0 0 99
anon_vma 1032 16 28.6K 7/5/1 170 0 71 57
bdev_cache 43 720 36.8K 9/1/1 5 0 11 83 Aa
blkdev_requests 42 288 12.2K 3/0/1 14 0 0 98
buffer_head 59173 104 11.1M 2734/1690/1 39 0 61 54 a
cfq_io_context 223 152 40.9K 10/6/1 26 0 60 82
dentry 98641 192 19.7M 4813/274/1 21 0 5 96 a
ext3_inode_cache 115690 688 86.3M 10545/77/1 11 1 0 92 a
file_lock_cache 23 168 4.0K 1/0/1 23 0 0 94
idr_layer_cache 118 528 69.6K 17/1/1 7 0 5 89
inode_cache 1365 528 798.7K 195/0/1 7 0 0 90 a
kmalloc-131072 1 131072 131.0K 1/0/1 1 5 0 100
kmalloc-16384 8 16384 131.0K 8/0/1 1 2 0 100
kmalloc-32768 1 32768 32.7K 1/0/1 1 3 0 100
kmalloc-8 1535 8 12.2K 3/1/1 512 0 33 99
kmalloc-819
Re: 2.6.23-rc8 network problem. Mem leak? ip1000a?
> ntpd. Sounds like pps leaking to me.

That's what I'd think, except that pps does no allocation in the normal running state, so there's nothing to leak. The interrupt path just records the time in some preallocated, static buffers and wakes up blocked readers. The read path copies the latest data out of those static buffers. There's allocation when the PPS device is created, and more when it's opened.

>> Can anyone offer some diagnosis advice?
> CONFIG_DEBUG_SLAB_LEAK?

Ah, thank you; I've been using SLUB, which doesn't support this option. Here's what I've extracted. I've only presented the top few slab_allocators and a small subset of the oom-killer messages, but I have full copies if desired.

Unfortunately, I've discovered that the machine doesn't live in this unhappy state forever. Indeed, I'm not sure if killing ntpd "fixes" anything; my previous observations may have been optimistic ignorance.

(For my own personal reference looking for more oom-kill, I nuked ntpd at 06:46:56. And the oom-kills are continuing, with the latest at 07:43:52.)

Anyway, I have a bunch of information from the slab_allocators file, but I'm not quite sure how to make sense of it.
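The "nothing to leak" argument rests on a simple pattern: a record allocated once, overwritten by the interrupt path and copied out by the read path. A minimal user-space model of that pattern follows; the names are hypothetical, it is not the linuxpps code, and it omits the locking and reader wake-up a real driver needs:

```c
#include <stdint.h>

/* Hypothetical PPS record; lives in static storage, never allocated. */
struct pps_sample {
	int64_t  sec;
	int32_t  nsec;
	uint32_t sequence;   /* bumped on every recorded event */
};

static struct pps_sample latest;

/* "Interrupt path": overwrite the preallocated record in place. */
static void pps_record(int64_t sec, int32_t nsec)
{
	latest.sec = sec;
	latest.nsec = nsec;
	latest.sequence++;
}

/* "Read path": copy out the most recent record by value. */
static struct pps_sample pps_read_latest(void)
{
	return latest;
}
```

However many events arrive, memory use stays at one struct, which is why a steadily growing slab points elsewhere.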
With a machine in the unhappy state and firing the OOM killer, the top 20 slab_allocators are:

$ sort -rnk2 /proc/slab_allocators | head -20
skbuff_head_cache: 1712746 __alloc_skb+0x31/0x121
size-512: 1706572 tcp_send_ack+0x23/0x102
skbuff_fclone_cache: 149113 __alloc_skb+0x31/0x121
size-2048: 148500 tcp_sendmsg+0x1b5/0xae1
sysfs_dir_cache: 5289 sysfs_new_dirent+0x4b/0xec
size-512: 2613 sock_alloc_send_skb+0x93/0x1dd
Acpi-Operand: 2014 acpi_ut_allocate_object_desc_dbg+0x34/0x6e
size-32: 1995 sysfs_new_dirent+0x29/0xec
vm_area_struct: 1679 mmap_region+0x18f/0x421
size-512: 1618 tcp_xmit_probe_skb+0x1f/0xcd
size-512: 1571 arp_create+0x4e/0x1cd
vm_area_struct: 1544 copy_process+0x9f1/0x1108
anon_vma: 1448 anon_vma_prepare+0x29/0x74
filp: 1201 get_empty_filp+0x44/0xcd
UDP: 1173 sk_alloc+0x25/0xaf
size-128: 1048 r1bio_pool_alloc+0x23/0x3b
size-128: 1024 nfsd_cache_init+0x2d/0xcf
Acpi-Namespace: 973 acpi_ns_create_node+0x2c/0x45
vm_area_struct: 717 split_vma+0x33/0xe5
dentry: 594 d_alloc+0x24/0x177

I'm not sure quite what "normal" numbers are, but I do wonder why there are 1.7 million TCP acks buffered in the system. Shouldn't they be transmitted and deallocated pretty quickly?

This machine receives more data than it sends, so I'd expect acks to outnumber "real" packets. Could the ip1000a driver's transmit path be leaking skbs somehow? That would also explain the "flailing" of the oom-killer; it can't associate the allocations with a process.
Here's /proc/meminfo:
MemTotal: 1035756 kB
MemFree: 43508 kB
Buffers: 72920 kB
Cached: 224056 kB
SwapCached: 344916 kB
Active: 664976 kB
Inactive: 267656 kB
SwapTotal: 4950368 kB
SwapFree: 3729384 kB
Dirty: 6460 kB
Writeback: 0 kB
AnonPages: 491708 kB
Mapped: 79232 kB
Slab: 41324 kB
SReclaimable: 25008 kB
SUnreclaim: 16316 kB
PageTables: 8132 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
CommitLimit: 5468244 kB
Committed_AS: 1946008 kB
VmallocTotal: 253900 kB
VmallocUsed: 2672 kB
VmallocChunk: 251228 kB

I have a lot of oom-killer messages that I have saved but am not posting for size reasons, but here are some example backtraces. They're not very helpful to me; do they enlighten anyone else?

02:50:20: apcupsd invoked oom-killer: gfp_mask=0xd0, order=1, oomkilladj=0
02:50:22:
02:50:22: Call Trace:
02:50:22: [] out_of_memory+0x71/0x1ba
02:50:22: [] __alloc_pages+0x255/0x2d7
02:50:22: [] cache_alloc_refill+0x2f4/0x60a
02:50:22: [] hiddev_ioctl+0x579/0x919
02:50:22: [] kmem_cache_alloc+0x57/0x95
02:50:22: [] hiddev_ioctl+0x579/0x919
02:50:22: [] cp_new_stat+0xe5/0xfd
02:50:22: [] hiddev_read+0x199/0x1f6
02:50:22: [] default_wake_function+0x0/0xe
02:50:22: [] do_ioctl+0x45/0x50
02:50:22: [] vfs_ioctl+0x1f9/0x20b
02:50:22: [] sys_ioctl+0x3c/0x5d
02:50:22: [] system_call+0x7e/0x83
02:52:18: postgres invoked oom-killer: gfp_mask=0xd0, order=1, oomkilladj=0
02:52:18:
02:52:18: Call Trace:
02:52:18: [] out_of_memory+0x71/0x1ba
02:52:18: [] __alloc_pages+0x255/0x2d7
02:52:18: [] poison_obj+0x26/0x2f
02:52:18: [] __get_free_pages+0x40/0x79
02:52:18: [] copy_process+0xb0/0x1108
02:52:18: [] alloc_pid+0x1f/0x27d
02:52:18: [] do_fork+0xb1/0x1a7
02:52:18: [] copy_user_generic_string+0x17/0x40
02:52:18: [] system_call+0x7e/0x83
02:52:18: [] ptregscall_common+0x67/0xb0
02:52:18: kthreadd invoked oom-killer: gfp_mask=0xd0, order=1, oomkilladj=0
02:52:18:
02:52:18: Call Trace:
02:52:18: [] out_of_memory+0x71/0x1ba
02:52:18: [] __alloc_pages+0x255/0x2d7
02:52:18: [] __get_free_pages+0x40/0x79
02:52:18: [] copy_process+0xb0/0x1108
02:52:18: [] alloc_pid+0x1f/0x27d
02:52:18: [] do_fork+0xb1/0x1a7
02:52:18: [] update_curr+0xe6/0x10b
02:52:18: [] de
Re: 2.6.23-rc8 network problem. Mem leak? ip1000a?
> OK. Did you try to reproduce it without the pps patch applied?

No. But I've yanked the ip1000a driver (using the old crufty vendor-supplied out-of-kernel module) and the problems are GONE.

>> This machine receives more data than it sends, so I'd expect acks to
>> outnumber "real" packets. Could the ip1000a driver's transmit path be
>> leaking skbs somehow?
> Absolutely. Normally a driver's transmit completion interrupt handler will
> run dev_kfree_skb_irq() against the skbs which have been fully sent.
>
> However it'd be darned odd if the driver was leaking only tcp acks.

It's leaking lots of things... you can see ARP packets in there and all sorts of stuff. But the big traffic hog is BackupPC doing inbound rsyncs all night long, which generates a lot of acks. Those are the packets it sends, so those are the packets that get leaked.

> I can find no occurrence of "dev_kfree_skb" in drivers/net/ipg.c, which is
> suspicious.

Look for "IPG_DEV_KFREE_SKB", which is a wrapper macro. (Or just add "-i" to your grep.) It should probably be deleted (it just expands to dev_kfree_skb), but was presumably useful to someone during development.

> Where did you get your ipg.c from, btw? davem's tree? rc8-mm1? rc8-mm2??

As I wrote originally, I got it from http://marc.info/?l=linux-netdev&m=118980588419882, which was a request for mainline submission. If there are other patches floating around, I'm happy to try them. Now that I know what to look for, it's easy to spot the leak before OOM.

> I assume that meminfo was not captured when the system was ooming? There
> isn't much slab there.

Oops, sorry. I captured slabinfo but not meminfo.

Thank you very much! Sorry to jump the gun and post a lot before I had all the data, but if it WAS a problem in -rc8, I wanted to mention it before -final.

Now, the rush is to get the ip1000a driver fixed before the merge window opens. I've added all the ip1000a developers to the Cc: list in an attempt to speed that up.
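The reply above describes the usual transmit-completion contract: every buffer handed to the hardware must be freed once the hardware reports it sent. A minimal user-space model of a TX ring with free-running head/tail counters, assuming malloc()/free() as stand-ins for skb allocation and dev_kfree_skb_irq(); the ring layout and function names are hypothetical, not taken from ipg.c:

```c
#include <stdlib.h>

#define TX_RING_LEN 16   /* hypothetical ring size; must divide 2^32 */

struct tx_ring {
	void *buf[TX_RING_LEN];  /* stands in for struct sk_buff pointers */
	unsigned head;           /* next slot to fill (transmit path)     */
	unsigned tail;           /* next slot to reclaim (completion)     */
};

/* Queue a buffer for "transmission"; returns 0, or -1 if the ring is full. */
static int tx_enqueue(struct tx_ring *r, void *b)
{
	if (r->head - r->tail == TX_RING_LEN)
		return -1;
	r->buf[r->head++ % TX_RING_LEN] = b;
	return 0;
}

/* Completion: free up to `done` buffers the "hardware" reports as sent.
 * The free() here is the step dev_kfree_skb_irq() performs in a driver;
 * any path that advances tail without it leaks one buffer per packet. */
static unsigned tx_complete(struct tx_ring *r, unsigned done)
{
	unsigned freed = 0;

	while (freed < done && r->tail != r->head) {
		unsigned slot = r->tail++ % TX_RING_LEN;
		free(r->buf[slot]);
		r->buf[slot] = NULL;
		freed++;
	}
	return freed;
}

static unsigned tx_outstanding(const struct tx_ring *r)
{
	return r->head - r->tail;
}
```

Since every sent packet passes through the completion path, a leak there would scale with whatever the machine transmits most, here BackupPC's acks.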
The mess that is SIGPIPE
I noticed that FreeBSD has a useful SOL_SOCKET option, SO_NOSIGPIPE, which is a "sticky" version of MSG_NOSIGNAL. Particularly useful for libraries, it disables SIGPIPE on a particular socket without disabling it globally.

Anyway, this led me to look at the implementation of SIGPIPE and MSG_NOSIGNAL and... it's a bit of a mess. Some places honor MSG_NOSIGNAL, but there are a lot of code paths that don't appear to. So I started thinking about a cleanup...

Currently, SIGPIPE is sent from dozens of places that return EPIPE. What if they could all be deleted, and just a few system calls (write(), writev(), send(), sendto() and sendmsg(); oh, yes, and sendfile()) checked for EPIPE from the VFS calls they make and generated SIGPIPE appropriately?

The only thing is figuring out where in the call chain to put it:

sys_send() -> sys_sendto() -> sock_sendmsg() -> __sock_sendmsg() -> sock->ops->sendmsg
sys_write() -> vfs_write() -> do_sync_write -> filp->f_op->aio_write -> sock_aio_write() -> do_sock_write() -> __sock_sendmsg()
sys_writev() -> vfs_writev() -> do_readv_writev() -> do_loop_readv_writev() -> file->f_op->write -> -> sock_aio_write() -> do_sock_write() -> __sock_sendmsg()

kernel_sendmsg() also calls sock_sendmsg(), and it would save a bunch of fiddling with MSG_NOSIGNAL if kernel_sendmsg() never generated signals. That implies that the check should be at the sys_sendto() layer or higher.

Anyway, looking into implementing this, I found a zillion places where the logic looked a little unclear, such as the OCFS2 code. I'm not convinced that a bug there can't generate SIGPIPE unexpectedly.

Before I tackle this rewrite, I'd like to ask if someone knows what the code is *supposed* to be doing, and can confirm that SIGPIPE should be generated if and only if the write is done by a user-level system call that can return EPIPE. So all the buried network file systems should never generate it. Is that right?

Thanks for the insights.
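For reference, the user-visible behaviour under discussion: a send() to a stream socket whose peer has closed fails with EPIPE, and MSG_NOSIGNAL suppresses the SIGPIPE for that one call. A minimal sketch using a local socketpair(); send_to_closed_peer() is a hypothetical helper, and the SO_NOSIGPIPE branch only compiles on systems (such as FreeBSD) that define that option:

```c
#include <errno.h>
#include <sys/socket.h>
#include <unistd.h>

/* Writes to a stream socket whose peer has already closed.
 * Returns the errno from send(), 0 if it unexpectedly succeeded,
 * or -1 if the socketpair could not be created. */
static int send_to_closed_peer(void)
{
	int sv[2];
	int flags = 0;

	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
		return -1;
	close(sv[1]);                 /* the peer goes away */

#ifdef MSG_NOSIGNAL
	flags |= MSG_NOSIGNAL;        /* per-call: no SIGPIPE, just EPIPE */
#elif defined(SO_NOSIGPIPE)
	int one = 1;                  /* per-socket, "sticky" opt-out */
	setsockopt(sv[0], SOL_SOCKET, SO_NOSIGPIPE, &one, sizeof one);
#endif

	ssize_t n = send(sv[0], "x", 1, flags);
	int err = (n < 0) ? errno : 0;

	close(sv[0]);
	return err;
}
```

On Linux, calling send_to_closed_peer() should return EPIPE and the process keeps running; dropping MSG_NOSIGNAL from flags would instead deliver SIGPIPE, whose default action terminates the process. That per-call versus per-socket difference is exactly why a sticky option is attractive for library code.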
Re: 2.6.24-rc6 oops in net_tx_action
> [EMAIL PROTECTED] <[EMAIL PROTECTED]> :
>> Kernel is 2.6.24-rc6 + linuxpps patches, which are all to the serial
>> port driver.
>>
>> 2.6.23 was known stable. I haven't tested earlier 2.6.24 releases.
>> I think it happened once before; I got a black-screen lockup with
>> keyboard LEDs blinking, but that was with X running so I couldn't see a
>> console oops. But given that I installed 2.6.24-rc6 about 24 hours ago,
>> that's a disturbing pattern.
> It is probably this one:
>
> http://marc.info/?t=11978279403&r=1&w=2

Thanks! I got the patch from
http://marc.info/?l=linux-netdev&m=119756785219214
(Which didn't make it into -rc7; please fix!)
and am recompiling now.

Actually, I grabbed the hardware mitigation follow-on patch while I was at it. I notice that the comment explaining the format of CSR11 and what 0x80F1 means got lost; perhaps it would be nice to resurrect it?

0x80F1
8000 = Cycle size (timer control)
7800 = TX timer in 16 * Cycle size
0700 = No. pkts before Int. (0 = interrupt per packet)
00F0 = Rx timer in Cycle size
000E = No. pkts before Int.
0001 = Continues mode (CM)

(Boy, that tulip driver could use a whitespace overhaul.)
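To see what 0x80F1 selects, the masks in that comment can be applied directly. A small decoder, working on the 16-bit value exactly as the comment presents it; the struct and field names are hypothetical, and reading the 0700 field as the TX packet count (it follows the TX timer) is an assumption:

```c
#include <stdint.h>

/* Field names are hypothetical; masks come from the quoted comment. */
struct csr11_fields {
	unsigned cycle_size; /* 0x8000: cycle size (timer control)            */
	unsigned tx_timer;   /* 0x7800: TX timer, in 16 * cycle size          */
	unsigned tx_pkts;    /* 0x0700: packets before interrupt (assumed TX) */
	unsigned rx_timer;   /* 0x00F0: RX timer, in cycle size               */
	unsigned rx_pkts;    /* 0x000E: packets before RX interrupt           */
	unsigned cm;         /* 0x0001: continuous mode                       */
};

static struct csr11_fields decode_csr11(uint32_t v)
{
	struct csr11_fields f;

	f.cycle_size = (v & 0x8000) >> 15;
	f.tx_timer   = (v & 0x7800) >> 11;
	f.tx_pkts    = (v & 0x0700) >> 8;
	f.rx_timer   = (v & 0x00F0) >> 4;
	f.rx_pkts    = (v & 0x000E) >> 1;
	f.cm         =  v & 0x0001;
	return f;
}
```

Decoding 0x80F1 gives cycle size 1, TX timer 0 and TX count 0, RX timer 15, RX count 0, and continuous mode on: in other words, mitigation only on the receive side, using the timer alone.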
Re: 2.6.24-rc6 oops in net_tx_action
>> Thanks! I got the patch from
>> http://marc.info/?l=linux-netdev&m=119756785219214
>> (Which didn't make it into -rc7; please fix!)
>> and am recompiling now.
> Jeff is busy so he's asked me to pick up the more important
> driver bug fixes that get posted.
>
> I'll push this around, thanks.

Much obliged. It's only 11 hours of uptime, but no problems so far, even trying abusive things like "ping -f -l64 -s8000".
Re: [PATCH] via-velocity big-endian support
It doesn't look like you need a test report, but here's one anyway... I grabbed the patch series from git and am running it successfully right now.
Re: 2.6.23-rc8 network problem. Mem leak? ip1000a?
Just to keep the issue open, drivers/net/ipg.c currently in 2.6.24-rc6 still leaks skbuffs like a sieve. Run it for a few hours with network traffic and the machine swaps like crazy while the oom killer goes nuts.

diff --git a/drivers/net/Kconfig b/drivers/net/Kconfig
index d9107e5..4fa392c 100644
--- a/drivers/net/Kconfig
+++ b/drivers/net/Kconfig
@@ -172,6 +172,10 @@ config IP1000
 	select MII
 	---help---
 	  This driver supports IP1000 gigabit Ethernet cards.
+	  It works, but suffers from a memory leak. Significant
+	  use will consume unswappable kernel memory until the
+	  machine runs out of memory and crashes. Thus, this
+	  driver cannot be considered usable at the present time.
 
 	  To compile this driver as a module, choose M here: the module
 	  will be called ipg. This is recommended.

Or should it be demoted to BROKEN? It compiles, and sends and receives packets, which is better than a lot of BROKEN drivers.
[PATCH 1/3] drivers/net/ipg.c: Fix skbuff leak
Prompted by davem, this attempt at fixing the memory leak actually appears to work. At least, leaving ping -f -s1472 -l64 running doesn't drop packets and doesn't show up in /proc/slabinfo.

---
diff --git a/drivers/net/ipg.c b/drivers/net/ipg.c
index dbd23bb..a0dfba5 100644
--- a/drivers/net/ipg.c
+++ b/drivers/net/ipg.c
@@ -1110,10 +1110,9 @@ enum {
 	Frame_WithStart_WithEnd = 11
 };
 
-inline void ipg_nic_rx_free_skb(struct net_device *dev)
+inline void ipg_nic_rx_free_skb(struct net_device *dev, unsigned entry)
 {
 	struct ipg_nic_private *sp = netdev_priv(dev);
-	unsigned int entry = sp->rx_current % IPG_RFDLIST_LENGTH;
 
 	if (sp->RxBuff[entry]) {
 		struct ipg_rx *rxfd = sp->rxd + entry;
@@ -1308,7 +1307,7 @@ static void ipg_nic_rx_with_end(struct net_device *dev,
 		jumbo->CurrentSize = 0;
 		jumbo->skb = NULL;
 
-		ipg_nic_rx_free_skb(dev);
+		ipg_nic_rx_free_skb(dev, entry);
 	} else {
 		IPG_DEV_KFREE_SKB(jumbo->skb);
 		jumbo->FoundStart = 0;
@@ -1337,7 +1336,7 @@ static void ipg_nic_rx_no_start_no_end(struct net_device *dev,
 			}
 		}
 		dev->last_rx = jiffies;
-		ipg_nic_rx_free_skb(dev);
+		ipg_nic_rx_free_skb(dev, entry);
 	}
 	} else {
 		IPG_DEV_KFREE_SKB(jumbo->skb);
[PATCH 3/3] drivers/net/ipg.c: fix horrible mdio_read and _write
akpm noticed that this code was awful.

 ipg.c | 158 +-
 1 file changed, 43 insertions(+), 115 deletions(-)

should be sufficient justification all by itself.

---
diff --git a/drivers/net/ipg.c b/drivers/net/ipg.c
index 3860fcd..b3d3fc8 100644
--- a/drivers/net/ipg.c
+++ b/drivers/net/ipg.c
@@ -202,12 +202,31 @@ static u16 read_phy_bit(void __iomem * ioaddr, u8 phyctrlpolarity)
 }
 /*
+ * Transmit the given bits, MSB-first, through the MgmtData bit (bit 1)
+ * of the PhyCtrl register. 1 <= len <= 32. "ioaddr" is the register
+ * address, and "otherbits" are the values of the other bits.
+ */
+static void mdio_write_bits(void __iomem *ioaddr, u8 otherbits, u32 data, unsigned len)
+{
+	otherbits |= IPG_PC_MGMTDIR;
+	do {
+		/* IPG_PC_MGMTDATA is a power of 2; compiler knows to shift */
+		u8 d = ((data >> --len) & 1) * IPG_PC_MGMTDATA;
+		/* + rather than | lets compiler microoptimize better */
+		ipg_drive_phy_ctl_low_high(ioaddr, d + otherbits);
+	} while (len);
+}
+
+/*
  * Read a register from the Physical Layer device located
  * on the IPG NIC, using the IPG PHYCTRL register.
  */
 static int mdio_read(struct net_device * dev, int phy_id, int phy_reg)
 {
 	void __iomem *ioaddr = ipg_ioaddr(dev);
+	u8 const polarity = ipg_r8(PHY_CTRL) &
+			(IPG_PC_DUPLEX_POLARITY | IPG_PC_LINK_POLARITY);
+	unsigned i, data = 0;
 	/*
 	 * The GMII mangement frame structure for a read is as follows:
 	 *
@@ -221,75 +240,30 @@ static int mdio_read(struct net_device * dev, int phy_id, int phy_reg)
 	 * D = bit of read data (MSB first)
 	 *
 	 * Transmission order is 'Preamble' field first, bits transmitted
-	 * left to right (first to last).
+	 * left to right (msbit-first).
 	 */
-	struct {
-		u32 field;
-		unsigned int len;
-	} p[] = {
-		{ GMII_PREAMBLE, 32 },	/* Preamble */
-		{ GMII_ST, 2 },		/* ST */
-		{ GMII_READ, 2 },	/* OP */
-		{ phy_id, 5 },		/* PHYAD */
-		{ phy_reg, 5 },		/* REGAD */
-		{ 0x, 2 },		/* TA */
-		{ 0x, 16 },		/* DATA */
-		{ 0x, 1 }		/* IDLE */
-	};
-	unsigned int i, j;
-	u8 polarity, data;
-
-	polarity = ipg_r8(PHY_CTRL);
-	polarity &= (IPG_PC_DUPLEX_POLARITY | IPG_PC_LINK_POLARITY);
-
-	/* Create the Preamble, ST, OP, PHYAD, and REGAD field. */
-	for (j = 0; j < 5; j++) {
-		for (i = 0; i < p[j].len; i++) {
-			/* For each variable length field, the MSB must be
-			 * transmitted first. Rotate through the field bits,
-			 * starting with the MSB, and move each bit into the
-			 * the 1st (2^1) bit position (this is the bit position
-			 * corresponding to the MgmtData bit of the PhyCtrl
-			 * register for the IPG).
-			 *
-			 * Example: ST = 01;
-			 *
-			 * First write a '0' to bit 1 of the PhyCtrl
-			 * register, then write a '1' to bit 1 of the
-			 * PhyCtrl register.
-			 *
-			 * To do this, right shift the MSB of ST by the value:
-			 * [field length - 1 - #ST bits already written]
-			 * then left shift this result by 1.
-			 */
-			data = (p[j].field >> (p[j].len - 1 - i)) << 1;
-			data &= IPG_PC_MGMTDATA;
-			data |= polarity | IPG_PC_MGMTDIR;
-
-			ipg_drive_phy_ctl_low_high(ioaddr, data);
-		}
-	}
-
-	send_three_state(ioaddr, polarity);
-
-	read_phy_bit(ioaddr, polarity);
+	mdio_write_bits(ioaddr, polarity, GMII_PREAMBLE, 32);
+	mdio_write_bits(ioaddr, polarity, GMII_ST<<12 | GMII_READ << 10 |
+					phy_id << 5 | phy_reg, 14);
 	/*
 	 * For a read cycle, the bits for the next two fields (TA and
 	 * DATA) are driven by the PHY (the IPG reads these bits).
 	 */
-	for (i = 0; i < p[6].len; i++) {
-		p[6].field |=
-		    (read_phy_bit(ioaddr, polarity) << (p[6].len - 1 - i));
-	}
+	send_three_state(ioaddr, polarity);	/* TA first bit */
+	(void)read_phy_bit(ioaddr, polarity);	/* TA second bit */
+
+	for (i = 0; i < 16; i++)
+		data += data + read_phy_bit(ioaddr, polarity);
+	/* Trailing idle */
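The new mdio_read() packs ST, OP, PHYAD and REGAD into one 14-bit word and clocks it out MSB-first, instead of walking the p[] table field by field. A user-space model of both steps, assuming ST = 01 and read OP = 10 as in IEEE 802.3 Clause 22 (the patch's GMII_ST and GMII_READ are assumed to hold those values). A bit log replaces the register writes, and the loop uses the `while (len--)` form suggested later in the thread, which also handles len == 0 safely; all names here are hypothetical:

```c
#include <stdint.h>

#define MDIO_MAX_BITS 64

/* Hypothetical stand-in for ipg_drive_phy_ctl_low_high(): records bits. */
struct bit_log {
	unsigned n;
	uint8_t bit[MDIO_MAX_BITS];
};

static void drive_bit(struct bit_log *log, unsigned bit)
{
	if (log->n < MDIO_MAX_BITS)
		log->bit[log->n++] = (uint8_t)bit;
}

/* Send the low `len` bits of `data`, most significant first (len <= 32). */
static void mdio_write_bits_sketch(struct bit_log *log, uint32_t data, unsigned len)
{
	while (len--)
		drive_bit(log, (data >> len) & 1);
}

/* Clause 22 header after the preamble: ST(2) OP(2) PHYAD(5) REGAD(5). */
static uint32_t mdio_read_header(unsigned st, unsigned op,
				 unsigned phy_id, unsigned reg)
{
	return (uint32_t)(st & 3) << 12 | (uint32_t)(op & 3) << 10 |
	       (uint32_t)(phy_id & 31) << 5 | (reg & 31);
}
```

With ST = 1, OP = 2, PHY 1 and register 0, the header is 6176 (binary 01 10 00001 00000), and the bit log comes out in exactly that left-to-right order.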
[PATCH 2/3] drivers/net/ipg.c: convert Jumbo.FoundStart to bool
This is a fairly basic code cleanup that annoyed me while working on the first patch.

---
diff --git a/drivers/net/ipg.h b/drivers/net/ipg.h
index d5d092c..5d7cc84 100644
--- a/drivers/net/ipg.h
+++ b/drivers/net/ipg.h
@@ -789,11 +789,6 @@ struct ipg_rx {
 	__le64 frag_info;
 };
 
-struct SJumbo {
-	int FoundStart;
-	int CurrentSize;
-	struct sk_buff *skb;
-};
 /* Structure of IPG NIC specific data. */
 struct ipg_nic_private {
 	void __iomem *ioaddr;
@@ -809,7 +804,11 @@ struct ipg_nic_private {
 	unsigned int rx_dirty;
 	// Add by Grace 2005/05/19
 #ifdef JUMBO_FRAME
-	struct SJumbo Jumbo;
+	struct SJumbo {
+		bool FoundStart;
+		int CurrentSize;
+		struct sk_buff *skb;
+	} Jumbo;
 #endif
 	unsigned int rx_buf_sz;
 	struct pci_dev *pdev;
diff --git a/drivers/net/ipg.c b/drivers/net/ipg.c
index a0dfba5..3860fcd 100644
--- a/drivers/net/ipg.c
+++ b/drivers/net/ipg.c
@@ -1206,7 +1206,7 @@ static void ipg_nic_rx_with_start_and_end(struct net_device *dev,
 	if (jumbo->FoundStart) {
 		IPG_DEV_KFREE_SKB(jumbo->skb);
-		jumbo->FoundStart = 0;
+		jumbo->FoundStart = false;
 		jumbo->CurrentSize = 0;
 		jumbo->skb = NULL;
 	}
@@ -1257,7 +1257,7 @@ static void ipg_nic_rx_with_start(struct net_device *dev,
 	skb_put(skb, IPG_RXFRAG_SIZE);
-	jumbo->FoundStart = 1;
+	jumbo->FoundStart = true;
 	jumbo->CurrentSize = IPG_RXFRAG_SIZE;
 	jumbo->skb = skb;
@@ -1303,14 +1303,14 @@ static void ipg_nic_rx_with_end(struct net_device *dev,
 		}
 		dev->last_rx = jiffies;
 
-		jumbo->FoundStart = 0;
+		jumbo->FoundStart = false;
 		jumbo->CurrentSize = 0;
 		jumbo->skb = NULL;
 
 		ipg_nic_rx_free_skb(dev, entry);
 	} else {
 		IPG_DEV_KFREE_SKB(jumbo->skb);
-		jumbo->FoundStart = 0;
+		jumbo->FoundStart = false;
 		jumbo->CurrentSize = 0;
 		jumbo->skb = NULL;
 	}
@@ -1340,7 +1340,7 @@ static void ipg_nic_rx_no_start_no_end(struct net_device *dev,
 		}
 	} else {
 		IPG_DEV_KFREE_SKB(jumbo->skb);
-		jumbo->FoundStart = 0;
+		jumbo->FoundStart = false;
 		jumbo->CurrentSize = 0;
 		jumbo->skb = NULL;
 	}
@@ -1840,7 +1840,7 @@ static int ipg_nic_open(struct net_device *dev)
 #ifdef JUMBO_FRAME
 	/* initialize JUMBO Frame control variable */
-	sp->Jumbo.FoundStart = 0;
+	sp->Jumbo.FoundStart = false;
 	sp->Jumbo.CurrentSize = 0;
 	sp->Jumbo.skb = 0;
 	dev->mtu = IPG_TXFRAG_SIZE;
Re: [PATCH 1/3] drivers/net/ipg.c: Fix skbuff leak
I take that back. This patch does NOT fix the leak, at least if "ping: sendmsg: No buffer space available" is any indication... I think I was reading slabinfo wrong:

kmalloc-2048 42111 42112 2048 4 2 : tunables 0 0 0 : slabdata 10528 10528 0

Sorry for the false hope.
Re: [PATCH 3/3] drivers/net/ipg.c: fix horrible mdio_read and _write
>> +do {
>> +	/* IPG_PC_MGMTDATA is a power of 2; compiler knows to shift */
>> +	u8 d = ((data >> --len) & 1) * IPG_PC_MGMTDATA;
>> +	/* + rather than | lets compiler microoptimize better */
>> +	ipg_drive_phy_ctl_low_high(ioaddr, d + otherbits);
>> +} while (len);

> Imho something is not quite right when the code needs a comment every line
> and I am mildly convinced that we really want to honk an "optimizing mdio
> methods is ok" signal around.

Oh, but those are SPACE-saving optimizations. :-) I know it's not time-critical. It's really pure hack value, but is it that evil?

> "while (len--) {" is probably more akpm-ish btw.

Well spotted.

[...]
>> static int mdio_read(struct net_device * dev, int phy_id, int phy_reg)
>> {
>> 	void __iomem *ioaddr = ipg_ioaddr(dev);
>> +	u8 const polarity = ipg_r8(PHY_CTRL) &
>> +		(IPG_PC_DUPLEX_POLARITY | IPG_PC_LINK_POLARITY);

> (IPG_PC_DUPLEX_POLARITY | IPG_PC_LINK_POLARITY) appears twice. I would not
> mind a #define for it.

I'm hardly going to go to war over the matter, but I actually disagree. Keeping track of an additional name has a non-zero mental cost. This expression is only used twice and is pretty simple. In that case, I think having fewer layers of #defines to understand is a positive advantage. To a person who's never read ipg.h, the above reads "the two polarity bits from the PHY_CTRL register". Adding IPG_PC_POLARITY_BITS just requires mentally dereferencing another layer of pointers. Think of it as a function small enough that it can be inlined.

>> @@ -221,75 +240,30 @@ static int mdio_read(struct net_device * dev, int
>> phy_id, int phy_reg)
>[...]
>> -	for (i = 0; i < p[6].len; i++) {
>> -		p[6].field |=
>> -		    (read_phy_bit(ioaddr, polarity) << (p[6].len - 1 - i));
>> -	}
>> +	send_three_state(ioaddr, polarity);	/* TA first bit */
>> +	(void)read_phy_bit(ioaddr, polarity);	/* TA second bit */
>> +
>> +	for (i = 0; i < 16; i++)
>> +		data += data + read_phy_bit(ioaddr, polarity);

> Huh ?
Okay, I guess you prefer

+		data = 2*data + read_phy_bit(ioaddr, polarity);

That's only one character longer and easier to understand. Or, at four characters longer:

+		data = (data<<1) + read_phy_bit(ioaddr, polarity);

The "data += data" form is just the synonym that happened to come out of my fingers at the time. There's no particular meaning to it.

>> @@ -299,11 +273,13 @@ static int mdio_read(struct net_device * dev, int
>> phy_id, int phy_reg)
>> static void mdio_write(struct net_device *dev, int phy_id, int phy_reg, int
>> val)
[...]
>> +	mdio_write_bits(ioaddr, polarity, GMII_PREAMBLE, 32);
>> +	mdio_write_bits(ioaddr, polarity, GMII_ST << 14 | GMII_WRITE << 12 |
>> +					 phy_id << 7 | phy_reg << 2 |
>> +					 0x2, 16);

> Use the 80 cols luke:
> 	phy_id << 7 | phy_reg << 2 | 0x2, 16);

Good spotting, thanks. Here's a revised patch:

drivers/net/ipg.c: Fixed style problems that AKPM noticed.

(I fixed a few more while I was at it. They include an actual bug in enabling multicast, caused by confusing & with &&.)

diff --git a/drivers/net/ipg.c b/drivers/net/ipg.c
index 3860fcd..fb69374 100644
--- a/drivers/net/ipg.c
+++ b/drivers/net/ipg.c
@@ -188,9 +188,9 @@ static void send_end(void __iomem *ioaddr, u8 phyctrlpolarity)
 		phyctrlpolarity) & IPG_PC_RSVD_MASK, PHY_CTRL);
 }
 
-static u16 read_phy_bit(void __iomem * ioaddr, u8 phyctrlpolarity)
+static unsigned read_phy_bit(void __iomem * ioaddr, u8 phyctrlpolarity)
 {
-	u16 bit_data;
+	unsigned bit_data;
 
 	ipg_write_phy_ctl(ioaddr, IPG_PC_MGMTCLK_LO | phyctrlpolarity);
 
@@ -202,12 +202,31 @@ static u16 read_phy_bit(void __iomem * ioaddr, u8 phyctrlpolarity)
 }
 
 /*
+ * Transmit the given bits, MSB-first, through the MgmtData bit (bit 1)
+ * of the PhyCtrl register. 1 <= len <= 32. "ioaddr" is the register
+ * address, and "otherbits" are the values of the other bits.
+ */
+static void mdio_write_bits(void __iomem *ioaddr, u8 otherbits, u32 data, unsigned len)
+{
+	otherbits |= IPG_PC_MGMTDIR;
+	while (len--) {
+		/* IPG_PC_MGMTDATA is a power of 2; compiler knows to shift */
+		u8 d = ((data >> len) & 1) * IPG_PC_MGMTDATA;
+		/* + rather than | allows slight code size microoptimization */
+		ipg_drive_phy_ctl_low_high(ioaddr, d + otherbits);
+	}
+}
+
+/*
  * Read a register from the Physical Layer device located
  * on the IPG NIC, using the IPG PHYCTRL register.
  */
 static int mdio_read(struct net_device * dev, int phy_id, int phy_reg)
 {
 	void __iomem *ioaddr = ipg_ioaddr(dev);
+	u8 const polarity = ipg_r8(PHY_CTRL) &
+		(IPG_PC_DUPLEX_POLARITY | IPG_PC_LINK_POLARITY);
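You can check the bit ordering of mdio_write_bits() and the "data += data + bit" accumulation in mdio_read() in userspace. This is a minimal sketch. It replaces the PHY_CTRL register with a log of the bits that would be clocked out. The names `struct bitlog`, `fake_drive_bit()`, `write_bits()` and `read_bits()` are hypothetical illustrations, not driver code. Only the loop shapes are copied from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for PHY_CTRL: records each MgmtData bit sent. */
struct bitlog {
	unsigned bits[64];
	unsigned n;
};

static void fake_drive_bit(struct bitlog *bl, unsigned bit)
{
	bl->bits[bl->n++] = bit;
}

/* Same loop shape as mdio_write_bits(): send the low 'len' bits of
 * 'data', most significant first (1 <= len <= 32). */
static void write_bits(struct bitlog *bl, uint32_t data, unsigned len)
{
	while (len--)
		fake_drive_bit(bl, (data >> len) & 1);
}

/* Same accumulation as mdio_read(): data += data + bit doubles the
 * value read so far and adds the new bit (data = 2*data + bit). */
static uint32_t read_bits(const struct bitlog *bl, unsigned start, unsigned len)
{
	uint32_t data = 0;

	for (unsigned i = 0; i < len; i++)
		data += data + bl->bits[start + i];
	return data;
}
```

The `while (len--)` test decrements before the body runs. The first bit sent is therefore bit len-1, which gives the MSB-first order the patch comment promises.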
Re: [PATCH 1/3] drivers/net/ipg.c: Fix skbuff leak
> Can you try the patch below ?

Testing now... (I presume you noticed the one-character typo in my earlier patch. That should be "mc = mc->next", not "mv = mc->next".)

That doesn't seem to do it. Not entirely, at least. After downloading and partially re-uploading an 800M file, slabtop reports:

  OBJS ACTIVE  USE OBJ SIZE  SLABS OBJ/SLAB CACHE SIZE NAME
341576 341574  99%    0.50K  42697        8    170788K kmalloc-512
342006 341953  99%    0.19K  16286       21     65144K kmalloc-192
 30592  30575  99%    2.00K   7648        4     61184K kmalloc-2048
 30213  30193  99%    0.44K   3357        9     13428K skbuff_fclone_cache
  7650   7643  99%    0.08K    150       51       600K sysfs_dir_cache
  4000   3938  98%    0.12K    125       32       500K kmalloc-128
   258    258 100%    1.15K     43        6       344K raid5-md5
   232    221  95%    1.00K     58        4       232K kmalloc-1024
  3136   3110  99%    0.06K     49       64       196K kmalloc-64
   264     80  30%    0.68K     24       11       192K ext3_inode_cache

The "kmalloc-2048" was down in the noise before the upload started. This is in single-user mode, after sync and echo 3 > /proc/sys/vm/drop_caches.

I'll have to try this after this evening's social plans. Meanwhile, I'm thinking of implementing more rapid bug detection. I'd explicitly zero the sp->TxBuff slot when the skb is freed, and check that it is zero before putting anything else in there. (And likewise for RxBuff.) That way, I don't have to use up a noticeable amount of memory to see the bug, and then reboot to clear up the damage, on each test cycle.
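Here is a minimal userspace sketch of that zero-and-check bookkeeping. It assumes a plain array of pointers standing in for sp->TxBuff. `RING_SIZE`, `struct slotring`, `ring_fill()` and `ring_release()` are hypothetical names. Every slot must be NULL before it is filled, and is cleared when its buffer is freed. A fill over a live buffer (the would-be leak) or a double release is counted at once, rather than showing up later as slab growth.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define RING_SIZE 8	/* hypothetical; not the driver's list length */

struct slotring {
	void *slot[RING_SIZE];	/* stands in for sp->TxBuff[] */
	unsigned long bugs;	/* number of misuses detected */
};

/* Fill a slot; refuse (and count a bug) if it still holds a buffer. */
static bool ring_fill(struct slotring *r, unsigned i, void *buf)
{
	if (r->slot[i % RING_SIZE] != NULL) {
		r->bugs++;	/* a driver would printk()/WARN_ON() here */
		return false;
	}
	r->slot[i % RING_SIZE] = buf;
	return true;
}

/* Free a slot's buffer and zero the slot so later reuse can be checked. */
static void ring_release(struct slotring *r, unsigned i)
{
	void *buf = r->slot[i % RING_SIZE];

	if (buf == NULL) {
		r->bugs++;	/* releasing an already-empty slot */
		return;
	}
	free(buf);
	r->slot[i % RING_SIZE] = NULL;
}
```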
Re: SACK scoreboard
Just some idle brainstorming on the subject...

Some network pipes have a delay * bandwidth product significantly larger than the processor cache. It seems the only way to handle them is to make freeing retransmit data o(n), i.e. sublinear in the amount of data freed.

Now, there are some ways to reduce the constant factor. The one that comes to mind first is to not queue sk_buffs. Throw away the struct sk_buff after transmission and just queue skb_frag_structs, pages, or maybe even higher-order pages of data. Freeing the data when it's acked then has a much smaller constant factor, particularly in d-cache footprint, and needs no slab operations. The downside is more work to recreate the skb if you do have to retransmit, but optimizing for retransmits is silly.

Some implementations could leave large chunks of memory locked until all of the sk_buff->skb_shared_info->skb_frag_structs referencing them have gone away. However, you can look at the transmit window when deciding how big a chunk size to use.

Then, to actually get below O(n), you want to keep the queued data in a data structure known to the memory manager. Basically, splice the retransmit queue onto the free list. It may require some kludgery in the memory manager. In particular, doing that in O(1) time obviously means that you can't coalesce adjacent free regions to build higher-order pages. So you'd need a threshold for uncoalesced pages and a way to force coalescing under memory pressure. You're just deferring work until the page is allocated. The point is that it's okay to bring the page into cache when it's about to be used again. What's annoying is the redundant round trip through the cache just because an ack arrived.

I've done this sort of thing before with specialized fixed-block-size allocators. For example, an alpha-beta minimax search tree allocates nodes one at a time, but frees whole subtrees at once. Might it be feasible for kernel use?
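Here is a userspace sketch of the "splice onto the free list" idea, in the style of the fixed-block allocator mentioned above. If the owner of a chain knows its first and last blocks, handing the whole chain back costs two pointer writes, whatever its length. `struct blk`, `struct pool`, `pool_alloc()` and `pool_free_chain()` are hypothetical names. There is no coalescing at all, which is exactly the work the text proposes to defer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical fixed-size block; 'next' links both the free list and
 * whatever queue currently owns the block. */
struct blk {
	struct blk *next;
	char data[64];
};

struct pool {
	struct blk *free;	/* singly linked free list */
	size_t nfree;
};

/* Allocate one block at a time: O(1). */
static struct blk *pool_alloc(struct pool *p)
{
	struct blk *b = p->free;

	if (b) {
		p->free = b->next;
		b->next = NULL;
		p->nfree--;
	}
	return b;
}

/* Return a whole chain first..last (count blocks) at once: O(1),
 * since only last->next and the free-list head are written. */
static void pool_free_chain(struct pool *p, struct blk *first,
			    struct blk *last, size_t count)
{
	last->next = p->free;
	p->free = first;
	p->nfree += count;
}
```

The caller has to know where the chain ends. For a retransmit queue, that means tracking the boundary of the acked region as it advances, rather than searching for it at free time.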
ipg.c bugs
I'm just about to test that second memory leak patch, but I gave the original code a careful reading, and found a few problems:

* Huge monstrous glaring bug

In ipg_interrupt_handler, the code to handle a shared interrupt not caused by this device:

	if (!(status & IPG_IS_RSVD_MASK))
		goto out_enable;

is *before* spin_lock(&sp->lock), but the code following out_enable does spin_unlock(&sp->lock). Thus, the sp->lock is all f*ed up.

The lack of any sort of locking between the interrupt handler and hard_start_xmit could cause all sorts of issues. I'm not actually sure locking is even necessary; I'd think some suitable atomic access to sp->tx_current would suffice.

* Lesser bugs

There's a general pattern of loops over the range from sp->rx_current to sp->rx_dirty. Some of them call code that refers to sp->rx_current, even though that hasn't been updated yet. One instance is in ipg_nic_check_frame_type. A second is in ipg_nic_check_error.

In ipg_nic_set_multicast(), the code to enable the multicast flags is of the form "if (dev->flags & IFF_MULTICAST & (dev->mc_count > ...))". But IFF_MULTICAST is not 1, and the comparison evaluates to 0 or 1, so this will always be false. The second & needs to be && (2x).

In ipg_io_config(), there's

	/* Transmitter and receiver must be disabled before setting
	 * IFSSelect.
	 */
	ipg_w32((origmacctrl & (IPG_MC_RX_DISABLE | IPG_MC_TX_DISABLE)) &
		IPG_MC_RSVD_MASK, MAC_CTRL);

I don't know what's going on there. Unless the IPG_MC_RX_DISABLE or IPG_MC_TX_DISABLE bit is already set in origmacctrl, that's going to write 0, which won't disable anything. Immediately following, there's some similarly buggy code doing something I don't understand with IPG_MC_IFS_96BIT.

The setting of curr in ipg_nic_txfree, with that bizarre do_div, can't possibly be working right.

* Possible bugs

I'm not very sanguine about how init_rfdlist handles a failed ipg_get_rxbuff. In that case, it leaves rxfd->frag_info uninitialized. But it still sets rxfd->rfs to "buffer ready to be received into", which could lead to receiving into random memory locations.

In ipg_nic_hard_start_xmit(), the code

	if (sp->tx_current == (sp->tx_dirty + IPG_TFDLIST_LENGTH))
		netif_wake_queue(dev);

shouldn't that *stop* the queue if the TFDLIST is full?

I think that the places where the rxfd->rfs and txfd->tfc fields are filled in (containing the hardware-handoff flag) should have memory barriers.

* Stupid code

In ipg_io_config, there are three writes to DEBUG_CTRL "Per silicon B3 eratta". First, that's "errata". But more significantly, can those writes be combined into one? Is it necessary to read the DEBUG_CTRL register each time?

The initialization of rxfd->rfs in init_rfdlist() and ipg_nic_rxrestore() should be moved into ipg_get_rxbuff(). And since the ready bit is there, it should be set AFTER the pointer fields. There should also be a barrier so the hardware doesn't read the fields out of order.

In ipg_nic_txcleanup(), in 10 Mbit mode, there's code to call netif_wake_queue every time through the loop. It balances a bug-workaround call that stops the queue on every packet in that case. It is quite unnecessary, as ipg_nic_txfree() will do it.

The IPG_INSERT_MANUAL_VLAN_TAG code (fortunately disabled by default) is just plain bizarre. What exactly is the use of assigning a tag of 0xABC to every packet?

The code in ipg_hw_init to set up dev->dev_addr reads each of the 16-bit address registers twice, for no apparent reason.

There's a lot of code in e.g. ipg_nic_rx() that endlessly manipulates rxfd->rfs, with an le64_to_cpu() call around each instance. It should copy the field to a CPU-ordered native value and be done with it. (Some sparse annotations would help, too.) Likewise for the messing with txfd->tfc in ipg_nic_hard_start_xmit().

The Frame_WithEnd enum is a very strange value (decimal 10) to use as a bitmapped status flag.

The four frame fragment functions

	nic_rx_with_start_and_end
	nic_rx_with_start
	nic_rx_with_end
	nic_rx_no_start_no_end

could easily be unified into one.

* Performance left on the floor

The hardware supports scatter/gather, hardware checksums, VLAN tagging, and 64-bit (well, 40-bit) DMA, but the driver sets no feature flags.

The jumbo frame reception code could generate fragmented skbs rather than doing all those memcopies.

Would it be worth splitting the 64-bit ->rfs and ->tfc fields into two 32-bit fields?

Would it be worth copying small incoming packets to small skbs and keeping the large skb in the receive queue?

* Questions

In net_device_stats, are all those statistics registers cleared by a read?

How do we determine the silicon revision numbers, so we can stop enabling bug workarounds on versions that don't need i
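Here is a small userspace illustration of the & vs. && problem in ipg_nic_set_multicast(). The flag value 0x1000 is Linux's IFF_MULTICAST. It is defined locally under the hypothetical name `MY_IFF_MULTICAST` so the snippet is self-contained. The `count > threshold` test stands in for the driver's dev->mc_count comparison.

```c
#include <assert.h>
#include <stdbool.h>

#define MY_IFF_MULTICAST 0x1000	/* same value as Linux IFF_MULTICAST */

/* Buggy form: a flag bit (0x1000) bitwise-ANDed with a comparison
 * result (0 or 1) can never be nonzero. */
static bool buggy_check(unsigned flags, int count, int threshold)
{
	return flags & MY_IFF_MULTICAST & (count > threshold);
}

/* Fixed form: logical AND of two truth values. */
static bool fixed_check(unsigned flags, int count, int threshold)
{
	return (flags & MY_IFF_MULTICAST) && (count > threshold);
}
```

With the flag set and the count above the threshold, `buggy_check()` still returns false, because 0x1000 & 1 is 0. `fixed_check()` returns true.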
Re: [PATCH 0/4] Pull request for 'ipg-fixes' branch
Thank you very much, this appears to work.

> The driver is still a POMS but it seems better now.

I notice that the vendor-supplied driver doesn't have these bugs. It does have one bug of its own: it has no "is this interrupt for me?" test at all, and always returns "I handled it". But the bypass and its locking screwups are a later addition. The same goes for the sp->rx_current bugs. The original loop used rx_current as the loop iteration variable. That wasn't great style, precisely because it hides the interaction that someone's "optimization" broke. Still, I don't want to blame the vendor for things they didn't do.

Would you be interested in some cleanup patches? In particular, I think I can get rid of tx->lock entirely, or at least take it off the fast path. All it's protecting is the write to sp->tx_current, and a few judicious memory barriers can deal with that.

(Oh, another BUG: the sp->ResetCurrentTFD logic in hard_start_xmit is just plain broken. It writes the new data to entry 0, then increments sp->tx_current just as usual. THAT isn't in the vendor driver that I see, either.)
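The "memory barriers instead of a lock" idea can be sketched with C11 atomics. This is an analogy only. A kernel version would use the kernel's own barrier primitives, and ordering toward a DMA device is a separate concern. The ring layout, `RING_LEN`, `struct txring`, `ring_put()` and `ring_get()` are hypothetical. A single producer (think hard_start_xmit) fills a descriptor, then publishes it with a release store of the index. A single consumer (think tx cleanup) acquire-loads the index before reading the descriptor. With free-running indices, the ring is full when head - tail == RING_LEN. That is the point where the queue should be stopped, not woken.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define RING_LEN 4u	/* hypothetical; must be a power of 2 */

struct desc {
	uint64_t addr;
	uint32_t len;
};

struct txring {
	struct desc d[RING_LEN];
	_Atomic unsigned head;	/* written only by the producer */
	_Atomic unsigned tail;	/* written only by the consumer */
};

/* Producer: returns false when full (caller should stop the queue). */
static bool ring_put(struct txring *r, uint64_t addr, uint32_t len)
{
	unsigned head = atomic_load_explicit(&r->head, memory_order_relaxed);
	unsigned tail = atomic_load_explicit(&r->tail, memory_order_acquire);

	if (head - tail == RING_LEN)
		return false;
	r->d[head % RING_LEN].addr = addr;	/* fill the descriptor first... */
	r->d[head % RING_LEN].len = len;
	/* ...then publish: release orders the stores above before this one */
	atomic_store_explicit(&r->head, head + 1, memory_order_release);
	return true;
}

/* Consumer: returns false when empty. */
static bool ring_get(struct txring *r, struct desc *out)
{
	unsigned tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
	unsigned head = atomic_load_explicit(&r->head, memory_order_acquire);

	if (head == tail)
		return false;
	*out = r->d[tail % RING_LEN];
	/* release: our read of the slot completes before the producer reuses it */
	atomic_store_explicit(&r->tail, tail + 1, memory_order_release);
	return true;
}
```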
Extensible hashing and RCU
I noticed a mention in an LCA talk that apparently extensible hashing with RCU access is an unsolved problem. Here's an idea for solving it.

I'm assuming the table is a power of 2 in size, with open chaining for collisions. When the chains get too long, the table is doubled. When the chains get too short, the table size is halved.

- Compute a sufficiently large (32-bit?) hash value for each entry. "Sufficiently large" means large enough for the largest possible hash table.
- The hash value is stored with each entry. (Not strictly necessary if the update rate is sufficiently low.)
- The table is indexed on the *high* bits of the hash value. As the table grows, additional bits of the hash value are appended to the index.
- Each chain is stored in sorted order by hash value. (This is why storing the hash value is an efficiency win.)

To double the size of a hash table:

- Allocate a new, larger array of head pointers.
- The even slots are copied from the smaller hash table.
- The odd slots are initialized to point to the middle of the hash chains pointed to by the even slots. However, the even chains are NOT terminated yet; a search through one of them will go through the full chain length.
- The new table is declared open for business.
- Wait for an RCU quiescent period to elapse, so there are no more readers of the old table.
- NOW truncate the even chains by setting the appropriate next pointers to NULL.
- Deallocate and free the old array of head pointers.

Likewise, to halve the size, link each odd chain onto the tail of its even partner, copy the even heads to a smaller table, and declare it open for business. When an RCU quiescent period has elapsed, you can delete the old table.

The insight is that RCU makes taking stuff out of a linked list very delicate, and moving it while preserving access is basically impossible. But you can append extraneous junk to the end of a hash chain harmlessly enough and share the structure.
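The doubling steps can be sketched in single-threaded C to show the pointer manipulation. RCU, locking and the grace-period wait are deliberately omitted. A comment marks where a kernel version would wait for a grace period. `struct node`, `bucket()`, `expand()`, `truncate_even()` and `chain_len()` are hypothetical names. As described, buckets use the top bits of a 32-bit hash, and chains are kept sorted by ascending hash. Truncation tests the new index bit of each stored hash. This also handles an even chain that turns out to be empty.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

struct node {
	uint32_t hash;		/* full hash, stored with the entry */
	struct node *next;	/* chain kept sorted by ascending hash */
};

/* Bucket index = top 'bits' bits of the hash (bits == 0: one bucket). */
static size_t bucket(uint32_t hash, unsigned bits)
{
	return bits ? hash >> (32 - bits) : 0;
}

/* Step 1: build 2^(bits+1) heads from 2^bits heads. Even slot 2i shares
 * old chain i; odd slot 2i+1 points into the middle of that same chain,
 * at the first node whose new index bit is 1. Chains stay unterminated. */
static struct node **expand(struct node **old, unsigned bits)
{
	size_t n = (size_t)1 << bits;
	struct node **tbl = calloc(2 * n, sizeof *tbl);

	if (!tbl)
		return NULL;
	for (size_t i = 0; i < n; i++) {
		struct node *p = old[i];

		tbl[2 * i] = p;
		while (p && bucket(p->hash, bits + 1) == 2 * i)
			p = p->next;
		tbl[2 * i + 1] = p;
	}
	return tbl;
}

/* Step 2, only after a grace period (synchronize_rcu() in a kernel
 * version): cut each even chain where its odd partner begins. */
static void truncate_even(struct node **tbl, unsigned newbits)
{
	size_t n = (size_t)1 << newbits;

	for (size_t i = 0; i < n; i += 2) {
		struct node **link = &tbl[i];

		while (*link && bucket((*link)->hash, newbits) == i)
			link = &(*link)->next;
		*link = NULL;	/* odd nodes stay reachable via tbl[i + 1] */
	}
}

static size_t chain_len(const struct node *p)
{
	size_t len = 0;

	for (; p; p = p->next)
		len++;
	return len;
}
```

Between the two steps, a reader that starts from an even slot walks past its own entries into the odd ones. It finds nothing wrong, only extra work. This is the harmless "extraneous junk" described above.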
Thus, there is a period of overlap when both the old and the new hash tables are valid and functional. Indeed, after each of the above steps, you can actually allow new insertions into the hash table while waiting for the RCU quiescent period. If the insertion is at the head of a chain, it won't be seen by readers of the old table, but that's harmless.

The trickiest case I can think of is the deletion of a table entry at the head of an odd chain while an expansion is pending. Afterwards, you scan the even chain to find where to truncate it. You can't compare node->next to the odd chain head. Instead, you have to look at the now-deleted node's hash code and see that it exceeds the threshold for the even chain. (Equivalently, you can test whether the appropriate bit of the hash code is set.) So that hash chain walking has to be done BEFORE the node is actually deleted. This requires an ordering guarantee on RCU callbacks, either a priority system or FIFO. call_rcu looks like it uses FIFO order, but it uses per-CPU lists.

Ah! It's worse than that. Even after the first RCU quiescent period, a walker of the even chain could still be holding a pointer to the newly-deleted odd chain head. Thus, it can't actually be reclaimed until a *second* RCU quiescent period has elapsed. The first RCU period gets rid of anyone who needs the link. Then you remove the link, and then you need to wait until there's nobody still using it. Still, it's probably not too horrible.

(You could index the hash table on the low-order bits, but then you need to keep the chains sorted by bit-reversed hash value, which is probably more of a pain. Still pretty easy, though. To compare x and y bit-reversed, just let mask = x^y; mask ^= mask-1; then compare (x&mask) to (y&mask).)
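The bit-reversed comparison trick in the last paragraph can be checked exhaustively against an explicit bit reversal. `bitrev_gt()` follows the recipe in the text. `reverse8()` is a hypothetical reference helper, used only for the check. mask ^= mask-1 keeps the lowest bit in which x and y differ and everything below it. That bit decides the bit-reversed order, because the bits below it are equal.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if x > y when both are compared with their bits reversed. */
static bool bitrev_gt(unsigned long x, unsigned long y)
{
	unsigned long mask = x ^ y;	/* bits where x and y differ */

	mask ^= mask - 1;		/* lowest differing bit and all below it */
	return (x & mask) > (y & mask);
}

/* Reference only: reverse the low 8 bits of v. */
static uint8_t reverse8(uint8_t v)
{
	uint8_t r = 0;

	for (int i = 0; i < 8; i++) {
		r = (uint8_t)((r << 1) | (v & 1));
		v >>= 1;
	}
	return r;
}
```

When x == y, mask becomes all ones and the function correctly returns false.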
Re: [RFC/TOY]Extensible hashing and RCU
> For purposes of discussion, I've attached a "toy" implementation
> for doing dynamic resizing of a hashtable. It is useless, except
> as a proof of concept.
>
> I think this is very similar to what you are describing, no?

Er... not quite; that has a lot more overhead than what I was thinking about. I have used the trick of distinguishable sentinel values in a doubly-linked list to maintain read cursors while the list is being updated, but I don't think that's necessary here. (If you're being sneaky, you can also encode the nh_type flag in the lsbit of the pointer. That will attract curses from the memleak detector, though.)

In particular, I was imagining a singly linked list. To delete an item, use the hash to find the head pointer, then walk the list to find the pointer to be fixed up. Since the chains are short anyway, this is entirely reasonable.

Less fundamental comments include:

1) Is the seqlock in get_nh() and nh_replace() really required? Is there any architecture that doesn't have atomic pointer stores? If you wanted to store the table size in a fixed location as well, I could see the need...

2) I think the whole __nh_sort_chain business will royally confuse anyone walking the chain while it happens. This is exactly what I was working to avoid. The partial sorting in __nh_insert isn't good enough. Instead, try:

/* Return true if bitrev(x) > bitrev(y) */
static bool bitrev_gt(unsigned long x, unsigned long y)
{
	unsigned long mask = x ^ y;	/* Find the bits that differ */
	mask ^= mask-1;	/* Find lsbit of difference (and all lower bits) */
	return (x & mask) > (y & mask);
}

static void __nh_insert(struct nh_entry *entry, struct nh_head *head)
{
	struct list_head *p, *n;
	unsigned long const hashval = nh_hashval(entry->data);

	/*
	 * Insert the new entry just before the first element of the list
	 * that its hash value is not greater than (bit-reversed).
	 */
	p = &head->list;
	list_for_each_rcu(n, &head->list) {
		struct nh_entry *t = container_of(n, struct nh_entry, nh_list);
		if (t->nh_type == NH_ENTRY &&
		    !bitrev_gt(hashval, nh_hashval(t->data)))
			break;
		p = n;
	}
	__list_add_rcu(&entry->nh_list, p, n);
}

static int nh_add(unsigned long data)
{
	struct nh_entry *entry = kmalloc(sizeof *entry, GFP_KERNEL);
	struct nh *nh;

	if (!entry)
		return -ENOMEM;

	entry->nh_type = NH_ENTRY;
	entry->data = data;

	rcu_read_lock();
	nh = get_nh();	/* or nh = __nh */
	if (nh) {
		struct nh_head *h =
			&nh->hash[nh_bucket(nh_hashval(data), nh->nentries)];

		spin_lock(&h->lock);
		__nh_insert(entry, h);
		spin_unlock(&h->lock);
	}
	rcu_read_unlock();
	return 0;
}

Then there's no need for __nh_sort_chain at all. Alternatively, if the upper bits of nh_hashval are as good as the lower bits, just index the hash table on them.

3) Code inside a mutex, like nh_resize(), can use plain list_for_each(); the _rcu variant is only required if there can be simultaneous mutation.

That's a nice module framework. I'll see if I can write some code of the sort I was thinking about.

FWIW, I figured out a way around the need to delay deletion for two RCU intervals. Suppose that an expansion is pending: we have just stretched the table from 16 entries to 32, and the following hash values are stored. (Note the bit-reversed order.)

old[3] --\
new[3] ---+-> 0x03 -> 0x43 -> 0x23 -> 0x63 -> 0x13 -> 0x53 -> 0x33 -> 0x73
                                             /
new[3+16] ----------------------------------/

After an RCU period, you can throw away the old[] array and NULL-terminate the new[3] list after 0x63. Until then, you need to leave the list alone, to accommodate people who are looking for 0x53 via the old head.

The tricky case comes when you delete 0x13. Suppose you only delete it from the new[3+16] list. Then you can't discard it until the RCU quiescent period after the one that disconnects the 0x63->0x13 link.
The solution is actually very simple. Notice when you're

- Deleting the first entry in a list
- While an expansion is pending
- And the list is in the second half of the expanded table

and then unlink the entry from BOTH the new head and the old list. It's a bit more work, and requires some lock-ordering care, but it lets you queue the node for RCU cleanup the normal way.
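Here is a single-threaded sketch of the "unlink from BOTH" rule, using the chain from the diagram above. Deleting 0x13 (the head of new[3+16]) while the expansion is pending updates both the new[19] head and the 0x63 node's next pointer. The node is then unreachable from every head at once, so one ordinary grace period suffices. `struct node` and `delete_odd_head()` are hypothetical names. Locking and RCU publication are omitted. The sketch assumes the even part of the chain is non-empty, as in the example. The victim's own next pointer is deliberately left intact for readers still standing on it.

```c
#include <assert.h>
#include <stddef.h>

struct node {
	unsigned hash;
	struct node *next;
};

/*
 * Delete the first entry of the odd (second-half) chain *odd_head while
 * an expansion is pending, i.e. while the even chain starting at
 * even_head still runs on into the odd chain.
 */
static void delete_odd_head(struct node *even_head, struct node **odd_head)
{
	struct node *victim = *odd_head;
	struct node *p = even_head;

	assert(p != NULL && p != victim);	/* sketch: even part non-empty */
	while (p->next != victim)		/* find the even part's last node */
		p = p->next;
	p->next = victim->next;			/* unlink from the old, unterminated chain */
	*odd_head = victim->next;		/* ...and from the new odd head */
	/* victim->next is untouched: concurrent readers may still follow it */
}
```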
Re: Extensible hashing and RCU
> I think you misunderstood me. If you are trying to DoS me from
> outside with a hash collision attack, you are trying to feed me
> packets that fall into the same hash bucket. The Jenkins hash does
> not have to be artifact-free, and does not have to be
> cryptographically strong. It just has to do a passable job of mixing
> a random salt into the tuple, so you don't know which string of
> packets to feed me in order to fill one (or a few) of my buckets.
> XORing salt into a folded tuple doesn't help; it just permutes the
> buckets.

If you want to understand this more formally, read up on "universal families of hash functions," which is the name cryptologists give to this concept. When used according to directions, they are actually *more* secure than standard cryptographic hashes such as MD5 and SHA. The key difference is that *the attacker doesn't get to see the hash output*.

The basic pattern is:

- Here's a family of hash functions, e.g. a salted hash function.
- I pick one at random. (E.g. choose a salt.)
- Now your challenge is to generate a pair of inputs which will collide.
- Note that you never get to see sample input/output pairs of the hash function. All you know is that it's a member of the family.
- It is surprisingly easy to find families of size N such that an attacker has on the order of a 1/N chance to construct a collision.
- This remains true even if you assume that the attacker has infinite computational power.

This pattern corresponds exactly to an attacker trying to force collisions in a hash table they can't see.

As far as I know, nobody has proved salted jhash a truly universal family. But so many amazingly simple algorithms have been proved universal that it wouldn't surprise me if it was. For example, the family of all CRCs computed modulo n-bit primitive polynomials is a universal family. If you do know the polynomial, it's ridiculously easy to build a collision. If you don't, it's provably impossible to do much better than chance.
(Footnote: the chance isn't exactly 1/N; it also depends on the size of the input relative to the size of the hash. With bigger inputs, it's easier to make them match according to more of the hashes. Ultimately, if you have N k-bit CRC polynomials, you can make them all collide with an N*k-bit input. But since N is proportional to 2^k, it's easy to make k big enough that this is impractical.)

The rehash-every-10-minutes detail is theoretically unnecessary. It does cover the case where a would-be attacker *does* get a chance to observe the hash's behavior on a machine, such as by using routing delays to measure the effectiveness of a collision attempt.

Now, as for flaming about how xor generates more uniform distributions than jhash: that's to be expected from a weak hash. You can rely on non-uniform properties of the input, particularly that hosts tend to walk linearly through the source port space. That makes hash values walk linearly through your hash table, giving a completely even distribution rather than a *random* one. This is great for efficiency. But it depends on letting patterns in the hash input through to the output, which is exactly the property that makes the hash vulnerable to a deliberate DoS attempt.

If you want to test your distribution for randomness, go have a look at Knuth volume 2 (Seminumerical Algorithms) and see the discussion of the Kolmogorov-Smirnov test. Some lumpiness is *expected* in a truly random distribution.
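For a concrete feel of "pick one member of the family at random", here is a sketch of the textbook Carter-Wegman universal family, h(x) = ((a*x + b) mod p) mod m. I've swapped it in as an illustration; it is not jhash, nor the CRC family from the text. p = 2^31 - 1 is prime. The secret a (nonzero) and b play the role of the salt. For any fixed pair of distinct keys, the collision probability over the random choice of a and b is at most about 1/m. `struct cw_hash`, `CW_PRIME` and `cw_bucket()` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

#define CW_PRIME 2147483647u	/* 2^31 - 1, a prime */

/* One member of the family, selected by the secret a and b. */
struct cw_hash {
	uint64_t a;	/* 1 <= a < CW_PRIME */
	uint64_t b;	/* 0 <= b < CW_PRIME */
	uint32_t m;	/* number of buckets */
};

/* Keys must be < CW_PRIME; a * x < 2^62, so 64-bit math cannot overflow. */
static uint32_t cw_bucket(const struct cw_hash *h, uint32_t x)
{
	return (uint32_t)(((h->a * x + h->b) % CW_PRIME) % h->m);
}
```

In real use, a and b would come from a good random source (for example, get_random_bytes() in the kernel). They would be re-chosen whenever the table is rebuilt.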
Re: [EMAIL PROTECTED]: Re: [RFC PATCH 34/35] Add the Xen virtual network device driver.]
omains with non-full accounts. So after an initial accumulation period to fill up the buffers, the available entropy is divided evenly among all the domains that want it. I don't know enough about how Xen works to say whether it's easier to buffer the entropy in domain0 until requested, or to push it to the subdomains immediately. Either way, it's doable.

So I guess, before doing any fancy design, it's worth asking: do people prefer to have entropy be a service that the Xen hypervisor delivers to client domains, or should the domains manage it themselves? Both may not be practicable, but which would you like people to explore first?

A few more issues which have arisen since /dev/random was first written:

- Modern processors change clock rate, so a real-world jitter number translates into a variable number of timestamp ticks. +/-10 ns may be +/-32 timestamp ticks (at 3.2 GHz), or fewer if the clock is running slower. The most recent processors run their timestamp counters at a fixed rate, regardless of clock divisor, by sometimes incrementing them by more than one per cycle. Either way, you still have to reduce the entropy estimate when reducing clock speed.
- Wireless keyboards and mice are a lot more observable than wired ones.
- On the upside, full-speed timestamp counters are widely available, as are > 1 GHz clock rates, making for a rich source of clock jitter.

Oh, and on the theoretical front, there's been a lot of research into so-called "randomness extraction functions". In particular, it's been shown that you can't base a secure extractor on Shannon entropy, which is the sum over the possibilities i = 1, 2, ... n of -p[i] * log(p[i]). Instead, you need your sources to have good min-entropy, the minimum over i of -log(p[i]).

I completely forgot about this in my previous post to linux-kernel... arrgh, have to post a retraction.

Anyway, min-entropy is simply the negative log of the highest probability, so it is always less than or equal to the Shannon entropy.
The two are equal for uniform distributions (all choices equally likely), but min-entropy is more conservative for lopsided distributions.

Here's the classic teaching example. Say you have a source which produces 31 truly random bits (0..0x7fffffff) half the time, but produces -1 (0xffffffff) the other half of the time. (If this seems too trivial, assume it is encrypted with a one-time pad known only to the attacker; that doesn't change the analysis.)

It is simple to compute the Shannon entropy of this source: 16.5 bits per sample. p[-1] = 1/2, while each of p[0..0x7fffffff] = 2^-32. Plugging all that into the Shannon entropy formula gives 1/2 * 1 + 2^31 * 2^-32 * 32 = 16.5.

Now, suppose I take 8 samples from this source (total entropy 132 bits) and mix them together (say, with MD5). I should get a good 128-bit key, right? But 1/256 of the time, all 8 samples are -1. The MD5 input is then a fixed, known value, and the attacker knows my key in one guess. An additional 1/32 of the time, only one of the 8 samples was random, and there are only 34 bits of entropy in my key (31 for the sample value plus 3 for the sample number).

The reason for this paradox is that, half of the time, my input contains more than 128 bits of entropy, and compressing it with MD5 throws the excess away. The naive Shannon entropy computation averages that excess entropy with the low-entropy cases. That is not valid if you are producing finite-length output.

The min-entropy measure of 1 bit per sample correctly predicts the 8-bit min-entropy of the output.
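The numbers in this example can be reproduced directly. Every probability here is a power of two, so this sketch stores -log2(p) for each class of equally likely outcomes. That keeps the arithmetic exact and avoids needing libm. `struct outcome_class`, `pow2_neg()`, `shannon_bits()`, `min_entropy_bits()` and `prob_k_random()` are hypothetical names. The last one gives the binomial probability that exactly k of n samples come from the random half.

```c
#include <assert.h>

/* 'count' equally likely outcomes, each with probability 2^-neglog2p. */
struct outcome_class {
	double count;
	int neglog2p;
};

static double pow2_neg(int e)	/* 2^-e for e >= 0, exact */
{
	double r = 1.0;

	while (e-- > 0)
		r /= 2.0;
	return r;
}

/* Shannon entropy: sum of -p*log2(p), i.e. p * neglog2p per outcome. */
static double shannon_bits(const struct outcome_class *c, int n)
{
	double h = 0.0;

	for (int i = 0; i < n; i++)
		h += c[i].count * pow2_neg(c[i].neglog2p) * c[i].neglog2p;
	return h;
}

/* Min-entropy: -log2 of the largest probability (smallest neglog2p). */
static int min_entropy_bits(const struct outcome_class *c, int n)
{
	int m = c[0].neglog2p;

	for (int i = 1; i < n; i++)
		if (c[i].neglog2p < m)
			m = c[i].neglog2p;
	return m;
}

/* Probability that exactly k of n independent samples are random (p = 1/2). */
static double prob_k_random(int n, int k)
{
	double c = 1.0;

	for (int i = 1; i <= k; i++)
		c = c * (n - k + i) / i;	/* builds C(n, k) */
	return c * pow2_neg(n);
}
```

With the source described as { 1 outcome at 2^-1, 2^31 outcomes at 2^-32 }, this gives 16.5 bits of Shannon entropy and 1 bit of min-entropy. It also gives the 1/256 and 1/32 probabilities quoted above.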
VIA Velocity VLAN vexation
I have a machine (x86-32, 2.6.20.3) with two ethernet interfaces: a 100M Tulip and a 1G VIA Velocity. Both are connected to a common VLAN-capable switch. The eventually desired configuration is VLAN support on the Gbit interface.

If I set the Tulip's switch port to tagged, and configure a VLAN on the Tulip interface appropriately, packets flow as expected.

But if I try the same configuration on the Velocity interface, things don't work. I can see tagged ICMP pings go out, but no responses come back. I can see ARP requests and responses on the target machine. If I manually configure the ARP caches, I can see the pings and responses on the target machine. If I kludge the target's ARP cache to point back to the source's Tulip interface, I can see the ping responses on the Tulip interface. But I don't see the ping responses on the Velocity interface.

The vlan interface name and address are the same in both cases, so firewall rules can't be distinguishing between them. I have tried various ping sizes from 0 to 1472.

Is this likely to be a problem with the via-velocity driver? Is anyone working on it? Or should I just get a different gigabit card?

Thanks for any advice!

00:09.0 Ethernet controller [0200]: VIA Technologies, Inc. VT6120/VT6121/VT6122 Gigabit Ethernet Adapter [1106:3119] (rev 11)
00:0d.0 PCI bridge [0604]: Digital Equipment Corporation DECchip 21152 [1011:0024] (rev 03)
02:04.0 Ethernet controller [0200]: Digital Equipment Corporation DECchip 21142/43 [1011:0019] (rev 41)
02:05.0 Ethernet controller [0200]: Digital Equipment Corporation DECchip 21142/43 [1011:0019] (rev 41)
02:06.0 Ethernet controller [0200]: Digital Equipment Corporation DECchip 21142/43 [1011:0019] (rev 41)
02:07.0 Ethernet controller [0200]: Digital Equipment Corporation DECchip 21142/43 [1011:0019] (rev 41)
Re: VIA Velocity VLAN vexation
>> Or should I just get a different gigabit card ? > > This one probably got answered the 2005/11/29. :o) Ah, that's where I asked before. I misplaced the e-mail. I hope you don't mind my asking every year or two. But I don't see any suggestions for an alternative gigabit card anywhere. I had assumed they all mostly worked, but now it appears I need to know details. > I'll go to bed in a few minutes but I'll happily resurrect the > velocity vlan patches. Haven't they been merged upstream already? Anyway, thanks for the reply!
Re: RFC: Established connections hash function
> Result with jenkins: > 1 23880 > 2 12108 > 3 4040 > 4 1019 > 5 200 > 6 30 > 7 8 > 8 1 > > Xor: > 1 65536 Precisely. This means that the Xor hash SUCKS, because its output is conspicuously non-random. What you expect is a Poisson distribution, where the chance that a chain will contain k elements is P(lambda,k) = e^-lambda * lambda^k / k!, where lambda is the average loading per chain. In your example, it's 1 (65536 inputs, 65536 outputs). (^ is exponentiation, ! is factorial) So the distribution I expect to get is: 0 24109.347656 1 24109.347656 2 12054.673828 3 4018.224609 4 1004.556152 5 200.911224 6 33.485203 7 4.783601 8 0.597950 9 0.066439 10 0.006644 Which looks a HELL of a lot like what you observed. (The jenkins result above has 24250 chains with no entries.) Now, you can sometimes use properties of the inputs to get a distribution that is more uniform than random, by letting the distribution of the input "show through" the hash function, which is what the xor hash does. But this depends on making assumptions about the input distribution, which means that you're assuming that they're not being chosen maliciously. If an attacker is choosing maliciously, which is a required assumption in today's Internet, the best you can do is random. Now, the core Jenkins hash mix function basically takes three inputs. What jhash_3words does with it is: a += K; b += K; c += seed; __jhash_mix(a, b, c); return c; Now, the ipv4 hash operation fundamentally involves 96 bits of input, which is a good match for jhash. If you want to add a salt, perhaps the simplest thing would be to just replace those constants K with a 96-bit salt and be done with it: a = (lport<<16) + rport + salt[0]; b = laddr + salt[1]; c = raddr + salt[2]; __jhash_mix(a,b,c); return c; Regarding control by attackers, let's consider the four inputs and see how much information an attacker can insert into each one: remote port: An attacker has complete control over this. 16 bits.
remote address: Depends on the size of the bot-net. Can vary from 0 bits (one machine) to 20 bits for a large bot-net. local address: Limited to the number of addresses the local machine has. Typically 0 bits, rarely more than 2 bits. May be much larger (8 bits or more) for stateful firewalls and other sorts of proxies. local port: Limited to the number of open server ports. Typically 3-6 bits, but may be lower on heavily firewalled machines. Certainly combining any two of these in a predictable way without some non-linear salting makes an attacker's job easier. While folding the local and remote addresses before hashing is usually safe because the local address is usually unique, there are situations in which there are a large number of possible local addresses. For example, it allows an attacker with a /24 to attack, say, a stateful firewall guarding a /24. If I have my machine at address a.b.c.d connect to remote machine x.y.z.~d, then they always fold to a^x.b^y.c^z.255, and I can, by making the local and remote ports identical for all the attacks, force 254-entry hash chains without knowing anything else about the hash function, salt, or whatever. An interesting question is whether it's better to mix salt into the bits an attacker has most or least control over. It's not immediately obvious to me; does anyone else have insight? Just mixing 96 bits of salt into all the inputs does seem to render the question moot.
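Both quantitative points in this message can be checked with a small C sketch: the Poisson expectation for chain lengths, and the xor-folding collision compared with the salted three-word proposal. The function names, the salt values, and the documentation address ranges (198.51.100.0/24 for the attacker, 203.0.113.0/24 behind the firewall) are my own choices for illustration. The MIX macro follows Bob Jenkins' lookup2 mix, which I believe is what the kernel's __jhash_mix() was at the time; treat that correspondence as an assumption. e^x is computed by a short series only so the sketch does not need libm.

```c
#include <stdint.h>

/* Part 1: expected chain lengths if keys land uniformly at random. */

/* e^x by Taylor series; fine for the small lambda used here. */
static double exp_series(double x)
{
	double term = 1.0, sum = 1.0;
	for (int n = 1; n < 40; n++) {
		term *= x / n;
		sum += term;
	}
	return sum;
}

/* Expected number of chains with exactly k entries: buckets * P(lambda, k). */
static double expected_chains(double items, double buckets, int k)
{
	double lambda = items / buckets;
	double p = 1.0 / exp_series(lambda); /* P(0) = e^-lambda */
	for (int i = 1; i <= k; i++)
		p *= lambda / i;                  /* P(i) = P(i-1) * lambda / i */
	return buckets * p;
}

/* Part 2: xor folding versus the salted three-word proposal. */

/* lookup2-style mix (assumed to match __jhash_mix of that era). */
#define MIX(a, b, c) do {                 \
	a -= b; a -= c; a ^= (c >> 13);   \
	b -= c; b -= a; b ^= (a << 8);    \
	c -= a; c -= b; c ^= (b >> 13);   \
	a -= b; a -= c; a ^= (c >> 12);   \
	b -= c; b -= a; b ^= (a << 16);   \
	c -= a; c -= b; c ^= (b >> 5);    \
	a -= b; a -= c; a ^= (c >> 3);    \
	b -= c; b -= a; b ^= (a << 10);   \
	c -= a; c -= b; c ^= (b >> 15);   \
} while (0)

static uint32_t ip4(unsigned a, unsigned b, unsigned c, unsigned d)
{
	return (uint32_t)a << 24 | (uint32_t)b << 16 | (uint32_t)c << 8 | (uint32_t)d;
}

/* The risky construction: fold the two addresses before hashing. */
static uint32_t xor_fold(uint32_t laddr, uint32_t raddr)
{
	return laddr ^ raddr;
}

/* The proposal from the message: 96 bits of input, 96 bits of salt, one mix. */
static uint32_t salted_hash(uint32_t laddr, uint16_t lport,
			    uint32_t raddr, uint16_t rport,
			    const uint32_t salt[3])
{
	/* Widen before shifting so lport << 16 cannot overflow a signed int. */
	uint32_t a = ((uint32_t)lport << 16) + rport + salt[0];
	uint32_t b = laddr + salt[1];
	uint32_t c = raddr + salt[2];

	MIX(a, b, c);
	return c;
}
```

For 65536 keys in 65536 chains, expected_chains() gives about 24109.35 chains with 0 entries and about 0.598 with 8, in line with the table above. With the ports held fixed, every (a.b.c.d, x.y.z.~d) pair folds to the same value (13.51.21.255 for the ranges chosen here), while salted_hash() spreads the same 254 connections out.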
"uli526x: I/O base is zero"
I've got a rather awkward debugging situation. I helped a friend in another city set up a dual-boot Linux/Windows box a while ago, and it just got a motherboard upgrade. Unfortunately, I had followed my usual instincts and built a custom kernel which didn't include the new motherboard's drivers. If I can just get the network working, I can log in remotely and get everything else going, but until then, I have to instruct someone in kernel debugging over the telephone. The motherboard is an MSI K9NU Neo-V, a ULi M1697 AM2 motherboard, and PCI device 0000:00:12.0 is an M5263 Ethernet controller, 10b9:5263 (rev 60). It's an older but not ancient 2.6 kernel: 2.6.14, I think, although I can't seem to find where I wrote it down. The previous system, which I set up and which was running fine, was a single-core K8, socket 939, with an nForce4 chipset. The new one is dual-core and ULi M1697. So there are a lot of similarities. Anyway, we got SMP enabled, and the uli526x driver enabled. But the network still didn't work. Booting with uli526x.debug=1 produced: uli526x: uli526x_init_one() 0 ACPI: PCI interrupt 0000:00:12.0[A]->GSI 20(level,low)->IRQ 50 uli526x: I/O base is zero Where I'm confused is the "I/O base is zero" message. Obviously, this is a fatal error for the device initialization, but I'm not sure what causes it. The obvious "type the error message into Google" only produces a couple of disk images in Romania. The next step is probably to make and use a more recent boot CD. But just in case, can I ask: can any psychic wizard here suggest, from this very fragmentary information, some simple thing I have overlooked that would cause this problem? Thank you!
Re: VIA "Velocity" test report - VLAN reception not working
> Btw, you may consider using netdev@vger.kernel.org instead of > the obsolete [EMAIL PROTECTED], especially as M. Cox is in India. Oh! I remember once making the opposite error and getting a bounce, so the fact that netdev is NOT hosted on vger is stuck in my head. I guess it's changed now. (Of course, I have only recently gotten my fingers to stop auto-completing "vger.rutgers.edu", so that might have been a while ago...) > If you can put the card in a crashme/testme computer, feel free to try > the patches at: > http://www.zoreil.com/~romieu/linux/kernel/2.6.x/2.6.15-rc2/via-velocity/20051128 Neat, thanks! Are they actually likely to mess up the host or make it unstable, or are you just saying "hey, these are for TESTING, capiche?" I currently have the card installed in a machine, and while it's a production machine, taking it down to single-user at some odd hour and doing a bit of testing is not really any more outage, and less effort, than taking the machine down, removing the card, reconfiguring the net without it, installing it in a different machine, testing there, and reversing the process to put the card back. Is it bad enough that there's serious concern it might corrupt a mounted file system? > The patches apply on top of each other. I'd suggest doing a first round > of testing without VLAN to check that the usual flow did not experience > collateral damages. > > If it works fine, enable VLAN when the last patch is applied and add > a single vlan with vconfig. If it does not crash, tcpdump + ping in > both direction w/wo VLAN may help fix the issues. Great! Tonight is no good, but I'll get to it soon.
Re: VIA "Velocity" test report - VLAN reception not working
> I expect the worst behavior to simply translate into a mute interface > with or without VLAN but... Actually, I upgraded to 2.6.15-rc3 plus your patches, and the behaviour is simply exactly the same. (I've only compiled one 2.6.15-rc3 kernel, so I can't have possibly booted the wrong one.) Tcpdump on a tagged port with the velocity driver: # tcpdump -n -i eth0 tcpdump: verbose output suppressed, use -v or -vv for full protocol decode listening on eth0, link-type EN10MB (Ethernet), capture size 96 bytes 03:54:26.437624 802.1d unknown version 03:54:28.438430 802.1d unknown version 03:54:30.438488 802.1d unknown version 03:54:32.442433 802.1d unknown version 03:54:33.446611 LACPv1, length: 110 03:54:34.55 802.1d unknown version 03:54:36.445276 802.1d unknown version 03:54:38.448251 802.1d unknown version 03:54:40.448322 802.1d unknown version 03:54:42.451189 802.1d unknown version 10 packets captured 20 packets received by filter 0 packets dropped by kernel While listening on the same port with a Tulip-based 100baseT card: # tcpdump -n -i eth1 tcpdump: WARNING: eth1: no IPv4 address assigned tcpdump: verbose output suppressed, use -v or -vv for full protocol decode listening on eth1, link-type EN10MB (Ethernet), capture size 96 bytes 03:55:28.295843 vlan 2, p 0, arp who-has 192.35.100.95 (ff:ff:ff:ff:ff:ff) tell 192.35.100.92 03:55:28.492928 802.1d unknown version 22:55:28.546414 vlan 2, p 0, arp who-has 192.35.100.1 (ff:ff:ff:ff:ff:ff) tell 192.35.100.23 03:55:28.739570 vlan 2, p 0, arp who-has 192.35.100.1 (ff:ff:ff:ff:ff:ff) tell 192.35.100.110 03:55:28.748937 vlan 2, p 0, arp who-has 192.35.100.1 (ff:ff:ff:ff:ff:ff) tell 192.35.100.86 03:55:29.739545 vlan 2, p 0, arp who-has 192.35.100.1 (ff:ff:ff:ff:ff:ff) tell 192.35.100.110 03:55:29.748873 vlan 2, p 0, arp who-has 192.35.100.1 (ff:ff:ff:ff:ff:ff) tell 192.35.100.86 03:55:30.126945 vlan 2, p 0, IP 192.35.100.23.41946 > 198.69.104.19.53: 28350+ PTR? 59.100.35.192.in-addr.arpa. 
(44) 03:55:30.174782 vlan 2, p 0, arp who-has 192.35.100.1 tell 192.35.100.3 03:55:30.494294 802.1d unknown version 03:55:30.739527 vlan 2, p 0, arp who-has 192.35.100.1 (ff:ff:ff:ff:ff:ff) tell 192.35.100.110 03:55:30.748865 vlan 2, p 0, arp who-has 192.35.100.1 (ff:ff:ff:ff:ff:ff) tell 192.35.100.86 03:55:30.872635 vlan 2, p 0, arp who-has 192.35.100.1 (ff:ff:ff:ff:ff:ff) tell 192.35.100.38 03:55:31.174739 vlan 2, p 0, arp who-has 192.35.100.1 tell 192.35.100.3 03:55:31.228356 vlan 2, p 0, arp who-has 192.35.100.1 tell 192.35.100.59 03:55:31.739538 vlan 2, p 0, arp who-has 192.35.100.1 (ff:ff:ff:ff:ff:ff) tell 192.35.100.110 03:55:31.748865 vlan 2, p 0, arp who-has 192.35.100.1 (ff:ff:ff:ff:ff:ff) tell 192.35.100.86 03:55:31.755004 vlan 2, p 0, arp who-has 192.35.100.95 (ff:ff:ff:ff:ff:ff) tell 192.35.100.92 03:55:31.872379 vlan 2, p 0, arp who-has 192.35.100.1 (ff:ff:ff:ff:ff:ff) tell 192.35.100.38 19 packets captured 38 packets received by filter 0 packets dropped by kernel Basically, all the VLAN packets disappear, even during promiscuous receive. Thanks for trying, though! I'm running your patches untagged at the moment with no obvious problems.
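In the Tulip capture, "vlan 2, p 0" is tcpdump's decoding of the 4-byte IEEE 802.1Q tag that follows the source MAC address: TPID 0x8100, then a TCI holding the priority (the "p"), the drop-eligible bit, and the 12-bit VLAN ID. The Velocity capture above shows only untagged 802.1d/LACP frames, consistent with the report that tagged frames never reach the capture point. A minimal C sketch of the tag layout follows; it is not taken from any driver, and the names are illustrative.

```c
#include <stddef.h>
#include <stdint.h>

/* Fields of an IEEE 802.1Q tag, as tcpdump prints them ("vlan 2, p 0"). */
struct vlan_info {
	uint16_t vid;        /* 12-bit VLAN identifier */
	uint8_t  pcp;        /* 3-bit priority, the "p" in tcpdump output */
	uint8_t  dei;        /* 1-bit drop-eligible indicator */
	uint16_t ethertype;  /* encapsulated protocol, e.g. 0x0806 for ARP */
};

#define MAC_LEN    6
#define TPID_8021Q 0x8100

/* Returns 1 and fills *out for an 802.1Q-tagged frame, 0 for an untagged
 * frame, and -1 if the frame is too short to tell. */
static int parse_vlan_tag(const uint8_t *frame, size_t len, struct vlan_info *out)
{
	size_t off = 2 * MAC_LEN; /* skip destination and source MAC */
	uint16_t tpid, tci;

	if (len < off + 2)
		return -1;
	tpid = (uint16_t)(frame[off] << 8 | frame[off + 1]);
	if (tpid != TPID_8021Q)
		return 0;
	if (len < off + 6)
		return -1;
	tci = (uint16_t)(frame[off + 2] << 8 | frame[off + 3]);
	out->pcp = (uint8_t)(tci >> 13);
	out->dei = (uint8_t)((tci >> 12) & 1);
	out->vid = tci & 0x0fff;
	out->ethertype = (uint16_t)(frame[off + 4] << 8 | frame[off + 5]);
	return 1;
}
```

A broadcast ARP on VLAN 2 with priority 0, like the who-has frames above, parses to vid 2, pcp 0, ethertype 0x0806.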
4.8.0-rc1: page allocation failure: order:3, mode:0x2084020(GFP_ATOMIC|__GFP_COMP)
L.S., Just tested 4.8.0-rc1, but I get the stack trace below. Everything seems to continue fine afterwards, though. (I haven't tried to bisect it yet; hopefully someone has an insight without having to go through that :) ) My network config consists of a bridge and NAT. -- Sander [10469.336815] swapper/0: page allocation failure: order:3, mode:0x2084020(GFP_ATOMIC|__GFP_COMP) [10469.336820] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.8.0-rc1-20160808-linus-doflr+ #1 [10469.336821] Hardware name: MSI MS-7640/890FXA-GD70 (MS-7640) , BIOS V1.8B1 09/13/2010 [10469.336825] 88005f603228 81456ca5 [10469.336828] 0003 88005f6032b0 811633ed 020840205fd0f000 [10469.336830] 88005f603278 02084028 00035fd0f500 [10469.336832] Call Trace: [10469.336834][] dump_stack+0x87/0xb2 [10469.336845] [] warn_alloc_failed+0xdd/0x140 [10469.336847] [] __alloc_pages_nodemask+0x3e1/0xcf0 [10469.336851] [] ? check_preempt_curr+0x4f/0x90 [10469.336852] [] ? ttwu_do_wakeup+0x12/0x90 [10469.336855] [] alloc_pages_current+0x8d/0x110 [10469.336857] [] kmalloc_order+0x1f/0x70 [10469.336859] [] __kmalloc+0x129/0x140 [10469.336861] [] bucket_table_alloc+0xc1/0x1d0 [10469.336862] [] rhashtable_insert_rehash+0x5d/0xe0 [10469.336865] [] ? __nf_nat_l4proto_find+0x20/0x20 [10469.336866] [] nf_nat_setup_info+0x2ef/0x400 [10469.336869] [] nf_nat_masquerade_ipv4+0xd5/0x100 [10469.336870] [] masquerade_tg+0x32/0x40 [10469.336872] [] ipt_do_table+0x29e/0x3b0 [10469.336873] [] iptable_nat_do_chain+0x1a/0x20 [10469.336875] [] nf_nat_ipv4_fn+0x12f/0x1e0 [10469.336876] [] ? iptable_nat_ipv4_fn+0x20/0x20 [10469.336877] [] nf_nat_ipv4_out+0x37/0x40 [10469.336878] [] iptable_nat_ipv4_out+0x10/0x20 [10469.336880] [] nf_iterate+0x58/0x70 [10469.336881] [] nf_hook_slow+0x5f/0xb0 [10469.336884] [] ip_output+0xb5/0xd0 [10469.336886] [] ? ip_fragment.constprop.43+0x80/0x80 [10469.336887] [] ip_forward_finish+0x3b/0x60 [10469.336888] [] ip_forward+0x2c8/0x390 [10469.336890] [] ?
ip_frag_mem+0x40/0x40 [10469.336891] [] ip_rcv_finish+0x1b5/0x3a0 [10469.336892] [] ip_rcv+0x279/0x380 [10469.336895] [] ? skb_copy_ubufs+0xf2/0x290 [10469.336896] [] ? ip_local_deliver_finish+0x120/0x120 [10469.336898] [] __netif_receive_skb_core+0x2d2/0x9e0 [10469.336900] [] __netif_receive_skb+0x11/0x70 [10469.336901] [] netif_receive_skb_internal+0x1e/0x80 [10469.336902] [] ? nf_hook_slow+0x5f/0xb0 [10469.336906] [] netif_receive_skb+0x9/0x10 [10469.336910] [] br_pass_frame_up+0x6e/0xe0 [10469.336911] [] ? __br_handle_local_finish+0x40/0x40 [10469.336913] [] br_handle_frame_finish+0x123/0x4a0 [10469.336914] [] ? nf_nat_ipv4_fn+0x18e/0x1e0 [10469.336916] [] br_nf_pre_routing_finish+0x183/0x380 [10469.336918] [] ? br_pass_frame_up+0xe0/0xe0 [10469.336919] [] br_nf_pre_routing+0x2b2/0x390 [10469.336920] [] ? br_nf_forward_ip+0x410/0x410 [10469.336921] [] nf_iterate+0x58/0x70 [10469.336922] [] nf_hook_slow+0x5f/0xb0 [10469.336924] [] br_handle_frame+0x1ce/0x2d0 [10469.336926] [] ? br_pass_frame_up+0xe0/0xe0 [10469.336927] [] ? br_handle_local_finish+0x40/0x40 [10469.336928] [] __netif_receive_skb_core+0x12b/0x9e0 [10469.336932] [] ? set_phys_to_machine+0x14/0x40 [10469.336934] [] ? set_foreign_p2m_mapping+0x1a0/0x3a0 [10469.336935] [] __netif_receive_skb+0x11/0x70 [10469.336937] [] netif_receive_skb_internal+0x1e/0x80 [10469.336939] [] netif_receive_skb+0x9/0x10 [10469.336941] [] xenvif_tx_action+0x693/0x820 [10469.336944] [] ? __handle_irq_event_percpu+0x31/0x100 [10469.336945] [] xenvif_poll+0x29/0x70 [10469.336949] [] ? do_raw_spin_unlock+0x55/0xa0 [10469.336950] [] net_rx_action+0x211/0x320 [10469.336953] [] __do_softirq+0x103/0x210 [10469.336955] [] irq_exit+0x4b/0xa0 [10469.336957] [] xen_evtchn_do_upcall+0x30/0x40 [10469.336961] [] xen_do_hypervisor_callback+0x1e/0x40 [10469.336962][] ? xen_hypercall_sched_op+0xa/0x20 [10469.336965] [] ? xen_hypercall_sched_op+0xa/0x20 [10469.336967] [] ? xen_safe_halt+0x10/0x20 [10469.336970] [] ? 
default_idle+0x13/0x20 [10469.336971] [] ? arch_cpu_idle+0xa/0x10 [10469.336973] [] ? default_idle_call+0x2e/0x50 [10469.336974] [] ? cpu_startup_entry+0x256/0x2c0 [10469.336975] [] ? rest_init+0x72/0x80 [10469.336979] [] ? start_kernel+0x410/0x41d [10469.336981] [] ? x86_64_start_reservations+0x2f/0x31 [10469.336983] [] ? xen_start_kernel+0x547/0x553 [10469.336984] Mem-Info: [10469.336989] active_anon:55875 inactive_anon:70482 isolated_anon:0 active_file:71530 inactive_file:77473 isolated_file:0 unevictable:601 dirty:3644 writeback:0 unstable:0 slab_reclaimable:23415 slab_unreclaimable:11066 mapped:7925 shmem:322 pagetables:2756 bounce:0 free:11987 free_pcp:747 free_cma:0 [10469.336993] Node 0 active_anon:223500kB inactive_anon:281928kB active_file:286120kB inactive_file:309892kB unevictable:2404kB isolated(anon):0kB isolated(file):0
Xen-unstable + Linux 4.2-rc4: GPF RIP: e030:[] [] detach_if_pending+0x18/0x80
Hi, While testing a 4.2-rc4 kernel on Xen, I got the crash below. Could this be related to the changes in 3bb475a3446facd0425d3f2fe7e85bf03c5c6c05? It crashes dom0 when I put some strain on the network + bridge. -- Sander [ 2108.078763] general protection fault: [#1] SMP [ 2108.102839] Modules linked in: [ 2108.121598] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.2.0-rc4-20150728-linus-noipv6-doflr+ #1 [ 2108.157188] Hardware name: MSI MS-7640/890FXA-GD70 (MS-7640) , BIOS V1.8B1 09/13/2010 [ 2108.190430] task: 8221a580 ti: 8220 task.ti: 8220 [ 2108.222309] RIP: e030:[] [] detach_if_pending+0x18/0x80 [ 2108.257037] RSP: e02b:88005f603818 EFLAGS: 00010086 [ 2108.282424] RAX: 88005f6cf410 RBX: 8800511a6a60 RCX: dead00200200 [ 2108.313210] RDX: RSI: 88005f60e5c0 RDI: 8800511a6a60 [ 2108.344013] RBP: 88005f603818 R08: 0001 R09: 0001 [ 2108.374795] R10: 0003 R11: 8800511a69c0 R12: [ 2108.405446] R13: 000100098b49 R14: 00015f90 R15: 88005f60e5c0 [ 2108.435784] FS: 7fe7e790d700() GS:88005f60() knlGS: [ 2108.469081] CS: e033 DS: ES: CR0: 8005003b [ 2108.495152] CR2: 017eeca0 CR3: 04c3 CR4: 0660 [ 2108.525520] Stack: [ 2108.540406] 88005f603868 8110edbf 810fb1e1 0200 [ 2108.571288] 0003 8800511a69c0 880004d5d600 [ 2108.602370] 00015f90 88005f603898 819b3ad3 [ 2108.633153] Call Trace: [ 2108.649088] [ 2108.654784] [] mod_timer_pending+0x3f/0xe0 [ 2108.689320] [] ? __raw_callee_save___pv_queued_spin_unlock+0x11/0x20 [ 2108.721899] [] __nf_ct_refresh_acct+0xa3/0xb0 [ 2108.748221] [] tcp_packet+0xb3b/0x1290 [ 2108.772816] [] ? br_forward_finish+0x25/0x80 [ 2108.798706] [] ? irq_to_desc+0x12/0x20 [ 2108.822802] [] ? __local_bh_enable_ip+0x2a/0x90 [ 2108.849145] [] ? __nf_conntrack_find_get+0x129/0x2a0 [ 2108.876796] [] nf_conntrack_in+0x29c/0x7c0 [ 2108.901830] [] ipv4_conntrack_in+0x21/0x30 [ 2108.926819] [] nf_iterate+0x4c/0x80 [ 2108.949863] [] nf_hook_slow+0x64/0xc0 [ 2108.973297] [] br_nf_pre_routing+0x33c/0x350 [ 2108.34] [] ?
br_nf_forward_ip+0x3d0/0x3d0 [ 2109.025256] [] nf_iterate+0x4c/0x80 [ 2109.047697] [] nf_hook_slow+0x64/0xc0 [ 2109.070579] [] br_handle_frame+0x190/0x270 [ 2109.094725] [] ? br_handle_local_finish+0x50/0x50 [ 2109.120526] [] ? br_handle_frame_finish+0x4b0/0x4b0 [ 2109.146523] [] __netif_receive_skb_core+0x12b/0x970 [ 2109.172583] [] ? set_foreign_p2m_mapping+0x19d/0x3a0 [ 2109.198956] [] __netif_receive_skb+0x15/0x70 [ 2109.223214] [] netif_receive_skb_internal+0x1e/0x80 [ 2109.249274] [] netif_receive_skb_sk+0xc/0x10 [ 2109.273450] [] xenvif_tx_action+0x6a9/0x830 [ 2109.297488] [] ? rtl8169_poll+0x8d/0x600 [ 2109.320699] [] xenvif_poll+0x29/0x70 [ 2109.342620] [] net_rx_action+0x1f7/0x300 [ 2109.365337] [] __do_softirq+0x103/0x210 [ 2109.387677] [] irq_exit+0x4b/0xa0 [ 2109.408519] [] xen_evtchn_do_upcall+0x34/0x50 [ 2109.432419] [] xen_do_hypervisor_callback+0x1e/0x40 [ 2109.457772] [ 2109.463493] [] ? xen_hypercall_sched_op+0xa/0x20 [ 2109.494059] [] ? xen_hypercall_sched_op+0xa/0x20 [ 2109.518541] [] ? xen_safe_halt+0x10/0x20 [ 2109.540952] [] ? default_idle+0x13/0x20 [ 2109.563055] [] ? arch_cpu_idle+0xa/0x10 [ 2109.585071] [] ? default_idle_call+0x2e/0x50 [ 2109.608257] [] ? cpu_startup_entry+0x272/0x2e0 [ 2109.631954] [] ? rest_init+0x77/0x80 [ 2109.652953] [] ? start_kernel+0x442/0x44f [ 2109.675099] [] ? x86_64_start_reservations+0x2a/0x2c [ 2109.700145] [] ? 
xen_start_kernel+0x550/0x55c [ 2109.723264] Code: 77 28 5d c3 66 66 66 66 66 66 2e 0f 1f 84 00 00 00 00 00 48 8b 47 08 55 48 89 e5 48 85 c0 74 6a 48 8b 0f 48 85 c9 48 89 08 74 04 <48> 89 41 08 84 d2 74 08 48 c7 47 08 00 00 00 00 f6 47 2a 10 48 [ 2109.789794] RIP [] detach_if_pending+0x18/0x80 [ 2109.813416] RSP [ 2109.829199] ---[ end trace 042bd0c1a92729d3 ]--- [ 2109.848376] Kernel panic - not syncing: Fatal exception in interrupt [ 2109.872793] Kernel Offset: disabled
Linux 4.2-rc6 regression: RIP: e030:[] [] detach_if_pending+0x18/0x80
Hi, On my box running Xen with a 4.2-rc6 kernel, I still get this splat in dom0, which crashes the box. (I reported a similar splat at rc4 here: http://www.spinics.net/lists/netdev/msg337570.html) I have never seen this one on 4.1, so it seems to be a regression. -- Sander [81133.193439] general protection fault: [#1] SMP [81133.204284] Modules linked in: [81133.214934] CPU: 0 PID: 3 Comm: ksoftirqd/0 Not tainted 4.2.0-rc6-20150811-linus-doflr+ #1 [81133.225632] Hardware name: MSI MS-7640/890FXA-GD70 (MS-7640) , BIOS V1.8B1 09/13/2010 [81133.236237] task: 880059b91580 ti: 880059bb4000 task.ti: 880059bb4000 [81133.246808] RIP: e030:[] [] detach_if_pending+0x18/0x80 [81133.257354] RSP: e02b:880059bb7848 EFLAGS: 00010086 [81133.267749] RAX: 88004eddc7f0 RBX: 88000e20ae08 RCX: dead00200200 [81133.278201] RDX: RSI: 88005f60e600 RDI: 88000e20ae08 [81133.288723] RBP: 880059bb7848 R08: 0001 R09: 0001 [81133.298930] R10: 0003 R11: 88000e20ad68 R12: [81133.308875] R13: 000101735569 R14: 00015f90 R15: 88005f60e600 [81133.318845] FS: 7f28c6f7c800() GS:88005f60() knlGS: [81133.328864] CS: e033 DS: ES: CR0: 8005003b [81133.338693] CR2: 807f6800 CR3: 3d55c000 CR4: 0660 [81133.348462] Stack: [81133.358005] 880059bb7898 8110fe3f 810fc261 0200 [81133.367682] 0003 88000e20ad68 88005854d400 [81133.377064] 00015f90 880059bb78c8 819b5243 [81133.386374] Call Trace: [81133.395596] [] mod_timer_pending+0x3f/0xe0 [81133.404999] [] ? __raw_callee_save___pv_queued_spin_unlock+0x11/0x20 [81133.414255] [] __nf_ct_refresh_acct+0xa3/0xb0 [81133.423137] [] tcp_packet+0xb3b/0x1290 [81133.431894] [] ? __local_bh_enable_ip+0x2a/0x90 [81133.440622] [] ? __nf_conntrack_find_get+0x129/0x2a0 [81133.449339] [] nf_conntrack_in+0x29c/0x7c0 [81133.457940] [] ipv4_conntrack_in+0x21/0x30 [81133.466296] [] nf_iterate+0x4c/0x80 [81133.474401] [] nf_hook_slow+0x64/0xc0 [81133.482615] [] ip_rcv+0x2ec/0x380 [81133.490781] [] ?
ip_local_deliver_finish+0x130/0x130 [81133.498790] [] __netif_receive_skb_core+0x2a0/0x970 [81133.506714] [] ? inet_gro_receive+0x1c8/0x200 [81133.514609] [] __netif_receive_skb+0x15/0x70 [81133.522333] [] netif_receive_skb_internal+0x1e/0x80 [81133.529840] [] napi_gro_receive+0x6b/0x90 [81133.537173] [] rtl8169_poll+0x2e6/0x600 [81133.54] [] ? __raw_callee_save___pv_queued_spin_unlock+0x11/0x20 [81133.551566] [] net_rx_action+0x1f7/0x300 [81133.558412] [] __do_softirq+0x103/0x210 [81133.565353] [] run_ksoftirqd+0x37/0x60 [81133.572359] [] smpboot_thread_fn+0x130/0x190 [81133.579215] [] ? sort_range+0x20/0x20 [81133.586042] [] kthread+0xee/0x110 [81133.592792] [] ? kthread_create_on_node+0x1b0/0x1b0 [81133.599694] [] ret_from_fork+0x3f/0x70 [81133.606662] [] ? kthread_create_on_node+0x1b0/0x1b0 [81133.613445] Code: 77 28 5d c3 66 66 66 66 66 66 2e 0f 1f 84 00 00 00 00 00 48 8b 47 08 55 48 89 e5 48 85 c0 74 6a 48 8b 0f 48 85 c9 48 89 08 74 04 <48> 89 41 08 84 d2 74 08 48 c7 47 08 00 00 00 00 f6 47 2a 10 48 [81133.627196] RIP [] detach_if_pending+0x18/0x80 [81133.634036] RSP [81133.640817] ---[ end trace eaf596e1fcf6a591 ]--- [81133.647521] Kernel panic - not syncing: Fatal exception in interrupt
Re: Linux 4.2-rc6 regression: RIP: e030:[] [] detach_if_pending+0x18/0x80
On 2015-08-12 22:41, Eric Dumazet wrote: On Wed, 2015-08-12 at 21:19 +0200, li...@eikelenboom.it wrote: Hi, On my box running Xen with a 4.2-rc6 kernel i still get this splat in dom0, which crashes the box. (i reported a similar splat before (at rc4) here, http://www.spinics.net/lists/netdev/msg337570.html) Never seen this one on 4.1, so it seems a regression. -- Sander [quoted oops trace snipped: identical to the one in the original report, ending in "Kernel panic - not syncing: Fatal exception in interrupt"] This looks like the bug fixed in David Miller net tree : http://git.kernel.org/cgit/linux/kernel/git/davem/net.git/commit/?id=2235f2ac75fd2501c251b0b699a9632e80239a6d Will pull the net-tree in and re-test. But since it only seems to crash after a day or two, that will take some time. Thanks, Sander
[PATCH net-next 0/3] vhost_net: access ptr ring using tap recvmsg
From: Prashant Bhole vhost_net needs to peek at tun packet sizes to allocate virtio buffers. Currently it accesses the tap ptr ring directly to do this. Jason Wang suggested achieving this through msghdr->msg_control and a change in the behavior of tap recvmsg. This change will also be useful for a future virtio-net XDP offload, where packets will be XDP-processed in tap recvmsg and vhost will see only packets that were not XDP_DROP'ed. Patch 1: reorganizes tun_msg_ctl so that it can be extended with different commands; tap sendmsg/recvmsg will behave according to the command. Patch 2: modifies the recvmsg implementation to produce packet pointers; vhost_net uses the recvmsg API instead of ptr_ring_consume(). Patch 3: removes ptr ring usage in vhost, along with the tun/tap functions that export the ptr ring. Prashant Bhole (3): tuntap: reorganize tun_msg_ctl usage vhost_net: user tap recvmsg api to access ptr ring tuntap: remove usage of ptr ring in vhost_net drivers/net/tap.c | 44 ++- drivers/net/tun.c | 45 +++- drivers/vhost/net.c| 79 ++ include/linux/if_tun.h | 9 +++-- 4 files changed, 103 insertions(+), 74 deletions(-) -- 2.21.0
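The command-based recvmsg interface described in this cover letter can be pictured with a small userspace model. This is a sketch under stated assumptions, not code from the series: toy_ring stands in for the kernel's ptr_ring, TUN_CMD_PACKET and TUN_CMD_BATCH take their values from patch 1, and the values of TUN_CMD_PRODUCE_PTRS / TUN_CMD_UNPRODUCE_PTRS and the struct layout (including the ptr_array member used in patch 2's tap.c hunk) are placeholders. "Produce" is from the caller's point of view: the command consumes entries from the ring and hands them to vhost.

```c
#include <errno.h>
#include <stddef.h>

/* PACKET and BATCH values follow patch 1; the PTRS values are placeholders. */
enum {
	TUN_CMD_PACKET = 1,
	TUN_CMD_BATCH = 2,
	TUN_CMD_PRODUCE_PTRS = 3,   /* hypothetical value */
	TUN_CMD_UNPRODUCE_PTRS = 4, /* hypothetical value */
};

/* Userspace stand-in for struct tun_msg_ctl after patch 2 (layout assumed). */
struct model_msg_ctl {
	unsigned short cmd;
	unsigned short num;
	void *ptr;
	void **ptr_array;
};

#define RING_SIZE 8

/* Toy FIFO standing in for the kernel's ptr_ring. */
struct toy_ring {
	void *slot[RING_SIZE];
	size_t head;  /* index of the oldest queued entry */
	size_t count; /* number of queued entries */
};

static int toy_produce(struct toy_ring *r, void *p)
{
	if (r->count == RING_SIZE)
		return -ENOSPC;
	r->slot[(r->head + r->count) % RING_SIZE] = p;
	r->count++;
	return 0;
}

/* Like ptr_ring_consume_batched(): take up to n entries, return how many. */
static int toy_consume_batched(struct toy_ring *r, void **out, int n)
{
	int i;

	for (i = 0; i < n && r->count > 0; i++) {
		out[i] = r->slot[r->head];
		r->head = (r->head + 1) % RING_SIZE;
		r->count--;
	}
	return i;
}

/* Like ptr_ring_unconsume(): put entries back at the head in their original
 * order. The kernel frees entries that no longer fit; this toy drops them. */
static void toy_unconsume(struct toy_ring *r, void **batch, int n)
{
	while (n-- > 0 && r->count < RING_SIZE) {
		r->head = (r->head + RING_SIZE - 1) % RING_SIZE;
		r->slot[r->head] = batch[n];
		r->count++;
	}
}

/* Dispatch modelled on the patched tap_recvmsg(). With a NULL ctl the real
 * function dequeues and copies a packet itself; the model just returns 0. */
static int model_recvmsg(struct toy_ring *r, const struct model_msg_ctl *ctl,
			 void **skb)
{
	*skb = NULL;
	if (!ctl)
		return 0;
	switch (ctl->cmd) {
	case TUN_CMD_PACKET:
		*skb = ctl->ptr; /* caller passes back a packet it fetched earlier */
		return 0;
	case TUN_CMD_PRODUCE_PTRS:
		return toy_consume_batched(r, ctl->ptr_array, ctl->num);
	case TUN_CMD_UNPRODUCE_PTRS:
		toy_unconsume(r, ctl->ptr_array, ctl->num);
		return 0;
	default:
		return -EINVAL;
	}
}
```

A vhost-style caller would presumably use PRODUCE to fetch a batch (so it can peek at packet sizes), hand individual pointers back with TUN_CMD_PACKET as it receives them, and UNPRODUCE whatever is left when it stops.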
[PATCH net-next 1/3] tuntap: reorganize tun_msg_ctl usage
From: Prashant Bhole To extend the usage of the tun_msg_ctl structure, this patch renames its member from type to cmd. The following definitions are also renamed: TUN_MSG_PTR -> TUN_CMD_BATCH, TUN_MSG_UBUF -> TUN_CMD_PACKET. Signed-off-by: Prashant Bhole --- drivers/net/tap.c | 9 ++--- drivers/net/tun.c | 8 ++-- drivers/vhost/net.c| 4 ++-- include/linux/if_tun.h | 6 +++--- 4 files changed, 17 insertions(+), 10 deletions(-) diff --git a/drivers/net/tap.c b/drivers/net/tap.c index 3ae70c7e6860..01bd260ce60c 100644 --- a/drivers/net/tap.c +++ b/drivers/net/tap.c @@ -1213,9 +1213,10 @@ static int tap_sendmsg(struct socket *sock, struct msghdr *m, struct tap_queue *q = container_of(sock, struct tap_queue, sock); struct tun_msg_ctl *ctl = m->msg_control; struct xdp_buff *xdp; + void *ptr = NULL; int i; - if (ctl && (ctl->type == TUN_MSG_PTR)) { + if (ctl && ctl->cmd == TUN_CMD_BATCH) { for (i = 0; i < ctl->num; i++) { xdp = &((struct xdp_buff *)ctl->ptr)[i]; tap_get_user_xdp(q, xdp); @@ -1223,8 +1224,10 @@ static int tap_sendmsg(struct socket *sock, struct msghdr *m, return 0; } - return tap_get_user(q, ctl ?
ctl->ptr : NULL, &m->msg_iter, - m->msg_flags & MSG_DONTWAIT); + if (ctl && ctl->cmd == TUN_CMD_PACKET) + ptr = ctl->ptr; + + return tap_get_user(q, ptr, &m->msg_iter, m->msg_flags & MSG_DONTWAIT); } static int tap_recvmsg(struct socket *sock, struct msghdr *m, diff --git a/drivers/net/tun.c b/drivers/net/tun.c index 0413d182d782..29711671959b 100644 --- a/drivers/net/tun.c +++ b/drivers/net/tun.c @@ -2529,11 +2529,12 @@ static int tun_sendmsg(struct socket *sock, struct msghdr *m, size_t total_len) struct tun_struct *tun = tun_get(tfile); struct tun_msg_ctl *ctl = m->msg_control; struct xdp_buff *xdp; + void *ptr = NULL; if (!tun) return -EBADFD; - if (ctl && (ctl->type == TUN_MSG_PTR)) { + if (ctl && ctl->cmd == TUN_CMD_BATCH) { struct tun_page tpage; int n = ctl->num; int flush = 0; @@ -2560,7 +2561,10 @@ static int tun_sendmsg(struct socket *sock, struct msghdr *m, size_t total_len) goto out; } - ret = tun_get_user(tun, tfile, ctl ? ctl->ptr : NULL, &m->msg_iter, + if (ctl && ctl->cmd == TUN_CMD_PACKET) + ptr = ctl->ptr; + + ret = tun_get_user(tun, tfile, ptr, &m->msg_iter, m->msg_flags & MSG_DONTWAIT, m->msg_flags & MSG_MORE); out: diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c index 1a2dd53caade..5946d2775bd0 100644 --- a/drivers/vhost/net.c +++ b/drivers/vhost/net.c @@ -462,7 +462,7 @@ static void vhost_tx_batch(struct vhost_net *net, struct msghdr *msghdr) { struct tun_msg_ctl ctl = { - .type = TUN_MSG_PTR, + .cmd = TUN_CMD_BATCH, .num = nvq->batched_xdp, .ptr = nvq->xdp, }; @@ -902,7 +902,7 @@ static void handle_tx_zerocopy(struct vhost_net *net, struct socket *sock) ubuf->desc = nvq->upend_idx; refcount_set(&ubuf->refcnt, 1); msg.msg_control = &ctl; - ctl.type = TUN_MSG_UBUF; + ctl.cmd = TUN_CMD_PACKET; ctl.ptr = ubuf; msg.msg_controllen = sizeof(ctl); ubufs = nvq->ubufs; diff --git a/include/linux/if_tun.h b/include/linux/if_tun.h index 5bda8cf457b6..bdfa671612db 100644 --- a/include/linux/if_tun.h +++ b/include/linux/if_tun.h @@ -11,10 +11,10 @@ 
#define TUN_XDP_FLAG 0x1UL -#define TUN_MSG_UBUF 1 -#define TUN_MSG_PTR 2 +#define TUN_CMD_PACKET 1 +#define TUN_CMD_BATCH 2 struct tun_msg_ctl { - unsigned short type; + unsigned short cmd; unsigned short num; void *ptr; }; -- 2.21.0
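One subtle point in patch 1: after the rename, ctl->ptr is forwarded to tap_get_user()/tun_get_user() only when cmd is TUN_CMD_PACKET, whereas the old code forwarded it for any non-batch ctl. A small userspace model of the two selection rules makes the difference visible; the function and enum names are hypothetical, and struct tun_msg_ctl here is a stand-in mirroring the layout shown in the if_tun.h hunk.

```c
#include <stddef.h>

/* Command values as defined by patch 1 in include/linux/if_tun.h. */
#define TUN_CMD_PACKET 1
#define TUN_CMD_BATCH  2

/* Userspace stand-in for struct tun_msg_ctl after patch 1. */
struct tun_msg_ctl {
	unsigned short cmd;
	unsigned short num;
	void *ptr;
};

enum send_path { SEND_BATCH, SEND_SINGLE };

/* Patched rule: only TUN_CMD_PACKET forwards ctl->ptr (e.g. vhost's zerocopy
 * ubuf) to the single-packet path. */
static enum send_path classify_sendmsg(const struct tun_msg_ctl *ctl, void **user_ptr)
{
	*user_ptr = NULL;
	if (ctl && ctl->cmd == TUN_CMD_BATCH)
		return SEND_BATCH; /* ctl->ptr is an array of ctl->num xdp_buffs */
	if (ctl && ctl->cmd == TUN_CMD_PACKET)
		*user_ptr = ctl->ptr;
	return SEND_SINGLE;
}

/* Pre-patch rule, for comparison (written against the renamed field): any
 * non-batch ctl forwarded ctl->ptr. */
static enum send_path classify_sendmsg_old(const struct tun_msg_ctl *ctl, void **user_ptr)
{
	if (ctl && ctl->cmd == TUN_CMD_BATCH) { /* was TUN_MSG_PTR */
		*user_ptr = NULL;
		return SEND_BATCH;
	}
	*user_ptr = ctl ? ctl->ptr : NULL;
	return SEND_SINGLE;
}
```

For the two commands vhost_net actually sends (a batch and a zerocopy packet), both rules behave the same; they differ only for an unrecognised cmd, which the patched code treats as "no pointer".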
[PATCH net-next 2/3] vhost_net: use tap recvmsg api to access ptr ring
From: Prashant Bhole

Currently vhost_net directly accesses the ptr ring of the tap driver to fetch Rx packet pointers. To avoid this, this patch modifies the tap driver's recvmsg API to take on the additional task of fetching Rx packet pointers.

A special struct, tun_msg_ctl, is already being used via msg_control for tun Rx XDP batching. This patch extends tun_msg_ctl so that it can also carry sub-commands to the recvmsg API: recvmsg can now produce/unproduce pointers from the ptr ring as an additional task.

This will be useful for the future implementation of the virtio-net XDP offload feature, where packets will be XDP batch-processed in tun_recvmsg.

Signed-off-by: Prashant Bhole
---
 drivers/net/tap.c      | 22 +++-
 drivers/net/tun.c      | 24 +-
 drivers/vhost/net.c    | 46 +-
 include/linux/if_tun.h |  3 +++
 4 files changed, 83 insertions(+), 12 deletions(-)

diff --git a/drivers/net/tap.c b/drivers/net/tap.c
index 01bd260ce60c..3d0bf382dbbc 100644
--- a/drivers/net/tap.c
+++ b/drivers/net/tap.c
@@ -1234,8 +1234,28 @@ static int tap_recvmsg(struct socket *sock, struct msghdr *m,
 			   size_t total_len, int flags)
 {
 	struct tap_queue *q = container_of(sock, struct tap_queue, sock);
-	struct sk_buff *skb = m->msg_control;
+	struct tun_msg_ctl *ctl = m->msg_control;
+	struct sk_buff *skb = NULL;
 	int ret;
+
+	if (ctl) {
+		switch (ctl->cmd) {
+		case TUN_CMD_PACKET:
+			skb = ctl->ptr;
+			break;
+		case TUN_CMD_PRODUCE_PTRS:
+			return ptr_ring_consume_batched(&q->ring,
+							ctl->ptr_array,
+							ctl->num);
+		case TUN_CMD_UNPRODUCE_PTRS:
+			ptr_ring_unconsume(&q->ring, ctl->ptr_array, ctl->num,
+					   tun_ptr_free);
+			return 0;
+		default:
+			return -EINVAL;
+		}
+	}
+
 	if (flags & ~(MSG_DONTWAIT|MSG_TRUNC)) {
 		kfree_skb(skb);
 		return -EINVAL;
diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index 29711671959b..7d4886f53389 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -2577,7 +2577,8 @@ static int tun_recvmsg(struct socket *sock, struct msghdr *m, size_t total_len,
 {
 	struct tun_file *tfile = container_of(sock, struct tun_file, socket);
 	struct tun_struct *tun = tun_get(tfile);
-	void *ptr = m->msg_control;
+	struct tun_msg_ctl *ctl = m->msg_control;
+	void *ptr = NULL;
 	int ret;

 	if (!tun) {
@@ -2585,6 +2586,27 @@ static int tun_recvmsg(struct socket *sock, struct msghdr *m, size_t total_len,
 		goto out_free;
 	}

+	if (ctl) {
+		switch (ctl->cmd) {
+		case TUN_CMD_PACKET:
+			ptr = ctl->ptr;
+			break;
+		case TUN_CMD_PRODUCE_PTRS:
+			ret = ptr_ring_consume_batched(&tfile->tx_ring,
+						       ctl->ptr_array,
+						       ctl->num);
+			goto out;
+		case TUN_CMD_UNPRODUCE_PTRS:
+			ptr_ring_unconsume(&tfile->tx_ring, ctl->ptr_array,
+					   ctl->num, tun_ptr_free);
+			ret = 0;
+			goto out;
+		default:
+			ret = -EINVAL;
+			goto out_put_tun;
+		}
+	}
+
 	if (flags & ~(MSG_DONTWAIT|MSG_TRUNC|MSG_ERRQUEUE)) {
 		ret = -EINVAL;
 		goto out_put_tun;
diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index 5946d2775bd0..5e5c1063606c 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -175,24 +175,44 @@ static void *vhost_net_buf_consume(struct vhost_net_buf *rxq)

 static int vhost_net_buf_produce(struct vhost_net_virtqueue *nvq)
 {
+	struct vhost_virtqueue *vq = &nvq->vq;
+	struct socket *sock = vq->private_data;
 	struct vhost_net_buf *rxq = &nvq->rxq;
+	struct tun_msg_ctl ctl = {
+		.cmd = TUN_CMD_PRODUCE_PTRS,
+		.ptr_array = rxq->queue,
+		.num = VHOST_NET_BATCH,
+	};
+	struct msghdr msg = {
+		.msg_control = &ctl,
+	};

 	rxq->head = 0;
-	rxq->tail = ptr_ring_consume_batched(nvq->rx_ring, rxq->queue,
-					      VHOST_NET_BATCH);
+	rxq->tail = sock->ops->recvmsg(sock, &msg, 0, 0);
+	if (rxq->tail < 0)
+
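The PRODUCE/UNPRODUCE sub-commands map onto two ptr_ring operations, ptr_ring_consume_batched() and ptr_ring_unconsume(). The sketch below is a single-threaded userspace model of their contract, not the kernel's lock-protected implementation: a batch is moved out in FIFO order, and an unconsumed batch is pushed back in front of the ring so it is seen again in the same order, with entries that no longer fit handed to a destroy callback (the role tun_ptr_free plays in the patch). The ring type, its capacity, and all function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_RING_SIZE 8  /* hypothetical capacity for this model */

/* Single-threaded FIFO standing in for the tun/tap ptr_ring. */
struct model_ring {
	void *slot[MODEL_RING_SIZE];
	size_t head;   /* index of the oldest queued pointer */
	size_t count;  /* number of queued pointers */
};

/* Producer side: in the real drivers, tun/tap queue packets here. */
static int model_produce(struct model_ring *r, void *p)
{
	if (r->count == MODEL_RING_SIZE)
		return -1;  /* ring full */
	r->slot[(r->head + r->count) % MODEL_RING_SIZE] = p;
	r->count++;
	return 0;
}

/* Like ptr_ring_consume_batched() (TUN_CMD_PRODUCE_PTRS): move up to n
 * pointers into array in FIFO order and return how many were moved. */
static int model_consume_batched(struct model_ring *r, void **array, int n)
{
	int i = 0;

	while (i < n && r->count > 0) {
		array[i++] = r->slot[r->head];
		r->head = (r->head + 1) % MODEL_RING_SIZE;
		r->count--;
	}
	return i;
}

/* Like ptr_ring_unconsume() (TUN_CMD_UNPRODUCE_PTRS): put array[0..n-1]
 * back in front of the ring, last entry first, so they are consumed again
 * in their original order; entries that no longer fit go to destroy. */
static void model_unconsume(struct model_ring *r, void **array, int n,
			    void (*destroy)(void *))
{
	while (n > 0 && r->count < MODEL_RING_SIZE) {
		r->head = (r->head + MODEL_RING_SIZE - 1) % MODEL_RING_SIZE;
		r->slot[r->head] = array[--n];
		r->count++;
	}
	while (n > 0)
		destroy(array[--n]);
}
```

In the patched vhost_net_buf_produce(), the return value of recvmsg plays the part of model_consume_batched()'s return value: it becomes rxq->tail, the number of cached pointers.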
[PATCH net-next 3/3] tuntap: remove usage of ptr ring in vhost_net
From: Prashant Bhole

Remove vhost_net's use of the tuntap ptr ring, and remove the functions exported from the tuntap drivers to get the ptr ring.

Signed-off-by: Prashant Bhole
---
 drivers/net/tap.c   | 13 -
 drivers/net/tun.c   | 13 -
 drivers/vhost/net.c | 31 ---
 3 files changed, 4 insertions(+), 53 deletions(-)

diff --git a/drivers/net/tap.c b/drivers/net/tap.c
index 3d0bf382dbbc..27ffd2210375 100644
--- a/drivers/net/tap.c
+++ b/drivers/net/tap.c
@@ -1298,19 +1298,6 @@ struct socket *tap_get_socket(struct file *file)
 }
 EXPORT_SYMBOL_GPL(tap_get_socket);

-struct ptr_ring *tap_get_ptr_ring(struct file *file)
-{
-	struct tap_queue *q;
-
-	if (file->f_op != &tap_fops)
-		return ERR_PTR(-EINVAL);
-	q = file->private_data;
-	if (!q)
-		return ERR_PTR(-EBADFD);
-	return &q->ring;
-}
-EXPORT_SYMBOL_GPL(tap_get_ptr_ring);
-
 int tap_queue_resize(struct tap_dev *tap)
 {
 	struct net_device *dev = tap->dev;
diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index 7d4886f53389..75893921411b 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -3750,19 +3750,6 @@ struct socket *tun_get_socket(struct file *file)
 }
 EXPORT_SYMBOL_GPL(tun_get_socket);

-struct ptr_ring *tun_get_tx_ring(struct file *file)
-{
-	struct tun_file *tfile;
-
-	if (file->f_op != &tun_fops)
-		return ERR_PTR(-EINVAL);
-	tfile = file->private_data;
-	if (!tfile)
-		return ERR_PTR(-EBADFD);
-	return &tfile->tx_ring;
-}
-EXPORT_SYMBOL_GPL(tun_get_tx_ring);
-
 module_init(tun_init);
 module_exit(tun_cleanup);
 MODULE_DESCRIPTION(DRV_DESCRIPTION);
diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index 5e5c1063606c..0d302efadf44 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -122,7 +122,6 @@ struct vhost_net_virtqueue {
 	/* Reference counting for outstanding ubufs.
 	 * Protected by vq mutex. Writers must also take device mutex. */
 	struct vhost_net_ubuf_ref *ubufs;
-	struct ptr_ring *rx_ring;
 	struct vhost_net_buf rxq;
 	/* Batched XDP buffs */
 	struct xdp_buff *xdp;
@@ -997,8 +996,9 @@ static int peek_head_len(struct vhost_net_virtqueue *rvq, struct sock *sk)
 	int len = 0;
 	unsigned long flags;

-	if (rvq->rx_ring)
-		return vhost_net_buf_peek(rvq);
+	len = vhost_net_buf_peek(rvq);
+	if (len)
+		return len;

 	spin_lock_irqsave(&sk->sk_receive_queue.lock, flags);
 	head = skb_peek(&sk->sk_receive_queue);
@@ -1189,7 +1189,7 @@ static void handle_rx(struct vhost_net *net)
 			goto out;
 		}
 		busyloop_intr = false;
-		if (nvq->rx_ring) {
+		if (!vhost_net_buf_is_empty(&nvq->rxq)) {
 			ctl.cmd = TUN_CMD_PACKET;
 			ctl.ptr = vhost_net_buf_consume(&nvq->rxq);
 			msg.msg_control = &ctl;
@@ -1345,7 +1345,6 @@ static int vhost_net_open(struct inode *inode, struct file *f)
 		n->vqs[i].batched_xdp = 0;
 		n->vqs[i].vhost_hlen = 0;
 		n->vqs[i].sock_hlen = 0;
-		n->vqs[i].rx_ring = NULL;
 		vhost_net_buf_init(&n->vqs[i].rxq);
 	}
 	vhost_dev_init(dev, vqs, VHOST_NET_VQ_MAX,
@@ -1374,7 +1373,6 @@ static struct socket *vhost_net_stop_vq(struct vhost_net *n,
 	vhost_net_disable_vq(n, vq);
 	vq->private_data = NULL;
 	vhost_net_buf_unproduce(nvq);
-	nvq->rx_ring = NULL;
 	mutex_unlock(&vq->mutex);
 	return sock;
 }
@@ -1470,25 +1468,6 @@ static struct socket *get_raw_socket(int fd)
 	return ERR_PTR(r);
 }

-static struct ptr_ring *get_tap_ptr_ring(int fd)
-{
-	struct ptr_ring *ring;
-	struct file *file = fget(fd);
-
-	if (!file)
-		return NULL;
-	ring = tun_get_tx_ring(file);
-	if (!IS_ERR(ring))
-		goto out;
-	ring = tap_get_ptr_ring(file);
-	if (!IS_ERR(ring))
-		goto out;
-	ring = NULL;
-out:
-	fput(file);
-	return ring;
-}
-
 static struct socket *get_tap_socket(int fd)
 {
 	struct file *file = fget(fd);
@@ -1572,8 +1551,6 @@ static long vhost_net_set_backend(struct vhost_net *n, unsigned index, int fd)
 		r = vhost_net_enable_vq(n, vq);
 		if (r)
 			goto err_used;
-		if (index == VHOST_NET_VQ_RX)
-			nvq->rx_ring = get_tap_ptr_ring(fd);

 		oldubufs = nvq->ubufs;
 		nvq->ubufs = ubufs;
-- 
2.21.0
Re: The future of the TI ACX wireless driver
Hi, nice work David. I have extracted the acxsm driver from the wireless-2.6 git tree and made it compile on at least kernels 2.6.20 and 2.6.22-rc4. It can be found here: http://www.hauke-m.de/fileadmin/acx/tiacx-20070522.tar.bz2 I also changed some lines in the normal acx driver so that it compiles with kernel 2.6.22 and later: http://www.hauke-m.de/fileadmin/acx/acx-20070610.tar.bz2 Both load on my system, but I can't test them because I no longer have an acx100-based card; it would be nice to get positive feedback when they work. I will write some pages on the wiki for these versions so everyone can test them. I am not a big kernel hacker, but I want to learn. Hauke Mehrtens
Bonding : Monitoring of 4965 wireless card
Hi, I want to make a bond with my wireless card. The ipw driver creates two interfaces (wlan0 and wmaster0). When I switch the rf_kill button, ifplugd detects wlan0 as unplugged but not wmaster0. If I take wlan0 down (while rf_kill is active), bonding detects the inactivity when I bring the interface back up. Do you have any idea where the problem is: in the driver or in the module's miimon? My module parameters: mode=1 miimon=100 primary eth0 Thanks _ Keep just one email address! Copy your mail to Yahoo! Mail http://mail.yahoo.fr
Re : Bonding : Monitoring of 4965 wireless card
I'm bonding my eth0 (e1000 driver) and my wlan card (iwl4965). It works the way I want: when I'm on wifi, DHCP gives me my Ethernet address, and when I unplug the cable, my wlan card takes charge of the network. My problem is that when I disconnect the wlan card, bonding does not detect it correctly: ifplugstatus shows wlan0 as not connected and wmaster0 as connected! The bonding module does not report that there is no active interface; it behaves as if wlan is still up. Am I clear? PS: (sorry, I'm having trouble with my mail) ----- Original Message ----- From: John W. Linville <[EMAIL PROTECTED]> To: [EMAIL PROTECTED] Cc: netdev@vger.kernel.org Sent: Wednesday, 9 January 2008, 18:02:05 Subject: Re: Bonding : Monitoring of 4965 wireless card On Wed, Jan 09, 2008 at 09:00:05AM +, [EMAIL PROTECTED] wrote: > Hi, > > I want to make a bond with my wireless card. The ipw driver create two > interfaces (wlan0 and wmaster0). When i switch the rf_kill button, > ifplug detect wlan0 unplugged but not wmaster0. If i down wlan0 (while > rf_kil ), bonding detect the inactivity when i up the interface. > > Have you some idea where is the problem? the driver or the miimon of > the module? > > my module parameters mode=1 miimon=100 primary eth0 I'm not sure I understand your description...what are you trying to do? How exactly is it failing? John -- John W. Linville [EMAIL PROTECTED]
Re : Re : Bonding : Monitoring of 4965 wireless card
I ignore it, but it seems to prevent bonding from detecting the link on wlan0. I enslave wlan0 and I already use use_carrier=1. I use bonding to keep my Ethernet IP on wifi at the office; otherwise the wireless connection gives a temporary address and you must then go through a proxy. I'll try ARP monitoring, but it's annoying that I can't test localhost. Is there a way to test localhost with ARP without going through lo? ----- Original Message ----- From: John W. Linville <[EMAIL PROTECTED]> To: [EMAIL PROTECTED] Sent: Wednesday, 9 January 2008, 21:24:10 Subject: Re: Re : Bonding : Monitoring of 4965 wireless card On Wed, Jan 09, 2008 at 07:31:37PM +, [EMAIL PROTECTED] wrote: > I'm doing a bonding with my eth0(e1000 driver) and my wlan > card(iwl4965). It work like i want, when i'm in wifi the dhcp give > me my ethernet adress. When i unplug the cable, my wlan card become > in charge of network. My problem is when i disconnect the wlan card, > the bonding does not detect it correctly, and ifplugstatus show me > wlan0 not connected and wmaster0 connected!! The bonding module does > not say no active interface, it work like wlan is on. > > Am i clear? Yes, that is much more clear to me. What (if anything) are you doing to wmaster0? You should just ignore it. FWIW, miimon is not going to work with a mac80211-based device at this time. The miimon option relies on support for either miitool or ethtool, and mac80211 devices support neither of those. Hmmm...it looks like there is a use_carrier option for miimon. Based on its description I would think it would work. Of course, I think it is supposed to be the default and you don't seem to be disabling it. So, I'm not sure what is happening. Are you enslaving wlan0? Or wmaster0? Make sure it is wlan0. Also, please add use_carrier=1 to your bonding module options. Does this change the behaviour? If not, please open a bug at either bugzilla.redhat.com (if you are a Fedora, RHEL, or even CentOS user) or bugzilla.kernel.org (otherwise).
In the meantime, you might try using NetworkManager. Or you might consider using ARP monitoring. The former probably is the best solution if you are mobile (e.g. at a cafe or other hotspot) while the latter might be appropriate if you are just plugging and un-plugging within the same network (like at home or office). Hth! John -- John W. Linville [EMAIL PROTECTED]
Re : Re : Re : Bonding : Monitoring of 4965 wireless card
I mean that instead of having ARP test an IP on the LAN or elsewhere, I want it to test 127.0.0.1, but to do this the probe must go out and come back in, using wlan0 to go out. ----- Original Message ----- From: Jay Vosburgh <[EMAIL PROTECTED]> To: [EMAIL PROTECTED] Cc: John W. Linville <[EMAIL PROTECTED]>; netdev@vger.kernel.org Sent: Wednesday, 9 January 2008, 22:36:00 Subject: Re: Re : Re : Bonding : Monitoring of 4965 wireless card [EMAIL PROTECTED] wrote: >I ignore it, but it seems like it prevent bonding detect link of wlan0. I enslave wlan0 and i already use use_carrier=1; The default for bonding is use_carrier=1, which makes bonding use the device driver's netif_carrier_on/off state for link detection. Bonding only checks via ethtool/mii if use_carrier=0. >I'll try arp monitoring but this is annoying i c'ant test localhost. Is there a way to test localhost with arp, without pass through lo ? What do you mean by "test localhost with arp, without pass through lo"? ARP monitoring issues probes (ARPs) to a remote destination to confirm that there is connectivity; I'm not sure what localhost has to do with it. In general, though, I have not tested bonding with wireless adapters, so I'm unfamiliar with how well it does or does not work. -J --- -Jay Vosburgh, IBM Linux Technology Center, [EMAIL PROTECTED]
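Because use_carrier=1 makes bonding follow the driver's netif_carrier_on/off state, one way to check what bonding is actually being told when rf_kill is toggled is to read the carrier state the kernel exports in sysfs, for example /sys/class/net/wlan0/carrier, which reads 1 or 0 and returns an error while the interface is administratively down. A minimal sketch; the read_carrier() helper is hypothetical:

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper: read a sysfs carrier file such as
 * /sys/class/net/wlan0/carrier. Returns 1 (carrier on), 0 (carrier off),
 * or -1 if the file cannot be opened or parsed (sysfs reports an error
 * for this attribute while the interface is down).
 */
static int read_carrier(const char *path)
{
	FILE *f = fopen(path, "r");
	int value = -1;

	if (!f)
		return -1;
	if (fscanf(f, "%d", &value) != 1)
		value = -1;
	fclose(f);
	return (value == 0 || value == 1) ? value : -1;
}
```

If this still reads 1 after rf_kill has been switched, the driver has not dropped carrier, and bonding with use_carrier=1 would have nothing to react to.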
Re : Re : Re : Re : Bonding : Monitoring of 4965 wireless card
Yes, that's what I'm looking for. I don't understand how to change arp_ip_target to the gateway, since arp_ip_target is a module option. ----- Original Message ----- From: Jay Vosburgh <[EMAIL PROTECTED]> To: [EMAIL PROTECTED] Cc: netdev@vger.kernel.org Sent: Thursday, 10 January 2008, 00:26:38 Subject: Re: Re : Re : Re : Bonding : Monitoring of 4965 wireless card [EMAIL PROTECTED] wrote: >I mean that instead of arp test an ip in lan or else, i want it to test 127.0.0.1 but in order to do this it must go out and re-enter and then use wlan0 to go out. In other words, what I think you're saying (and I'm not entirely sure here) is that you want probes to go to a remote node on the network, and back, without having to actually know the identity of the remote node (because, presumably, on a roaming type of wireless configuration, your gateway and whatnot can change from time to time). Is that what you're looking for? That isn't available now, but might be straightforward to plug into the address update system to keep the arp_ip_target up to date as the current gateway as the gateway changes. I haven't looked into the details of doing that, but in theory it sounds straightforward. -J --- -Jay Vosburgh, IBM Linux Technology Center, [EMAIL PROTECTED]
Re : Re : Re : Re : Re : Bonding : Monitoring of 4965 wireless card
I tried ARP monitoring, but it doesn't work! To test an IP, the interface must have an address, and dhcpcd is launched by ifplugd when bond0 has link... so it goes around in circles. So I went back to miimon, and I found that bonding detects when wlan0 is associated and makes it the active interface. But when I switch rf_kill, it doesn't react. So I tried disassociating and, like magic, it detects the interface as down! I presume it is a bug in the wlan driver, which does not re-initialise the wlan link state. So I made a small ACPI script to provide that behavior. ----- Original Message ----- From: Jay Vosburgh <[EMAIL PROTECTED]> To: [EMAIL PROTECTED] Cc: netdev@vger.kernel.org Sent: Thursday, 10 January 2008, 21:59:20 Subject: Re: Re : Re : Re : Re : Bonding : Monitoring of 4965 wireless card [EMAIL PROTECTED] wrote: >Yes it's what i'm looking for. I don't understand how to change the arp_ip_target with the gateway, arp_ip_target is a module option. If you're running a relatively recent bonding driver (version 3.0.0 or later), the arp_ip_targets can be changed on the fly via sysfs, e.g., echo +10.0.0.1 > /sys/class/net/bond0/bonding/arp_ip_target echo -20.0.0.1 > /sys/class/net/bond0/bonding/arp_ip_target You can check out Documentation/networking/bonding.txt (in the kernel source code) for more details. -J --- -Jay Vosburgh, IBM Linux Technology Center, [EMAIL PROTECTED]
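Jay's idea of keeping arp_ip_target pointed at the current gateway could also be approximated from userspace, for example from the same ACPI or ifplugd hook, using the sysfs interface he shows. A minimal IPv4-only sketch, assuming the standard /proc/net/route text format (hex columns Iface, Destination, Gateway, Flags, ...) and that the gateway of the default route (on bond0 once the wireless card is enslaved) is a suitable ARP target; both helper names are hypothetical:

```c
#include <arpa/inet.h>
#include <assert.h>
#include <net/route.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: given one data line of /proc/net/route, write
 * "+a.b.c.d" (the "add target" syntax of the bonding sysfs file) into out
 * and return 0 if the line is a default route via a gateway on iface.
 */
static int gateway_target_from_route_line(const char *line, const char *iface,
					  char *out, size_t outlen)
{
	char name[32];
	unsigned long dest, gw;
	unsigned int flags;
	char addr[INET_ADDRSTRLEN];
	struct in_addr a;
	int n;

	/* Columns: Iface Destination Gateway Flags ... (hex fields). */
	if (sscanf(line, "%31s %lx %lx %x", name, &dest, &gw, &flags) != 4)
		return -1;
	if (strcmp(name, iface) != 0 || dest != 0 || !(flags & RTF_GATEWAY))
		return -1;

	/* The kernel prints the raw network-order word, so storing it back
	 * into s_addr restores the original byte order on any host. */
	a.s_addr = (in_addr_t)gw;
	if (!inet_ntop(AF_INET, &a, addr, sizeof(addr)))
		return -1;
	n = snprintf(out, outlen, "+%s", addr);
	return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}

/* Hypothetical helper: write a value such as "+10.0.0.1" or "-20.0.0.1"
 * to a sysfs attribute like /sys/class/net/bond0/bonding/arp_ip_target. */
static int write_sysfs_attr(const char *path, const char *value)
{
	FILE *f = fopen(path, "w");
	int rc;

	if (!f)
		return -1;
	rc = (fputs(value, f) < 0) ? -1 : 0;
	if (fclose(f) != 0)
		rc = -1;
	return rc;
}
```

A hook would scan /proc/net/route line by line, write the "+gateway" string for the new gateway, and write "-old.address" for the previous one, as in Jay's two echo commands. Note this does not solve the chicken-and-egg problem described above: ARP probes still need the bond to have an address.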
problem with DMA when writing driver for rtl8139?
Hi everybody, my name is Mariusz and I am a newbie to the Linux kernel. For several weeks I have been writing a kernel driver for a network card based on the RTL8139C chip, for a microcontroller technology course at my university. I suppose I have some problems with DMA. There is a bit in the Transmit Status Descriptor of the RTL8139C which, after being cleared (it must be cleared to start a transmit operation), should go back to the 1 state, which according to the RTL8139 specification means that the DMA copy from memory to the internal RTL FIFO has finished. The problem is that the RTL doesn't set this bit back. I use pci_map_single to map the address of the packet buffer to DMA-capable memory, then cpu_to_le32 to get the physical address of this buffer. Do you have any idea what may be going wrong? Here is my code: 1) rtlmodule contains functions related to initialization; 2) rtlopen contains functions related to opening the device and interrupt handling; 3) rtltransmit contains functions related to transmission. And here is the output from the kernel. Best regards, Mariusz Witosz
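A userspace model of the transmit handshake described above may help when comparing against the attached code. It is a sketch, not driver code: the register offsets (TSD0-3 at 0x10-0x1C, TSAD0-3 at 0x20-0x2C) and bit positions (byte count in bits 0-12, OWN = bit 13, TOK = bit 15) follow the RTL8139 datasheet and the in-tree 8139too driver, and the helper names are hypothetical. Two general points may be worth checking: the dma_addr_t returned by pci_map_single() is already the bus address and is normally written to TSADn unchanged, since cpu_to_le32() only swaps byte order (and writel()/iowrite32() already store little-endian); and the chip can only perform DMA once bus mastering has been enabled with pci_set_master() and the transmitter has been enabled in the command register.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* TSDn bit layout (RTL8139 datasheet / 8139too). */
#define RTL_TSD_SIZE_MASK 0x00001FFFu /* bits 0-12: transmit byte count */
#define RTL_TSD_OWN       0x00002000u /* set by the chip once Tx DMA into the FIFO is done */
#define RTL_TSD_TOK       0x00008000u /* set by the chip once the frame was sent OK */
#define RTL_MIN_FRAME     60u         /* ETH_ZLEN: minimum frame length without FCS */

/*
 * Value a driver would write to TSDn, after loading the buffer's bus
 * address into the matching TSADn, to start a transmit: the byte count
 * with OWN = 0. The early-Tx threshold field (bits 16-21) is left at 0.
 * Like 8139too, short frames are sent as 60 bytes, so the caller must
 * zero the padding in the buffer.
 */
static uint32_t rtl_tsd_start_value(size_t len)
{
	if (len < RTL_MIN_FRAME)
		len = RTL_MIN_FRAME;
	assert(len <= RTL_TSD_SIZE_MASK);
	return (uint32_t)len;  /* bit 13 (OWN) is necessarily clear here */
}

/* Helpers for a TSDn value read back from the chip. */
static int rtl_tsd_dma_done(uint32_t tsd)
{
	return (tsd & RTL_TSD_OWN) != 0;
}

static int rtl_tsd_sent_ok(uint32_t tsd)
{
	return (tsd & RTL_TSD_TOK) != 0;
}
```

If OWN never becomes 1 after such a write, the chip has not fetched the buffer at all, which points at the address, bus mastering, or transmitter enable rather than at the descriptor value itself.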
Re: dsa: mv88e6xxx not receiving IPv6 multicast packets
MV88E6085 switch not passing IPv6 multicast packets to the CPU. This seems to be related to the interface not being in promiscuous mode. The issue has been ongoing since at least July 2020. The latest v5.10-rc3 still suffers from it on a Turris Mox with an mv88e6085. We've not been able to reproduce it on the Turris v4.14 stable kernel series, so it appears to be a regression. The Mox is running Debian 10 Buster. It was first identified due to DHCPv6 leases not being renewed on clients served by isc-dhcp-server on the Mox. Analysis showed that the clients' IPv6 multicast solicit packets were being received by the Mox hardware (proved via a mirror port on a managed LAN switch) but the CPU was not receiving them (observed using tcpdump). Further investigation has identified that this also affects IPv6 neighbour discovery for clients when the Mox is not sending frequent RAs. Currently we've found two reproducible scenarios: 1) With isc-dhcp-server configured with very short lease times (180 seconds): after a Mox reboot (or systemctl restart systemd-networkd), clients successfully obtain a lease and a couple of RENEWs (requested after 90 seconds), but then all goes silent: the Mox OS no longer sees the IPv6 multicast RENEW packets and client leases expire. 2) Immediately after reboot, while DHCPv6 renewals are still possible, if on the Mox we run "tcpdump -ni eth1 ip6" and immediately terminate it, tcpdump takes the interface out of promiscuous mode and IPv6 multicast packets immediately cease to be received by the CPU. If we use "tcpdump --no-promiscuous-mode ..." so that on termination it doesn't try to take the interface out of promiscuous mode, IPv6 multicast packets continue to be seen by the CPU. We've been pointed to the mv88e6xxx_dump tool and can capture data, but we're not sure what specifically to look for. We've also added some pr_info() debugging into mvneta to see when promiscuous mode is enabled or disabled, since this seems to be strongly related to the issue.
We believe there's a big clue in being able to reset the issue by restarting systemd-networkd on the Mox. We've looked for but not found any clues or indications of services on the Mox causing this but aren't ruling this out.
Re: dsa: mv88e6xxx not receiving IPv6 multicast packets
On 14/11/2020 15:56, Andrew Lunn wrote: >> 1) with isc-dhcp-server configured with very short lease times (180 >> seconds). After mox reboot (or systemctl restart systemd-networkd) >> clients successfully obtain a lease and a couple of RENEWs (requested >> after 90 seconds) but then all goes silent, Mox OS no longer sees the >> IPv6 multicast RENEW packets and client leases expire. > > So it takes about 3 minutes to reproduce this? > > Can you do a git bisect to figure out which change broke it? It will > take you maybe 5 minutes per step, and given the wide range of > kernels, i'm guessing you need around 15 steps. So maybe two hours of > work. > > Andrew > I'll check if we can. The problem might be that the Turris Mox kernel is based on a board support package drop from Marvell, so I'm not clear right now how divergent they are. Hopefully the Turris kernel devs can help with that.
Re: dsa: mv88e6xxx not receiving IPv6 multicast packets
On 14/11/2020 18:49, Vladimir Oltean wrote: > On Sat, Nov 14, 2020 at 03:39:28PM +, Tj (Elloe Linux) wrote: >> MV88E6085 switch not passing IPv6 multicast packets to CPU. > Is there a simple step-by-step reproducer for the issue, that doesn't > require a lot of configuration? I've got a Mox with the 6190 switch > running net-next and Buildroot that I could try on. Our set-up is Mox A (CPU) + G (mini-PCIe) + F (4x USB 3.0) + 3 x E (8-port Marvell switch) + D (SFP cage). Whilst working on this we've moved one of the E modules to another A CPU in our lab so as not to mess with the gateway. It runs Debian 10, using systemd-networkd, which configures: eth0 (WAN) static IPv4 and IPv6 - DHCP=no eth1 (uplink to the switch ports) DHCP=no lan1 (connected to external managed switch) Bridge=br-lan br-lan static IPv4 and IPv6, Kind=bridge, IPForward=true Whilst we're working on this issue, only lan1 is connected to anything external: a 48-port managed switch. Nothing is connected to the SFP cage either. We assign an IPv6 address from our delegated /48 prefix to br-lan and have isc-dhcp-server configured with a very short lease (180 seconds) to issue leases. On a LAN client we request a lease using: dhclient -d -6 wlp4s0 Usually, if this is started just after systemd-networkd on the Mox was restarted, it'll manage to obtain and then renew a lease about 3 or 4 times. These will show up in the Mox logs too. At some point, with absolutely nothing showing in any Mox log in the meantime, further renewals fail. We later noticed that some time after this happens, clients on the network lose IPv6 connectivity to the Mox because neighbour discovery is also failing. It took us a while to spot this because the Mox occasionally sends RAs, at which point the clients can talk to the Mox again. The symptom here was unexplained random-length 'hangs' of SSH sessions to the Mox that would affect LAN clients only when the neighbour table entry had expired.
I'm trying to create a very small reproducer root file-system on the lab Mox. Right now I've not been able to reproduce it on the lab unit even with a clone of the gateway Mox's micro SD-card, but that seems due to it failing to complete a regular boot - hence creating a fresh root file-system.
Re: dsa: mv88e6xxx not receiving IPv6 multicast packets
On 15/11/2020 16:02, Andrew Lunn wrote: > What might be interesting is running > > ip monitor > > and > > bridge monitor > > Look for neighbours being timed out due to inactivity. Funny you write that! This afternoon I've narrowed it down, although I still don't understand the 'why'. Watching on the 'good' (lab) and 'bad' (gateway) Mox devices I noticed that: # bridge -d -s mdb show 23: br-lan br-lan ff02::2 temp 257.05 23: br-lan br-lan ff05::2 temp 257.05 23: br-lan br-lan ff02::6a temp 257.05 23: br-lan br-lan ff02::1:ff77:2b20 temp 257.05 23: br-lan br-lan ff02::1:ff00: temp 257.05 23: br-lan br-lan ff02::fb temp 257.05 23: br-lan br-lan ff02::1:ff00:0 temp 257.05 23: br-lan br-lan ff02::1:2 temp 257.05 23: br-lan br-lan ff05::1:3 temp 257.05 indicates that the entries time out on 'bad' but are reset to a high value on 'good'. # bridge monitor on 'bad' reported: Deleted Deleted 23: br-lan br-lan ff02::2 temp Deleted Deleted 23: br-lan br-lan ff05::2 temp Deleted Deleted 23: br-lan br-lan ff02::6a temp Deleted Deleted 23: br-lan br-lan ff02::1:ff77:2b20 temp Deleted Deleted 23: br-lan br-lan ff02::1:ff00: temp Deleted Deleted 23: br-lan br-lan ff02::fb temp Deleted Deleted 23: br-lan br-lan ff02::1:ff00:0 temp Deleted Deleted 23: br-lan br-lan ff02::1:2 temp Deleted Deleted 23: br-lan br-lan ff05::1:3 temp On the laptop I'm testing from (tcpdump always on the laptop): using tcpdump, I *think* enp2s0 (wired link direct into lan1 on 'good') always showed the laptop sending multicast listener report v2 packets on a regular cadence of about 60-100 seconds as well as the DHCPv6 solicit/renews, and that cadence matched when the timers in the output of "bridge -d -s mdb show" reset to approximately 258. But for wlp4s0 (wifi to 'bad') the DHCPv6 solicit/renew didn't seem to be accompanied by multicast listener reports, and the mdb timers expired.
I need to re-affirm that tomorrow because I've got slightly lost attempting to compare multiple aspects on both 'good' and 'bad' and seem to be seeing inconsistent results. On the laptops we are using Xubuntu 20.04 amd64 with NetworkManager. I'll try to test from a range of different devices tomorrow in case this is only affecting staff laptops. Many thanks for the pointers.
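The ~257-258 s values in the mdb output are consistent with the bridge's default group membership interval. A small arithmetic sketch, assuming the Linux bridge defaults (robustness 2, query interval 125 s, query response interval 10 s, matching the default multicast_membership_interval of 26000 in clock_t units) and the RFC 3810 formula for the Multicast Address Listening Interval; the helper names are hypothetical:

```c
#include <assert.h>

/* RFC 3810 Multicast Address Listening Interval, in seconds:
 * robustness * query_interval + query_response_interval. */
static unsigned int listening_interval(unsigned int robustness,
				       unsigned int query_interval,
				       unsigned int query_response_interval)
{
	return robustness * query_interval + query_response_interval;
}

/* Bridge timers are exposed via sysfs and ip(8) in clock_t units
 * (USER_HZ, normally 100 per second). */
static double clock_t_to_seconds(unsigned long ticks, unsigned long user_hz)
{
	assert(user_hz > 0);
	return (double)ticks / (double)user_hz;
}

/* Seconds since an mdb entry was last refreshed by a listener report,
 * given the remaining timer printed by "bridge -d -s mdb show". */
static double seconds_since_refresh(double membership_interval, double remaining)
{
	return membership_interval - remaining;
}
```

On those assumptions, an entry showing 257.05 was refreshed about 3 seconds earlier; if no further listener report arrives within 260 s (reports are normally prompted by an MLD querier), the entry is removed, which matches the bridge monitor output on 'bad'.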
Re: dsa: mv88e6xxx not receiving IPv6 multicast packets
On 15/11/2020 17:27, Andrew Lunn wrote: > So check if you have an IGMP querier in the network. If not, try > turning it on in the bridge, > > ip link set br0 type bridge mcast_querier 1 Thanks Andrew, that does indeed seem to have solved the issue. I'm relieved this isn't a hardware or driver issue after all, but annoyed we didn't figure this out ourselves months ago! Is there any other kernel 'knob' to alter this? I'm trying to understand why we're seeing two different results with seemingly identical kernel/OS versions and network configurations.
Re: dsa: mv88e6xxx not receiving IPv6 multicast packets
On 15/11/2020 17:27, Andrew Lunn wrote: > So check if you have an IGMP querier in the network. If not, try > turning it on in the bridge, > > ip link set br0 type bridge mcast_querier Thankfully it turns out this is totally unrelated to Linux - our TP-Link Jetstream T1600G-PS has some unfortunate default behaviour and a bug. Specifically, we are operating an IPv6-only network and Layer 2 MLD snooping was enabled and set to forward unknown multicast groups and as such the switch should be broadcasting all multicast packets. However, buried in the TP-Link manual there's a note that says: "Note: IGMP Snooping and MLD Snooping share the setting of Unknown Multicast Groups, so you have to enable IGMP Snooping globally on the L2 FEATURES > Multicast > IGMP Snooping > Global Config page at the same time." We hadn't enabled IGMP snooping since we don't use IPv4! Many thanks for the help resolving this and apologies for mis-reporting it.
kernel BUG at net/core/skbuff.c:109!
On a recent build (5.10.0) we've seen several hard-to-pinpoint complete lock-ups requiring power-off restarts. Today we found a small clue in the kernel log. Unfortunately the complete backtrace wasn't captured (presumably the system froze before the log could be flushed), but I thought I should share it for investigation.

kernel BUG at net/core/skbuff.c:109!
kernel: skbuff: skb_under_panic: text:c103c622 len:1228 put:48 head:a00202858000 data:a00202857ff2 tail:0x4be end:0x6c0 dev:wlp4s0
kernel: [ cut here ]
kernel: kernel BUG at net/core/skbuff.c:109!

Obviously this ought not to happen and we'd like to discover the cause. Whilst writing this report it happened again. Checking the logs we see three instances of the BUG, none of which captured a stack trace:

Jan 27
Feb 03 #1
Feb 03 #2

The only slight clue may be a k3s service that we were unaware was constantly restarting and had reached 26,636 iterations just before the Feb 03 #1 BUG. However, we removed k3s immediately afterwards, and there were no similar clues before the Feb 03 #2 BUG 20 minutes later.

Feb 03 11:11:13 elloe001 k3s[1209978]: time="2021-02-03T11:11:13.452745479Z" level=fatal msg="starting kubernetes: preparing server: start cluster and https: listen tcp 10.1.2.1:6443: bind: cannot assign requested address"
Feb 03 11:11:13 elloe001 systemd[1]: k3s-main.service: Main process exited, code=exited, status=1/FAILURE
Feb 03 11:11:13 elloe001 systemd[1]: k3s-main.service: Failed with result 'exit-code'.
Feb 03 11:11:13 elloe001 systemd[1]: Failed to start Lightweight Kubernetes.
Feb 03 11:11:18 elloe001 systemd[1]: k3s-dev.service: Scheduled restart job, restart counter is at 26636.
Feb 03 11:11:18 elloe001 systemd[1]: k3s-main.service: Scheduled restart job, restart counter is at 26636.
Feb 03 11:11:18 elloe001 systemd[1]: Stopped Lightweight Kubernetes.
Feb 03 11:11:18 elloe001 systemd[1]: Starting Lightweight Kubernetes...
Feb 03 11:11:18 elloe001 systemd[1]: Stopped Lightweight Kubernetes.
Feb 03 11:11:18 elloe001 systemd[1]: Starting Lightweight Kubernetes...

We don't think this is hardware related, as we have several identical Lenovo E495 laptops and they have never suffered this. We don't know of any way to reproduce it at will.
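One way to read the skb_under_panic line: the kernel's skb_push() moves the data pointer back by `put` bytes before checking it against head, so the logged data value is the one after the failed push. Assuming the logged pointers are raw addresses, data sits 0xe (14) bytes below head, i.e. only 34 bytes of headroom were available for a 48-byte push on wlp4s0. The sketch below is a simplified user-space model of that arithmetic; struct skb_model and both helpers are hypothetical, not kernel code:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the sk_buff fields printed by skb_under_panic();
 * this is NOT the kernel's struct sk_buff. */
struct skb_model {
    uint64_t head;     /* start of the allocated buffer */
    uint64_t data;     /* start of the packet data */
    unsigned int len;  /* bytes of packet data */
};

/* Mirrors the condition checked in skb_push(): prepending `put` bytes must
 * not move data before head. Returns false instead of calling BUG(). */
static bool skb_model_push(struct skb_model *skb, unsigned int put)
{
    if (skb->data - skb->head < put)
        return false;
    skb->data -= put;
    skb->len += put;
    return true;
}

/* The panic message logs data *after* the push (assumes data_after < head),
 * so the headroom available beforehand is put minus the overshoot. */
static uint64_t headroom_before_push(uint64_t head, uint64_t data_after,
                                     unsigned int put)
{
    uint64_t overshoot = head - data_after;

    return put - overshoot;
}
```

With the values from the log, headroom_before_push(0xa00202858000, 0xa00202857ff2, 48) gives 34, so whatever prepended the 48-byte header found 14 bytes less headroom than it needed.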
Re: dsa: mv88e6xxx losing DHCPv6 solicit packets / IPv6 multicast packets?
> As another thought do you know what DHCPv6 client/server is being > used. > There was a fairly recent bugfix for busybox that was needed because >the v6 code was using the wrong MAC address. I'm the customer experiencing this issue. It appears unrelated to the DHCP server software. On the Turris Mox with Debian 10 we have isc-dhcp-server 4.4.1-2. Clients are Xubuntu 20.04 with NetworkManager 1.22.10-1ubuntu2.1 using isc-dhcp-client 4.4.1-2.1ubuntu5. Quoting from another email I sent to Turris: We've now done more testing and CONFIRMED the Mox is losing DHCPv6 solicit packets. Specifically, it seems the 88E6190 hardware switch in the Peridot module is swallowing IPv6 multicast packets (sent to ff02::1:2). We tested this by mirroring the Mox LAN port on an external switch: we saw the DHCPv6 solicit packet egress the switch, but the Mox kernel didn't see it ingress (using tcpdump).
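When mirroring ports it can help to know which Ethernet destination the switch is actually forwarding or filtering on. Per RFC 2464, an IPv6 multicast packet is sent to 33:33 followed by the low-order 32 bits of the group address, so a DHCPv6 solicit to ff02::1:2 goes to 33:33:00:01:00:02. A small C helper showing the mapping (the function name is just for illustration):

```c
#include <stdint.h>
#include <string.h>

/* RFC 2464, section 7: the Ethernet destination for an IPv6 multicast
 * address is 33:33 followed by the last four octets of that address. */
static void ipv6_mcast_to_mac(const uint8_t addr[16], uint8_t mac[6])
{
    mac[0] = 0x33;
    mac[1] = 0x33;
    memcpy(&mac[2], &addr[12], 4);
}
```

The same rule gives 33:33:ff:77:2b:20 for the solicited-node group ff02::1:ff77:2b20 seen in the mdb output, so both kinds of traffic depend on the switch passing 33:33:xx:xx:xx:xx frames.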
Re: [Bridge] [PATCH] bridge: flush forwarding table when device carrier off
Has this patch been submitted to the kernel tree? Which kernel version will have this patch applied (thinking of the 2.6.x and 2.4.x branches)? Thanks. On Thursday, 12 October 2006, 20:24, Stephen Hemminger wrote: > Flush the forwarding table when carrier is lost. This helps for > availability because we don't want to forward to a downed device and > new packets may come in on other links. > > Signed-off-by: Stephen Hemminger <[EMAIL PROTECTED]> > --- > net/bridge/br_fdb.c |7 ++- > net/bridge/br_if.c |4 ++-- > net/bridge/br_private.h |2 +- > net/bridge/br_stp_if.c |2 ++ > 4 files changed, 11 insertions(+), 4 deletions(-) > > --- bridge.orig/net/bridge/br_fdb.c > +++ bridge/net/bridge/br_fdb.c > @@ -128,7 +128,10 @@ void br_fdb_cleanup(unsigned long _data) > mod_timer(&br->gc_timer, jiffies + HZ/10); > } > > -void br_fdb_delete_by_port(struct net_bridge *br, struct net_bridge_port > *p) > + > +void br_fdb_delete_by_port(struct net_bridge *br, > +const struct net_bridge_port *p, > +int do_all) > { > int i; > > @@ -142,6 +145,8 @@ void br_fdb_delete_by_port(struct net_br > if (f->dst != p) > continue; > > + if (f->is_static & !do_all) > + continue; > /* >* if multiple ports all have the same device address >* then when one port is deleted, assign > --- bridge.orig/net/bridge/br_if.c > +++ bridge/net/bridge/br_if.c > @@ -163,7 +163,7 @@ static void del_nbp(struct net_bridge_po > br_stp_disable_port(p); > spin_unlock_bh(&br->lock); > > - br_fdb_delete_by_port(br, p); > + br_fdb_delete_by_port(br, p, 1); > > list_del_rcu(&p->list); > > @@ -448,7 +448,7 @@ int br_add_if(struct net_bridge *br, str > > return 0; > err2: > - br_fdb_delete_by_port(br, p); > + br_fdb_delete_by_port(br, p, 1); > err1: > kobject_del(&p->kobj); > err0: > --- bridge.orig/net/bridge/br_private.h > +++ bridge/net/bridge/br_private.h > @@ -143,7 +143,7 @@ extern void br_fdb_changeaddr(struct net > const unsigned char *newaddr); > extern void br_fdb_cleanup(unsigned long arg); > extern void
br_fdb_delete_by_port(struct net_bridge *br, > -struct net_bridge_port *p); > + const struct net_bridge_port *p, int do_all); > extern struct net_bridge_fdb_entry *__br_fdb_get(struct net_bridge *br, >const unsigned char *addr); > extern struct net_bridge_fdb_entry *br_fdb_get(struct net_bridge *br, > --- bridge.orig/net/bridge/br_stp_if.c > +++ bridge/net/bridge/br_stp_if.c > @@ -113,6 +113,8 @@ void br_stp_disable_port(struct net_brid > del_timer(&p->forward_delay_timer); > del_timer(&p->hold_timer); > > + br_fdb_delete_by_port(br, p, 0); > + > br_configuration_update(br); > > br_port_state_selection(br);
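A simplified model of what the patch changes in br_fdb_delete_by_port(): when the port loses carrier (br_stp_disable_port(), do_all = 0) only learned, non-static entries are flushed, while deleting the port or failing br_add_if() (do_all = 1) removes everything. The struct and function below are hypothetical stand-ins, and they skip the real code's special handling of several ports sharing the same device address:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified forwarding-database entry;
 * the real struct net_bridge_fdb_entry has many more fields. */
struct fdb_entry_model {
    int dst_port;    /* port the address is associated with */
    bool is_static;  /* e.g. the port's own device address */
    bool valid;      /* false once the entry has been deleted */
};

/* Model of br_fdb_delete_by_port(br, p, do_all) after the patch.
 * Returns the number of entries deleted. */
static size_t fdb_delete_by_port(struct fdb_entry_model *tbl, size_t n,
                                 int port, bool do_all)
{
    size_t deleted = 0;

    for (size_t i = 0; i < n; i++) {
        if (!tbl[i].valid || tbl[i].dst_port != port)
            continue;
        /* Carrier loss keeps static entries; port removal does not. */
        if (tbl[i].is_static && !do_all)
            continue;
        tbl[i].valid = false;
        deleted++;
    }
    return deleted;
}
```

The quoted patch writes `f->is_static & !do_all`; assuming is_static only ever holds 0 or 1, that gives the same result as the `&&` used here.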
Re: [PATCH net-next v2 02/13] net: phy: sfp: handle non-wired SFP connectors
On Fri, May 04, 2018 at 03:56:32PM +0200, Antoine Tenart wrote: > SFP connectors can be solder on a board without having any of their pins > (LOS, i2c...) wired. In such cases the SFP link state cannot be guessed, > and the overall link status reporting is left to other layers. > > In order to achieve this, a new SFP_DEV status is added, named UNKNOWN. > This mode is set when it is not possible for the SFP code to get the > link status and as a result the link status is reported to be always UP > from the SFP point of view. This looks weird to me. SFP_DEV_* states track the netdevice up/down state and have little to do with whether LOS or MODDEF0 are implemented. I think it would be better to have a new SFP_MOD_* and to force sm_mod_state to that in this circumstance. -- RMK's Patch system: http://www.armlinux.org.uk/developer/patches/ FTTC broadband for 0.8mile line in suburbia: sync at 8.8Mbps down 630kbps up According to speedtest.net: 8.21Mbps down 510kbps up
Re: [PATCH net-next v2 03/13] net: phy: sfp: warn the user when no tx_disable pin is available
On Fri, May 04, 2018 at 03:56:33PM +0200, Antoine Tenart wrote: > In case no Tx disable pin is available the SFP modules will always be > emitting. This could be an issue when using modules using laser as their > light source as we would have no way to disable it when the fiber is > removed. This patch adds a warning when registering an SFP cage which do > not have its tx_disable pin wired or available. > > Signed-off-by: Antoine Tenart Looks fine, thanks. Acked-by: Russell King > --- > drivers/net/phy/sfp.c | 9 + > 1 file changed, 9 insertions(+) > > diff --git a/drivers/net/phy/sfp.c b/drivers/net/phy/sfp.c > index 8e323a4b70da..d4f503b2e3e2 100644 > --- a/drivers/net/phy/sfp.c > +++ b/drivers/net/phy/sfp.c > @@ -1093,6 +1093,15 @@ static int sfp_probe(struct platform_device *pdev) > if (!sfp->gpio[GPIO_MODDEF0] && !sfp->gpio[GPIO_LOS]) > sfp->sm_dev_state = SFP_DEV_UNKNOWN; > > + /* We could have an issue in cases no Tx disable pin is available or > + * wired as modules using a laser as their light source will continue to > + * be active when the fiber is removed. This could be a safety issue and > + * we should at least warn the user about that. > + */ > + if (!sfp->gpio[GPIO_TX_DISABLE]) > + dev_warn(sfp->dev, > + "No tx_disable pin: SFP modules will always be > emitting.\n"); > + > return 0; > } > > -- > 2.17.0 >
Re: [PATCH net] net: phy: sfp: fix the BR,min computation
On Fri, May 04, 2018 at 05:10:54PM +0200, Antoine Tenart wrote: > In an SFP EEPROM values can be read to get information about a given SFP > module. One of those is the bitrate, which can be determined using a > nominal bitrate in addition with min and max values (in %). The SFP code > currently compute both BR,min and BR,max values thanks to this nominal > and min,max values. > > This patch fixes the BR,min computation as the min value should be > subtracted to the nominal one, not added. > > Fixes: 9962acf7fb8c ("sfp: add support for 1000Base-PX and 1000Base-BX10") > Signed-off-by: Antoine Tenart I know David has already applied it, but for the record, your fix looks correct, thanks. > --- > drivers/net/phy/sfp-bus.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/drivers/net/phy/sfp-bus.c b/drivers/net/phy/sfp-bus.c > index 0381da78d228..fd6c23f69c2f 100644 > --- a/drivers/net/phy/sfp-bus.c > +++ b/drivers/net/phy/sfp-bus.c > @@ -125,7 +125,7 @@ void sfp_parse_support(struct sfp_bus *bus, const struct > sfp_eeprom_id *id, > if (id->base.br_nominal) { > if (id->base.br_nominal != 255) { > br_nom = id->base.br_nominal * 100; > - br_min = br_nom + id->base.br_nominal * id->ext.br_min; > + br_min = br_nom - id->base.br_nominal * id->ext.br_min; > br_max = br_nom + id->base.br_nominal * id->ext.br_max; > } else if (id->ext.br_max) { > br_nom = 250 * id->ext.br_max; > -- > 2.17.0 >
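To make the fix concrete: as I understand SFF-8472, BR,nominal (byte 12) is in units of 100 MBd and the BR,max/BR,min bytes (66/67) are percentage margins, so 1% of the nominal rate is exactly br_nominal MBd. A hypothetical helper following the corrected non-255 branch (the name, the output parameters and the clamp at zero are mine, not sfp-bus.c's):

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper modelled on the fixed branch of sfp_parse_support().
 * br_nominal: nominal rate in units of 100 MBd.
 * br_max_pct / br_min_pct: upper / lower margins in percent.
 * Results are in MBd. Returns false for encodings this sketch skips. */
static bool sfp_br_range(uint8_t br_nominal, uint8_t br_max_pct,
                         uint8_t br_min_pct, unsigned int *br_min,
                         unsigned int *br_nom, unsigned int *br_max)
{
    unsigned int below;

    if (br_nominal == 0 || br_nominal == 255)
        return false;  /* 0 = unspecified, 255 = rate stored elsewhere */

    *br_nom = br_nominal * 100u;
    /* 1% of br_nom is br_nominal MBd; the minimum lies BELOW nominal
     * (the bug being fixed added the margin instead). */
    below = br_nominal * br_min_pct;
    *br_min = below < *br_nom ? *br_nom - below : 0;  /* clamp is an addition */
    *br_max = *br_nom + br_nominal * br_max_pct;
    return true;
}
```

For a module reporting 103 with 5% margins this gives 9785-10815 MBd around 10300 MBd, whereas the old code would have reported a minimum of 10815 MBd, equal to the maximum.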
Re: [PATCH net-next] net: phy: sfp: handle cases where neither BR,min nor BR,max is given
On Fri, May 04, 2018 at 05:21:03PM +0200, Antoine Tenart wrote: > When computing the bitrate using values read from an SFP module EEPROM, > we use the nominal BR plus BR,min and BR,max to determine the > boundaries. But in some cases BR,min and BR,max aren't provided, which > led the SFP code to end up having the nominal value for both the minimum > and maximum bitrate values. When using a passive cable, the nominal > value should be used as the maximum one, and there is no minimum one > so we should use 0. > > Signed-off-by: Antoine Tenart > --- > > Hi Russell, > > I'm not completely sure about this patch as this case is not really > specified in the specification. But the issue is there, and I've discuss > this with others. It seemed logical (at least to us :)) to use the > BR,nominal values as br_max and 0 as br_min when using a passive cable > which only provides BR,nominal as this would be the highest rate at > which the cable could work. And because it's passive, there should be no > issues using it at a lower rate. > > I've tested this with one passive cable which only reports its > BR,nominal (which was 10300) while it could be used when using 1000baseX > or 2500baseX modes. The electronic engineer in me says that using zero isn't really valid because there are coupling capacitors in the SFP module that block DC. These blocking capacitors are required by the SFP+ specs to have a high pass pole of between 20kHz and 100kHz - in other words, frequencies below this are attenuated by the coupling capacitors. The relationship between this and the bit rate will be a function of the encoding, so we can't come to a definitive figure without some math (and I want to be lazy about that!) Practically, we're talking about SerDes Ethernet, where the bit rate is no lower than 100Mbps [*], which will always have a frequency well above this cut-off. So, I don't have any problem with your approach to setting the minimum to zero. 
Therefore, Acked-by: Russell King Please send me the EEPROM dump using: ethtool -m raw on > foo.bin so I can add it to my database for future testing and validation. Thanks. * - 10Mbps SGMII is 1Gbps SGMII with each bit repeated 100 times. 100Mbps SGMII is 1Gbps SGMII with each bit repeated 10 times. There are capability bits for transceivers supporting 100base-FX/LX but no one has tested those yet.
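The behaviour Antoine proposes could be sketched as follows: if both margin bytes are zero, take BR,nominal as the maximum and 0 as the minimum, on the reasoning above that the coupling-capacitor cut-off (20-100 kHz) is far below any SerDes Ethernet rate. This is a hypothetical helper, not the code that was merged; units follow sfp-bus.c (MBd, with BR,nominal in 100 MBd steps), and the 0 and 255 encodings are not handled:

```c
#include <stdint.h>

/* Hypothetical sketch of the proposed passive-cable fallback.
 * br_nominal is in units of 100 MBd (0 and 255 not handled here);
 * the margins are in percent; results are in MBd. */
static void sfp_br_range_passive(uint8_t br_nominal, uint8_t br_max_pct,
                                 uint8_t br_min_pct,
                                 unsigned int *br_min, unsigned int *br_max)
{
    unsigned int br_nom = br_nominal * 100u;

    if (br_max_pct == 0 && br_min_pct == 0) {
        /* Only BR,nominal given: treat it as an "up to" figure. */
        *br_min = 0;
        *br_max = br_nom;
        return;
    }

    /* Margins given: nominal minus/plus the percentages (clamped at 0). */
    unsigned int below = br_nominal * br_min_pct;
    *br_min = below < br_nom ? br_nom - below : 0;
    *br_max = br_nom + br_nominal * br_max_pct;
}
```

With the cable from this thread (BR,nominal 10300 MBd), 1000base-X at 1250 MBd then falls inside the accepted range instead of being rejected.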
Re: [PATCH net-next v2 06/13] phy: add 2.5G SGMII mode to the phy_mode enum
On Fri, May 04, 2018 at 03:56:36PM +0200, Antoine Tenart wrote: > This patch adds one more generic PHY mode to the phy_mode enum, to allow > configuring generic PHYs to the 2.5G SGMII mode by using the set_mode > callback. > > Signed-off-by: Antoine Tenart > Acked-by: Kishon Vijay Abraham I Hi, Would it be possible to get the 2.5G SGMII comphy support merged ahead of the rest of this series please - I don't think there have been any objections to it, and having it in mainline would then mean I can drop the Marvell Comphy code from my tree and transition to the bootlin Comphy code instead. Of course, the perfect solution would be to get the whole series merged, but I'm just thinking about the situation where we're still discussing points when the next merge window opens. Thanks. > --- > include/linux/phy/phy.h | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/include/linux/phy/phy.h b/include/linux/phy/phy.h > index c9d147f5..9713aebdd348 100644 > --- a/include/linux/phy/phy.h > +++ b/include/linux/phy/phy.h > @@ -36,6 +36,7 @@ enum phy_mode { > PHY_MODE_USB_DEVICE_SS, > PHY_MODE_USB_OTG, > PHY_MODE_SGMII, > + PHY_MODE_2500SGMII, > PHY_MODE_10GKR, > PHY_MODE_UFS_HS_A, > PHY_MODE_UFS_HS_B, > -- > 2.17.0 >
Re: [PATCH net-next] net: phy: sfp: handle cases where neither BR,min nor BR,max is given
On Sat, May 05, 2018 at 01:35:34PM -0700, Florian Fainelli wrote: > On May 4, 2018 8:21:03 AM PDT, Antoine Tenart > wrote: > >When computing the bitrate using values read from an SFP module EEPROM, > >we use the nominal BR plus BR,min and BR,max to determine the > >boundaries. But in some cases BR,min and BR,max aren't provided, which > >led the SFP code to end up having the nominal value for both the > >minimum > >and maximum bitrate values. When using a passive cable, the nominal > >value should be used as the maximum one, and there is no minimum one > >so we should use 0. > > > >Signed-off-by: Antoine Tenart > >--- > > > >Hi Russell, > > > >I'm not completely sure about this patch as this case is not really > >specified in the specification. But the issue is there, and I've > >discuss > >this with others. It seemed logical (at least to us :)) to use the > >BR,nominal values as br_max and 0 as br_min when using a passive cable > >which only provides BR,nominal as this would be the highest rate at > >which the cable could work. And because it's passive, there should be > >no > >issues using it at a lower rate. > > > >I've tested this with one passive cable which only reports its > >BR,nominal (which was 10300) while it could be used when using > >1000baseX > >or 2500baseX modes. > > Which SFP modules (vendor and model) exposed this out of curiosity? > Russell and I already saw the Cotsworks modules having so e issues > with checksums, so building a table of quirks would help. Thanks! I think this is just manufacturers being lazy with their EEPROM contents - looking around, most passive cables are specified to be "up to" some figure, and that's definitely what's specified by the SFP+ specification by way of the high-pass pole requirement of the coupling capacitors.
Re: [PATCH net-next v2 03/13] net: phy: sfp: warn the user when no tx_disable pin is available
On Sat, May 05, 2018 at 10:52:42PM +0200, Andrew Lunn wrote: > On Sat, May 05, 2018 at 01:38:31PM -0700, Florian Fainelli wrote: > > On May 4, 2018 10:14:25 AM PDT, Andrew Lunn wrote: > > >On Fri, May 04, 2018 at 10:07:53AM -0700, Florian Fainelli wrote: > > >> On 05/04/2018 06:56 AM, Antoine Tenart wrote: > > >> > In case no Tx disable pin is available the SFP modules will always > > >be > > >> > emitting. This could be an issue when using modules using laser as > > >their > > >> > light source as we would have no way to disable it when the fiber > > >is > > >> > removed. This patch adds a warning when registering an SFP cage > > >which do > > >> > not have its tx_disable pin wired or available. > > >> > > >> Is this something that was done in a possibly earlier revision of a > > >> given board design and which was finally fixed? Nothing wrong with > > >the > > >> patch, but this seems like a pretty serious board design mistake, > > >that > > >> needs to be addressed. > > > > > >Hi Florian > > > > > >Zii Devel B is like this. Only the "Signal Detect" pin is wired to a > > >GPIO. > > > > > Good point, indeed. BTW what do you think about exposing the SFF's > > EEPROM and diagnostics through the standard ethtool operations even > > if we have to keep the description of the SFF as a fixed link in > > Device Tree because of the unfortunate wiring? > > I believe in Antoine case, all the control plane is broken. He cannot > read the EEPROM, nor any of the modules pins via GPIOs. Correct. > For Zii Devel B, the EEPROM is accessible, and so is the SD pin. What > is missing is transmit disable. So i would expose it as an SFF module. Agreed.
Re: [PATCH net-next v2 3/9] net: phy: phylink: Poll link GPIOs
On Thu, May 10, 2018 at 01:17:31PM -0700, Florian Fainelli wrote: > From: Russell King > > When using a fixed link with a link GPIO, we need to poll that GPIO to > determine link state changes. This is consistent with what fixed_phy.c does. > > Signed-off-by: Florian Fainelli I'd like this to use the GPIO interrupt where available, only falling back to the timer approach when there's no interrupt. Unfortunately, I don't have much time to devote to this at the moment, having recently been away on vacation, and now having to work on ARM specific issues for probably all of the remainder of this kernel cycle. That means I won't have time to test your series on any of the boards I have available to me. > --- > drivers/net/phy/phylink.c | 16 > 1 file changed, 16 insertions(+) > > diff --git a/drivers/net/phy/phylink.c b/drivers/net/phy/phylink.c > index 6392b5248cf5..581ce93ecaf9 100644 > --- a/drivers/net/phy/phylink.c > +++ b/drivers/net/phy/phylink.c > @@ -19,6 +19,7 @@ > #include > #include > #include > +#include > #include > > #include "sfp.h" > @@ -54,6 +55,7 @@ struct phylink { > /* The link configuration settings */ > struct phylink_link_state link_config; > struct gpio_desc *link_gpio; > + struct timer_list link_poll; > void (*get_fixed_state)(struct net_device *dev, > struct phylink_link_state *s); > > @@ -500,6 +502,15 @@ static void phylink_run_resolve(struct phylink *pl) > queue_work(system_power_efficient_wq, &pl->resolve); > } > > +static void phylink_fixed_poll(struct timer_list *t) > +{ > + struct phylink *pl = container_of(t, struct phylink, link_poll); > + > + mod_timer(t, jiffies + HZ); > + > + phylink_run_resolve(pl); > +} > + > static const struct sfp_upstream_ops sfp_phylink_ops; > > static int phylink_register_sfp(struct phylink *pl, > @@ -572,6 +583,7 @@ struct phylink *phylink_create(struct net_device *ndev, > pl->link_config.an_enabled = true; > pl->ops = ops; > __set_bit(PHYLINK_DISABLE_STOPPED, &pl->phylink_disable_state); > + 
timer_setup(&pl->link_poll, phylink_fixed_poll, 0); > > bitmap_fill(pl->supported, __ETHTOOL_LINK_MODE_MASK_NBITS); > linkmode_copy(pl->link_config.advertising, pl->supported); > @@ -905,6 +917,8 @@ void phylink_start(struct phylink *pl) > clear_bit(PHYLINK_DISABLE_STOPPED, &pl->phylink_disable_state); > phylink_run_resolve(pl); > > + if (pl->link_an_mode == MLO_AN_FIXED && !IS_ERR(pl->link_gpio)) > + mod_timer(&pl->link_poll, jiffies + HZ); > if (pl->sfp_bus) > sfp_upstream_start(pl->sfp_bus); > if (pl->phydev) > @@ -929,6 +943,8 @@ void phylink_stop(struct phylink *pl) > phy_stop(pl->phydev); > if (pl->sfp_bus) > sfp_upstream_stop(pl->sfp_bus); > + if (pl->link_an_mode == MLO_AN_FIXED && !IS_ERR(pl->link_gpio)) > + del_timer_sync(&pl->link_poll); > > set_bit(PHYLINK_DISABLE_STOPPED, &pl->phylink_disable_state); > queue_work(system_power_efficient_wq, &pl->resolve); > -- > 2.14.1 >
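The interrupt-first approach Russell asks for boils down to a choice made at start-up. The following is a user-space model only, not the code in this series: the struct, enum and function are hypothetical, and `link_irq` stands in for the result of gpiod_to_irq() on the link GPIO (an IRQ number, or a negative errno when the GPIO cannot interrupt):

```c
#include <stdbool.h>

/* How link changes on the fixed-link GPIO would be noticed. */
enum link_monitor { LINK_MONITOR_IRQ, LINK_MONITOR_POLL };

/* Hypothetical model state; in the kernel the IRQ number would come from
 * gpiod_to_irq() and irq_request_ok from requesting that interrupt. */
struct fixed_link_model {
    int link_irq;          /* IRQ number; zero or negative if none */
    bool irq_request_ok;   /* whether requesting the IRQ succeeded */
    enum link_monitor mode;
};

/* Prefer an edge-triggered interrupt so link changes are seen at once;
 * fall back to the once-per-second timer (jiffies + HZ in the patch)
 * only when no usable interrupt exists. */
static enum link_monitor link_monitor_start(struct fixed_link_model *m)
{
    if (m->link_irq > 0 && m->irq_request_ok)
        m->mode = LINK_MONITOR_IRQ;
    else
        m->mode = LINK_MONITOR_POLL;
    return m->mode;
}
```

The matching stop path would then free the interrupt or call del_timer_sync(), depending on which mode was chosen.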
Re: [PATCH net-next 2/4] net: phy: phylink: Provide PHY interface to mac_link_{up,down}
On Sun, Mar 18, 2018 at 11:52:44AM -0700, Florian Fainelli wrote: > In preparation for having DSA transition entirely to PHYLINK, we need to pass > a > PHY interface type to the mac_link_{up,down} callbacks because we may have to > make decisions on that (e.g: turn on/off RGMII interfaces etc.). We do not > pass > an entire phylink_link_state because not all parameters (pause, duplex etc.) > are > defined when the link is down, only link and interface are. If we're going to make this change, we ought to decide whether David should pick this up for the coming merge window or not independently of the remaining patches - there are other users of phylink in the pipeline (bootlin are working on mvpp2 support, so this will be a minor source of build error pain for folk.) To that end, Acked-by: Russell King However, the documentation probably ought to make it clear that the configuration of the interface mode of the MAC should always happen in the mac_config() callback, not in the mac_link_*() functions. Thanks. > Update mvneta accordingly since it currently implements phylink_mac_ops. 
> > Signed-off-by: Florian Fainelli > --- > drivers/net/ethernet/marvell/mvneta.c | 4 +++- > drivers/net/phy/phylink.c | 6 +- > include/linux/phylink.h | 10 -- > 3 files changed, 16 insertions(+), 4 deletions(-) > > diff --git a/drivers/net/ethernet/marvell/mvneta.c > b/drivers/net/ethernet/marvell/mvneta.c > index 25e9a551cc8c..60de9b8d62c2 100644 > --- a/drivers/net/ethernet/marvell/mvneta.c > +++ b/drivers/net/ethernet/marvell/mvneta.c > @@ -3396,7 +3396,8 @@ static void mvneta_set_eee(struct mvneta_port *pp, bool > enable) > mvreg_write(pp, MVNETA_LPI_CTRL_1, lpi_ctl1); > } > > -static void mvneta_mac_link_down(struct net_device *ndev, unsigned int mode) > +static void mvneta_mac_link_down(struct net_device *ndev, unsigned int mode, > + phy_interface_t interface) > { > struct mvneta_port *pp = netdev_priv(ndev); > u32 val; > @@ -3415,6 +3416,7 @@ static void mvneta_mac_link_down(struct net_device > *ndev, unsigned int mode) > } > > static void mvneta_mac_link_up(struct net_device *ndev, unsigned int mode, > +phy_interface_t interface, > struct phy_device *phy) > { > struct mvneta_port *pp = netdev_priv(ndev); > diff --git a/drivers/net/phy/phylink.c b/drivers/net/phy/phylink.c > index 51a011a349fe..cef3c1356a8c 100644 > --- a/drivers/net/phy/phylink.c > +++ b/drivers/net/phy/phylink.c > @@ -423,8 +423,10 @@ static void phylink_resolve(struct work_struct *w) > if (pl->phylink_disable_state) { > pl->mac_link_dropped = false; > link_state.link = false; > + link_state.interface = pl->phy_state.interface; > } else if (pl->mac_link_dropped) { > link_state.link = false; > + link_state.interface = pl->phy_state.interface; > } else { > switch (pl->link_an_mode) { > case MLO_AN_PHY: > @@ -470,10 +472,12 @@ static void phylink_resolve(struct work_struct *w) > if (link_state.link != netif_carrier_ok(ndev)) { > if (!link_state.link) { > netif_carrier_off(ndev); > - pl->ops->mac_link_down(ndev, pl->link_an_mode); > + pl->ops->mac_link_down(ndev, pl->link_an_mode, > 
+pl->phy_state.interface); > netdev_info(ndev, "Link is Down\n"); > } else { > pl->ops->mac_link_up(ndev, pl->link_an_mode, > + pl->phy_state.interface, >pl->phydev); > > netif_carrier_on(ndev); > diff --git a/include/linux/phylink.h b/include/linux/phylink.h > index bd137c273d38..f29a40947de9 100644 > --- a/include/linux/phylink.h > +++ b/include/linux/phylink.h > @@ -73,8 +73,10 @@ struct phylink_mac_ops { > void (*mac_config)(struct net_device *ndev, unsigned int mode, > const struct phylink_link_state *state); > void (*mac_an_restart)(struct net_device *ndev); > - void (*mac_link_down)(struct net_device *ndev, unsigned int mode); > + void (*mac_link_down)(struct net_device *ndev, unsigned int mode, > + phy_interface_t interface); > void (*mac_link_up)(struct net_device *ndev, unsigned int mode, > + phy_interface_t interface, > struct phy_device *phy); >
Re: [PATCH] sfp: allow cotsworks modules
On Wed, Mar 28, 2018 at 03:33:57AM -0700, Joe Perches wrote: > On Wed, 2018-03-28 at 11:18 +0100, Russell King wrote: > > Cotsworks modules fail the checksums - it appears that Cotsworks > > reprograms the EEPROM at the end of production with the final product > > information (serial, date code, and exact part number for module > > options) and fails to update the checksum. > > trivia: > > > diff --git a/drivers/net/phy/sfp.c b/drivers/net/phy/sfp.c > [] > > @@ -574,23 +575,43 @@ static int sfp_sm_mod_probe(struct sfp *sfp) > [] > > + if (cotsworks) { > > + dev_warn(sfp->dev, > > +"EEPROM base structure checksum failure > > (0x%02x != 0x%02x)\n", > > +check, id.base.cc_base); > > + } else { > > + dev_err(sfp->dev, > > + "EEPROM base structure checksum failure: 0x%02x > > != 0x%02x\n", > > It'd be better to move this above the if and > use only a single format string instead of > using 2 slightly different formats. No. I think you've missed the fact that one is a _warning_ the other is an _error_ and they are emitted at the appropriate severity. It's not just that the format strings are slightly different. > > > + check, id.base.cc_base); > > + print_hex_dump(KERN_ERR, "sfp EE: ", DUMP_PREFIX_OFFSET, > > + 16, 1, &id, sizeof(id), true); > > + return -EINVAL; > > + } > > } > > > > check = sfp_check(&id.ext, sizeof(id.ext) - 1); > > if (check != id.ext.cc_ext) { > > - dev_err(sfp->dev, > > - "EEPROM extended structure checksum failure: 0x%02x\n", > > - check); > > - memset(&id.ext, 0, sizeof(id.ext)); > > + if (cotsworks) { > > + dev_warn(sfp->dev, > > +"EEPROM extended structure checksum failure > > (0x%02x != 0x%02x)\n", > > +check, id.ext.cc_ext); > > + } else { > > + dev_err(sfp->dev, > > + "EEPROM extended structure checksum failure: > > 0x%02x != 0x%02x\n", > > + check, id.ext.cc_ext); > > > here too Same applies. 
Re: [PATCH] sfp: allow cotsworks modules
On Wed, Mar 28, 2018 at 09:19:01AM -0700, Joe Perches wrote: > On Wed, 2018-03-28 at 11:41 +0100, Russell King - ARM Linux wrote: > > On Wed, Mar 28, 2018 at 03:33:57AM -0700, Joe Perches wrote: > > > On Wed, 2018-03-28 at 11:18 +0100, Russell King wrote: > > > > Cotsworks modules fail the checksums - it appears that Cotsworks > > > > reprograms the EEPROM at the end of production with the final product > > > > information (serial, date code, and exact part number for module > > > > options) and fails to update the checksum. > > > > > > trivia: > > > > > > > diff --git a/drivers/net/phy/sfp.c b/drivers/net/phy/sfp.c > > > > > > [] > > > > @@ -574,23 +575,43 @@ static int sfp_sm_mod_probe(struct sfp *sfp) > > > > > > [] > > > > + if (cotsworks) { > > > > + dev_warn(sfp->dev, > > > > +"EEPROM base structure checksum > > > > failure (0x%02x != 0x%02x)\n", > > > > +check, id.base.cc_base); > > > > + } else { > > > > + dev_err(sfp->dev, > > > > + "EEPROM base structure checksum > > > > failure: 0x%02x != 0x%02x\n", > > > > > > It'd be better to move this above the if and > > > use only a single format string instead of > > > using 2 slightly different formats. > > > > No. I think you've missed the fact that one is a _warning_ the other is > > an _error_ and they are emitted at the appropriate severity. It's not > > just that the format strings are slightly different. > > Right. Still nicer to use the same formats. I'll stick a "Warning:" and "Error:" tag before them if you really want the rest of the message to be identically formatted - otherwise, when seeing reports from people's dmesg, there will be nothing to indicate which message was printed.
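For reference, the check codes being compared are simple 8-bit sums: sfp_check() adds up the covered EEPROM bytes into a u8, and the result is compared with cc_base or cc_ext. The sketch below reproduces that sum and shows the compromise suggested above, one format string with an explicit "Warning:"/"Error:" tag. sfp_report_checksum() is hypothetical and formats into a buffer instead of calling dev_warn()/dev_err():

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Same algorithm as sfp_check(): the low 8 bits of the byte sum. */
static uint8_t sfp_checksum(const uint8_t *buf, size_t len)
{
    uint8_t check = 0;

    while (len--)
        check += *buf++;  /* wraps modulo 256 */
    return check;
}

/* Hypothetical reporting helper: a single format string, with severity
 * made explicit by a tag. Returns true if probing may continue. */
static bool sfp_report_checksum(char *out, size_t outlen, const char *area,
                                uint8_t check, uint8_t expected,
                                bool cotsworks)
{
    if (check == expected) {
        if (outlen)
            out[0] = '\0';
        return true;
    }
    snprintf(out, outlen,
             "%s: EEPROM %s structure checksum failure: 0x%02x != 0x%02x",
             cotsworks ? "Warning" : "Error", area, check, expected);
    /* Only known-bad Cotsworks EEPROMs are tolerated. */
    return cotsworks;
}
```

Keeping the tag in the text means a pasted dmesg line still shows which path was taken, even when the log level is not visible.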
Re: [PATCH net-next 1/2] net: phy: phylink: Provide PHY interface to mac_link_{up,down}
On Wed, Mar 28, 2018 at 12:03:38PM -0700, Florian Fainelli wrote: > In preparation for having DSA transition entirely to PHYLINK, we need to pass > a > PHY interface type to the mac_link_{up,down} callbacks because we may have to > make decisions on that (e.g: turn on/off RGMII interfaces etc.). We do not > pass > an entire phylink_link_state because not all parameters (pause, duplex etc.) > are > defined when the link is down, only link and interface are. > > Update mvneta accordingly since it currently implements phylink_mac_ops. > > Signed-off-by: Florian Fainelli Similar comments to previous version wrt documentation, but... Acked-by: Russell King > --- > drivers/net/ethernet/marvell/mvneta.c | 4 +++- > drivers/net/phy/phylink.c | 4 +++- > include/linux/phylink.h | 10 -- > 3 files changed, 14 insertions(+), 4 deletions(-) > > diff --git a/drivers/net/ethernet/marvell/mvneta.c > b/drivers/net/ethernet/marvell/mvneta.c > index eaa4bb80f1c9..cd09bde55596 100644 > --- a/drivers/net/ethernet/marvell/mvneta.c > +++ b/drivers/net/ethernet/marvell/mvneta.c > @@ -3396,7 +3396,8 @@ static void mvneta_set_eee(struct mvneta_port *pp, bool > enable) > mvreg_write(pp, MVNETA_LPI_CTRL_1, lpi_ctl1); > } > > -static void mvneta_mac_link_down(struct net_device *ndev, unsigned int mode) > +static void mvneta_mac_link_down(struct net_device *ndev, unsigned int mode, > + phy_interface_t interface) > { > struct mvneta_port *pp = netdev_priv(ndev); > u32 val; > @@ -3415,6 +3416,7 @@ static void mvneta_mac_link_down(struct net_device > *ndev, unsigned int mode) > } > > static void mvneta_mac_link_up(struct net_device *ndev, unsigned int mode, > +phy_interface_t interface, > struct phy_device *phy) > { > struct mvneta_port *pp = netdev_priv(ndev); > diff --git a/drivers/net/phy/phylink.c b/drivers/net/phy/phylink.c > index 51a011a349fe..9b1e4721ea3a 100644 > --- a/drivers/net/phy/phylink.c > +++ b/drivers/net/phy/phylink.c > @@ -470,10 +470,12 @@ static void phylink_resolve(struct 
work_struct *w) > if (link_state.link != netif_carrier_ok(ndev)) { > if (!link_state.link) { > netif_carrier_off(ndev); > - pl->ops->mac_link_down(ndev, pl->link_an_mode); > + pl->ops->mac_link_down(ndev, pl->link_an_mode, > +pl->phy_state.interface); > netdev_info(ndev, "Link is Down\n"); > } else { > pl->ops->mac_link_up(ndev, pl->link_an_mode, > + pl->phy_state.interface, > pl->phydev); > > netif_carrier_on(ndev); > diff --git a/include/linux/phylink.h b/include/linux/phylink.h > index bd137c273d38..f29a40947de9 100644 > --- a/include/linux/phylink.h > +++ b/include/linux/phylink.h > @@ -73,8 +73,10 @@ struct phylink_mac_ops { > void (*mac_config)(struct net_device *ndev, unsigned int mode, > const struct phylink_link_state *state); > void (*mac_an_restart)(struct net_device *ndev); > - void (*mac_link_down)(struct net_device *ndev, unsigned int mode); > + void (*mac_link_down)(struct net_device *ndev, unsigned int mode, > + phy_interface_t interface); > void (*mac_link_up)(struct net_device *ndev, unsigned int mode, > + phy_interface_t interface, > struct phy_device *phy); > }; > > @@ -161,17 +163,20 @@ void mac_an_restart(struct net_device *ndev); > * mac_link_down() - take the link down > * @ndev: a pointer to a &struct net_device for the MAC. > * @mode: link autonegotiation mode > + * @interface: link &typedef phy_interface_t mode > * > * If @mode is not an in-band negotiation mode (as defined by > * phylink_autoneg_inband()), force the link down and disable any > * Energy Efficient Ethernet MAC configuration. > */ > -void mac_link_down(struct net_device *ndev, unsigned int mode); > +void mac_link_down(struct net_device *ndev, unsigned int mode, > +phy_interface_t interface); > > /** > * mac_link_up() - allow the link to come up > * @ndev: a pointer to a &struct net_device for the MAC. > * @mode: link autonegotiation mode > + * @interface: link &typedef phy_interface_t mode > * @phy: any attached phy > * > * If @mode is not an in-band negotiation mode (a
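To make the new callback signature concrete, here is a minimal userspace sketch of a MAC driver that uses the interface argument to gate RGMII pad control on link up/down. It is not mvneta or DSA code. The interface type is re-declared as a small hypothetical enum (`demo_phy_interface_t`), `demo_port`, `demo_mac_link_up()` and `demo_mac_link_down()` are invented for illustration, and the `struct phy_device *` argument of `mac_link_up` is omitted for brevity.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the kernel's phy_interface_t; only a few
 * modes are modelled, with the RGMII variants kept contiguous. */
typedef enum {
	DEMO_PHY_INTERFACE_MODE_NA,
	DEMO_PHY_INTERFACE_MODE_SGMII,
	DEMO_PHY_INTERFACE_MODE_RGMII,
	DEMO_PHY_INTERFACE_MODE_RGMII_ID,
	DEMO_PHY_INTERFACE_MODE_RGMII_RXID,
	DEMO_PHY_INTERFACE_MODE_RGMII_TXID,
} demo_phy_interface_t;

/* Same idea as the kernel's phy_interface_mode_is_rgmii() helper. */
static bool demo_mode_is_rgmii(demo_phy_interface_t iface)
{
	return iface >= DEMO_PHY_INTERFACE_MODE_RGMII &&
	       iface <= DEMO_PHY_INTERFACE_MODE_RGMII_TXID;
}

/* Hypothetical MAC port state. */
struct demo_port {
	bool rgmii_pads_enabled;
	bool link_up;
};

/* When the link is down only "link" and "interface" are meaningful,
 * which is why the callback gets the interface, not a full link state. */
static void demo_mac_link_down(struct demo_port *p, unsigned int mode,
			       demo_phy_interface_t interface)
{
	(void)mode;
	p->link_up = false;
	if (demo_mode_is_rgmii(interface))
		p->rgmii_pads_enabled = false;
}

static void demo_mac_link_up(struct demo_port *p, unsigned int mode,
			     demo_phy_interface_t interface)
{
	(void)mode;
	if (demo_mode_is_rgmii(interface))
		p->rgmii_pads_enabled = true;
	p->link_up = true;
}
```

Non-RGMII modes (SGMII here) leave the pads untouched, so the same callbacks can serve ports wired either way.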
Re: [PATCH net-next 2/2] sfp/phylink: move module EEPROM ethtool access into netdev core ethtool
On Wed, Mar 28, 2018 at 12:03:39PM -0700, Florian Fainelli wrote: > From: Russell King > > Provide a pointer to the SFP bus in struct net_device, so that the > ethtool module EEPROM methods can access the SFP directly, rather > than needing every user to provide a hook for it. > > Signed-off-by: Russell King This probably ought to have your sign-off too as you're passing the patch along rather than me submitting it directly. DCO v1.1 (c) seems to apply to this situation. > --- > drivers/net/ethernet/marvell/mvneta.c | 18 -- > drivers/net/phy/phylink.c | 28 > drivers/net/phy/sfp-bus.c | 6 ++ > include/linux/netdevice.h | 3 +++ > include/linux/phylink.h | 3 --- > net/core/ethtool.c| 7 +++ > 6 files changed, 12 insertions(+), 53 deletions(-) > > diff --git a/drivers/net/ethernet/marvell/mvneta.c > b/drivers/net/ethernet/marvell/mvneta.c > index cd09bde55596..25ced96750bf 100644 > --- a/drivers/net/ethernet/marvell/mvneta.c > +++ b/drivers/net/ethernet/marvell/mvneta.c > @@ -4075,22 +4075,6 @@ static int mvneta_ethtool_set_wol(struct net_device > *dev, > return ret; > } > > -static int mvneta_ethtool_get_module_info(struct net_device *dev, > - struct ethtool_modinfo *modinfo) > -{ > - struct mvneta_port *pp = netdev_priv(dev); > - > - return phylink_ethtool_get_module_info(pp->phylink, modinfo); > -} > - > -static int mvneta_ethtool_get_module_eeprom(struct net_device *dev, > - struct ethtool_eeprom *ee, u8 *buf) > -{ > - struct mvneta_port *pp = netdev_priv(dev); > - > - return phylink_ethtool_get_module_eeprom(pp->phylink, ee, buf); > -} > - > static int mvneta_ethtool_get_eee(struct net_device *dev, > struct ethtool_eee *eee) > { > @@ -4165,8 +4149,6 @@ static const struct ethtool_ops mvneta_eth_tool_ops = { > .set_link_ksettings = mvneta_ethtool_set_link_ksettings, > .get_wol= mvneta_ethtool_get_wol, > .set_wol= mvneta_ethtool_set_wol, > - .get_module_info = mvneta_ethtool_get_module_info, > - .get_module_eeprom = mvneta_ethtool_get_module_eeprom, > .get_eee= 
mvneta_ethtool_get_eee, > .set_eee= mvneta_ethtool_set_eee, > }; > diff --git a/drivers/net/phy/phylink.c b/drivers/net/phy/phylink.c > index 9b1e4721ea3a..c582b2d7546c 100644 > --- a/drivers/net/phy/phylink.c > +++ b/drivers/net/phy/phylink.c > @@ -1250,34 +1250,6 @@ int phylink_ethtool_set_pauseparam(struct phylink *pl, > } > EXPORT_SYMBOL_GPL(phylink_ethtool_set_pauseparam); > > -int phylink_ethtool_get_module_info(struct phylink *pl, > - struct ethtool_modinfo *modinfo) > -{ > - int ret = -EOPNOTSUPP; > - > - WARN_ON(!lockdep_rtnl_is_held()); > - > - if (pl->sfp_bus) > - ret = sfp_get_module_info(pl->sfp_bus, modinfo); > - > - return ret; > -} > -EXPORT_SYMBOL_GPL(phylink_ethtool_get_module_info); > - > -int phylink_ethtool_get_module_eeprom(struct phylink *pl, > - struct ethtool_eeprom *ee, u8 *buf) > -{ > - int ret = -EOPNOTSUPP; > - > - WARN_ON(!lockdep_rtnl_is_held()); > - > - if (pl->sfp_bus) > - ret = sfp_get_module_eeprom(pl->sfp_bus, ee, buf); > - > - return ret; > -} > -EXPORT_SYMBOL_GPL(phylink_ethtool_get_module_eeprom); > - > /** > * phylink_ethtool_get_eee_err() - read the energy efficient ethernet error > * counter > diff --git a/drivers/net/phy/sfp-bus.c b/drivers/net/phy/sfp-bus.c > index 3d4ff5d0d2a6..0381da78d228 100644 > --- a/drivers/net/phy/sfp-bus.c > +++ b/drivers/net/phy/sfp-bus.c > @@ -342,6 +342,7 @@ static int sfp_register_bus(struct sfp_bus *bus) > } > if (bus->started) > bus->socket_ops->start(bus->sfp); > + bus->netdev->sfp_bus = bus; > bus->registered = true; > return 0; > } > @@ -356,6 +357,7 @@ static void sfp_unregister_bus(struct sfp_bus *bus) > if (bus->phydev && ops && ops->disconnect_phy) > ops->disconnect_phy(bus->upstream); > } > + bus->netdev->sfp_bus = NULL; > bus->registered = false; > } > > @@ -371,8 +373,6 @@ static void sfp_unregister_bus(struct sfp_bus *bus) > */ > int sfp_get_module_info(struct sfp_bus *bus, struct ethtool_modinfo *modinfo) > { > - if (!bus->registered) > -
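A rough model of the dispatch this patch enables: once the netdev carries a pointer to its SFP bus, the ethtool core can answer module-info requests without every driver supplying a hook. All types and functions here (`demo_net_device`, `demo_get_module_info()`, etc.) are hypothetical userspace stand-ins. The ordering (SFP bus first, then a driver hook, else `-EOPNOTSUPP`) is an assumption, not a transcription of the `net/core/ethtool.c` hunk, which is not shown.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins; the real structs live in the kernel headers. */
struct demo_modinfo {
	unsigned int type;
	unsigned int eeprom_len;
};

struct demo_sfp_bus {
	unsigned int type;        /* what the module EEPROM identifies as */
	unsigned int eeprom_len;
};

struct demo_net_device;

struct demo_ethtool_ops {
	/* Per-driver hook, like the mvneta wrappers removed above. */
	int (*get_module_info)(struct demo_net_device *dev,
			       struct demo_modinfo *mi);
};

struct demo_net_device {
	struct demo_sfp_bus *sfp_bus;  /* set on bus register, NULL on unregister */
	const struct demo_ethtool_ops *ethtool_ops;
};

static int demo_sfp_get_module_info(const struct demo_sfp_bus *bus,
				    struct demo_modinfo *mi)
{
	mi->type = bus->type;
	mi->eeprom_len = bus->eeprom_len;
	return 0;
}

/* Core-side dispatch: use the SFP bus if the netdev has one, otherwise
 * fall back to a driver hook, otherwise report "not supported". */
static int demo_get_module_info(struct demo_net_device *dev,
				struct demo_modinfo *mi)
{
	if (dev->sfp_bus)
		return demo_sfp_get_module_info(dev->sfp_bus, mi);
	if (dev->ethtool_ops && dev->ethtool_ops->get_module_info)
		return dev->ethtool_ops->get_module_info(dev, mi);
	return -EOPNOTSUPP;
}
```

Because the pointer is cleared when the bus is unregistered, a hot-unplugged cage naturally falls through to "not supported".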
Re: [EXT] [PATCH net-next v2 0/2] phylink: API changes
On Thu, Mar 29, 2018 at 05:58:43AM +, Yan Markman wrote: > Hi Florian > Please keep CCYelena Krivosheev > for changes with drivers/net/ethernet/marvell/mvneta.c > Thanks We have a way to ensure such things happen - it's the MAINTAINERS file. Please use the established community methods rather than sending emails asking for people to remember such quirks. Thanks. > Yan Markman > Tel. 05-44732819 > > > -Original Message- > From: Florian Fainelli [mailto:f.faine...@gmail.com] > Sent: Thursday, March 29, 2018 1:44 AM > To: netdev@vger.kernel.org > Cc: Florian Fainelli ; Thomas Petazzoni > ; Andrew Lunn ; David S. > Miller ; Russell King ; open > list ; Antoine Tenart > ; Yan Markman ; Stefan > Chulski ; Maxime Chevallier > ; Miquel Raynal > ; Marcin Wojtas > Subject: [EXT] [PATCH net-next v2 0/2] phylink: API changes > > External Email > > -- > Hi all, > > This patch series contains two API changes to PHYLINK which will later be > used by DSA to migrate to PHYLINK. Because these are API changes that impact > other outstanding work (e.g: MVPP2) I would rather get them included sooner > to minimize conflicts. > > Thank you! > > Changes in v2: > > - added missing documentation to mac_link_{up,down} that the interface > must be configured in mac_config() > > - added Russell's, Andrew's and my tags > > Florian Fainelli (1): > net: phy: phylink: Provide PHY interface to mac_link_{up,down} > > Russell King (1): > sfp/phylink: move module EEPROM ethtool access into netdev core > ethtool > > drivers/net/ethernet/marvell/mvneta.c | 22 +++--- > drivers/net/phy/phylink.c | 32 +++- > drivers/net/phy/sfp-bus.c | 6 ++ > include/linux/netdevice.h | 3 +++ > include/linux/phylink.h | 17 +++-- > net/core/ethtool.c| 7 +++ > 6 files changed, 29 insertions(+), 58 deletions(-) > > -- > 2.14.1 > -- RMK's Patch system: http://www.armlinux.org.uk/developer/patches/ FTTC broadband for 0.8mile line in suburbia: sync at 8.8Mbps down 630kbps up According to speedtest.net: 8.21Mbps down 510kbps up
Re: [PATCH net-next] phylink: Fix an uninitialized variable bug
On Thu, Aug 10, 2017 at 05:21:12PM +0200, Andrew Lunn wrote: > On Thu, Aug 10, 2017 at 12:35:50AM +0300, Dan Carpenter wrote: > > "ret" isn't necessarily initialized here. > > > > Fixes: 9525ae83959b ("phylink: add phylink infrastructure") > > Signed-off-by: Dan Carpenter > > Reviewed-by: Andrew Lunn Thanks, not sure how that got missed - it was probably introduced when migrating the code to ksettings.
Re: [RFC PATCH] dt-binding: net: sfp binding documentation
On Sun, Aug 20, 2017 at 01:28:06PM +0300, Baruch Siach wrote: > Add device-tree binding documentation SFP transceivers. Support for SFP > transceivers has been recently introduced (drivers/net/phy/sfp.c). > > Signed-off-by: Baruch Siach > --- > > The SFP driver is on net-next. > > Not sure about the rate-select-gpio property name. The SFP+ standard > (not supported yet) uses two signals, RS0 and RS1. RS0 is compatible > with the SFP rate select signal, while RS1 controls the Tx rate. SFP+ is usable with this, but the platforms I have do not wire the rate select pins on the SFP+ sockets to GPIOs, but hard-wire them. Note that I didn't expect the SFP code to just get merged with very little in the way of real in-depth review of things like: * the way the SFP code works, and its structure * analysis of the bindings checking that they're fit for everyone's purposes. The implementation that I've designed is based around the boards that I have access to and the various public SFP documentation. I think documenting the bindings suggests that they are stable - I don't think we're really ready to make that assertion yet - there may be things that have been missed which will only come up when other people start using this code. 
> --- > Documentation/devicetree/bindings/net/sff-sfp.txt | 24 > +++ > 1 file changed, 24 insertions(+) > create mode 100644 Documentation/devicetree/bindings/net/sff-sfp.txt > > diff --git a/Documentation/devicetree/bindings/net/sff-sfp.txt > b/Documentation/devicetree/bindings/net/sff-sfp.txt > new file mode 100644 > index ..f0c27bc3925e > --- /dev/null > +++ b/Documentation/devicetree/bindings/net/sff-sfp.txt > @@ -0,0 +1,24 @@ > +Small Form Factor (SFF) Committee Small Form-factor Pluggable (SFP) > +Transceiver > + > +Required properties: > + > +- compatible : must be "sff,sfp" > + > +Optional Properties: > + > +- i2c-bus : phandle of an I2C bus controller for the SFP two wire serial > + interface The code as it currently stands pretty much requires an I2C bus to be functional - but when I wrote the code, I left the possibility open for an implementation (eg, network driver) to provide its own functionality for reading the I2C EEPROM on the module. Some adapters which already have SFP support do this. Hence, for current implementations, this is required. > + > +- moddef0-gpio : phandle of the MOD-DEF0 (AKA Mod_ABS) module presence input > + gpio signal > + > +- los-gpio : phandle of the Receiver Loss of Signal Indication input gpio > + signal > + > +- tx-fault-gpio : phandle of the Module Transmitter Fault input gpio signal > + > +- tx-disable-gpio : phandle of the Transmitter Disable output gpio signal > + > +- rate-select-gpio : phandle of the Rx Signaling Rate Select (AKA RS0) output > + gpio > -- > 2.14.1 >
Re: [RFC PATCH] dt-binding: net: sfp binding documentation
On Mon, Aug 21, 2017 at 02:10:33PM -0500, Rob Herring wrote: > On Sun, Aug 20, 2017 at 5:28 AM, Baruch Siach wrote: > > Add device-tree binding documentation SFP transceivers. Support for SFP > > transceivers has been recently introduced (drivers/net/phy/sfp.c). > > > > Signed-off-by: Baruch Siach > > --- > > > > The SFP driver is on net-next. > > > > Not sure about the rate-select-gpio property name. The SFP+ standard > > (not supported yet) uses two signals, RS0 and RS1. RS0 is compatible > > with the SFP rate select signal, while RS1 controls the Tx rate. > > --- > > Documentation/devicetree/bindings/net/sff-sfp.txt | 24 > > +++ > > 1 file changed, 24 insertions(+) > > create mode 100644 Documentation/devicetree/bindings/net/sff-sfp.txt > > > > diff --git a/Documentation/devicetree/bindings/net/sff-sfp.txt > > b/Documentation/devicetree/bindings/net/sff-sfp.txt > > new file mode 100644 > > index ..f0c27bc3925e > > --- /dev/null > > +++ b/Documentation/devicetree/bindings/net/sff-sfp.txt > > @@ -0,0 +1,24 @@ > > +Small Form Factor (SFF) Committee Small Form-factor Pluggable (SFP) > > +Transceiver > > + > > +Required properties: > > + > > +- compatible : must be "sff,sfp" > > Need to document "sff" vendor prefix. > > Kind of a short name, but I guess it is sufficient. Are there > revisions of the standard (not SFP+) or more than one form factor (I > don't recall any)? The standards get revised and reorganised, so you can't really name any particular standard. SFP+ is a supplement to SFP, and I suspect that's going to continue into the future. > > + > > +Optional Properties: > > + > > +- i2c-bus : phandle of an I2C bus controller for the SFP two wire serial > > + interface > > Why not a child of the i2c bus it is on? IOW, what should this be a child of? What reg= value would you use to identify it? There's no particular I2C bus address. 
There's an EEPROM on the actual module, and there may be a PHY on the I2C bus (some PHYs include I2C as an alternative way to speak to them other than MDIO.) I2C couldn't probe these as they are effectively hotplugged. However, there's also the question about why it should be a child of the I2C bus - the I2C bus is just a means of communicating with and identifying the module. You could equally argue that it should be a child of the GPIO controller, because that's how it's controlled. You could also argue that it should be a child of the ethernet interface, since that's the main data path. > > + > > +- moddef0-gpio : phandle of the MOD-DEF0 (AKA Mod_ABS) module presence > > input > > + gpio signal > > mod-def0-gpios? It all depends on the standard you read. Some call it MOD_DEF0, Mod-DEF0, Mod_ABS, and some call it MOD-DEF0. And confusingly, some standards call the binary combination of the three MOD-DEF signals "MOD-DEF 0"... "MOD-DEF 7". These signals come from the GBIC module era. It's something of a mess. > > + > > +- los-gpio : phandle of the Receiver Loss of Signal Indication input gpio > > + signal > > + > > +- tx-fault-gpio : phandle of the Module Transmitter Fault input gpio signal > > + > > +- tx-disable-gpio : phandle of the Transmitter Disable output gpio signal > > + > > +- rate-select-gpio : phandle of the Rx Signaling Rate Select (AKA RS0) > > output > > + gpio > > -gpios is the preferred form for all of these. Even if there's only _one_ - using the plural leads one to think that you can list many GPIOs, which is not correct here.
Re: [RFC PATCH] dt-binding: net: sfp binding documentation
On Mon, Aug 21, 2017 at 02:12:42PM -0500, Rob Herring wrote: > On Mon, Aug 21, 2017 at 10:06 AM, Baruch Siach wrote: > > Hi Russell, > > > > On Mon, Aug 21, 2017 at 01:53:17PM +0100, Russell King - ARM Linux wrote: > >> On Sun, Aug 20, 2017 at 01:28:06PM +0300, Baruch Siach wrote: > >> > Add device-tree binding documentation SFP transceivers. Support for SFP > >> > transceivers has been recently introduced (drivers/net/phy/sfp.c). > >> > > >> > Signed-off-by: Baruch Siach > >> > --- > >> > > >> > The SFP driver is on net-next. > >> > > >> > Not sure about the rate-select-gpio property name. The SFP+ standard > >> > (not supported yet) uses two signals, RS0 and RS1. RS0 is compatible > >> > with the SFP rate select signal, while RS1 controls the Tx rate. > >> > >> SFP+ is usable with this, but the platforms I have do not wire the > >> rate select pins on the SFP+ sockets to GPIOs, but hard-wire them. > > > > So maybe naming this signal 'rate-select0-gpio' would make it more future > > (SPF+) proof? Or 'rate-select-rx-gpio'? > > Just extend it by making it an array of 2 gpios. What do you do if you have only one rate select wired up and it doesn't correspond with the first?
Re: [RFC PATCH] dt-binding: net: sfp binding documentation
On Mon, Aug 21, 2017 at 06:06:53PM +0300, Baruch Siach wrote: > Hi Russell, > > On Mon, Aug 21, 2017 at 01:53:17PM +0100, Russell King - ARM Linux wrote: > > Note that I didn't expect the SFP code to just get merged with very > > little in the way of real in-depth review of things like: > > > > * the way the SFP code works, and its structure > > * analysis of the bindings checking that they're fit for everyone's > > purposes. > > I was also surprised to see the "sff,sfp" compatible string with no ack from > DT maintainers. Hence this RFC. I've been pushed into submitting the code for merging, and I hadn't got around to writing the DT docs (thanks for doing that). As I've already said, I'm disappointed that the code didn't get more of a review before it was merged - it seems Linux review is not what it was; people care more about reviewing for spelling errors and style than code structure and functionality, stating that "if we don't like it we can always rework it" or similar. It also seems that people believe that they can't make use of other people's work until it gets merged into mainline kernels (which is what has been behind the pressure of getting this merged.) What isn't realised is that having other people use the code before it gets merged allows design issues to be identified and resolved when there is great flexibility available - for example, changing the DT binding. Once it's merged, changing DT bindings becomes harder, especially if they need to be changed in an incompatible way. I'm fed up about this, and way past caring about these details today though.
Re: [PATCH net-next 09/13] net: mvpp2: dynamic reconfiguration of the PHY mode
On Thu, Aug 24, 2017 at 04:56:09PM +0200, Andrew Lunn wrote: > On Thu, Aug 24, 2017 at 10:38:19AM +0200, Antoine Tenart wrote: > > This patch adds logic to reconfigure the comphy/gop when the link status > > change at runtime. This is very useful on boards such as the mcbin which > > have SFP and Ethernet ports connected to the same MAC port: depending on > > what the user connects the driver will automatically reconfigure the > > link mode. > > Hi Antoine > > I would expect each of these external Ethernet ports to have its own > Ethernet PHY. Don't you need to disconnect from one Ethernet phy and > connect to the other Ethernet PHY when you change external Ethernet > port? I think you're all getting confused. The link mode has very little to do with whether you're using SFP+ or whether you're using the RJ45 at 10G speeds. The link mode has everything to do with the speed at which the link is negotiated. So please, put SFP+ out of your minds for this - SFP+ isn't the reason why you need to switch the MAC link mode. In all cases, the mvpp2 to 88x3310 link ends up in one of two modes: 1. SGMII for RJ45 speeds less than 10G. Autonegotiation on SGMII at the mvpp2 end *must* be enabled for the PHY to work. 2. 10Gbase-R for 10G speeds, whether that be for SFP+ or RJ45 at 10G. Note: mcbin does not support SFP (1G) modules on the SFP+ ports. The 88x3310 driver in the kernel knows about these combinations and sets the phy interface parameter correctly depending on whether the PHY has configured itself for copper at whatever speed or SFP+.
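The rule above (the MAC-side interface follows the negotiated speed, not the media) can be sketched as a small lookup. This is a hypothetical model, not the 88x3310 or mvpp2 driver: the enum and function names are invented, speeds are in Mb/s, and only the 10/100/1000 and 10000 cases described in this thread are covered. Anything else (e.g. 2.5G/5G) is reported as unsupported.

```c
#include <assert.h>
#include <stdbool.h>

enum demo_host_iface {
	DEMO_IFACE_UNSUPPORTED,
	DEMO_IFACE_SGMII,     /* 10/100/1000 on copper */
	DEMO_IFACE_10GBASER,  /* 10G, whether copper or SFP+ */
};

struct demo_host_link {
	enum demo_host_iface iface;
	bool inband_an;  /* SGMII in-band autoneg must be on at the MAC end */
};

/* Note that the active medium (RJ45 vs SFP+) is deliberately not an input. */
struct demo_host_link demo_host_link_for_speed(int speed_mbps)
{
	struct demo_host_link l = { DEMO_IFACE_UNSUPPORTED, false };

	switch (speed_mbps) {
	case 10:
	case 100:
	case 1000:
		l.iface = DEMO_IFACE_SGMII;
		l.inband_an = true;
		break;
	case 10000:
		l.iface = DEMO_IFACE_10GBASER;
		break;
	default:
		break;  /* not covered by the modes described here */
	}
	return l;
}
```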
Re: [PATCH net-next 09/13] net: mvpp2: dynamic reconfiguration of the PHY mode
On Thu, Aug 24, 2017 at 06:57:43PM +0200, Andrew Lunn wrote: > > I see what could be the issue but I do not understand one aspect though: > > how could we switch from one PHY to another, as there's only one output > > between the SoC (and so a given GoP#) and the board. So if a given PHY > > can handle multiple modes I see, but in the other case a muxing > > somewhere would be needed? Or did I miss something? > > I think we need a hardware diagram... > > How are the RJ45, copper PHY, SFP module connected to the SoC? > > Somewhere there must be a mux, to select between copper and > fibre. Where is that mux? In the 88x3310 PHY: .--- RJ45 MVPP2 - 88x3310 PHY `--- SFP+ Here's the commentary I've provided at the very top of the 88x3310 driver which describes all these modes: * There appears to be several different data paths through the PHY which * are automatically managed by the PHY. The following has been determined * via observation and experimentation: * * SGMII PHYXS -- BASE-T PCS -- 10G PMA -- AN -- Copper (for <= 1G) * 10GBASE-KR PHYXS -- BASE-T PCS -- 10G PMA -- AN -- Copper (for 10G) * 10GBASE-KR PHYXS -- BASE-R PCS -- Fiber * * If both the fiber and copper ports are connected, the first to gain * link takes priority and the other port is completely locked out. It's not a copper-only PHY, it's just like most other PHYs out there that support multiple connections, like the 88e151x series that support both RJ45 and fibre and can auto-switch between them.
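The "first to gain link takes priority" behaviour described in the driver comment can be modelled as a tiny state machine. This is an illustrative sketch that assumes each medium reports link up/down events independently; the names are hypothetical and it says nothing about the PHY's actual register interface.

```c
#include <assert.h>
#include <stdbool.h>

enum demo_media { DEMO_MEDIA_NONE, DEMO_MEDIA_COPPER, DEMO_MEDIA_FIBER };

struct demo_3310 {
	enum demo_media active;  /* medium currently owning the link */
};

/* Feed a link event from one medium into the model. */
void demo_media_event(struct demo_3310 *phy, enum demo_media m, bool link)
{
	if (link) {
		/* First medium to gain link wins; the other stays locked out. */
		if (phy->active == DEMO_MEDIA_NONE)
			phy->active = m;
	} else if (phy->active == m) {
		/* Only the active medium going down releases the lock-out. */
		phy->active = DEMO_MEDIA_NONE;
	}
}
```

In this model a switch from fibre to copper is always seen as the active link going down before anything else can come up.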
Re: [PATCH net-next 09/13] net: mvpp2: dynamic reconfiguration of the PHY mode
On Thu, Aug 24, 2017 at 07:45:19PM +0200, Andrew Lunn wrote: > > The 88x3310 driver in the kernel knows about these combinations and > > sets the phy interface parameter correctly depending on whether the > > PHY has configured itself for copper at whatever speed or SFP+. > > So when the PHY decides to swap from copper to fibre etc, is the > phylib state machine kept up to date. Does it see a down, followed by > an up? I'd have to re-check to make sure, but I believe it does, because the negotiation is held off on the "other" media until the currently active link has gone down.
Re: [EXT] Re: [PATCH net-next 09/13] net: mvpp2: dynamic reconfiguration of the PHY mode
On Thu, Aug 24, 2017 at 07:14:18PM +0200, Antoine Tenart wrote: > On Thu, Aug 24, 2017 at 05:08:29PM +, Stefan Chulski wrote: > > > > Imagine phylib is using the copper Ethernet PHY, but the MAC is using > > > > the SFP port. Somebody pulls out the copper cable, phylib says the > > > > link is down, turns the carrier off and calls the callback. Not good, > > > > since your SFP cable is still plugged in... Ethtool is > > > > returning/setting stuff in the Copper Ethernet PHY, when in fact you > > > > intend to be setting SFP settings. > > > > > > I see what could be the issue but I do not understand one aspect though: > > > how could we switch from one PHY to another, as there's only one output > > > between the SoC (and so a given GoP#) and the board. So if a given PHY can > > > handle multiple modes I see, but in the other case a muxing somewhere > > > would > > > be needed? Or did I miss something? > > > > I think PHY name and PHY mode struct that describe here both MAC to > > PHY and PHY to PHY connection create confusion... Serdes IP lane > > doesn't care if connector is SFP, RJ45 or direct attached cable. > > mvpp22_comphy_init only configures MAC to PHY > > connection. SFI for 10G(KR in mainline), SGMII for 1G and HS_SGMII for > > 2.5G. > > So maybe one confusion was to name them PHY_MODE_10GKR and > PHY_MODE_SGMII. It could be PHY_MODE_10G and PHY_MODE_1G instead. SGMII mode supports 100M and 10M as well using data repetition, so 1G makes it look like those speeds are not supported.
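To illustrate the data-repetition point: SGMII keeps its serial line running at the 1000 Mb/s data rate and carries 100M and 10M by repeating each symbol 10 and 100 times respectively. A minimal sketch (the function name is hypothetical):

```c
#include <assert.h>

/* Returns how many times each symbol is repeated on an SGMII link for a
 * given data rate in Mb/s, or -1 if SGMII cannot carry that speed. */
int demo_sgmii_repeat_factor(int speed_mbps)
{
	switch (speed_mbps) {
	case 1000:
		return 1;
	case 100:
		return 10;
	case 10:
		return 100;
	default:
		return -1;
	}
}
```

In every supported case, speed multiplied by the repeat factor is 1000, which is why a mode named after 1G alone undersells SGMII.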
Re: [PATCH net-next v2 05/14] net: mvpp2: do not force the link mode
On Fri, Aug 25, 2017 at 04:48:12PM +0200, Antoine Tenart wrote: > The link mode (speed, duplex) was forced based on what the phylib > returns. This should not be the case, and only forced by ethtool > functions manually. This patch removes the link mode enforcement from > the phylib link_event callback. So how does RGMII work (which has no in-band signalling between the PHY and MAC)? phylib expects the network driver to configure it according to the PHY state at link_event time - I think you need to explain more why you think that this is not necessary. > > Signed-off-by: Antoine Tenart > --- > drivers/net/ethernet/marvell/mvpp2.c | 24 > 1 file changed, 24 deletions(-) > > diff --git a/drivers/net/ethernet/marvell/mvpp2.c > b/drivers/net/ethernet/marvell/mvpp2.c > index fab231858a41..498a4969dc58 100644 > --- a/drivers/net/ethernet/marvell/mvpp2.c > +++ b/drivers/net/ethernet/marvell/mvpp2.c > @@ -5741,30 +5741,10 @@ static void mvpp2_link_event(struct net_device *dev) > struct mvpp2_port *port = netdev_priv(dev); > struct phy_device *phydev = dev->phydev; > int status_change = 0; > - u32 val; > > if (phydev->link) { > if ((port->speed != phydev->speed) || > (port->duplex != phydev->duplex)) { > - u32 val; > - > - val = readl(port->base + MVPP2_GMAC_AUTONEG_CONFIG); > - val &= ~(MVPP2_GMAC_CONFIG_MII_SPEED | > - MVPP2_GMAC_CONFIG_GMII_SPEED | > - MVPP2_GMAC_CONFIG_FULL_DUPLEX | > - MVPP2_GMAC_AN_SPEED_EN | > - MVPP2_GMAC_AN_DUPLEX_EN); > - > - if (phydev->duplex) > - val |= MVPP2_GMAC_CONFIG_FULL_DUPLEX; > - > - if (phydev->speed == SPEED_1000) > - val |= MVPP2_GMAC_CONFIG_GMII_SPEED; > - else if (phydev->speed == SPEED_100) > - val |= MVPP2_GMAC_CONFIG_MII_SPEED; > - > - writel(val, port->base + MVPP2_GMAC_AUTONEG_CONFIG); > - > port->duplex = phydev->duplex; > port->speed = phydev->speed; > } > @@ -5782,10 +5762,6 @@ static void mvpp2_link_event(struct net_device *dev) > > if (status_change) { > if (phydev->link) { > - val = readl(port->base + 
MVPP2_GMAC_AUTONEG_CONFIG); > - val |= (MVPP2_GMAC_FORCE_LINK_PASS | > - MVPP2_GMAC_FORCE_LINK_DOWN); > - writel(val, port->base + MVPP2_GMAC_AUTONEG_CONFIG); > mvpp2_egress_enable(port); > mvpp2_ingress_enable(port); > } else { > -- > 2.13.5 >
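For an RGMII link, which has no in-band status, the MAC has to be programmed from phylib's speed and duplex at link_event time, and that is exactly what the removed hunk does. Below is a sketch of that register computation in isolation. The bit positions are hypothetical placeholders, not the real `MVPP2_GMAC_*` values from mvpp2.c, and the function name is invented.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit positions standing in for the MVPP2_GMAC_* masks. */
#define DEMO_GMAC_CONFIG_MII_SPEED   (1u << 5)
#define DEMO_GMAC_CONFIG_GMII_SPEED  (1u << 6)
#define DEMO_GMAC_AN_SPEED_EN        (1u << 7)
#define DEMO_GMAC_CONFIG_FULL_DUPLEX (1u << 12)
#define DEMO_GMAC_AN_DUPLEX_EN       (1u << 13)

/* Force speed/duplex from the PHY's resolved state, with hardware
 * speed/duplex autonegotiation disabled; unrelated bits are preserved. */
uint32_t demo_gmac_force_mode(uint32_t val, int speed, bool full_duplex)
{
	val &= ~(DEMO_GMAC_CONFIG_MII_SPEED | DEMO_GMAC_CONFIG_GMII_SPEED |
		 DEMO_GMAC_CONFIG_FULL_DUPLEX | DEMO_GMAC_AN_SPEED_EN |
		 DEMO_GMAC_AN_DUPLEX_EN);

	if (full_duplex)
		val |= DEMO_GMAC_CONFIG_FULL_DUPLEX;

	if (speed == 1000)
		val |= DEMO_GMAC_CONFIG_GMII_SPEED;
	else if (speed == 100)
		val |= DEMO_GMAC_CONFIG_MII_SPEED;
	/* 10M: neither speed bit set */

	return val;
}
```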
Re: [PATCH net-next v2 09/14] net: mvpp2: dynamic reconfiguration of the PHY mode
On Fri, Aug 25, 2017 at 04:48:16PM +0200, Antoine Tenart wrote: > This patch adds logic to reconfigure the comphy/gop when the link status > change at runtime. This is very useful on boards such as the mcbin which > have SFP and Ethernet ports connected to the same MAC port: depending on > what the user connects the driver will automatically reconfigure the > link mode. This commit commentary needs updating - as I've already pointed out in the previous round, the need to reconfigure things has *nothing* to do with there being SFP and "Ethernet" ports present. Hence, your commit message is entirely misleading. > > Signed-off-by: Antoine Tenart > --- > drivers/net/ethernet/marvell/mvpp2.c | 21 - > 1 file changed, 20 insertions(+), 1 deletion(-) > > diff --git a/drivers/net/ethernet/marvell/mvpp2.c > b/drivers/net/ethernet/marvell/mvpp2.c > index 49a6789a4142..04e0c8ab7b51 100644 > --- a/drivers/net/ethernet/marvell/mvpp2.c > +++ b/drivers/net/ethernet/marvell/mvpp2.c > @@ -5740,6 +5740,7 @@ static void mvpp2_link_event(struct net_device *dev) > { > struct mvpp2_port *port = netdev_priv(dev); > struct phy_device *phydev = dev->phydev; > + bool link_reconfigured = false; > > if (!netif_running(dev)) > return; > @@ -5750,9 +5751,27 @@ static void mvpp2_link_event(struct net_device *dev) > port->duplex = phydev->duplex; > port->speed = phydev->speed; > } > + > + if (port->phy_interface != phydev->interface && port->comphy) { > + /* disable current port for reconfiguration */ > + mvpp2_interrupts_disable(port); > + netif_carrier_off(port->dev); > + mvpp2_port_disable(port); > + phy_power_off(port->comphy); > + > + /* comphy reconfiguration */ > + port->phy_interface = phydev->interface; > + mvpp22_comphy_init(port); > + > + /* gop/mac reconfiguration */ > + mvpp22_gop_init(port); > + mvpp2_port_mii_set(port); > + > + link_reconfigured = true; > + } > } > > - if (phydev->link != port->link) { > + if (phydev->link != port->link || link_reconfigured) { > port->link = 
phydev->link; > > if (phydev->link) { > -- > 2.13.5 >
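The control flow of the quoted `mvpp2_link_event()` change can be reduced to the following model: while the link is up, reconfigure only if the PHY's interface differs from the port's and a comphy is present, and then force the link handling to re-run even though the link bit itself did not change. The struct, field and function names are hypothetical, and the hardware steps (interrupts off, carrier off, comphy/GoP/MII re-init) are reduced to counters.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical port model; fields loosely follow the quoted patch. */
struct demo_port {
	int phy_interface;   /* interface the SerDes/GoP is set up for */
	bool has_comphy;     /* stands in for port->comphy != NULL */
	bool link;
	int reconfig_count;  /* times the SerDes/GoP/MII were reprogrammed */
	int link_handled;    /* times the up/down handling block ran */
};

static bool demo_maybe_reconfigure(struct demo_port *port, int new_iface)
{
	if (port->phy_interface == new_iface || !port->has_comphy)
		return false;
	/* Patch: disable port, power off comphy, re-init for new_iface. */
	port->phy_interface = new_iface;
	port->reconfig_count++;
	return true;
}

static void demo_link_event(struct demo_port *port, bool phy_link,
			    int phy_iface)
{
	bool reconfigured = false;

	if (phy_link)
		reconfigured = demo_maybe_reconfigure(port, phy_iface);

	if (phy_link != port->link || reconfigured) {
		port->link = phy_link;
		port->link_handled++;  /* egress/ingress enable/disable here */
	}
}
```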
Re: [PATCH net-next v2 05/14] net: mvpp2: do not force the link mode
On Mon, Aug 28, 2017 at 10:38:37AM +0200, Marcin Wojtas wrote: > Hi Antoine, > > Can you be 100% sure that when using SGMII with PHY's (like Marvell > Alaska 88E1xxx series), is in-band link information always available? > I'd be very cautious with such assumption and use in-band management > only when set in the DT, like mvneta. I think phylib can properly can > do its work when MDIO connection is provided on the board. There is another issue to be aware of: if you're wanting to use flow control autonegotiation, that is not carried across SGMII's in-band signalling. If you want to use SGMII's in-band signalling for the duplex and speed information, you still need phylib's notification to properly set the flow control. Switching mvpp2 to use phylink (which is needed for the 1G SFP slot on mcbin) will handle all this for you - dealing with both in-band and out-of-band negotiation methods, and combining them in the appropriate manner for the selected operation mode.
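A sketch of why out-of-band information is still needed alongside SGMII in-band status: pause settings have to be resolved from the local and link-partner advertisements using the standard IEEE 802.3 Annex 28B rules, then merged with the speed/duplex taken from the in-band word. The bit values and names below are hypothetical stand-ins. In the kernel, comparable resolution logic exists in helpers such as `mii_resolve_flowctrl_fdx()`.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical advertisement bits (local or link partner). */
#define DEMO_ADV_PAUSE 0x1u  /* symmetric PAUSE */
#define DEMO_ADV_ASYM  0x2u  /* asymmetric PAUSE direction */

/* Hypothetical resolved flow-control bits for the local MAC. */
#define DEMO_FLOW_TX 0x1u  /* local MAC may send pause frames */
#define DEMO_FLOW_RX 0x2u  /* local MAC honours received pause frames */

/* Full-duplex pause resolution per IEEE 802.3 Annex 28B. */
unsigned int demo_resolve_pause(unsigned int local, unsigned int partner)
{
	if (local & partner & DEMO_ADV_PAUSE)
		return DEMO_FLOW_TX | DEMO_FLOW_RX;
	if (local & partner & DEMO_ADV_ASYM) {
		if (local & DEMO_ADV_PAUSE)
			return DEMO_FLOW_RX;
		if (partner & DEMO_ADV_PAUSE)
			return DEMO_FLOW_TX;
	}
	return 0;
}

struct demo_mac_cfg {
	int speed;          /* from the SGMII in-band word */
	bool full_duplex;   /* from the SGMII in-band word */
	unsigned int flow;  /* from the PHY's advertisements (out-of-band) */
};

/* Combine in-band speed/duplex with out-of-band pause information;
 * pause only applies to full-duplex links. */
struct demo_mac_cfg demo_merge_link(int inband_speed, bool inband_fd,
				    unsigned int local_adv,
				    unsigned int partner_adv)
{
	struct demo_mac_cfg cfg = { inband_speed, inband_fd, 0 };

	if (inband_fd)
		cfg.flow = demo_resolve_pause(local_adv, partner_adv);
	return cfg;
}
```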
Re: [PATCH net-next v2 05/14] net: mvpp2: do not force the link mode
On Mon, Aug 28, 2017 at 11:40:51AM +0200, Antoine Tenart wrote: > On Mon, Aug 28, 2017 at 09:51:52AM +0100, Russell King - ARM Linux wrote: > > On Mon, Aug 28, 2017 at 10:38:37AM +0200, Marcin Wojtas wrote: > > > > > > Can you be 100% sure that when using SGMII with PHY's (like Marvell > > > Alaska 88E1xxx series), is in-band link information always available? > > > I'd be very cautious with such assumption and use in-band management > > > only when set in the DT, like mvneta. I think phylib can properly can > > > do its work when MDIO connection is provided on the board. > > > > There is another issue to be aware of: if you're wanting to use flow > > control autonegotiation, that is not carried across SGMII's in-band > > signalling. If you want to use SGMII's in-band signalling for the > > duplex and speed information, you still need phylib's notification > > to properly set the flow control. > > > > > > Switching mvpp2 to use phylink (which is needed for the 1G SFP slot on > > mcbin) will handle all this for you - dealing with both in-band and > > out-of-band negotiation methods, and combining them in the appropriate > > manner for the selected operation mode. > > > > So probably the best move here is to remove this patch, and wait for the > phylink support in the PPv2 driver. I've nothing on that specifically for the mvpp2 driver - what I have is for mvneta and the Marvell mvpp2x driver, with GMAC support extracted from mvneta (that last bit is rather dirty at the moment so not published anywhere, and doesn't cater for PP v2.1 at all.) I ought to have posted the mvneta part of the phylink patches, but I didn't get around to it early enough in this cycle - there are probably quite a number of conflicts with net-next now, so I think it's too late to submit it for mainline. 
I know Andrew has already looked at them in my git tree as part of the review of phylink when that was merged - which should be adequate to give an example of how to implement it for the mainline PP v2 driver.
Re: [PATCH net-next] net: mvpp2: phylink support
On Mon, Oct 09, 2017 at 02:55:27PM +0200, Antoine Tenart wrote: > Hi Russell, > > On Mon, Sep 25, 2017 at 11:55:14AM +0200, Antoine Tenart wrote: > > On Fri, Sep 22, 2017 at 12:07:31PM +0100, Russell King - ARM Linux wrote: > > > On Thu, Sep 21, 2017 at 03:45:22PM +0200, Antoine Tenart wrote: > > > > > > +static int mvpp2_phylink_mac_link_state(struct net_device *dev, > > > > + struct phylink_link_state > > > > *state) > > > > +{ > > > > + struct mvpp2_port *port = netdev_priv(dev); > > > > + u32 val; > > > > + > > > > + if (!phy_interface_mode_is_rgmii(port->phy_interface) && > > > > + port->phy_interface != PHY_INTERFACE_MODE_SGMII) > > > > + return 0; > > > > > > You're blocking this for 1000base-X and 10G connections, which is not > > > correct. The expectation is that this function returns the current > > > MAC state irrespective of the interface mode. > > > > I moved what was already supported in the PPv2 driver and did not > > implemented the full set of what is supported. It's not perfect, but it > > does move what was already supported. > > > > Any reason not to first move what's already supported to phylink, and > > then add more supported modes in separate patches? > > Any thoughts on this? You're asking me to comment about something I know little about as I've not used mvpp2.c. I don't know the details of what your "already supported" statement refers to. Maybe you could give some clues - maybe produce a list of what mvpp2 currently supports? Here are the link modes that phylink supports:
1. PHY based links
2. PHYless fixed links with details specified in DT, in the same way as the existing "fixed-link" support works, but without needing to create fake PHYs.
3. PHYless fixed links with GPIO link indication (again, in the same way as the existing fixed-link support.)
4. Direct fibre connections via fixed-link or SFP.
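Items 2 and 3 above can be illustrated with a small user-space sketch. The names are hypothetical and this is not phylink's code. For a PHY-less fixed link, speed and duplex are static values taken from DT. The only thing that can change is the link status, and only when a link-indication GPIO is described. Without a GPIO, the link is simply reported as up.

```c
#include <stdbool.h>

/* Hypothetical description of a PHY-less fixed link, as parsed from DT. */
struct fixed_link_cfg {
    int speed;          /* Mb/s, from DT */
    bool full_duplex;   /* from DT */
    bool has_link_gpio; /* optional link-indication GPIO described? */
};

struct fixed_link_state {
    bool link;
    int speed;
    bool full_duplex;
};

/* Speed and duplex always come from the static description; only the link
 * status may vary, and only when a GPIO reports it. */
struct fixed_link_state fixed_link_get_state(const struct fixed_link_cfg *cfg,
                                             bool gpio_active)
{
    struct fixed_link_state st;

    st.speed = cfg->speed;
    st.full_duplex = cfg->full_duplex;
    st.link = cfg->has_link_gpio ? gpio_active : true;
    return st;
}
```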
Re: [PATCH v9 00/20] simplify crypto wait for async op
On Sun, Oct 15, 2017 at 10:19:45AM +0100, Gilad Ben-Yossef wrote: > Many users of kernel async. crypto services have a pattern of > starting an async. crypto op and than using a completion > to wait for it to end. > > This patch set simplifies this common use case in two ways: > > First, by separating the return codes of the case where a > request is queued to a backlog due to the provider being > busy (-EBUSY) from the case the request has failed due > to the provider being busy and backlogging is not enabled > (-EAGAIN). > > Next, this change is than built on to create a generic API > to wait for a async. crypto operation to complete. > > The end result is a smaller code base and an API that is > easier to use and more difficult to get wrong. > > The patch set was boot tested on x86_64 and arm64 which > at the very least tests the crypto users via testmgr and > tcrypt but I do note that I do not have access to some > of the HW whose drivers are modified nor do I claim I was > able to test all of the corner cases. > > The patch set is based upon linux-next release tagged > next-20171013. Has there been any performance impact analysis of these changes? I ended up having patches for one of the crypto drivers, which converted its interrupt handling to threaded interrupts, reverted because they caused a performance degradation. Moving code to the latest APIs to simplify it is not always beneficial.
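For context, the return-code contract described in the quoted cover letter can be modelled in a single-threaded user-space sketch. A submission returning -EINPROGRESS, or -EBUSY when the request was queued to a backlog, means "wait for the completion callback". -EAGAIN (busy, with no backlog) is a plain failure. Every name here is a hypothetical stand-in, loosely modelled on the helper the series proposes. A real kernel caller blocks on a struct completion, whereas this model just runs the deferred callback. The sketch says nothing about the performance question raised above.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for a completion plus the result the callback stores. */
struct fake_wait {
    bool done;
    int err;
};

/* One outstanding request whose callback runs when the caller "waits". */
struct fake_pending {
    struct fake_wait *wait;
    int result;
    bool queued;
};

static struct fake_pending pending;

/* Completion callback: record the result and mark the wait as done. */
static void fake_req_done(struct fake_wait *w, int err)
{
    w->err = err;
    w->done = true;
}

/* Stand-in for wait_for_completion(): here we just run the deferred callback. */
static void fake_wait_for_completion(struct fake_wait *w)
{
    if (!w->done && pending.queued && pending.wait == w) {
        pending.queued = false;
        fake_req_done(w, pending.result);
    }
}

/* Only -EINPROGRESS and -EBUSY (queued to the backlog) mean "completes later";
 * any other code, including -EAGAIN, is returned unchanged. */
int fake_wait_req(int err, struct fake_wait *w)
{
    switch (err) {
    case -EINPROGRESS:
    case -EBUSY:
        fake_wait_for_completion(w);
        w->done = false;   /* analogous to reinit_completion() */
        err = w->err;
        break;
    }
    return err;
}

/* Hypothetical provider behaviours. */
enum fake_mode { FAKE_SYNC, FAKE_ASYNC, FAKE_BACKLOGGED, FAKE_BUSY_NO_BACKLOG };

int fake_submit(enum fake_mode mode, struct fake_wait *w, int final_result)
{
    switch (mode) {
    case FAKE_SYNC:
        return final_result;   /* finished immediately */
    case FAKE_ASYNC:
    case FAKE_BACKLOGGED:
        pending.wait = w;
        pending.result = final_result;
        pending.queued = true;
        return mode == FAKE_ASYNC ? -EINPROGRESS : -EBUSY;
    case FAKE_BUSY_NO_BACKLOG:
    default:
        return -EAGAIN;        /* rejected: the caller must retry */
    }
}
```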
Re: [offlist] Re: Crash in netlink/sk_filter_trim_cap on ARMv7 on 4.18rc1
Subject says offlist, but this isn't... On Wed, Jul 04, 2018 at 08:33:20AM +0100, Peter Robinson wrote: > Sorry for the delay on this from my end. I noticed there was some bpf > bits land in the last net fixes pull request landed Monday so I built > a kernel with the JIT reenabled. It seems it's improved in that the > completely dead no output boot has gone but the original problem that > arrived in the merge window still persists: > > [ 17.564142] note: systemd-udevd[194] exited with preempt_count 1 > [ 17.592739] Unable to handle kernel NULL pointer dereference at > virtual address 000c > [ 17.601002] pgd = (ptrval) > [ 17.603819] [000c] *pgd= > [ 17.607487] Internal error: Oops: 805 [#10] SMP ARM > [ 17.612396] Modules linked in: > [ 17.615484] CPU: 0 PID: 195 Comm: systemd-udevd Tainted: G D > 4.18.0-0.rc3.git1.1.bpf1.fc29.armv7hl #1 > [ 17.626056] Hardware name: Generic AM33XX (Flattened Device Tree) > [ 17.632198] PC is at sk_filter_trim_cap+0x218/0x2fc > [ 17.637102] LR is at (null) > [ 17.640086] pc : []lr : [<>]psr: 6013 > [ 17.646384] sp : cfe1dd48 ip : fp : > [ 17.651635] r10: d837e000 r9 : d833be00 r8 : > [ 17.656887] r7 : 0001 r6 : e003d000 r5 : r4 : > [ 17.663447] r3 : 0007 r2 : r1 : r0 : > [ 17.670009] Flags: nZCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment > none > [ 17.677180] Control: 10c5387d Table: 8fe20019 DAC: 0051 > [ 17.682956] Process systemd-udevd (pid: 195, stack limit = 0x(ptrval)) > [ 17.689518] Stack: (0xcfe1dd48 to 0xcfe1e000) Can you provide a full disassembly of sk_filter_trim_cap from vmlinux (iow, annotated with its linked address) for the above dump please - alternatively a new dump with matching disassembly. Thanks.
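The "PC is at sk_filter_trim_cap+0x218/0x2fc" line gives the faulting instruction as an offset (0x218) into a function whose size is 0x2fc. Adding that offset to the function's linked address from vmlinux locates the instruction in the requested disassembly. Here is a small parsing sketch; the helper name is hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Parse an oops "symbol+offset/size" string such as
 * "sk_filter_trim_cap+0x218/0x2fc". Returns 0 on success, -1 otherwise. */
int parse_oops_pc(const char *s, char *sym, size_t symlen,
                  unsigned long *offset, unsigned long *size)
{
    const char *plus = strrchr(s, '+');

    /* Need a non-empty symbol name that fits, including its NUL. */
    if (!plus || plus == s || (size_t)(plus - s) >= symlen)
        return -1;
    /* %lx accepts an optional 0x prefix. */
    if (sscanf(plus + 1, "%lx/%lx", offset, size) != 2)
        return -1;
    memcpy(sym, s, (size_t)(plus - s));
    sym[plus - s] = '\0';
    return 0;
}
```

Take the function's linked address from System.map or `nm vmlinux`. The instruction to look for is at that address plus the parsed offset.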
Re: [offlist] Re: Crash in netlink/sk_filter_trim_cap on ARMv7 on 4.18rc1
On Thu, Jul 05, 2018 at 12:41:54AM +0100, Russell King - ARM Linux wrote: > Subject says offlist, but this isn't... > > On Wed, Jul 04, 2018 at 08:33:20AM +0100, Peter Robinson wrote: > > Sorry for the delay on this from my end. I noticed there was some bpf > > bits land in the last net fixes pull request landed Monday so I built > > a kernel with the JIT reenabled. It seems it's improved in that the > > completely dead no output boot has gone but the original problem that > > arrived in the merge window still persists: > > > > [ 17.564142] note: systemd-udevd[194] exited with preempt_count 1 > > [ 17.592739] Unable to handle kernel NULL pointer dereference at > > virtual address 000c > > [ 17.601002] pgd = (ptrval) > > [ 17.603819] [000c] *pgd= > > [ 17.607487] Internal error: Oops: 805 [#10] SMP ARM > > [ 17.612396] Modules linked in: > > [ 17.615484] CPU: 0 PID: 195 Comm: systemd-udevd Tainted: G D > > 4.18.0-0.rc3.git1.1.bpf1.fc29.armv7hl #1 > > [ 17.626056] Hardware name: Generic AM33XX (Flattened Device Tree) > > [ 17.632198] PC is at sk_filter_trim_cap+0x218/0x2fc > > [ 17.637102] LR is at (null) > > [ 17.640086] pc : []lr : [<>]psr: 6013 > > [ 17.646384] sp : cfe1dd48 ip : fp : > > [ 17.651635] r10: d837e000 r9 : d833be00 r8 : > > [ 17.656887] r7 : 0001 r6 : e003d000 r5 : r4 : > > [ 17.663447] r3 : 0007 r2 : r1 : r0 : > > [ 17.670009] Flags: nZCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment > > none > > [ 17.677180] Control: 10c5387d Table: 8fe20019 DAC: 0051 > > [ 17.682956] Process systemd-udevd (pid: 195, stack limit = 0x(ptrval)) > > [ 17.689518] Stack: (0xcfe1dd48 to 0xcfe1e000) > > Can you provide a full disassembly of sk_filter_trim_cap from vmlinux > (iow, annotated with its linked address) for the above dump please - > alternatively a new dump with matching disassembly. Thanks. 
It would also probably be a good idea to set bpf_jit_enable to 2 to get a dump of the BPF program being run, although for your problem I think you'll have to hack the kernel source to do that.
[PATCH 00/13] ARM BPF jit compiler improvements
Hi,
This series improves the ARM BPF JIT compiler by:
- enumerating the stack layout rather than using constants that happen to be multiples of four
- rejigging the BPF "register" accesses to use negative numbers instead of positive ones, which could be confused with register numbers in the bpf2a32 array.
- since we maintain the ARM FP register as a pointer to the top of our scratch space (or, with frame pointers enabled, a valid ARM frame pointer register), accessing our scratch space using FP, which is constant across all BPF programs, including tail-called programs.
- using immediate forms of ARM instructions where possible, rather than first loading the immediate into an ARM register.
- using load-with-shift instructions rather than a separate shift instruction followed by a load
- avoiding reloading index and array in the tail-call code
- using double-word load/store instructions where available

arch/arm/net/bpf_jit_32.c | 927 +++---
arch/arm/net/bpf_jit_32.h |  44 +--
2 files changed, 493 insertions(+), 478 deletions(-)
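ARM data-processing instructions can only take immediates that are an 8-bit value rotated right by an even amount. So "using immediate forms where possible" implies a test like the one below before falling back to loading the constant into a register. This user-space sketch was written for illustration and is not code taken from bpf_jit_32.c.

```c
#include <stdint.h>

/* Rotate left; n is reduced modulo 32 so a zero rotation is well defined. */
static uint32_t rotl32(uint32_t x, unsigned int n)
{
    n &= 31;
    return n ? (x << n) | (x >> (32 - n)) : x;
}

/* Return the 12-bit ARM "modified immediate" encoding of x (4-bit rotate
 * field in bits 11:8, 8-bit value in bits 7:0, value = imm8 ROR (2 * rot)),
 * or -1 if x cannot be encoded and must be loaded into a register first. */
int arm_imm12_encode(uint32_t x)
{
    for (unsigned int rot = 0; rot < 16; rot++) {
        /* Undo a right rotation of 2*rot by rotating left the same amount. */
        uint32_t imm8 = rotl32(x, 2 * rot);

        if (imm8 <= 0xff)
            return (int)((rot << 8) | imm8);
    }
    return -1;
}
```

For example, 0xff000000 encodes as 0x4ff (0xff rotated right by 8). In contrast, 0x101 has set bits too far apart to fit in any rotated 8-bit window, so it returns -1.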
Re: [PATCH net-next 13/13] ARM: net: bpf: use double-word load/stores where available
On Tue, Jul 10, 2018 at 10:03:33AM -0700, Olof Johansson wrote: > Hi Russell, > > @@ -663,13 +679,27 @@ static inline void emit_a32_mov_r(const s8 dst, const > > s8 src, > > static inline void emit_a32_mov_r64(const bool is64, const s8 dst[], > > const s8 src[], > > struct jit_ctx *ctx) { > > - emit_a32_mov_r(dst_lo, src_lo, ctx); > > - if (is64) { > > + if (!is64) { > > + emit_a32_mov_r(dst_lo, src_lo, ctx); > > + /* Zero out high 4 bytes */ > > + emit_a32_mov_i(dst_hi, 0, ctx); > > + } else if (__LINUX_ARM_ARCH__ < 6 && > > + ctx->cpu_architecture < CPU_ARCH_ARMv5) { > > /* complete 8 byte move */ > > + emit_a32_mov_r(dst_lo, src_lo, ctx); > > emit_a32_mov_r(dst_hi, src_hi, ctx); > > > Tiny nit: Looks like you compare for >= ARMv5TE above and I'm not aware of any vanilla v5 implementations (all I can find are > v5TE or <=v4T), so it doesn't seem like something actually causing > problems. Mostly pointing it out for consistency's sake. They're rare - I think the only one is an ARM1020 (ARMv5T) as opposed to the ARM1020E (ARMv5TE). Whether any are in the wild or not is another matter.
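Olof's nit is about using one consistent threshold for the double-word path: ARMv5TE, where LDRD/STRD were introduced. With that threshold, a plain ARMv5T part such as the ARM1020 takes the two-instruction fallback. Below is a hedged sketch of that check. The enum and function names are hypothetical, loosely modelled on the kernel's CPU_ARCH_* ordering.

```c
#include <stdbool.h>

/* Hypothetical architecture levels, ordered oldest to newest. */
enum cpu_arch {
    ARCH_ARMv4T,
    ARCH_ARMv5,
    ARCH_ARMv5T,
    ARCH_ARMv5TE,
    ARCH_ARMv6,
    ARCH_ARMv7,
};

/* LDRD/STRD first appeared in ARMv5TE. A kernel built only for ARMv6+ can
 * assume them at compile time; otherwise check the CPU actually running. */
bool can_use_ldrd(int build_arch, enum cpu_arch runtime_arch)
{
    return build_arch >= 6 || runtime_arch >= ARCH_ARMv5TE;
}
```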
Re: [PATCH net-next 01/13] ARM: net: bpf: enumerate the JIT scratch stack layout
On Tue, Jul 10, 2018 at 08:30:04PM +0200, Daniel Borkmann wrote: > Hi Russell, > > thanks a lot for your work on the arm32 JIT! > > On 07/10/2018 02:36 PM, Russell King wrote: > > Enumerate the contents of the JIT scratch stack layout used for storing > > some of the JITs 64-bit registers, tail call counter and AX register. > > > > XXX: what about the skb_copy_bits buffer - this appears to overlap with > > the first word of the JITs accessible stack. > > Could you elaborate on that case? Unless I'm missing something there should > be no use of the skb_copy_bits buffer anymore (aka former SKB_BUFFER at > STACK_VAR(SCRATCH_SIZE) offset), but aside from that it's not supposed to > overlap either. Probably an old comment - these were originally developed back in the January timeframe when there was the SKB_BUFFER stuff, but that was removed during the 4.18 merge window. I'll kill the comment. Thanks.
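Here is a rough user-space model of the "enumerate the scratch stack layout" and "negative register numbers" ideas from the cover letter. The slot list is abbreviated and hypothetical (the real JIT stores more registers), and STACK_OFFSET here is an illustrative definition. Each 32-bit slot gets a name, the scratch size follows from the enum, and slots are addressed as negative offsets from FP. Because ARM register numbers are 0..15, a negative value can never be mistaken for a register.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical scratch-stack slots, one 32-bit word each. */
enum jit_scratch_slot {
    SLOT_R2_HI, SLOT_R2_LO,
    SLOT_R3_HI, SLOT_R3_LO,
    SLOT_TC_HI, SLOT_TC_LO,   /* tail-call counter */
    SLOT_AX_HI, SLOT_AX_LO,   /* BPF AX register */
    SLOT_COUNT                /* must stay last */
};

/* Scratch size is derived from the enum instead of a hand-written constant. */
#define SCRATCH_SIZE (SLOT_COUNT * 4)

/* Slot k lives at FP - 4 - 4*k, where FP points just above the scratch area. */
#define STACK_OFFSET(k) (-4 - (int)(k) * 4)

/* A "register" value below zero is a stack slot, not an ARM register. */
bool is_stacked(int8_t reg)
{
    return reg < 0;
}
```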
[PATCH 00/14] ARM BPF jit compiler improvements
Hi,
This series improves the ARM BPF JIT compiler by:
- enumerating the stack layout rather than using constants that happen to be multiples of four
- rejigging the BPF "register" accesses to use negative numbers instead of positive ones, which could be confused with register numbers in the bpf2a32 array.
- since we maintain the ARM FP register as a pointer to the top of our scratch space (or, with frame pointers enabled, a valid ARM frame pointer register), accessing our scratch space using FP, which is constant across all BPF programs, including tail-called programs.
- using immediate forms of ARM instructions where possible, rather than first loading the immediate into an ARM register.
- using load-with-shift instructions rather than a separate shift instruction followed by a load
- avoiding reloading index and array in the tail-call code
- using double-word load/store instructions where available

Version 2:
- Fix ARMv5 test pointed out by Olof
- Fix build error found by 0-day (adding an additional patch)

arch/arm/net/bpf_jit_32.c | 982 --
arch/arm/net/bpf_jit_32.h |  42 +-
2 files changed, 543 insertions(+), 481 deletions(-)
[PATCH net-next 0/4] Further ARM BPF jit compiler improvements
Four further JIT compiler improvements for 32-bit ARM.
arch/arm/net/bpf_jit_32.c | 120 --
1 file changed, 73 insertions(+), 47 deletions(-)
Re: [PATCH 00/14] ARM BPF jit compiler improvements
On Thu, Jul 12, 2018 at 09:02:41PM +0200, Daniel Borkmann wrote: > Applied to bpf-next, thanks a lot Russell! Thanks, I've just sent four more patches, which is the sum total of what I'm intending to send for BPF improvements for the next merge window.
Re: [PATCH 00/14] ARM BPF jit compiler improvements
On Thu, Jul 12, 2018 at 11:12:45PM +0200, Daniel Borkmann wrote: > On 07/12/2018 11:02 PM, Russell King - ARM Linux wrote: > > On Thu, Jul 12, 2018 at 09:02:41PM +0200, Daniel Borkmann wrote: > >> Applied to bpf-next, thanks a lot Russell! > > > > Thanks, I've just sent four more patches, which is the sum total of > > what I'm intending to send for BPF improvements for the next merge > > window. > > Great, thanks a lot for the batch of improvements, Russell! > > Did you manage to get the BPF kselftest suite working on arm32 under > tools/testing/selftests/bpf/? In particular the test_verfier with > bpf_jit_enabled set to 1 and test_kmod.sh has a bigger number of > runtime tests that would stress it. I have a big issue with almost all of the tools/ subdirectory, and that is that it isn't "portable". It seems that cross-build environments just weren't considered when the tools subdirectory was created - it appears to require the entire kernel tree and build tree to be accessible on the target in order to build almost everything there. (I also exclusively do split-object builds, I never do an in-source-tree build.) At least perf has the ability to ask Kbuild to package it up as a tar.* file. That can be easily transported to the target as a self-contained buildable tree, and then be able to built from that. My cross-build environment for the kernel is just for building kernels, it does not have the facilities to build for userspace - I have a wide range of userspaces across targets, with a multitude of different glibc versions, and even when they're compatible versions, they're built differently. As far as I can see, basically, most tools/ stuff requires too much effort to work around this to be of any use to me. Even if I did unpick it from the kernel source tree by hand, that would be wasted effort, because I'd need to repeat that same process whenever anything there gets updated. 
Re: [PATCH net-next 02/10] net: phy: phylink: allow 10GKR interface to use in-band negotiation
On Fri, Mar 16, 2018 at 11:33:43AM +0100, Antoine Tenart wrote: > The PHY mode 10GKR can use in-band negotiation. This patches allows this > mode to be used with MLO_AN_INBAND in phylink. > > Signed-off-by: Antoine Tenart > --- > drivers/net/phy/phylink.c | 3 ++- > 1 file changed, 2 insertions(+), 1 deletion(-) > > diff --git a/drivers/net/phy/phylink.c b/drivers/net/phy/phylink.c > index 51a011a349fe..7224b005f0dd 100644 > --- a/drivers/net/phy/phylink.c > +++ b/drivers/net/phy/phylink.c > @@ -768,7 +768,8 @@ int phylink_of_phy_connect(struct phylink *pl, struct > device_node *dn, > /* Fixed links and 802.3z are handled without needing a PHY */ > if (pl->link_an_mode == MLO_AN_FIXED || > (pl->link_an_mode == MLO_AN_INBAND && > - phy_interface_mode_is_8023z(pl->link_interface))) > + (phy_interface_mode_is_8023z(pl->link_interface) || > + pl->link_interface == PHY_INTERFACE_MODE_10GKR))) There is no inband negotiation like there is with 802.3z or SGMII, so this makes no sense.
Re: [PATCH net-next 03/10] net: mvpp2: phylink support
On Fri, Mar 16, 2018 at 11:33:44AM +0100, Antoine Tenart wrote: > +static void mvpp2_phylink_validate(struct net_device *dev, > +unsigned long *supported, > +struct phylink_link_state *state) > +{ > + __ETHTOOL_DECLARE_LINK_MODE_MASK(mask) = { 0, }; > + > + phylink_set(mask, Autoneg); > + phylink_set_port_modes(mask); > + phylink_set(mask, Pause); > + phylink_set(mask, Asym_Pause); > + > + phylink_set(mask, 10baseT_Half); > + phylink_set(mask, 10baseT_Full); > + phylink_set(mask, 100baseT_Half); > + phylink_set(mask, 100baseT_Full); > + phylink_set(mask, 1000baseT_Full); > + phylink_set(mask, 1000baseX_Full); AFAICS, the driver (before these patches) does not support 1000baseX as it always clears the MVPP2_GMAC_PORT_TYPE_MASK bit, so adding this mode should be part of the patch adding 1000baseX support.
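The quoted validate callback builds a mask of link modes the MAC can handle, and the caller's supported set is then restricted to it. Below is a simplified user-space model of that pattern using plain bit flags. The bit names are hypothetical stand-ins for ethtool link-mode bits, and `mac_has_1000basex` models gating 1000base-X on the patch that actually adds MAC support for it.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical link-mode bits standing in for ethtool link-mode bits. */
enum {
    LM_AUTONEG    = 1u << 0,
    LM_PAUSE      = 1u << 1,
    LM_ASYM_PAUSE = 1u << 2,
    LM_10T_HALF   = 1u << 3,
    LM_10T_FULL   = 1u << 4,
    LM_100T_HALF  = 1u << 5,
    LM_100T_FULL  = 1u << 6,
    LM_1000T_FULL = 1u << 7,
    LM_1000X_FULL = 1u << 8,
};

/* Build the set of modes the MAC really handles, then clear everything
 * else from the caller's supported mask. */
uint32_t mac_validate(uint32_t supported, bool mac_has_1000basex)
{
    uint32_t mask = LM_AUTONEG | LM_PAUSE | LM_ASYM_PAUSE |
                    LM_10T_HALF | LM_10T_FULL |
                    LM_100T_HALF | LM_100T_FULL | LM_1000T_FULL;

    /* Only offer 1000base-X once the MAC code actually supports it. */
    if (mac_has_1000basex)
        mask |= LM_1000X_FULL;

    return supported & mask;
}
```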
Re: [PATCH net-next 02/10] net: phy: phylink: allow 10GKR interface to use in-band negotiation
On Mon, Mar 19, 2018 at 09:52:52AM +0100, Antoine Tenart wrote: > Hi Russell, > > On Fri, Mar 16, 2018 at 03:53:07PM +, Russell King - ARM Linux wrote: > > On Fri, Mar 16, 2018 at 11:33:43AM +0100, Antoine Tenart wrote: > > > The PHY mode 10GKR can use in-band negotiation. This patches allows this > > > mode to be used with MLO_AN_INBAND in phylink. > > > > > > Signed-off-by: Antoine Tenart > > > --- > > > drivers/net/phy/phylink.c | 3 ++- > > > 1 file changed, 2 insertions(+), 1 deletion(-) > > > > > > diff --git a/drivers/net/phy/phylink.c b/drivers/net/phy/phylink.c > > > index 51a011a349fe..7224b005f0dd 100644 > > > --- a/drivers/net/phy/phylink.c > > > +++ b/drivers/net/phy/phylink.c > > > @@ -768,7 +768,8 @@ int phylink_of_phy_connect(struct phylink *pl, struct > > > device_node *dn, > > > /* Fixed links and 802.3z are handled without needing a PHY */ > > > if (pl->link_an_mode == MLO_AN_FIXED || > > > (pl->link_an_mode == MLO_AN_INBAND && > > > - phy_interface_mode_is_8023z(pl->link_interface))) > > > + (phy_interface_mode_is_8023z(pl->link_interface) || > > > + pl->link_interface == PHY_INTERFACE_MODE_10GKR))) > > > > There is no inband negotiation like there is with 802.3z or SGMII, > > so this makes no sense. > > Oh, that's what I feared. I read some docs but probably will need more > :) > > Anyway, the reason to use in-band negotiation was also to avoid using > fixed-link. It would work but always report the link is up, which for > the user isn't a great experience as we have a way to detect this. > > What would you suggest to achieve this in a reasonable way? The intention of this test in phylink_of_phy_connect() is to avoid failing when there is no requirement for a PHY to be present (such as a fixed link, or an 802.3z link.) 
However, with 10G PHYs such as the 3310, we need the PHY so we can read the speed from it, and so know whether to downgrade the MAC to SGMII mode, or having downgraded the MAC, upgrade it back to 10G mode when the PHY switches to 10G. I'm guessing that you're wanting this for the DB boards, but I don't see why. Do they not have PHYs?
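The two rules in this reply can be sketched as pure functions. Everything below is hypothetical and simplified: the enums only mirror the names used in the thread. A PHY may be absent only for a fixed link or in-band 802.3z negotiation. With a 10G PHY present, its resolved speed chooses the MAC's serdes mode. The speed mapping assumes, as in the reply, that the PHY reports either 10G or a 10/100/1000 rate; other rates are not modelled.

```c
#include <stdbool.h>

/* Hypothetical mirrors of phylink negotiation modes and interface modes. */
enum an_mode { AN_PHY, AN_FIXED, AN_INBAND };
enum iface { IF_SGMII, IF_1000BASEX, IF_2500BASEX, IF_10GKR };

static bool iface_is_8023z(enum iface i)
{
    return i == IF_1000BASEX || i == IF_2500BASEX;
}

/* A PHY may be omitted only when nothing needs to be read from one. */
bool phy_optional(enum an_mode mode, enum iface i)
{
    return mode == AN_FIXED || (mode == AN_INBAND && iface_is_8023z(i));
}

/* 10G on the line side keeps the MAC in 10G mode; slower rates drop it
 * to SGMII, and it returns to 10G when the PHY does. */
enum iface mac_iface_for_phy_speed(int speed_mbps)
{
    return speed_mbps == 10000 ? IF_10GKR : IF_SGMII;
}
```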
Re: [EXT] Re: [PATCH net-next 02/10] net: phy: phylink: allow 10GKR interface to use in-band negotiation
On Mon, Mar 19, 2018 at 01:01:07PM +, Yan Markman wrote: > The DTS-patch for this board (in "old" format) is attached > > > Yan Markman > Tel. 05-44732819 > > > -Original Message- > From: Stefan Chulski > Sent: Monday, March 19, 2018 2:58 PM > To: Russell King - ARM Linux ; Antoine Tenart > > Cc: da...@davemloft.net; kis...@ti.com; gregory.clem...@bootlin.com; > and...@lunn.ch; ja...@lakedaemon.net; sebastian.hesselba...@gmail.com; > netdev@vger.kernel.org; linux-ker...@vger.kernel.org; > thomas.petazz...@bootlin.com; maxime.chevall...@bootlin.com; > miquel.ray...@bootlin.com; Nadav Haklai ; Yan Markman > ; m...@semihalf.com; > linux-arm-ker...@lists.infradead.org > Subject: RE: [EXT] Re: [PATCH net-next 02/10] net: phy: phylink: allow 10GKR > interface to use in-band negotiation > > > > > There is no inband negotiation like there is with 802.3z or SGMII, > > > > so this makes no sense. > > > > > > Oh, that's what I feared. I read some docs but probably will need > > > more > > > :) > > > > > > Anyway, the reason to use in-band negotiation was also to avoid > > > using fixed-link. It would work but always report the link is up, > > > which for the user isn't a great experience as we have a way to detect > > > this. > > > > > > What would you suggest to achieve this in a reasonable way? > > > > The intention of this test in phylink_of_phy_connect() is to avoid > > failing when there is no requirement for a PHY to be present (such as > > a fixed link, or an 802.3z link.) However, with 10G PHYs such as the > > 3310, we need the PHY so we can read the speed from it, and so know > > whether to downgrade the MAC to SGMII mode, or having downgraded the > > MAC, upgrade it back to 10G mode when the PHY switches to 10G. > > > > I'm guessing that you're wanting this for the DB boards, but I don't see > > why. > > Do they not have PHYs? > > New Solid Run board MACCHIATObin Single Shot doesn't has 3310 PHY either, > like DB boards. 
> https://www.cnx-software.com/2017/12/20/solidrun-macchiatobin-single-shot-networking-board-launched-for-269-and-up/ Correct, but this DTS is wrong. It connects to an SFP cage, and as SFP cages are supported in mainline now, there's no need to mess around with fixed links or similar. I haven't tested phylink in that configuration yet as SolidRun haven't sent me a SingleShot board yet - and I need any board I do get to have the pull-up resistors on the I2C lines of the correct value, because I'm not risking corruption of the EEPROMs in my SFP* modules.
Re: [PATCH net-next 02/10] net: phy: phylink: allow 10GKR interface to use in-band negotiation
On Mon, Mar 19, 2018 at 02:10:09PM +0100, Antoine Tenart wrote: > Hi Andrew, > > On Mon, Mar 19, 2018 at 01:59:53PM +0100, Andrew Lunn wrote: > > > > If they don't have PHYs, how are the connected to the outside world? > > On 7k/8k you have the following scheme for 10G only interfaces: > >MAC -- Comphy -- PHY -- SFP cage -- ... > > Or > >MAC -- Comphy -- SFP cage -- ... > > The comphy provides serdes lanes, and can be configured in various > modes (SGMII, 2500SGMII, 10GKR...). Right - the correct mode is dependent on the SFP module plugged into the cage. Trying to describe this by ignoring the SFP cage isn't going to work out well for end-user functionality, though it is fine if you're just hacking a configuration to test (which would not be suitable for mainline kernels!) As I've recently replied to Yan, this is a configuration I haven't tested yet, and it's entirely possible that phylink may need some tweaks for it. What you have is a very similar setup to what is on Clearfog with its SFP cage, where the SFP cage is connected directly to the Armada 388. That only has to deal with 2500base-X / 1000base-X / SGMII and not 10G. What I want is to avoid hacks as much as possible here - if there is a shortcoming with SFP/phylink here, we need to address that properly.
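"The correct mode is dependent on the SFP module" can be illustrated with a toy selection function. This sketch is hypothetical throughout. The capability struct stands in for what would be decoded from the module's EEPROM, and the preference order (fastest first, SGMII for copper modules with an embedded PHY) is an illustrative assumption, not phylink's or the SFP code's actual policy. The 10G mode is named 10GKR only to match the thread's usage.

```c
#include <stdbool.h>

/* Hypothetical summary of what an SFP module reports it supports. */
struct sfp_caps {
    bool has_phy;        /* copper module with an embedded PHY */
    bool sup_10g;        /* 10G-capable SFP+ module */
    bool sup_2500basex;
    bool sup_1000basex;
};

enum serdes_mode { SERDES_NONE, SERDES_SGMII, SERDES_1000BASEX,
                   SERDES_2500BASEX, SERDES_10GKR };

/* Pick the comphy/MAC mode for the plugged-in module (assumed ordering). */
enum serdes_mode serdes_mode_for_module(const struct sfp_caps *c)
{
    if (c->sup_10g)
        return SERDES_10GKR;
    if (c->sup_2500basex)
        return SERDES_2500BASEX;
    if (c->has_phy)
        return SERDES_SGMII;
    if (c->sup_1000basex)
        return SERDES_1000BASEX;
    return SERDES_NONE;
}
```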