
arp-scan triggers via-velocity "eth0: excessive work at interrupt"

2007-06-12 Thread linux

It kind of surprised me that sending 254 arp packets by using the arp-scan
tool (http://www.nta-monitor.com/tools/arp-scan/) on a /24 consistently
triggers a burst of "eth0: excessive work at interrupt."

This is a 600 MHz PIII, 2.6.22-rc4, via-velocity driver.

model name  : Pentium III (Katmai)
stepping: 3
cpu MHz : 601.406
cache size  : 512 KB

00:09.0 Ethernet controller [0200]: VIA Technologies, Inc. VT6120/VT6121/VT6122 Gigabit Ethernet Adapter [1106:3119] (rev 11)

Just double-checking... the program actually sent 463 packets (256 +
a retry to all those that didn't respond to the first one), and triggers
11 copies of the kernel message.

Command line: arp-scan -I eth0 -l [-v]
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: IC Plus Corp IC Plus IP1000

2007-06-12 Thread linux

[EMAIL PROTECTED] wrote:
> I wonder if it at some time will be included in the standard Linux kernel?
> I am of course interested because my main board has it built in, so I 
> would be willing to test it.

"Me, too!"

This has been discussed sporadically for the last year, and I can confirm
that the driver source from the manufacturer's web page is starting
to suffer bit rot, but after patching the more egregious breakage
(references to , UTS_RELEASE and pci_module_init()
stop it from compiling), it works.

It doesn't even spew "eth0: excessive work at interrupt" when running
arp-scan, unlike certain in-tree drivers. :-)

I got a bit of a rude shock today after doing an emergency replacement
on a socket 939 motherboard and blandly assuring a Windows-experienced
co-worker that despite a change from nForce to VIA KT890 chipset, the
system should "just work".

One round of floppy shuffle and code-fixing later, my co-worker is
not impressed by the Linux version of "Have driver disk".  :-)

Is anyone able to push it to completion?  I have a vague idea that the
vendor lost interest.  (Should I write to Greg K-H and tell him
"Free Linux Driver Developed!"?)

I can play testing guinea-pig if needed.

Thanks!

Re: IC Plus Corp IC Plus IP1000

2007-06-13 Thread linux

> Use the 'sundance' driver that's been in the kernel for quite a while.

Er... that driver specifically does not list the IP1000's PCI device ID
(13f0:1023), nor does it support anything over 100 Mbit/s.

Are you *quite* sure that adding 13f0:1023 to the sundance_pci_tbl is
all that's required?

Re: IC Plus Corp IC Plus IP1000

2007-06-13 Thread linux

The following hacks to bring it up to date got the vendor-supplied
driver working for me.  This is just fixing the things the compiler
complained about; there may be other issues, but they don't seem to
interfere with basic functionality.


diff --git a/Makefile b/Makefile
index c91b384..31e4172 100644
--- a/Makefile
+++ b/Makefile
@@ -77,10 +77,10 @@ ifeq ($(kernelFlag26),kernel26x)
 EXTRA_CFLAGS+=$(MAPPING_MODE)
 
 all:
-   $(MAKE) -C $(KernelBuildDir) SUBDIRS=$(PWD) modules 
+   $(MAKE) -C $(KernelBuildDir) M=$(PWD)

 install:
-   install -m 644 -c ipg.$(kernelExtension) $(kernelMisc)
+   $(MAKE) -C $(KernelBuildDir) M=$(PWD) modules_install
 
 ipg-objs:=$(OBJS)
 obj-m+=$(TARGET)
diff --git a/ipg.h b/ipg.h
index 2d184d4..cefe5c8 100644
--- a/ipg.h
+++ b/ipg.h
@@ -98,8 +98,8 @@
  */
 
 
-#include 
 #include 
+#include 
 #include 
 
 #if ((LINUX_VERSION_CODE < KERNEL_VERSION(2,3,0)) &&  defined(MODVERSIONS))
diff --git a/ipg_main.c b/ipg_main.c
index c39ff4a..3a0dfd4 100644
--- a/ipg_main.c
+++ b/ipg_main.c
@@ -172,9 +172,11 @@ int ipg_io_config(IPG_DEVICE_TYPE *ipg_ethernet_device);
 #if LINUX_VERSION_CODE < KERNEL_VERSION(2,5,0)
 void ipg_interrupt_handler(int ipg_irq, void *device_instance,
 struct pt_regs *regs);
-#else
+#elif LINUX_VERSION_CODE < KERNEL_VERSION(2,6,19)
 static  irqreturn_t  ipg_interrupt_handler(int ipg_irq, void *device_instance,
 struct pt_regs *regs);
+#else
+static  irqreturn_t  ipg_interrupt_handler(int ipg_irq, void *device_instance);
 #endif
 
 void ipg_nic_txcleanup(IPG_DEVICE_TYPE *ipg_ethernet_device);
@@ -1425,9 +1427,11 @@ int  ipg_io_config(IPG_DEVICE_TYPE *ipg_ethernet_device)
 #if LINUX_VERSION_CODE < KERNEL_VERSION(2,5,0)
 void ipg_interrupt_handler(int ipg_irq, void *device_instance,
 struct pt_regs *regs)
-#else
+#elif LINUX_VERSION_CODE < KERNEL_VERSION(2,6,19)
 static  irqreturn_t  ipg_interrupt_handler(int ipg_irq, void *device_instance,
 struct pt_regs *regs)
+#else
+static  irqreturn_t  ipg_interrupt_handler(int ipg_irq, void *device_instance)
 #endif
 {
int error;
@@ -1957,7 +1961,7 @@ int   ipg_nic_open(IPG_DEVICE_TYPE *ipg_ethernet_device)
 */
if ((error = request_irq(sp->ipg_pci_device->irq,
 &ipg_interrupt_handler,
-SA_SHIRQ,
+IRQF_SHARED,
 ipg_ethernet_device->name,
 ipg_ethernet_device)) < 0)
{
@@ -4041,7 +4045,10 @@ int  init_module(void)
 #endif
 
IPG_DEBUG_MSG("init_module\n");
-#if LINUX_VERSION_CODE > KERNEL_VERSION(2,5,0) 
+
+#if LINUX_VERSION_CODE >= KERNEL_VERSION(2,6,12)   
+   return pci_register_driver(&ipg_pci_driver);
+#elif LINUX_VERSION_CODE > KERNEL_VERSION(2,5,0)   
return pci_module_init(&ipg_pci_driver);
 #else


Re: [PATCH] ipg: add IP1000A driver to kernel tree

2007-09-26 Thread linux

(Resend to netdev; already sent to relevant individuals.)

Here's a possible fix for the p[] array issues akpm noticed.
This replaces them with calls to a new mdio_write_bits function.
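
For anyone who wants to see the bit order without hardware, here is a minimal user-space sketch of the same MSB-first loop.  It is not the driver code: sketch_drive() just records each value that would be written to PhyCtrl, and the SKETCH_MGMTDIR value is an assumption for illustration (only "MgmtData is bit 1" comes from the patch comment).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MGMTDATA 0x02u  /* MgmtData is bit 1, per the patch comment */
#define SKETCH_MGMTDIR  0x04u  /* assumed bit position, illustration only */

/* Stand-in for the register: log every value "driven" onto the bus. */
static uint8_t sketch_log[64];
static size_t sketch_log_len;

static void sketch_drive(uint8_t value)
{
	if (sketch_log_len < sizeof(sketch_log))
		sketch_log[sketch_log_len++] = value;
}

/* Send the low "len" bits of "data", most significant bit first. */
static void sketch_write_bits(uint8_t otherbits, uint32_t data, unsigned len)
{
	assert(len >= 1 && len <= 32);
	otherbits |= SKETCH_MGMTDIR;
	do {
		/* Pick bit (len - 1), then move it to the MgmtData position. */
		uint8_t d = ((data >> --len) & 1u) * SKETCH_MGMTDATA;
		sketch_drive((uint8_t)(d | otherbits));
	} while (len);
}
```

Calling sketch_write_bits(0, 0x5, 4) logs four values whose MgmtData bit reads 0, 1, 0, 1 in that order, i.e. the field goes out left to right.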

Boot-tested, passes net traffic, and mii-tool and mii-diag produce sensible
output (including noticing link status changes).

Also, regarding
>> +for (i = 0; i < IPG_TFDLIST_LENGTH; i++) {
>> +offset = (u32) &sp->txd[i].next_desc - (u32) sp->txd;
>> +printk(KERN_INFO "%2.2x %4.4x TFDNextPtr = %16.16lx\n", i,
>> +   offset, (unsigned long) sp->txd[i].next_desc);
>> +
>> +offset = (u32) &sp->txd[i].tfc - (u32) sp->txd;
>
> Is the u32 cast safe here on all architectures?

IPG_TFDLIST_LENGTH is 256, and sp->txd is an array of struct ipg_tx,
which are 24 bytes each, so the most it can be is 6K.  The result fits
into 32 bits, so the inputs can be safely truncated.

A more awkward way to write it would be
offset = i * sizeof(struct ipg_tx) + offsetof(struct ipg_tx, tfc);
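
To make the arithmetic concrete, here is a small user-space sketch.  The struct layout is hypothetical (three 64-bit fields, chosen only so that it comes to the 24 bytes mentioned above); the point is that both cast-free forms agree and stay well under 6K.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_TFDLIST_LENGTH 256  /* value quoted in the message above */

/* Hypothetical 24-byte descriptor; only the field names come from the thread. */
struct sketch_tx {
	uint64_t next_desc;
	uint64_t tfc;
	uint64_t frag_info;
};

/* Offset of txd[i].tfc from the start of the array, using offsetof. */
static size_t sketch_tfc_offset(size_t i)
{
	assert(i < SKETCH_TFDLIST_LENGTH);
	return i * sizeof(struct sketch_tx) + offsetof(struct sketch_tx, tfc);
}

/* The same value by byte-pointer subtraction, with no integer casts. */
static size_t sketch_tfc_offset_ptr(const struct sketch_tx *txd, size_t i)
{
	return (size_t)((const char *)&txd[i].tfc - (const char *)txd);
}
```

With this layout the largest offset is 255 * 24 + 8 = 6128 bytes, so truncating either operand to 32 bits cannot change the difference.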


This patch is placed in the public domain; copyright abandoned.

(The final hunk is a space-TAB whitespace repair that git complained about
when I imported the patch.)

diff --git a/drivers/net/ipg.c b/drivers/net/ipg.c
index 87a674c..6267a34 100644
--- a/drivers/net/ipg.c
+++ b/drivers/net/ipg.c
@@ -180,12 +180,31 @@ static u16 read_phy_bit(void __iomem * ioaddr, u8 phyctrlpolarity)
 }
 
 /*
+ * Transmit the given bits, MSB-first, through the MgmtData bit (bit 1)
+ * of the PhyCtrl register. 1 <= len <= 32.  "ioaddr" is the register
+ * address, and "otherbits" are the values of the other bits.
+ */
+static void mdio_write_bits(void __iomem *ioaddr, u8 otherbits, u32 data, unsigned len)
+{
+   otherbits |= IPG_PC_MGMTDIR;
+   do {
+   /* IPG_PC_MGMTDATA is a power of 2; compiler knows to shift */
+   u8 d = ((data >> --len) & 1) * IPG_PC_MGMTDATA;
+   /* + rather than | lets compiler microoptimize better */
+   ipg_drive_phy_ctl_low_high(ioaddr, d + otherbits);
+   } while (len);
+}
+
+/*
  * Read a register from the Physical Layer device located
  * on the IPG NIC, using the IPG PHYCTRL register.
  */
 static int mdio_read(struct net_device * dev, int phy_id, int phy_reg)
 {
void __iomem *ioaddr = ipg_ioaddr(dev);
+   u8 const polarity = ipg_r8(PHY_CTRL) &
+   (IPG_PC_DUPLEX_POLARITY | IPG_PC_LINK_POLARITY);
+   unsigned i, data = 0;
/*
 * The GMII mangement frame structure for a read is as follows:
 *
@@ -199,75 +218,30 @@ static int mdio_read(struct net_device * dev, int phy_id, int phy_reg)
 * D = bit of read data (MSB first)
 *
 * Transmission order is 'Preamble' field first, bits transmitted
-* left to right (first to last).
+* left to right (msbit-first).
 */
-   struct {
-   u32 field;
-   unsigned int len;
-   } p[] = {
-   { GMII_PREAMBLE,32 },   /* Preamble */
-   { GMII_ST,  2  },   /* ST */
-   { GMII_READ,2  },   /* OP */
-   { phy_id,   5  },   /* PHYAD */
-   { phy_reg,  5  },   /* REGAD */
-   { 0x,   2  },   /* TA */
-   { 0x,   16 },   /* DATA */
-   { 0x,   1  }/* IDLE */
-   };
-   unsigned int i, j;
-   u8 polarity, data;
-
-   polarity  = ipg_r8(PHY_CTRL);
-   polarity &= (IPG_PC_DUPLEX_POLARITY | IPG_PC_LINK_POLARITY);
-
-   /* Create the Preamble, ST, OP, PHYAD, and REGAD field. */
-   for (j = 0; j < 5; j++) {
-   for (i = 0; i < p[j].len; i++) {
-   /* For each variable length field, the MSB must be
-* transmitted first. Rotate through the field bits,
-* starting with the MSB, and move each bit into the
-* the 1st (2^1) bit position (this is the bit position
-* corresponding to the MgmtData bit of the PhyCtrl
-* register for the IPG).
-*
-* Example: ST = 01;
-*
-*  First write a '0' to bit 1 of the PhyCtrl
-*  register, then write a '1' to bit 1 of the
-*  PhyCtrl register.
-*
-* To do this, right shift the MSB of ST by the value:
-* [field length - 1 - #ST bits already written]
-* then left shift this result by 1.
-*/
-   data  = (p[j].field >> (p[j].len - 1 - i)) << 1;
-   data &= IPG_PC_MGMTDATA;
-   data |= polarity | IPG_PC_MGMTDIR;
-
-

2.6.23-rc8 network problem. Mem leak? ip1000a?

2007-09-27 Thread linux

Uniprocessor Athlon 64, 64-bit kernel, 2G ECC RAM,
2.6.23-rc8 + linuxpps (5.0.0) + ip1000a driver.
(patch from http://marc.info/?l=linux-netdev&m=118980588419882)

After a few hours of operation, ntp loses the ability to send packets.
sendto() returns -EAGAIN to everything, including the 24-byte UDP packet
that is a response to ntpq.

-EAGAIN on a sendto() makes me think of memory problems, so here's
meminfo at the time:

### FAILED state ###
# cat /proc/meminfo 
MemTotal:  2059384 kB
MemFree: 15332 kB
Buffers:665608 kB
Cached:  18212 kB
SwapCached:  0 kB
Active: 380384 kB
Inactive:   355020 kB
SwapTotal: 5855208 kB
SwapFree:  5854552 kB
Dirty:   28504 kB
Writeback:   0 kB
AnonPages:   51608 kB
Mapped:  11852 kB
Slab:  1285348 kB
SReclaimable:   152968 kB
SUnreclaim:1132380 kB
PageTables:   3888 kB
NFS_Unstable:0 kB
Bounce:  0 kB
CommitLimit:   6884900 kB
Committed_AS:   590528 kB
VmallocTotal: 34359738367 kB
VmallocUsed:265628 kB
VmallocChunk: 34359472059 kB


Killing and restarting ntpd gets it running again for a few hours.
Here's after about two hours of successful operation.  (I'll try to
remember to run slabinfo before killing ntpd next time.)

### WORKING state ###
# cat /proc/meminfo
MemTotal:  2059384 kB
MemFree: 20252 kB
Buffers:242688 kB
Cached:  41556 kB
SwapCached:200 kB
Active: 285012 kB
Inactive:   147348 kB
SwapTotal: 5855208 kB
SwapFree:  5854212 kB
Dirty:  36 kB
Writeback:   0 kB
AnonPages:  148052 kB
Mapped:  12756 kB
Slab:  1582512 kB
SReclaimable:   134348 kB
SUnreclaim:1448164 kB
PageTables:   4500 kB
NFS_Unstable:0 kB
Bounce:  0 kB
CommitLimit:   6884900 kB
Committed_AS:   689956 kB
VmallocTotal: 34359738367 kB
VmallocUsed:265628 kB
VmallocChunk: 34359472059 kB
# /usr/src/linux/Documentation/vm/slabinfo
Name                   Objects Objsize    Space Slabs/Part/Cpu  O/S O %Fr %Ef Flg
:016                      1478      16    24.5K          6/3/1  256 0  50  96 *
:024                       170      24     4.0K          1/0/1  170 0   0  99 *
:032                      1339      32    45.0K         11/2/1  128 0  18  95 *
:040                       102      40     4.0K          1/0/1  102 0   0  99 *
:064                      5937      64   413.6K       101/15/1   64 0  14  91 *
:072                        56      72     4.0K          1/0/1   56 0   0  98 *
:088                      6946      88   618.4K        151/0/1   46 0   0  98 *
:096                     23851      96     2.5M      616/144/1   42 0  23  90 *
:128                       730     128   114.6K         28/6/1   32 0  21  81 *
:136                       232     136    36.8K          9/6/1   30 0  66  85 *
:192                       474     192    98.3K         24/4/1   21 0  16  92 *
:256                   1385376     256   354.6M      86587/0/1   16 0   0  99 *
:320                        12     304     4.0K          1/0/1   12 0   0  89 *A
:384                       359     384   180.2K        44/23/1   10 0  52  76 *A
:512                   1384316     512   708.7M     173040/1/1    8 0   0  99 *
:640                        72     616    53.2K         13/5/1    6 0  38  83 *A
:704                      1870     696     1.3M        170/0/1   11 1   0  93 *A
:0001024                   427    1024   454.6K        111/9/1    4 0   8  96 *
:0001472                   150    1472   245.7K         30/0/1    5 1   0  89 *
:0002048                158991    2048   325.7M     39759/25/1    4 1   0  99 *
:0004096                    51    4096   245.7K         30/9/1    2 1  30  85 *
Acpi-State                  51      80     4.0K          1/0/1   51 0   0  99
anon_vma                  1032      16    28.6K          7/5/1  170 0  71  57
bdev_cache                  43     720    36.8K          9/1/1    5 0  11  83 Aa
blkdev_requests             42     288    12.2K          3/0/1   14 0   0  98
buffer_head              59173     104    11.1M    2734/1690/1   39 0  61  54 a
cfq_io_context             223     152    40.9K         10/6/1   26 0  60  82
dentry                   98641     192    19.7M     4813/274/1   21 0   5  96 a
ext3_inode_cache        115690     688    86.3M     10545/77/1   11 1   0  92 a
file_lock_cache             23     168     4.0K          1/0/1   23 0   0  94
idr_layer_cache            118     528    69.6K         17/1/1    7 0   5  89
inode_cache               1365     528   798.7K        195/0/1    7 0   0  90 a
kmalloc-131072               1  131072   131.0K          1/0/1    1 5   0 100
kmalloc-16384                8   16384   131.0K          8/0/1    1 2   0 100
kmalloc-32768                1   32768    32.7K          1/0/1    1 3   0 100
kmalloc-8                 1535       8    12.2K          3/1/1  512 0  33  99
kmalloc-819

Re: 2.6.23-rc8 network problem. Mem leak? ip1000a?

2007-09-30 Thread linux

> ntpd.  Sounds like pps leaking to me.

That's what I'd think, except that pps does no allocation in the normal
running state, so there's nothing to leak.  The interrupt path just
records the time in some preallocated, static buffers and wakes up
blocked readers.  The read path copies the latest data out of those
static buffers.  There's allocation when the PPS device is created,
and more when it's opened.

>> Can anyone offer some diagnosis advice?

> CONFIG_DEBUG_SLAB_LEAK?

Ah, thank you; I've been using SLUB, which doesn't support this option.
Here's what I've extracted.  I've only presented the top few
slab_allocators and a small subset of the oom-killer messages, but I
have full copies if desired.  Unfortunately, I've discovered that the
machine doesn't live in this unhappy state forever.  Indeed, I'm not
sure if killing ntpd "fixes" anything; my previous observations
may have been optimistic ignorance.

(For my own personal reference looking for more oom-kill, I nuked ntpd
at 06:46:56.  And the oom-kills are continuing, with the latest at
07:43:52.)

Anyway, I have a bunch of information from the slab_allocators file, but
I'm not quite sure how to make sense of it.


With a machine in the unhappy state and firing the OOM killer, the top
20 slab_allocators are:
$ sort -rnk2 /proc/slab_allocators | head -20
skbuff_head_cache: 1712746 __alloc_skb+0x31/0x121
size-512: 1706572 tcp_send_ack+0x23/0x102
skbuff_fclone_cache: 149113 __alloc_skb+0x31/0x121
size-2048: 148500 tcp_sendmsg+0x1b5/0xae1
sysfs_dir_cache: 5289 sysfs_new_dirent+0x4b/0xec
size-512: 2613 sock_alloc_send_skb+0x93/0x1dd
Acpi-Operand: 2014 acpi_ut_allocate_object_desc_dbg+0x34/0x6e
size-32: 1995 sysfs_new_dirent+0x29/0xec
vm_area_struct: 1679 mmap_region+0x18f/0x421
size-512: 1618 tcp_xmit_probe_skb+0x1f/0xcd
size-512: 1571 arp_create+0x4e/0x1cd
vm_area_struct: 1544 copy_process+0x9f1/0x1108
anon_vma: 1448 anon_vma_prepare+0x29/0x74
filp: 1201 get_empty_filp+0x44/0xcd
UDP: 1173 sk_alloc+0x25/0xaf
size-128: 1048 r1bio_pool_alloc+0x23/0x3b
size-128: 1024 nfsd_cache_init+0x2d/0xcf
Acpi-Namespace: 973 acpi_ns_create_node+0x2c/0x45
vm_area_struct: 717 split_vma+0x33/0xe5
dentry: 594 d_alloc+0x24/0x177

I'm not sure quite what "normal" numbers are, but I do wonder why there
are 1.7 million TCP acks buffered in the system.  Shouldn't they be
transmitted and deallocated pretty quickly?
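
If anyone wants to total these by call site rather than eyeball them, here is a rough user-space sketch.  It assumes each line has the "cache: count caller+offset" shape shown above; the struct and function names are made up for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* One parsed line of the listing above (names are hypothetical). */
struct sketch_slab_line {
	char cache[64];
	unsigned long count;
	char caller[128];
};

/* Returns 1 if the line parsed as "<cache>: <count> <caller>", else 0. */
static int sketch_parse_slab_line(const char *line, struct sketch_slab_line *out)
{
	assert(line && out);
	return sscanf(line, "%63[^:]: %lu %127s",
		      out->cache, &out->count, out->caller) == 3;
}

/* Sum the counts of every line whose caller starts with "prefix". */
static unsigned long sketch_sum_for_caller(const char **lines, size_t n,
					   const char *prefix)
{
	unsigned long total = 0;
	struct sketch_slab_line rec;
	size_t i;

	for (i = 0; i < n; i++)
		if (sketch_parse_slab_line(lines[i], &rec) &&
		    strncmp(rec.caller, prefix, strlen(prefix)) == 0)
			total += rec.count;
	return total;
}
```

Summing the two __alloc_skb lines from the listing gives 1712746 + 149113 = 1861859 skb heads outstanding.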

This machine receives more data than it sends, so I'd expect acks to
outnumber "real" packets.  Could the ip1000a driver's transmit path be
leaking skbs somehow?  That would also explain the "flailing" of the
oom-killer; it can't associate the allocations with a process.

Here's /proc/meminfo:
MemTotal:  1035756 kB
MemFree: 43508 kB
Buffers: 72920 kB
Cached: 224056 kB
SwapCached: 344916 kB
Active: 664976 kB
Inactive:   267656 kB
SwapTotal: 4950368 kB
SwapFree:  3729384 kB
Dirty:6460 kB
Writeback:   0 kB
AnonPages:  491708 kB
Mapped:  79232 kB
Slab:41324 kB
SReclaimable:25008 kB
SUnreclaim:  16316 kB
PageTables:   8132 kB
NFS_Unstable:0 kB
Bounce:  0 kB
CommitLimit:   5468244 kB
Committed_AS:  1946008 kB
VmallocTotal:   253900 kB
VmallocUsed:  2672 kB
VmallocChunk:   251228 kB

I have a lot of oom-killer messages that I have saved but am not
posting for size reasons; here are some example backtraces.
They're not very helpful to me; do they enlighten anyone else?

02:50:20: apcupsd invoked oom-killer: gfp_mask=0xd0, order=1, oomkilladj=0
02:50:22: 
02:50:22: Call Trace:
02:50:22:  [] out_of_memory+0x71/0x1ba
02:50:22:  [] __alloc_pages+0x255/0x2d7
02:50:22:  [] cache_alloc_refill+0x2f4/0x60a
02:50:22:  [] hiddev_ioctl+0x579/0x919
02:50:22:  [] kmem_cache_alloc+0x57/0x95
02:50:22:  [] hiddev_ioctl+0x579/0x919
02:50:22:  [] cp_new_stat+0xe5/0xfd
02:50:22:  [] hiddev_read+0x199/0x1f6
02:50:22:  [] default_wake_function+0x0/0xe
02:50:22:  [] do_ioctl+0x45/0x50
02:50:22:  [] vfs_ioctl+0x1f9/0x20b
02:50:22:  [] sys_ioctl+0x3c/0x5d
02:50:22:  [] system_call+0x7e/0x83

02:52:18: postgres invoked oom-killer: gfp_mask=0xd0, order=1, oomkilladj=0
02:52:18: 
02:52:18: Call Trace:
02:52:18:  [] out_of_memory+0x71/0x1ba
02:52:18:  [] __alloc_pages+0x255/0x2d7
02:52:18:  [] poison_obj+0x26/0x2f
02:52:18:  [] __get_free_pages+0x40/0x79
02:52:18:  [] copy_process+0xb0/0x1108
02:52:18:  [] alloc_pid+0x1f/0x27d
02:52:18:  [] do_fork+0xb1/0x1a7
02:52:18:  [] copy_user_generic_string+0x17/0x40
02:52:18:  [] system_call+0x7e/0x83
02:52:18:  [] ptregscall_common+0x67/0xb0

02:52:18: kthreadd invoked oom-killer: gfp_mask=0xd0, order=1, oomkilladj=0
02:52:18: 
02:52:18: Call Trace:
02:52:18:  [] out_of_memory+0x71/0x1ba
02:52:18:  [] __alloc_pages+0x255/0x2d7
02:52:18:  [] __get_free_pages+0x40/0x79
02:52:18:  [] copy_process+0xb0/0x1108
02:52:18:  [] alloc_pid+0x1f/0x27d
02:52:18:  [] do_fork+0xb1/0x1a7
02:52:18:  [] update_curr+0xe6/0x10b
02:52:18:  [] de

Re: 2.6.23-rc8 network problem. Mem leak? ip1000a?

2007-09-30 Thread linux

> OK.  Did you try to reproduce it without the pps patch applied?

No.  But I've yanked the ip1000a driver (using old crufty vendor-supplied
out-of-kernel module) and the problems are GONE.

>> This machine receives more data than it sends, so I'd expect acks to
>> outnumber "real" packets.  Could the ip1000a driver's transmit path be
>> leaking skbs somehow?

> Absolutely.  Normally a driver's transmit completion interrupt handler will
> run dev_kfree_skb_irq() against the skbs which have been fully sent.
>
> However it'd be darned odd if the driver was leaking only tcp acks.

It's leaking lots of things... you can see ARP packets in there and
all sorts of stuff.  But the big traffic hog is BackupPC doing inbound
rsyncs all night long, which generates a lot of acks.  Those are the
packets it sends, so those are the packets that get leaked.

> I can find no occurrence of "dev_kfree_skb" in drivers/net/ipg.c, which is
> suspicious.

Look for "IPG_DEV_KFREE_SKB", which is a wrapper macro.  (Or just add
"-i" to your grep.)  It should probably be deleted (it just expands to
dev_kfree_skb), but was presumably useful to someone during development.

> Where did you get your ipg.c from, btw?  davem's tree?  rc8-mm1? rc8-mm2??

As I wrote originally, I got it from
http://marc.info/?l=linux-netdev&m=118980588419882
which was a request for mainline submission.

If there are other patches floating around, I'm happy to try them.
Now that I know what to look for, it's easy to spot the leak before OOM.

> I assume that meminfo was not captured when the system was ooming?  There
> isn't much slab there.

Oops, sorry.  I captured slabinfo but not meminfo.


Thank you very much!  Sorry to jump the gun and post a lot before I had
all the data, but if it WAS a problem in -rc8, I wanted to mention it
before -final.

Now, the rush is to get the ip1000a driver fixed before the merge
window opens.  I've added all the ip1000a developers to the Cc: list in
an attempt to speed that up.

The mess that is SIGPIPE

2007-08-19 Thread linux

I noticed that FreeBSD has a useful SOL_SOCKET option, SO_NOSIGPIPE, which
is a "sticky" version of MSG_NOSIGNAL.  Particularly useful for libraries,
it disables SIGPIPE on a particular socket without disabling it globally.
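
For reference, this is what the per-call form looks like from user space today.  A minimal sketch on Linux, using a socketpair whose other end has been closed; the function name is made up, and it refuses to run without MSG_NOSIGNAL because the default SIGPIPE action would kill the caller.

```c
#include <assert.h>
#include <errno.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Send one byte to a stream socket whose peer has already closed.
 * Returns the errno from send(), 0 if send() unexpectedly succeeded,
 * or -1 if the socket pair could not be created.
 */
static int sketch_send_to_closed_peer(int flags)
{
	int sv[2];
	int err = 0;

	/* Without this flag the process would get SIGPIPE and die. */
	assert(flags & MSG_NOSIGNAL);

	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
		return -1;
	close(sv[1]);			/* peer goes away */

	if (send(sv[0], "x", 1, flags) < 0)
		err = errno;		/* EPIPE is reported, no signal is raised */
	close(sv[0]);
	return err;
}
```

A library has to remember to pass the flag on every single send(); a sticky socket option would remove that burden.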

Anyway, this led me to look at the implementation of SIGPIPE and
MSG_NOSIGNAL and...  it's a bit of a mess.  Some places honor
MSG_NOSIGNAL, but there are a lot of code paths that don't appear to.

So I started thinking about a cleanup...

Currently, SIGPIPE is sent from dozens of places that return EPIPE.
What if they could all be deleted and just a few system calls: write(),
writev(), send(), sendto() and sendmsg() (oh, yes, and sendfile())
could check for EPIPE from the VFS calls they make and generate SIGPIPE
appropriately?
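
As a toy model of that idea (user-space C, not kernel code): the lower layers only ever return -EPIPE, and one wrapper at the system-call boundary decides whether a signal is due.  All names here are hypothetical, and the flag variable stands in for the kernel actually sending SIGPIPE.

```c
#include <assert.h>
#include <errno.h>

#define SKETCH_NOSIGNAL 0x1  /* stand-in for MSG_NOSIGNAL */

/* A lower layer: returns bytes written or a negative errno, never signals. */
typedef long (*sketch_lower_fn)(void);

static long sketch_lower_ok(void)     { return 5; }
static long sketch_lower_broken(void) { return -EPIPE; }

/*
 * The single place that turns EPIPE into a signal.  Setting
 * *sigpipe_raised stands in for delivering SIGPIPE to the caller.
 */
static long sketch_syscall_send(sketch_lower_fn lower, int flags,
				int *sigpipe_raised)
{
	long ret;

	assert(lower && sigpipe_raised);
	ret = lower();
	*sigpipe_raised = (ret == -EPIPE && !(flags & SKETCH_NOSIGNAL));
	return ret;
}
```

In-kernel callers would call the lower layer directly and so could never trigger the signal, whatever flags they pass.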

The only thing is figuring out where in the call chain to put it.

sys_send()
-> sys_sendto()
   -> sock_sendmsg()
      -> __sock_sendmsg()
         -> sock->ops->sendmsg

sys_write()
-> vfs_write()
   -> do_sync_write
      -> filp->f_op->aio_write
         -> sock_aio_write()
            -> do_sock_write()
               -> __sock_sendmsg()

sys_writev()
-> vfs_writev()
   -> do_readv_writev()
      -> do_loop_readv_writev()
         -> file->f_op->write
            -> 
               -> sock_aio_write()
                  -> do_sock_write()
                     -> __sock_sendmsg()

kernel_sendmsg() also calls sock_sendmsg(), and it would save a bunch
of fiddling with MSG_NOSIGNAL if kernel_sendmsg() never generated signals.

That implies that the check should be at the sys_sendto() layer or higher.


Anyway, looking into implementing this, I found a zillion places where
the logic looked a little unclear, such as OCFS2 code.  I'm not convinced
that a bug there can't generate SIGPIPE unexpectedly.


Anyway, before I tackle this rewrite, I'd like to ask if someone knows
what the code is *supposed* to be doing, and can confirm that SIGPIPE
should be generated if and only if the write is done by a user-level
system call that can return EPIPE.  So all the buried network file
systems should never generate it.  Is that right?

Thanks for the insights.

Re: 2.6.24-rc6 oops in net_tx_action

2008-01-06 Thread linux

> [EMAIL PROTECTED] <[EMAIL PROTECTED]> :
>> Kernel is 2.6.24-rc6 + linuxpps patches, which are all to the serial
>> port driver.
>> 
>> 2.6.23 was known stable.  I haven't tested earlier 2.6.24 releases.
>> I think it happened once before; I got a black-screen lockup with
>> keyboard LEDs blinking, but that was with X running so I couldn't see a
>> console oops.  But given that I installed 2.6.24-rc6 about 24 hours ago,
>> that's a disturbing pattern.

> It is probably this one:
>
> http://marc.info/?t=11978279403&r=1&w=2

Thanks!  I got the patch from
http://marc.info/?l=linux-netdev&m=119756785219214
(Which didn't make it into -rc7; please fix!)
and am recompiling now.

Actually, I grabbed the hardware mitigation followon patch while I was
at it.  I notice that the comment explaining the format of CSR11 and
what 0x80F1 means got lost; perhaps it would be nice to resurrect it?

0x80F1
  8000 = Cycle size (timer control)
  7800 = TX timer in 16 * Cycle size
  0700 = No. pkts before Int. (0 =  interrupt per packet)
  00F0 = Rx timer in Cycle size
  000E = No. pkts before Int.
  0001 = Continues mode (CM)
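
The same table as a decoder, in case it helps whoever restores the comment.  A user-space sketch using only the masks listed above; the struct and field names are my own.

```c
#include <assert.h>
#include <stdint.h>

/* Fields of the 16-bit mitigation value, named after the table above. */
struct sketch_csr11 {
	unsigned cycle_size;	/* mask 0x8000 */
	unsigned tx_timer;	/* mask 0x7800 */
	unsigned tx_pkts;	/* mask 0x0700, 0 = interrupt per packet */
	unsigned rx_timer;	/* mask 0x00F0 */
	unsigned rx_pkts;	/* mask 0x000E */
	unsigned continuous;	/* mask 0x0001 */
};

static struct sketch_csr11 sketch_decode_csr11(uint32_t v)
{
	struct sketch_csr11 f;

	assert(v <= 0xFFFFu);
	f.cycle_size = (v >> 15) & 0x1;
	f.tx_timer   = (v >> 11) & 0xF;
	f.tx_pkts    = (v >> 8)  & 0x7;
	f.rx_timer   = (v >> 4)  & 0xF;
	f.rx_pkts    = (v >> 1)  & 0x7;
	f.continuous = v & 0x1;
	return f;
}
```

Decoding 0x80F1 gives cycle_size = 1, rx_timer = 15 and continuous = 1, with every other field zero.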
  
(Boy, that tulip driver could use a whitespace overhaul.)

Re: 2.6.24-rc6 oops in net_tx_action

2008-01-07 Thread linux

>> Thanks!  I got the patch from
>> http://marc.info/?l=linux-netdev&m=119756785219214
>> (Which didn't make it into -rc7; please fix!)
>> and am recompiling now.

> Jeff is busy so he's asked me to pick up the more important
> driver bug fixes that get posted.
>
> I'll push this around, thanks.

Much obliged.  It's only 11 hours of uptime, but no problems so far,
even trying abusive things like "ping -f -l64 -s8000".

Re: [PATCH] via-velocity big-endian support

2008-01-07 Thread linux

It doesn't look like you need a test report, but here's one anyway...
I grabbed the patch series from git and am running it successfully
right now.

Re: 2.6.23-rc8 network problem. Mem leak? ip1000a?

2008-01-07 Thread linux

Just to keep the issue open, drivers/net/ipg.c currently in 2.6.24-rc6
still leaks skbuffs like a sieve.  Run it for a few hours with network
traffic and the machine swaps like crazy while the oom killer goes nuts.

diff --git a/drivers/net/Kconfig b/drivers/net/Kconfig
index d9107e5..4fa392c 100644
--- a/drivers/net/Kconfig
+++ b/drivers/net/Kconfig
@@ -172,6 +172,10 @@ config IP1000
select MII
---help---
  This driver supports IP1000 gigabit Ethernet cards.
+ It works, but suffers from a memory leak.  Significant
+ use will consume unswappable kernel memory until the
+ machine runs out of memory and crashes.  Thus, this
+ driver cannot be considered usable at the present time.
 
  To compile this driver as a module, choose M here: the module
  will be called ipg.  This is recommended.

Or should it be demoted to BROKEN?  It compiles, and sends and receives
packets, which is better than a lot of BROKEN drivers.

[PATCH 1/3] drivers/net/ipg.c: Fix skbuff leak

2008-01-08 Thread linux

Prompted by davem, this attempt at fixing the memory leak
actually appears to work.  At least, leaving ping -f -s1472 -l64
running doesn't drop packets and doesn't show up in /proc/slabinfo.
---
diff --git a/drivers/net/ipg.c b/drivers/net/ipg.c
index dbd23bb..a0dfba5 100644
--- a/drivers/net/ipg.c
+++ b/drivers/net/ipg.c
@@ -1110,10 +1110,9 @@ enum {
Frame_WithStart_WithEnd = 11
 };
 
-inline void ipg_nic_rx_free_skb(struct net_device *dev)
+inline void ipg_nic_rx_free_skb(struct net_device *dev, unsigned entry)
 {
struct ipg_nic_private *sp = netdev_priv(dev);
-   unsigned int entry = sp->rx_current % IPG_RFDLIST_LENGTH;
 
if (sp->RxBuff[entry]) {
struct ipg_rx *rxfd = sp->rxd + entry;
@@ -1308,7 +1307,7 @@ static void ipg_nic_rx_with_end(struct net_device *dev,
jumbo->CurrentSize = 0;
jumbo->skb = NULL;
 
-   ipg_nic_rx_free_skb(dev);
+   ipg_nic_rx_free_skb(dev, entry);
} else {
IPG_DEV_KFREE_SKB(jumbo->skb);
jumbo->FoundStart = 0;
@@ -1337,7 +1336,7 @@ static void ipg_nic_rx_no_start_no_end(struct net_device *dev,
}
}
dev->last_rx = jiffies;
-   ipg_nic_rx_free_skb(dev);
+   ipg_nic_rx_free_skb(dev, entry);
}
} else {
IPG_DEV_KFREE_SKB(jumbo->skb);
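
The shape of the change, as a toy user-space model: the free routine takes the slot index from its caller instead of recomputing it from rx_current.  This is only a sketch under the assumption that the two can differ at the call site; the ring size and all names are made up, and malloc/free stand in for skb allocation.

```c
#include <assert.h>
#include <stdlib.h>

#define SKETCH_RFDLIST_LENGTH 4  /* tiny ring, for illustration only */

struct sketch_rx_ring {
	void *buff[SKETCH_RFDLIST_LENGTH];
	unsigned rx_current;
};

/* Free the buffer in the slot the caller names; returns 1 if one was freed. */
static int sketch_rx_free(struct sketch_rx_ring *r, unsigned entry)
{
	assert(entry < SKETCH_RFDLIST_LENGTH);
	if (!r->buff[entry])
		return 0;
	free(r->buff[entry]);
	r->buff[entry] = NULL;
	return 1;
}

/* Count slots that still hold a buffer. */
static unsigned sketch_rx_live(const struct sketch_rx_ring *r)
{
	unsigned i, n = 0;

	for (i = 0; i < SKETCH_RFDLIST_LENGTH; i++)
		n += r->buff[i] != NULL;
	return n;
}
```

If the caller is working on slot 0 while rx_current says 2, freeing "rx_current % length" would release the wrong buffer and leave slot 0 to be overwritten later, which is one way to lose an skb.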

[PATCH 3/3] drivers/net/ipg.c: fix horrible mdio_read and _write

2008-01-08 Thread linux

akpm noticed that this code was awful.
 ipg.c |  158 +-
 1 file changed, 43 insertions(+), 115 deletions(-)
---
diff --git a/drivers/net/ipg.c b/drivers/net/ipg.c
index 3860fcd..b3d3fc8 100644
--- a/drivers/net/ipg.c
+++ b/drivers/net/ipg.c
@@ -202,12 +202,31 @@ static u16 read_phy_bit(void __iomem * ioaddr, u8 phyctrlpolarity)
 }
 
 /*
+ * Transmit the given bits, MSB-first, through the MgmtData bit (bit 1)
+ * of the PhyCtrl register. 1 <= len <= 32.  "ioaddr" is the register
+ * address, and "otherbits" are the values of the other bits.
+ */
+static void mdio_write_bits(void __iomem *ioaddr, u8 otherbits, u32 data, unsigned len)
+{
+   otherbits |= IPG_PC_MGMTDIR;
+   do {
+   /* IPG_PC_MGMTDATA is a power of 2; compiler knows to shift */
+   u8 d = ((data >> --len) & 1) * IPG_PC_MGMTDATA;
+   /* + rather than | lets compiler microoptimize better */
+   ipg_drive_phy_ctl_low_high(ioaddr, d + otherbits);
+   } while (len);
+}
+
+/*
  * Read a register from the Physical Layer device located
  * on the IPG NIC, using the IPG PHYCTRL register.
  */
 static int mdio_read(struct net_device * dev, int phy_id, int phy_reg)
 {
void __iomem *ioaddr = ipg_ioaddr(dev);
+   u8 const polarity = ipg_r8(PHY_CTRL) &
+   (IPG_PC_DUPLEX_POLARITY | IPG_PC_LINK_POLARITY);
+   unsigned i, data = 0;
/*
 * The GMII mangement frame structure for a read is as follows:
 *
@@ -221,75 +240,30 @@ static int mdio_read(struct net_device * dev, int phy_id, int phy_reg)
 * D = bit of read data (MSB first)
 *
 * Transmission order is 'Preamble' field first, bits transmitted
-* left to right (first to last).
+* left to right (msbit-first).
 */
-   struct {
-   u32 field;
-   unsigned int len;
-   } p[] = {
-   { GMII_PREAMBLE,32 },   /* Preamble */
-   { GMII_ST,  2  },   /* ST */
-   { GMII_READ,2  },   /* OP */
-   { phy_id,   5  },   /* PHYAD */
-   { phy_reg,  5  },   /* REGAD */
-   { 0x,   2  },   /* TA */
-   { 0x,   16 },   /* DATA */
-   { 0x,   1  }/* IDLE */
-   };
-   unsigned int i, j;
-   u8 polarity, data;
-
-   polarity  = ipg_r8(PHY_CTRL);
-   polarity &= (IPG_PC_DUPLEX_POLARITY | IPG_PC_LINK_POLARITY);
-
-   /* Create the Preamble, ST, OP, PHYAD, and REGAD field. */
-   for (j = 0; j < 5; j++) {
-   for (i = 0; i < p[j].len; i++) {
-   /* For each variable length field, the MSB must be
-* transmitted first. Rotate through the field bits,
-* starting with the MSB, and move each bit into the
-* the 1st (2^1) bit position (this is the bit position
-* corresponding to the MgmtData bit of the PhyCtrl
-* register for the IPG).
-*
-* Example: ST = 01;
-*
-*  First write a '0' to bit 1 of the PhyCtrl
-*  register, then write a '1' to bit 1 of the
-*  PhyCtrl register.
-*
-* To do this, right shift the MSB of ST by the value:
-* [field length - 1 - #ST bits already written]
-* then left shift this result by 1.
-*/
-   data  = (p[j].field >> (p[j].len - 1 - i)) << 1;
-   data &= IPG_PC_MGMTDATA;
-   data |= polarity | IPG_PC_MGMTDIR;
-
-   ipg_drive_phy_ctl_low_high(ioaddr, data);
-   }
-   }
-
-   send_three_state(ioaddr, polarity);
-
-   read_phy_bit(ioaddr, polarity);
+   mdio_write_bits(ioaddr, polarity, GMII_PREAMBLE, 32);
+   mdio_write_bits(ioaddr, polarity, GMII_ST<<12 | GMII_READ << 10 |
+ phy_id << 5 | phy_reg, 14);
 
/*
 * For a read cycle, the bits for the next two fields (TA and
 * DATA) are driven by the PHY (the IPG reads these bits).
 */
-   for (i = 0; i < p[6].len; i++) {
-   p[6].field |=
-   (read_phy_bit(ioaddr, polarity) << (p[6].len - 1 - i));
-   }
+   send_three_state(ioaddr, polarity); /* TA first bit */
+   (void)read_phy_bit(ioaddr, polarity);   /* TA second bit */
+
+   for (i = 0; i < 16; i++)
+   data += data + read_phy_bit(ioaddr, polarity);
 
+   /* Trailing idle */
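
For readers checking the packing in the hunk above: the four fixed fields of a read frame (ST, OP, PHYAD, REGAD) fit in 14 bits, and the 16 data bits come back MSB-first.  A user-space sketch, assuming the standard clause 22 encodings ST = 01 and read OP = 10; the SKETCH_ names are stand-ins, not the driver's macros.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_GMII_ST   0x1u  /* start-of-frame pattern 01 */
#define SKETCH_GMII_READ 0x2u  /* read opcode 10 */

/* Pack ST(2) | OP(2) | PHYAD(5) | REGAD(5) into the low 14 bits. */
static uint32_t sketch_mdio_read_header(unsigned phy_id, unsigned phy_reg)
{
	assert(phy_id < 32 && phy_reg < 32);
	return SKETCH_GMII_ST << 12 | SKETCH_GMII_READ << 10 |
	       phy_id << 5 | phy_reg;
}

/* Accumulate MSB-first bits the way the patch does: data += data + bit. */
static unsigned sketch_collect_bits(const uint8_t *bits, unsigned n)
{
	unsigned i, data = 0;

	for (i = 0; i < n; i++)
		data += data + (bits[i] & 1u);	/* same as (data << 1) | bit */
	return data;
}
```

For PHY 1, register 2 the header is 0x1822, and the bit sequence 1, 0, 1, 1 collects to 0xB.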

[PATCH 2/3] drivers/net/ipg.c: convert Jumbo.FoundStart to bool

2008-01-08 Thread linux

This is a fairly basic code cleanup that annoyed me while working
on the first patch.
---
diff --git a/drivers/net/ipg.h b/drivers/net/ipg.h
index d5d092c..5d7cc84 100644
--- a/drivers/net/ipg.h
+++ b/drivers/net/ipg.h
@@ -789,11 +789,6 @@ struct ipg_rx {
__le64 frag_info;
 };
 
-struct SJumbo {
-   int FoundStart;
-   int CurrentSize;
-   struct sk_buff *skb;
-};
 /* Structure of IPG NIC specific data. */
 struct ipg_nic_private {
void __iomem *ioaddr;
@@ -809,7 +804,11 @@ struct ipg_nic_private {
unsigned int rx_dirty;
 // Add by Grace 2005/05/19
 #ifdef JUMBO_FRAME
-   struct SJumbo Jumbo;
+   struct SJumbo {
+   bool FoundStart;
+   int CurrentSize;
+   struct sk_buff *skb;
+   } Jumbo;
 #endif
unsigned int rx_buf_sz;
struct pci_dev *pdev;
diff --git a/drivers/net/ipg.c b/drivers/net/ipg.c
index a0dfba5..3860fcd 100644
--- a/drivers/net/ipg.c
+++ b/drivers/net/ipg.c
@@ -1206,7 +1206,7 @@ static void ipg_nic_rx_with_start_and_end(struct 
net_device *dev,
 
if (jumbo->FoundStart) {
IPG_DEV_KFREE_SKB(jumbo->skb);
-   jumbo->FoundStart = 0;
+   jumbo->FoundStart = false;
jumbo->CurrentSize = 0;
jumbo->skb = NULL;
}
@@ -1257,7 +1257,7 @@ static void ipg_nic_rx_with_start(struct net_device *dev,
 
skb_put(skb, IPG_RXFRAG_SIZE);
 
-   jumbo->FoundStart = 1;
+   jumbo->FoundStart = true;
jumbo->CurrentSize = IPG_RXFRAG_SIZE;
jumbo->skb = skb;
 
@@ -1303,14 +1303,14 @@ static void ipg_nic_rx_with_end(struct net_device *dev,
}
 
dev->last_rx = jiffies;
-   jumbo->FoundStart = 0;
+   jumbo->FoundStart = false;
jumbo->CurrentSize = 0;
jumbo->skb = NULL;
 
ipg_nic_rx_free_skb(dev, entry);
} else {
IPG_DEV_KFREE_SKB(jumbo->skb);
-   jumbo->FoundStart = 0;
+   jumbo->FoundStart = false;
jumbo->CurrentSize = 0;
jumbo->skb = NULL;
}
@@ -1340,7 +1340,7 @@ static void ipg_nic_rx_no_start_no_end(struct net_device 
*dev,
}
} else {
IPG_DEV_KFREE_SKB(jumbo->skb);
-   jumbo->FoundStart = 0;
+   jumbo->FoundStart = false;
jumbo->CurrentSize = 0;
jumbo->skb = NULL;
}
@@ -1840,7 +1840,7 @@ static int ipg_nic_open(struct net_device *dev)
 
 #ifdef JUMBO_FRAME
/* initialize JUMBO Frame control variable */
-   sp->Jumbo.FoundStart = 0;
+   sp->Jumbo.FoundStart = false;
sp->Jumbo.CurrentSize = 0;
sp->Jumbo.skb = 0;
dev->mtu = IPG_TXFRAG_SIZE;
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 1/3] drivers/net/ipg.c: Fix skbuff leak

2008-01-08 Thread linux

I take that back.  This patch does NOT fix the leak, at least if
ping: sendmsg: No buffer space available
is any indication...

I think I was reading slabinfo wrong.
kmalloc-2048  42111  42112  2048  4  2 : tunables  0  0  0 : slabdata  10528  10528  0

Sorry for the false hope.

Re: [PATCH 3/3] drivers/net/ipg.c: fix horrible mdio_read and _write

2008-01-08 Thread linux

>> +do {
>> +/* IPG_PC_MGMTDATA is a power of 2; compiler knows to shift */
>> +u8 d = ((data >> --len) & 1) * IPG_PC_MGMTDATA;
>> +/* + rather than | lets compiler microoptimize better */
>> +ipg_drive_phy_ctl_low_high(ioaddr, d + otherbits);
>> +} while (len);

> Imho something is not quite right when the code needs a comment every line
> and I am mildly convinced that we really want to honk an "optimizing mdio
> methods is ok" signal around.

Oh, but those are SPACE-saving optimizations. :-)
I know it's not time-critical; it's really pure hack value, but is it
that evil?

> "while (len--) {" is probably more akpm-ish btw.

Well spotted.

[...]
>>  static int mdio_read(struct net_device * dev, int phy_id, int phy_reg)
>>  {
>>  void __iomem *ioaddr = ipg_ioaddr(dev);
>> +u8 const polarity = ipg_r8(PHY_CTRL) &
>> +(IPG_PC_DUPLEX_POLARITY | IPG_PC_LINK_POLARITY);

> (IPG_PC_DUPLEX_POLARITY | IPG_PC_LINK_POLARITY) appears twice. I would not
> mind a #define for it.

I'm hardly going to go to war over the matter, but actually I disagree.

There's a non-zero mental cost to keeping track of an additional name,
and when it's only used two times, and is pretty simple, I think reducing
the number of layers of #defines to understand is a positive advantage.
The above reads "the two polarity bits from the PHY_CTRL register"
to a person who's never read ipg.h.  Adding IPG_PC_POLARITY_BITS just
requires mentally dereferencing another layer of pointers.

Think of it as a function small enough that it can be inlined.

>> @@ -221,75 +240,30 @@ static int mdio_read(struct net_device * dev, int 
>> phy_id, int phy_reg)
>[...]
>> -for (i = 0; i < p[6].len; i++) {
>> -p[6].field |=
>> -(read_phy_bit(ioaddr, polarity) << (p[6].len - 1 - i));
>> -}
>> +send_three_state(ioaddr, polarity); /* TA first bit */
>> +(void)read_phy_bit(ioaddr, polarity);   /* TA second bit */
>> +
>> +for (i = 0; i < 16; i++)
>> +data += data + read_phy_bit(ioaddr, polarity);

> Huh ?

Okay, I guess you prefer
+   data = 2*data + read_phy_bit(ioaddr, polarity);

That's only one character longer and easier to understand.
Or even four characters:
+   data = (data<<1) + read_phy_bit(ioaddr, polarity);

That's just the synonym that happened to come out of my fingers at the
time.  There's no particular meaning to it.
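
To make the equivalence concrete, here is a minimal stand-alone sketch (not driver code) that accumulates 16 bits MSB-first both ways.  The bit_source structure and next_bit() are hypothetical stand-ins for read_phy_bit(); only the two accumulation forms come from the discussion above.

```c
#include <stdint.h>

/* Hypothetical stand-in for read_phy_bit(): hands out the bits of a
 * 16-bit word one at a time, most significant bit first. */
struct bit_source {
	uint16_t word;
	unsigned pos;	/* number of bits already consumed (0..16) */
};

static unsigned next_bit(struct bit_source *s)
{
	unsigned bit = (s->word >> (15 - s->pos)) & 1;

	s->pos++;
	return bit;
}

/* The form from the patch: data += data + bit */
static uint16_t read16_add(struct bit_source *s)
{
	uint16_t data = 0;

	for (unsigned i = 0; i < 16; i++)
		data += data + next_bit(s);
	return data;
}

/* The shift form: data = (data << 1) + bit */
static uint16_t read16_shift(struct bit_source *s)
{
	uint16_t data = 0;

	for (unsigned i = 0; i < 16; i++)
		data = (uint16_t)((data << 1) + next_bit(s));
	return data;
}
```

Both functions rebuild the original word; the choice between them is purely one of readability.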

>> @@ -299,11 +273,13 @@ static int mdio_read(struct net_device * dev, int 
>> phy_id, int phy_reg)
>>  static void mdio_write(struct net_device *dev, int phy_id, int phy_reg, int 
>> val)
[...]
>> +mdio_write_bits(ioaddr, polarity, GMII_PREAMBLE, 32);
>> +mdio_write_bits(ioaddr, polarity, GMII_ST << 14 | GMII_WRITE << 12 |
>> +  phy_id << 7 | phy_reg << 2 |
>> +  0x2, 16);

> Use the 80 cols luke:
> phy_id << 7 | phy_reg << 2 | 0x2, 16);

Good spotting, thanks.
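
For anyone following along, here is a small stand-alone sketch of how the read and write headers quoted above pack into a word and come out MSB-first.  The MDIO_* constants are my assumption of the usual IEEE 802.3 clause 22 values (ST = 01, read = 10, write = 01, write turnaround = 10), not copied from ipg.h, and bits_msb_first() is a hypothetical helper that renders bits as text instead of toggling PHY_CTRL.

```c
#include <stdint.h>

/* Assumed clause 22 field values; ipg.h may spell these differently. */
enum {
	MDIO_ST       = 0x1,	/* start of frame: 01 */
	MDIO_OP_READ  = 0x2,	/* opcode: 10 */
	MDIO_OP_WRITE = 0x1,	/* opcode: 01 */
	MDIO_TA_WRITE = 0x2	/* turnaround driven by the MAC: 10 */
};

/* 14 bits: ST(2) OP(2) PHYAD(5) REGAD(5); the PHY drives TA and DATA */
static uint32_t mdio_read_header(unsigned phy_id, unsigned phy_reg)
{
	return MDIO_ST << 12 | MDIO_OP_READ << 10 |
	       (phy_id & 0x1f) << 5 | (phy_reg & 0x1f);
}

/* 16 bits: ST(2) OP(2) PHYAD(5) REGAD(5) TA(2); DATA follows separately */
static uint32_t mdio_write_header(unsigned phy_id, unsigned phy_reg)
{
	return MDIO_ST << 14 | MDIO_OP_WRITE << 12 |
	       (phy_id & 0x1f) << 7 | (phy_reg & 0x1f) << 2 | MDIO_TA_WRITE;
}

/* Render the low "len" bits of data MSB-first as '0'/'1' characters.
 * "out" must have room for len + 1 bytes. */
static void bits_msb_first(uint32_t data, unsigned len, char *out)
{
	while (len--)
		*out++ = ((data >> len) & 1) ? '1' : '0';
	*out = '\0';
}
```

The loop in bits_msb_first() has the same shape as the "while (len--)" loop in mdio_write_bits(); only the sink differs.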

Here's a revised patch:

drivers/net/ipg.c: Fixed style problems that AKPM noticed.

(And a few more while at it.  Including an actual bug in enabling multicast
due to & vs. && confusion.)

diff --git a/drivers/net/ipg.c b/drivers/net/ipg.c
index 3860fcd..fb69374 100644
--- a/drivers/net/ipg.c
+++ b/drivers/net/ipg.c
@@ -188,9 +188,9 @@ static void send_end(void __iomem *ioaddr, u8 
phyctrlpolarity)
phyctrlpolarity) & IPG_PC_RSVD_MASK, PHY_CTRL);
 }
 
-static u16 read_phy_bit(void __iomem * ioaddr, u8 phyctrlpolarity)
+static unsigned read_phy_bit(void __iomem * ioaddr, u8 phyctrlpolarity)
 {
-   u16 bit_data;
+   unsigned bit_data;
 
ipg_write_phy_ctl(ioaddr, IPG_PC_MGMTCLK_LO | phyctrlpolarity);
 
@@ -202,12 +202,31 @@ static u16 read_phy_bit(void __iomem * ioaddr, u8 
phyctrlpolarity)
 }
 
 /*
+ * Transmit the given bits, MSB-first, through the MgmtData bit (bit 1)
+ * of the PhyCtrl register. 1 <= len <= 32.  "ioaddr" is the register
+ * address, and "otherbits" are the values of the other bits.
+ */
+static void mdio_write_bits(void __iomem *ioaddr, u8 otherbits, u32 data, 
unsigned len)
+{
+   otherbits |= IPG_PC_MGMTDIR;
+   while (len--) {
+   /* IPG_PC_MGMTDATA is a power of 2; compiler knows to shift */
+   u8 d = ((data >> len) & 1) * IPG_PC_MGMTDATA;
+   /* + rather than | allows slight code size microoptimization */
+   ipg_drive_phy_ctl_low_high(ioaddr, d + otherbits);
+   }
+}
+
+/*
  * Read a register from the Physical Layer device located
  * on the IPG NIC, using the IPG PHYCTRL register.
  */
 static int mdio_read(struct net_device * dev, int phy_id, int phy_reg)
 {
void __iomem *ioaddr = ipg_ioaddr(dev);
+   u8 const polarity = ipg_r8(PHY_CTRL) &
+   (IPG_PC_DUPLEX_POLARITY | IPG_PC_LINK_POLARITY);

Re: [PATCH 1/3] drivers/net/ipg.c: Fix skbuff leak

2008-01-08 Thread linux

> Can you try the patch below ?

Testing now... (I presume you noticed the one-character typo in my
earlier patch.  That should be "mc = mc->next", not "mv = mc->next".)

That doesn't seem to do it.  Not entirely, at least.  After downloading
and partially re-uploading an 800M file, slabtop reports:

  OBJS ACTIVE  USE OBJ SIZE  SLABS OBJ/SLAB CACHE SIZE NAME
341576 341574  99%    0.50K  42697        8    170788K kmalloc-512
342006 341953  99%    0.19K  16286       21     65144K kmalloc-192
 30592  30575  99%    2.00K   7648        4     61184K kmalloc-2048
 30213  30193  99%    0.44K   3357        9     13428K skbuff_fclone_cache
  7650   7643  99%    0.08K    150       51       600K sysfs_dir_cache
  4000   3938  98%    0.12K    125       32       500K kmalloc-128
   258    258 100%    1.15K     43        6       344K raid5-md5
   232    221  95%    1.00K     58        4       232K kmalloc-1024
  3136   3110  99%    0.06K     49       64       196K kmalloc-64
   264     80  30%    0.68K     24       11       192K ext3_inode_cache

The "kmalloc-2048" was down in the noise before the upload started.
This is in single-user mode, after sync and echo 3 > /proc/sys/vm/drop_caches.


I'll have to try this after this evening's social plans, but I'm thinking
of implementing more rapid bug detection: explicitly zero the sp->TxBuff
slot when the skb is freed, and check that it is zero before putting
anything else in there.  (And likewise for RxBuff.)

That way, I don't have to use up a noticeable amount of memory to see
the bug and reboot to clear up the damage each test cycle.
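
Roughly what I have in mind, as a stand-alone sketch rather than a patch.  struct ring, RING_LEN, ring_put() and ring_take() are all hypothetical names; in the driver the array would be sp->TxBuff or sp->RxBuff and the counter would more likely be a printk.

```c
#include <stddef.h>

#define RING_LEN 8	/* hypothetical; stands in for the driver's list length */

struct ring {
	void *buff[RING_LEN];		/* plays the role of sp->TxBuff / sp->RxBuff */
	unsigned long overwrites;	/* occupied slots that got overwritten */
};

/* Store a buffer in a slot.  Returns 0 if the slot was empty, or -1 if
 * it still held a buffer (which is therefore about to be leaked). */
static int ring_put(struct ring *r, unsigned entry, void *buf)
{
	int leaked = r->buff[entry % RING_LEN] != NULL;

	if (leaked)
		r->overwrites++;
	r->buff[entry % RING_LEN] = buf;
	return leaked ? -1 : 0;
}

/* Release a slot: hand back the buffer and explicitly zero the slot. */
static void *ring_take(struct ring *r, unsigned entry)
{
	void *buf = r->buff[entry % RING_LEN];

	r->buff[entry % RING_LEN] = NULL;
	return buf;
}
```

With this in place a single leaked buffer shows up on the first overwrite instead of after tens of thousands of them.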

Re: SACK scoreboard

2008-01-08 Thread linux

Just some idle brainstorming on the subject...

It seems the only way to handle network pipes significantly larger (delay *
bandwidth product) than the processor cache is to make freeing retransmit
data o(n).

Now, there are some ways to reduce the constant factor.  The one that
comes to mind first is to not queue sk_buffs.  Throw away the struct
sk_buff after transmission and just queue skb_frag_structs, pages, or
maybe even higher-order pages of data.  Then freeing the data when it's
acked has a much smaller constant factor, particularly d-cache footprint,
and no slab operations.

The downside is more work to recreate the skb if you do have to
retransmit, but optimizing for retransmits is silly.

Some implementations could leave large chunks of memory locked until
all of the sk_buff->skb_shared_info->skb_frag_structs referencing them
have gone away, but you can look at the transmit window when deciding
how big a chunk size to use.


Then, to actually get below O(n), you want to keep the queued data in
a data structure known to the memory manager.  Basically, splice the
retransmit queue onto the free list.

It may require some kludgery in the memory manager.  In particular, doing
that in O(1) time obviously means that you can't coalesce adjacent free
regions to build higher-order pages.  So you'd have to have a threshold
for uncoalesced pages and a way to force coalescing under memory pressure.

You're just deferring work until the page is allocated, but the point
is that then it's okay to bring it into cache when it's about to be
used again.  It's the redundant round trip just because an ack arrived
that's annoying.

I've done this sort of thing with specialized fixed-block-size allocators
before (an alpha-beta minimax search tree allocates nodes one at a
time, but frees whole subtrees at once), but might it be feasible for
kernel use?
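
To illustrate the "splice onto the free list" step, here is a toy user-space sketch.  struct chunk and struct chunk_list are hypothetical; a real implementation would detach only the acked prefix of the retransmit queue and would have to cooperate with the page allocator, neither of which is shown.

```c
#include <stddef.h>

struct chunk {
	struct chunk *next;
};

/* Singly linked list with a tail pointer so that splicing is O(1). */
struct chunk_list {
	struct chunk *head, *tail;
};

static void list_append(struct chunk_list *l, struct chunk *c)
{
	c->next = NULL;
	if (l->tail)
		l->tail->next = c;
	else
		l->head = c;
	l->tail = c;
}

/* Move every chunk on "from" to the front of "to" without touching
 * the chunks in between: constant time regardless of list length. */
static void list_splice(struct chunk_list *to, struct chunk_list *from)
{
	if (!from->head)
		return;
	from->tail->next = to->head;
	if (!to->tail)
		to->tail = from->tail;
	to->head = from->head;
	from->head = from->tail = NULL;
}

/* O(n) walk, used here only to check the result. */
static size_t list_len(const struct chunk_list *l)
{
	size_t n = 0;

	for (const struct chunk *c = l->head; c; c = c->next)
		n++;
	return n;
}
```

Only the last chunk of the spliced list is written; everything else stays out of the cache until it is reallocated.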

ipg.c bugs

2008-01-09 Thread linux

I'm just about to test that second memory leak patch, but I gave the
original code a careful reading, and found a few problems:

* Huge monstrous glaring bug

  In ipg_interrupt_handler the code to handle a shared interrupt
  not caused by this device:
	if (!(status & IPG_IS_RSVD_MASK))
		goto out_enable;
  is *before* spin_lock(&sp->lock), but the code following
  out_enable does spin_unlock(&sp->lock).

  Thus, the sp->lock is all f*ed up.  The lack of any sort of
  locking between the interrupt handler and hard_start_xmit
  could cause all sorts of issues.

  I'm not actually sure if it's even necessary; I'd think some
  suitable atomic access to sp->tx_current would suffice.

* Lesser bugs

  There's a general pattern of loops over the range from
  sp->rx_current to sp->rx_dirty.  Some of them call code
  that refers to sp->rx_current, even though that hasn't been
  updated yet.

  One instance is in ipg_nic_check_frame_type.
  A second is in ipg_nic_check_error.

  In ipg_nic_set_multicast(), the code to enable the multicast flags
  is of the form "if (dev->flags & IFF_MULTICAST & (dev->mc_count > ...))".
  But IFF_MULTICAST is not 1, so this will always be false.
  The second & needs to be && (2x).
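
  A stand-alone demonstration of why the bitwise form can never be true.
  IFF_MULTICAST is defined locally here as 0x1000 (the value in Linux's
  if.h); the function names and the "limit" parameter are hypothetical,
  standing in for whatever threshold the driver compares mc_count against.

```c
#define IFF_MULTICAST 0x1000	/* defined locally for this sketch */

/* Buggy form: (mc_count > limit) is 0 or 1, and 0x1000 & 1 == 0,
 * so the whole expression is 0 whatever the inputs are. */
static int multicast_buggy(unsigned flags, int mc_count, int limit)
{
	return (flags & IFF_MULTICAST & (mc_count > limit)) != 0;
}

/* Fixed form: test the flag, then logically AND with the comparison. */
static int multicast_fixed(unsigned flags, int mc_count, int limit)
{
	return (flags & IFF_MULTICAST) && (mc_count > limit);
}
```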


  In ipg_io_config(), there's
/* Transmitter and receiver must be disabled before setting
 * IFSSelect.
 */
ipg_w32((origmacctrl & (IPG_MC_RX_DISABLE | IPG_MC_TX_DISABLE)) &
IPG_MC_RSVD_MASK, MAC_CTRL);
  I don't know what's going on there, but unless the IPG_MC_RX_DISABLE
  bit is already set in origmacctrl, that's going to write 0, which
  won't disable anything.

  Immediately following, there's some similarly buggy code doing something
  I don't understand with IPG_MC_IFS_96BIT.


  The setting of curr in ipg_nic_txfree, with that bizarre do_div, can't
  possibly be working right.


* Possible bugs

  I'm not very sanguine about the handling in init_rfdlist, of the
  code that handles a failed ipg_get_rxbuff.  In particular, it leaves
  rxfd->frag_info uninitialized in that case, but does set rxfd->rfs to
  "buffer ready to be received into", which could lead to receiving into
  random memory locations.

  In ipg_nic_hard_start_xmit(), the code
if (sp->tx_current == (sp->tx_dirty + IPG_TFDLIST_LENGTH))
netif_wake_queue(dev);
  shouldn't that *stop* the queue if the TFDLIST is full?
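
  What I would expect instead, as a toy model.  This assumes tx_current
  and tx_dirty are free-running unsigned counters; struct tx_ring,
  TFDLIST_LENGTH and the queue_stopped flag are hypothetical stand-ins
  for the driver's private data, IPG_TFDLIST_LENGTH and
  netif_stop_queue()/netif_wake_queue().

```c
#define TFDLIST_LENGTH 4	/* hypothetical; tiny so it is easy to test */

struct tx_ring {
	unsigned tx_current;	/* next slot to fill (free-running) */
	unsigned tx_dirty;	/* next slot to reclaim (free-running) */
	int queue_stopped;	/* models netif_stop_queue()/netif_wake_queue() */
};

/* Unsigned subtraction gives the fill level even across wraparound. */
static int tx_ring_full(const struct tx_ring *r)
{
	return r->tx_current - r->tx_dirty == TFDLIST_LENGTH;
}

/* Transmit path: after queuing a frame, STOP the queue if the ring filled. */
static void tx_queue_one(struct tx_ring *r)
{
	r->tx_current++;
	if (tx_ring_full(r))
		r->queue_stopped = 1;
}

/* Completion path: reclaiming a slot makes room, so wake the queue. */
static void tx_complete_one(struct tx_ring *r)
{
	r->tx_dirty++;
	r->queue_stopped = 0;
}
```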

  I think that the places where the rxfd->rfs and txfd->tfc fields
  are filled in (containing the hardware-handoff flag) should
  have memory barriers.
  
* Stupid code

  In ipg_io_config, there are three writes to DEBUG_CTRL "Per silicon
  B3 eratta".  First, that's "errata".  But more significantly,
  can those writes be combined into one?  Is it necessary to read
  the DEBUG_CTRL register each time?

  The initialization of rxfd->rfs in init_rfdlist() and ipg_nic_rxrestore()
  should be moved into ipg_get_rxbuff().  And since the ready bit is there,
  it should be set AFTER the pointer fields AND there should be a barrier
  so the hardware doesn't read the fields out of order.
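
  The ordering I mean, sketched as a user-space analogy with C11 atomics.
  In the driver this would be a plain store followed by wmb() and then the
  store of the status word; struct rx_desc, DESC_READY and both functions
  are hypothetical, and the real meaning of the bits in ->rfs is not
  modelled here.

```c
#include <stdatomic.h>
#include <stdint.h>

#define DESC_READY 0x1u	/* hypothetical hand-off flag */

struct rx_desc {
	uint64_t frag_info;	/* buffer address, written first */
	_Atomic uint64_t rfs;	/* status word carrying the hand-off flag */
};

/* Producer: fill in the pointer field, THEN publish the ready flag.
 * The release store keeps the frag_info write from being reordered
 * after the flag becomes visible. */
static void desc_publish(struct rx_desc *d, uint64_t dma_addr)
{
	d->frag_info = dma_addr;
	atomic_store_explicit(&d->rfs, DESC_READY, memory_order_release);
}

/* Consumer (the role the hardware plays): look at the flag first and
 * read the pointer only once the flag is seen.  Returns 1 on success. */
static int desc_consume(struct rx_desc *d, uint64_t *dma_addr)
{
	if (!(atomic_load_explicit(&d->rfs, memory_order_acquire) & DESC_READY))
		return 0;
	*dma_addr = d->frag_info;
	return 1;
}
```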

  In ipg_nic_txcleanup(), there's code to call netif_wake_queue every
  time through the loop in 10 MBit mode (to balance some bug-workaround
  call that stops the queue every packet in that case), which is
  quite unnecessary, as ipg_nic_txfree() will do it.

  The IPG_INSERT_MANUAL_VLAN_TAG code (fortunately disabled by default)
  is just plain bizarre.  What exactly is the use of assigning a tag of
  0xABC to every packet?

  The code in ipg_hw_init to set up dev->dev_addr reads each of the
  16-bit address registers twice, for no apparent reason.

  There's a lot of code in e.g. ipg_nic_rx() that does endless
  manipulation of rxfd->rfs with an le64_to_cpu() call around each
  instance, that should copy it to a CPU-ordered native value and be
  done with it.  (Some sparse annotations would help, too.)

  Likewise for messing with txfd->tfc in ipg_nic_hard_start_xmit().

  The Frame_WithEnd enum is a very strange value (decimal 10) to use as
  a bitmapped status flag.

  The four frame fragment functions
	nic_rx_with_start_and_end
	nic_rx_with_start
	nic_rx_with_end
	nic_rx_no_start_no_end
  could easily be unified into one.

* Performance left on the floor

  The hardware supports scatter/gather, hardware checksums, VLAN tagging,
  and 64-bit (well, 40-bit) DMA, but the driver sets no feature flags.

  The jumbo frame reception code could generate fragmented skbs rather
  than doing all those memcopies.

  Would it be worth splitting the 64-bit ->rfs and ->tfc fields into
  two 32-bit fields?

  Would it be worth copying small incoming packets to small skbs and
  keeping the large skb in the receive queue?

* Questions

  In net_device_stats, are all those statistics registers cleared by
  a read?

  How do we determine the silicon revision numbers, so we can stop enabling
  bug workarounds on versions that don't need i

Re: [PATCH 0/4] Pull request for 'ipg-fixes' branch

2008-01-10 Thread linux

Thank you very much, this appears to work.

> The driver is still a POMS but it seems better now.

I notice that the vendor-supplied driver doesn't have these bugs.
Now, it does have a bug in that it doesn't have an "is this
interrupt for me?" test at all (and always returns "I handled it"),
but the bypass and its locking screwups are a later addition.

The same with the sp->rx_current bugs.  The original loop which used
rx_current as the loop iteration variable wasn't great style, precisely
because it hides the interaction that someone's "optimization" broke,
but I don't want to blame the vendor for things they didn't do.

Would you be interested in some cleanup patches?  In particular, I think I
can get rid of tx->lock entirely, or at least take it off the fast path.
All it's protecting is the write to sp->tx_current, and a few judicious
memory barriers can deal with that.


(Oh, another BUG: the sp->ResetCurrentTFD logic in hard_start_xmit is
just plain broken.  It writes the new data to entry 0, then increments
sp->tx_current just like usual.  THAT isn't in the vendor driver that
I see, either.)

Extensible hashing and RCU

2007-02-03 Thread linux

I noticed a mention in an LCA talk that apparently extensible hashing
with RCU access is an unsolved problem.  Here's an idea for solving it.

I'm assuming the table is a power of 2 in size with open chaining
for collisions.  When the chains get too long, the table is doubled.
When the chains get too short, the table size is halved.

- Compute a sufficiently large (32-bit?) hash value for each entry.
  "Sufficiently large" is large enough for the largest possible hash table.

- The hash value is stored with each entry.  (Not strictly
  necessary if the update rate is sufficiently low.)

- The table is indexed on the *high* bits of the hash value.
  As it grows, additional bits are appended to the hash value.

- Each chain is stored in sorted order by hash value.
  (This is why storing the hash value is an efficiency win.)

To double the size of a hash table:
- Allocate new, larger, array of head pointers.
- The even slots are copied from the smaller hash table.
- The odd slots are initialized to point to the middle of
  the hash chains pointed to by the even slots.  However, the
  even chains are NOT terminated yet; a search through one of
  them will go through the full chain length.
- The new table is declared open for business.
- Wait for RCU quiescent period to elapse, so there are no more readers
  of the old table.
- NOW truncate the even chains by setting the next pointers to NULL.
- Deallocate and free the old array of head pointers.
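
The doubling steps above can be sketched as a single-threaded toy, with the RCU grace period reduced to "the caller calls the second function later".  Everything here (struct node, struct table, the function names) is hypothetical; there is no locking, no rcu_assign_pointer(), and no real RCU, just the pointer manipulation.

```c
#include <stdint.h>
#include <stdlib.h>

struct node {
	uint32_t hash;
	struct node *next;
};

struct table {
	unsigned bits;		/* table has 1 << bits slots, bits < 32 */
	struct node **head;
};

/* Index on the HIGH bits of the hash. */
static unsigned slot_of(uint32_t hash, unsigned bits)
{
	return bits ? hash >> (32 - bits) : 0;
}

/* Keep each chain sorted by hash value. */
static void table_insert(struct table *t, struct node *n)
{
	struct node **pp = &t->head[slot_of(n->hash, t->bits)];

	while (*pp && (*pp)->hash < n->hash)
		pp = &(*pp)->next;
	n->next = *pp;
	*pp = n;
}

/* Sorted chains let a lookup stop early, so junk from the sibling
 * chain hanging off the end of an even chain is harmless. */
static struct node *table_find(const struct table *t, uint32_t hash)
{
	struct node *n = t->head[slot_of(hash, t->bits)];

	while (n && n->hash < hash)
		n = n->next;
	return (n && n->hash == hash) ? n : NULL;
}

/* Step 1: build the doubled table, sharing the existing chains. */
static int table_grow(const struct table *old, struct table *grown)
{
	unsigned i, n = 1u << old->bits;

	grown->bits = old->bits + 1;
	grown->head = calloc(2 * n, sizeof *grown->head);
	if (!grown->head)
		return -1;
	for (i = 0; i < n; i++) {
		struct node *p = old->head[i];

		grown->head[2 * i] = p;		/* even slot: same head */
		while (p && slot_of(p->hash, grown->bits) == 2 * i)
			p = p->next;
		grown->head[2 * i + 1] = p;	/* odd slot: middle of chain */
	}
	return 0;
}

/* Step 2, only after the grace period: cut each even chain where
 * the odd chain begins. */
static void table_truncate_even(struct table *grown)
{
	unsigned i, n = 1u << grown->bits;

	for (i = 0; i < n; i += 2) {
		struct node **pp = &grown->head[i];

		while (*pp && slot_of((*pp)->hash, grown->bits) == i)
			pp = &(*pp)->next;
		*pp = NULL;
	}
}
```

Between table_grow() and table_truncate_even() both tables answer lookups correctly, which is the overlap period the scheme relies on.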

Likewise, to halve the size, link the odd heads onto the tails of the
even chains, copy the even heads to a smaller table, and declare it
open for business.  When an RCU quiescent period has elapsed, you can
delete the old table.

The insight is that RCU makes taking stuff out of a linked list
very delicate, and moving it while preserving access is
basically impossible.  But you can append extraneous junk to the
end of a hash chain harmlessly enough and share the structure.

Thus, there is a period of overlap when both the old and the new hash
tables are valid and functional.

Indeed, after each of the above steps, you can actually allow new
insertions into the hash table while waiting for the RCU quiescent period.

If the insertion is at the head of chain, it won't be seen by readers
of the old table, but that's harmless.

The trickiest case I can think of is the deletion of a table
entry at the head of an odd chain while an expansion is pending.
When scanning the even chain afterwards to find where to truncate it,
you can't compare node->next to the odd chain head; you have to
look at the now deleted node's hash code and see that it exceeds
the threshold for the even chain.  (Equivalently, you can test to
see if the appropriate bit of the hash code is set.)

So that hash chain walking has to be done BEFORE the node is actually
deleted.  This requires an ordering guarantee on RCU callbacks, either
a priority system or FIFO.  call_rcu looks like it uses FIFO order,
but it's per-CPU lists.

Ah!  It's worse than that.  Even after the first RCU quiescent period,
there still could be a walker of the even chain holding a pointer to
the newly-deleted odd chain head.  Thus, it can't actually be reclaimed
until a *second* RCU quiescent period has elapsed.

The first RCU period is to get rid of anyone who needs the link, then you
remove it, then you need to wait until there's nobody who's still using it.

Still, it's probably not too horrible.


(You could index the hash table on the low-order bits, but then you
need to keep the chains sorted by bit-reversed hash value, which is
probably more of a pain.  Still pretty easy, though.  To compare x and
y bit-reversed, just let mask = x^y; mask ^= mask-1; compare (x&mask)
to (y&mask).)

Re: [RFC/TOY]Extensible hashing and RCU

2007-02-06 Thread linux

> For purposes of discussion, I've attached a "toy" implementation
> for doing dynamic resizing of a hashtable. It is useless, except
> as a proof of concept.
> 
> I think this is very similar to what you are describing, no?

Er... not quite; that has a lot more overhead than what I was
thinking about.

I have used the trick of distinguishable sentinel values in a
doubly-linked list to maintain read cursors while it's being updated,
but I don't think that's necessary here.

(You can also encode the nh_type flag in the lsbit of the pointer
if you're being sneaky.  That will attract curses from the
memleak detector, though.)

In particular, I was imagining a singly linked list.  To delete
an item, use the hash to find the head pointer and walk it
to find the pointer to be fixed up.  Since the chains are short
anyway, this is entirely reasonable.

Less fundamental comments include:

1) Is the seqlock in get_nh() and nh_replace() really required?
   Is there any architecture that doesn't have atomic pointer stores?
   If you wanted to store the table size in a fixed location as
   well, I could see the need...

2) I think the whole __nh_sort_chain business will royally confuse
   anyone walking the chain while it happens.  This is exactly
   what I was working to avoid.  The partial sorting in __nh_insert
   isn't good enough.

   Instead, try:

/* Return true if bitrev(x) > bitrev(y) */
static bool bitrev_gt(unsigned long x, unsigned long y)
{
	unsigned long mask = x ^ y;	/* Find the bits that differ */

	mask ^= mask-1;	/* Find lsbit of difference (and all lower bits) */
	return (x & mask) > (y & mask);
}

static void __nh_insert(struct nh_entry *entry, struct nh_head *head)
{
	struct list_head *p, *n;
	unsigned long const hashval = nh_hashval(entry->data);

	/*
	 * Insert the new entry just before the first element of the list
	 * that its hash value is not greater than (bit-reversed).
	 */
	p = &head->list;
	list_for_each_rcu(n, &head->list) {
		struct nh_entry *t = container_of(n, struct nh_entry, nh_list);

		if (t->nh_type == NH_ENTRY &&
		    !bitrev_gt(hashval, nh_hashval(t->data)))
			break;
		p = n;
	}
	__list_add_rcu(&entry->nh_list, p, n);
}

static int nh_add(unsigned long data)
{
	struct nh_entry *entry = kmalloc(sizeof *entry, GFP_KERNEL);
	struct nh *nh;

	if (!entry)
		return -ENOMEM;

	entry->nh_type = NH_ENTRY;
	entry->data = data;

	rcu_read_lock();

	nh = get_nh();	/* or nh = __nh */

	if (nh) {
		struct nh_head *h =
			&nh->hash[ nh_bucket(nh_hashval(data), nh->nentries) ];

		spin_lock(&h->lock);
		__nh_insert(entry, h);
		spin_unlock(&h->lock);
	}
	rcu_read_unlock();

	return 0;
}

   Then there's no need for __nh_sort_chain at all.  Alternatively, if the
   upper bits of nh_hashval are as good as the lower bits, just index the
   hash table on them.

3) Code inside a mutex like nh_resize() can use plain list_for_each();
   the _rcu variant is only required if there can be simultaneous
   mutation.

That's a nice module framework.  I'll see if I can write some code
of the sort I was thinking about.

FWIW, I figured out a way around the need to delay deletion for two
RCU intervals.

Suppose that an expansion is pending, and we have just stretched the
table from 16 entries to 32, and the following hash values are
stored.  (Note the bit-reversed order.)

old[3] --\   
new[3] ---+-> 0x03 -> 0x43 -> 0x23 -> 0x63 -> 0x13 -> 0x53 -> 0x33 -> 0x73
 /
new[3+16]---/

After an RCU period, you can throw away the old[] array and NULL-terminate
the new[3] list after 0x63.  But until then, you need to leave the list alone
to accommodate people who are looking for 0x53 via the old head.

The tricky case comes when you delete 0x13.  If you only delete it from the
new[3+16] list, you can't discard it until the RCU quiescent period
after the one which disconnects the 0x63->0x13 link.

The solution is actually very simple: notice when you're
- Deleting the first entry in a list
- While an expansion is pending
- And the list is in the second half of the expanded table
then unlink the entry from BOTH the new head and the old list.
It's a bit more work, and requires some lock-ordering care, but
it lets you queue the node for RCU cleanup the normal way.

Re: Extensible hashing and RCU

2007-02-22 Thread linux

> I think you misunderstood me.  If you are trying to DoS me from
> outside with a hash collision attack, you are trying to feed me
> packets that fall into the same hash bucket.  The Jenkins hash does
> not have to be artifact-free, and does not have to be
> cryptographically strong.  It just has to do a passable job of mixing
> a random salt into the tuple, so you don't know which string of
> packets to feed me in order to fill one (or a few) of my buckets.
> XORing salt into a folded tuple doesn't help; it just permutes the
> buckets.

If you want to understand this more formally, read up on "universal
families of hash functions," which is the name cryptologists give to
this concept.

When used according to directions, they are actually *more* secure than
standard cryptographic hashes such as MD5 and SHA.  The key difference
is that *the attacker doesn't get to see the hash output*.

The basic pattern is:
- Here's a family of hash functions, e.g. a salted hash function.
- I pick one at random.  (E.g. choose a salt.)
- Now your challenge is to generate a pair of inputs which will
  collide.
- Note that you never get to see sample input/output pairs of the
  hash function.  All you know is that it's a member of the family.
- It is surprisingly easy to find families of size N such that
  an attacker has on the order of a 1/N chance to construct a collision.
- This remains true even if you assume that the attacker has
  infinite computational power.

This pattern corresponds exactly to an attacker trying to force collisions
in a hash table they can't see.
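
As a concrete instance of the pattern, here is a sketch of one textbook family, h(x) = ((a*x + b) mod p) mod m, usually attributed to Carter and Wegman.  This is NOT what the kernel uses (that is jhash with a random salt); it is only meant to show what "pick one member of a family at random" looks like.  The names are hypothetical, keys are assumed to be smaller than p, and a and b must come from a real random source, which is not shown.

```c
#include <stdint.h>

#define UH_PRIME 2147483647ULL	/* 2^31 - 1, a Mersenne prime */

/* One member of the family: the (a, b) pair is the secret "salt". */
struct uhash {
	uint64_t a;	/* chosen at random from 1 .. p-1 */
	uint64_t b;	/* chosen at random from 0 .. p-1 */
};

/* (a*x + b) mod p.  For keys below p this is injective, because a is
 * nonzero mod p; collisions only arise in the final bucket reduction.
 * The product stays below 2^62, so it cannot overflow 64 bits. */
static uint64_t uhash_mix(const struct uhash *h, uint32_t x)
{
	return (h->a * (x % UH_PRIME) + h->b) % UH_PRIME;
}

/* Reduce to a bucket index.  Which keys share a bucket depends on
 * (a, b), which the attacker never sees. */
static unsigned uhash_bucket(const struct uhash *h, uint32_t x,
			     unsigned nbuckets)
{
	return (unsigned)(uhash_mix(h, x) % nbuckets);
}
```

Rehashing with a fresh (a, b) pair is then just a matter of picking a different member of the same family.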


As far as I know, nobody has proved salted jhash a truly universal family,
but so many amazingly simple algorithms have been proved universal that
it wouldn't surprise me if it was.

For example, the family of all CRCs computed modulo n-bit primitive
polynomials is a universal family.  If you do know the polynomial, it's
ridiculously easy to build a collision.  If you don't, it's provably
impossible.

(Footnote: the chance isn't exactly 1/N, but also depends on the size
of the input relative to the size of the hash.  With bigger inputs, it's
easier to make them match according to more of the hashes.  Ultimately,
if you have N k-bit CRC polynomials, you can make them all collide with
an N*k-bit input.  But since N is proportional to 2^k, it's easy to make
k big enough that this is impractical.)

The rehash-every-10-minutes detail is theoretically unnecessary,
but does cover the case where a would-be attacker *does* get a chance
to look at a machine, such as by using routing delays to measure the
effectiveness of a collision attempt.


Now, as for flaming about how xor generates more uniform distributions
than jhash - that's to be expected from a weak hash.  By relying on
non-uniform properties of the input (particularly that hosts tend to walk
linearly through the source port space), you can make hash values walk
linearly through your hash table, and get a completely even distribution
rather than a *random* one.

This is great for efficiency, but depends on letting patterns in the hash
input through to the output, which is exactly the property that makes it
vulnerable to a deliberate DoS attempt.

If you want to test your distribution for randomness, go have a look
at Knuth volume 2 (seminumerical algorithms) and see the discussion of
the Kolmogorov-Smirnov test.  Some lumpiness is *expected* in a truly
random distribution.

Re: [EMAIL PROTECTED]: Re: [RFC PATCH 34/35] Add the Xen virtual network device driver.]

2006-05-12 Thread linux

omains with non-full accounts.

So after an initial accumulation period to fill up the buffers, the
available entropy is divided evenly among all the domains that want it.

I don't know how Xen works at all, whether it's easier to buffer the
entropy in domain0 until requested or immediately push it to the
subdomains, but either way, it's doable.


So I guess, before doing any fancy design, it's worth asking: do people
prefer to have entropy be a service that the Xen hypervisor delivers to
client domains, or should the domains manage it themselves?
They may not both be practicable, but which do you people want to explore
first?


A few more issues which have arisen since /dev/random was first written:

- Modern processors change clock rate, causing a real-world jitter
  number to translate into a variable number of timestamp ticks.  +/-10 ns
  may be +/-32 timestamp ticks, or less if the clock is running slower.

  The most recent processors run their timestamp counters at a fixed
  rate, regardless of clock divisor, by incrementing it by more than
  one per cycle at times.  But either way, you still have to reduce
  the entropy estimate when reducing clock speed.

- Wireless keyboards and mice are a lot less unobservable than wired ones.

- On the upside, full-speed timestamp counters are widely available, as
  are > 1 GHz clock rates, making for a rich source of clock jitter.


Oh, and on the theoretical front, there's been a lot of research
into so-called "randomness extraction functions".  In particular,
it's been shown that Shannon entropy (the sum, over the various random
possibilities i = 1, 2, ... n of -p[i] * log(p[i])) is not possible to
base a secure extractor on; you need your sources to have good
min-entropy, i.e. the minimum over i of -log(p[i]).  In my previous post
to linux-kernel, I completely forgot about this... arrgh, have to post
a retraction.

Anyway, min-entropy, being simply the negative log of the highest
probability, is always less than or equal to the Shannon entropy.
It's equal for uniform distributions (all choices equally likely),
but more conservative for lopsided distributions.

Here's the classic teaching example: say you have a source, which produces
31 truly random bits (0..0x7fffffff) half the time, but produces -1
(0xffffffff) the other half of the time.  (If this seems too trivial,
assume it is encrypted with a one-time pad known only to the attacker;
that doesn't change the analysis.)

It is simple to compute the Shannon entropy of this source: 16.5 bits
per sample.  p[-1] = 1/2, while p[0..0x7fffffff] = 2^-32, and plug all
that into the Shannon entropy formula.

Now, if I take 8 samples from this source (total entropy 132 bits) and mix
them up together (say, with MD5), I should get a good 128-bit key, right?
But 1/256 of the time, the MD5 input is simply eight copies of -1 and the attacker
knows my key in one guess.  An additional 1/32 of the time, only one of
the 8 samples was random and there's only 34 bits of entropy in my key.
(31 for the sample value plus 3 for the sample number.)

The reason for this paradox is that, half of the time, my input contains
more than 128 bits of entropy, and compressing it with MD5 is throwing
the excess away.  The naive Shannon entropy computation is averaging
that excess entropy with the low-entropy cases, which is not valid if
you are producing finite-length output.

The min-entropy measure of 1 bit per sample correctly predicts the
8-bit min-entropy of the output.
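
For anyone who wants to check the arithmetic, here is a short Python
sketch of the example.  The representation of the source as
(probability, count) pairs and the function names are my own choices
for illustration; the numbers are the ones from the text.

```python
import math

# Each entry is (probability of one outcome, number of outcomes that
# share that probability).
SOURCE = [
    (0.5, 1),               # the fixed value -1
    (2.0 ** -32, 2 ** 31),  # each of the 2^31 random 31-bit values
]

def shannon_entropy(dist):
    """Sum of -p * log2(p) over every outcome, in bits."""
    return sum(count * -p * math.log2(p) for p, count in dist)

def min_entropy(dist):
    """Negative log2 of the single most likely outcome, in bits."""
    return -math.log2(max(p for p, _ in dist))

SAMPLES = 8

def prob_k_random(k, n=SAMPLES):
    """Chance that exactly k of the n samples came from the random half."""
    return math.comb(n, k) / 2 ** n

print("Shannon entropy per sample:", shannon_entropy(SOURCE))
print("Min-entropy per sample:    ", min_entropy(SOURCE))
print("Shannon total for 8:       ", SAMPLES * shannon_entropy(SOURCE))
print("P(no random samples):      ", prob_k_random(0))
print("P(exactly one random):     ", prob_k_random(1))
```

The last two lines are the 1/256 and 1/32 cases discussed above; the
Shannon total says 132 bits while the min-entropy total says 8.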

VIA Velocity VLAN vexation

2007-03-22 Thread linux

I have a machine (x86-32, 2.6.20.3) with two ethernet interfaces:
a 100M Tulip and a 1G VIA Velocity.  Both are connected to a common
VLAN-capable switch.  The eventually desired configuration is VLAN
support on the Gbit interface.

If I set the Tulip's switch port to tagged, and configure a VLAN on the
Tulip interface appropriately, packets flow as expected.

But if I try the same configuration on the Velocity interface, things
don't work.
I can see tagged ICMP pings go out, but no responses come back.
I can see ARP requests and responses on the target machine.
If I manually configure the ARP caches, I can see the pings and responses
on the target machine.
If I kludge the target's ARP cache to point back to the source's Tulip
interface, I can see the ping responses on the Tulip interface.

But I don't see the ping responses on the Velocity interface.

The vlan interface name and address are the same in both setups, so it
can't be firewall rules distinguishing between them.

I have tried various ping sizes from 0 to 1472.


Is this likely to be a problem with the via-velocity driver?
Is anyone working on it?  Or should I just get a different gigabit card?

Thanks for any advice!

00:09.0 Ethernet controller [0200]: VIA Technologies, Inc. VT6120/VT6121/VT6122 Gigabit Ethernet Adapter [1106:3119] (rev 11)
00:0d.0 PCI bridge [0604]: Digital Equipment Corporation DECchip 21152 [1011:0024] (rev 03)
02:04.0 Ethernet controller [0200]: Digital Equipment Corporation DECchip 21142/43 [1011:0019] (rev 41)
02:05.0 Ethernet controller [0200]: Digital Equipment Corporation DECchip 21142/43 [1011:0019] (rev 41)
02:06.0 Ethernet controller [0200]: Digital Equipment Corporation DECchip 21142/43 [1011:0019] (rev 41)
02:07.0 Ethernet controller [0200]: Digital Equipment Corporation DECchip 21142/43 [1011:0019] (rev 41)

Re: VIA Velocity VLAN vexation

2007-03-23 Thread linux

>> Or should I just get a different gigabit card ?
>
> This one probably got answered the 2005/11/29. :o)

Ah, that's where I asked before.  I misplaced the e-mail.
I hope you don't mind my asking every year or two.

But I don't see any suggestions for an alternative gigabit
card anywhere.  I had assumed they all mostly worked, but
now it appears I need to know details.

> I'll got to bed in a few minutes but I'll happily resurrect the
> velocity vlan patches.

Haven't they been merged upstream already?


Anyway, thanks for the reply!

Re: RFC: Established connections hash function

2007-03-24 Thread linux

> Result with jenkins:
> 1 23880
> 2 12108
> 3 4040
> 4 1019
> 5 200
> 6 30
> 7 8
> 8 1
> 
> Xor:
> 1 65536

Precisely.  This means that the Xor hash SUCKS, because its output is
conspicuously non-random.

What you expect is a Poisson distribution, where the chance that a chain
will contain k elements is
	P(lambda,k) = e^-lambda * lambda^k / k!
lambda is the average loading per chain.  In your example, it's 1
(65536 inputs, 65536 outputs).
(^ is exponentiation, ! is factorial)

So the distribution I expect to get is:
 0 24109.347656
 1 24109.347656
 2 12054.673828
 3 4018.224609
 4 1004.556152
 5 200.911224
 6 33.485203
 7 4.783601
 8 0.597950
 9 0.066439
10 0.006644

Which looks a HELL of a lot like what you observed.
(The jenkins result above has 24250 chains with no entries.)
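
If you want to reproduce that table, a few lines of Python will do it.
This is just a sketch; the function name is mine, and the observed
counts are copied from the quoted jenkins result.

```python
import math

def expected_chains(n_keys, n_buckets, max_len):
    """Expected number of chains of each length 0..max_len under Poisson."""
    lam = n_keys / n_buckets  # average loading per chain
    return [n_buckets * math.exp(-lam) * lam ** k / math.factorial(k)
            for k in range(max_len + 1)]

N = 65536
OBSERVED_JENKINS = {1: 23880, 2: 12108, 3: 4040, 4: 1019,
                    5: 200, 6: 30, 7: 8, 8: 1}
# Chains not listed in the quoted result must be empty.
OBSERVED_JENKINS[0] = N - sum(OBSERVED_JENKINS.values())

expected = expected_chains(N, N, 10)
for k, e in enumerate(expected):
    print(f"{k:2d} {e:12.6f} {OBSERVED_JENKINS.get(k, 0):6d}")
```

The third column is the observed jenkins count for comparison.  For an
actual goodness-of-fit test you'd feed the two columns to something
like a chi-squared test rather than eyeballing them.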


Now, you can sometimes use properties of the inputs to get a distribution
that is more uniform than random, by letting the distribution of the input
"show through" the hash function.  Which the xor hash does.  But this
depends on making assumptions about the input distribution, which means
that you're assuming that they're not being chosen maliciously.

If an attacker is choosing maliciously, which is a required assumption
in today's Internet, the best you can do is random.

Now, the core Jenkins hash mix function basically takes three inputs.
What jhash_3words does with it is:
a += K
b += K
c += seed
__jhash_mix(a, b, c)
return c;

Now, the ipv4 hash operation fundamentally involves 96 bits of input,
which is a good match for jhash.  If you want to add a salt, perhaps the
simplest thing would be to just replace those constants K with a 96-bit
salt and be done with it:

a = (lport<<16) + rport + salt[0];
b = laddr + salt[1];
c = raddr + salt[2];
__jhash_mix(a,b,c)
return c;
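
Here is the same idea as a runnable Python sketch, for playing with
outside the kernel.  Assumptions, loudly: the mix step is written from
memory in the style of Bob Jenkins' lookup2 mix, so check the shift
constants against the kernel's jhash.h before trusting it; the names
jhash_mix and salted_conn_hash and the use of the secrets module for
the salt are mine, purely for illustration.

```python
import secrets

MASK = 0xFFFFFFFF  # emulate 32-bit unsigned arithmetic

def jhash_mix(a, b, c):
    """lookup2-style mix of three 32-bit words (constants from memory)."""
    a = (a - b - c) & MASK
    a ^= c >> 13
    b = (b - c - a) & MASK
    b ^= (a << 8) & MASK
    c = (c - a - b) & MASK
    c ^= b >> 13
    a = (a - b - c) & MASK
    a ^= c >> 12
    b = (b - c - a) & MASK
    b ^= (a << 16) & MASK
    c = (c - a - b) & MASK
    c ^= b >> 5
    a = (a - b - c) & MASK
    a ^= c >> 3
    b = (b - c - a) & MASK
    b ^= (a << 10) & MASK
    c = (c - a - b) & MASK
    c ^= b >> 15
    return a, b, c

def salted_conn_hash(laddr, raddr, lport, rport, salt):
    """Add a 96-bit salt (three 32-bit words) to the inputs, then mix."""
    a = ((lport << 16) + rport + salt[0]) & MASK
    b = (laddr + salt[1]) & MASK
    c = (raddr + salt[2]) & MASK
    return jhash_mix(a, b, c)[2]

# A fresh random salt, as would be chosen once at boot.
salt = tuple(secrets.randbits(32) for _ in range(3))
h = salted_conn_hash(0x0A010203, 0xC0000207, 80, 40000, salt)
print(hex(h), "-> bucket", h & (65536 - 1))
```

The addresses and ports in the usage lines are made-up values; the
bucket index is just the low 16 bits of the result.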

Regarding control by attackers, let's consider the four inputs and see
how much information an attacker can insert into each one:

remote port: An attacker has complete control over this.  16 bits.
remote address: Depends on the size of the bot-net.  Can vary from 0 bits
(one machine) to 20 bits for a large bot-net.
local address: Limited to the number of addresses the local machine has.
Typically 0 bits, rarely more than 2 bits.
May be much larger (8 bits or more) for stateful firewalls and
other sorts of proxies.
local port: Limited to the number of open server ports.  Typically 3-6
bits, but may be lower on heavily firewalled machines.

Certainly combining any two of these in a predictable way without some
non-linear salting makes an attacker's job easier.  While folding the
local and remote addresses before hashing is usually safe because the
local address is usually unique, there are situations in which there are
a large number of possible local addresses.

For example, it allows an attacker with a /24 to attack, say, a stateful
firewall guarding a /24.  If I have my machine at address a.b.c.d connect
to remote machine x.y.z.~d, then they always fold to a^x.b^y.c^z.255,
and I can, by making the local and remote ports identical for all the
attacks, force 254-entry hash chains without knowing anything else about
the hash function, salt, or whatever.
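
A quick Python sketch of that attack, to show the fold really does
collapse.  The two /24 prefixes are made-up example networks, and
fold() stands in for any scheme that XORs the two addresses together
before hashing.

```python
import ipaddress

def fold(addr1, addr2):
    """Combine two 32-bit addresses by XOR before hashing."""
    return addr1 ^ addr2

# Hypothetical prefixes: the attacker's /24 and the /24 behind the firewall.
ATTACKER_NET = int(ipaddress.IPv4Address("10.1.2.0"))
TARGET_NET = int(ipaddress.IPv4Address("192.0.2.0"))

# Pair attacker host d with target host ~d (low 8 bits), for d = 1..254.
pairs = [(ATTACKER_NET | d, TARGET_NET | (~d & 0xFF)) for d in range(1, 255)]
folded = {fold(src, dst) for src, dst in pairs}

print(len(pairs), "connections fold to", len(folded), "distinct value:")
print(ipaddress.IPv4Address(next(iter(folded))))
```

Every one of the 254 pairs folds to the same 32-bit value, so whatever
hash is applied afterwards, salted or not, sees identical input and
puts them all in one chain.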


An interesting question is whether it's better to mix salt into the
bits an attacker has most or least control over.  It's not immediately
obvious to me; does anyone else have insight?  Just mixing in 96 bits
over everything does seem to render the question moot.

"uli526x: I/O base is zero"

2006-11-04 Thread linux

I've got a rather awkward debugging situation.

I helped a friend in another city set up a dual-boot Linux/Windows box
a while ago, and it just got a motherboard upgrade.  Unfortunately, I
had followed my usual instincts and built a custom kernel which didn't
include the new motherboard's drivers.  If I can just get the network
working, I can log in remotely and get everything else going, but until
then, I have to instruct someone in kernel debugging over the telephone.

The motherboard is an MSI K9NU Neo-V ULi M1697 AM2 motherboard,
and PCI device 0000:00:12.0 is an M5263 Ethernet controller,
10b9:5263(rev 60).

It's an older but not ancient 2.6 kernel.  2.6.14, I think, although I
can't seem to find where I wrote it down.  The previous system, which
I set up and was running fine, was a single-core K8, socket 939, with
an nForce4 chipset.  The new one is dual-core and ULi M1697.  So there
are a lot of similarities.

Anyway, we got SMP enabled, and the uli526x driver enabled.  But the
network still didn't work.

Booting with uli526x.debug=1 produced 

uli526x: uli526x_init_one() 0
ACPI: PCI interrupt 0000:00:12.0[A]->GFI 20(level,low)->IRQ 50
uli526x: I/O base is zero

Where I'm confused is the "I/O base is zero" message.  Obviously,
this is a fatal error to the device initialization, but I'm not sure
what causes it.  The obvious "type the error message into google" only
produces a couple of disk images in Romania.

The next step is probably to make and use a more recent boot CD.
But just in case, can I ask:

Can any psychic wizard here suggest, from this very fragmentary
information, some simple thing I have overlooked that would cause this
problem?

Thank you!

Re: VIA "Velocity" test report - VLAN reception not working

2005-11-28 Thread linux

> Btw, you may consider using netdev@vger.kernel.org instead of
> the obsolete [EMAIL PROTECTED], especially as M. Cox is in India.

Oh!  I remember once making the opposite error and getting a bounce,
so the fact that netdev is NOT hosted at vger is stuck in my head.
I guess it's changed now.

(Of course, I have only recently gotten my fingers to stop auto-completing
"vger.rutgers.edu", so that might have been a while ago...)

> If you can put the card in a crashme/testme computer, feel free to try
> the patches at:
> http://www.zoreil.com/~romieu/linux/kernel/2.6.x/2.6.15-rc2/via-velocity/20051128

Neat, thanks!  Are they actually likely to mess up the host or make it
unstable, or are you just saying "hey, these are for TESTING, capiche?"

I currently have the card installed in a machine, and while it's a
production machine, taking it down to single-user at some odd hour
and doing a bit of testing is not really any more outage, and less
effort, than taking the machine down, removing the card, reconfiguring
the net without it, installing it in a different machine, testing there,
and reversing the process to put the card back.

Is it so bad there's serious concern it might corrupt a mounted file system?

> The patches apply on top of each other. I'd suggest doing a first round
> of testing without VLAN to check that the usual flow did not experience
> collateral damages.
> 
> If it works fine, enable VLAN when the last patch is applied and add
> a single vlan with vconfig. If it does not crash, tcpdump + ping in
> both direction w/wo VLAN may help fix the issues.

Great!  Tonight is not good, but I'll get to it soon.

Re: VIA "Velocity" test report - VLAN reception not working

2005-11-29 Thread linux

> I expect the worst behavior to simply translate into a mute interface
> with or without VLAN but...

Actually, I upgraded to 2.6.15-rc3 plus your patches, and the behaviour
is simply exactly the same.  (I've only compiled one 2.6.15-rc3 kernel,
so I can't have possibly booted the wrong one.)  Tcpdump on a tagged
port with the velocity driver:

# tcpdump -n -i eth0
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 96 bytes
03:54:26.437624 802.1d unknown version
03:54:28.438430 802.1d unknown version
03:54:30.438488 802.1d unknown version
03:54:32.442433 802.1d unknown version
03:54:33.446611 LACPv1, length: 110
03:54:34.55 802.1d unknown version
03:54:36.445276 802.1d unknown version
03:54:38.448251 802.1d unknown version
03:54:40.448322 802.1d unknown version
03:54:42.451189 802.1d unknown version

10 packets captured
20 packets received by filter
0 packets dropped by kernel

While listening on the same port with a Tulip-based 100baseT card:

# tcpdump -n -i eth1
tcpdump: WARNING: eth1: no IPv4 address assigned
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth1, link-type EN10MB (Ethernet), capture size 96 bytes
03:55:28.295843 vlan 2, p 0, arp who-has 192.35.100.95 (ff:ff:ff:ff:ff:ff) tell 192.35.100.92
03:55:28.492928 802.1d unknown version
22:55:28.546414 vlan 2, p 0, arp who-has 192.35.100.1 (ff:ff:ff:ff:ff:ff) tell 192.35.100.23
03:55:28.739570 vlan 2, p 0, arp who-has 192.35.100.1 (ff:ff:ff:ff:ff:ff) tell 192.35.100.110
03:55:28.748937 vlan 2, p 0, arp who-has 192.35.100.1 (ff:ff:ff:ff:ff:ff) tell 192.35.100.86
03:55:29.739545 vlan 2, p 0, arp who-has 192.35.100.1 (ff:ff:ff:ff:ff:ff) tell 192.35.100.110
03:55:29.748873 vlan 2, p 0, arp who-has 192.35.100.1 (ff:ff:ff:ff:ff:ff) tell 192.35.100.86
03:55:30.126945 vlan 2, p 0, IP 192.35.100.23.41946 > 198.69.104.19.53:  28350+ PTR? 59.100.35.192.in-addr.arpa. (44)
03:55:30.174782 vlan 2, p 0, arp who-has 192.35.100.1 tell 192.35.100.3
03:55:30.494294 802.1d unknown version
03:55:30.739527 vlan 2, p 0, arp who-has 192.35.100.1 (ff:ff:ff:ff:ff:ff) tell 192.35.100.110
03:55:30.748865 vlan 2, p 0, arp who-has 192.35.100.1 (ff:ff:ff:ff:ff:ff) tell 192.35.100.86
03:55:30.872635 vlan 2, p 0, arp who-has 192.35.100.1 (ff:ff:ff:ff:ff:ff) tell 192.35.100.38
03:55:31.174739 vlan 2, p 0, arp who-has 192.35.100.1 tell 192.35.100.3
03:55:31.228356 vlan 2, p 0, arp who-has 192.35.100.1 tell 192.35.100.59
03:55:31.739538 vlan 2, p 0, arp who-has 192.35.100.1 (ff:ff:ff:ff:ff:ff) tell 192.35.100.110
03:55:31.748865 vlan 2, p 0, arp who-has 192.35.100.1 (ff:ff:ff:ff:ff:ff) tell 192.35.100.86
03:55:31.755004 vlan 2, p 0, arp who-has 192.35.100.95 (ff:ff:ff:ff:ff:ff) tell 192.35.100.92
03:55:31.872379 vlan 2, p 0, arp who-has 192.35.100.1 (ff:ff:ff:ff:ff:ff) tell 192.35.100.38

19 packets captured
38 packets received by filter
0 packets dropped by kernel

Basically, all the vlan packets disappear, even during promiscuous
receive.

Thanks for trying, though!  I'm running your patches untagged at the
moment with no obvious problems.

4.8.0-rc1: page allocation failure: order:3, mode:0x2084020(GFP_ATOMIC|__GFP_COMP)

2016-08-09 Thread linux


L.S.,

Just tested 4.8.0-rc1, but i get the stack trace below, everything seems 
to continue fine afterwards though
(haven't tried to bisect it yet, hopefully someone has an insight 
without having to go through that :) )

My network config consists of a bridge and NAT.

--
Sander

[10469.336815] swapper/0: page allocation failure: order:3, 
mode:0x2084020(GFP_ATOMIC|__GFP_COMP)
[10469.336820] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 
4.8.0-rc1-20160808-linus-doflr+ #1
[10469.336821] Hardware name: MSI MS-7640/890FXA-GD70 (MS-7640)  , BIOS 
V1.8B1 09/13/2010
[10469.336825]   88005f603228 81456ca5 

[10469.336828]  0003 88005f6032b0 811633ed 
020840205fd0f000
[10469.336830]   88005f603278 02084028 
00035fd0f500

[10469.336832] Call Trace:
[10469.336834][] dump_stack+0x87/0xb2
[10469.336845]  [] warn_alloc_failed+0xdd/0x140
[10469.336847]  [] __alloc_pages_nodemask+0x3e1/0xcf0
[10469.336851]  [] ? check_preempt_curr+0x4f/0x90
[10469.336852]  [] ? ttwu_do_wakeup+0x12/0x90
[10469.336855]  [] alloc_pages_current+0x8d/0x110
[10469.336857]  [] kmalloc_order+0x1f/0x70
[10469.336859]  [] __kmalloc+0x129/0x140
[10469.336861]  [] bucket_table_alloc+0xc1/0x1d0
[10469.336862]  [] rhashtable_insert_rehash+0x5d/0xe0
[10469.336865]  [] ? __nf_nat_l4proto_find+0x20/0x20
[10469.336866]  [] nf_nat_setup_info+0x2ef/0x400
[10469.336869]  [] nf_nat_masquerade_ipv4+0xd5/0x100
[10469.336870]  [] masquerade_tg+0x32/0x40
[10469.336872]  [] ipt_do_table+0x29e/0x3b0
[10469.336873]  [] iptable_nat_do_chain+0x1a/0x20
[10469.336875]  [] nf_nat_ipv4_fn+0x12f/0x1e0
[10469.336876]  [] ? iptable_nat_ipv4_fn+0x20/0x20
[10469.336877]  [] nf_nat_ipv4_out+0x37/0x40
[10469.336878]  [] iptable_nat_ipv4_out+0x10/0x20
[10469.336880]  [] nf_iterate+0x58/0x70
[10469.336881]  [] nf_hook_slow+0x5f/0xb0
[10469.336884]  [] ip_output+0xb5/0xd0
[10469.336886]  [] ? 
ip_fragment.constprop.43+0x80/0x80

[10469.336887]  [] ip_forward_finish+0x3b/0x60
[10469.336888]  [] ip_forward+0x2c8/0x390
[10469.336890]  [] ? ip_frag_mem+0x40/0x40
[10469.336891]  [] ip_rcv_finish+0x1b5/0x3a0
[10469.336892]  [] ip_rcv+0x279/0x380
[10469.336895]  [] ? skb_copy_ubufs+0xf2/0x290
[10469.336896]  [] ? 
ip_local_deliver_finish+0x120/0x120
[10469.336898]  [] 
__netif_receive_skb_core+0x2d2/0x9e0

[10469.336900]  [] __netif_receive_skb+0x11/0x70
[10469.336901]  [] 
netif_receive_skb_internal+0x1e/0x80

[10469.336902]  [] ? nf_hook_slow+0x5f/0xb0
[10469.336906]  [] netif_receive_skb+0x9/0x10
[10469.336910]  [] br_pass_frame_up+0x6e/0xe0
[10469.336911]  [] ? 
__br_handle_local_finish+0x40/0x40

[10469.336913]  [] br_handle_frame_finish+0x123/0x4a0
[10469.336914]  [] ? nf_nat_ipv4_fn+0x18e/0x1e0
[10469.336916]  [] 
br_nf_pre_routing_finish+0x183/0x380

[10469.336918]  [] ? br_pass_frame_up+0xe0/0xe0
[10469.336919]  [] br_nf_pre_routing+0x2b2/0x390
[10469.336920]  [] ? br_nf_forward_ip+0x410/0x410
[10469.336921]  [] nf_iterate+0x58/0x70
[10469.336922]  [] nf_hook_slow+0x5f/0xb0
[10469.336924]  [] br_handle_frame+0x1ce/0x2d0
[10469.336926]  [] ? br_pass_frame_up+0xe0/0xe0
[10469.336927]  [] ? br_handle_local_finish+0x40/0x40
[10469.336928]  [] 
__netif_receive_skb_core+0x12b/0x9e0

[10469.336932]  [] ? set_phys_to_machine+0x14/0x40
[10469.336934]  [] ? 
set_foreign_p2m_mapping+0x1a0/0x3a0

[10469.336935]  [] __netif_receive_skb+0x11/0x70
[10469.336937]  [] 
netif_receive_skb_internal+0x1e/0x80

[10469.336939]  [] netif_receive_skb+0x9/0x10
[10469.336941]  [] xenvif_tx_action+0x693/0x820
[10469.336944]  [] ? 
__handle_irq_event_percpu+0x31/0x100

[10469.336945]  [] xenvif_poll+0x29/0x70
[10469.336949]  [] ? do_raw_spin_unlock+0x55/0xa0
[10469.336950]  [] net_rx_action+0x211/0x320
[10469.336953]  [] __do_softirq+0x103/0x210
[10469.336955]  [] irq_exit+0x4b/0xa0
[10469.336957]  [] xen_evtchn_do_upcall+0x30/0x40
[10469.336961]  [] 
xen_do_hypervisor_callback+0x1e/0x40
[10469.336962][] ? 
xen_hypercall_sched_op+0xa/0x20

[10469.336965]  [] ? xen_hypercall_sched_op+0xa/0x20
[10469.336967]  [] ? xen_safe_halt+0x10/0x20
[10469.336970]  [] ? default_idle+0x13/0x20
[10469.336971]  [] ? arch_cpu_idle+0xa/0x10
[10469.336973]  [] ? default_idle_call+0x2e/0x50
[10469.336974]  [] ? cpu_startup_entry+0x256/0x2c0
[10469.336975]  [] ? rest_init+0x72/0x80
[10469.336979]  [] ? start_kernel+0x410/0x41d
[10469.336981]  [] ? 
x86_64_start_reservations+0x2f/0x31

[10469.336983]  [] ? xen_start_kernel+0x547/0x553
[10469.336984] Mem-Info:
[10469.336989] active_anon:55875 inactive_anon:70482 isolated_anon:0
 active_file:71530 inactive_file:77473 isolated_file:0
 unevictable:601 dirty:3644 writeback:0 unstable:0
 slab_reclaimable:23415 slab_unreclaimable:11066
 mapped:7925 shmem:322 pagetables:2756 bounce:0
 free:11987 free_pcp:747 free_cma:0
[10469.336993] Node 0 active_anon:223500kB inactive_anon:281928kB 
active_file:286120kB inactive_file:309892kB unevictable:2404kB 
isolated(anon):0kB isolated(file):0

Xen-unstable + Linux 4.2-rc4: GPF RIP: e030:[] [] detach_if_pending+0x18/0x80

2015-07-29 Thread linux


Hi,

Running a 4.2-rc4 kernel on Xen for testing, i got the crash below.

Could this be related to the changes in 
3bb475a3446facd0425d3f2fe7e85bf03c5c6c05 ?


It crashes dom0 when i put some strain onto the network + bridge.

--
Sander


[ 2108.078763] general protection fault:  [#1] SMP
[ 2108.102839] Modules linked in:
[ 2108.121598] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 
4.2.0-rc4-20150728-linus-noipv6-doflr+ #1
[ 2108.157188] Hardware name: MSI MS-7640/890FXA-GD70 (MS-7640)  , BIOS 
V1.8B1 09/13/2010
[ 2108.190430] task: 8221a580 ti: 8220 task.ti: 
8220
[ 2108.222309] RIP: e030:[]  [] 
detach_if_pending+0x18/0x80

[ 2108.257037] RSP: e02b:88005f603818  EFLAGS: 00010086
[ 2108.282424] RAX: 88005f6cf410 RBX: 8800511a6a60 RCX: 
dead00200200
[ 2108.313210] RDX:  RSI: 88005f60e5c0 RDI: 
8800511a6a60
[ 2108.344013] RBP: 88005f603818 R08: 0001 R09: 
0001
[ 2108.374795] R10: 0003 R11: 8800511a69c0 R12: 

[ 2108.405446] R13: 000100098b49 R14: 00015f90 R15: 
88005f60e5c0
[ 2108.435784] FS:  7fe7e790d700() GS:88005f60() 
knlGS:

[ 2108.469081] CS:  e033 DS:  ES:  CR0: 8005003b
[ 2108.495152] CR2: 017eeca0 CR3: 04c3 CR4: 
0660

[ 2108.525520] Stack:
[ 2108.540406]  88005f603868 8110edbf 810fb1e1 
0200
[ 2108.571288]  0003 8800511a69c0  
880004d5d600
[ 2108.602370]  00015f90  88005f603898 
819b3ad3

[ 2108.633153] Call Trace:
[ 2108.649088]  
[ 2108.654784]  [] mod_timer_pending+0x3f/0xe0
[ 2108.689320]  [] ? 
__raw_callee_save___pv_queued_spin_unlock+0x11/0x20

[ 2108.721899]  [] __nf_ct_refresh_acct+0xa3/0xb0
[ 2108.748221]  [] tcp_packet+0xb3b/0x1290
[ 2108.772816]  [] ? br_forward_finish+0x25/0x80
[ 2108.798706]  [] ? irq_to_desc+0x12/0x20
[ 2108.822802]  [] ? __local_bh_enable_ip+0x2a/0x90
[ 2108.849145]  [] ? 
__nf_conntrack_find_get+0x129/0x2a0

[ 2108.876796]  [] nf_conntrack_in+0x29c/0x7c0
[ 2108.901830]  [] ipv4_conntrack_in+0x21/0x30
[ 2108.926819]  [] nf_iterate+0x4c/0x80
[ 2108.949863]  [] nf_hook_slow+0x64/0xc0
[ 2108.973297]  [] br_nf_pre_routing+0x33c/0x350
[ 2108.34]  [] ? br_nf_forward_ip+0x3d0/0x3d0
[ 2109.025256]  [] nf_iterate+0x4c/0x80
[ 2109.047697]  [] nf_hook_slow+0x64/0xc0
[ 2109.070579]  [] br_handle_frame+0x190/0x270
[ 2109.094725]  [] ? br_handle_local_finish+0x50/0x50
[ 2109.120526]  [] ? 
br_handle_frame_finish+0x4b0/0x4b0
[ 2109.146523]  [] 
__netif_receive_skb_core+0x12b/0x970
[ 2109.172583]  [] ? 
set_foreign_p2m_mapping+0x19d/0x3a0

[ 2109.198956]  [] __netif_receive_skb+0x15/0x70
[ 2109.223214]  [] 
netif_receive_skb_internal+0x1e/0x80

[ 2109.249274]  [] netif_receive_skb_sk+0xc/0x10
[ 2109.273450]  [] xenvif_tx_action+0x6a9/0x830
[ 2109.297488]  [] ? rtl8169_poll+0x8d/0x600
[ 2109.320699]  [] xenvif_poll+0x29/0x70
[ 2109.342620]  [] net_rx_action+0x1f7/0x300
[ 2109.365337]  [] __do_softirq+0x103/0x210
[ 2109.387677]  [] irq_exit+0x4b/0xa0
[ 2109.408519]  [] xen_evtchn_do_upcall+0x34/0x50
[ 2109.432419]  [] 
xen_do_hypervisor_callback+0x1e/0x40

[ 2109.457772]  
[ 2109.463493]  [] ? xen_hypercall_sched_op+0xa/0x20
[ 2109.494059]  [] ? xen_hypercall_sched_op+0xa/0x20
[ 2109.518541]  [] ? xen_safe_halt+0x10/0x20
[ 2109.540952]  [] ? default_idle+0x13/0x20
[ 2109.563055]  [] ? arch_cpu_idle+0xa/0x10
[ 2109.585071]  [] ? default_idle_call+0x2e/0x50
[ 2109.608257]  [] ? cpu_startup_entry+0x272/0x2e0
[ 2109.631954]  [] ? rest_init+0x77/0x80
[ 2109.652953]  [] ? start_kernel+0x442/0x44f
[ 2109.675099]  [] ? 
x86_64_start_reservations+0x2a/0x2c

[ 2109.700145]  [] ? xen_start_kernel+0x550/0x55c
[ 2109.723264] Code: 77 28 5d c3 66 66 66 66 66 66 2e 0f 1f 84 00 00 00 
00 00 48 8b 47 08 55 48 89 e5 48 85 c0 74 6a 48 8b 0f 48 85 c9 48 89 08 
74 04 <48> 89 41 08 84 d2 74 08 48 c7 47 08 00 00 00 00 f6 47 2a 10 48

[ 2109.789794] RIP  [] detach_if_pending+0x18/0x80
[ 2109.813416]  RSP 
[ 2109.829199] ---[ end trace 042bd0c1a92729d3 ]---
[ 2109.848376] Kernel panic - not syncing: Fatal exception in interrupt
[ 2109.872793] Kernel Offset: disabled

Linux 4.2-rc6 regression: RIP: e030:[] [] detach_if_pending+0x18/0x80

2015-08-12 Thread linux


Hi,

On my box running Xen with a 4.2-rc6 kernel i still get this splat in
dom0, which crashes the box.
(i reported a similar splat before (at rc4) here,
http://www.spinics.net/lists/netdev/msg337570.html)


Never seen this one on 4.1, so it seems a regression.

--
Sander


[81133.193439] general protection fault:  [#1] SMP
[81133.204284] Modules linked in:
[81133.214934] CPU: 0 PID: 3 Comm: ksoftirqd/0 Not tainted 
4.2.0-rc6-20150811-linus-doflr+ #1
[81133.225632] Hardware name: MSI MS-7640/890FXA-GD70 (MS-7640)  , BIOS 
V1.8B1 09/13/2010
[81133.236237] task: 880059b91580 ti: 880059bb4000 task.ti: 
880059bb4000
[81133.246808] RIP: e030:[]  [] 
detach_if_pending+0x18/0x80

[81133.257354] RSP: e02b:880059bb7848  EFLAGS: 00010086
[81133.267749] RAX: 88004eddc7f0 RBX: 88000e20ae08 RCX: 
dead00200200
[81133.278201] RDX:  RSI: 88005f60e600 RDI: 
88000e20ae08
[81133.288723] RBP: 880059bb7848 R08: 0001 R09: 
0001
[81133.298930] R10: 0003 R11: 88000e20ad68 R12: 

[81133.308875] R13: 000101735569 R14: 00015f90 R15: 
88005f60e600
[81133.318845] FS:  7f28c6f7c800() GS:88005f60() 
knlGS:

[81133.328864] CS:  e033 DS:  ES:  CR0: 8005003b
[81133.338693] CR2: 807f6800 CR3: 3d55c000 CR4: 
0660

[81133.348462] Stack:
[81133.358005]  880059bb7898 8110fe3f 810fc261 
0200
[81133.367682]  0003 88000e20ad68  
88005854d400
[81133.377064]  00015f90  880059bb78c8 
819b5243

[81133.386374] Call Trace:
[81133.395596]  [] mod_timer_pending+0x3f/0xe0
[81133.404999]  [] ? 
__raw_callee_save___pv_queued_spin_unlock+0x11/0x20

[81133.414255]  [] __nf_ct_refresh_acct+0xa3/0xb0
[81133.423137]  [] tcp_packet+0xb3b/0x1290
[81133.431894]  [] ? __local_bh_enable_ip+0x2a/0x90
[81133.440622]  [] ? 
__nf_conntrack_find_get+0x129/0x2a0

[81133.449339]  [] nf_conntrack_in+0x29c/0x7c0
[81133.457940]  [] ipv4_conntrack_in+0x21/0x30
[81133.466296]  [] nf_iterate+0x4c/0x80
[81133.474401]  [] nf_hook_slow+0x64/0xc0
[81133.482615]  [] ip_rcv+0x2ec/0x380
[81133.490781]  [] ? 
ip_local_deliver_finish+0x130/0x130
[81133.498790]  [] 
__netif_receive_skb_core+0x2a0/0x970

[81133.506714]  [] ? inet_gro_receive+0x1c8/0x200
[81133.514609]  [] __netif_receive_skb+0x15/0x70
[81133.522333]  [] 
netif_receive_skb_internal+0x1e/0x80

[81133.529840]  [] napi_gro_receive+0x6b/0x90
[81133.537173]  [] rtl8169_poll+0x2e6/0x600
[81133.54]  [] ? 
__raw_callee_save___pv_queued_spin_unlock+0x11/0x20

[81133.551566]  [] net_rx_action+0x1f7/0x300
[81133.558412]  [] __do_softirq+0x103/0x210
[81133.565353]  [] run_ksoftirqd+0x37/0x60
[81133.572359]  [] smpboot_thread_fn+0x130/0x190
[81133.579215]  [] ? sort_range+0x20/0x20
[81133.586042]  [] kthread+0xee/0x110
[81133.592792]  [] ? 
kthread_create_on_node+0x1b0/0x1b0

[81133.599694]  [] ret_from_fork+0x3f/0x70
[81133.606662]  [] ? 
kthread_create_on_node+0x1b0/0x1b0
[81133.613445] Code: 77 28 5d c3 66 66 66 66 66 66 2e 0f 1f 84 00 00 00 
00 00 48 8b 47 08 55 48 89 e5 48 85 c0 74 6a 48 8b 0f 48 85 c9 48 89 08 
74 04 <48> 89 41 08 84 d2 74 08 48 c7 47 08 00 00 00 00 f6 47 2a 10 48

[81133.627196] RIP  [] detach_if_pending+0x18/0x80
[81133.634036]  RSP 
[81133.640817] ---[ end trace eaf596e1fcf6a591 ]---
[81133.647521] Kernel panic - not syncing: Fatal exception in interrupt

Re: Linux 4.2-rc6 regression: RIP: e030:[] [] detach_if_pending+0x18/0x80

2015-08-12 Thread linux


On 2015-08-12 22:41, Eric Dumazet wrote:

On Wed, 2015-08-12 at 21:19 +0200, li...@eikelenboom.it wrote:

Hi,

On my box running Xen with a 4.2-rc6 kernel i still get this splat in
dom0,
which crashes the box.
(i reported a similar splat before (at rc4) here,
http://www.spinics.net/lists/netdev/msg337570.html)

Never seen this one on 4.1, so it seems a regression.

--
Sander


[81133.193439] general protection fault:  [#1] SMP
[81133.204284] Modules linked in:
[81133.214934] CPU: 0 PID: 3 Comm: ksoftirqd/0 Not tainted
4.2.0-rc6-20150811-linus-doflr+ #1
[81133.225632] Hardware name: MSI MS-7640/890FXA-GD70 (MS-7640)  , 
BIOS

V1.8B1 09/13/2010
[81133.236237] task: 880059b91580 ti: 880059bb4000 task.ti:
880059bb4000
[81133.246808] RIP: e030:[]  []
detach_if_pending+0x18/0x80
[81133.257354] RSP: e02b:880059bb7848  EFLAGS: 00010086
[81133.267749] RAX: 88004eddc7f0 RBX: 88000e20ae08 RCX:
dead00200200
[81133.278201] RDX:  RSI: 88005f60e600 RDI:
88000e20ae08
[81133.288723] RBP: 880059bb7848 R08: 0001 R09:
0001
[81133.298930] R10: 0003 R11: 88000e20ad68 R12:

[81133.308875] R13: 000101735569 R14: 00015f90 R15:
88005f60e600
[81133.318845] FS:  7f28c6f7c800() GS:88005f60()
knlGS:
[81133.328864] CS:  e033 DS:  ES:  CR0: 8005003b
[81133.338693] CR2: 807f6800 CR3: 3d55c000 CR4: 0660
[81133.348462] Stack:
[81133.358005]  880059bb7898 8110fe3f 810fc261 0200
[81133.367682]  0003 88000e20ad68  88005854d400
[81133.377064]  00015f90  880059bb78c8 819b5243
[81133.386374] Call Trace:
[81133.395596]  [] mod_timer_pending+0x3f/0xe0
[81133.404999]  [] ? __raw_callee_save___pv_queued_spin_unlock+0x11/0x20
[81133.414255]  [] __nf_ct_refresh_acct+0xa3/0xb0
[81133.423137]  [] tcp_packet+0xb3b/0x1290
[81133.431894]  [] ? __local_bh_enable_ip+0x2a/0x90
[81133.440622]  [] ? __nf_conntrack_find_get+0x129/0x2a0
[81133.449339]  [] nf_conntrack_in+0x29c/0x7c0
[81133.457940]  [] ipv4_conntrack_in+0x21/0x30
[81133.466296]  [] nf_iterate+0x4c/0x80
[81133.474401]  [] nf_hook_slow+0x64/0xc0
[81133.482615]  [] ip_rcv+0x2ec/0x380
[81133.490781]  [] ? ip_local_deliver_finish+0x130/0x130
[81133.498790]  [] __netif_receive_skb_core+0x2a0/0x970
[81133.506714]  [] ? inet_gro_receive+0x1c8/0x200
[81133.514609]  [] __netif_receive_skb+0x15/0x70
[81133.522333]  [] netif_receive_skb_internal+0x1e/0x80
[81133.529840]  [] napi_gro_receive+0x6b/0x90
[81133.537173]  [] rtl8169_poll+0x2e6/0x600
[81133.54]  [] ? __raw_callee_save___pv_queued_spin_unlock+0x11/0x20
[81133.551566]  [] net_rx_action+0x1f7/0x300
[81133.558412]  [] __do_softirq+0x103/0x210
[81133.565353]  [] run_ksoftirqd+0x37/0x60
[81133.572359]  [] smpboot_thread_fn+0x130/0x190
[81133.579215]  [] ? sort_range+0x20/0x20
[81133.586042]  [] kthread+0xee/0x110
[81133.592792]  [] ? kthread_create_on_node+0x1b0/0x1b0
[81133.599694]  [] ret_from_fork+0x3f/0x70
[81133.606662]  [] ? kthread_create_on_node+0x1b0/0x1b0
[81133.613445] Code: 77 28 5d c3 66 66 66 66 66 66 2e 0f 1f 84 00 00 00 00 00 48 8b 47 08 55 48 89 e5 48 85 c0 74 6a 48 8b 0f 48 85 c9 48 89 08 74 04 <48> 89 41 08 84 d2 74 08 48 c7 47 08 00 00 00 00 f6 47 2a 10 48
[81133.627196] RIP  [] detach_if_pending+0x18/0x80
[81133.634036]  RSP 
[81133.640817] ---[ end trace eaf596e1fcf6a591 ]---
[81133.647521] Kernel panic - not syncing: Fatal exception in interrupt


This looks like the bug fixed in David Miller's net tree:

http://git.kernel.org/cgit/linux/kernel/git/davem/net.git/commit/?id=2235f2ac75fd2501c251b0b699a9632e80239a6d


I will pull in the net tree and re-test.
Since it only seems to crash after a day or two, that will take some
time.


Thanks,

Sander

[PATCH net-next 0/3] vhost_net: access ptr ring using tap recvmsg

2019-10-11 Thread prashantbhole . linux

From: Prashant Bhole 

vhost_net needs to peek at tun packet sizes to allocate virtio buffers.
Currently it accesses the tap ptr ring directly to do so. Jason Wang
suggested achieving this through msghdr->msg_control and by modifying
the behavior of tap recvmsg.

This change will be useful in the future for virtio-net XDP offload,
where packets will be XDP processed in tap recvmsg and vhost will see
only packets that were not XDP_DROP'ed.

Patch 1: reorganizes tun_msg_ctl so that it can be extended by
 means of different commands. tap sendmsg and recvmsg will behave
 according to the command.

Patch 2: modifies the recvmsg implementation to produce packet pointers.
 vhost_net uses the recvmsg API instead of ptr_ring_consume().

Patch 3: removes ptr ring usage in vhost and the functions that export
 the ptr ring from tun/tap.

Prashant Bhole (3):
  tuntap: reorganize tun_msg_ctl usage
  vhost_net: user tap recvmsg api to access ptr ring
  tuntap: remove usage of ptr ring in vhost_net

 drivers/net/tap.c  | 44 ++-
 drivers/net/tun.c  | 45 +++-
 drivers/vhost/net.c| 79 ++
 include/linux/if_tun.h |  9 +++--
 4 files changed, 103 insertions(+), 74 deletions(-)

-- 
2.21.0

[PATCH net-next 1/3] tuntap: reorganize tun_msg_ctl usage

2019-10-11 Thread prashantbhole . linux

From: Prashant Bhole 

In order to extend the usage of the tun_msg_ctl structure, this patch
renames its member from type to cmd. The following definitions
are also renamed:
TUN_MSG_PTR : TUN_CMD_BATCH
TUN_MSG_UBUF: TUN_CMD_PACKET

Signed-off-by: Prashant Bhole 
---
 drivers/net/tap.c  | 9 ++---
 drivers/net/tun.c  | 8 ++--
 drivers/vhost/net.c| 4 ++--
 include/linux/if_tun.h | 6 +++---
 4 files changed, 17 insertions(+), 10 deletions(-)

diff --git a/drivers/net/tap.c b/drivers/net/tap.c
index 3ae70c7e6860..01bd260ce60c 100644
--- a/drivers/net/tap.c
+++ b/drivers/net/tap.c
@@ -1213,9 +1213,10 @@ static int tap_sendmsg(struct socket *sock, struct msghdr *m,
struct tap_queue *q = container_of(sock, struct tap_queue, sock);
struct tun_msg_ctl *ctl = m->msg_control;
struct xdp_buff *xdp;
+   void *ptr = NULL;
int i;
 
-   if (ctl && (ctl->type == TUN_MSG_PTR)) {
+   if (ctl && ctl->cmd == TUN_CMD_BATCH) {
for (i = 0; i < ctl->num; i++) {
xdp = &((struct xdp_buff *)ctl->ptr)[i];
tap_get_user_xdp(q, xdp);
@@ -1223,8 +1224,10 @@ static int tap_sendmsg(struct socket *sock, struct msghdr *m,
return 0;
}
 
-   return tap_get_user(q, ctl ? ctl->ptr : NULL, &m->msg_iter,
-   m->msg_flags & MSG_DONTWAIT);
+   if (ctl && ctl->cmd == TUN_CMD_PACKET)
+   ptr = ctl->ptr;
+
+   return tap_get_user(q, ptr, &m->msg_iter, m->msg_flags & MSG_DONTWAIT);
 }
 
 static int tap_recvmsg(struct socket *sock, struct msghdr *m,
diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index 0413d182d782..29711671959b 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -2529,11 +2529,12 @@ static int tun_sendmsg(struct socket *sock, struct msghdr *m, size_t total_len)
struct tun_struct *tun = tun_get(tfile);
struct tun_msg_ctl *ctl = m->msg_control;
struct xdp_buff *xdp;
+   void *ptr = NULL;
 
if (!tun)
return -EBADFD;
 
-   if (ctl && (ctl->type == TUN_MSG_PTR)) {
+   if (ctl && ctl->cmd == TUN_CMD_BATCH) {
struct tun_page tpage;
int n = ctl->num;
int flush = 0;
@@ -2560,7 +2561,10 @@ static int tun_sendmsg(struct socket *sock, struct msghdr *m, size_t total_len)
goto out;
}
 
-   ret = tun_get_user(tun, tfile, ctl ? ctl->ptr : NULL, &m->msg_iter,
+   if (ctl && ctl->cmd == TUN_CMD_PACKET)
+   ptr = ctl->ptr;
+
+   ret = tun_get_user(tun, tfile, ptr, &m->msg_iter,
   m->msg_flags & MSG_DONTWAIT,
   m->msg_flags & MSG_MORE);
 out:
diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index 1a2dd53caade..5946d2775bd0 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -462,7 +462,7 @@ static void vhost_tx_batch(struct vhost_net *net,
   struct msghdr *msghdr)
 {
struct tun_msg_ctl ctl = {
-   .type = TUN_MSG_PTR,
+   .cmd = TUN_CMD_BATCH,
.num = nvq->batched_xdp,
.ptr = nvq->xdp,
};
@@ -902,7 +902,7 @@ static void handle_tx_zerocopy(struct vhost_net *net, struct socket *sock)
ubuf->desc = nvq->upend_idx;
refcount_set(&ubuf->refcnt, 1);
msg.msg_control = &ctl;
-   ctl.type = TUN_MSG_UBUF;
+   ctl.cmd = TUN_CMD_PACKET;
    ctl.ptr = ubuf;
msg.msg_controllen = sizeof(ctl);
    ubufs = nvq->ubufs;
diff --git a/include/linux/if_tun.h b/include/linux/if_tun.h
index 5bda8cf457b6..bdfa671612db 100644
--- a/include/linux/if_tun.h
+++ b/include/linux/if_tun.h
@@ -11,10 +11,10 @@
 
 #define TUN_XDP_FLAG 0x1UL
 
-#define TUN_MSG_UBUF 1
-#define TUN_MSG_PTR  2
+#define TUN_CMD_PACKET 1
+#define TUN_CMD_BATCH  2
 struct tun_msg_ctl {
-   unsigned short type;
+   unsigned short cmd;
unsigned short num;
void *ptr;
 };
-- 
2.21.0
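
A minimal user-space sketch of the dispatch rule that patch 1/3 introduces in tap_sendmsg() and tun_sendmsg(). The struct layout and the two TUN_CMD_* values are copied from the if_tun.h hunk above; select_packet_ptr() is a hypothetical helper name used only for illustration, and the real functions go on to pass the selected pointer to tap_get_user() or tun_get_user().

```c
#include <assert.h>
#include <stddef.h>

/* Values and layout as in the include/linux/if_tun.h hunk of patch 1/3. */
#define TUN_CMD_PACKET 1
#define TUN_CMD_BATCH  2

struct tun_msg_ctl {
	unsigned short cmd;
	unsigned short num;
	void *ptr;
};

/*
 * Hypothetical helper: return the per-packet pointer only when the
 * control block explicitly carries TUN_CMD_PACKET. A missing control
 * block, or any other command, yields NULL.
 */
static void *select_packet_ptr(const struct tun_msg_ctl *ctl)
{
	if (ctl && ctl->cmd == TUN_CMD_PACKET)
		return ctl->ptr;
	return NULL;
}
```

Before the rename, the ptr of any control block that was not TUN_MSG_PTR was passed through; with this rule only TUN_CMD_PACKET is, which leaves room for additional commands.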

[PATCH net-next 2/3] vhost_net: user tap recvmsg api to access ptr ring

2019-10-11 Thread prashantbhole . linux

From: Prashant Bhole 

Currently vhost_net directly accesses the ptr ring of the tap driver to
fetch Rx packet pointers. To avoid this, this patch modifies the tap
driver's recvmsg API to perform the additional task of fetching Rx packet
pointers.

A special struct tun_msg_ctl is already being used via msg_control
for tun Rx XDP batching. This patch extends tun_msg_ctl usage to
send sub-commands to the recvmsg API. recvmsg can now produce/unproduce
pointers from the ptr ring as an additional task.

This will be useful in the future for implementing the virtio-net XDP
offload feature, where packets will be XDP batch processed in
tun_recvmsg.

Signed-off-by: Prashant Bhole 
---
 drivers/net/tap.c  | 22 +++-
 drivers/net/tun.c  | 24 +-
 drivers/vhost/net.c| 46 +-
 include/linux/if_tun.h |  3 +++
 4 files changed, 83 insertions(+), 12 deletions(-)

diff --git a/drivers/net/tap.c b/drivers/net/tap.c
index 01bd260ce60c..3d0bf382dbbc 100644
--- a/drivers/net/tap.c
+++ b/drivers/net/tap.c
@@ -1234,8 +1234,28 @@ static int tap_recvmsg(struct socket *sock, struct msghdr *m,
   size_t total_len, int flags)
 {
struct tap_queue *q = container_of(sock, struct tap_queue, sock);
-   struct sk_buff *skb = m->msg_control;
+   struct tun_msg_ctl *ctl = m->msg_control;
+   struct sk_buff *skb = NULL;
int ret;
+
+   if (ctl) {
+   switch (ctl->cmd) {
+   case TUN_CMD_PACKET:
+   skb = ctl->ptr;
+   break;
+   case TUN_CMD_PRODUCE_PTRS:
+   return ptr_ring_consume_batched(&q->ring,
+   ctl->ptr_array,
+   ctl->num);
+   case TUN_CMD_UNPRODUCE_PTRS:
+   ptr_ring_unconsume(&q->ring, ctl->ptr_array, ctl->num,
+  tun_ptr_free);
+   return 0;
+   default:
+   return -EINVAL;
+   }
+   }
+
if (flags & ~(MSG_DONTWAIT|MSG_TRUNC)) {
kfree_skb(skb);
return -EINVAL;
diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index 29711671959b..7d4886f53389 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -2577,7 +2577,8 @@ static int tun_recvmsg(struct socket *sock, struct msghdr *m, size_t total_len,
 {
struct tun_file *tfile = container_of(sock, struct tun_file, socket);
struct tun_struct *tun = tun_get(tfile);
-   void *ptr = m->msg_control;
+   struct tun_msg_ctl *ctl = m->msg_control;
+   void *ptr = NULL;
int ret;
 
if (!tun) {
@@ -2585,6 +2586,27 @@ static int tun_recvmsg(struct socket *sock, struct msghdr *m, size_t total_len,
goto out_free;
}
 
+   if (ctl) {
+   switch (ctl->cmd) {
+   case TUN_CMD_PACKET:
+   ptr = ctl->ptr;
+   break;
+   case TUN_CMD_PRODUCE_PTRS:
+   ret = ptr_ring_consume_batched(&tfile->tx_ring,
+  ctl->ptr_array,
+  ctl->num);
+   goto out;
+   case TUN_CMD_UNPRODUCE_PTRS:
+   ptr_ring_unconsume(&tfile->tx_ring, ctl->ptr_array,
+  ctl->num, tun_ptr_free);
+   ret = 0;
+   goto out;
+   default:
+   ret = -EINVAL;
+   goto out_put_tun;
+   }
+   }
+
if (flags & ~(MSG_DONTWAIT|MSG_TRUNC|MSG_ERRQUEUE)) {
ret = -EINVAL;
goto out_put_tun;
diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index 5946d2775bd0..5e5c1063606c 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -175,24 +175,44 @@ static void *vhost_net_buf_consume(struct vhost_net_buf *rxq)
 
 static int vhost_net_buf_produce(struct vhost_net_virtqueue *nvq)
 {
+   struct vhost_virtqueue *vq = &nvq->vq;
+   struct socket *sock = vq->private_data;
struct vhost_net_buf *rxq = &nvq->rxq;
+   struct tun_msg_ctl ctl = {
+   .cmd = TUN_CMD_PRODUCE_PTRS,
+   .ptr_array = rxq->queue,
+   .num = VHOST_NET_BATCH,
+   };
+   struct msghdr msg = {
+   .msg_control = &ctl,
+   };
 
rxq->head = 0;
-   rxq->tail = ptr_ring_consume_batched(nvq->rx_ring, rxq->queue,
- VHOST_NET_BATCH);
+   rxq->tail = sock->ops->recvmsg(sock, &msg, 0, 0);
+   if (rxq->tail < 0)
+

[PATCH net-next 3/3] tuntap: remove usage of ptr ring in vhost_net

2019-10-11 Thread prashantbhole . linux

From: Prashant Bhole 

Remove usage of ptr ring of tuntap in vhost_net and remove the
functions exported from tuntap drivers to get ptr ring.

Signed-off-by: Prashant Bhole 
---
 drivers/net/tap.c   | 13 -
 drivers/net/tun.c   | 13 -
 drivers/vhost/net.c | 31 ---
 3 files changed, 4 insertions(+), 53 deletions(-)

diff --git a/drivers/net/tap.c b/drivers/net/tap.c
index 3d0bf382dbbc..27ffd2210375 100644
--- a/drivers/net/tap.c
+++ b/drivers/net/tap.c
@@ -1298,19 +1298,6 @@ struct socket *tap_get_socket(struct file *file)
 }
 EXPORT_SYMBOL_GPL(tap_get_socket);
 
-struct ptr_ring *tap_get_ptr_ring(struct file *file)
-{
-   struct tap_queue *q;
-
-   if (file->f_op != &tap_fops)
-   return ERR_PTR(-EINVAL);
-   q = file->private_data;
-   if (!q)
-   return ERR_PTR(-EBADFD);
-   return &q->ring;
-}
-EXPORT_SYMBOL_GPL(tap_get_ptr_ring);
-
 int tap_queue_resize(struct tap_dev *tap)
 {
struct net_device *dev = tap->dev;
diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index 7d4886f53389..75893921411b 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -3750,19 +3750,6 @@ struct socket *tun_get_socket(struct file *file)
 }
 EXPORT_SYMBOL_GPL(tun_get_socket);
 
-struct ptr_ring *tun_get_tx_ring(struct file *file)
-{
-   struct tun_file *tfile;
-
-   if (file->f_op != &tun_fops)
-   return ERR_PTR(-EINVAL);
-   tfile = file->private_data;
-   if (!tfile)
-   return ERR_PTR(-EBADFD);
-   return &tfile->tx_ring;
-}
-EXPORT_SYMBOL_GPL(tun_get_tx_ring);
-
 module_init(tun_init);
 module_exit(tun_cleanup);
 MODULE_DESCRIPTION(DRV_DESCRIPTION);
diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index 5e5c1063606c..0d302efadf44 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -122,7 +122,6 @@ struct vhost_net_virtqueue {
/* Reference counting for outstanding ubufs.
 * Protected by vq mutex. Writers must also take device mutex. */
struct vhost_net_ubuf_ref *ubufs;
-   struct ptr_ring *rx_ring;
struct vhost_net_buf rxq;
/* Batched XDP buffs */
struct xdp_buff *xdp;
@@ -997,8 +996,9 @@ static int peek_head_len(struct vhost_net_virtqueue *rvq, struct sock *sk)
int len = 0;
unsigned long flags;
 
-   if (rvq->rx_ring)
-   return vhost_net_buf_peek(rvq);
+   len = vhost_net_buf_peek(rvq);
+   if (len)
+   return len;
 
spin_lock_irqsave(&sk->sk_receive_queue.lock, flags);
head = skb_peek(&sk->sk_receive_queue);
@@ -1189,7 +1189,7 @@ static void handle_rx(struct vhost_net *net)
goto out;
}
busyloop_intr = false;
-   if (nvq->rx_ring) {
+   if (!vhost_net_buf_is_empty(&nvq->rxq)) {
ctl.cmd = TUN_CMD_PACKET;
ctl.ptr = vhost_net_buf_consume(&nvq->rxq);
msg.msg_control = &ctl;
@@ -1345,7 +1345,6 @@ static int vhost_net_open(struct inode *inode, struct file *f)
n->vqs[i].batched_xdp = 0;
n->vqs[i].vhost_hlen = 0;
n->vqs[i].sock_hlen = 0;
-   n->vqs[i].rx_ring = NULL;
vhost_net_buf_init(&n->vqs[i].rxq);
}
vhost_dev_init(dev, vqs, VHOST_NET_VQ_MAX,
@@ -1374,7 +1373,6 @@ static struct socket *vhost_net_stop_vq(struct vhost_net *n,
vhost_net_disable_vq(n, vq);
vq->private_data = NULL;
vhost_net_buf_unproduce(nvq);
-   nvq->rx_ring = NULL;
mutex_unlock(&vq->mutex);
return sock;
 }
@@ -1470,25 +1468,6 @@ static struct socket *get_raw_socket(int fd)
return ERR_PTR(r);
 }
 
-static struct ptr_ring *get_tap_ptr_ring(int fd)
-{
-   struct ptr_ring *ring;
-   struct file *file = fget(fd);
-
-   if (!file)
-   return NULL;
-   ring = tun_get_tx_ring(file);
-   if (!IS_ERR(ring))
-   goto out;
-   ring = tap_get_ptr_ring(file);
-   if (!IS_ERR(ring))
-   goto out;
-   ring = NULL;
-out:
-   fput(file);
-   return ring;
-}
-
 static struct socket *get_tap_socket(int fd)
 {
struct file *file = fget(fd);
@@ -1572,8 +1551,6 @@ static long vhost_net_set_backend(struct vhost_net *n, unsigned index, int fd)
r = vhost_net_enable_vq(n, vq);
if (r)
goto err_used;
-   if (index == VHOST_NET_VQ_RX)
-   nvq->rx_ring = get_tap_ptr_ring(fd);
 
oldubufs = nvq->ubufs;
nvq->ubufs = ubufs;
-- 
2.21.0

Re: The future of the TI ACX wireless driver

2007-06-10 Thread linux-wireless

Hi

Nice work David.

I have extracted the acxsm driver from the wireless-2.6 git tree and
made it compile on at least kernels 2.6.20 and 2.6.22-rc4.
It can be found here:
http://www.hauke-m.de/fileadmin/acx/tiacx-20070522.tar.bz2

I also changed some lines in the normal acx driver so that it compiles
with kernel 2.6.22 and later:
http://www.hauke-m.de/fileadmin/acx/acx-20070610.tar.bz2

Both load on my system, but I can't test them because I no longer have
any acx100-based card. It would be nice to get positive feedback if
they work.

I will write some pages in the wiki for these versions, so everyone
can test them.

I am not a big kernel hacker, but I want to learn.

Hauke Mehrtens

Bonding : Monitoring of 4965 wireless card

2008-01-09 Thread patnel972-linux

Hi,

I want to set up a bond that includes my wireless card. The ipw driver
creates two interfaces (wlan0 and wmaster0). When I flip the rf_kill
switch, ifplug detects wlan0 as unplugged but not wmaster0. If I bring
wlan0 down (while rf_kill is active), bonding detects the inactivity
when I bring the interface back up.

Do you have any idea where the problem is: the driver, or the miimon
monitoring of the bonding module?

My module parameters: mode=1 miimon=100 primary eth0

Thanks



  

Re : Bonding : Monitoring of 4965 wireless card

2008-01-09 Thread patnel972-linux

I'm bonding my eth0 (e1000 driver) and my wlan card (iwl4965). It works
the way I want: when I'm on Wi-Fi, DHCP gives me my Ethernet address. When
I unplug the cable, my wlan card takes over the network. My problem is
that when I disconnect the wlan card, bonding does not detect it
correctly: ifplugstatus shows wlan0 as not connected and wmaster0 as
connected! The bonding module does not report that there is no active
interface; it behaves as if wlan were up.

Am I clear?

PS: sorry, I am having trouble with my mail.
----- Original Message -----
From: John W. Linville <[EMAIL PROTECTED]>
To: [EMAIL PROTECTED]
Cc: netdev@vger.kernel.org
Sent: Wednesday, 9 January 2008, 18:02:05
Subject: Re: Bonding : Monitoring of 4965 wireless card

On Wed, Jan 09, 2008 at 09:00:05AM +, [EMAIL PROTECTED]
 wrote:
> Hi,
> 
> I want to make a bond with my wireless card. The ipw driver create two
>  interfaces (wlan0 and wmaster0). When i switch the rf_kill button,
>  ifplug detect wlan0 unplugged but not wmaster0. If i down wlan0 (while
>  rf_kil ), bonding detect the inactivity when i up the interface.
> 
> Have you some idea where is the problem? the driver or the miimon of
>  the module?
> 
> my module parameters mode=1 miimon=100 primary eth0

I'm not sure I understand your description...what are you trying to do?
How exactly is it failing?

John
-- 
John W. Linville
[EMAIL PROTECTED]

Re : Re : Bonding : Monitoring of 4965 wireless card

2008-01-09 Thread patnel972-linux

I do ignore wmaster0, but it seems to prevent bonding from detecting the
link state of wlan0. I enslave wlan0, and I already use use_carrier=1.
I use the bond to keep my Ethernet IP address over Wi-Fi at the office;
otherwise the wireless connection gets a temporary address and has to go
through a proxy.
I'll try ARP monitoring, but it is annoying that I can't test against
localhost. Is there a way to test localhost with ARP without passing
through lo?

----- Original Message -----
From: John W. Linville <[EMAIL PROTECTED]>
To: [EMAIL PROTECTED]
Sent: Wednesday, 9 January 2008, 21:24:10
Subject: Re: Re : Bonding : Monitoring of 4965 wireless card

On Wed, Jan 09, 2008 at 07:31:37PM +, [EMAIL PROTECTED]
 wrote:
> I'm doing a bonding with my eth0(e1000 driver) and my wlan
> card(iwl4965). It work like i want, when i'm in wifi the dhcp give
> me my ethernet adress. When i unplug the cable, my wlan card become
> in charge of network. My problem is when i disconnect the wlan card,
> the bonding does not detect it correctly, and ifplugstatus show me
> wlan0 not connected and wmaster0 connected!! The bonding module does
> not say no active interface, it work like wlan is on.
> 
> Am i clear?

Yes, that is much more clear to me.

What (if anything) are you doing to wmaster0?  You should just
ignore it.

FWIW, miimon is not going to work with a mac80211-based device at
this time.  The miimon option relies on support for either miitool
or ethtool, and mac80211 devices support neither of those.

Hmmm...it looks like there is a use_carrier option for miimon.
Based on its description I would think it would work.  Of course,
I think it is supposed to be the default and you don't seem to be
disabling it.  So, I'm not sure what is happening.

Are you enslaving wlan0?  Or wmaster0?  Make sure it is wlan0.
Also, please add use_carrier=1 to your bonding module options.
Does this change the behaviour?  If not, please open a bug at either
bugzilla.redhat.com (if you are a Fedora, RHEL, or even CentOS user)
or bugzilla.kernel.org (otherwise).

In the meantime, you might try using NetworkManager.  Or you
might consider using ARP monitoring.  The former probably is the
best solution if you are mobile (e.g.  at a cafe or other hotspot)
while the latter might be appropriate if you are just plugging and
un-plugging within the same network (like at home or office).

Hth!

John
-- 
John W. Linville
[EMAIL PROTECTED]


Re : Re : Re : Bonding : Monitoring of 4965 wireless card

2008-01-09 Thread patnel972-linux

I mean that instead of having ARP monitoring test an IP address on the
LAN or elsewhere, I want it to test 127.0.0.1; but to do that, the probe
must leave the machine and come back in, and so go out through wlan0.

----- Original Message -----
From: Jay Vosburgh <[EMAIL PROTECTED]>
To: [EMAIL PROTECTED]
Cc: John W. Linville <[EMAIL PROTECTED]>; netdev@vger.kernel.org
Sent: Wednesday, 9 January 2008, 22:36:00
Subject: Re: Re : Re : Bonding : Monitoring of 4965 wireless card

[EMAIL PROTECTED] wrote:

>I ignore it, but it seems like it prevent bonding detect link of
> wlan0. I enslave wlan0 and i already use use_carrier=1;

The default for bonding is use_carrier=1, which makes bonding
use the device driver's netif_carrier_on/off state for link detection.
Bonding only checks via ethtool/mii if use_carrier=0.

>I'll try arp monitoring but this is annoying i c'ant test localhost.
> Is there a way to test localhost with arp, without pass through lo ?

What do you mean by "test localhost with arp, without pass
through lo"?  ARP monitoring issues probes (ARPs) to a remote
destination to confirm that there is connectivity; I'm not sure what
localhost has to do with it.

In general, though, I have not tested bonding with wireless
adapters, so I'm unfamiliar with how well it does or does not work.

-J

---
-Jay Vosburgh, IBM Linux Technology Center, [EMAIL PROTECTED]


Re : Re : Re : Re : Bonding : Monitoring of 4965 wireless card

2008-01-10 Thread patnel972-linux

Yes, that's what I'm looking for. I don't understand how to change
arp_ip_target to the gateway, though: arp_ip_target is a module option.


----- Original Message -----
From: Jay Vosburgh <[EMAIL PROTECTED]>
To: [EMAIL PROTECTED]
Cc: netdev@vger.kernel.org
Sent: Thursday, 10 January 2008, 0:26:38
Subject: Re: Re : Re : Re : Bonding : Monitoring of 4965 wireless card

[EMAIL PROTECTED] wrote:

>I mean that instead of arp test an ip in lan or else, i want it to
> test 127.0.0.1 but in order to do this it must go out and re-enter and
> then use wlan0 to go out.

In other words, what I think you're saying (and I'm not entirely
sure here) is that you want probes to go to a remote node on the
network, and back, without having to actually know the identity of the
remote node (because, presumably, on a roaming type of wireless
configuration, your gateway and whatnot can change from time to time).

Is that what you're looking for?

That isn't available now, but might be straightforward to plug
into the address update system to keep the arp_ip_target up to date as
the current gateway as the gateway changes.  I haven't looked into the
details of doing that, but in theory it sounds straightforward.

-J

---
-Jay Vosburgh, IBM Linux Technology Center, [EMAIL PROTECTED]



Re : Re : Re : Re : Re : Bonding : Monitoring of 4965 wireless card

2008-01-10 Thread patnel972-linux

I tried ARP monitoring, but it doesn't work! To test an IP address, the
interface must already have an address, and dhcpcd is launched by ifplugd
only once bond0 has link... so it goes around in circles.
So I went back to miimon, and I figured out that bonding detects when
wlan0 is associated and makes it the active interface. But when I flip
rf_kill, it doesn't react. So I tried disassociating, and like magic it
detected the interface as down! I presume this is a bug in the wlan
driver, which does not reset the link information on the wlan interface.
So I wrote a small ACPI script to provide that behavior.



----- Original Message -----
From: Jay Vosburgh <[EMAIL PROTECTED]>
To: [EMAIL PROTECTED]
Cc: netdev@vger.kernel.org
Sent: Thursday, 10 January 2008, 21:59:20
Subject: Re: Re : Re : Re : Re : Bonding : Monitoring of 4965 wireless card

[EMAIL PROTECTED] wrote:

>Yes it's what i'm looking for. I don't understand how to change the
> arp_ip_target with the gateway, arp_ip_target is a module option.

If you're running a relatively recent bonding driver (version
3.0.0 or later), the arp_ip_targets can be changed on the fly via
sysfs, e.g.,

echo +10.0.0.1 > /sys/class/net/bond0/bonding/arp_ip_target
echo -20.0.0.1 > /sys/class/net/bond0/bonding/arp_ip_target

You can check out Documentation/networking/bonding.txt (in the
kernel source code) for more details.
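
A minimal sketch, in C, of doing the same sysfs update programmatically, for instance from a hook that runs when the default gateway changes. Only the sysfs path and the "+"/"-" prefix convention come from the commands above; format_arp_target_cmd() and write_arp_target() are hypothetical helper names, and the write is assumed to need root and a bonding driver with sysfs support.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/*
 * Build the string written to arp_ip_target: "+<ip>" adds a target,
 * "-<ip>" removes one. Returns 0 on success, -1 if ip is not a valid
 * dotted-quad IPv4 address or buf is too small.
 */
static int format_arp_target_cmd(char *buf, size_t len, const char *ip, bool add)
{
	struct in_addr addr;
	int n;

	if (inet_pton(AF_INET, ip, &addr) != 1)
		return -1;
	n = snprintf(buf, len, "%c%s", add ? '+' : '-', ip);
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Write the command to /sys/class/net/<bond>/bonding/arp_ip_target. */
int write_arp_target(const char *bond, const char *ip, bool add)
{
	char path[128], cmd[32];
	FILE *f;
	int n, rc;

	if (format_arp_target_cmd(cmd, sizeof(cmd), ip, add) != 0)
		return -1;
	n = snprintf(path, sizeof(path),
		     "/sys/class/net/%s/bonding/arp_ip_target", bond);
	if (n < 0 || (size_t)n >= sizeof(path))
		return -1;
	f = fopen(path, "w");
	if (!f)
		return -1;
	rc = (fprintf(f, "%s\n", cmd) < 0) ? -1 : 0;
	if (fclose(f) != 0)
		rc = -1;
	return rc;
}
```

A gateway change could then be handled by removing the old target and adding the new one, e.g. write_arp_target("bond0", old_gw, false) followed by write_arp_target("bond0", new_gw, true).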

-J

---
-Jay Vosburgh, IBM Linux Technology Center, [EMAIL PROTECTED]



problem with DMA when writting driver for rtl8139?

2006-09-10 Thread mwitosz-linux

Hi everybody,

My name is Mariusz, and I am a newbie to the Linux kernel.
For several weeks I have been writing a kernel driver for a network card
based on the RTL8139C chip. I am writing this driver for a microcontroller
technology course at my university.

I have some problems with DMA, I suppose.

There is a bit in the Transmit Status Descriptor of the RTL8139C which,
after being cleared (it must be cleared to start a transmit operation),
should be set to 1. According to the RTL8139 specification, that means
the DMA copy from memory to the internal RTL FIFO has finished.

The problem is that the RTL never sets this bit.

I use pci_map_single to map the address of the packet buffer to
DMA-capable memory, then cpu_to_le32 to get the physical address of this
buffer.

Do you have any idea what might be going wrong?

Here is my code:
1) rtlmodule contains functions related to initialization
2) rtlopen contains functions related to opening the device and interrupt
handling
3) rtltransmit contains functions related to transmission

And here is the output from the kernel.

best regards, 
Mariusz Witosz
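
A minimal user-space sketch of how the Transmit Status Descriptor (TSD) bits described above could be decoded. The bit positions (byte count in bits 0-12, OWN in bit 13, TUN in bit 14, TOK in bit 15) are assumptions that match the TxHostOwns, TxUnderrun and TxStatOK constants of the in-tree 8139too driver as far as I know, and should be checked against the datasheet; the TSD_* names and helpers are hypothetical. Note also that cpu_to_le32() only converts byte order: the bus address to program into the chip is the dma_addr_t value returned by pci_map_single().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed TSD layout; verify against the RTL8139 datasheet. */
#define TSD_SIZE_MASK 0x00001fffu  /* bits 0-12: byte count       */
#define TSD_OWN       (1u << 13)   /* set by chip: Tx DMA done    */
#define TSD_TUN       (1u << 14)   /* Tx FIFO underrun            */
#define TSD_TOK       (1u << 15)   /* transmit completed OK       */

/*
 * Value to write to a TSD register to start a transmit: the byte count
 * with OWN clear. The early-Tx threshold field is left at zero here.
 */
static uint32_t tsd_start_value(uint32_t len)
{
	return len & TSD_SIZE_MASK;
}

/* True once the chip reports that the DMA copy into its FIFO finished. */
static bool tsd_dma_done(uint32_t tsd)
{
	return (tsd & TSD_OWN) != 0;
}

/* True once the frame has been transmitted successfully. */
static bool tsd_tx_ok(uint32_t tsd)
{
	return (tsd & TSD_TOK) != 0;
}

/* True if the chip reported a FIFO underrun for this descriptor. */
static bool tsd_underrun(uint32_t tsd)
{
	return (tsd & TSD_TUN) != 0;
}
```

In a driver, the value read back from the TSD register after starting a transmit would be passed to these helpers; if tsd_dma_done() never becomes true, the address programmed into the matching start-address register is a likely suspect.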

Re: dsa: mv88e6xxx not receiving IPv6 multicast packets

2020-11-14 Thread Tj (Elloe Linux)

MV88E6085 switch not passing IPv6 multicast packets to CPU.

Seems to be related to interface not being in promiscuous mode.

This issue has been ongoing since at least July 2020. Latest v5.10-rc3
still suffers the issue on a Turris Mox with mv88e6085. We've not been
able to reproduce it on the Turris v4.14 stable kernel series so it
appears to be a regression.

Mox is using Debian 10 Buster.

First identified due to DHCPv6 leases not being renewed on clients being
served by isc-dhcp-server on the Mox.

Analysis showed the client IPv6 multicast solicit packets were being
received by the Mox hardware (proved via a mirror port on a managed LAN
switch) but the CPU was not receiving them (observed using tcpdump).

Further investigation has identified this also affects IPv6 neighbour
discovery for clients when not using frequent RAs from the Mox.

Currently we've found two reproducible scenarios:

1) with isc-dhcp-server configured with very short lease times (180
seconds). After mox reboot (or systemctl restart systemd-networkd)
clients successfully obtain a lease and a couple of RENEWs (requested
after 90 seconds) but then all goes silent, Mox OS no longer sees the
IPv6 multicast RENEW packets and client leases expire.

2) Immediately after a reboot, while DHCPv6 renewals are still working, if
we run "tcpdump -ni eth1 ip6" on the Mox and immediately terminate it,
tcpdump takes the interface out of promiscuous mode and IPv6 multicast
packets immediately cease to be received by the CPU. If we use "tcpdump
--no-promiscuous-mode ...", so that on termination it doesn't try to take
the interface out of promiscuous mode, IPv6 multicast packets continue to
be seen by the CPU.
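
A minimal sketch of one way to check from user space whether an interface is currently flagged promiscuous before and after such a tcpdump run. It assumes that /sys/class/net/<ifname>/flags exposes the device flags in hex and that IFF_PROMISC (0x100) appears there while a capture holds the interface in promiscuous mode; whether that flag matches what the switch and mvneta hardware actually filter is the open question here. The helper names are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/* Flag values as defined in the Linux UAPI header <linux/if.h>. */
#define IFF_PROMISC_BIT  0x100ul
#define IFF_ALLMULTI_BIT 0x200ul

/* Parse hex text such as "0x1103\n"; returns 0 on success, -1 on error. */
static int parse_if_flags(const char *text, unsigned long *flags)
{
	char *end;
	unsigned long value;

	errno = 0;
	value = strtoul(text, &end, 16);
	if (errno != 0 || end == text)
		return -1;
	*flags = value;
	return 0;
}

static bool flags_promisc(unsigned long flags)
{
	return (flags & IFF_PROMISC_BIT) != 0;
}

static bool flags_allmulti(unsigned long flags)
{
	return (flags & IFF_ALLMULTI_BIT) != 0;
}

/* Read and parse /sys/class/net/<ifname>/flags. */
int read_if_flags(const char *ifname, unsigned long *flags)
{
	char path[128], buf[32];
	FILE *f;
	int n;

	n = snprintf(path, sizeof(path), "/sys/class/net/%s/flags", ifname);
	if (n < 0 || (size_t)n >= sizeof(path))
		return -1;
	f = fopen(path, "r");
	if (!f)
		return -1;
	if (!fgets(buf, sizeof(buf), f)) {
		fclose(f);
		return -1;
	}
	fclose(f);
	return parse_if_flags(buf, flags);
}
```

Calling read_if_flags("eth1", &flags) before, during, and after the capture and logging flags_promisc(flags) and flags_allmulti(flags) would show whether the loss of IPv6 multicast lines up with the flag being cleared.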

We've been pointed to the mv88e6xxx_dump tool and can capture data, but
we are not sure what specifically to look for.

We've also added some pr_info() debugging into mvneta to analyse when
promiscuous mode is enabled or disabled since this seems to be strongly
related to the issue.

We believe there's a big clue in being able to reset the issue by
restarting systemd-networkd on the Mox. We've looked for but not found
any clues or indications of services on the Mox causing this but aren't
ruling this out.

Re: dsa: mv88e6xxx not receiving IPv6 multicast packets

2020-11-14 Thread Tj (Elloe Linux)

On 14/11/2020 15:56, Andrew Lunn wrote:
>> 1) with isc-dhcp-server configured with very short lease times (180
>> seconds). After mox reboot (or systemctl restart systemd-networkd)
>> clients successfully obtain a lease and a couple of RENEWs (requested
>> after 90 seconds) but then all goes silent, Mox OS no longer sees the
>> IPv6 multicast RENEW packets and client leases expire.
> 
> So it takes about 3 minutes to reproduce this?
> 
> Can you do a git bisect to figure out which change broke it? It will
> take you maybe 5 minutes per step, and given the wide range of
> kernels, i'm guessing you need around 15 steps. So maybe two hours of
> work.
> 
>   Andrew
>

I'll check if we can - the problem might be the Turris Mox kernel is
based on a board support package drop by Marvell so I'm not clear right
now how divergent they are. Hopefully the Turris kernel devs can help on
that.
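As a rough check on the estimate quoted above, the worst-case number of bisect rounds grows with the base-2 logarithm of the number of candidate commits. The sketch below is only an illustration: the commit count of 32768 is a hypothetical figure chosen because it yields the quoted 15 steps, and the function names are not from the thread.

```python
def bisect_steps(num_commits: int) -> int:
    """Worst-case build-and-test rounds needed to isolate one bad commit.

    Equivalent to ceil(log2(num_commits)), computed with integers only.
    """
    if num_commits < 1:
        raise ValueError("need at least one candidate commit")
    return (num_commits - 1).bit_length()


def bisect_minutes(num_commits: int, minutes_per_step: float) -> float:
    """Total test time, ignoring kernel build time between steps."""
    return bisect_steps(num_commits) * minutes_per_step


# Hypothetical range of 32768 commits, 5 minutes per reproduction attempt.
steps = bisect_steps(32768)
total = bisect_minutes(32768, 5)
```

With those assumed inputs the pure test time is 75 minutes, so "maybe two hours" leaves room for rebuilding and rebooting at each step.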

Re: dsa: mv88e6xxx not receiving IPv6 multicast packets

2020-11-15 Thread Tj (Elloe Linux)

On 14/11/2020 18:49, Vladimir Oltean wrote:
> On Sat, Nov 14, 2020 at 03:39:28PM +, Tj (Elloe Linux) wrote:
>> MV88E6085 switch not passing IPv6 multicast packets to CPU.

> Is there a simple step-by-step reproducer for the issue, that doesn't
> require a lot of configuration? I've got a Mox with the 6190 switch
> running net-next and Buildroot that I could try on.

Our set-up is Mox A (CPU) + G (mini-PCIe) + F (4x USB 3.0) + 3 x E (8
port Marvell switch) + D (SFP cage)

Whilst working on this we've moved one of the E modules to another A CPU
in our lab so as not to mess with the gateway.

Running Debian 10, using systemd-networkd, which configures:

eth0 (WAN) static IPv4 and IPv6 - DHCP=no
eth1 (uplink to the switch ports) DHCP=no
lan1 (connected to external managed switch) Bridge=br-lan
br-lan static IPv4 and IPv6, Kind=bridge, IPForward=true

Whilst we're working on this issue only lan1 is connected to anything
external; a 48-port managed switch. No connection to SFP either.

We assign an IPv6 from our delegated /48 prefix to br-lan and have
isc-dhcp-server configured on a very short lease (180 seconds) to issue
leases.

On a LAN client we request a lease using:

dhclient -d -6 wlp4s0

Usually, if this is started just after the Mox systemd-networkd was
restarted, it'll manage to obtain and then renew a lease about 3 or 4 times.

These will show up in the Mox logs too.

At some point, with absolutely nothing showing in any Mox log in the
meantime, additional renewals will fail.

We later noticed that, some time after this happens, clients on the
network lose IPv6 connectivity to the Mox because neighbour discovery is
also failing - took us a while to spot this because the Mox occasionally
sends RAs at which point the clients can talk to the Mox again. The
symptom here was unexplained random-length 'hangs' of SSH sessions to
the Mox that would affect LAN clients only when the neighbour table
entry had expired.
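Neighbour discovery depends on multicast too: a neighbour solicitation is sent to the target's solicited-node group (ff02::1:ffXX:XXXX, built from the low 24 bits of the target address), which maps to an Ethernet address beginning 33:33. A minimal sketch of that mapping follows; the link-local address used in the example is hypothetical, and the helper names are illustrative only.

```python
import ipaddress


def solicited_node(addr: str) -> ipaddress.IPv6Address:
    """Solicited-node multicast group for a unicast IPv6 address."""
    low24 = int(ipaddress.IPv6Address(addr)) & 0xFFFFFF
    base = int(ipaddress.IPv6Address("ff02::1:ff00:0"))
    return ipaddress.IPv6Address(base | low24)


def multicast_mac(group: str) -> str:
    """Ethernet destination for an IPv6 multicast group: 33:33 + low 32 bits."""
    low32 = int(ipaddress.IPv6Address(group)) & 0xFFFFFFFF
    octets = [(low32 >> shift) & 0xFF for shift in (24, 16, 8, 0)]
    return "33:33:" + ":".join(f"{o:02x}" for o in octets)


# Hypothetical host address whose low 24 bits are 77:2b:20.
group = solicited_node("fe80::1234:56ff:fe77:2b20")
mac = multicast_mac(str(group))
```

If the switch path stops delivering frames for such a group, solicitations for that address never reach the host, which would be consistent with the intermittent loss of connectivity described above.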

I'm trying to create a very small reproducer root file-system on the lab
Mox.

Right now I've not been able to reproduce it on the lab unit even with a
clone of the gateway Mox's micro SD-card, but that seems due to it
failing to complete a regular boot - hence creating a fresh root
file-system.

Re: dsa: mv88e6xxx not receiving IPv6 multicast packets

2020-11-15 Thread Tj (Elloe Linux)

On 15/11/2020 16:02, Andrew Lunn wrote:

> What might be interesting is running
> 
> ip monitor
> 
> and
> 
> bridge monitor
> 
> Look for neighbours being timed out due to inactivity.

Funny you write that! This afternoon I've narrowed it down although I
still don't understand the 'why'.

Watching on the 'good' (lab) and 'bad' (gateway) Mox devices I noticed that:

# bridge -d -s mdb show

23: br-lan  br-lan  ff02::2  temp   257.05
23: br-lan  br-lan  ff05::2  temp   257.05
23: br-lan  br-lan  ff02::6a  temp   257.05
23: br-lan  br-lan  ff02::1:ff77:2b20  temp   257.05
23: br-lan  br-lan  ff02::1:ff00:  temp   257.05
23: br-lan  br-lan  ff02::fb  temp   257.05
23: br-lan  br-lan  ff02::1:ff00:0  temp   257.05
23: br-lan  br-lan  ff02::1:2  temp   257.05
23: br-lan  br-lan  ff05::1:3  temp   257.05

indicates that the entries time out on 'bad' but are reset to a high
value on 'good'

# bridge monitor on 'bad' reported:

Deleted Deleted 23: br-lan  br-lan  ff02::2  temp
Deleted Deleted 23: br-lan  br-lan  ff05::2  temp
Deleted Deleted 23: br-lan  br-lan  ff02::6a  temp
Deleted Deleted 23: br-lan  br-lan  ff02::1:ff77:2b20  temp
Deleted Deleted 23: br-lan  br-lan  ff02::1:ff00:  temp
Deleted Deleted 23: br-lan  br-lan  ff02::fb  temp
Deleted Deleted 23: br-lan  br-lan  ff02::1:ff00:0  temp
Deleted Deleted 23: br-lan  br-lan  ff02::1:2  temp
Deleted Deleted 23: br-lan  br-lan  ff05::1:3  temp

On the laptop I'm testing from (tcpdump always on the laptop):

Using tcpdump I *think* enp2s0 (wired link direct into lan1 on 'good')
always showed the laptop sending multicast listener report v2 packets on
a regular cadence of about 60-100 seconds as well as the DHCPv6
solicit/renews and that cadence matched when the timers on the output of
"bridge -d -s mdb show" reset to approximately 258.

But for wlp4s0 (wifi to 'bad') the DHCPv6 solicit/renew didn't seem to
be accompanied by multicast listener reports and the mdb timers expired.

I need to re-affirm that tomorrow because I've got slightly lost
attempting to compare multiple aspects on both 'good' and 'bad' and seem
to be seeing inconsistent results.

On the laptops we are using Xubuntu 20.04 amd64 with NetworkManager.
I'll try to test from a range of different devices tomorrow in case this
is only affecting staff laptops.

Many thanks for the pointers.

Re: dsa: mv88e6xxx not receiving IPv6 multicast packets

2020-11-15 Thread Tj (Elloe Linux)

On 15/11/2020 17:27, Andrew Lunn wrote:

> So check if you have an IGMP querier in the network. If not, try
> turning it on in the bridge,
> 
> ip link set br0 type bridge mcast_querier 1

Thanks Andrew - that does indeed seem to have solved the issue.

I'm relieved this isn't a hardware or driver issue after all but annoyed
we didn't figure this out ourselves months ago!

Is there any other kernel 'knob' to alter this? I'm trying to understand
why we're seeing two different results with seemingly identical
kernel/OS versions and network configurations.
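The behaviour of the mdb timers seen in this thread can be illustrated with a toy model: each listener report resets a group's timer to the membership interval, and without a querier nothing prompts hosts to keep reporting. The sketch below assumes the usual Linux bridge defaults of a 260 second membership interval and a 125 second query interval; it is a simplification for illustration, not the bridge code.

```python
def mdb_expiry(report_times, membership_interval=260.0, horizon=900.0):
    """Return when a group entry learned at t=0 expires, or None if it
    is still alive at `horizon`. Each report re-arms the timer."""
    expires_at = membership_interval
    for t in sorted(report_times):
        if t > horizon:
            break
        if t >= expires_at:
            # The report arrived too late; the entry was already deleted.
            return expires_at
        expires_at = t + membership_interval
    return expires_at if expires_at <= horizon else None


# No querier: hosts send no further reports after joining.
without_querier = mdb_expiry([])

# Querier present: assume one report per 125 s general query.
with_querier = mdb_expiry(range(125, 900, 125))
```

In this model the entry disappears after 260 seconds when no reports arrive, which is in the same range as the roughly 257 second timers shown by "bridge -d -s mdb show" earlier in the thread.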

Re: dsa: mv88e6xxx not receiving IPv6 multicast packets

2020-11-16 Thread Tj (Elloe Linux)

On 15/11/2020 17:27, Andrew Lunn wrote:

> So check if you have an IGMP querier in the network. If not, try
> turning it on in the bridge,
> 
> ip link set br0 type bridge mcast_querier 1

Thankfully it turns out this is totally unrelated to Linux - our TP-Link
Jetstream T1600G-PS has some unfortunate default behaviour and a bug.

Specifically, we are operating an IPv6-only network. Layer 2 MLD
snooping was enabled and set to forward unknown multicast groups, so the
switch should have been flooding all multicast packets.

However, buried in the TP-Link manual there's a note that says:

"Note: IGMP Snooping and MLD Snooping share the setting of Unknown
Multicast Groups, so you have to enable IGMP Snooping globally on the L2
FEATURES > Multicast > IGMP Snooping > Global Config page at the same
time."

We hadn't enabled IGMP snooping since we don't use IPv4!

Many thanks for the help resolving this and apologies for mis-reporting it.

kernel BUG at net/core/skbuff.c:109!

2021-02-03 Thread Tj (Elloe Linux)

On a recent build (5.10.0) we've seen several hard-to-pinpoint complete
lock-ups requiring power-off restarts.

Today we found a small clue in the kernel log. Unfortunately the
complete backtrace wasn't captured (presumably the system froze before
the log could be flushed), but I thought I should share it for investigation.

kernel BUG at net/core/skbuff.c:109!

kernel: skbuff: skb_under_panic: text:c103c622 len:1228 put:48
head:a00202858000 data:a00202857ff2 tail:0x4be end:0x6c0 dev:wlp4s0
kernel: ------------[ cut here ]------------
kernel: kernel BUG at net/core/skbuff.c:109!

Obviously this ought not to happen and we'd like to discover the cause.
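The skb_under_panic line already says how far the push overshot: the check fires when the data pointer ends up below head after the requested bytes ("put") have been prepended. A small worked calculation, assuming the head and data values are comparable exactly as printed above (the variable names are just for illustration):

```python
def push_shortfall(head: int, data: int) -> int:
    """Bytes by which data sits below head after the push (0 if none)."""
    return max(0, head - data)


# Values copied from the log line above.
head = 0xA00202858000
data = 0xA00202857FF2
put = 48

shortfall = push_shortfall(head, data)   # bytes missing from the headroom
headroom_before = put - shortfall        # headroom available before the push
```

Under that assumption the code at text:c103c622 tried to prepend 48 bytes to a buffer that had only 34 bytes of headroom, leaving it 14 bytes short.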

Whilst writing this report it happened again. Checking the logs we see
three instances of the BUG none of which capture a stack trace:

Jan 27
Feb 03 #1
Feb 03 #2

The only slight clue may be a k3s service that we were unaware was
constantly restarting and had reached 26,636 iterations just before the
Feb 03 #1 BUG. However, we removed k3s immediately after and there were
no similar clues 20 minutes later for the Feb 03 #2 BUG.

Feb 03 11:11:13 elloe001 k3s[1209978]:
time="2021-02-03T11:11:13.452745479Z" level=fatal msg="starting
kubernetes: preparing server: start cluster and https:
listen tcp 10.1.2.1:6443: bind: cannot assign requested address"
Feb 03 11:11:13 elloe001 systemd[1]: k3s-main.service: Main process
exited, code=exited, status=1/FAILURE
Feb 03 11:11:13 elloe001 systemd[1]: k3s-main.service: Failed with
result 'exit-code'.
Feb 03 11:11:13 elloe001 systemd[1]: Failed to start Lightweight Kubernetes.
Feb 03 11:11:18 elloe001 systemd[1]: k3s-dev.service: Scheduled restart
job, restart counter is at 26636.
Feb 03 11:11:18 elloe001 systemd[1]: k3s-main.service: Scheduled restart
job, restart counter is at 26636.
Feb 03 11:11:18 elloe001 systemd[1]: Stopped Lightweight Kubernetes.
Feb 03 11:11:18 elloe001 systemd[1]: Starting Lightweight Kubernetes...
Feb 03 11:11:18 elloe001 systemd[1]: Stopped Lightweight Kubernetes.
Feb 03 11:11:18 elloe001 systemd[1]: Starting Lightweight Kubernetes...

We don't think this is hardware related as we have several identical
Lenovo E495 laptops and they have never suffered this.

We don't know of any way to reproduce it at will.

Re: dsa: mv88e6xxx losing DHCPv6 solicit packets / IPv6 multicast packets?

2020-07-24 Thread Tj (Elloe Linux)

> As another thought do you know what DHCPv6 client/server is being
> used.
> There was a fairly recent bugfix for busybox that was needed because
>the v6 code was using the wrong MAC address.

I'm the customer experiencing this issue. It appears unrelated to the
DHCP server software. On the Turris Mox with Debian 10 we have
isc-dhcp-server 4.4.1-2. Clients are Xubuntu 20.04 with NetworkManager
1.22.10-1ubuntu2.1 using isc-dhcp-client 4.4.1-2.1ubuntu5.

Quoting from another email I sent to Turris:

We've now done more testing and CONFIRMED the Mox is losing DHCPv6
solicit packets.

Specifically, it seems the 88E6190 hardware switch in the Peridot
module is swallowing IPv6 multicast packets (sent to ff02::1:2).

We tested this by mirroring the Mox LAN port on an external switch and
saw the DHCPv6 solicit packet egress the switch but the Mox kernel
didn't see it ingress (using tcpdump).

Re: [Bridge] [PATCH] bridge: flush forwarding table when device carrier off

2006-10-24 Thread ArcosCom Linux User

Has this patch been submitted to the kernel tree? Which kernel version will
have it applied (thinking of the 2.6.x and 2.4.x branches)?

Thanks

On Thu, 12 October 2006, 20:24, Stephen Hemminger wrote:
> Flush the forwarding table when carrier is lost. This helps for
> availability because we don't want to forward to a downed device and
> new packets may come in on other links.
>
> Signed-off-by: Stephen Hemminger <[EMAIL PROTECTED]>
> ---
>  net/bridge/br_fdb.c |7 ++-
>  net/bridge/br_if.c  |4 ++--
>  net/bridge/br_private.h |2 +-
>  net/bridge/br_stp_if.c  |2 ++
>  4 files changed, 11 insertions(+), 4 deletions(-)
>
> --- bridge.orig/net/bridge/br_fdb.c
> +++ bridge/net/bridge/br_fdb.c
> @@ -128,7 +128,10 @@ void br_fdb_cleanup(unsigned long _data)
>   mod_timer(&br->gc_timer, jiffies + HZ/10);
>  }
>
> -void br_fdb_delete_by_port(struct net_bridge *br, struct net_bridge_port *p)
> +
> +void br_fdb_delete_by_port(struct net_bridge *br,
> +const struct net_bridge_port *p,
> +int do_all)
>  {
>   int i;
>
> @@ -142,6 +145,8 @@ void br_fdb_delete_by_port(struct net_br
>   if (f->dst != p)
>   continue;
>
> + if (f->is_static & !do_all)
> + continue;
>   /*
>* if multiple ports all have the same device address
>* then when one port is deleted, assign
> --- bridge.orig/net/bridge/br_if.c
> +++ bridge/net/bridge/br_if.c
> @@ -163,7 +163,7 @@ static void del_nbp(struct net_bridge_po
>   br_stp_disable_port(p);
>   spin_unlock_bh(&br->lock);
>
> - br_fdb_delete_by_port(br, p);
> + br_fdb_delete_by_port(br, p, 1);
>
>   list_del_rcu(&p->list);
>
> @@ -448,7 +448,7 @@ int br_add_if(struct net_bridge *br, str
>
>   return 0;
>  err2:
> - br_fdb_delete_by_port(br, p);
> + br_fdb_delete_by_port(br, p, 1);
>  err1:
>   kobject_del(&p->kobj);
>  err0:
> --- bridge.orig/net/bridge/br_private.h
> +++ bridge/net/bridge/br_private.h
> @@ -143,7 +143,7 @@ extern void br_fdb_changeaddr(struct net
> const unsigned char *newaddr);
>  extern void br_fdb_cleanup(unsigned long arg);
>  extern void br_fdb_delete_by_port(struct net_bridge *br,
> -struct net_bridge_port *p);
> +   const struct net_bridge_port *p, int do_all);
>  extern struct net_bridge_fdb_entry *__br_fdb_get(struct net_bridge *br,
>const unsigned char *addr);
>  extern struct net_bridge_fdb_entry *br_fdb_get(struct net_bridge *br,
> --- bridge.orig/net/bridge/br_stp_if.c
> +++ bridge/net/bridge/br_stp_if.c
> @@ -113,6 +113,8 @@ void br_stp_disable_port(struct net_brid
>   del_timer(&p->forward_delay_timer);
>   del_timer(&p->hold_timer);
>
> + br_fdb_delete_by_port(br, p, 0);
> +
>   br_configuration_update(br);
>
>   br_port_state_selection(br);
> ___
> Bridge mailing list
> Bridge@lists.osdl.org
> https://lists.osdl.org/mailman/listinfo/bridge
>
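The effect of the new do_all argument in the patch quoted above can be modelled in a few lines: on carrier loss only learned entries for the port are flushed, while removing the port flushes static entries too. This is a simplified sketch with hypothetical types; it omits the hash table and the handling of entries whose address is shared by several ports.

```python
from dataclasses import dataclass


@dataclass
class FdbEntry:
    mac: str
    port: str
    is_static: bool = False


def fdb_delete_by_port(fdb, port, do_all):
    """Return the table with entries for `port` removed.

    Static entries on that port are kept unless do_all is true.
    """
    return [
        entry for entry in fdb
        if entry.port != port or (entry.is_static and not do_all)
    ]


table = [
    FdbEntry("aa:aa:aa:aa:aa:01", "eth0"),
    FdbEntry("aa:aa:aa:aa:aa:02", "eth0", is_static=True),
    FdbEntry("aa:aa:aa:aa:aa:03", "eth1"),
]
after_carrier_loss = fdb_delete_by_port(table, "eth0", do_all=False)
after_port_removal = fdb_delete_by_port(table, "eth0", do_all=True)
```

The first call corresponds to the br_stp_disable_port() path (do_all = 0) and the second to del_nbp() and the br_add_if() error path (do_all = 1).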



Re: [PATCH net-next v2 02/13] net: phy: sfp: handle non-wired SFP connectors

2018-05-08 Thread Russell King - ARM Linux

On Fri, May 04, 2018 at 03:56:32PM +0200, Antoine Tenart wrote:
> SFP connectors can be soldered on a board without having any of their pins
> (LOS, i2c...) wired. In such cases the SFP link state cannot be guessed,
> and the overall link status reporting is left to other layers.
> 
> In order to achieve this, a new SFP_DEV status is added, named UNKNOWN.
> This mode is set when it is not possible for the SFP code to get the
> link status and as a result the link status is reported to be always UP
> from the SFP point of view.

This looks weird to me.  SFP_DEV_* states track the netdevice up/down
state and have little to do with whether LOS or MODDEF0 are implemented.

I think it would be better to have a new SFP_MOD_* and to force
sm_mod_state to that in this circumstance.

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 8.8Mbps down 630kbps up
According to speedtest.net: 8.21Mbps down 510kbps up

Re: [PATCH net-next v2 03/13] net: phy: sfp: warn the user when no tx_disable pin is available

2018-05-08 Thread Russell King - ARM Linux

On Fri, May 04, 2018 at 03:56:33PM +0200, Antoine Tenart wrote:
> In case no Tx disable pin is available the SFP modules will always be
> emitting. This could be an issue when using modules using laser as their
> light source as we would have no way to disable it when the fiber is
> removed. This patch adds a warning when registering an SFP cage which does
> not have its tx_disable pin wired or available.
> 
> Signed-off-by: Antoine Tenart 

Looks fine, thanks.

Acked-by: Russell King 

> ---
>  drivers/net/phy/sfp.c | 9 +
>  1 file changed, 9 insertions(+)
> 
> diff --git a/drivers/net/phy/sfp.c b/drivers/net/phy/sfp.c
> index 8e323a4b70da..d4f503b2e3e2 100644
> --- a/drivers/net/phy/sfp.c
> +++ b/drivers/net/phy/sfp.c
> @@ -1093,6 +1093,15 @@ static int sfp_probe(struct platform_device *pdev)
>   if (!sfp->gpio[GPIO_MODDEF0] && !sfp->gpio[GPIO_LOS])
>   sfp->sm_dev_state = SFP_DEV_UNKNOWN;
>  
> + /* We could have an issue in cases no Tx disable pin is available or
> +  * wired as modules using a laser as their light source will continue to
> +  * be active when the fiber is removed. This could be a safety issue and
> +  * we should at least warn the user about that.
> +  */
> + if (!sfp->gpio[GPIO_TX_DISABLE])
> + dev_warn(sfp->dev,
> +  "No tx_disable pin: SFP modules will always be emitting.\n");
> +
>   return 0;
>  }
>  
> -- 
> 2.17.0
> 

Re: [PATCH net] net: phy: sfp: fix the BR,min computation

2018-05-08 Thread Russell King - ARM Linux

On Fri, May 04, 2018 at 05:10:54PM +0200, Antoine Tenart wrote:
> In an SFP EEPROM values can be read to get information about a given SFP
> module. One of those is the bitrate, which can be determined using a
> nominal bitrate together with min and max values (in %). The SFP code
> currently computes both BR,min and BR,max values from this nominal
> value and the min and max values.
> 
> This patch fixes the BR,min computation as the min value should be
> subtracted from the nominal one, not added.
> 
> Fixes: 9962acf7fb8c ("sfp: add support for 1000Base-PX and 1000Base-BX10")
> Signed-off-by: Antoine Tenart 

I know David has already applied it, but for the record, your fix looks
correct, thanks.

> ---
>  drivers/net/phy/sfp-bus.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/net/phy/sfp-bus.c b/drivers/net/phy/sfp-bus.c
> index 0381da78d228..fd6c23f69c2f 100644
> --- a/drivers/net/phy/sfp-bus.c
> +++ b/drivers/net/phy/sfp-bus.c
> @@ -125,7 +125,7 @@ void sfp_parse_support(struct sfp_bus *bus, const struct sfp_eeprom_id *id,
>   if (id->base.br_nominal) {
>   if (id->base.br_nominal != 255) {
>   br_nom = id->base.br_nominal * 100;
> - br_min = br_nom + id->base.br_nominal * id->ext.br_min;
> + br_min = br_nom - id->base.br_nominal * id->ext.br_min;
>   br_max = br_nom + id->base.br_nominal * id->ext.br_max;
>   } else if (id->ext.br_max) {
>   br_nom = 250 * id->ext.br_max;
> -- 
> 2.17.0
> 
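The arithmetic in the hunk above is easy to check by hand. BR,nominal is stored in units of 100 MBd and BR,min / BR,max are percentages, so nominal * 100 gives MBd and nominal * percent gives the margin in MBd. The sketch below mirrors only the br_nominal != 255 branch; the function name and the `fixed` switch are illustrative, and the EEPROM values in the example are hypothetical.

```python
def bitrate_bounds(br_nominal, br_min_pct, br_max_pct, fixed=True):
    """Return (br_min, br_nom, br_max) in MBd.

    fixed=False reproduces the pre-patch behaviour, where the lower
    margin was added instead of subtracted.
    """
    br_nom = br_nominal * 100
    lower_margin = br_nominal * br_min_pct
    br_min = br_nom - lower_margin if fixed else br_nom + lower_margin
    br_max = br_nom + br_nominal * br_max_pct
    return br_min, br_nom, br_max


# Hypothetical module: 1300 MBd nominal with 10% margins either side.
patched = bitrate_bounds(13, 10, 10)
unpatched = bitrate_bounds(13, 10, 10, fixed=False)
```

With these example values the patched computation gives a 1170 to 1430 MBd window, whereas the old code collapsed the window to 1430 to 1430 MBd.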

Re: [PATCH net-next] net: phy: sfp: handle cases where neither BR,min nor BR,max is given

2018-05-08 Thread Russell King - ARM Linux

On Fri, May 04, 2018 at 05:21:03PM +0200, Antoine Tenart wrote:
> When computing the bitrate using values read from an SFP module EEPROM,
> we use the nominal BR plus BR,min and BR,max to determine the
> boundaries. But in some cases BR,min and BR,max aren't provided, which
> led the SFP code to end up having the nominal value for both the minimum
> and maximum bitrate values. When using a passive cable, the nominal
> value should be used as the maximum one, and there is no minimum one
> so we should use 0.
> 
> Signed-off-by: Antoine Tenart 
> ---
> 
> Hi Russell,
> 
> I'm not completely sure about this patch as this case is not really
> specified in the specification. But the issue is there, and I've discussed
> this with others. It seemed logical (at least to us :)) to use the
> BR,nominal values as br_max and 0 as br_min when using a passive cable
> which only provides BR,nominal as this would be the highest rate at
> which the cable could work. And because it's passive, there should be no
> issues using it at a lower rate.
> 
> I've tested this with one passive cable which only reports its
> BR,nominal (which was 10300) while it could be used when using 1000baseX
> or 2500baseX modes.

The electronic engineer in me says that using zero isn't really valid
because there are coupling capacitors in the SFP module that block DC.
These blocking capacitors are required by the SFP+ specs to have a high
pass pole of between 20kHz and 100kHz - in other words, frequencies
below this are attenuated by the coupling capacitors.  The relationship
between this and the bit rate will be a function of the encoding, so we
can't come to a definitive figure without some math (and I want to be
lazy about that!)

Practically, we're talking about SerDes Ethernet, where the bit rate is
no lower than 100Mbps [*], which will always have a frequency well above
this cut-off.  So, I don't have any problem with your approach to
setting the minimum to zero.  Therefore,

Acked-by: Russell King 

Please send me the EEPROM dump using:

ethtool -m  raw on > foo.bin

so I can add it to my database for future testing and validation.

Thanks.

* - 10Mbps SGMII is 1Gbps SGMII with each bit repeated 100 times.
100Mbps SGMII is 1Gbps SGMII with each bit repeated 10 times.
There are capability bits for transceivers supporting
100base-FX/LX but no one has tested those yet.
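The "some math" mentioned above can be roughly approximated for one encoding. The sketch assumes 8b/10b line coding (as used by 1000BASE-X at 1.25 GBd), whose longest run of identical bits is 5, and treats the longest run as a square wave to get the lowest fundamental frequency. This is a back-of-the-envelope estimate under those assumptions, not a full spectral analysis.

```python
def lowest_fundamental_hz(baud: float, max_run_length: int) -> float:
    """Fundamental of the slowest square wave the code can produce:
    one period is two runs of `max_run_length` identical bits."""
    return baud / (2 * max_run_length)


def margin_over_pole(baud: float, max_run_length: int, pole_hz: float) -> float:
    """How many times the lowest fundamental exceeds the high-pass pole."""
    return lowest_fundamental_hz(baud, max_run_length) / pole_hz


# 1000BASE-X line rate with 8b/10b, against the upper 100 kHz pole limit.
f_low = lowest_fundamental_hz(1.25e9, 5)
margin = margin_over_pole(1.25e9, 5, 100e3)
```

On this estimate the lowest fundamental is about 125 MHz, three orders of magnitude above the 100 kHz upper limit of the pole, which is consistent with the conclusion that a zero minimum is harmless in practice.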

Re: [PATCH net-next v2 06/13] phy: add 2.5G SGMII mode to the phy_mode enum

2018-05-08 Thread Russell King - ARM Linux

On Fri, May 04, 2018 at 03:56:36PM +0200, Antoine Tenart wrote:
> This patch adds one more generic PHY mode to the phy_mode enum, to allow
> configuring generic PHYs to the 2.5G SGMII mode by using the set_mode
> callback.
> 
> Signed-off-by: Antoine Tenart 
> Acked-by: Kishon Vijay Abraham I 

Hi,

Would it be possible to get the 2.5G SGMII comphy support merged
ahead of the rest of this series please - I don't think there's been
any objections to it, and having it in mainline would then mean I can
drop the Marvell Comphy code from my tree and transition to the bootlin
Comphy code instead.

Of course, the perfect solution would be to get the whole series merged,
but I'm just thinking about the situation where we're still discussing
points when the next merge window opens.

Thanks.

> ---
>  include/linux/phy/phy.h | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/include/linux/phy/phy.h b/include/linux/phy/phy.h
> index c9d147f5..9713aebdd348 100644
> --- a/include/linux/phy/phy.h
> +++ b/include/linux/phy/phy.h
> @@ -36,6 +36,7 @@ enum phy_mode {
>   PHY_MODE_USB_DEVICE_SS,
>   PHY_MODE_USB_OTG,
>   PHY_MODE_SGMII,
> + PHY_MODE_2500SGMII,
>   PHY_MODE_10GKR,
>   PHY_MODE_UFS_HS_A,
>   PHY_MODE_UFS_HS_B,
> -- 
> 2.17.0
> 

Re: [PATCH net-next] net: phy: sfp: handle cases where neither BR,min nor BR,max is given

2018-05-08 Thread Russell King - ARM Linux

On Sat, May 05, 2018 at 01:35:34PM -0700, Florian Fainelli wrote:
> On May 4, 2018 8:21:03 AM PDT, Antoine Tenart wrote:
> >When computing the bitrate using values read from an SFP module EEPROM,
> >we use the nominal BR plus BR,min and BR,max to determine the
> >boundaries. But in some cases BR,min and BR,max aren't provided, which
> >led the SFP code to end up having the nominal value for both the minimum
> >and maximum bitrate values. When using a passive cable, the nominal
> >value should be used as the maximum one, and there is no minimum one
> >so we should use 0.
> >
> >Signed-off-by: Antoine Tenart 
> >---
> >
> >Hi Russell,
> >
> >I'm not completely sure about this patch as this case is not really
> >specified in the specification. But the issue is there, and I've discussed
> >this with others. It seemed logical (at least to us :)) to use the
> >BR,nominal values as br_max and 0 as br_min when using a passive cable
> >which only provides BR,nominal as this would be the highest rate at
> >which the cable could work. And because it's passive, there should be no
> >issues using it at a lower rate.
> >
> >I've tested this with one passive cable which only reports its
> >BR,nominal (which was 10300) while it could be used when using 1000baseX
> >or 2500baseX modes.
> 
> Which SFP modules (vendor and model) exposed this out of curiosity?
> Russell and I already saw the Cotsworks modules having some issues
> with checksums, so building a table of quirks would help. Thanks!

I think this is just manufacturers being lazy with their EEPROM
contents - looking around, most passive cables are specified to be
"up to" some figure, and that's definitely what's specified by the
SFP+ specification by way of the high-pass pole requirement of the
coupling capacitors.
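The "up to" reading of BR,nominal for passive cables can be sketched as follows. The `is_passive_cable` flag and both function names are assumptions made for illustration; the actual patch is not quoted in this message. The line rates used in the example are the 1000BASE-X and 2500BASE-X signalling rates (1250 and 3125 MBd).

```python
def cable_bounds(br_nominal, br_min_pct, br_max_pct, is_passive_cable):
    """Return (br_min, br_max) in MBd.

    When a passive cable gives neither margin, treat BR,nominal as an
    upper limit and allow any lower rate.
    """
    br_nom = br_nominal * 100
    if is_passive_cable and br_min_pct == 0 and br_max_pct == 0:
        return 0, br_nom
    return (br_nom - br_nominal * br_min_pct,
            br_nom + br_nominal * br_max_pct)


def rate_fits(rate_mbd, bounds):
    """True if a line rate falls inside the (min, max) window."""
    low, high = bounds
    return low <= rate_mbd <= high


# The cable from the report: BR,nominal 10300 MBd, no margins given.
relaxed = cable_bounds(103, 0, 0, is_passive_cable=True)
strict = cable_bounds(103, 0, 0, is_passive_cable=False)
```

With the relaxed window both 1250 and 3125 MBd fit, whereas the strict 10300 to 10300 window would rule out the 1000baseX and 2500baseX modes the cable was observed to work with.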

Re: [PATCH net-next v2 03/13] net: phy: sfp: warn the user when no tx_disable pin is available

2018-05-08 Thread Russell King - ARM Linux

On Sat, May 05, 2018 at 10:52:42PM +0200, Andrew Lunn wrote:
> On Sat, May 05, 2018 at 01:38:31PM -0700, Florian Fainelli wrote:
> > On May 4, 2018 10:14:25 AM PDT, Andrew Lunn  wrote:
> > >On Fri, May 04, 2018 at 10:07:53AM -0700, Florian Fainelli wrote:
> > >> On 05/04/2018 06:56 AM, Antoine Tenart wrote:
> > >> > In case no Tx disable pin is available the SFP modules will always be
> > >> > emitting. This could be an issue when using modules using laser as their
> > >> > light source as we would have no way to disable it when the fiber is
> > >> > removed. This patch adds a warning when registering an SFP cage which does
> > >> > not have its tx_disable pin wired or available.
> > >> 
> > >> Is this something that was done in a possibly earlier revision of a
> > >> given board design and which was finally fixed? Nothing wrong with the
> > >> patch, but this seems like a pretty serious board design mistake, that
> > >> needs to be addressed.
> > >
> > >Hi Florian
> > >
> > >Zii Devel B is like this. Only the "Signal Detect" pin is wired to a GPIO.
> > 
> 
> > Good point, indeed. BTW what do you think about exposing the SFF's
> > EEPROM and diagnostics through the standard ethtool operations even
> > if we have to keep the description of the SFF as a fixed link in
> > Device Tree because of the unfortunate wiring?
> 
> I believe in Antoine's case, all the control plane is broken. He cannot
> read the EEPROM, nor any of the module's pins via GPIOs.

Correct.

> For Zii Devel B, the EEPROM is accessible, and so is the SD pin. What
> is missing is transmit disable. So i would expose it as an SFF module.

Agreed.

Re: [PATCH net-next v2 3/9] net: phy: phylink: Poll link GPIOs

2018-05-10 Thread Russell King - ARM Linux

On Thu, May 10, 2018 at 01:17:31PM -0700, Florian Fainelli wrote:
> From: Russell King 
> 
> When using a fixed link with a link GPIO, we need to poll that GPIO to
> determine link state changes. This is consistent with what fixed_phy.c does.
> 
> Signed-off-by: Florian Fainelli 

I'd like this to use the GPIO interrupt where available, only falling back
to the timer approach when there's no interrupt.  Unfortunately, I don't
have much time to devote to this at the moment, having recently been away
on vacation, and now having to work on ARM specific issues for probably
all of the remainder of this kernel cycle.

That means I won't have time to test your series on any of the boards
I have available to me.

> ---
>  drivers/net/phy/phylink.c | 16 
>  1 file changed, 16 insertions(+)
> 
> diff --git a/drivers/net/phy/phylink.c b/drivers/net/phy/phylink.c
> index 6392b5248cf5..581ce93ecaf9 100644
> --- a/drivers/net/phy/phylink.c
> +++ b/drivers/net/phy/phylink.c
> @@ -19,6 +19,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  
>  #include "sfp.h"
> @@ -54,6 +55,7 @@ struct phylink {
>   /* The link configuration settings */
>   struct phylink_link_state link_config;
>   struct gpio_desc *link_gpio;
> + struct timer_list link_poll;
>   void (*get_fixed_state)(struct net_device *dev,
>   struct phylink_link_state *s);
>  
> @@ -500,6 +502,15 @@ static void phylink_run_resolve(struct phylink *pl)
>   queue_work(system_power_efficient_wq, &pl->resolve);
>  }
>  
> +static void phylink_fixed_poll(struct timer_list *t)
> +{
> + struct phylink *pl = container_of(t, struct phylink, link_poll);
> +
> + mod_timer(t, jiffies + HZ);
> +
> + phylink_run_resolve(pl);
> +}
> +
>  static const struct sfp_upstream_ops sfp_phylink_ops;
>  
>  static int phylink_register_sfp(struct phylink *pl,
> @@ -572,6 +583,7 @@ struct phylink *phylink_create(struct net_device *ndev,
>   pl->link_config.an_enabled = true;
>   pl->ops = ops;
>   __set_bit(PHYLINK_DISABLE_STOPPED, &pl->phylink_disable_state);
> + timer_setup(&pl->link_poll, phylink_fixed_poll, 0);
>  
>   bitmap_fill(pl->supported, __ETHTOOL_LINK_MODE_MASK_NBITS);
>   linkmode_copy(pl->link_config.advertising, pl->supported);
> @@ -905,6 +917,8 @@ void phylink_start(struct phylink *pl)
>   clear_bit(PHYLINK_DISABLE_STOPPED, &pl->phylink_disable_state);
>   phylink_run_resolve(pl);
>  
> + if (pl->link_an_mode == MLO_AN_FIXED && !IS_ERR(pl->link_gpio))
> + mod_timer(&pl->link_poll, jiffies + HZ);
>   if (pl->sfp_bus)
>   sfp_upstream_start(pl->sfp_bus);
>   if (pl->phydev)
> @@ -929,6 +943,8 @@ void phylink_stop(struct phylink *pl)
>   phy_stop(pl->phydev);
>   if (pl->sfp_bus)
>   sfp_upstream_stop(pl->sfp_bus);
> + if (pl->link_an_mode == MLO_AN_FIXED && !IS_ERR(pl->link_gpio))
> + del_timer_sync(&pl->link_poll);
>  
>   set_bit(PHYLINK_DISABLE_STOPPED, &pl->phylink_disable_state);
>   queue_work(system_power_efficient_wq, &pl->resolve);
> -- 
> 2.14.1
> 

Re: [PATCH net-next 2/4] net: phy: phylink: Provide PHY interface to mac_link_{up,down}

2018-03-28 Thread Russell King - ARM Linux

On Sun, Mar 18, 2018 at 11:52:44AM -0700, Florian Fainelli wrote:
> In preparation for having DSA transition entirely to PHYLINK, we need to pass
> a PHY interface type to the mac_link_{up,down} callbacks because we may have to
> make decisions on that (e.g: turn on/off RGMII interfaces etc.). We do not
> pass an entire phylink_link_state because not all parameters (pause, duplex
> etc.) are defined when the link is down, only link and interface are.

If we're going to make this change, we ought to decide whether David
should pick this up for the coming merge window or not, independently
of the remaining patches - there are other users of phylink in the
pipeline (bootlin are working on mvpp2 support), so this will be a
minor source of build error pain for folk.

To that end,

Acked-by: Russell King 

However, the documentation probably ought to make it clear that the
configuration of the interface mode of the MAC should always happen
in the mac_config() callback, not in the mac_link_*() functions.

Thanks.

> Update mvneta accordingly since it currently implements phylink_mac_ops.
> 
> Signed-off-by: Florian Fainelli 
> ---
>  drivers/net/ethernet/marvell/mvneta.c |  4 +++-
>  drivers/net/phy/phylink.c |  6 +-
>  include/linux/phylink.h   | 10 --
>  3 files changed, 16 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/net/ethernet/marvell/mvneta.c b/drivers/net/ethernet/marvell/mvneta.c
> index 25e9a551cc8c..60de9b8d62c2 100644
> --- a/drivers/net/ethernet/marvell/mvneta.c
> +++ b/drivers/net/ethernet/marvell/mvneta.c
> @@ -3396,7 +3396,8 @@ static void mvneta_set_eee(struct mvneta_port *pp, bool enable)
>   mvreg_write(pp, MVNETA_LPI_CTRL_1, lpi_ctl1);
>  }
>  
> -static void mvneta_mac_link_down(struct net_device *ndev, unsigned int mode)
> +static void mvneta_mac_link_down(struct net_device *ndev, unsigned int mode,
> +  phy_interface_t interface)
>  {
>   struct mvneta_port *pp = netdev_priv(ndev);
>   u32 val;
> @@ -3415,6 +3416,7 @@ static void mvneta_mac_link_down(struct net_device *ndev, unsigned int mode)
>  }
>  
>  static void mvneta_mac_link_up(struct net_device *ndev, unsigned int mode,
> +phy_interface_t interface,
>  struct phy_device *phy)
>  {
>   struct mvneta_port *pp = netdev_priv(ndev);
> diff --git a/drivers/net/phy/phylink.c b/drivers/net/phy/phylink.c
> index 51a011a349fe..cef3c1356a8c 100644
> --- a/drivers/net/phy/phylink.c
> +++ b/drivers/net/phy/phylink.c
> @@ -423,8 +423,10 @@ static void phylink_resolve(struct work_struct *w)
>   if (pl->phylink_disable_state) {
>   pl->mac_link_dropped = false;
>   link_state.link = false;
> + link_state.interface = pl->phy_state.interface;
>   } else if (pl->mac_link_dropped) {
>   link_state.link = false;
> + link_state.interface = pl->phy_state.interface;
>   } else {
>   switch (pl->link_an_mode) {
>   case MLO_AN_PHY:
> @@ -470,10 +472,12 @@ static void phylink_resolve(struct work_struct *w)
>   if (link_state.link != netif_carrier_ok(ndev)) {
>   if (!link_state.link) {
>   netif_carrier_off(ndev);
> - pl->ops->mac_link_down(ndev, pl->link_an_mode);
> + pl->ops->mac_link_down(ndev, pl->link_an_mode,
> +pl->phy_state.interface);
>   netdev_info(ndev, "Link is Down\n");
>   } else {
>   pl->ops->mac_link_up(ndev, pl->link_an_mode,
> +  pl->phy_state.interface,
>pl->phydev);
>  
>   netif_carrier_on(ndev);
> diff --git a/include/linux/phylink.h b/include/linux/phylink.h
> index bd137c273d38..f29a40947de9 100644
> --- a/include/linux/phylink.h
> +++ b/include/linux/phylink.h
> @@ -73,8 +73,10 @@ struct phylink_mac_ops {
>   void (*mac_config)(struct net_device *ndev, unsigned int mode,
>  const struct phylink_link_state *state);
>   void (*mac_an_restart)(struct net_device *ndev);
> - void (*mac_link_down)(struct net_device *ndev, unsigned int mode);
> + void (*mac_link_down)(struct net_device *ndev, unsigned int mode,
> +   phy_interface_t interface);
>   void (*mac_link_up)(struct net_device *ndev, unsigned int mode,
> + phy_interface_t interface,
>   struct phy_device *phy);
>

Re: [PATCH] sfp: allow cotsworks modules

2018-03-28 Thread Russell King - ARM Linux

On Wed, Mar 28, 2018 at 03:33:57AM -0700, Joe Perches wrote:
> On Wed, 2018-03-28 at 11:18 +0100, Russell King wrote:
> > Cotsworks modules fail the checksums - it appears that Cotsworks
> > reprograms the EEPROM at the end of production with the final product
> > information (serial, date code, and exact part number for module
> > options) and fails to update the checksum.
> 
> trivia:
> 
> > diff --git a/drivers/net/phy/sfp.c b/drivers/net/phy/sfp.c
> []
> > @@ -574,23 +575,43 @@ static int sfp_sm_mod_probe(struct sfp *sfp)
> []
> > +   if (cotsworks) {
> > +   dev_warn(sfp->dev,
> > +"EEPROM base structure checksum failure (0x%02x != 0x%02x)\n",
> > +check, id.base.cc_base);
> > +   } else {
> > +   dev_err(sfp->dev,
> > +   "EEPROM base structure checksum failure: 0x%02x != 0x%02x\n",
> 
> It'd be better to move this above the if and
> use only a single format string instead of
> using 2 slightly different formats.

No.  I think you've missed the fact that one is a _warning_ the other is
an _error_ and they are emitted at the appropriate severity.  It's not
just that the format strings are slightly different.
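
To make the severity point concrete, here is a small userspace sketch. It assumes the checksum is the low 8 bits of the byte sum (which is what the `0x%02x` comparisons in the quoted patch suggest), and `toy_sum8()`, `toy_check()` and the `tolerated_vendor` flag are hypothetical names, not the driver's. The same mismatch yields a different outcome depending on whether the module is one we have chosen to tolerate.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Outcome of a checksum comparison; names are illustrative only. */
enum toy_verdict { TOY_OK, TOY_WARN_CONTINUE, TOY_ERR_REJECT };

/* Assumed checksum: low 8 bits of the sum of all bytes. */
static uint8_t toy_sum8(const uint8_t *buf, size_t len)
{
	uint8_t sum = 0;

	assert(buf || len == 0);
	while (len--)
		sum = (uint8_t)(sum + *buf++);
	return sum;
}

/*
 * A mismatch is only a warning for a tolerated vendor (probe continues),
 * and an error for everyone else (probe is rejected).
 */
static enum toy_verdict toy_check(const uint8_t *buf, size_t len,
				  uint8_t stored, bool tolerated_vendor)
{
	if (toy_sum8(buf, len) == stored)
		return TOY_OK;
	return tolerated_vendor ? TOY_WARN_CONTINUE : TOY_ERR_REJECT;
}
```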

> 
> > +   check, id.base.cc_base);
> > +   print_hex_dump(KERN_ERR, "sfp EE: ", DUMP_PREFIX_OFFSET,
> > +  16, 1, &id, sizeof(id), true);
> > +   return -EINVAL;
> > +   }
> > }
> >  
> > check = sfp_check(&id.ext, sizeof(id.ext) - 1);
> > if (check != id.ext.cc_ext) {
> > -   dev_err(sfp->dev,
> > -   "EEPROM extended structure checksum failure: 0x%02x\n",
> > -   check);
> > -   memset(&id.ext, 0, sizeof(id.ext));
> > +   if (cotsworks) {
> > +   dev_warn(sfp->dev,
> > +"EEPROM extended structure checksum failure (0x%02x != 0x%02x)\n",
> > +check, id.ext.cc_ext);
> > +   } else {
> > +   dev_err(sfp->dev,
> > +   "EEPROM extended structure checksum failure: 0x%02x != 0x%02x\n",
> > +   check, id.ext.cc_ext);
> 
> 
> here too

Same applies.

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 8.8Mbps down 630kbps up
According to speedtest.net: 8.21Mbps down 510kbps up

Re: [PATCH] sfp: allow cotsworks modules

2018-03-28 Thread Russell King - ARM Linux

On Wed, Mar 28, 2018 at 09:19:01AM -0700, Joe Perches wrote:
> On Wed, 2018-03-28 at 11:41 +0100, Russell King - ARM Linux wrote:
> > On Wed, Mar 28, 2018 at 03:33:57AM -0700, Joe Perches wrote:
> > > On Wed, 2018-03-28 at 11:18 +0100, Russell King wrote:
> > > > Cotsworks modules fail the checksums - it appears that Cotsworks
> > > > reprograms the EEPROM at the end of production with the final product
> > > > information (serial, date code, and exact part number for module
> > > > options) and fails to update the checksum.
> > > 
> > > trivia:
> > > 
> > > > diff --git a/drivers/net/phy/sfp.c b/drivers/net/phy/sfp.c
> > > 
> > > []
> > > > @@ -574,23 +575,43 @@ static int sfp_sm_mod_probe(struct sfp *sfp)
> > > 
> > > []
> > > > +   if (cotsworks) {
> > > > +   dev_warn(sfp->dev,
> > > > +"EEPROM base structure checksum failure (0x%02x != 0x%02x)\n",
> > > > +check, id.base.cc_base);
> > > > +   } else {
> > > > +   dev_err(sfp->dev,
> > > > +   "EEPROM base structure checksum failure: 0x%02x != 0x%02x\n",
> > > 
> > > It'd be better to move this above the if and
> > > use only a single format string instead of
> > > using 2 slightly different formats.
> > 
> > No.  I think you've missed the fact that one is a _warning_ the other is
> > an _error_ and they are emitted at the appropriate severity.  It's not
> > just that the format strings are slightly different.
> 
> Right.  Still nicer to use the same formats.

I'll stick a "Warning:" and "Error:" tag before them if you really
want the rest of the message to be identically formatted - otherwise,
when seeing reports from people's dmesg, there will be nothing to
indicate which message was printed.
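
A sketch of what such tagging could look like, in plain userspace C. `toy_format_failure()` is a hypothetical helper and the exact wording is only an assumption; the point is that one shared format string still leaves a visible marker in a pasted dmesg line.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/*
 * Format a checksum failure with a severity tag in front of an otherwise
 * identical message body. Returns what snprintf() returns.
 */
static int toy_format_failure(char *out, size_t size, bool is_error,
			      unsigned int calculated, unsigned int stored)
{
	assert(out && size > 0);
	return snprintf(out, size,
			"%s: EEPROM base structure checksum failure: 0x%02x != 0x%02x",
			is_error ? "Error" : "Warning", calculated, stored);
}
```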


Re: [PATCH net-next 1/2] net: phy: phylink: Provide PHY interface to mac_link_{up,down}

2018-03-28 Thread Russell King - ARM Linux

On Wed, Mar 28, 2018 at 12:03:38PM -0700, Florian Fainelli wrote:
> In preparation for having DSA transition entirely to PHYLINK, we need to pass a
> PHY interface type to the mac_link_{up,down} callbacks because we may have to
> make decisions on that (e.g: turn on/off RGMII interfaces etc.). We do not pass
> an entire phylink_link_state because not all parameters (pause, duplex etc.) are
> defined when the link is down, only link and interface are.
> 
> Update mvneta accordingly since it currently implements phylink_mac_ops.
> 
> Signed-off-by: Florian Fainelli 

Similar comments to previous version wrt documentation, but...

Acked-by: Russell King 

> ---
>  drivers/net/ethernet/marvell/mvneta.c |  4 +++-
>  drivers/net/phy/phylink.c |  4 +++-
>  include/linux/phylink.h   | 10 --
>  3 files changed, 14 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/net/ethernet/marvell/mvneta.c b/drivers/net/ethernet/marvell/mvneta.c
> index eaa4bb80f1c9..cd09bde55596 100644
> --- a/drivers/net/ethernet/marvell/mvneta.c
> +++ b/drivers/net/ethernet/marvell/mvneta.c
> @@ -3396,7 +3396,8 @@ static void mvneta_set_eee(struct mvneta_port *pp, bool enable)
>   mvreg_write(pp, MVNETA_LPI_CTRL_1, lpi_ctl1);
>  }
>  
> -static void mvneta_mac_link_down(struct net_device *ndev, unsigned int mode)
> +static void mvneta_mac_link_down(struct net_device *ndev, unsigned int mode,
> +  phy_interface_t interface)
>  {
>   struct mvneta_port *pp = netdev_priv(ndev);
>   u32 val;
> @@ -3415,6 +3416,7 @@ static void mvneta_mac_link_down(struct net_device *ndev, unsigned int mode)
>  }
>  
>  static void mvneta_mac_link_up(struct net_device *ndev, unsigned int mode,
> +phy_interface_t interface,
>  struct phy_device *phy)
>  {
>   struct mvneta_port *pp = netdev_priv(ndev);
> diff --git a/drivers/net/phy/phylink.c b/drivers/net/phy/phylink.c
> index 51a011a349fe..9b1e4721ea3a 100644
> --- a/drivers/net/phy/phylink.c
> +++ b/drivers/net/phy/phylink.c
> @@ -470,10 +470,12 @@ static void phylink_resolve(struct work_struct *w)
>   if (link_state.link != netif_carrier_ok(ndev)) {
>   if (!link_state.link) {
>   netif_carrier_off(ndev);
> - pl->ops->mac_link_down(ndev, pl->link_an_mode);
> + pl->ops->mac_link_down(ndev, pl->link_an_mode,
> +pl->phy_state.interface);
>   netdev_info(ndev, "Link is Down\n");
>   } else {
>   pl->ops->mac_link_up(ndev, pl->link_an_mode,
> +  pl->phy_state.interface,
>    pl->phydev);
>  
>   netif_carrier_on(ndev);
> diff --git a/include/linux/phylink.h b/include/linux/phylink.h
> index bd137c273d38..f29a40947de9 100644
> --- a/include/linux/phylink.h
> +++ b/include/linux/phylink.h
> @@ -73,8 +73,10 @@ struct phylink_mac_ops {
>   void (*mac_config)(struct net_device *ndev, unsigned int mode,
>  const struct phylink_link_state *state);
>   void (*mac_an_restart)(struct net_device *ndev);
> - void (*mac_link_down)(struct net_device *ndev, unsigned int mode);
> + void (*mac_link_down)(struct net_device *ndev, unsigned int mode,
> +   phy_interface_t interface);
>   void (*mac_link_up)(struct net_device *ndev, unsigned int mode,
> + phy_interface_t interface,
>   struct phy_device *phy);
>  };
>  
> @@ -161,17 +163,20 @@ void mac_an_restart(struct net_device *ndev);
>   * mac_link_down() - take the link down
>   * @ndev: a pointer to a &struct net_device for the MAC.
>   * @mode: link autonegotiation mode
> + * @interface: link &typedef phy_interface_t mode
>   *
>   * If @mode is not an in-band negotiation mode (as defined by
>   * phylink_autoneg_inband()), force the link down and disable any
>   * Energy Efficient Ethernet MAC configuration.
>   */
> -void mac_link_down(struct net_device *ndev, unsigned int mode);
> +void mac_link_down(struct net_device *ndev, unsigned int mode,
> +phy_interface_t interface);
>  
>  /**
>   * mac_link_up() - allow the link to come up
>   * @ndev: a pointer to a &struct net_device for the MAC.
>   * @mode: link autonegotiation mode
> + * @interface: link &typedef phy_interface_t mode
>   * @phy: any attached phy
>   *
>   * If @mode is not an in-band negotiation mode (a

Re: [PATCH net-next 2/2] sfp/phylink: move module EEPROM ethtool access into netdev core ethtool

2018-03-28 Thread Russell King - ARM Linux

On Wed, Mar 28, 2018 at 12:03:39PM -0700, Florian Fainelli wrote:
> From: Russell King 
> 
> Provide a pointer to the SFP bus in struct net_device, so that the
> ethtool module EEPROM methods can access the SFP directly, rather
> than needing every user to provide a hook for it.
> 
> Signed-off-by: Russell King 

This probably ought to have your sign-off too as you're passing the
patch along rather than me submitting it directly.  DCO v1.1 (c)
seems to apply to this situation.

> ---
>  drivers/net/ethernet/marvell/mvneta.c | 18 --
>  drivers/net/phy/phylink.c | 28 
>  drivers/net/phy/sfp-bus.c     |  6 ++
>  include/linux/netdevice.h |  3 +++
>  include/linux/phylink.h   |  3 ---
>  net/core/ethtool.c|  7 +++
>  6 files changed, 12 insertions(+), 53 deletions(-)
> 
> diff --git a/drivers/net/ethernet/marvell/mvneta.c b/drivers/net/ethernet/marvell/mvneta.c
> index cd09bde55596..25ced96750bf 100644
> --- a/drivers/net/ethernet/marvell/mvneta.c
> +++ b/drivers/net/ethernet/marvell/mvneta.c
> @@ -4075,22 +4075,6 @@ static int mvneta_ethtool_set_wol(struct net_device *dev,
>   return ret;
>  }
>  
> -static int mvneta_ethtool_get_module_info(struct net_device *dev,
> -   struct ethtool_modinfo *modinfo)
> -{
> - struct mvneta_port *pp = netdev_priv(dev);
> -
> - return phylink_ethtool_get_module_info(pp->phylink, modinfo);
> -}
> -
> -static int mvneta_ethtool_get_module_eeprom(struct net_device *dev,
> - struct ethtool_eeprom *ee, u8 *buf)
> -{
> - struct mvneta_port *pp = netdev_priv(dev);
> -
> - return phylink_ethtool_get_module_eeprom(pp->phylink, ee, buf);
> -}
> -
>  static int mvneta_ethtool_get_eee(struct net_device *dev,
> struct ethtool_eee *eee)
>  {
> @@ -4165,8 +4149,6 @@ static const struct ethtool_ops mvneta_eth_tool_ops = {
>   .set_link_ksettings = mvneta_ethtool_set_link_ksettings,
>   .get_wol= mvneta_ethtool_get_wol,
>   .set_wol= mvneta_ethtool_set_wol,
> - .get_module_info = mvneta_ethtool_get_module_info,
> - .get_module_eeprom = mvneta_ethtool_get_module_eeprom,
>   .get_eee= mvneta_ethtool_get_eee,
>   .set_eee= mvneta_ethtool_set_eee,
>  };
> diff --git a/drivers/net/phy/phylink.c b/drivers/net/phy/phylink.c
> index 9b1e4721ea3a..c582b2d7546c 100644
> --- a/drivers/net/phy/phylink.c
> +++ b/drivers/net/phy/phylink.c
> @@ -1250,34 +1250,6 @@ int phylink_ethtool_set_pauseparam(struct phylink *pl,
>  }
>  EXPORT_SYMBOL_GPL(phylink_ethtool_set_pauseparam);
>  
> -int phylink_ethtool_get_module_info(struct phylink *pl,
> - struct ethtool_modinfo *modinfo)
> -{
> - int ret = -EOPNOTSUPP;
> -
> - WARN_ON(!lockdep_rtnl_is_held());
> -
> - if (pl->sfp_bus)
> - ret = sfp_get_module_info(pl->sfp_bus, modinfo);
> -
> - return ret;
> -}
> -EXPORT_SYMBOL_GPL(phylink_ethtool_get_module_info);
> -
> -int phylink_ethtool_get_module_eeprom(struct phylink *pl,
> -   struct ethtool_eeprom *ee, u8 *buf)
> -{
> - int ret = -EOPNOTSUPP;
> -
> - WARN_ON(!lockdep_rtnl_is_held());
> -
> - if (pl->sfp_bus)
> - ret = sfp_get_module_eeprom(pl->sfp_bus, ee, buf);
> -
> - return ret;
> -}
> -EXPORT_SYMBOL_GPL(phylink_ethtool_get_module_eeprom);
> -
>  /**
>   * phylink_ethtool_get_eee_err() - read the energy efficient ethernet error
>   *   counter
> diff --git a/drivers/net/phy/sfp-bus.c b/drivers/net/phy/sfp-bus.c
> index 3d4ff5d0d2a6..0381da78d228 100644
> --- a/drivers/net/phy/sfp-bus.c
> +++ b/drivers/net/phy/sfp-bus.c
> @@ -342,6 +342,7 @@ static int sfp_register_bus(struct sfp_bus *bus)
>   }
>   if (bus->started)
>   bus->socket_ops->start(bus->sfp);
> + bus->netdev->sfp_bus = bus;
>   bus->registered = true;
>   return 0;
>  }
> @@ -356,6 +357,7 @@ static void sfp_unregister_bus(struct sfp_bus *bus)
>   if (bus->phydev && ops && ops->disconnect_phy)
>   ops->disconnect_phy(bus->upstream);
>   }
> + bus->netdev->sfp_bus = NULL;
>   bus->registered = false;
>  }
>  
> @@ -371,8 +373,6 @@ static void sfp_unregister_bus(struct sfp_bus *bus)
>   */
>  int sfp_get_module_info(struct sfp_bus *bus, struct ethtool_modinfo *modinfo)
>  {
> - if (!bus->registered)
> -

Re: [EXT] [PATCH net-next v2 0/2] phylink: API changes

2018-03-29 Thread Russell King - ARM Linux

On Thu, Mar 29, 2018 at 05:58:43AM +, Yan Markman wrote:
> Hi Florian
> Please keep CC Yelena Krivosheev 
> for changes with  drivers/net/ethernet/marvell/mvneta.c
> Thanks

We have a way to ensure such things happen - it's the MAINTAINERS
file.  Please use the established community methods rather than
sending emails asking for people to remember such quirks.  Thanks.

> Yan Markman
> Tel. 05-44732819
> 
> 
> -Original Message-
> From: Florian Fainelli [mailto:f.faine...@gmail.com] 
> Sent: Thursday, March 29, 2018 1:44 AM
> To: netdev@vger.kernel.org
> Cc: Florian Fainelli ; Thomas Petazzoni 
> ; Andrew Lunn ; David S. 
> Miller ; Russell King ; open 
> list ; Antoine Tenart 
> ; Yan Markman ; Stefan 
> Chulski ; Maxime Chevallier 
> ; Miquel Raynal 
> ; Marcin Wojtas 
> Subject: [EXT] [PATCH net-next v2 0/2] phylink: API changes
> 
> External Email
> 
> --
> Hi all,
> 
> This patch series contains two API changes to PHYLINK which will later be 
> used by DSA to migrate to PHYLINK. Because these are API changes that impact 
> other outstanding work (e.g: MVPP2) I would rather get them included sooner 
> to minimize conflicts.
> 
> Thank you!
> 
> Changes in v2:
> 
> - added missing documentation to mac_link_{up,down} that the interface
>   must be configured in mac_config()
> 
> - added Russell's, Andrew's and my tags
> 
> Florian Fainelli (1):
>   net: phy: phylink: Provide PHY interface to mac_link_{up,down}
> 
> Russell King (1):
>   sfp/phylink: move module EEPROM ethtool access into netdev core
> ethtool
> 
>  drivers/net/ethernet/marvell/mvneta.c | 22 +++---
>  drivers/net/phy/phylink.c | 32 +++-
>  drivers/net/phy/sfp-bus.c |  6 ++
>  include/linux/netdevice.h |  3 +++
>  include/linux/phylink.h   | 17 +++--
>  net/core/ethtool.c|  7 +++
>  6 files changed, 29 insertions(+), 58 deletions(-)
> 
> --
> 2.14.1
> 


Re: [PATCH net-next] phylink: Fix an uninitialized variable bug

2017-08-17 Thread Russell King - ARM Linux

On Thu, Aug 10, 2017 at 05:21:12PM +0200, Andrew Lunn wrote:
> On Thu, Aug 10, 2017 at 12:35:50AM +0300, Dan Carpenter wrote:
> > "ret" isn't necessarily initialized here.
> > 
> > Fixes: 9525ae83959b ("phylink: add phylink infrastructure")
> > Signed-off-by: Dan Carpenter 
> 
> Reviewed-by: Andrew Lunn 

Thanks, not sure how that got missed - it was probably introduced when
migrating the code to ksettings.
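
The patch itself is not quoted in this message, so the following is only a generic illustration of this class of bug, not the phylink code. `toy_apply()` and its parameters are hypothetical. When a return variable is assigned on only some paths, the remaining paths return an indeterminate value unless it is given a default.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical setter: the result is only produced when a PHY is present.
 * Without the initialiser, the !have_phy path would return an
 * indeterminate value.
 */
static int toy_apply(bool have_phy, int phy_result)
{
	int ret = 0;	/* default for the path that never assigns ret */

	if (have_phy)
		ret = phy_result;

	return ret;
}
```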


Re: [RFC PATCH] dt-binding: net: sfp binding documentation

2017-08-21 Thread Russell King - ARM Linux

On Sun, Aug 20, 2017 at 01:28:06PM +0300, Baruch Siach wrote:
> Add device-tree binding documentation SFP transceivers. Support for SFP
> transceivers has been recently introduced (drivers/net/phy/sfp.c).
> 
> Signed-off-by: Baruch Siach 
> ---
> 
> The SFP driver is on net-next.
> 
> Not sure about the rate-select-gpio property name. The SFP+ standard
> (not supported yet) uses two signals, RS0 and RS1. RS0 is compatible
> with the SFP rate select signal, while RS1 controls the Tx rate.

SFP+ is usable with this, but the platforms I have do not wire the
rate select pins on the SFP+ sockets to GPIOs, but hard-wire them.

Note that I didn't expect the SFP code to just get merged with very
little in the way of real in-depth review of things like:

* the way the SFP code works, and its structure
* analysis of the bindings checking that they're fit for everyone's
  purposes.

The implementation that I've designed is based around the boards that
I have access to and the various public SFP documentation.  I think
documenting the bindings suggests that they are stable - I don't think
we're really ready to make that assertion yet - there may be things
that have been missed which will only come up when other people start
using this code.

> ---
>  Documentation/devicetree/bindings/net/sff-sfp.txt | 24 +++
>  1 file changed, 24 insertions(+)
>  create mode 100644 Documentation/devicetree/bindings/net/sff-sfp.txt
> 
> diff --git a/Documentation/devicetree/bindings/net/sff-sfp.txt b/Documentation/devicetree/bindings/net/sff-sfp.txt
> new file mode 100644
> index ..f0c27bc3925e
> --- /dev/null
> +++ b/Documentation/devicetree/bindings/net/sff-sfp.txt
> @@ -0,0 +1,24 @@
> +Small Form Factor (SFF) Committee Small Form-factor Pluggable (SFP)
> +Transceiver
> +
> +Required properties:
> +
> +- compatible : must be "sff,sfp"
> +
> +Optional Properties:
> +
> +- i2c-bus : phandle of an I2C bus controller for the SFP two wire serial
> +  interface

The code as it currently stands pretty much requires an I2C bus to be
functional - but when I wrote the code, I left the possibility open for
an implementation (eg, network driver) to provide its own functionality
for reading the I2C EEPROM on the module.  Some adapters which already
have SFP support do this.

Hence, for current implementations, this is required.

> +
> +- moddef0-gpio : phandle of the MOD-DEF0 (AKA Mod_ABS) module presence input
> +  gpio signal
> +
> +- los-gpio : phandle of the Receiver Loss of Signal Indication input gpio
> +  signal
> +
> +- tx-fault-gpio : phandle of the Module Transmitter Fault input gpio signal
> +
> +- tx-disable-gpio : phandle of the Transmitter Disable output gpio signal
> +
> +- rate-select-gpio : phandle of the Rx Signaling Rate Select (AKA RS0) output
> +  gpio
> -- 
> 2.14.1
> 


Re: [RFC PATCH] dt-binding: net: sfp binding documentation

2017-08-23 Thread Russell King - ARM Linux

On Mon, Aug 21, 2017 at 02:10:33PM -0500, Rob Herring wrote:
> On Sun, Aug 20, 2017 at 5:28 AM, Baruch Siach  wrote:
> > Add device-tree binding documentation SFP transceivers. Support for SFP
> > transceivers has been recently introduced (drivers/net/phy/sfp.c).
> >
> > Signed-off-by: Baruch Siach 
> > ---
> >
> > The SFP driver is on net-next.
> >
> > Not sure about the rate-select-gpio property name. The SFP+ standard
> > (not supported yet) uses two signals, RS0 and RS1. RS0 is compatible
> > with the SFP rate select signal, while RS1 controls the Tx rate.
> > ---
> >  Documentation/devicetree/bindings/net/sff-sfp.txt | 24 +++
> >  1 file changed, 24 insertions(+)
> >  create mode 100644 Documentation/devicetree/bindings/net/sff-sfp.txt
> >
> > diff --git a/Documentation/devicetree/bindings/net/sff-sfp.txt b/Documentation/devicetree/bindings/net/sff-sfp.txt
> > new file mode 100644
> > index ..f0c27bc3925e
> > --- /dev/null
> > +++ b/Documentation/devicetree/bindings/net/sff-sfp.txt
> > @@ -0,0 +1,24 @@
> > +Small Form Factor (SFF) Committee Small Form-factor Pluggable (SFP)
> > +Transceiver
> > +
> > +Required properties:
> > +
> > +- compatible : must be "sff,sfp"
> 
> Need to document "sff" vendor prefix.
> 
> Kind of a short name, but I guess it is sufficient. Are there
> revisions of the standard (not SFP+) or more than one form factor (I
> don't recall any)?

The standards get revised and reorganised, so you can't really name any
particular standard.  SFP+ is a supplement to SFP, and I suspect that's
going to continue into the future.

> > +
> > +Optional Properties:
> > +
> > +- i2c-bus : phandle of an I2C bus controller for the SFP two wire serial
> > +  interface
> 
> Why not a child of the i2c bus it is on? IOW, what should this be a child of?

What reg= value would you use to identify it?  There's no particular
I2C bus address.  There's an EEPROM on the actual module, and there
may be a PHY on the I2C bus (some PHYs include I2C as an alternative
way to speak to them other than MDIO.)

I2C couldn't probe these as they are effectively hotplugged.

However, there's also the question about why it should be a child of
the I2C bus - the I2C bus is just a means of communicating with and
identifying the module.  You could equally argue that it should be
a child of the GPIO controller, because that's how it's controlled.
You could also argue that it should be a child of the ethernet
interface, since that's the main data path.

> > +
> > +- moddef0-gpio : phandle of the MOD-DEF0 (AKA Mod_ABS) module presence input
> > +  gpio signal
> 
> mod-def0-gpios?

It all depends on the standard you read.  Some call it MOD_DEF0, Mod-DEF0,
Mod_ABS, and some call it MOD-DEF0.  And confusingly, some standards call
the binary combination of the three MOD-DEF signals "MOD-DEF 0"...
"MOD-DEF 7".  These signals come from the GBIC module era.  It's something
of a mess.

> > +
> > +- los-gpio : phandle of the Receiver Loss of Signal Indication input gpio
> > +  signal
> > +
> > +- tx-fault-gpio : phandle of the Module Transmitter Fault input gpio signal
> > +
> > +- tx-disable-gpio : phandle of the Transmitter Disable output gpio signal
> > +
> > +- rate-select-gpio : phandle of the Rx Signaling Rate Select (AKA RS0) output
> > +  gpio
> 
> -gpios is the preferred form for all of these.

Even if there's only _one_ - using the plural leads one to think that
you can list many GPIOs, which is not correct here.


Re: [RFC PATCH] dt-binding: net: sfp binding documentation

2017-08-23 Thread Russell King - ARM Linux

On Mon, Aug 21, 2017 at 02:12:42PM -0500, Rob Herring wrote:
> On Mon, Aug 21, 2017 at 10:06 AM, Baruch Siach  wrote:
> > Hi Russell,
> >
> > On Mon, Aug 21, 2017 at 01:53:17PM +0100, Russell King - ARM Linux wrote:
> >> On Sun, Aug 20, 2017 at 01:28:06PM +0300, Baruch Siach wrote:
> >> > Add device-tree binding documentation SFP transceivers. Support for SFP
> >> > transceivers has been recently introduced (drivers/net/phy/sfp.c).
> >> >
> >> > Signed-off-by: Baruch Siach 
> >> > ---
> >> >
> >> > The SFP driver is on net-next.
> >> >
> >> > Not sure about the rate-select-gpio property name. The SFP+ standard
> >> > (not supported yet) uses two signals, RS0 and RS1. RS0 is compatible
> >> > with the SFP rate select signal, while RS1 controls the Tx rate.
> >>
> >> SFP+ is usable with this, but the platforms I have do not wire the
> >> rate select pins on the SFP+ sockets to GPIOs, but hard-wire them.
> >
> > So maybe naming this signal 'rate-select0-gpio' would make it more future
> > > (SFP+) proof? Or 'rate-select-rx-gpio'?
> 
> Just extend it by making it an array of 2 gpios.

What do you do if you have only one rate select wired up and it doesn't
correspond with the first?
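
The concern can be sketched in userspace C. Everything here is hypothetical (`toy_rate_select`, `TOY_GPIO_ABSENT`, the helper names); it models one named slot per rate-select line, so a board that wires only RS1 can say so without inventing a placeholder for RS0. The RS0 = Rx, RS1 = Tx mapping follows the description earlier in this thread.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_GPIO_ABSENT (-1)	/* marks a line that is not wired to a GPIO */

/* One named slot per rate-select line, each independently optional. */
struct toy_rate_select {
	int rs0_gpio;	/* Rx rate select, or TOY_GPIO_ABSENT */
	int rs1_gpio;	/* Tx rate select, or TOY_GPIO_ABSENT */
};

static bool toy_can_set_rx_rate(const struct toy_rate_select *rs)
{
	assert(rs);
	return rs->rs0_gpio != TOY_GPIO_ABSENT;
}

static bool toy_can_set_tx_rate(const struct toy_rate_select *rs)
{
	assert(rs);
	return rs->rs1_gpio != TOY_GPIO_ABSENT;
}
```

With a purely positional two-entry list, the "only RS1 wired" board would need some agreed-upon empty first entry to keep RS1 in the second position.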


Re: [RFC PATCH] dt-binding: net: sfp binding documentation

2017-08-23 Thread Russell King - ARM Linux

On Mon, Aug 21, 2017 at 06:06:53PM +0300, Baruch Siach wrote:
> Hi Russell,
> 
> On Mon, Aug 21, 2017 at 01:53:17PM +0100, Russell King - ARM Linux wrote:
> > Note that I didn't expect the SFP code to just get merged with very
> > little in the way of real in-depth review of things like:
> > 
> > * the way the SFP code works, and its structure
> > * analysis of the bindings checking that they're fit for everyone's
> >   purposes.
> 
> I was also surprised to see the "sff,sfp" compatible string with no ack
> from DT maintainers. Hence this RFC.

I've been pushed into submitting the code for merging, and I hadn't
got around to writing the DT docs (thanks for doing that).  As I've
already said, I'm disappointed that the code didn't get more of a
review before it was merged.  It seems Linux review is not what it
was: people care more about reviewing for spelling errors and style
than code structure and functionality, stating that "if we don't like
it we can always rework it" or similar.

It also seems that people believe that they can't make use of other
people's work until it gets merged into mainline kernels (which is
what has been behind the pressure of getting this merged.)

What isn't realised is that having other people use the code before
it gets merged allows design issues to be identified and resolved
when there is great flexibility available - for example, changing the
DT binding.  Once it's merged, changing DT bindings becomes harder,
especially if they need to be changed in an incompatible way.

I'm fed up about this, and way past caring about these details today
though.


Re: [PATCH net-next 09/13] net: mvpp2: dynamic reconfiguration of the PHY mode

2017-08-24 Thread Russell King - ARM Linux

On Thu, Aug 24, 2017 at 04:56:09PM +0200, Andrew Lunn wrote:
> On Thu, Aug 24, 2017 at 10:38:19AM +0200, Antoine Tenart wrote:
> > This patch adds logic to reconfigure the comphy/gop when the link status
> > change at runtime. This is very useful on boards such as the mcbin which
> > have SFP and Ethernet ports connected to the same MAC port: depending on
> > what the user connects the driver will automatically reconfigure the
> > link mode.
> 
> Hi Antoine
> 
> I would expect each of these external Ethernet ports to have its own
> Ethernet PHY. Don't you need to disconnect from one Ethernet phy and
> connect to the other Ethernet PHY when you change external Ethernet
> port?

I think you're all getting confused.  The link mode has very little to
do with whether you're using SFP+ or whether you're using the RJ45 at
10G speeds.  The link mode has everything to do with the speed at which
the link is negotiated.

So please, put SFP+ out of your minds for this - SFP+ isn't the reason
why you need to switch the MAC link mode.

In all cases, the mvpp2 to 88x3310 link ends up in one of two modes:
1. SGMII for RJ45 speeds less than 10G.  Autonegotiation on SGMII
   at the mvpp2 end *must* be enabled for the PHY to work.

2. 10Gbase-R for 10G speeds, whether that be for SFP+ or RJ45 at 10G.
   Note: mcbin does not support SFP (1G) modules on the SFP+ ports.

The 88x3310 driver in the kernel knows about these combinations and
sets the phy interface parameter correctly depending on whether the
PHY has configured itself for copper at whatever speed or SFP+.
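
Put as code, the rule above depends only on the negotiated speed, not on which connector is in use. This is a userspace sketch with hypothetical names (`toy_host_mode`, `toy_pick_host_mode()`); it encodes just the two cases listed here and nothing else about the PHY.

```c
#include <assert.h>

/* The two MAC-to-PHY link modes described above; names are illustrative. */
enum toy_host_mode { TOY_HOST_SGMII, TOY_HOST_10GBASER };

/*
 * Pick the MAC-to-PHY mode from the negotiated speed alone:
 * anything below 10G uses SGMII, 10G uses 10GBASE-R.
 */
static enum toy_host_mode toy_pick_host_mode(unsigned int speed_mbps)
{
	assert(speed_mbps > 0);
	return speed_mbps >= 10000 ? TOY_HOST_10GBASER : TOY_HOST_SGMII;
}
```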


Re: [PATCH net-next 09/13] net: mvpp2: dynamic reconfiguration of the PHY mode

2017-08-24 Thread Russell King - ARM Linux

On Thu, Aug 24, 2017 at 06:57:43PM +0200, Andrew Lunn wrote:
> > I see what could be the issue but I do not understand one aspect though:
> > how could we switch from one PHY to another, as there's only one output
> > between the SoC (and so a given GoP#) and the board. So if a given PHY
> > can handle multiple modes I see, but in the other case a muxing
> > somewhere would be needed? Or did I miss something?
> 
> I think we need a hardware diagram...
> 
> How are the RJ45, copper PHY, SFP module connected to the SoC?
> 
> Somewhere there must be a mux, to select between copper and
> fibre. Where is that mux?

In the 88x3310 PHY:

 .--- RJ45
MVPP2 - 88x3310 PHY
 `--- SFP+

Here's the commentary I've provided at the very top of the 88x3310 driver
which describes all these modes:

 * There appears to be several different data paths through the PHY which
 * are automatically managed by the PHY.  The following has been determined
 * via observation and experimentation:
 *
 *   SGMII PHYXS -- BASE-T PCS -- 10G PMA -- AN -- Copper (for <= 1G)
 *  10GBASE-KR PHYXS -- BASE-T PCS -- 10G PMA -- AN -- Copper (for 10G)
 *  10GBASE-KR PHYXS -- BASE-R PCS -- Fiber
 *
 * If both the fiber and copper ports are connected, the first to gain
 * link takes priority and the other port is completely locked out.

It's not a copper-only PHY, it's just like most other PHYs out there
that support multiple connections, like the 88e151x series that support
both RJ45 and fibre and can auto-switch between them.
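
The "first to gain link wins" behaviour can be modelled as a tiny state machine. This is a userspace sketch of the observed behaviour only, not the PHY's actual logic; `toy_phy`, `toy_link_up()` and `toy_link_down()` are hypothetical, and releasing the lock-out when the active link drops is an assumption.

```c
#include <assert.h>

enum toy_media { TOY_MEDIA_NONE, TOY_MEDIA_COPPER, TOY_MEDIA_FIBER };

/* Which port currently owns the data path, if any. */
struct toy_phy {
	enum toy_media active;
};

/* Link-up report on one port: it only takes effect if nothing is active. */
static enum toy_media toy_link_up(struct toy_phy *phy, enum toy_media media)
{
	assert(phy && media != TOY_MEDIA_NONE);
	if (phy->active == TOY_MEDIA_NONE)
		phy->active = media;	/* first to gain link takes priority */
	return phy->active;
}

/* Link-down report: only the active port can release the lock-out. */
static enum toy_media toy_link_down(struct toy_phy *phy, enum toy_media media)
{
	assert(phy && media != TOY_MEDIA_NONE);
	if (phy->active == media)
		phy->active = TOY_MEDIA_NONE;
	return phy->active;
}
```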


Re: [PATCH net-next 09/13] net: mvpp2: dynamic reconfiguration of the PHY mode

2017-08-24 Thread Russell King - ARM Linux

On Thu, Aug 24, 2017 at 07:45:19PM +0200, Andrew Lunn wrote:
> > The 88x3310 driver in the kernel knows about these combinations and
> > sets the phy interface parameter correctly depending on whether the
> > PHY has configured itself for copper at whatever speed or SFP+.
> 
> So when the PHY decides to swap from copper to fibre etc, is the
> phylib state machine kept up to date. Does it see a down, followed by
> an up?

I'd have to re-check to make sure, but I believe it does, because the
negotiation is held off on the "other" media until the currently active
link has gone down.
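
To make the expected behaviour concrete, here is a small model of the
"first to gain link wins" arbitration, assuming the PHY really does hold
off the other media until the active link drops.  All names are
hypothetical; this models the described behaviour and is not driver code.

```c
#include <stdbool.h>

/* Hypothetical model of the PHY's media arbitration. */
enum media { MEDIA_NONE, MEDIA_COPPER, MEDIA_FIBER };

struct media_arbiter {
	enum media active;	/* media that currently owns the link */
};

/* A media gains link only if no other media currently owns it. */
static bool media_link_up(struct media_arbiter *a, enum media m)
{
	if (a->active != MEDIA_NONE && a->active != m)
		return false;	/* other port is locked out */
	a->active = m;
	return true;
}

/* Only the active media going down releases the lock. */
static void media_link_down(struct media_arbiter *a, enum media m)
{
	if (a->active == m)
		a->active = MEDIA_NONE;
}
```

Under this model a switch from copper to fibre always passes through
MEDIA_NONE, which is the down-then-up sequence phylib would need to see.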


Re: [EXT] Re: [PATCH net-next 09/13] net: mvpp2: dynamic reconfiguration of the PHY mode

2017-08-25 Thread Russell King - ARM Linux

On Thu, Aug 24, 2017 at 07:14:18PM +0200, Antoine Tenart wrote:
> On Thu, Aug 24, 2017 at 05:08:29PM +, Stefan Chulski wrote:
> > > > Imagine phylib is using the copper Ethernet PHY, but the MAC is using
> > > > the SFP port. Somebody pulls out the copper cable, phylib says the
> > > > link is down, turns the carrier off and calls the callback. Not good,
> > > > since your SFP cable is still plugged in...  Ethtool is
> > > > returning/setting stuff in the Copper Ethernet PHY, when in fact you
> > > > intend to be setting SFP settings.
> > > 
> > > I see what could be the issue but I do not understand one aspect though:
> > > how could we switch from one PHY to another, as there's only one output
> > > between the SoC (and so a given GoP#) and the board. So if a given PHY can
> > > handle multiple modes I see, but in the other case a muxing somewhere 
> > > would
> > > be needed? Or did I miss something?
> > 
> > I think PHY name and PHY mode struct that describe here both MAC to
> > PHY and PHY to PHY connection create confusion...  Serdes IP lane
> > doesn't care if connector is SFP, RJ45 or direct attached cable.
> > mvpp22_comphy_init only configures MAC to PHY
> > connection. SFI for 10G(KR in mainline), SGMII for 1G and HS_SGMII for
> > 2.5G.
> 
> So maybe one confusion was to name them PHY_MODE_10GKR and
> PHY_MODE_SGMII. It could be PHY_MODE_10G and PHY_MODE_1G instead.

SGMII mode supports 100M and 10M as well using data repetition, so 1G
makes it look like those speeds are not supported.
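
For reference, the SGMII link always runs at the 1G line rate, and the
slower speeds are carried by repeating each data byte.  A minimal sketch
of the repetition factor (the function name is made up for illustration):

```c
/*
 * Number of times each data byte is repeated on an SGMII link for a
 * given MAC speed in Mbit/s.  Returns 0 for speeds SGMII does not carry.
 */
static int sgmii_repeat_count(int speed_mbps)
{
	switch (speed_mbps) {
	case 1000:
		return 1;	/* native rate, no repetition */
	case 100:
		return 10;	/* each byte sent 10 times */
	case 10:
		return 100;	/* each byte sent 100 times */
	default:
		return 0;
	}
}
```

In every supported case speed multiplied by the repeat count comes to
1000, which is why the serdes lane configuration is the same for all three.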


Re: [PATCH net-next v2 05/14] net: mvpp2: do not force the link mode

2017-08-25 Thread Russell King - ARM Linux

On Fri, Aug 25, 2017 at 04:48:12PM +0200, Antoine Tenart wrote:
> The link mode (speed, duplex) was forced based on what the phylib
> returns. This should not be the case, and only forced by ethtool
> functions manually. This patch removes the link mode enforcement from
> the phylib link_event callback.

So how does RGMII work (which has no in-band signalling between the PHY
and MAC)?

phylib expects the network driver to configure it according to the PHY
state at link_event time - I think you need to explain more why you
think that this is not necessary.
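
To spell out what "configure it according to the PHY state" means for a
MAC with no in-band signalling, here is a standalone sketch of the
register update the quoted patch removes.  The bit positions are
placeholders chosen for illustration and are not guaranteed to match the
MVPP2 register layout; the function name is hypothetical.

```c
#include <stdint.h>

/* Placeholder bit positions for illustration only. */
#define CFG_MII_SPEED		(1u << 5)	/* force 100 Mbit/s */
#define CFG_GMII_SPEED		(1u << 6)	/* force 1000 Mbit/s */
#define CFG_AN_SPEED_EN		(1u << 7)	/* take speed from in-band AN */
#define CFG_FULL_DUPLEX		(1u << 12)	/* force full duplex */
#define CFG_AN_DUPLEX_EN	(1u << 13)	/* take duplex from in-band AN */

/*
 * Compute the new autoneg-config value that forces the MAC to the speed
 * and duplex reported by the PHY, leaving unrelated bits untouched.
 */
static uint32_t mac_force_link_mode(uint32_t val, int speed_mbps,
				    int full_duplex)
{
	/* Drop any previous forced mode and disable in-band resolution. */
	val &= ~(CFG_MII_SPEED | CFG_GMII_SPEED | CFG_FULL_DUPLEX |
		 CFG_AN_SPEED_EN | CFG_AN_DUPLEX_EN);

	if (full_duplex)
		val |= CFG_FULL_DUPLEX;

	if (speed_mbps == 1000)
		val |= CFG_GMII_SPEED;
	else if (speed_mbps == 100)
		val |= CFG_MII_SPEED;
	/* 10 Mbit/s: neither speed bit set */

	return val;
}
```

With RGMII nothing else tells the MAC what the PHY negotiated, so some
equivalent of this has to run from the link_event callback.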

> 
> Signed-off-by: Antoine Tenart 
> ---
>  drivers/net/ethernet/marvell/mvpp2.c | 24 
>  1 file changed, 24 deletions(-)
> 
> diff --git a/drivers/net/ethernet/marvell/mvpp2.c 
> b/drivers/net/ethernet/marvell/mvpp2.c
> index fab231858a41..498a4969dc58 100644
> --- a/drivers/net/ethernet/marvell/mvpp2.c
> +++ b/drivers/net/ethernet/marvell/mvpp2.c
> @@ -5741,30 +5741,10 @@ static void mvpp2_link_event(struct net_device *dev)
>   struct mvpp2_port *port = netdev_priv(dev);
>   struct phy_device *phydev = dev->phydev;
>   int status_change = 0;
> - u32 val;
>  
>   if (phydev->link) {
>   if ((port->speed != phydev->speed) ||
>   (port->duplex != phydev->duplex)) {
> - u32 val;
> -
> - val = readl(port->base + MVPP2_GMAC_AUTONEG_CONFIG);
> - val &= ~(MVPP2_GMAC_CONFIG_MII_SPEED |
> -  MVPP2_GMAC_CONFIG_GMII_SPEED |
> -  MVPP2_GMAC_CONFIG_FULL_DUPLEX |
> -  MVPP2_GMAC_AN_SPEED_EN |
> -  MVPP2_GMAC_AN_DUPLEX_EN);
> -
> - if (phydev->duplex)
> - val |= MVPP2_GMAC_CONFIG_FULL_DUPLEX;
> -
> - if (phydev->speed == SPEED_1000)
> - val |= MVPP2_GMAC_CONFIG_GMII_SPEED;
> - else if (phydev->speed == SPEED_100)
> - val |= MVPP2_GMAC_CONFIG_MII_SPEED;
> -
> - writel(val, port->base + MVPP2_GMAC_AUTONEG_CONFIG);
> -
>   port->duplex = phydev->duplex;
>   port->speed  = phydev->speed;
>   }
> @@ -5782,10 +5762,6 @@ static void mvpp2_link_event(struct net_device *dev)
>  
>   if (status_change) {
>   if (phydev->link) {
> - val = readl(port->base + MVPP2_GMAC_AUTONEG_CONFIG);
> - val |= (MVPP2_GMAC_FORCE_LINK_PASS |
> - MVPP2_GMAC_FORCE_LINK_DOWN);
> - writel(val, port->base + MVPP2_GMAC_AUTONEG_CONFIG);
>   mvpp2_egress_enable(port);
>   mvpp2_ingress_enable(port);
>   } else {
> -- 
> 2.13.5
> 


Re: [PATCH net-next v2 09/14] net: mvpp2: dynamic reconfiguration of the PHY mode

2017-08-25 Thread Russell King - ARM Linux

On Fri, Aug 25, 2017 at 04:48:16PM +0200, Antoine Tenart wrote:
> This patch adds logic to reconfigure the comphy/gop when the link status
> change at runtime. This is very useful on boards such as the mcbin which
> have SFP and Ethernet ports connected to the same MAC port: depending on
> what the user connects the driver will automatically reconfigure the
> link mode.

This commit commentary needs updating - as I've already pointed out in
the previous round, the need to reconfigure things has *nothing* to do
with there being SFP and "Ethernet" ports present.  Hence, your commit
message is entirely misleading.

> 
> Signed-off-by: Antoine Tenart 
> ---
>  drivers/net/ethernet/marvell/mvpp2.c | 21 -
>  1 file changed, 20 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/net/ethernet/marvell/mvpp2.c 
> b/drivers/net/ethernet/marvell/mvpp2.c
> index 49a6789a4142..04e0c8ab7b51 100644
> --- a/drivers/net/ethernet/marvell/mvpp2.c
> +++ b/drivers/net/ethernet/marvell/mvpp2.c
> @@ -5740,6 +5740,7 @@ static void mvpp2_link_event(struct net_device *dev)
>  {
>   struct mvpp2_port *port = netdev_priv(dev);
>   struct phy_device *phydev = dev->phydev;
> + bool link_reconfigured = false;
>  
>   if (!netif_running(dev))
>   return;
> @@ -5750,9 +5751,27 @@ static void mvpp2_link_event(struct net_device *dev)
>   port->duplex = phydev->duplex;
>   port->speed  = phydev->speed;
>   }
> +
> + if (port->phy_interface != phydev->interface && port->comphy) {
> + /* disable current port for reconfiguration */
> + mvpp2_interrupts_disable(port);
> + netif_carrier_off(port->dev);
> + mvpp2_port_disable(port);
> + phy_power_off(port->comphy);
> +
> + /* comphy reconfiguration */
> + port->phy_interface = phydev->interface;
> + mvpp22_comphy_init(port);
> +
> + /* gop/mac reconfiguration */
> + mvpp22_gop_init(port);
> + mvpp2_port_mii_set(port);
> +
> + link_reconfigured = true;
> + }
>   }
>  
> - if (phydev->link != port->link) {
> + if (phydev->link != port->link || link_reconfigured) {
>   port->link = phydev->link;
>  
>   if (phydev->link) {
> -- 
> 2.13.5
> 


Re: [PATCH net-next v2 05/14] net: mvpp2: do not force the link mode

2017-08-28 Thread Russell King - ARM Linux

On Mon, Aug 28, 2017 at 10:38:37AM +0200, Marcin Wojtas wrote:
> Hi Antoine,
>
> Can you be 100% sure that when using SGMII with PHY's (like Marvell
> Alaska 88E1xxx series), is in-band link information always available?
> I'd be very cautious with such assumption and use in-band management
> only when set in the DT, like mvneta. I think phylib can properly can
> do its work when MDIO connection is provided on the board.

There is another issue to be aware of: if you're wanting to use flow
control autonegotiation, that is not carried across SGMII's in-band
signalling.  If you want to use SGMII's in-band signalling for the
duplex and speed information, you still need phylib's notification
to properly set the flow control.

Switching mvpp2 to use phylink (which is needed for the 1G SFP slot on
mcbin) will handle all this for you - dealing with both in-band and
out-of-band negotiation methods, and combining them in the appropriate
manner for the selected operation mode.
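
As an illustration of the part that still has to come from phylib, here
is a sketch of the usual full-duplex pause resolution from the local and
link partner Pause/Asym_Pause advertisements.  The names are hypothetical;
the logic follows the standard resolution table as I understand it, so
treat it as a sketch rather than a reference.

```c
#define FLOW_TX	0x1	/* we may send pause frames */
#define FLOW_RX	0x2	/* we honour received pause frames */

/*
 * Resolve flow control from the advertised Pause and Asym_Pause bits of
 * the local end (lcl_*) and the link partner (rmt_*).
 */
static int resolve_pause(int lcl_pause, int lcl_asym,
			 int rmt_pause, int rmt_asym)
{
	if (lcl_pause && rmt_pause)
		return FLOW_TX | FLOW_RX;	/* symmetric pause */

	if (lcl_asym && rmt_asym) {
		if (lcl_pause)
			return FLOW_RX;		/* partner sends, we obey */
		if (rmt_pause)
			return FLOW_TX;		/* we send, partner obeys */
	}

	return 0;				/* no flow control */
}
```

None of these four bits travel in the SGMII in-band control word, which is
why the result has to be programmed into the MAC from the PHY's registers.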


Re: [PATCH net-next v2 05/14] net: mvpp2: do not force the link mode

2017-08-28 Thread Russell King - ARM Linux

On Mon, Aug 28, 2017 at 11:40:51AM +0200, Antoine Tenart wrote:
> On Mon, Aug 28, 2017 at 09:51:52AM +0100, Russell King - ARM Linux wrote:
> > On Mon, Aug 28, 2017 at 10:38:37AM +0200, Marcin Wojtas wrote:
> > >
> > > Can you be 100% sure that when using SGMII with PHY's (like Marvell
> > > Alaska 88E1xxx series), is in-band link information always available?
> > > I'd be very cautious with such assumption and use in-band management
> > > only when set in the DT, like mvneta. I think phylib can properly can
> > > do its work when MDIO connection is provided on the board.
> > 
> > There is another issue to be aware of: if you're wanting to use flow
> > control autonegotiation, that is not carried across SGMII's in-band
> > signalling.  If you want to use SGMII's in-band signalling for the
> > duplex and speed information, you still need phylib's notification
> > to properly set the flow control.
> > 
> > 
> > Switching mvpp2 to use phylink (which is needed for the 1G SFP slot on
> > mcbin) will handle all this for you - dealing with both in-band and
> > out-of-band negotiation methods, and combining them in the appropriate
> > manner for the selected operation mode.
> > 
> 
> So probably the best move here is to remove this patch, and wait for the
> phylink support in the PPv2 driver.

I've nothing on that specifically for the mvpp2 driver - what I have is
for mvneta and the Marvell mvpp2x driver, with GMAC support extracted
from mvneta (that last bit is rather dirty at the moment so not
published anywhere, and doesn't cater for PP v2.1 at all.)

I ought to have posted the mvneta part of the phylink patches, but I
didn't get around to it early enough in this cycle - there are probably
quite a number of conflicts with net-next now, so I think it's too late
to submit it for mainline.

I know Andrew has already looked at them in my git tree as part of the
review of phylink when that was merged - which should be adequate to
give an example of how to implement it for the mainline PP v2 driver.


Re: [PATCH net-next] net: mvpp2: phylink support

2017-10-09 Thread Russell King - ARM Linux

On Mon, Oct 09, 2017 at 02:55:27PM +0200, Antoine Tenart wrote:
> Hi Russell,
> 
> On Mon, Sep 25, 2017 at 11:55:14AM +0200, Antoine Tenart wrote:
> > On Fri, Sep 22, 2017 at 12:07:31PM +0100, Russell King - ARM Linux wrote:
> > > On Thu, Sep 21, 2017 at 03:45:22PM +0200, Antoine Tenart wrote:
> > 
> > > > +static int mvpp2_phylink_mac_link_state(struct net_device *dev,
> > > > +   struct phylink_link_state 
> > > > *state)
> > > > +{
> > > > +   struct mvpp2_port *port = netdev_priv(dev);
> > > > +   u32 val;
> > > > +
> > > > +   if (!phy_interface_mode_is_rgmii(port->phy_interface) &&
> > > > +   port->phy_interface != PHY_INTERFACE_MODE_SGMII)
> > > > +   return 0;
> > > 
> > > You're blocking this for 1000base-X and 10G connections, which is not
> > > correct.  The expectation is that this function returns the current
> > > MAC state irrespective of the interface mode.
> > 
> > I moved what was already supported in the PPv2 driver and did not
> > implemented the full set of what is supported. It's not perfect, but it
> > does move what was already supported.
> > 
> > Any reason not to first move what's already supported to phylink, and
> > then add more supported modes in separate patches?
> 
> Any thoughts on this?

You're asking me to comment about something I know little about as
I've not used mvpp2.c.  I don't know the details of what your "already
supported" statement refers to.  Maybe you could give some clues -
maybe produce a list of what mvpp2 currently supports?

Here are the link modes that phylink supports:
1. PHY based links
2. PHYless fixed links with details specified in DT, in the same way as
   the existing "fixed-link" support works, but without needing to create
   fake PHYs.
3. PHYless fixed links with GPIO link indication (again, same way as the
   existing fixed-link support.)
4. Direct fibre connections via fixed-link or SFP.


Re: [PATCH v9 00/20] simplify crypto wait for async op

2017-10-17 Thread Russell King - ARM Linux

On Sun, Oct 15, 2017 at 10:19:45AM +0100, Gilad Ben-Yossef wrote:
> Many users of kernel async. crypto services have a pattern of
> starting an async. crypto op and than using a completion
> to wait for it to end.
> 
> This patch set simplifies this common use case in two ways:
> 
> First, by separating the return codes of the case where a
> request is queued to a backlog due to the provider being
> busy (-EBUSY) from the case the request has failed due
> to the provider being busy and backlogging is not enabled
> (-EAGAIN).
> 
> Next, this change is than built on to create a generic API
> to wait for a async. crypto operation to complete.
> 
> The end result is a smaller code base and an API that is
> easier to use and more difficult to get wrong.
> 
> The patch set was boot tested on x86_64 and arm64 which
> at the very least tests the crypto users via testmgr and
> tcrypt but I do note that I do not have access to some
> of the HW whose drivers are modified nor do I claim I was
> able to test all of the corner cases.
> 
> The patch set is based upon linux-next release tagged
> next-20171013.

Has there been any performance impact analysis of these changes?  I
ended up with patches for one of the crypto drivers which converted
its interrupt handling to threaded interrupts being reverted because
it caused a performance degradation.

Moving code to the latest APIs to simplify it is not always beneficial.


Re: [offlist] Re: Crash in netlink/sk_filter_trim_cap on ARMv7 on 4.18rc1

2018-07-04 Thread Russell King - ARM Linux

Subject says offlist, but this isn't...

On Wed, Jul 04, 2018 at 08:33:20AM +0100, Peter Robinson wrote:
> Sorry for the delay on this from my end. I noticed there was some bpf
> bits land in the last net fixes pull request landed Monday so I built
> a kernel with the JIT reenabled. It seems it's improved in that the
> completely dead no output boot has gone but the original problem that
> arrived in the merge window still persists:
> 
> [   17.564142] note: systemd-udevd[194] exited with preempt_count 1
> [   17.592739] Unable to handle kernel NULL pointer dereference at
> virtual address 000c
> [   17.601002] pgd = (ptrval)
> [   17.603819] [000c] *pgd=
> [   17.607487] Internal error: Oops: 805 [#10] SMP ARM
> [   17.612396] Modules linked in:
> [   17.615484] CPU: 0 PID: 195 Comm: systemd-udevd Tainted: G  D
> 4.18.0-0.rc3.git1.1.bpf1.fc29.armv7hl #1
> [   17.626056] Hardware name: Generic AM33XX (Flattened Device Tree)
> [   17.632198] PC is at sk_filter_trim_cap+0x218/0x2fc
> [   17.637102] LR is at   (null)
> [   17.640086] pc : []lr : [<>]psr: 6013
> [   17.646384] sp : cfe1dd48  ip :   fp : 
> [   17.651635] r10: d837e000  r9 : d833be00  r8 : 
> [   17.656887] r7 : 0001  r6 : e003d000  r5 :   r4 : 
> [   17.663447] r3 : 0007  r2 :   r1 :   r0 : 
> [   17.670009] Flags: nZCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment 
> none
> [   17.677180] Control: 10c5387d  Table: 8fe20019  DAC: 0051
> [   17.682956] Process systemd-udevd (pid: 195, stack limit = 0x(ptrval))
> [   17.689518] Stack: (0xcfe1dd48 to 0xcfe1e000)

Can you provide a full disassembly of sk_filter_trim_cap from vmlinux
(iow, annotated with its linked address) for the above dump please -
alternatively a new dump with matching disassembly.  Thanks.

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 13.8Mbps down 630kbps up
According to speedtest.net: 13Mbps down 490kbps up

Re: [offlist] Re: Crash in netlink/sk_filter_trim_cap on ARMv7 on 4.18rc1

2018-07-05 Thread Russell King - ARM Linux

On Thu, Jul 05, 2018 at 12:41:54AM +0100, Russell King - ARM Linux wrote:
> Subject says offlist, but this isn't...
> 
> On Wed, Jul 04, 2018 at 08:33:20AM +0100, Peter Robinson wrote:
> > Sorry for the delay on this from my end. I noticed there was some bpf
> > bits land in the last net fixes pull request landed Monday so I built
> > a kernel with the JIT reenabled. It seems it's improved in that the
> > completely dead no output boot has gone but the original problem that
> > arrived in the merge window still persists:
> > 
> > [   17.564142] note: systemd-udevd[194] exited with preempt_count 1
> > [   17.592739] Unable to handle kernel NULL pointer dereference at
> > virtual address 000c
> > [   17.601002] pgd = (ptrval)
> > [   17.603819] [000c] *pgd=
> > [   17.607487] Internal error: Oops: 805 [#10] SMP ARM
> > [   17.612396] Modules linked in:
> > [   17.615484] CPU: 0 PID: 195 Comm: systemd-udevd Tainted: G  D
> > 4.18.0-0.rc3.git1.1.bpf1.fc29.armv7hl #1
> > [   17.626056] Hardware name: Generic AM33XX (Flattened Device Tree)
> > [   17.632198] PC is at sk_filter_trim_cap+0x218/0x2fc
> > [   17.637102] LR is at   (null)
> > [   17.640086] pc : []lr : [<>]psr: 6013
> > [   17.646384] sp : cfe1dd48  ip :   fp : 
> > [   17.651635] r10: d837e000  r9 : d833be00  r8 : 
> > [   17.656887] r7 : 0001  r6 : e003d000  r5 :   r4 : 
> > [   17.663447] r3 : 0007  r2 :   r1 :   r0 : 
> > [   17.670009] Flags: nZCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment 
> > none
> > [   17.677180] Control: 10c5387d  Table: 8fe20019  DAC: 0051
> > [   17.682956] Process systemd-udevd (pid: 195, stack limit = 0x(ptrval))
> > [   17.689518] Stack: (0xcfe1dd48 to 0xcfe1e000)
> 
> Can you provide a full disassembly of sk_filter_trim_cap from vmlinux
> (iow, annotated with its linked address) for the above dump please -
> alternatively a new dump with matching disassembly.  Thanks.

It's also probably a good idea to have bpf_jit_enable set to 2 to get a
dump of the bpf program being run.  For your problem, I think you'll
have to hack the kernel source to do that.
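
For reference, on a running system the knob is normally set through
/proc/sys/net/core/bpf_jit_enable.  A minimal sketch of doing that from C
follows; the helper name is made up, and it does not cover the early-boot
case above, where the program is loaded before userspace can set it.

```c
#include <stdio.h>

/*
 * Write an integer to a sysctl-style file.  Returns 0 on success and -1
 * on failure.  For the JIT debug dump the path would be
 * "/proc/sys/net/core/bpf_jit_enable" and the value 2 (needs root).
 */
static int write_sysctl_int(const char *path, int value)
{
	FILE *f = fopen(path, "w");
	int ok;

	if (!f)
		return -1;

	ok = fprintf(f, "%d\n", value) > 0;
	/* fclose() flushes, so a rejected write shows up here. */
	if (fclose(f) != 0)
		ok = 0;

	return ok ? 0 : -1;
}
```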


[PATCH 00/13] ARM BPF jit compiler improvements

2018-07-10 Thread Russell King - ARM Linux

Hi,

This series improves the ARM BPF JIT compiler by:
- enumerating the stack layout rather than using constants that happen
  to be multiples of four
- rejig the BPF "register" accesses to use negative numbers instead of
  positive, which could be confused with register numbers in the bpf2a32
  array.
- since we maintain the ARM FP register as a pointer to the top of our
  scratch space (or, with frame pointers enabled, a valid ARM frame
  pointer register), we can access our scratch space using FP, which is
  constant across all BPF programs, including tail-called programs.
- use immediate forms of ARM instructions where possible, rather than
  first loading the immediate into an ARM register.
- use load-with-shift instruction rather than separate shift instruction
  followed by load
- avoid reloading index and array in the tail-call code
- use double-word load/store instructions where available
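
For anyone unfamiliar with the "immediate forms" item: an ARM
data-processing immediate is an 8-bit value rotated right by an even
amount, so only some 32-bit constants qualify.  A standalone sketch of
the encodability test (helper names are for illustration and are not
necessarily those used in the JIT):

```c
#include <stdint.h>

/* Rotate a 32-bit value left by n bits (n taken modulo 32). */
static uint32_t rol32(uint32_t x, unsigned int n)
{
	n &= 31;
	return n ? (x << n) | (x >> (32 - n)) : x;
}

/*
 * Try to encode x as an ARM modified immediate.  The encoded constant is
 * ROR(imm8, 2 * rot), so rotating x left by 2 * rot must leave a value
 * that fits in 8 bits.  Returns (rot << 8) | imm8, or -1 if x cannot be
 * expressed and has to be loaded into a register first.
 */
static int arm_imm_encode(uint32_t x)
{
	unsigned int rot;

	for (rot = 0; rot < 16; rot++) {
		uint32_t imm8 = rol32(x, 2 * rot);

		if (imm8 <= 0xff)
			return (int)((rot << 8) | imm8);
	}

	return -1;
}
```

When the result is not -1 the constant can go straight into the
instruction, saving the separate move into a temporary register.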

 arch/arm/net/bpf_jit_32.c | 927 +++---
 arch/arm/net/bpf_jit_32.h |  44 +--
 2 files changed, 493 insertions(+), 478 deletions(-)


Re: [PATCH net-next 13/13] ARM: net: bpf: use double-word load/stores where available

2018-07-10 Thread Russell King - ARM Linux

On Tue, Jul 10, 2018 at 10:03:33AM -0700, Olof Johansson wrote:
> Hi Russell,
> > @@ -663,13 +679,27 @@ static inline void emit_a32_mov_r(const s8 dst, const 
> > s8 src,
> >  static inline void emit_a32_mov_r64(const bool is64, const s8 dst[],
> >   const s8 src[],
> >   struct jit_ctx *ctx) {
> > -   emit_a32_mov_r(dst_lo, src_lo, ctx);
> > -   if (is64) {
> > +   if (!is64) {
> > +   emit_a32_mov_r(dst_lo, src_lo, ctx);
> > +   /* Zero out high 4 bytes */
> > +   emit_a32_mov_i(dst_hi, 0, ctx);
> > +   } else if (__LINUX_ARM_ARCH__ < 6 &&
> > +  ctx->cpu_architecture < CPU_ARCH_ARMv5) {
> > /* complete 8 byte move */
> > +   emit_a32_mov_r(dst_lo, src_lo, ctx);
> > emit_a32_mov_r(dst_hi, src_hi, ctx);
> 
> 
> Tiny nit: Looks like you compare for >= ARMv5TE above and < ARMv5 here.
> I'm not aware of any vanilla v5 implementations (all I can find are
> v5TE or <=v4T), so it doesn't seem like something actually causing
> problems. Mostly pointing it out for consistency's sake.

They're rare - I think the only one is an ARM1020 (ARMv5T) as opposed
to the ARM1020E (ARMv5TE).  Whether any are in the wild or not is
another matter.
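
The distinction matters because the double-word LDRD/STRD instructions
arrived with the E extension (ARMv5TE), not with base ARMv5 or ARMv5T.
A minimal sketch of a consistent check, using a hypothetical enum whose
ordering is assumed to place ARMv5 < ARMv5T < ARMv5TE:

```c
/* Hypothetical architecture enum; only the relative ordering matters. */
enum cpu_arch {
	ARCH_UNKNOWN,
	ARCH_ARMv3,
	ARCH_ARMv4,
	ARCH_ARMv4T,
	ARCH_ARMv5,
	ARCH_ARMv5T,
	ARCH_ARMv5TE,
	ARCH_ARMv5TEJ,
	ARCH_ARMv6,
	ARCH_ARMv7,
};

/* LDRD/STRD exist from ARMv5TE onwards. */
static int has_ldrd_strd(enum cpu_arch arch)
{
	return arch >= ARCH_ARMv5TE;
}
```

Using one helper like this for both the "can use" and the "must fall
back" paths avoids the two tests disagreeing on ARMv5 and ARMv5T.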


Re: [PATCH net-next 01/13] ARM: net: bpf: enumerate the JIT scratch stack layout

2018-07-10 Thread Russell King - ARM Linux

On Tue, Jul 10, 2018 at 08:30:04PM +0200, Daniel Borkmann wrote:
> Hi Russell,
> 
> thanks a lot for your work on the arm32 JIT!
> 
> On 07/10/2018 02:36 PM, Russell King wrote:
> > Enumerate the contents of the JIT scratch stack layout used for storing
> > some of the JITs 64-bit registers, tail call counter and AX register.
> > 
> > XXX: what about the skb_copy_bits buffer - this appears to overlap with
> > the first word of the JITs accessible stack.
> 
> Could you elaborate on that case? Unless I'm missing something there should
> be no use of the skb_copy_bits buffer anymore (aka former SKB_BUFFER at
> STACK_VAR(SCRATCH_SIZE) offset), but aside from that it's not supposed to
> overlap either.

Probably an old comment - these were originally developed back in the
January timeframe when there was the SKB_BUFFER stuff, but that was
removed during the 4.18 merge window.  I'll kill the comment.

Thanks.


[PATCH 00/14] ARM BPF jit compiler improvements

2018-07-11 Thread Russell King - ARM Linux

Hi,

This series improves the ARM BPF JIT compiler by:
- enumerating the stack layout rather than using constants that happen
  to be multiples of four
- rejig the BPF "register" accesses to use negative numbers instead of
  positive, which could be confused with register numbers in the bpf2a32
  array.
- since we maintain the ARM FP register as a pointer to the top of our
  scratch space (or, with frame pointers enabled, a valid ARM frame
  pointer register), we can access our scratch space using FP, which is
  constant across all BPF programs, including tail-called programs.
- use immediate forms of ARM instructions where possible, rather than
  first loading the immediate into an ARM register.
- use load-with-shift instruction rather than separate shift instruction
  followed by load
- avoid reloading index and array in the tail-call code
- use double-word load/store instructions where available

Version 2:
- Fix ARMv5 test pointed out by Olof
- Fix build error found by 0-day (adding an additional patch)
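
To illustrate the negative-number item: if a BPF register's location is
held in a single signed byte, a non-negative value can name an ARM
register and a negative value a scratch slot below FP, so the two can
never be confused.  This is only a sketch with made-up macro and helper
names and an assumed 4-byte slot layout, not the layout the series uses.

```c
#include <stdint.h>

/*
 * Assumed layout: slot k lives at FP - 4 - 4 * k.  With positive byte
 * offsets (0, 4, 8, ...) a slot would be indistinguishable from ARM
 * registers r0, r4, r8; negative offsets cannot collide with r0-r15.
 */
#define STACK_SLOT(k)	((int8_t)(-4 - (k) * 4))

/* A location is on the scratch stack iff it is negative. */
static int is_stacked(int8_t loc)
{
	return loc < 0;
}

/* Byte offset from FP for a stacked location (always negative). */
static int fp_offset(int8_t loc)
{
	return (int)loc;
}
```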

 arch/arm/net/bpf_jit_32.c | 982 --
 arch/arm/net/bpf_jit_32.h |  42 +-
 2 files changed, 543 insertions(+), 481 deletions(-)


[PATCH net-next 0/4] Further ARM BPF jit compiler improvements

2018-07-12 Thread Russell King - ARM Linux

Four further jit compiler improvements for 32-bit ARM.

 arch/arm/net/bpf_jit_32.c | 120 --
 1 file changed, 73 insertions(+), 47 deletions(-)


Re: [PATCH 00/14] ARM BPF jit compiler improvements

2018-07-12 Thread Russell King - ARM Linux

On Thu, Jul 12, 2018 at 09:02:41PM +0200, Daniel Borkmann wrote:
> Applied to bpf-next, thanks a lot Russell!

Thanks, I've just sent four more patches, which is the sum total of
what I'm intending to send for BPF improvements for the next merge
window.


Re: [PATCH 00/14] ARM BPF jit compiler improvements

2018-07-12 Thread Russell King - ARM Linux

On Thu, Jul 12, 2018 at 11:12:45PM +0200, Daniel Borkmann wrote:
> On 07/12/2018 11:02 PM, Russell King - ARM Linux wrote:
> > On Thu, Jul 12, 2018 at 09:02:41PM +0200, Daniel Borkmann wrote:
> >> Applied to bpf-next, thanks a lot Russell!
> > 
> > Thanks, I've just sent four more patches, which is the sum total of
> > what I'm intending to send for BPF improvements for the next merge
> > window.
> 
> Great, thanks a lot for the batch of improvements, Russell!
> 
> Did you manage to get the BPF kselftest suite working on arm32 under
> tools/testing/selftests/bpf/? In particular the test_verfier with
> bpf_jit_enabled set to 1 and test_kmod.sh has a bigger number of
> runtime tests that would stress it.

I have a big issue with almost all of the tools/ subdirectory, and
that is that it isn't "portable".

It seems that cross-build environments just weren't considered when
the tools subdirectory was created - it appears to require the entire
kernel tree and build tree to be accessible on the target in order
to build almost everything there.  (I also exclusively do split-object
builds, I never do an in-source-tree build.)

At least perf has the ability to ask Kbuild to package it up as a
tar.* file.  That can be easily transported to the target as a
self-contained buildable tree, and then be built from that.

My cross-build environment for the kernel is just for building
kernels, it does not have the facilities to build for userspace - I
have a wide range of userspaces across targets, with a multitude of
different glibc versions, and even when they're compatible versions,
they're built differently.

As far as I can see, basically, most tools/ stuff requires too much
effort to work around this to be of any use to me.  Even if I did
unpick it from the kernel source tree by hand, that would be wasted
effort, because I'd need to repeat that same process whenever
anything there gets updated.


Re: [PATCH net-next 02/10] net: phy: phylink: allow 10GKR interface to use in-band negotiation

2018-03-16 Thread Russell King - ARM Linux

On Fri, Mar 16, 2018 at 11:33:43AM +0100, Antoine Tenart wrote:
> The PHY mode 10GKR can use in-band negotiation. This patches allows this
> mode to be used with MLO_AN_INBAND in phylink.
> 
> Signed-off-by: Antoine Tenart 
> ---
>  drivers/net/phy/phylink.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/net/phy/phylink.c b/drivers/net/phy/phylink.c
> index 51a011a349fe..7224b005f0dd 100644
> --- a/drivers/net/phy/phylink.c
> +++ b/drivers/net/phy/phylink.c
> @@ -768,7 +768,8 @@ int phylink_of_phy_connect(struct phylink *pl, struct 
> device_node *dn,
>   /* Fixed links and 802.3z are handled without needing a PHY */
>   if (pl->link_an_mode == MLO_AN_FIXED ||
>   (pl->link_an_mode == MLO_AN_INBAND &&
> -  phy_interface_mode_is_8023z(pl->link_interface)))
> +  (phy_interface_mode_is_8023z(pl->link_interface) ||
> +   pl->link_interface == PHY_INTERFACE_MODE_10GKR)))

There is no inband negotiation like there is with 802.3z or SGMII,
so this makes no sense.


Re: [PATCH net-next 03/10] net: mvpp2: phylink support

2018-03-16 Thread Russell King - ARM Linux

On Fri, Mar 16, 2018 at 11:33:44AM +0100, Antoine Tenart wrote:
> +static void mvpp2_phylink_validate(struct net_device *dev,
> +unsigned long *supported,
> +struct phylink_link_state *state)
> +{
> + __ETHTOOL_DECLARE_LINK_MODE_MASK(mask) = { 0, };
> +
> + phylink_set(mask, Autoneg);
> + phylink_set_port_modes(mask);
> + phylink_set(mask, Pause);
> + phylink_set(mask, Asym_Pause);
> +
> + phylink_set(mask, 10baseT_Half);
> + phylink_set(mask, 10baseT_Full);
> + phylink_set(mask, 100baseT_Half);
> + phylink_set(mask, 100baseT_Full);
> + phylink_set(mask, 1000baseT_Full);
> + phylink_set(mask, 1000baseX_Full);

AFAICS, the driver (before these patches) does not support 1000baseX
as it always clears the MVPP2_GMAC_PORT_TYPE_MASK bit, so adding this
mode should be part of the patch adding 1000baseX support.


Re: [PATCH net-next 02/10] net: phy: phylink: allow 10GKR interface to use in-band negotiation

2018-03-19 Thread Russell King - ARM Linux

On Mon, Mar 19, 2018 at 09:52:52AM +0100, Antoine Tenart wrote:
> Hi Russell,
> 
> On Fri, Mar 16, 2018 at 03:53:07PM +, Russell King - ARM Linux wrote:
> > On Fri, Mar 16, 2018 at 11:33:43AM +0100, Antoine Tenart wrote:
> > > The PHY mode 10GKR can use in-band negotiation. This patches allows this
> > > mode to be used with MLO_AN_INBAND in phylink.
> > > 
> > > Signed-off-by: Antoine Tenart 
> > > ---
> > >  drivers/net/phy/phylink.c | 3 ++-
> > >  1 file changed, 2 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/drivers/net/phy/phylink.c b/drivers/net/phy/phylink.c
> > > index 51a011a349fe..7224b005f0dd 100644
> > > --- a/drivers/net/phy/phylink.c
> > > +++ b/drivers/net/phy/phylink.c
> > > @@ -768,7 +768,8 @@ int phylink_of_phy_connect(struct phylink *pl, struct 
> > > device_node *dn,
> > >   /* Fixed links and 802.3z are handled without needing a PHY */
> > >   if (pl->link_an_mode == MLO_AN_FIXED ||
> > >   (pl->link_an_mode == MLO_AN_INBAND &&
> > > -  phy_interface_mode_is_8023z(pl->link_interface)))
> > > +  (phy_interface_mode_is_8023z(pl->link_interface) ||
> > > +   pl->link_interface == PHY_INTERFACE_MODE_10GKR)))
> > 
> > There is no inband negotiation like there is with 802.3z or SGMII,
> > so this makes no sense.
> 
> Oh, that's what I feared. I read some docs but probably will need more
> :)
> 
> Anyway, the reason to use in-band negotiation was also to avoid using
> fixed-link. It would work but always report the link is up, which for
> the user isn't a great experience as we have a way to detect this.
> 
> What would you suggest to achieve this in a reasonable way?

The intention of this test in phylink_of_phy_connect() is to avoid
failing when there is no requirement for a PHY to be present (such as
a fixed link, or an 802.3z link.)  However, with 10G PHYs such as the
3310, we need the PHY so we can read the speed from it, and so know
whether to downgrade the MAC to SGMII mode, or having downgraded the
MAC, upgrade it back to 10G mode when the PHY switches to 10G.
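
As a rough standalone model of the above (a sketch under stated
assumptions, not phylink itself): the enums below are simplified stand-ins
that reuse names from the thread, and phy_optional() and
mac_mode_for_phy_speed() are hypothetical helpers.  The first mirrors the
existing test, the second shows why a 10G PHY has to be present so the MAC
can follow the speed it reports.

```c
#include <stdbool.h>

/* Simplified stand-ins for the kernel enums; not the real definitions. */
enum an_mode { MLO_AN_PHY, MLO_AN_FIXED, MLO_AN_INBAND };
enum if_mode { MODE_SGMII, MODE_1000BASEX, MODE_2500BASEX, MODE_10GKR };

/* 802.3z modes modelled here: 1000base-X and 2500base-X. */
static bool mode_is_8023z(enum if_mode mode)
{
	return mode == MODE_1000BASEX || mode == MODE_2500BASEX;
}

/*
 * Hypothetical helper mirroring the existing test: a PHY is optional for
 * fixed links and for 802.3z links using in-band negotiation.
 */
static bool phy_optional(enum an_mode an, enum if_mode mode)
{
	return an == MLO_AN_FIXED ||
	       (an == MLO_AN_INBAND && mode_is_8023z(mode));
}

/*
 * Hypothetical helper: pick the MAC mode from the speed (in Mbps) read
 * back from the PHY.  10G keeps the MAC in 10GKR; anything slower
 * downgrades it to SGMII.
 */
static enum if_mode mac_mode_for_phy_speed(int speed_mbps)
{
	return speed_mbps == 10000 ? MODE_10GKR : MODE_SGMII;
}
```

With this model, 10GKR with MLO_AN_INBAND is not a PHY-optional case,
which is the behaviour the current test gives.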

I'm guessing that you're wanting this for the DB boards, but I don't
see why.  Do they not have PHYs?

Re: [EXT] Re: [PATCH net-next 02/10] net: phy: phylink: allow 10GKR interface to use in-band negotiation

2018-03-19 Thread Russell King - ARM Linux

On Mon, Mar 19, 2018 at 01:01:07PM +, Yan Markman wrote:
> The DTS-patch for this board (in "old" format) is attached
> 
> 
> Yan Markman
> Tel. 05-44732819
> 
> 
> -----Original Message-----
> From: Stefan Chulski
> Sent: Monday, March 19, 2018 2:58 PM
> To: Russell King - ARM Linux; Antoine Tenart
> Cc: da...@davemloft.net; kis...@ti.com; gregory.clem...@bootlin.com;
>     and...@lunn.ch; ja...@lakedaemon.net; sebastian.hesselba...@gmail.com;
>     netdev@vger.kernel.org; linux-ker...@vger.kernel.org;
>     thomas.petazz...@bootlin.com; maxime.chevall...@bootlin.com;
>     miquel.ray...@bootlin.com; Nadav Haklai; Yan Markman;
>     m...@semihalf.com; linux-arm-ker...@lists.infradead.org
> Subject: RE: [EXT] Re: [PATCH net-next 02/10] net: phy: phylink: allow
>     10GKR interface to use in-band negotiation
> 
> > > > > There is no inband negotiation like there is with 802.3z or SGMII,
> > > > > so this makes no sense.
> > > >
> > > > Oh, that's what I feared. I read some docs but probably will need more
> > > > :)
> > > >
> > > > Anyway, the reason to use in-band negotiation was also to avoid using
> > > > fixed-link. It would work but always report the link is up, which for
> > > > the user isn't a great experience as we have a way to detect this.
> > > >
> > > > What would you suggest to achieve this in a reasonable way?
> > >
> > > The intention of this test in phylink_of_phy_connect() is to avoid
> > > failing when there is no requirement for a PHY to be present (such as
> > > a fixed link, or an 802.3z link.)  However, with 10G PHYs such as the
> > > 3310, we need the PHY so we can read the speed from it, and so know
> > > whether to downgrade the MAC to SGMII mode, or having downgraded the
> > > MAC, upgrade it back to 10G mode when the PHY switches to 10G.
> > >
> > > I'm guessing that you're wanting this for the DB boards, but I don't
> > > see why.  Do they not have PHYs?
> 
> The new SolidRun MACCHIATObin Single Shot board doesn't have a 3310 PHY
> either, like the DB boards.
> https://www.cnx-software.com/2017/12/20/solidrun-macchiatobin-single-shot-networking-board-launched-for-269-and-up/

Correct, but this DTS is wrong.  It connects to a SFP cage, and as SFP
cages are supported in mainline now, there's no need to mess around
with fixed links or similar.

I haven't tested phylink in that configuration yet as SolidRun haven't
sent me a SingleShot board yet - and I need any board I do get to have
the pull-up resistors on the I2C lines of the correct value, because
I'm not risking corruption of the EEPROMs in my SFP* modules.

Re: [PATCH net-next 02/10] net: phy: phylink: allow 10GKR interface to use in-band negotiation

2018-03-19 Thread Russell King - ARM Linux

On Mon, Mar 19, 2018 at 02:10:09PM +0100, Antoine Tenart wrote:
> Hi Andrew,
> 
> On Mon, Mar 19, 2018 at 01:59:53PM +0100, Andrew Lunn wrote:
> > 
> > If they don't have PHYs, how are they connected to the outside world?
> 
> On 7k/8k you have the following scheme for 10G only interfaces:
> 
>MAC -- Comphy -- PHY -- SFP cage -- ...
> 
> Or
> 
>MAC -- Comphy -- SFP cage -- ...
> 
> The comphy provides serdes lanes, and can be configured in various
> modes (SGMII, 2500SGMII, 10GKR...).

Right - the correct mode is dependent on the SFP module plugged into
the cage.  Trying to describe this by ignoring the SFP cage isn't
going to work out well for end-user functionality, though it is fine
if you're just hacking a configuration to test (which would not be
suitable for mainline kernels!)

As I've recently replied to Yan, this is a configuration I haven't
tested yet, and it's entirely possible that phylink may need some
tweaks for it.

What you have is a very similar setup to what is on Clearfog with
its SFP cage, where the SFP cage is connected directly to the
Armada 388.  That only has to deal with 2500base-X / 1000base-X /
SGMII and not 10G.
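
To make "the correct mode depends on the module" concrete, here is a
hedged standalone sketch.  The module classification, the
pick_serdes_mode() helper and the copper-to-SGMII mapping are all
assumptions for illustration; this is not how the SFP or phylink code is
actually structured.

```c
#include <stdbool.h>

/* Hypothetical, coarse classification of the plugged-in module. */
enum module_kind { MOD_COPPER_1G, MOD_FIBRE_1G, MOD_FIBRE_2G5, MOD_10G };

/* Serdes modes named in this thread. */
enum serdes_mode {
	SERDES_SGMII,
	SERDES_1000BASEX,
	SERDES_2500BASEX,
	SERDES_10GKR,
};

/*
 * Hypothetical helper: choose the serdes mode for a module.  Returns
 * false when the MAC cannot serve the module, e.g. a 10G module on a MAC
 * limited to 2500base-X / 1000base-X / SGMII.
 */
static bool pick_serdes_mode(enum module_kind kind, bool mac_has_10g,
			     enum serdes_mode *out)
{
	switch (kind) {
	case MOD_COPPER_1G:	/* assumed: copper module with its own PHY */
		*out = SERDES_SGMII;
		return true;
	case MOD_FIBRE_1G:
		*out = SERDES_1000BASEX;
		return true;
	case MOD_FIBRE_2G5:
		*out = SERDES_2500BASEX;
		return true;
	case MOD_10G:
		if (!mac_has_10g)
			return false;
		*out = SERDES_10GKR;
		return true;
	}

	return false;
}
```

A fixed description in the DTS pins one of these outcomes at build time,
which is why it cannot follow whatever module ends up in the cage.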

What I want is to avoid hacks as much as possible here - if there is
a shortcoming with SFP/phylink here, we need to address that
properly.
