This is just a drive by guess, but I think this is a driver issue.


Issue 1 seems like a red herring, cat doesn't modify output, nor does ethtool
know if its output is going to a console or a pipe, its all the same.  And given
issue 2 (that the output of the thresholds, etc are spurriously changing and
wrong), suggests that they are spurriously changing and wrong regardless of what
cat does.

That said, I think issue two is a problem with the mlx4 driver.  Specifically
that the driver is copying garbage data.

The three ethtool functions at work here are:
mlx4_en_get_module_info
mlx4_en_get_module_eeprom
mlx4_get_module_info

When you run ethtool -m on this driver, the kernel calls mlx4_en_get_module_info
to determine the length of the eeprom, and that value will be either 256 or 512
bytes.  Lets assume that the value is 256 for the sake of argument

Next it calls mlx4_en_get_module_eeprom, passing in that size 256 to actually
read the eeprom data, which in turn calls mlx4_get_module_info to fetch the data
from hardware, again, passing in 256 as the size for the first call (theres a
loop, but it will only get executed once in this scenario)

mlx4_get_module_info then issues the appropriate mailbox commands to dump the
eeprom.  Here it starts to go sideways.  The mailbox buffer allocated for the
return data is of type mlx4_mad_ifc, which has some front matter information and
a data buffer that is 192 bytes long!

A little further down in the function, size gets restricted if the buffer
crosses a page boundary, but given that the size is 256 on the first call here,
and offset is zero on the first call, we're not crossing anything, so size
remains unchanged.

The output mailbox buffer outmad->data (a 192 byte array), then gets cast to a
sturt mlx4_cable_info structure, which has its own internal data buffer that is
only 48 bytes long.

Hi guys,
Thanks for digging into it.
Here are some observations I found:

1. Chris system has CX4 (which is served by mlx5 driver), all analysis by Neil was done over mlx4 driver (which serves the older generation of NICs. e.h CX3Pro).
2. In general, MAD commands are limited to 192 bytes of data.
3. CableInfo MAD command info is limited to 48 Bytes.
4. First check that mlx4_get_module_info is having:
        if (size > MODULE_INFO_MAX_READ)
                size = MODULE_INFO_MAX_READ;
So this is the info that were missing in the analysis. x <= 48 is also returned by this function. No trash copy or overrun. It is expected from the caller(also inside mlx4) to recall with new offset in order to fetch more data.

5. I reviewed mlx5 driver, and it have reading mechanism (small diff: via MCIA register and not via MAD)

Both drivers read up to 256 bytes. 0-127 (from page 0). and 128-256 (from page 0). Driver is not capable of reading over 256 bytes currently.

looking on qsfp.c parser in ethtool.c (user space), I see an uninitialized bug issue that have caused bug #1 + #2. Applied it locally solved the issue (Not showing alarm data, which should be expected as driver do not fill it).

diff --git a/qsfp.c b/qsfp.c
index 32e195d12dc0..d196aa1753de 100644
--- a/qsfp.c
+++ b/qsfp.c
@@ -671,7 +671,7 @@ static void sff8636_dom_parse(const __u8 *id, struct sff_diags *sd)

 static void sff8636_show_dom(const __u8 *id, __u32 eeprom_len)
 {
-   struct sff_diags sd;
+ struct sff_diags sd = {0};
        char *rx_power_string = NULL;
        char power_string[MAX_DESC_SIZE];
        int i;

I will soon post a fix for it.

Thanks,
Eran


Thanks,
Eran


The memcpy in this functionthen copies cable_info->data to the buffer that gets
returned to ethtool, but it copies size bytes (256), even though the source data
buffer is only 48 bytes long.  That 48 byte array is embedded in the larger 192
byte structure, so there won't be a panic on the overrun, but theres no telling
what garbage is in the buffer beyond those first 48 bytes.  Even if the
remaining 144 bytes have valid eeprom data, its less than the required 256
bytes.  The additional copy may cause a panic, but if the buffer commonly bumps
up against other allocated memory, that will go unnoticed.

after the memcpy, mlx4_get_module_info just returns the size of the passed in
buffer (256), and so the calling function thinks its work is done, and lets  the
kernel send back the buffer with garbage data to ethtool.

I think the mlx4 guys have some work to do here.

My $0.02
Neil

Reply via email to