This patch set aims to avoid dumping registers, data structures and
coredump to dmesg and also to reduce the code size of the qlge driver.
As pointed out by Benjamin [1],
> At 2000 lines, qlge_dbg.c alone is larger than some entire ethernet
> drivers. Most of what it does is dump kernel data structures or pci
> memory mapped registers to dmesg. There are better facilities for that.
> My thinking is not simply to delete qlge_dbg.c but to replace it, making
> sure that most of the same information is still available. For data
> structures, crash or drgn can be used; possibly with a script for the
> latter which formats the data. For pci registers, they should be
> included in the ethtool register dump and a patch added to ethtool to
> pretty print them. That's what other drivers like e1000e do. For the
> "coredump", devlink health can be used.
So the debugging features are re-written following Benjamin's advice,
- use ethtool to dump registers
- dump kernel data structures in drgn
- use devlink health to do coredump
The get_regs ethtool_ops has already implemented. What lacks is a patch
for the userland ethtool to do the pretty-printing. I haven't yet provided
a patch to the userland ethtool because I'm aware ethtool is moving towards
the netlink interface [2]. I'm curious if a generalized mechanism of
pretty-printing will be implemented thus making pretty-printing for a
specific driver unnecessary. As of this writing, `-d|--register-dump`
hasn't been implemented for the netlink interface.
To dump kernel data structures, the following Python script can be used
in drgn,
```python
def align(x, a):
"""the alignment a should be a power of 2
"""
mask = a - 1
return (x+ mask) & ~mask
def struct_size(struct_type):
struct_str = "struct {}".format(struct_type)
return sizeof(Object(prog, struct_str, address=0x0))
def netdev_priv(netdevice):
NETDEV_ALIGN = 32
return netdevice.value_() + align(struct_size("net_device"),
NETDEV_ALIGN)
name = 'xxx'
qlge_device = None
netdevices = prog['init_net'].dev_base_head.address_of_()
for netdevice in list_for_each_entry("struct net_device", netdevices,
"dev_list"):
if netdevice.name.string_().decode('ascii') == name:
print(netdevice.name)
ql_adapter = Object(prog, "struct ql_adapter",
address=netdev_priv(qlge_device))
```
The struct ql_adapter will be printed in drgn as follows,
>>> ql_adapter
(struct ql_adapter){
.ricb = (struct ricb){
.base_cq = (u8)0,
.flags = (u8)120,
.mask = (__le16)26637,
.hash_cq_id = (u8 [1024]){ 172, 142, 255, 255 },
.ipv6_hash_key = (__le32 [10]){},
.ipv4_hash_key = (__le32 [4]){},
},
.flags = (unsigned long)0,
.wol = (u32)0,
.nic_stats = (struct nic_stats){
.tx_pkts = (u64)0,
.tx_bytes = (u64)0,
.tx_mcast_pkts = (u64)0,
.tx_bcast_pkts = (u64)0,
.tx_ucast_pkts = (u64)0,
.tx_ctl_pkts = (u64)0,
.tx_pause_pkts = (u64)0,
...
},
.active_vlans = (unsigned long [64]){
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 52780853100545, 18446744073709551615,
18446619461681283072, 0, 42949673024, 2147483647,
},
.rx_ring = (struct rx_ring [17]){
{
.cqicb = (struct cqicb){
.msix_vect = (u8)0,
.reserved1 = (u8)0,
.reserved2 = (u8)0,
.flags = (u8)0,
.len = (__le16)0,
.rid = (__le16)0,
...
},
.cq_base = (void *)0x0,
.cq_base_dma = (dma_addr_t)0,
}
...
}
}
And the coredump obtained via devlink in json format looks like,
$ devlink health dump show DEVICE reporter coredump -p -j
{
"Core Registers": {
"segment": 1,
"values": [
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
]
},
"Test Logic Regs": {
"segment": 2,
"values": [ 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0 ]
},
"RMII Registers": {
"segment": 3,
"values": [
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
]
},
...
"Sem Registers": {
"segment": 50,
"values": [ 0,0,0,0 ]
}
}
Since I don't have a QLGE device and neither could I find a software
simulator, I put some functions into e1000 to get the above result.
I notice with the qlge_force_coredump module parameter set, ethtool
can also get the coredump. I'm not sure which tool is more suitable for
the coredump feature.
[1] https://lkml.org/lkml/2020/6/30/19
[2] https://www.kernel.org/doc/html/latest/networking/ethtool-netlink.html
Coiby Xu (3):
Initialize devlink health dump framework for the dlge driver
coredump via devlink health reporter
clean up code that dump info to dmesg
drivers/staging/qlge/Makefile | 2 +-
drivers/staging/qlge/qlge.h | 91 +---
drivers/staging/qlge/qlge_dbg.c | 672 ----------------------------
drivers/staging/qlge/qlge_ethtool.c | 1 -
drivers/staging/qlge/qlge_health.c | 156 +++++++
drivers/staging/qlge/qlge_health.h | 2 +
drivers/staging/qlge/qlge_main.c | 27 +-
7 files changed, 189 insertions(+), 762 deletions(-)
create mode 100644 drivers/staging/qlge/qlge_health.c
create mode 100644 drivers/staging/qlge/qlge_health.h
--
2.27.0