This patch set aims to avoid dumping registers, data structures and the coredump to dmesg, and also to reduce the code size of the qlge driver.
As pointed out by Benjamin [1],

> At 2000 lines, qlge_dbg.c alone is larger than some entire ethernet
> drivers. Most of what it does is dump kernel data structures or pci
> memory mapped registers to dmesg. There are better facilities for that.
> My thinking is not simply to delete qlge_dbg.c but to replace it, making
> sure that most of the same information is still available. For data
> structures, crash or drgn can be used; possibly with a script for the
> latter which formats the data. For pci registers, they should be
> included in the ethtool register dump and a patch added to ethtool to
> pretty print them. That's what other drivers like e1000e do. For the
> "coredump", devlink health can be used.

So the debugging features are rewritten following Benjamin's advice:
 - use ethtool to dump registers
 - use drgn to dump kernel data structures
 - use devlink health to do the coredump

The get_regs callback of ethtool_ops is already implemented. What is missing is a patch for the userland ethtool to do the pretty-printing. I haven't provided one yet because I'm aware that ethtool is moving towards the netlink interface [2], and I'm curious whether a generalized pretty-printing mechanism will be implemented, which would make pretty-printing for a specific driver unnecessary. As of this writing, `-d|--register-dump` hasn't been implemented for the netlink interface.
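To illustrate what such pretty-printing involves, here is a minimal Python sketch. It is not a proposed ethtool patch (ethtool's real per-driver decoders are C code inside ethtool); it only shows the idea of turning a flat register dump into named fields. It assumes the dump is a packed array of little-endian 32-bit registers starting at offset 0, and both `pretty_print_regs()` and the `REG_NAMES` table are hypothetical; a real decoder would take the offsets and names from the qlge register definitions.

```python
import struct

# Hypothetical offset -> name table. The real qlge register layout is
# defined in the driver sources and is NOT reproduced here.
REG_NAMES = {
    0x00: "REG_A",
    0x04: "REG_B",
    0x08: "REG_C",
}


def pretty_print_regs(blob, names):
    """Decode a raw register dump into 'NAME (0xOFF): 0xVALUE' lines.

    Assumes (hypothetically) that blob is a packed array of little-endian
    32-bit words, one word per register, starting at offset 0.
    """
    if len(blob) % 4:
        raise ValueError("dump length is not a multiple of 4 bytes")
    lines = []
    # struct.iter_unpack yields one 1-tuple per 4-byte word.
    for i, (value,) in enumerate(struct.iter_unpack("<I", blob)):
        offset = i * 4
        name = names.get(offset, "unknown")
        lines.append(f"{name:<8} (0x{offset:02x}): 0x{value:08x}")
    return lines


if __name__ == "__main__":
    # Fabricated sample dump of three registers, for demonstration only.
    sample = struct.pack("<3I", 0x1, 0xDEADBEEF, 0x0)
    for line in pretty_print_regs(sample, REG_NAMES):
        print(line)
```

Keeping the offset-to-name table as plain data means the same loop could serve any driver whose dump is a flat array of 32-bit registers; only the table changes.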
To dump kernel data structures, the following Python script can be used in drgn:

```python
# Run inside the drgn CLI, which provides the global `prog`.
from drgn import Object, sizeof
from drgn.helpers.linux.list import list_for_each_entry


def align(x, a):
    """Round x up to a multiple of a; the alignment a must be a power of 2."""
    mask = a - 1
    return (x + mask) & ~mask


def struct_size(struct_type):
    return sizeof(prog.type("struct {}".format(struct_type)))


def netdev_priv(netdevice):
    # Mirrors the kernel's netdev_priv(): the private area follows
    # struct net_device, aligned to NETDEV_ALIGN (32) bytes.
    NETDEV_ALIGN = 32
    return netdevice.value_() + align(struct_size("net_device"), NETDEV_ALIGN)


name = 'xxx'  # name of the qlge network interface
qlge_device = None
netdevices = prog['init_net'].dev_base_head.address_of_()
for netdevice in list_for_each_entry("struct net_device", netdevices, "dev_list"):
    if netdevice.name.string_().decode('ascii') == name:
        print(netdevice.name)
        qlge_device = netdevice
        break

if qlge_device is None:
    raise LookupError("no net_device named {!r}".format(name))

ql_adapter = Object(prog, "struct ql_adapter", address=netdev_priv(qlge_device))
```

drgn then prints struct ql_adapter as follows:

    >>> ql_adapter
    (struct ql_adapter){
        .ricb = (struct ricb){
            .base_cq = (u8)0,
            .flags = (u8)120,
            .mask = (__le16)26637,
            .hash_cq_id = (u8 [1024]){ 172, 142, 255, 255 },
            .ipv6_hash_key = (__le32 [10]){},
            .ipv4_hash_key = (__le32 [4]){},
        },
        .flags = (unsigned long)0,
        .wol = (u32)0,
        .nic_stats = (struct nic_stats){
            .tx_pkts = (u64)0,
            .tx_bytes = (u64)0,
            .tx_mcast_pkts = (u64)0,
            .tx_bcast_pkts = (u64)0,
            .tx_ucast_pkts = (u64)0,
            .tx_ctl_pkts = (u64)0,
            .tx_pause_pkts = (u64)0,
            ...
        },
        .active_vlans = (unsigned long [64]){ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 52780853100545, 18446744073709551615, 18446619461681283072, 0, 42949673024, 2147483647, },
        .rx_ring = (struct rx_ring [17]){
            {
                .cqicb = (struct cqicb){
                    .msix_vect = (u8)0,
                    .reserved1 = (u8)0,
                    .reserved2 = (u8)0,
                    .flags = (u8)0,
                    .len = (__le16)0,
                    .rid = (__le16)0,
                    ...
                },
                .cq_base = (void *)0x0,
                .cq_base_dma = (dma_addr_t)0,
            }
            ...
        }
    }

And the coredump obtained via devlink, in JSON format, looks like this:

    $ devlink health dump show DEVICE reporter coredump -p -j
    {
        "Core Registers": {
            "segment": 1,
            "values": [ 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0 ]
        },
        "Test Logic Regs": {
            "segment": 2,
            "values": [ 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0 ]
        },
        "RMII Registers": {
            "segment": 3,
            "values": [ 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0 ]
        },
        ...
        "Sem Registers": {
            "segment": 50,
            "values": [ 0,0,0,0 ]
        }
    }

Since I don't have a QLGE device and couldn't find a software simulator, I added some of these functions to e1000 to produce the output above.

I noticed that, with the qlge_force_coredump module parameter set, ethtool can also obtain the coredump. I'm not sure which tool is more suitable for the coredump feature.

[1] https://lkml.org/lkml/2020/6/30/19
[2] https://www.kernel.org/doc/html/latest/networking/ethtool-netlink.html

Coiby Xu (3):
  Initialize devlink health dump framework for the dlge driver
  coredump via devlink health reporter
  clean up code that dump info to dmesg

 drivers/staging/qlge/Makefile       |   2 +-
 drivers/staging/qlge/qlge.h         |  91 +---
 drivers/staging/qlge/qlge_dbg.c     | 672 ----------------------------
 drivers/staging/qlge/qlge_ethtool.c |   1 -
 drivers/staging/qlge/qlge_health.c  | 156 +++++++
 drivers/staging/qlge/qlge_health.h  |   2 +
 drivers/staging/qlge/qlge_main.c    |  27 +-
 7 files changed, 189 insertions(+), 762 deletions(-)
 create mode 100644 drivers/staging/qlge/qlge_health.c
 create mode 100644 drivers/staging/qlge/qlge_health.h

-- 
2.27.0