I'm using Debian 9 (stretch) with kernel 4.9 on an HP DL385 G7 server with 32 CPU cores. The NIC queues are pinned to processor cores. The server shapes traffic (iproute2 with the HTB qdisc, plus skbinfo, ipset and ifb) and filters some traffic with iptables rules.
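For anyone trying to reproduce this kind of setup, the sketch below prints the commands for a typical ifb + HTB ingress shaping configuration. It is a simplified, hypothetical example, not the actual rules from this server: the interface names (`eth0`, `ifb0`), the rates, and the class IDs (`1:1`, `1:10`) are assumptions, and the skbinfo/ipset/iptables parts are left out. It only prints the commands (a dry run) and does not execute anything.

```python
"""Dry run: print a hypothetical ifb + HTB ingress shaping setup."""
from typing import List


def build_ifb_htb_commands(dev: str = "eth0", ifb: str = "ifb0",
                           total_rate: str = "1gbit",
                           default_rate: str = "100mbit") -> List[str]:
    """Return ip/tc commands that redirect ingress on `dev` to `ifb`
    and shape it there with an HTB tree (all names are hypothetical)."""
    return [
        # Create the intermediate functional block device and bring it up.
        f"ip link add {ifb} type ifb",
        f"ip link set dev {ifb} up",
        # Attach the ingress qdisc to the physical NIC.
        f"tc qdisc add dev {dev} handle ffff: ingress",
        # Redirect every ingress packet to the ifb device so it can be shaped.
        f"tc filter add dev {dev} parent ffff: protocol all u32 match u32 0 0 "
        f"action mirred egress redirect dev {ifb}",
        # Root HTB qdisc on the ifb device; unclassified traffic goes to 1:10.
        f"tc qdisc add dev {ifb} root handle 1: htb default 10",
        # Parent class capping the total rate.
        f"tc class add dev {ifb} parent 1: classid 1:1 htb rate {total_rate}",
        # Default leaf class that may borrow up to the total rate.
        f"tc class add dev {ifb} parent 1:1 classid 1:10 htb "
        f"rate {default_rate} ceil {total_rate}",
    ]


if __name__ == "__main__":
    for cmd in build_ifb_htb_commands():
        print(cmd)
```

In this kind of layout, all redirected traffic from every NIC queue ends up in the single root HTB qdisc on `ifb0`.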
When traffic rises to about 1 Gbit/s, the CPU load becomes very high. perf shows that the kernel function native_queued_spin_lock_slowpath accounts for about 40% of CPU time. After several hours of searching, I found that removing the HTB qdisc from ifb0 makes the high load go away. So I think the problem lies in HTB classification and shaping. Does anyone know how to solve this?