On 2021/4/6 15:31, Michal Kubecek wrote:
> On Tue, Apr 06, 2021 at 10:46:29AM +0800, Yunsheng Lin wrote:
>> On 2021/4/6 9:49, Cong Wang wrote:
>>> On Sat, Apr 3, 2021 at 5:23 AM Jiri Kosina <ji...@kernel.org> wrote:
>>>>
>>>> I am still planning to have Yunsheng Lin's (CCing) fix [1] tested in the
>>>> coming days. If it works, then we can consider proceeding with it,
>>>> otherwise I am all for reverting the whole NOLOCK stuff.
>>>>
>>>> [1]
>>>> https://lore.kernel.org/linux-can/1616641991-14847-1-git-send-email-linyunsh...@huawei.com/T/#u
>>>
>>> I personally prefer to just revert that bit, as it brings more troubles
>>> than gains. Even with Yunsheng's patch, there are still some issues.
>>> Essentially, I think the core qdisc scheduling code is not ready for
>>> lockless, just look at those NOLOCK checks in sch_generic.c. :-/
>>
>> I am also awared of the NOLOCK checks too:), and I am willing to
>> take care of it if that is possible.
>>
>> As the number of cores in a system is increasing, it is the trend
>> to become lockless, right? Even there is only one cpu involved, the
>> spinlock taking and releasing takes about 30ns on our arm64 system
>> when CONFIG_PREEMPT_VOLUNTARY is enable(ip forwarding testing).
>
> I agree with the benefits but currently the situation is that we have
> a race condition affecting the default qdisc which is being hit in
> production and can cause serious trouble which is made worse by commit
> 1f3279ae0c13 ("tcp: avoid retransmits of TCP packets hanging in host
> queues") preventing the retransmits of the stuck packet being sent.
>
> Perhaps rather than patching over current implementation which requires
> more and more complicated hacks to work around the fact that we cannot
> make the "queue is empty" check and leaving the critical section atomic,
> it would make sense to reimplement it in a way which would allow us
> making it atomic.

Yes, reimplementing that is also an option. But what if the
reimplementation has the same problem because we have not found the
root cause? I think it would be better to find the root cause first.

>
> Michal
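
To make the window Michal describes concrete (the "queue is empty"
check and leaving the critical section not being atomic), here is a
minimal userspace sketch. It is not the kernel code: struct toy_qdisc
and all toy_* helpers are hypothetical names, and the race is replayed
step by step on one thread so the outcome is deterministic.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a lockless qdisc; not the kernel's struct Qdisc. */
struct toy_qdisc {
	atomic_bool running;	/* true while some CPU owns the dequeue loop */
	atomic_int qlen;	/* number of queued packets */
};

/* Try to become the single dequeuer; fails if another CPU already is. */
static bool toy_run_begin(struct toy_qdisc *q)
{
	bool expected = false;

	return atomic_compare_exchange_strong(&q->running, &expected, true);
}

/* Give up ownership of the dequeue loop. */
static void toy_run_end(struct toy_qdisc *q)
{
	atomic_store(&q->running, false);
}

static void toy_enqueue(struct toy_qdisc *q)
{
	atomic_fetch_add(&q->qlen, 1);
}

/* Drain until the queue looks empty. The caller must own the run. */
static int toy_drain(struct toy_qdisc *q)
{
	int sent = 0;

	assert(atomic_load(&q->running));
	while (atomic_load(&q->qlen) > 0) {
		atomic_fetch_sub(&q->qlen, 1);
		sent++;
	}
	return sent;
}

/*
 * Replay one bad interleaving of two CPUs on a single thread.
 * Returns the number of packets left queued with no owner to send them.
 */
int toy_replay_race(void)
{
	struct toy_qdisc q;
	bool a_owns, b_owns;

	atomic_init(&q.running, false);
	atomic_init(&q.qlen, 0);

	/* CPU A: enqueue, take ownership, drain until "queue is empty". */
	toy_enqueue(&q);
	a_owns = toy_run_begin(&q);
	if (a_owns)
		toy_drain(&q);

	/* CPU B: enqueues before A has released, so it cannot take over. */
	toy_enqueue(&q);
	b_owns = toy_run_begin(&q);	/* false: A still owns the run */

	/* CPU A: releases without looking at the queue again. */
	toy_run_end(&q);

	/* Nobody dequeues now; B's packet waits for the next enqueue. */
	return b_owns ? 0 : atomic_load(&q.qlen);
}
```

In this replay toy_replay_race() returns 1: one packet stays queued
until some later enqueue happens to restart the dequeue loop. The
sketch only shows why the check and the release would need to be one
atomic step; it says nothing about whether this is the only cause of
the stuck packets.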