On 2019/06/27 14:18, Manish Chopra wrote:
> > -----Original Message-----
> > From: Benjamin Poirier <bpoir...@suse.com>
> > Sent: Monday, June 17, 2019 1:19 PM
> > To: Manish Chopra <mani...@marvell.com>; GR-Linux-NIC-Dev <GR-Linux-
> > nic-...@marvell.com>; netdev@vger.kernel.org
> > Subject: [EXT] [PATCH net-next 16/16] qlge: Refill empty buffer queues from
> > wq
> >
> > External Email
> >
> > ----------------------------------------------------------------------
> > When operating at mtu 9000, qlge does order-1 allocations for rx buffers in
> > atomic context. This is especially unreliable when free memory is low or
> > fragmented. Add an approach similar to commit 3161e453e496 ("virtio: net
> > refill on out-of-memory") to qlge so that the device doesn't lock up if
> > there are allocation failures.
> >
[...]
> > +
> > +static void ql_update_buffer_queues(struct rx_ring *rx_ring, gfp_t gfp,
> > +                                    unsigned long delay)
> > +{
> > +        bool sbq_fail, lbq_fail;
> > +
> > +        sbq_fail = !!qlge_refill_bq(&rx_ring->sbq, gfp);
> > +        lbq_fail = !!qlge_refill_bq(&rx_ring->lbq, gfp);
> > +
> > +        /* Minimum number of buffers needed to be able to receive at least
> > +         * one frame of any format:
> > +         * sbq: 1 for header + 1 for data
> > +         * lbq: mtu 9000 / lb size
> > +         * Below this, the queue might stall.
> > +         */
> > +        if ((sbq_fail && QLGE_BQ_HW_OWNED(&rx_ring->sbq) < 2) ||
> > +            (lbq_fail && QLGE_BQ_HW_OWNED(&rx_ring->lbq) <
> > +             DIV_ROUND_UP(9000, LARGE_BUFFER_MAX_SIZE)))
> > +                /* Allocations can take a long time in certain cases (ex.
> > +                 * reclaim). Therefore, use a workqueue for long-running
> > +                 * work items.
> > +                 */
> > +                queue_delayed_work_on(smp_processor_id(), system_long_wq,
> > +                                      &rx_ring->refill_work, delay);
> >  }
> >
>
> This is probably going to mess up when at the interface load time
> (qlge_open()) allocation failure occurs, in such cases we don't really want
> to re-try allocations using refill_work but rather simply fail the
> interface load.

Why would you want to turn a recoverable failure into a fatal failure?

In case of allocation failure at ndo_open time, allocations are retried
later from a workqueue. Meanwhile, the device can use the available rx
buffers (if any could be allocated at all).

> Just to make sure here in such cases it shouldn't lead to kernel panic etc.
> while completing qlge_open() and leaving refill_work executing in
> background. Or probably handle such allocation failures from the napi
> context and schedule refill_work from there.
>

I've just tested allocation failures at open time and didn't find
problems; with mtu 9000, using bcc, for example:

tools/inject.py -P 0.5 -c 100 alloc_page "should_fail_alloc_page(gfp_t gfp_mask, unsigned int order) (order == 1) => qlge_refill_bq()"

What exact scenario do you have in mind that's going to lead to
problems? Please try it out and describe it precisely.
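
For reference, the condition under which the quoted hunk schedules
refill_work can be modelled in userspace. This is only a rough sketch, not
driver code: struct bq_model and needs_deferred_refill() are hypothetical
names, hw_owned stands in for QLGE_BQ_HW_OWNED(), and
LARGE_BUFFER_MAX_SIZE is assumed to be 8192 for the sake of the arithmetic.

```c
#include <stdbool.h>

/* Assumed value for illustration; the driver takes this from its headers. */
#define LARGE_BUFFER_MAX_SIZE 8192
/* Same rounding-up integer division the kernel macro performs. */
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* Hypothetical stand-in for one buffer queue (sbq or lbq). */
struct bq_model {
	unsigned int hw_owned; /* buffers currently posted to the hardware */
	bool refill_failed;    /* last refill attempt hit an allocation failure */
};

/*
 * Mirrors the test in the quoted hunk: a deferred refill is only needed
 * when a refill failed AND the queue has fewer buffers than one frame of
 * any format requires (sbq: 2, lbq: 9000 / large buffer size, rounded up).
 */
static bool needs_deferred_refill(const struct bq_model *sbq,
				  const struct bq_model *lbq)
{
	const unsigned int sbq_min = 2;
	const unsigned int lbq_min = DIV_ROUND_UP(9000, LARGE_BUFFER_MAX_SIZE);

	return (sbq->refill_failed && sbq->hw_owned < sbq_min) ||
	       (lbq->refill_failed && lbq->hw_owned < lbq_min);
}
```

Under that assumed buffer size, the lbq threshold works out to 2 buffers.
A failed refill that still leaves at least the minimum on both queues does
not schedule the work item; the rings simply keep running on what they
have.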