On Thu, 18 Apr 2024 22:19:33 +0200, Jesper Dangaard Brouer <[email protected]> wrote:
>
>
> On 17/04/2024 10.20, Xuan Zhuo wrote:
> > On Wed, 17 Apr 2024 12:08:10 +0800, Jason Wang <[email protected]> wrote:
> >> On Wed, Apr 17, 2024 at 9:38 AM Xuan Zhuo <[email protected]>
> >> wrote:
> >>>
> >>> On Tue, 16 Apr 2024 11:24:53 +0800, Jason Wang <[email protected]>
> >>> wrote:
> >>>> On Mon, Apr 15, 2024 at 5:04 PM Xuan Zhuo <[email protected]>
> >>>> wrote:
> >>>>>
> >>>>> On Mon, 15 Apr 2024 16:56:45 +0800, Jason Wang <[email protected]>
> >>>>> wrote:
> >>>>>> On Mon, Apr 15, 2024 at 4:50 PM Xuan Zhuo <[email protected]>
> >>>>>> wrote:
> >>>>>>>
> >>>>>>> On Mon, 15 Apr 2024 14:43:24 +0800, Jason Wang <[email protected]>
> >>>>>>> wrote:
> >>>>>>>> On Mon, Apr 15, 2024 at 10:35 AM Xuan Zhuo
> >>>>>>>> <[email protected]> wrote:
> >>>>>>>>>
> >>>>>>>>> On Fri, 12 Apr 2024 13:49:12 +0800, Jason Wang
> >>>>>>>>> <[email protected]> wrote:
> >>>>>>>>>> On Fri, Apr 12, 2024 at 1:39 PM Xuan Zhuo
> >>>>>>>>>> <[email protected]> wrote:
> >>>>>>>>>>>
> >>>>>>>>>>> On Fri, 12 Apr 2024 12:47:55 +0800, Jason Wang
> >>>>>>>>>>> <[email protected]> wrote:
> >>>>>>>>>>>> On Thu, Apr 11, 2024 at 10:51 AM Xuan Zhuo
> >>>>>>>>>>>> <[email protected]> wrote:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Currently, we chain the pages of big mode via the page's
> >>>>>>>>>>>>> "private" field. But a subsequent patch aims to make big mode
> >>>>>>>>>>>>> support premapped mode, which requires additional space to
> >>>>>>>>>>>>> store the DMA address.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Within the sub-struct that contains "private", there is no
> >>>>>>>>>>>>> suitable variable for storing the DMA address.
> >>>>>>>>>>>>> struct {        /* Page cache and anonymous pages */
> >>>>>>>>>>>>>         /**
> >>>>>>>>>>>>>          * @lru: Pageout list, eg. active_list protected by
> >>>>>>>>>>>>>          * lruvec->lru_lock. Sometimes used as a generic list
> >>>>>>>>>>>>>          * by the page owner.
> >>>>>>>>>>>>>          */
> >>>>>>>>>>>>>         union {
> >>>>>>>>>>>>>                 struct list_head lru;
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>                 /* Or, for the Unevictable "LRU list" slot */
> >>>>>>>>>>>>>                 struct {
> >>>>>>>>>>>>>                         /* Always even, to negate PageTail */
> >>>>>>>>>>>>>                         void *__filler;
> >>>>>>>>>>>>>                         /* Count page's or folio's mlocks */
> >>>>>>>>>>>>>                         unsigned int mlock_count;
> >>>>>>>>>>>>>                 };
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>                 /* Or, free page */
> >>>>>>>>>>>>>                 struct list_head buddy_list;
> >>>>>>>>>>>>>                 struct list_head pcp_list;
> >>>>>>>>>>>>>         };
> >>>>>>>>>>>>>         /* See page-flags.h for PAGE_MAPPING_FLAGS */
> >>>>>>>>>>>>>         struct address_space *mapping;
> >>>>>>>>>>>>>         union {
> >>>>>>>>>>>>>                 pgoff_t index;       /* Our offset within mapping. */
> >>>>>>>>>>>>>                 unsigned long share; /* share count for fsdax */
> >>>>>>>>>>>>>         };
> >>>>>>>>>>>>>         /**
> >>>>>>>>>>>>>          * @private: Mapping-private opaque data.
> >>>>>>>>>>>>>          * Usually used for buffer_heads if PagePrivate.
> >>>>>>>>>>>>>          * Used for swp_entry_t if PageSwapCache.
> >>>>>>>>>>>>>          * Indicates order in the buddy system if PageBuddy.
> >>>>>>>>>>>>>          */
> >>>>>>>>>>>>>         unsigned long private;
> >>>>>>>>>>>>> };
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> But within the page_pool sub-struct, there is a variable
> >>>>>>>>>>>>> called dma_addr that is appropriate for storing the DMA
> >>>>>>>>>>>>> address. That sub-struct is used by the netstack, which works
> >>>>>>>>>>>>> to our advantage.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> struct {        /* page_pool used by netstack */
> >>>>>>>>>>>>>         /**
> >>>>>>>>>>>>>          * @pp_magic: magic value to avoid recycling non
> >>>>>>>>>>>>>          * page_pool allocated pages.
> >>>>>>>>>>>>>          */
> >>>>>>>>>>>>>         unsigned long pp_magic;
> >>>>>>>>>>>>>         struct page_pool *pp;
> >>>>>>>>>>>>>         unsigned long _pp_mapping_pad;
> >>>>>>>>>>>>>         unsigned long dma_addr;
> >>>>>>>>>>>>>         atomic_long_t pp_ref_count;
> >>>>>>>>>>>>> };
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Either way, we should use variables from the same sub-struct,
> >>>>>>>>>>>>> so this patch replaces "private" with "pp".
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Signed-off-by: Xuan Zhuo <[email protected]>
> >>>>>>>>>>>>> ---
> >>>>>>>>>>>>
> >>>>>>>>>>>> Instead of doing a customized version of the page pool, can we
> >>>>>>>>>>>> simply switch to using the page pool for big mode instead? Then
> >>>>>>>>>>>> we don't need to bother with the DMA stuff.
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> The page pool does its DMA mapping through the standard DMA
> >>>>>>>>>>> APIs, so we cannot use the page pool directly.
> >>>>>>>>>>
> >>>>>>>>>> I found this:
> >>>>>>>>>>
> >>>>>>>>>> #define PP_FLAG_DMA_MAP  BIT(0)  /* Should page_pool do the DMA
> >>>>>>>>>>                                   * map/unmap */
> >>>>>>>>>> It seems to work here?
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> I have studied the page pool mechanism and believe that we cannot
> >>>>>>>>> use it directly. We can make the page pool bypass the DMA
> >>>>>>>>> operations, which allows us to handle DMA within virtio-net for
> >>>>>>>>> pages allocated from the page pool. Furthermore, we can use the
> >>>>>>>>> page pool helpers to associate the DMA address with the page.
> >>>>>>>>>
> >>>>>>>>> However, the critical issue is unmapping. Ideally, we want to
> >>>>>>>>> return the mapped pages to the page pool and reuse them, so that
> >>>>>>>>> we can omit the unmapping and remapping steps.
> >>>>>>>>>
> >>>>>>>>> Currently, there's a caveat: when the page pool cache is full, it
> >>>>>>>>> disconnects and releases the pages. When the pool hits its
> >>>>>>>>> capacity, pages are relinquished without a chance for unmapping.
>
> Could Jakub's memory provider for PP help your use-case?
>
> See: [1] https://lore.kernel.org/all/[email protected]/
> Or: [2] https://lore.kernel.org/netdev/[email protected]/T/
It cannot. That approach lets the page pool get pages via callbacks, but
here we are talking about map/unmap: virtio-net has its own DMA APIs.
dma_addr_t virtqueue_dma_map_single_attrs(struct virtqueue *_vq, void *ptr,
                                          size_t size,
                                          enum dma_data_direction dir,
                                          unsigned long attrs);
void virtqueue_dma_unmap_single_attrs(struct virtqueue *_vq, dma_addr_t addr,
                                      size_t size,
                                      enum dma_data_direction dir,
                                      unsigned long attrs);
dma_addr_t virtqueue_dma_map_page_attrs(struct virtqueue *_vq,
                                        struct page *page,
                                        size_t offset, size_t size,
                                        enum dma_data_direction dir,
                                        unsigned long attrs);
void virtqueue_dma_unmap_page_attrs(struct virtqueue *_vq, dma_addr_t addr,
                                    size_t size,
                                    enum dma_data_direction dir,
                                    unsigned long attrs);
int virtqueue_dma_mapping_error(struct virtqueue *_vq, dma_addr_t addr);
bool virtqueue_dma_need_sync(struct virtqueue *_vq, dma_addr_t addr);
void virtqueue_dma_sync_single_range_for_cpu(struct virtqueue *_vq,
                                             dma_addr_t addr,
                                             unsigned long offset,
                                             size_t size,
                                             enum dma_data_direction dir);
void virtqueue_dma_sync_single_range_for_device(struct virtqueue *_vq,
                                                dma_addr_t addr,
                                                unsigned long offset,
                                                size_t size,
                                                enum dma_data_direction dir);
Thanks.
>
>
> [...]
> >>>>>>
> >>>>>> Adding Jesper for some comments.
> >>>>>>
> >>>>>>>
> >>>>>>> Back to this patch set: I think we should keep virtio-net managing
> >>>>>>> the pages itself.
> >>>>>>>
>
> For context, the patch:
> [3] https://lore.kernel.org/all/[email protected]/
>
> >>>>>>> What do you think?
> >>>>>>
> >>>>>> I might be wrong, but I think we need to either
> >>>>>>
> >>>>>> 1) seek a way to manage the pages by yourself without touching the
> >>>>>> page pool metadata (or Jesper is fine with this)
> >>>>>
> >>>>> Do you mean working with page pool or not?
> >>>>>
> >>>>
> >>>> I meant whether Jesper is fine with reusing the page pool metadata
> >>>> like this patch does.
> >>>>
> >>>>> If we manage the pages ourselves (no page pool), we do not care
> >>>>> whether the metadata is meant for the page pool or not. We just use
> >>>>> the space in the pages, like "private".
> >>>>
> >>>> That's also fine.
> >>>>
>
> I'm not sure it is "fine" to explicitly choose not to use the page pool
> and then (ab)use the `struct page` member (pp) that is intended for
> page_pool for other stuff (in this case, creating a linked list of
> pages).
>
> +#define page_chain_next(p) ((struct page *)((p)->pp))
> +#define page_chain_add(p, n) ((p)->pp = (void *)n)
>
> I'm not sure that I (as PP maintainer) can actually make this call, as I
> think this area belongs to the MM "page" maintainers (Cc MM-list +
> people) to judge.
>
> Just inventing new ways to use struct page fields, without adding your
> use-case to struct page, will make it harder for MM people to maintain
> (e.g. to make future changes).
>
> --Jesper
>
>