> -----Original Message-----
> From: Christian König <[email protected]>
> Sent: Friday, May 16, 2025 4:36 PM
> To: wangtao <[email protected]>; [email protected];
> [email protected]; [email protected];
> [email protected]; [email protected]
> Cc: [email protected]; [email protected];
> [email protected]; [email protected];
> wangbintian(BintianWang) <[email protected]>; yipengxiang
> <[email protected]>; liulu 00013167 <[email protected]>;
> hanfeng 00012985 <[email protected]>
> Subject: Re: [PATCH 2/2] dmabuf/heaps: implement
> DMA_BUF_IOCTL_RW_FILE for system_heap
>
> On 5/16/25 09:40, wangtao wrote:
> >
> >> -----Original Message-----
> >> From: Christian König <[email protected]>
> >> Sent: Thursday, May 15, 2025 10:26 PM
> >> To: wangtao <[email protected]>; [email protected];
> >> [email protected]; [email protected];
> >> [email protected]; [email protected]
> >> Cc: [email protected]; [email protected];
> >> [email protected]; [email protected];
> >> wangbintian(BintianWang) <[email protected]>; yipengxiang
> >> <[email protected]>; liulu 00013167 <[email protected]>;
> >> hanfeng 00012985 <[email protected]>
> >> Subject: Re: [PATCH 2/2] dmabuf/heaps: implement
> >> DMA_BUF_IOCTL_RW_FILE for system_heap
> >>
> >> On 5/15/25 16:03, wangtao wrote:
> >>> [wangtao] My test configuration (CPU 1GHz, 5-test average):
> >>> Allocation: 32x32MB buffer creation
> >>> - dmabuf 53ms vs. udmabuf 694ms (10X slower)
> >>> - Note: shmem shows excessive allocation time
> >>
> >> Yeah, that is something already noted by others as well. But that is
> >> orthogonal.
> >>
> >>> Read 1024MB file:
> >>> - dmabuf direct 326ms vs. udmabuf direct 461ms (40% slower)
> >>> - Note: pin_user_pages_fast consumes the majority of CPU cycles
> >>>
> >>> Key function call timing: see details below.
> >>
> >> Those aren't valid, you are comparing different functionalities here.
> >>
> >> Please try using udmabuf with sendfile() as confirmed to be working
> >> by T.J.
> >
> > [wangtao] Buffered IO with dmabuf file read/write requires one memory
> > copy; Direct IO removes that copy and enables zero-copy. The sendfile
> > system call reduces memory copies from two (read/write) to one, but
> > with udmabuf, sendfile still keeps at least one copy, so zero-copy is
> > not achieved.
>
> Then please work on fixing this.

[wangtao] What needs fixing? Does sendfile achieve zero-copy?
sendfile reduces memory copies (from 2 to 1) for network sockets, but it
still requires one copy and therefore cannot achieve zero-copy.
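For clarity, "direct read" in my numbers means an O_DIRECT read() into a
mapping of the memfd that backs the udmabuf. A minimal sketch of that
path (assuming the standard /dev/udmabuf UAPI; "data.bin" is a
placeholder path and error handling is omitted):

#define _GNU_SOURCE
#include <fcntl.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <linux/udmabuf.h>

#define SZ (32UL << 20)                         /* one 32MB buffer */

int main(void)
{
        int dev_fd = open("/dev/udmabuf", O_RDWR);
        int memfd = memfd_create("buf", MFD_ALLOW_SEALING);

        ftruncate(memfd, SZ);
        fcntl(memfd, F_ADD_SEALS, F_SEAL_SHRINK); /* udmabuf requires this seal */

        struct udmabuf_create create = {
                .memfd = memfd, .offset = 0, .size = SZ,
        };
        int buf_fd = ioctl(dev_fd, UDMABUF_CREATE, &create);

        /* A page-aligned mapping satisfies the O_DIRECT alignment rules. */
        void *p = mmap(NULL, SZ, PROT_READ | PROT_WRITE, MAP_SHARED, memfd, 0);
        int file_fd = open("data.bin", O_RDONLY | O_DIRECT);

        read(file_fd, p, SZ);                   /* short reads ignored for brevity */

        close(file_fd);
        munmap(p, SZ);
        close(buf_fd);
        close(memfd);
        close(dev_fd);
        return 0;
}

The O_DIRECT read skips the page-cache copy, but the kernel still has to
pin the memfd pages for the IO, which is where pin_user_pages_fast shows
up in the profile above.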
>
> Regards,
> Christian.
>
> >
> > If udmabuf sendfile uses buffered IO (file page cache), read latency
> > matches dmabuf buffered read, but allocation takes much longer.
> > With Direct IO, sendfile is slower than buffered IO because of the
> > default 16-page pipe size.
> >
> > Test data shows:
> > udmabuf direct read is much faster than udmabuf sendfile.
> > dmabuf direct read outperforms udmabuf direct read by a large margin.
> >
> > Issue: After udmabuf is mapped via map_dma_buf, apps using memfd or
> > udmabuf for Direct IO might cause errors, but there are no safeguards
> > to prevent this.
> >
> > Allocate 32x32MB buffers and read a 1024MB file:
> > Metric                  | alloc (ms) | read (ms) | total (ms)
> > ------------------------|------------|-----------|-----------
> > udmabuf buffer read     |        539 |      2017 |       2555
> > udmabuf direct read     |        522 |       658 |       1179
> > udmabuf buffer sendfile |        505 |      1040 |       1546
> > udmabuf direct sendfile |        510 |      2269 |       2780
> > dmabuf buffer read      |         51 |      1068 |       1118
> > dmabuf direct read      |         52 |       297 |        349
> >
> > udmabuf sendfile test steps (sketched in C at the end of this mail):
> > 1. Open data file (1024MB), get back_fd
> > 2. Create memfd (32MB)          # loop steps 2-6
> > 3. Allocate udmabuf with memfd
> > 4. Call sendfile(memfd, back_fd)
> > 5. Close memfd after sendfile
> > 6. Close udmabuf
> > 7. Close back_fd
> >
> >>
> >> Regards,
> >> Christian.
> >
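To make the sendfile comparison reproducible, here are the quoted test
steps as a minimal C sketch (again assuming the standard /dev/udmabuf
UAPI and a placeholder "data.bin" path; error handling and timing
omitted):

#define _GNU_SOURCE
#include <fcntl.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <sys/sendfile.h>
#include <linux/udmabuf.h>

#define CHUNK (32UL << 20)                      /* 32MB per memfd */
#define TOTAL (1024UL << 20)                    /* 1024MB data file */

int main(void)
{
        int dev_fd = open("/dev/udmabuf", O_RDWR);
        int back_fd = open("data.bin", O_RDONLY);             /* step 1 */
        off_t off = 0;

        while (off < (off_t)TOTAL) {                          /* loop steps 2-6 */
                int memfd = memfd_create("chunk", MFD_ALLOW_SEALING); /* step 2 */
                ftruncate(memfd, CHUNK);
                fcntl(memfd, F_ADD_SEALS, F_SEAL_SHRINK);

                struct udmabuf_create create = {              /* step 3 */
                        .memfd = memfd, .offset = 0, .size = CHUNK,
                };
                int buf_fd = ioctl(dev_fd, UDMABUF_CREATE, &create);

                sendfile(memfd, back_fd, &off, CHUNK);        /* step 4 */

                close(memfd);                                 /* step 5 */
                close(buf_fd);                                /* step 6 */
        }
        close(back_fd);                                       /* step 7 */
        close(dev_fd);
        return 0;
}

Note that sendfile() here fills the memfd through the page cache, which
is exactly the copy that keeps this path from being zero-copy.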
