Hi David,

(Apologies for the earlier HTML email — resending in plain text.)

Thanks for the review. I'm addressing all your comments in v4.

> What is the expected return value to user space if we run into this case?

If the first page is not resident and PROCESS_VM_NOWAIT is set, it returns -1
with errno EFAULT.

Note that Sashiko suggested to return errno EAGAIN to distinguish
invalid addresses from unpaged memory. I haven't done that to avoid
adding more code tweaking errnos; let me know if you would like me to do
that.
https://sashiko.dev/#/patchset/20260428122826.339550-1-alban.crequy%40gmail.com

It can also return a short read, meaning it returns the number of bytes
successfully transferred from pages prior to the fault.

This is the same partial-read behavior as a regular
process_vm_readv when hitting an unmapped page — NOWAIT just changes
when the fault fails (immediately instead of blocking).

I've documented this in the updated commit message in v4 (I will send it
shortly). I also added two selftests that verify partial reads across a
resident and a non-resident page (with both single iovec and two iovecs).

Interestingly, while writing these tests, I found that the current
process_vm_readv(2) man page incorrectly claims that partial transfers
apply at iovec-element granularity. In practice, the kernel returns
partial reads at page granularity within a single remote iovec element.
I've sent a separate fix for that to linux-man@:
https://lore.kernel.org/linux-man/[email protected]/T/#u

> And in the same context: Will you send a man page update? :)

I've included a suggested man page update in the commit message of v4 patch
1/2, ready for the man-pages maintainer to pick up. I'll also send a separate
patch to linux-man@ once the kernel patches are merged and available in a new
Linux release.

> We try to sort this alphabetically.

Fixed in v4 — moved the include/uapi line to the correct position.

> Thinking out loud: the c file is called "process_vm_access.c", should
> we name the header like that as well?

Good idea — renamed to include/uapi/linux/process_vm_access.h in v4.

> Should we use BIT here?

As noted in the v2/v3 changelog: BIT() is defined in vdso/bits.h which is not
exported to userspace. UAPI headers using BIT() (like tcp.h) work in-kernel but
break for userspace programs that include these headers directly. In practice,
BIT() in tcp.h is only used by TCP_ACCECN_* flags, so merely including tcp.h in
userspace programs won't break them unless they use those flags. Those flags
were moved from kernel-only headers to UAPI headers in commit 4fa4ac5e5848
("tcp: accecn: add tcpi_ecn_mode and tcpi_option2 in tcp_info"), which appeared
in Linux v7.0-rc1 (not yet in a release). So userspace programs don't yet use
those macros.

So I think (1UL << N) is the correct choice for UAPI.

> This could return -EBADF or -ESRCH. We should document both in the
> man page. (or decide to always return -ESRCH, dunno)

Agreed with Christian's reply — keeping the actual errno from
pidfd_get_task() is useful information for userspace. Both EBADF
and ESRCH are documented in the suggested man page update text in v4.

Thanks,
Alban

Reply via email to