On 5/22/26 12:19 PM, John Groves wrote:
> From: John Groves <[email protected]>
>
> __fsdev_dax_direct_access() returned the number of available pages based
> on cached_size (the total size across all ranges). For multi-range
> devices with physical gaps between ranges, this over-reports the number
> of physically contiguous pages available from the returned kaddr/pfn.
> Callers trust this return value to mean contiguous pages, so accessing
> beyond the current range boundary would hit unmapped or unrelated memory.
>
> Fix by finding the range that contains the translated physical address
> and clamping the return to the remaining pages within that range.
>
> Also remove the now-unused cached_size field from struct dev_dax, since
> it was only consumed by the old return calculation.
>
> Fixes: 099c81a1f0ab3 ("dax: Add dax_operations for use by fs-dax on fsdev
> dax")
> Signed-off-by: John Groves <[email protected]>
I ran this through Claude and this is what it came back with and it looks
reasonable to me:
The claimed bug does not manifest in the current tree. This is a
refactor + dead-field removal dressed as a bugfix. Either the justification is
wrong or there's a missing companion change.
The claim doesn't hold up
Commit message:
▎ __fsdev_dax_direct_access() returned the number of available pages based on
▎ cached_size (the total size across all ranges). For multi-range devices with
▎ physical gaps between ranges, this over-reports the number of physically
▎ contiguous pages available from the returned kaddr/pfn.
Walk through it. __fsdev_dax_direct_access does:
phys = dax_pgoff_to_phys(dev_dax, pgoff, size);
if (phys == -1)
return -EFAULT;
...
return PHYS_PFN(min(size, dev_dax->cached_size - offset));
Now look at what dax_pgoff_to_phys (drivers/dax/bus.c:1433) enforces:
phys = PFN_PHYS(pgoff - dax_range->pgoff) + range->start;
if (phys + size - 1 <= range->end)
return phys;
break; /* falls through to return -1 */
It returns valid phys only when the entire requested size fits inside a single
range. If the request straddles a gap, it returns -1 and the caller bails
with -EFAULT before ever reaching the clamp.
So in the only path where the clamp runs, we already know size <= range->end -
phys + 1, which means cached_size - offset >= size, which means min(size,
cached_size - offset) == size. Pre-patch returns nr_pages. Post-patch returns
nr_pages. Same answer.
I worked it through with concrete numbers on a 2-range device with a physical
gap (range[0]=4 pages at 0x1000_0000, range[1]=4 pages at 0x2000_0000):
- pgoff=2, nr_pages=2: both return 2. ✓
- pgoff=3, nr_pages=2 (straddles gap): both return -EFAULT via the early bail.
✓
- pgoff=4, nr_pages=2: both return 2. ✓
I cannot construct a case where pre-patch over-reports.
What's actually changed
1. A no-op clamp loop replaces a constant calculation. Same return value in
every reachable case.
2. The trailing return -EFAULT; is dead code. If dax_pgoff_to_phys returned
valid phys, then by construction phys lies inside some range in
dev_dax->ranges[], so the loop will match.
3. O(N) walk replaces O(1) arithmetic. Trivial for small nr_range, but it's a
regression in the wrong direction for a hot path.
4. cached_size field is removed. Good cleanup — I grepped, no other consumers.
DJ
> ---
> drivers/dax/dax-private.h | 2 --
> drivers/dax/fsdev.c | 23 ++++++++++++++---------
> 2 files changed, 14 insertions(+), 11 deletions(-)
>
> diff --git a/drivers/dax/dax-private.h b/drivers/dax/dax-private.h
> index 81e4af49e39c1..7a3727d76a68a 100644
> --- a/drivers/dax/dax-private.h
> +++ b/drivers/dax/dax-private.h
> @@ -70,7 +70,6 @@ struct dev_dax_range {
> * @region: parent region
> * @dax_dev: core dax functionality
> * @virt_addr: kva from memremap; used by fsdev_dax
> - * @cached_size: size of daxdev cached by fsdev_dax
> * @align: alignment of this instance
> * @target_node: effective numa node if dev_dax memory range is onlined
> * @dyn_id: is this a dynamic or statically created instance
> @@ -86,7 +85,6 @@ struct dev_dax {
> struct dax_region *region;
> struct dax_device *dax_dev;
> void *virt_addr;
> - u64 cached_size;
> unsigned int align;
> int target_node;
> bool dyn_id;
> diff --git a/drivers/dax/fsdev.c b/drivers/dax/fsdev.c
> index aac0130ab2833..f74fd1bb7f4c5 100644
> --- a/drivers/dax/fsdev.c
> +++ b/drivers/dax/fsdev.c
> @@ -50,8 +50,8 @@ static long __fsdev_dax_direct_access(struct dax_device
> *dax_dev, pgoff_t pgoff,
> {
> struct dev_dax *dev_dax = dax_get_private(dax_dev);
> size_t size = nr_pages << PAGE_SHIFT;
> - size_t offset = pgoff << PAGE_SHIFT;
> phys_addr_t phys;
> + int i;
>
> phys = dax_pgoff_to_phys(dev_dax, pgoff, size);
> if (phys == -1) {
> @@ -67,10 +67,20 @@ static long __fsdev_dax_direct_access(struct dax_device
> *dax_dev, pgoff_t pgoff,
> *pfn = PHYS_PFN(phys);
>
> /*
> - * Use cached_size which was computed at probe time. The size cannot
> - * change while the driver is bound (resize returns -EBUSY).
> + * Return the number of physically contiguous pages available from
> + * phys, clamped to the current range. For multi-range devices the
> + * ranges may not be physically contiguous, so we cannot report
> + * pages beyond the end of the range that contains phys.
> */
> - return PHYS_PFN(min(size, dev_dax->cached_size - offset));
> + for (i = 0; i < dev_dax->nr_range; i++) {
> + struct range *range = &dev_dax->ranges[i].range;
> +
> + if (phys >= range->start && phys <= range->end)
> + return PHYS_PFN(min(size,
> + (size_t)(range->end - phys + 1)));
> + }
> +
> + return -EFAULT;
> }
>
> static int fsdev_dax_zero_page_range(struct dax_device *dax_dev,
> @@ -272,11 +282,6 @@ static int fsdev_dax_probe(struct dev_dax *dev_dax)
> }
> }
>
> - /* Cache size now; it cannot change while driver is bound */
> - dev_dax->cached_size = 0;
> - for (i = 0; i < dev_dax->nr_range; i++)
> - dev_dax->cached_size += range_len(&dev_dax->ranges[i].range);
> -
> /*
> * Use MEMORY_DEVICE_FS_DAX without setting vmemmap_shift, leaving
> * folios at order-0. Unlike device.c (MEMORY_DEVICE_GENERIC), this