Re: [PATCH bpf-next 1/2] bpf: align syscall writeback behavior with caller-declared size

Leon Hwang Thu, 28 May 2026 07:43:41 -0700

On 2026/5/28 21:20, Yuyang Huang wrote:
> On Thu, May 28, 2026 at 1:43 PM Leon Hwang <[email protected]> wrote:
>>
>> On 25/5/26 15:21, Yuyang Huang wrote:
>> [...]
>>>
>>> Feel free to let us know your thoughts.
>>>
>> I believe this is a user space issue instead of a kernel bug.
>>
>> I tried to use mmap() memory as uattr that got -EFAULT instead of crash.
>>
>> [................] /* mmap() memory */
>>              ^ tail 40B as uattr
>>                     ^ 56B offset for copy_to_user()
>>
>> Thanks,
>> Leon
>>
> 
> Thanks for testing this!
> 
> There are some discussion in the original thread:
> https://lore.kernel.org/all/CANP3RGfZTXM_u=e_atoompzxutoqj02nomkccr-ybzbom2s...@mail.gmail.com/
> as follows, which might answer your question
>


It seems you haven't convinced Alexei in that thread.

>>>> If the uattr indeed has less than needed space, then for
>>>>          if (copy_to_user(&uattr->query.revision, &revision, 
>>>> sizeof(revision)))
>>>>                  return -EFAULT;
>>>> the kernel will return -EFAULT to user space.
>>>>
>>>> Maybe userspace didn't handle the return code properly and causing
>>>> user space corruption and segfaults. This shouldn't be a kernel issue.
>>>> Maybe I missed something?
>>>
>>> That's not how that works at all.
>>>
>>> copy_to_user() will only fail and thus EFAULT will only be returned if
>>> the memory area copy_to_user() is trying to copy into isn't
>>> owned/mapped by the user (or perhaps is read-only protected, not sure
>>> about this last one).
>>>
>>> Because memory is mapped in (at least) 4K pages, the memory after a
>>> user buffer is almost always still valid memory.  It might be unused,
>>> or it might be something on the stack - like a return address, or it
>>> might be on the heap - metadata tracking, or a different memory
>>> allocation perhaps entirely.
> 
> You might hit the same case as maze@ mentioned in the thread.
> 
> To trigger -EFAULT, you likely positioned `uattr` at the very end of a
> mapped page immediately followed by a protected page
> 
> Could you share the test program you created so we can verify?
> 

Attached below.

> Please check the test program I shared earlier in the thread (where
> uattr is stored on the stack); the BPF syscall returned 0, but stack
> corruption occurred.
> 

To avoid such stack corruption, you should reserve enough space for the
query, e.g., by extracting union bpf_attr from kernel BTF vmlinux.

Thanks,
Leon

> If you think my test program contains a bug, feel free to let me know.
> 
> Thanks,
> 
> Yuyang

---

Assisted-by: Copilot:gemini-3-1-pro-preview

// SPDX-License-Identifier: GPL-2.0
#include <uapi/linux/if_link.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <linux/bpf.h>
#include <net/if.h>
#include <test_progs.h>

#define loopback 1

#include "test_tc_link.skel.h"
#include "tc_helpers.h"

#define SHORT_QUERY_SIZE        offsetofend(union bpf_attr,
query.prog_attach_flags)

/*
 * test_tail_uattr_out_of_mmap:
 *
 * Places uattr at the very tail of a 1-page anonymous mmap so that the
 * mandatory fields (target_ifindex..prog_attach_flags, 40 bytes) fit
inside the page
 * but query.revision (+56) falls in the unmapped page immediately after.
 *
 *   mmap layout (1 page, e.g. 4096 bytes):
 *
 *     [0 ............ page_size - SHORT_QUERY_SIZE - 1]  unused
 *     [page_size - 40 ................. page_size - 1 ]  uattr:
target_ifindex..prog_cnt
 *     [page_size ..................................   ]  UNMAPPED
 *               ^
 *               uattr + 56 (revision) lands here
 */
static void test_tail_uattr_out_of_mmap(void)
{
        long page_size = sysconf(_SC_PAGE_SIZE);
        LIBBPF_OPTS(bpf_prog_attach_opts, opta);
        LIBBPF_OPTS(bpf_prog_detach_opts, optd);
        struct test_tc_link *skel;
        union bpf_attr *attr;
        void *mem, *tail;
        int fd, err;

        skel = test_tc_link__open_and_load();
        if (!ASSERT_OK_PTR(skel, "skel_load"))
                return;

        fd = bpf_program__fd(skel->progs.tc1);

        err = bpf_prog_attach_opts(fd, loopback, BPF_TCX_INGRESS, &opta);
        if (!ASSERT_OK(err, "prog_attach"))
                goto cleanup_skel;

        /*
         * Allocate 2 contiguous pages then immediately unmap the second one.
         * This guarantees the page following the first is unmapped, regardless
         * of what the runtime placed there beforehand.
         * Place uattr at the tail: last SHORT_QUERY_SIZE (40) bytes of page 1.
         * uattr + 56 (revision) therefore lands 16 bytes past the page end,
         * in the unmapped region.
         */
        mem = mmap(NULL, 2 * page_size, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (!ASSERT_OK_PTR(mem, "mmap"))
                goto detach;
        munmap((char *)mem + page_size, page_size);

        tail = (char *)mem + page_size - SHORT_QUERY_SIZE;
        memset(tail, 0, SHORT_QUERY_SIZE);

        attr = (union bpf_attr *)tail;
        attr->query.target_ifindex = loopback;
        attr->query.attach_type = BPF_TCX_INGRESS;

        err = syscall(__NR_bpf, BPF_PROG_QUERY, tail, SHORT_QUERY_SIZE);
        ASSERT_OK(err, "syscall");
        ASSERT_OK(errno, "errno");
        ASSERT_EQ(attr->query.prog_cnt, 1, "prog_cnt_written");

        munmap(mem, page_size); /* second page already unmapped above */
detach:
        bpf_prog_detach_opts(fd, loopback, BPF_TCX_INGRESS, &optd);
cleanup_skel:
        test_tc_link__destroy(skel);
}

void test_mmap_uattr_corruption(void)
{
        if (test__start_subtest("tail_uattr_out_of_mmap"))
                test_tail_uattr_out_of_mmap();
}

Re: [PATCH bpf-next 1/2] bpf: align syscall writeback behavior with caller-declared size

Reply via email to