On 2026/5/28 21:20, Yuyang Huang wrote:
> On Thu, May 28, 2026 at 1:43 PM Leon Hwang <[email protected]> wrote:
>>
>> On 25/5/26 15:21, Yuyang Huang wrote:
>> [...]
>>>
>>> Feel free to let us know your thoughts.
>>>
>> I believe this is a user space issue instead of a kernel bug.
>>
>> I tried to use mmap() memory as uattr that got -EFAULT instead of crash.
>>
>> [................] /* mmap() memory */
>> ^ tail 40B as uattr
>> ^ 56B offset for copy_to_user()
>>
>> Thanks,
>> Leon
>>
>
> Thanks for testing this!
>
> There are some discussion in the original thread:
> https://lore.kernel.org/all/CANP3RGfZTXM_u=e_atoompzxutoqj02nomkccr-ybzbom2s...@mail.gmail.com/
> as follows, which might answer your question
>
It seems you haven't convinced Alexei in that thread.
>>>> If the uattr indeed has less than needed space, then for
>>>> if (copy_to_user(&uattr->query.revision, &revision,
>>>> sizeof(revision)))
>>>> return -EFAULT;
>>>> the kernel will return -EFAULT to user space.
>>>>
>>>> Maybe userspace didn't handle the return code properly and causing
>>>> user space corruption and segfaults. This shouldn't be a kernel issue.
>>>> Maybe I missed something?
>>>
>>> That's not how that works at all.
>>>
>>> copy_to_user() will only fail and thus EFAULT will only be returned if
>>> the memory area copy_to_user() is trying to copy into isn't
>>> owned/mapped by the user (or perhaps is read-only protected, not sure
>>> about this last one).
>>>
>>> Because memory is mapped in (at least) 4K pages, the memory after a
>>> user buffer is almost always still valid memory. It might be unused,
>>> or it might be something on the stack - like a return address, or it
>>> might be on the heap - metadata tracking, or a different memory
>>> allocation perhaps entirely.
>
> You might hit the same case as maze@ mentioned in the thread.
>
> To trigger -EFAULT, you likely positioned `uattr` at the very end of a
> mapped page immediately followed by a protected page
>
> Could you share the test program you created so we can verify?
>
Attached below.
> Please check the test program I shared earlier in the thread (where
> uattr is stored on the stack); the BPF syscall returned 0, but stack
> corruption occurred.
>
To avoid such stack corruption, you should reserve enough space for the
query, e.g., by extracting union bpf_attr from kernel BTF vmlinux.
Thanks,
Leon
> If you think my test program contains a bug, feel free to let me know.
>
> Thanks,
>
> Yuyang
---
Assisted-by: Copilot:gemini-3-1-pro-preview
// SPDX-License-Identifier: GPL-2.0
#include <uapi/linux/if_link.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <linux/bpf.h>
#include <net/if.h>
#include <test_progs.h>
#define loopback 1
#include "test_tc_link.skel.h"
#include "tc_helpers.h"
#define SHORT_QUERY_SIZE offsetofend(union bpf_attr,
query.prog_attach_flags)
/*
* test_tail_uattr_out_of_mmap:
*
* Places uattr at the very tail of a 1-page anonymous mmap so that the
* mandatory fields (target_ifindex..prog_attach_flags, 40 bytes) fit
inside the page
* but query.revision (+56) falls in the unmapped page immediately after.
*
* mmap layout (1 page, e.g. 4096 bytes):
*
* [0 ............ page_size - SHORT_QUERY_SIZE - 1] unused
* [page_size - 40 ................. page_size - 1 ] uattr:
target_ifindex..prog_cnt
* [page_size .................................. ] UNMAPPED
* ^
* uattr + 56 (revision) lands here
*/
static void test_tail_uattr_out_of_mmap(void)
{
long page_size = sysconf(_SC_PAGE_SIZE);
LIBBPF_OPTS(bpf_prog_attach_opts, opta);
LIBBPF_OPTS(bpf_prog_detach_opts, optd);
struct test_tc_link *skel;
union bpf_attr *attr;
void *mem, *tail;
int fd, err;
skel = test_tc_link__open_and_load();
if (!ASSERT_OK_PTR(skel, "skel_load"))
return;
fd = bpf_program__fd(skel->progs.tc1);
err = bpf_prog_attach_opts(fd, loopback, BPF_TCX_INGRESS, &opta);
if (!ASSERT_OK(err, "prog_attach"))
goto cleanup_skel;
/*
* Allocate 2 contiguous pages then immediately unmap the second one.
* This guarantees the page following the first is unmapped, regardless
* of what the runtime placed there beforehand.
* Place uattr at the tail: last SHORT_QUERY_SIZE (40) bytes of page 1.
* uattr + 56 (revision) therefore lands 16 bytes past the page end,
* in the unmapped region.
*/
mem = mmap(NULL, 2 * page_size, PROT_READ | PROT_WRITE,
MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
if (!ASSERT_OK_PTR(mem, "mmap"))
goto detach;
munmap((char *)mem + page_size, page_size);
tail = (char *)mem + page_size - SHORT_QUERY_SIZE;
memset(tail, 0, SHORT_QUERY_SIZE);
attr = (union bpf_attr *)tail;
attr->query.target_ifindex = loopback;
attr->query.attach_type = BPF_TCX_INGRESS;
err = syscall(__NR_bpf, BPF_PROG_QUERY, tail, SHORT_QUERY_SIZE);
ASSERT_OK(err, "syscall");
ASSERT_OK(errno, "errno");
ASSERT_EQ(attr->query.prog_cnt, 1, "prog_cnt_written");
munmap(mem, page_size); /* second page already unmapped above */
detach:
bpf_prog_detach_opts(fd, loopback, BPF_TCX_INGRESS, &optd);
cleanup_skel:
test_tc_link__destroy(skel);
}
void test_mmap_uattr_corruption(void)
{
if (test__start_subtest("tail_uattr_out_of_mmap"))
test_tail_uattr_out_of_mmap();
}