[clang] [llvm] [BPF] Make -mcpu=v3 as the default (PR #107008)
yuvald-sweet-security wrote: Hey, I didn't see this change in the [LLVM 20.1.0 Release Notes](https://releases.llvm.org/20.1.0/docs/ReleaseNotes.html); it would be nice to list changes like this there in the future as a heads-up.

Anyway, I got here after tracing a regression introduced in Clang 20: sub-par eBPF codegen that results in dramatically larger kernel verifier states (and eventually causes the verifier to give up on sufficiently large functions). The culprit does indeed seem to be BPF v3. My guess is that branching on 32-bit registers gives the verifier less ability to refute impossible paths, so it ends up carrying more possible states.

FYI

https://github.com/llvm/llvm-project/pull/107008
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
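One way to probe the 32-bit branching hypothesis is to count, in the disassembly of each build, the conditional jumps that compare 32-bit subregisters. The sketch below assumes LLVM's BPF disassembler prints such jumps with `w` register names (e.g. `if w1 > 0x3 goto ...`) and 64-bit ones with `r` names; `count_jmp32` is a hypothetical helper, not part of any tool.

```shell
# count_jmp32 (hypothetical helper): read BPF disassembly on stdin and print
# how many conditional jumps compare a 32-bit subregister (w0, w1, ...).
# Assumes llvm-objdump's BPF syntax, e.g. "if w1 > 0x3 goto +0x5".
count_jmp32() {
  # grep -c prints 0 but exits 1 when nothing matches; "|| true" keeps
  # the exit status clean without changing the printed count
  grep -cE 'if w[0-9]+ ' || true
}
```

Usage would be `llvm-objdump -d /tmp/test-v1.o | count_jmp32` versus the same for `/tmp/test-v3.o`. Since jmp32 instructions are a v3 feature, the v1 object would be expected to report 0.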
[clang] [llvm] [BPF] Make -mcpu=v3 as the default (PR #107008)
yuvald-sweet-security wrote:

> Thanks @yuvald-sweet-security I am not sure whether I can add default v1->v3
> in llvm20 or not. But let me check anyway.
>
> Also, if the verification failure can be easily reproduced, could you submit
> the test case, we can do some analysis to see why and maybe verifier could be
> improved due to this. Thanks!

The specific case I was researching is in proprietary code, but I can see eBPF verifier/codegen degradations caused by moving from v1 to v3 in many public repos, e.g. try to compile Tracee (https://github.com/aquasecurity/tracee):

```
$ cd tracee/pkg/ebpf/c
$ clang-20 -D__TARGET_ARCH_x86 -I. -I/usr/x86_64-linux-gnu/include -I/lib/modules/`uname -r`/build/include -target bpf -O2 -g -c ./tracee.bpf.c -mcpu=v1 -o /tmp/test-v1.o
$ clang-20 -D__TARGET_ARCH_x86 -I. -I/usr/x86_64-linux-gnu/include -I/lib/modules/`uname -r`/build/include -target bpf -O2 -g -c ./tracee.bpf.c -mcpu=v3 -o /tmp/test-v3.o
$ sudo /root/veristat /tmp/test-v1.o
Processing 'test-v1.o'...
File       Program                               Verdict  Duration (us)   Insns  States  Program size  Jited size
---------  ------------------------------------  -------  -------------  ------  ------  ------------  ----------
test-v1.o  cgroup_bpf_run_filter_skb             success           2016    5446     368          1396        6388
test-v1.o  cgroup_mkdir_signal                   success            124     281      17           199         933
test-v1.o  cgroup_rmdir_signal                   success            116     276      17           194         915
test-v1.o  cgroup_skb_egress                     success           3642    6425     251          3616       19650
test-v1.o  cgroup_skb_ingress                    success           5026    6425     251          3616       19650
test-v1.o  empty_kprobe                          success             24       2       0             2          23
test-v1.o  kernel_write_magic_enter              success             56      56       3            55         255
test-v1.o  kernel_write_magic_return             success          12523   29526    1020          4573       22096
test-v1.o  lkm_seeker_kset_tail                  success         264372  437356   22484         14711       68504
test-v1.o  lkm_seeker_mod_tree_tail              success             92     111       9           206         671
test-v1.o  lkm_seeker_modtree_loop               success         168335  198161   12446         13947       65075
test-v1.o  lkm_seeker_new_mod_only_tail          success          80150  158441     159         13138       52462
test-v1.o  lkm_seeker_proc_tail                  failure          50660  189384    3693         13806           0
test-v1.o  process_execute_failed_tail           success           1606    5182     368          2572       12010
test-v1.o  sched_process_exec_event_submit_tail  success           2632    6923     471          2850       13651
test-v1.o  sched_process_exec_signal             success           3947   11860     796          5305       24294
test-v1.o  sched_process_exit_signal             success            280     378      23           280        1275
test-v1.o  sched_process_fork_signal             success            527    1383      75          1080        4563
test-v1.o  send_bin                              success           1348    6804     452          3464       18131
test-v1.o  send_bin_tp                           success           1137    6804     452          3464       18131
test-v1.o  sys_dup_exit_tail                     success           2497    2912     188          2345       11493
test-v1.o  sys_enter_init                        success           1619     610      43           574        2849
test-v1.o  sys_enter_submit                      success           8256    6961     484          3774       19483
test-v1.o  sys_exit_init                         success            745     507      34           477        2349
test-v1.o  sys_exit_submit                       success           2742    4081     276          2370       12202
test-v1.o  syscall__accept4                      success           1416    2314     140          1679        8480
test-v1.o  syscall__execve_enter                 failure           1496    2086     155          4033           0
test-v1.o  syscall__execve_exit                  failure            768    2086     155          4034           0
test-v1.o  syscall__execveat_enter               failure            755    2087     155          4088           0
test-v1.o  syscall__execveat_exit                failure            754    2087     155
```
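To repeat this comparison on another BPF project, the two compiles can be wrapped in a small function. This is only a sketch: `build_cpu_variants`, `CLANG`, and `BPF_CFLAGS` are hypothetical names, and the `-D`/`-I` flags Tracee needs would be passed through `BPF_CFLAGS`.

```shell
# Hypothetical helper: compile one BPF source twice, once per -mcpu level,
# producing <out>-v1.o and <out>-v3.o for side-by-side veristat runs.
CLANG=${CLANG:-clang-20}    # compiler to invoke; override as needed
BPF_CFLAGS=${BPF_CFLAGS:-}  # extra -D/-I flags for the project being built

build_cpu_variants() {
  local src=$1 out=$2 cpu
  for cpu in v1 v3; do
    # BPF_CFLAGS is deliberately unquoted so it splits into separate flags
    "$CLANG" -target bpf -O2 -g $BPF_CFLAGS -mcpu="$cpu" \
      -c "$src" -o "$out-$cpu.o" || return 1
  done
}
```

For Tracee, one would set `BPF_CFLAGS` to the `-D__TARGET_ARCH_x86 -I...` flags used above, run `build_cpu_variants ./tracee.bpf.c /tmp/test`, and then pass `/tmp/test-v1.o` and `/tmp/test-v3.o` to veristat. Recent veristat builds can also save results with `-o csv` and diff two such files with `-C`; check `veristat --help` on your build.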
[clang] [llvm] [BPF] Make -mcpu=v3 as the default (PR #107008)
yuvald-sweet-security wrote: @4ast perhaps you know who in LLVM can help with adding a "Changes to the BPF Backend" section to the release notes, similar to how other backends have their own sections?
[clang] [llvm] [BPF] Make -mcpu=v3 as the default (PR #107008)
yuvald-sweet-security wrote:

> @yuvald-sweet-security Could you share which kernel you used for the above
> testing?

I ran this on my host machine, which is Windows with WSL (kernel `6.6.75.1-microsoft-standard-WSL2`). However, I see similar regressions on pretty much every test VM I have; for example, here's Ubuntu 22.04 with kernel `6.8.0-1021-azure`:

```
$ sudo ~/veristat /tmp/test-v1.o --filter=trace_ret_vfs_writev_tail
Processing 'test-v1.o'...
File       Program                    Verdict  Duration (us)  Insns  States  Peak states
---------  -------------------------  -------  -------------  -----  ------  -----------
test-v1.o  trace_ret_vfs_writev_tail  success          23429  25015    1859         1728
---------  -------------------------  -------  -------------  -----  ------  -----------
Done. Processed 1 files, 0 programs. Skipped 1 files, 158 programs.
$ sudo ~/veristat /tmp/test-v3.o --filter=trace_ret_vfs_writev_tail
Processing 'test-v3.o'...
File       Program                    Verdict  Duration (us)  Insns  States  Peak states
---------  -------------------------  -------  -------------  -----  ------  -----------
test-v3.o  trace_ret_vfs_writev_tail  failure          98260  80350    6184         2738
---------  -------------------------  -------  -------------  -----  ------  -----------
Done. Processed 1 files, 0 programs. Skipped 1 files, 158 programs.
```
[clang] [llvm] [BPF] Make -mcpu=v3 as the default (PR #107008)
yuvald-sweet-security wrote: @yonghong-song thank you for taking the time to help with this issue and for your suggestions; it's greatly appreciated. I am also glad to hear that asm barriers are no longer necessary, as they caused me quite some trouble in the past. However, I've run into a few issues while trying to apply this as a fix for the regression from moving to `-mcpu=v3`:

* While you are correct that removing the barriers fixes the verifier failure in `trace_ret_vfs_writev_tail`, the original issue I pointed out (the increased number of kernel verifier states and instructions when using v3) still remains:

  ```
  $ sudo /root/veristat /tmp/test-v1.o --filter=trace_ret_vfs_writev_tail
  Processing 'test-v1.o'...
  File       Program                    Verdict  Duration (us)  Insns  States  Program size  Jited size
  ---------  -------------------------  -------  -------------  -----  ------  ------------  ----------
  test-v1.o  trace_ret_vfs_writev_tail  success          23158  25292    1874          7163       38486
  ---------  -------------------------  -------  -------------  -----  ------  ------------  ----------
  Done. Processed 1 files, 0 programs. Skipped 1 files, 158 programs.
  $ sudo /root/veristat /tmp/test-v3-nobarrier.o --filter=trace_ret_vfs_writev_tail
  Processing 'test-v3-nobarrier.o'...
  File                 Program                    Verdict  Duration (us)   Insns  States  Program size  Jited size
  -------------------  -------------------------  -------  -------------  ------  ------  ------------  ----------
  test-v3-nobarrier.o  trace_ret_vfs_writev_tail  success          69161  104971    7618          6999       37490
  -------------------  -------------------------  -------  -------------  ------  ------  ------------  ----------
  Done. Processed 1 files, 0 programs. Skipped 1 files, 158 programs.
  ```

  As you can see, the v3 codegen makes the verifier process about 4 times as many instructions (104971 vs. 25292) and states (7618 vs. 1874). That is not a big issue for this particular function, but it can push larger functions that are already close to the verifier's 1 million instruction limit over it.

* Many verifier failures remain after removing the barriers. For example, `vfs_writev_magic_return` still fails even after I removed all `asm volatile`/barrier vars (you can use [patch.txt](https://github.com/user-attachments/files/19407900/patch.txt), which removes all of them). Also, in some of my own code, removing the barrier vars with v3 codegen actually introduces more verifier failures (compared to v3 with barrier vars); I'll see if I can make a minimal example of this.

Thank you again for your thoughtful advice and support.
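The "about 4 times" figure, and what it would mean near the 1 million instruction limit, can be checked directly from the two `trace_ret_vfs_writev_tail` rows in the first bullet. Below is a minimal awk-based sketch. `ratio` is a hypothetical helper, and the idea that a larger program would grow by the same factor is only an illustrative assumption.

```shell
# ratio NEW OLD (hypothetical helper): print NEW/OLD rounded to two decimals.
ratio() {
  awk -v n="$1" -v o="$2" 'BEGIN { printf "%.2f\n", n / o }'
}

ratio 104971 25292   # processed insns: v3 without barriers vs v1
ratio 7618 1874      # verifier states: same pair of builds

# Largest v1 instruction count that would still stay under the 1M limit if it
# grew by the same factor under v3 (illustrative assumption, truncated to int).
awk 'BEGIN { printf "%d\n", 1000000 * 25292 / 104971 }'
```

With these numbers the script prints 4.15, 4.07 and 240942. Under that assumption, a program that already needs roughly 241k processed instructions with v1 would hit the limit with v3.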
[clang] [llvm] [BPF] Make -mcpu=v3 as the default (PR #107008)
yuvald-sweet-security wrote:

> > Hey,
> > I didn't see this change on the [LLVM 20.1.0 Release Notes](https://releases.llvm.org/20.1.0/docs/ReleaseNotes.html) - it would be nice if you could add those in the future as a heads up.
> > Anyway, I got here after tracing a regression in eBPF codegen introduced in Clang 20 which results in dramatically larger kernel verifier states (and eventually causes the verifier to give up with sufficiently large functions). The culprit does indeed seem to be BPF v3 - my guess is that branching according to 32-bit registers provides less refutability for the verifier which ends up carrying more possible states as a result, but I didn't dive too deep into this.
> > FYI
>
> The llvm patch link to add release note in llvm20: #131691

Thank you very much :)