nvptx: Don't use PTX '.const', constant state space [PR119573] (was: 'TREE_READONLY' for 'const' array in C vs. C++)

2025-04-03 Thread Thomas Schwinge
Hi!

I have, by the way, filed 
"nvptx: PTX '.const', constant state space" for this topic.

On 2025-04-01T09:32:46+0200, Jakub Jelinek  wrote:
> On Tue, Apr 01, 2025 at 09:19:08AM +0200, Richard Biener via Gcc wrote:
>> On Tue, Apr 1, 2025 at 12:04 AM Thomas Schwinge  
>> wrote:
>> > In Nvidia PTX, "A state space is a storage area with particular
>> > characteristics.  All variables reside in some state space.  [...]".
>> > These include:
>> >
>> > .const  Shared, read-only memory.
>> > .global  Global memory, shared by all threads.
>> >
>> > Implemented via 'TARGET_ENCODE_SECTION_INFO', GCC/nvptx then uses
>> > special-cased instructions for accessing the respective memory regions.
>> >
>> > Now, given a 'const' array (with whatever element type; not interesting
>> > here), like:
>> >
>> > extern int * const arr[];
>> >
>> > ..., for GCC/C compilation, we access this as '.const' memory: GCC/nvptx
>> > 'DATA_AREA_CONST', but for GCC/C++ compilation, we access it as
>> > 'DATA_AREA_GLOBAL', and then fault at run time due to mismatch with the
>> > definition, which actually is '.const' for both C and C++ compilation.
>> >
>> > The difference is, when we get to 'TARGET_ENCODE_SECTION_INFO', that for
>> > C we've got 'TREE_READONLY(decl)', but for C++ we don't.
>> 
>> In C++ there could be runtime initializers for a const qualified
>> object.

Of course, C++...  ;-)

>> I think all
>> you need to do is make sure the logic that places the object in .const
>> vs. .global
>> is consistent with the logic deciding how to access it.

Right, that was the goal, essentially.  However, it's indeed not possible,
because of...

> If it is extern, then for C++ you really don't know if it will be
> TREE_READONLY or not, that depends on whether it has runtime initialization
> or not and only the TU which defines it knows that.

... this.
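
For illustration (a made-up example of mine, not taken from the PR): even a
namespace-scope 'const' object may need dynamic initialization in C++, so a
TU that only sees the 'extern' declaration cannot assume the definition ends
up as read-only data:

    // Defining TU:
    extern int get_default ();                    // value known only at run time
    extern const int config[2];
    const int config[2] = { get_default (), 0 };  // dynamic initialization
    // A TU that only sees the 'extern' declaration thus must not assume
    // 'TREE_READONLY', that is, PTX '.const' placement.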

> And deriving something from TYPE_READONLY of the ARRAY_TYPE or even
> from strip_array_types of the type does not work, that reflects whether it
> is const or not, I think it will still be set even if it has mutable members
> and doesn't say anything about runtime initialization.

A 'make check' clearly agreed that this is not the right thing to do.
(Not sure how/if the GCC/AVR back end gets away with doing it this way?)

Pushed to trunk branch commit 5deeae29dab2af64e3342daf7a3e424c64ea
"nvptx: Don't use PTX '.const', constant state space [PR119573]", see
attached.  This is also in line with the remark by Alexander, that using
PTX '.const' for arbitrary read-only data is questionable.

In addition to the individual test case progressions listed in the Git
commit log, here are the complete lists of progressed libstdc++ test cases,
which, as indicated in the commit log, I didn't want to include there in
detail.

| We progress:
| 
| error   : Memory space doesn't match for '_ZNSt3tr18__detail12__prime_listE' in 'input file 3 at offset 53085', first specified in 'input file 1 at offset 1924'
| nvptx-run: cuLinkAddData failed: device kernel image is invalid (CUDA_ERROR_INVALID_SOURCE, 300)
| 
| ... into execution test PASS for a few dozens of libstdc++ test cases.

These are:

PASS: tr1/6_containers/unordered_map/24064.cc  -std=gnu++17 (test for excess errors)
[-FAIL:-]{+PASS:+} tr1/6_containers/unordered_map/24064.cc  -std=gnu++17 execution test

PASS: tr1/6_containers/unordered_map/capacity/29134-map.cc  -std=gnu++17 (test for excess errors)
[-FAIL:-]{+PASS:+} tr1/6_containers/unordered_map/capacity/29134-map.cc  -std=gnu++17 execution test

PASS: tr1/6_containers/unordered_map/erase/1.cc  -std=gnu++17 (test for excess errors)
[-FAIL:-]{+PASS:+} tr1/6_containers/unordered_map/erase/1.cc  -std=gnu++17 execution test

PASS: tr1/6_containers/unordered_map/erase/24061-map.cc  -std=gnu++17 (test for excess errors)
[-FAIL:-]{+PASS:+} tr1/6_containers/unordered_map/erase/24061-map.cc  -std=gnu++17 execution test

PASS: tr1/6_containers/unordered_map/find/map1.cc  -std=gnu++17 (test for excess errors)
[-FAIL:-]{+PASS:+} tr1/6_containers/unordered_map/find/map1.cc  -std=gnu++17 execution test

PASS: tr1/6_containers/unordered_map/insert/24061-map.cc  -std=gnu++17 (test for excess errors)
[-FAIL:-]{+PASS:+} tr1/6_containers/unordered_map/insert/24061-map.cc  -std=gnu++17 execution test

PASS: tr1/6_containers/unordered_map/insert/array_syntax.cc  -std=gnu++17 (test for excess errors)
[-FAIL:-]{+PASS:+} tr1/6_containers/unordered_map/insert/array_syntax.cc  -std=gnu++17 execution test

PASS: tr1/6_containers/unordered_map/insert/map_range.cc  -std=gnu++17 (test for excess errors)
[-FAIL:-]{+PASS:+} tr1/6_containers/unordered_map/insert/map_range.cc  -std=gnu++17 execution test

PASS: tr1/6_containers/unordered_map/insert/map_single.cc  -std=gnu++17 (test for excess errors)
[-FAIL:-]{+PASS:+} tr1/6_containers/unordered_map/insert/map_single.cc  -st

RE: COBOL: Call to builtin_decl_explicit (BUILT_IN_EXIT), is optimized away.

2025-04-03 Thread Robert Dubner
I stated that poorly.  After I generate the GENERIC, and I hand the tree
over to the middle end, it is the call to BUILT_IN_EXIT that seems to be
disappearing.

Everything I describe here is occurring with a -O0 build of GCC and
GCOBOL.

> -Original Message-
> From: Robert Dubner 
> Sent: Thursday, April 3, 2025 18:16
> To: 'GCC Mailing List' 
> Cc: Robert Dubner 
> Subject: COBOL: Call to builtin_decl_explicit (BUILT_IN_EXIT), is
> optimized away.
> 
> The COBOL compiler has this routine:
> 
> void
> gg_exit(tree exit_code)
>   {
>   tree the_call =
>   build_call_expr_loc(location_from_lineno(),
>   builtin_decl_explicit (BUILT_IN_EXIT),
>   1,
>   exit_code);
>   gg_append_statement(the_call);
>   }
> 
> I have found that when GCOBOL is used with -O2, -O3, or -Os, the call to
> gg_exit() is optimized away, and the intended exit value is lost, and I
> end up with zero.
> 
> By changing the routine to
> 
> void
> gg_exit(tree exit_code)
>   {
>   tree args[1] = {exit_code};
>   tree function = gg_get_function_address(INT, "exit");
>   tree the_call = build_call_array_loc (location_from_lineno(),
> VOID,
> function,
> 1,
> args);
>   gg_append_statement(the_call);
>   }
> 
> the call is not optimized away, and the generated executable behaves as
> expected.
> 
> How do I prevent the call to gg_exit() from being optimized away?
> 
> Thanks!



RE: COBOL: Call to builtin_decl_explicit (BUILT_IN_EXIT), is optimized away.

2025-04-03 Thread Robert Dubner
I will try again.  First, the explicit call to "exit" is also being
optimized away.

Second, grepping indicates I am the only one in GCC trying to use
BUILT_IN_EXIT.

So, let me rephrase:

What GENERIC do I have to create that will do the equivalent of calling
exit() in a C program?

Thanks.

> -Original Message-
> From: Gcc  On Behalf Of Robert Dubner
> Sent: Thursday, April 3, 2025 18:19
> To: GCC Mailing List 
> Subject: RE: COBOL: Call to builtin_decl_explicit (BUILT_IN_EXIT), is
> optimized away.
> 
> I stated that poorly.  After I generate the GENERIC, and I hand the tree
> over to the middle end, it is the call to BUILT_IN_EXIT that seems to be
> disappearing.
> 
> Everything I describe here is occurring with a -O0 build of GCC and
> GCOBOL.
> 
> > -Original Message-
> > From: Robert Dubner 
> > Sent: Thursday, April 3, 2025 18:16
> > To: 'GCC Mailing List' 
> > Cc: Robert Dubner 
> > Subject: COBOL: Call to builtin_decl_explicit (BUILT_IN_EXIT), is
> > optimized away.
> >
> > The COBOL compiler has this routine:
> >
> > void
> > gg_exit(tree exit_code)
> >   {
> >   tree the_call =
> >   build_call_expr_loc(location_from_lineno(),
> >   builtin_decl_explicit (BUILT_IN_EXIT),
> >   1,
> >   exit_code);
> >   gg_append_statement(the_call);
> >   }
> >
> > I have found that when GCOBOL is used with -O2, -O3, or -Os, the call to
> > gg_exit() is optimized away, and the intended exit value is lost, and I
> > end up with zero.
> >
> > By changing the routine to
> >
> > void
> > gg_exit(tree exit_code)
> >   {
> >   tree args[1] = {exit_code};
> >   tree function = gg_get_function_address(INT, "exit");
> >   tree the_call = build_call_array_loc (location_from_lineno(),
> > VOID,
> > function,
> > 1,
> > args);
> >   gg_append_statement(the_call);
> >   }
> >
> > the call is not optimized away, and the generated executable behaves as
> > expected.
> >
> > How do I prevent the call to gg_exit() from being optimized away?
> >
> > Thanks!



COBOL: A call to builtin_decl_explicit (BUILT_IN_EXIT) is being optimized away

2025-04-03 Thread Robert Dubner
 


Re: COBOL: Call to builtin_decl_explicit (BUILT_IN_EXIT), is optimized away.

2025-04-03 Thread Jose E. Marchesi via Gcc


Nah, I see ECF_TM_PURE, despite the name, seems to be doing something very
different from ECF_CONST and ECF_PURE:

  if (flags & ECF_PURE)
    DECL_PURE_P (decl) = 1;
  ...
  if ((flags & ECF_TM_PURE) && flag_tm)
    apply_tm_attr (decl, get_identifier ("transaction_pure"));

Still, you may try to use the same ECF attributes as gcc/builtins.def.

> Perhaps it is because you are using ECF_TM_PURE when defining the
> built-in in cobol1.cc:
>
>   #define ATTR_TMPURE_NORETURN_NOTHROW_LEAF_COLD_LIST 
> (ECF_TM_PURE|ECF_NORETURN|ECF_NOTHROW|ECF_LEAF|ECF_COLD)
>
>   [...]
>   gfc_define_builtin ("__builtin_exit", ftype, BUILT_IN_EXIT,
>   "exit", ATTR_TMPURE_NORETURN_NOTHROW_LEAF_COLD_LIST);
>
> In gcc/builtins.def:
>
>   DEF_LIB_BUILTIN(BUILT_IN_EXIT, "exit", BT_FN_VOID_INT, 
> ATTR_NORETURN_NOTHROW_LIST)
>
> So you want ECF_NORETURN and ECF_NOTHROW.
>
>> I stated that poorly.  After I generate the GENERIC, and I hand the tree
>> over to the middle end, it is the call to BUILT_IN_EXIT that seems to be
>> disappearing.
>>
>> Everything I describe here is occurring with a -O0 build of GCC and
>> GCOBOL.
>>
>>> -Original Message-
>>> From: Robert Dubner 
>>> Sent: Thursday, April 3, 2025 18:16
>>> To: 'GCC Mailing List' 
>>> Cc: Robert Dubner 
>>> Subject: COBOL: Call to builtin_decl_explicit (BUILT_IN_EXIT), is
>>> optimized away.
>>> 
>>> The COBOL compiler has this routine:
>>> 
>>> void
>>> gg_exit(tree exit_code)
>>>   {
>>>   tree the_call =
>>>   build_call_expr_loc(location_from_lineno(),
>>>   builtin_decl_explicit (BUILT_IN_EXIT),
>>>   1,
>>>   exit_code);
>>>   gg_append_statement(the_call);
>>>   }
>>> 
>>> I have found that when GCOBOL is used with -O2, -O3, or -Os, the call to
>>> gg_exit() is optimized away, and the intended exit value is lost, and I
>>> end up with zero.
>>> 
>>> By changing the routine to
>>> 
>>> void
>>> gg_exit(tree exit_code)
>>>   {
>>>   tree args[1] = {exit_code};
>>>   tree function = gg_get_function_address(INT, "exit");
>>>   tree the_call = build_call_array_loc (location_from_lineno(),
>>> VOID,
>>> function,
>>> 1,
>>> args);
>>>   gg_append_statement(the_call);
>>>   }
>>> 
>>> the call is not optimized away, and the generated executable behaves as
>>> expected.
>>> 
>>> How do I prevent the call to gg_exit() from being optimized away?
>>> 
>>> Thanks!


Re: COBOL: Call to builtin_decl_explicit (BUILT_IN_EXIT), is optimized away.

2025-04-03 Thread Jose E. Marchesi via Gcc


Perhaps it is because you are using ECF_TM_PURE when defining the
built-in in cobol1.cc:

  #define ATTR_TMPURE_NORETURN_NOTHROW_LEAF_COLD_LIST (ECF_TM_PURE|ECF_NORETURN|ECF_NOTHROW|ECF_LEAF|ECF_COLD)

  [...]
  gfc_define_builtin ("__builtin_exit", ftype, BUILT_IN_EXIT,
  "exit", ATTR_TMPURE_NORETURN_NOTHROW_LEAF_COLD_LIST);

In gcc/builtins.def:

  DEF_LIB_BUILTIN(BUILT_IN_EXIT, "exit", BT_FN_VOID_INT, ATTR_NORETURN_NOTHROW_LIST)

So you want ECF_NORETURN and ECF_NOTHROW.
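
Something like this minimal sketch, adapting your quoted cobol1.cc lines
(untested on my side; 'ftype' and 'gfc_define_builtin' as in your existing
code):

  /* Mirror what gcc/builtins.def uses for exit.  */
  #define ATTR_NORETURN_NOTHROW_LIST (ECF_NORETURN|ECF_NOTHROW)

  gfc_define_builtin ("__builtin_exit", ftype, BUILT_IN_EXIT,
                      "exit", ATTR_NORETURN_NOTHROW_LIST);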

> I stated that poorly.  After I generate the GENERIC, and I hand the tree
> over to the middle end, it is the call to BUILT_IN_EXIT that seems to be
> disappearing.
>
> Everything I describe here is occurring with a -O0 build of GCC and
> GCOBOL.
>
>> -Original Message-
>> From: Robert Dubner 
>> Sent: Thursday, April 3, 2025 18:16
>> To: 'GCC Mailing List' 
>> Cc: Robert Dubner 
>> Subject: COBOL: Call to builtin_decl_explicit (BUILT_IN_EXIT), is
>> optimized away.
>> 
>> The COBOL compiler has this routine:
>> 
>> void
>> gg_exit(tree exit_code)
>>   {
>>   tree the_call =
>>   build_call_expr_loc(location_from_lineno(),
>>   builtin_decl_explicit (BUILT_IN_EXIT),
>>   1,
>>   exit_code);
>>   gg_append_statement(the_call);
>>   }
>> 
>> I have found that when GCOBOL is used with -O2, -O3, or -Os, the call to
>> gg_exit() is optimized away, and the intended exit value is lost, and I
>> end up with zero.
>> 
>> By changing the routine to
>> 
>> void
>> gg_exit(tree exit_code)
>>   {
>>   tree args[1] = {exit_code};
>>   tree function = gg_get_function_address(INT, "exit");
>>   tree the_call = build_call_array_loc (location_from_lineno(),
>> VOID,
>> function,
>> 1,
>> args);
>>   gg_append_statement(the_call);
>>   }
>> 
>> the call is not optimized away, and the generated executable behaves as
>> expected.
>> 
>> How do I prevent the call to gg_exit() from being optimized away?
>> 
>> Thanks!


RE: COBOL: Call to builtin_decl_explicit (BUILT_IN_EXIT), is optimized away.

2025-04-03 Thread Robert Dubner
Jose,

I appreciate the attempt.  But using the same 

#define ATTR_NORETURN_NOTHROW_LIST (ECF_NORETURN|ECF_NOTHROW)

as found in gcc/builtins.def makes no difference.

Thanks, though.

Bob D.

> -Original Message-
> From: Jose E. Marchesi 
> Sent: Thursday, April 3, 2025 19:27
> To: Robert Dubner 
> Cc: GCC Mailing List 
> Subject: Re: COBOL: Call to builtin_decl_explicit (BUILT_IN_EXIT), is
> optimized away.
> 
> 
> Nah, I see ECF_TM_PURE despite the name seems to be doing something very
> different than ECF_CONST and ECF_PURE:
> 
>   if (flags & ECF_PURE)
> DECL_PURE_P (decl) = 1;
>   ...
>   if ((flags & ECF_TM_PURE) && flag_tm)
> apply_tm_attr (decl, get_identifier ("transaction_pure"));
> 
> Still, you may try to use the same ECF attributes as gcc/builtins.def.
> 
> > Perhaps it is because you are using ECF_TM_PURE when defining the
> > built-in in cobol1.cc:
> >
> > #define ATTR_TMPURE_NORETURN_NOTHROW_LEAF_COLD_LIST (ECF_TM_PURE|ECF_NORETURN|ECF_NOTHROW|ECF_LEAF|ECF_COLD)
> >
> >   [...]
> >   gfc_define_builtin ("__builtin_exit", ftype, BUILT_IN_EXIT,
> >   "exit", ATTR_TMPURE_NORETURN_NOTHROW_LEAF_COLD_LIST);
> >
> > In gcc/builtins.def:
> >
> >   DEF_LIB_BUILTIN(BUILT_IN_EXIT, "exit", BT_FN_VOID_INT, ATTR_NORETURN_NOTHROW_LIST)
> >
> > So you want ECF_NORETURN and ECF_NOTHROW.
> >
> >> I stated that poorly.  After I generate the GENERIC, and I hand the tree
> >> over to the middle end, it is the call to BUILT_IN_EXIT that seems to be
> >> disappearing.
> >>
> >> Everything I describe here is occurring with a -O0 build of GCC and
> >> GCOBOL.
> >>
> >>> -Original Message-
> >>> From: Robert Dubner 
> >>> Sent: Thursday, April 3, 2025 18:16
> >>> To: 'GCC Mailing List' 
> >>> Cc: Robert Dubner 
> >>> Subject: COBOL: Call to builtin_decl_explicit (BUILT_IN_EXIT), is
> >>> optimized away.
> >>>
> >>> The COBOL compiler has this routine:
> >>>
> >>> void
> >>> gg_exit(tree exit_code)
> >>>   {
> >>>   tree the_call =
> >>>   build_call_expr_loc(location_from_lineno(),
> >>>   builtin_decl_explicit (BUILT_IN_EXIT),
> >>>   1,
> >>>   exit_code);
> >>>   gg_append_statement(the_call);
> >>>   }
> >>>
> >>> I have found that when GCOBOL is used with -O2, -O3, or -Os, the call to
> >>> gg_exit() is optimized away, and the intended exit value is lost, and I
> >>> end up with zero.
> >>>
> >>> By changing the routine to
> >>>
> >>> void
> >>> gg_exit(tree exit_code)
> >>>   {
> >>>   tree args[1] = {exit_code};
> >>>   tree function = gg_get_function_address(INT, "exit");
> >>>   tree the_call = build_call_array_loc (location_from_lineno(),
> >>> VOID,
> >>> function,
> >>> 1,
> >>> args);
> >>>   gg_append_statement(the_call);
> >>>   }
> >>>
> >>> the call is not optimized away, and the generated executable behaves as
> >>> expected.
> >>>
> >>> How do I prevent the call to gg_exit() from being optimized away?
> >>>
> >>> Thanks!


Re: Memory access in GIMPLE

2025-04-03 Thread Krister Walfridsson via Gcc

On Thu, 3 Apr 2025, Richard Biener wrote:


On Thu, Apr 3, 2025 at 2:23 AM Krister Walfridsson via Gcc wrote:


I have more questions about GIMPLE memory semantics for smtgcc.

As before, each section starts with a description of the semantics I've
implemented (or plan to implement), followed by concrete questions if
relevant. Let me know if the described semantics are incorrect or
incomplete.


Accessing memory

Memory access in GIMPLE is done using GIMPLE_ASSIGN statements where the
lhs and/or rhs is a memory reference expression (such as MEM_REF). When
both lhs and rhs access memory, one of the following must hold --
otherwise the access is UB:
  1. There is no overlap between lhs and rhs
  2. lhs and rhs represent the same address

A memory access is also UB in the following cases:
  * Any accessed byte is outside valid memory
  * The pointer violates the alignment requirements
  * The pointer provenance doesn't match the object
  * The type is incorrect from a TBAA perspective
  * It's a store to constant memory


correct.

Note that GIMPLE_CALL and GIMPLE_ASM can also access memory.


smtgcc requires -fno-strict-aliasing for now, so I'll ignore TBAA in this
mail. Provenance has its own issues, which I'll come back to in a separate
mail.


Checking memory access is within bounds
---------------------------------------
A memory access may be represented by a chain of memory reference
expressions such as MEM_REF, ARRAY_REF, COMPONENT_REF, etc. For example,
accessing a structure:

   struct s {
     int x, y;
   };

as:

   int foo (struct s * p)
   {
     int _3;

     <bb 2> :
     _3 = p_1(D)->x;
     return _3;
   }

involves a MEM_REF for the whole object and a COMPONENT_REF to select the
field. Conceptually, we load the entire structure and then pick out the
element -- so all bytes of the structure must be in valid memory.

We could also do the access as:

   int foo (struct s * p)
   {
     int * q;
     int _3;

     <bb 2> :
     q_2 = &p_1(D)->x;
     _3 = *q_2;
     return _3;
   }

This calculates the address of the element, and then reads it as an
integer, so only the four bytes of x must be in valid memory.

In other words, the compiler is not allowed to optimize:
   q_2 = &p_1(D)->x;
   _3 = *q_2;
to
   _3 = p_1(D)->x;


Correct.  The reason that p_1(D)->x is considered accessing the whole
object is because of TBAA, so with -fno-strict-aliasing there is no UB
when the whole object isn't accessible (but the subsetted piece is).


I'm still a bit confused... Assume we're using -fno-strict-aliasing and p 
points to a 4-byte buffer in the example above. Then:

  _3 = p_1(D)->x;
is valid. So that means my paragraph starting with "In other words..." 
must be incorrect? That is, the compiler is allowed to optimize:

  q_2 = &p_1(D)->x;
  _3 = *q_2;
to
  _3 = p_1(D)->x;
since it's equally valid. But this optimization would be invalid for 
-fstrict-aliasing?


And a memory access like:
  _4 = p_1(D)->y;
would be invalid (for both -fno-strict-aliasing and -fstrict-aliasing)? 
That is, the subsetted piece must be accessible?
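
For concreteness, this is the kind of C-level situation I have in mind (a
hypothetical example of mine; whether it is valid is exactly the kind of
question I'm asking at the GIMPLE level):

  #include <stdlib.h>

  struct s { int x, y; };

  static int
  foo (struct s *p)
  {
    return p->x;                              /* reads only the four bytes of x */
  }

  int
  main (void)
  {
    int *buf = (int *) malloc (sizeof (int)); /* 4-byte buffer: only x fits */
    *buf = 42;
    return foo ((struct s *) buf) != 42;      /* p->y would be out of bounds */
  }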


   /Krister


Re: [GSoC] Tooling for running BPF GCC tests on a live kernel

2025-04-03 Thread Piyush Raj via Gcc
On Tue, Apr 1, 2025 at 8:07 PM Jose E. Marchesi
 wrote:
>
> Hello Piyush.
Hello Jose,

> Sounds like a quite good background.
Thank you!

> Have you built GCC from sources?
Yes, I have. I built GCC while working on LFS and recently rebuilt it,
running the test suite while going through the "Before you apply"
section on the GSoC page.
> > I found "Tooling for Running BPF GCC Tests on a Live Kernel"
> > interesting and intend to apply for it in GSoC 2025.
>
> Thank you for your interest in working in this project in particular.
> It is an important task, and would greatly help the maintenance of the
> BPF port in the compiler.
>
> The BPF verifier is very picky and rejects a great deal of compiled
> valid C code.  This basically means that any legit change in the
> compiler, such as a new optimization or even a bug fix, can potentially
> make it to generate non-verifiable code.  Thus having a way to detect
> such regressions is very important.
>
> > From my research
> > on the current BPF development landscape:
> > - Only the kernel has tests against a live kernel [6], while Clang and
> > GCC perform compile-time tests.
>
> Correct.
>
> > - BPF CI [7] mainly tests kernel patches for regressions against the
> > latest GCC and LLVM snapshots, so it does not help much in catching
> > regressions in GCC itself.
>
> Also correct.  As explained above, compiler changes can make the
> verifier unhappy.  But so can new selftests added to the kernel that
> produce unverifiable code.
>
> > - Another project, little-vm-helper [8] by Cilium, also uses a
> > VM-based approach for BPF testing.
>
> I wasn't aware of that project.  It may be a good reference, or maybe
> not, depending.
>
> > If there are other BPF testing projects that I should look into,
> > please let me know.
>
> At the past LSFMM conference someone suggested us to look at virtme-ng.
Thanks! I’ll check it out.

> Daniel Xu, a BPF kernel developer (who I added in CC) also recommended
> we look at https://github.com/danobi/vmtest, which he wrote.

Yes, I came across Daniel’s work during my research [1][2]. I was
planning to use vmtest for running the VM like in BPF CI.

> > I have a few questions and would appreciate any guidance from the
> > community
> >
> > 1. Are there specific BPF features or functionalities that cannot be
> > effectively tested in a VM environment due to virtualization?
>
> I don't think so.
>
> > 2. Should the new BPF test suite focus only on execution tests for
> > programs, or can we also include kernel selftests?
>
> I would focus on compiler oriented tests, exercising source
> constructions that have proven to generate unverifiable code.  In
> contrast the kernel selftests focus on exercising all the supported BPF
> features in the kernel.

> The main goal of the testing shall be detecting verification
> regressions.

Understood! Are there any existing collections of such source files
that have caused verification failures? Are they tracked in Bugzilla
beyond these issues [3]?

> > 3. The GDB simulator currently lacks full support for BPF features.
> > How do GCC developers typically debug the BPF programs generated by
> > GCC?
>
> We just look at the generated code when there are verification failures
> and try to determine, from the kernel verifier's output, what led to
> generating such code.
>
> > Would attaching a GDB debugger to the virtual machine[9] serve as an
> > effective debugging strategy?
>
> I don't think that is currently a possibility.  AFAIK you can connect
> GDB to a running kernel in say qemu, and debug the kernel code.  But I
> dont think you can debug BPF-jitted code.  It is an intriguing
> possibility, but I think out of scope of this particular GSOC project.
> > Additionally, would automating this process provide significant value
> > to developers?
>
> I don't understand.  Automating what exactly?

I meant automating the process of opening a debugger with breakpoints
when the kernel starts executing eBPF code. But as you mentioned
earlier, this may not be feasible or would require more work than fits
within the current GSoC scope.

I also found some ongoing work related to BPF debugging in the kernel [4].

Thank you,
Piyush

[1] http://oldvger.kernel.org/bpfconf2024_material/BPF-CI.pdf
[2] https://www.youtube.com/watch?v=NT-325hgXjY
[3] 
https://gcc.gnu.org/bugzilla/buglist.cgi?bug_status=UNCONFIRMED&bug_status=NEW&bug_status=ASSIGNED&bug_status=SUSPENDED&bug_status=WAITING&bug_status=REOPENED&cf_known_to_fail_type=allwords&cf_known_to_work_type=allwords&order=Importance&query_format=advanced&short_desc=bpf&short_desc_type=allwordssubstr
[4] 
https://lore.kernel.org/lkml/am6pr03mb50804a5bf211e94a5df8f66699...@am6pr03mb5080.eurprd03.prod.outlook.com/


Re: [GSoC] Tooling for running BPF GCC tests on a live kernel

2025-04-03 Thread Jose E. Marchesi via Gcc


> On Tue, Apr 1, 2025 at 8:07 PM Jose E. Marchesi
>  wrote:
>>
>> Hello Piyush.
> Hello Jose,
>
>> Sounds like a quite good background.
> Thank you!
>
>> Have you built GCC from sources?
> Yes, I have. I built GCC while working on LFS and recently rebuilt it,
> running the test suite while going through the "Before you apply"
> section on the GSoC page.
>> > I found "Tooling for Running BPF GCC Tests on a Live Kernel"
>> > interesting and intend to apply for it in GSoC 2025.
>>
>> Thank you for your interest in working in this project in particular.
>> It is an important task, and would greatly help the maintenance of the
>> BPF port in the compiler.
>>
>> The BPF verifier is very picky and rejects a great deal of compiled
>> valid C code.  This basically means that any legit change in the
>> compiler, such as a new optimization or even a bug fix, can potentially
>> make it to generate non-verifiable code.  Thus having a way to detect
>> such regressions is very important.
>>
>> > From my research
>> > on the current BPF development landscape:
>> > - Only the kernel has tests against a live kernel [6], while Clang and
>> > GCC perform compile-time tests.
>>
>> Correct.
>>
>> > - BPF CI [7] mainly tests kernel patches for regressions against the
>> > latest GCC and LLVM snapshots, so it does not help much in catching
>> > regressions in GCC itself.
>>
>> Also correct.  As explained above, compiler changes can make the
>> verifier unhappy.  But so can new selftests added to the kernel that
>> produce unverifiable code.
>>
>> > - Another project, little-vm-helper [8] by Cilium, also uses a
>> > VM-based approach for BPF testing.
>>
>> I wasn't aware of that project.  It may be a good reference, or maybe
>> not, depending.
>>
>> > If there are other BPF testing projects that I should look into,
>> > please let me know.
>>
>> At the past LSFMM conference someone suggested us to look at virtme-ng.
> Thanks! I’ll check it out.
>
>> Daniel Xu, a BPF kernel developer (who I added in CC) also recommended
>> we look at https://github.com/danobi/vmtest, which he wrote.
>
> Yes, I came across Daniel’s work during my research [1][2]. I was
> planning to use vmtest for running the VM like in BPF CI.
>
>> > I have a few questions and would appreciate any guidance from the
>> > community
>> >
>> > 1. Are there specific BPF features or functionalities that cannot be
>> > effectively tested in a VM environment due to virtualization?
>>
>> I don't think so.
>>
>> > 2. Should the new BPF test suite focus only on execution tests for
>> > programs, or can we also include kernel selftests?
>>
>> I would focus on compiler oriented tests, exercising source
>> constructions that have proven to generate unverifiable code.  In
>> contrast the kernel selftests focus on exercising all the supported BPF
>> features in the kernel.
>
>> The main goal of the testing shall be detecting verification
>> regressions.
>
> Understood! Are there any existing collections of such source files
> that have caused verification failures? Are they tracked in Bugzilla
> beyond these issues [3]?

We get reports occasionally and normally we open bugzillas for them,
but not always.

We can compile a list of recent fixes for particular verification
failures so a set of initial tests can be written during the GSOC
project.

Going forward, once the run-time testsuite is in place, each fix shall
be accompanied with the corresponding new test(s).

>> > 3. The GDB simulator currently lacks full support for BPF features.
>> > How do GCC developers typically debug the BPF programs generated by
>> > GCC?
>>
>> We just look at the generated code when there are verification failures
>> and try to determine, from the kernel verifier's output, what led to
>> generating such code.
>>
>> > Would attaching a GDB debugger to the virtual machine[9] serve as an
>> > effective debugging strategy?
>>
>> I don't think that is currently a possibility.  AFAIK you can connect
>> GDB to a running kernel in say qemu, and debug the kernel code.  But I
>> dont think you can debug BPF-jitted code.  It is an intriguing
>> possibility, but I think out of scope of this particular GSOC project.
>> > Additionally, would automating this process provide significant value
>> > to developers?
>>
>> I don't understand.  Automating what exactly?
>
> I meant automating the process of opening a debugger with breakpoints
> when the kernel starts executing eBPF code. But as you mentioned
> earlier, this may not be feasible or would require more work than fits
> within the current GSoC scope.

BPF programs are hooked into the kernel via several mechanisms: kprobes,
etc.  There may be a way to locate these points using GDB.

However, note that the BPF code gets JITted into whatever the native
instructions of the host architecture are.  Having a debugger relating the
JITted native instructions to the corresponding source locations would
require the JITter to somehow translate the debug info as well, I
guess.