Hi!

Jakub, would you please provide guidance?


Elsewhere, I wrote:

|| I'm working on implementing (some) C++ standard library support for code
|| offloading in GCC, and ran into the following issue: per
|| 
<https://en.cppreference.com/w/cpp/language/storage_duration#Static_local_variables>,
|| "variables declared at block scope with the specifier 'static' [...] have
|| static [...] storage duration but are initialized the first time control
|| passes through their declaration".
||
|| To implement "initialized the first time [...]" in a multi-threaded
|| context, compilers typically use a guard variable and locking call to a
|| compiler-internal C++ support library function ('__cxa_guard_acquire').
|| (..., which in GCC, you may disable with '-fno-threadsafe-statics', for
|| that matter.)
||
|| In GCC, all this appears to work fine for multi-threaded host-side
|| (non-offladed) OpenMP 'parallel', for example.  However, I'm now curious
|| about the OpenMP 'target' offloading case; minimal example:
||
||     struct S
||     {
||       S() { }
||       ~S() { }
||     };
||
||     static void f()
||     {
||       // 
<https://en.cppreference.com/w/cpp/language/storage_duration#Static_local_variables>
||       static S s;
||     }
||
||     int main()
||     {
||     #pragma omp target
||       {
||         f();
||       }
||     }
||
|| (Everything other than 'main' is meant to be implicitly OpenMP
|| 'declare target'ed here.)

On 2023-11-20T19:13:23+0100, Jakub Jelinek <ja...@redhat.com> wrote:
> On Mon, Nov 20, 2023 at 06:43:47PM +0100, Thomas Schwinge wrote:
>> Current GCC fails:
>>
>>     error: variable ‘_ZGVZL1fvE1s’ has been referenced in offloaded code but 
>> hasn’t been marked to be included in the offloaded code
>>
>> ... with:
>>
>>     $ c++filt _ZGVZL1fvE1s
>>     guard variable for f()::s
>>
>> That may "simply" be a bug to fix in GCC.

The conclusion was: yes.

>> (Something like implicitly
>> creating respective guard variables on the device, I suppose.)
>
> Yeah, I believe we should in the omp_discover_* sub-pass handle with
> a help of a langhook automatically mark the guard variables (possibly
> iff the guarded variable is marked?),

Looking at 'gcc/omp-offload.cc:omp_discover_implicit_declare_target' left
me confused how that would be the code that marks up 'static' variables
as implicit 'omp declare target'.  Working through a simple POD example
(say, 's%static S s%static int i') it turns out, indeed that's not where
that is happending, but instead 'gcc/gimplify.cc:gimplify_bind_expr' is
the place:

    [...]
      for (t = BIND_EXPR_VARS (bind_expr); t ; t = DECL_CHAIN (t))
    [...]
                  /* Static locals inside of target construct or offloaded
                     routines need to be "omp declare target".  */
                  if (TREE_STATIC (t))
                    for (; ctx; ctx = ctx->outer_context)
                      if ((ctx->region_type & ORT_TARGET) != 0)
                        {
                          if (!lookup_attribute ("omp declare target",
                                                 DECL_ATTRIBUTES (t)))
                            {
                              tree id = get_identifier ("omp declare target");
                              DECL_ATTRIBUTES (t)
                                = tree_cons (id, NULL_TREE, DECL_ATTRIBUTES 
(t));
                              varpool_node *node = varpool_node::get (t);
                              if (node)
                                {
                                  node->offloadable = 1;
                                  if (ENABLE_OFFLOADING && !DECL_EXTERNAL (t))
                                    {
                                      g->have_offload = true;
                                      if (!in_lto_p)
                                        vec_safe_push (offload_vars, t);
                                    }
                                }
                            }
                          break;
    [...]

You (Jakub) added that in
commit 211b7533bff68e5dd72e7d75249f470101759d6d (Subversion r272322)
"Make static vars inside of target regions or declare target routines 
implicitly declare target to (PR middle-end/90779)".

Now, the problem why that existing code doesn't trigger for C++ guard
variables is that those are not in 'BIND_EXPR_VARS', due to C++ front end
use of 'pushdecl_top_level_and_finish'.  If I change the C++ front end as
follows (WIP; not reviewed in detail):

    --- gcc/cp/decl2.cc
    +++ gcc/cp/decl2.cc
    @@ -3576,5 +3576,6 @@ get_guard (tree decl)
           DECL_IGNORED_P (guard) = 1;
           TREE_USED (guard) = 1;
    -      pushdecl_top_level_and_finish (guard, NULL_TREE);
    +      pushdecl (guard);
    +      cp_finish_decl (guard, NULL_TREE, false, NULL_TREE, 0);
         }
       return guard;

..., then we do get the expected behavior:

    --- a-r.cc.006t.gimple        2023-12-07 13:27:36.254963406 +0100
    +++ a-r.cc.006t.gimple        2023-12-07 14:10:39.352107107 +0100
    @@ -5,6 +5,7 @@
       bool retval.1;
       bool D.2966;
       static struct S s1;
    +  static long long int _ZGVZL2f1vE2s1;

       gimple_call <__atomic_load_1, _1, &_ZGVZL2f1vE2s1, 2>
       gimple_assign <eq_expr, retval.0, _1, 0, NULL>

..., and offloading compilation works down to the expected next issue:

    ld: error: undefined symbol: __cxa_guard_acquire
    >>> referenced by /tmp/ccAVyZpc.o:(f1())
    [...]
    collect2: error: ld returned 1 exit status
    gcn mkoffload: fatal error: 
build-gcc/gcc/x86_64-pc-linux-gnu-accel-amdgcn-amdhsa-gcc returned 1 exit status
    [...]

However: 'pushdecl_top_level_and_finish' has been used there "forever",
and I currently have no clue at all whether changing that into 'pushdecl'
would be acceptable, what effects that'd have elsewhere.

That said...  Couldn't we indeed move this gimplification-level code re
'Static locals [...] need to be "omp declare target"' into
'gcc/omp-offload.cc:omp_discover_implicit_declare_target'?

First thought: 'gcc/omp-offload.cc:omp_discover_declare_target_tgt_fn_r'
extended so that for each 'VAR_DECL' that is 'TREE_STATIC', we mark it
'omp declare target'.  (That'll need some additional conditions, but you
get the idea.)  This way, we're not restricted to only 'static's in the
current bind/block, but would also catch top-level things like C++ guard
variables (without requiring any C++ front end changes).

I suppose I'd first exclude all 'DECL_ARTIFICIAL' ones, and we then may
gradually enable those, as we add test cases and handling as necessary:

> or e.g. rtti info (_ZTS*, _ZTI*)
> and eventually figure out what we should do about virtual tables (_ZTV*).
> The last case is most complicated, as it contains function pointers, and we
> need to figure out if we mark all methods, or say replace some pointers in
> the virtual table with NULLs or something that errors or terminates if it
> isn't marked.

All those I plan to defer, for now.

> And sure, __cxa_guard_* would need to be implemented in the offloading
> libsupc++.a or libstdc++.a.

Until proper libstdc++/libsupc++ support emerges (I'm working on it...),
my idea was to add a temporary 'libgomp/config/accel/*.c' implementation
(based on 'libstdc++-v3/libsupc++/guard.cc').


Grüße
 Thomas
-----------------
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
München, HRB 106955

Reply via email to