On 09.06.22 13:40, Jakub Jelinek via Gcc-patches wrote:
On Wed, Jun 08, 2022 at 05:56:02AM +0200, Tobias Burnus wrote:
+ && lookup_attribute ("omp declare target",
+ DECL_ATTRIBUTES (current_function_decl)))
+ omp_requires_mask
+ = (enum omp_requires) (omp_requires_mask | OMP_REQUIRES_TARGET_USED);
I must admit it is unclear what the
"must appear lexically before any device constructs or device routines."
restriction actually means for device routines.
Is that lexically before definition of such device routines, or even their
declarations?
I have similar issues – also for Fortran (and C++) module use. Hence, I
had filled https://github.com/OpenMP/spec/issues/3240 (not publicly
accessible); I added your issues to the list.
The above patch snippet is I believe for function definitions that were
arked as declare target before the definition somehow (another decl for
it merged with the new one or in between the begin/end). And is true
even for device_type (host), to rule that out you'd need to check for
"omp declare target host" attribute not being present.
I'm not against the above snippet perhaps adjusted for device_type(host),
but IMHO we want clarifications from omp-lang
How to proceed for now? And does 'omp_is_initial_device()' on the host a
device function or not? It can be hard-coded to 'true' ...
[...]
target update is also a device construct and the above snippet hasn't been
added for it, ditto for interop which we don't implement yet.
But, my preference would be instead of adding these snippets to
c_parser_omp_target_{data,enter_data,exit_data,update} etc. move it from
c_parser_omp_target to c_parser_omp_all_clauses:
if (flag_openmp
&& (mask & (OMP_CLAUSE_MASK_1 << PRAGMA_OMP_CLAUSE_DEVICE)) != 0)
omp_requires_mask
= (enum omp_requires) (omp_requires_mask | OMP_REQUIRES_TARGET_USED);
(somewhere at the start of the function), because the definition of device
constructs is exactly like that:
"device construct An OpenMP construct that accepts the device clause."
Makes sense.
[C++ cases]
Ditto.
For Fortran, is the above mostly not needed because requires need to be in
the specification part and device constructs are executable and appear in
the part after it? Do we allow requires in BLOCK's specification part?
We don't allow it in BLOCK – but there are issues related to USE-ing
modules, cf. OpenMP issue.
--- a/gcc/gimplify.cc
+++ b/gcc/gimplify.cc
@@ -3644,6 +3644,9 @@ gimplify_call_expr (tree *expr_p, gimple_seq *pre_p, bool
want_value)
+ if (fndecl && flag_openmp && omp_runtime_api_call (fndecl, true))
+ omp_requires_mask
+ = (enum omp_requires) (omp_requires_mask | OMP_REQUIRES_TARGET_USED);
I'm sure device APIs were discussed, but I must be blind and I can't find it
in either 5.0, 5.1 or 5.2. All I see is device constructs or device routines
in those places where I'd also look for device related OpenMP runtime
library APIs. Though, if some routine calls omp_get_num_devices (),
certainly the library at that point needs to know
reverse_offload/unified_shared_memory/etc. requires because that determines
how many devices it has. So, what have I missed (aka on which place in the
standard the above snippet is based on)?
It is based on your review comments from last year ("Something I miss in
the patch is that for the device API calls") plus what requires some
device initialization. But otherwise, I also did not see it.
In terms of parsing, it makes no difference – contrary to
'unified_shared_memory', where the parser could decide not to add
implicit mapping, the compiler part is not affected by API calls.
I cannot really make up my mind whether it should be required in this
case or not. Maybe, it is not needed.
+ const char *requires_section = ".gnu.gomp_requires";
+ tree maskvar = build_decl (UNKNOWN_LOCATION, VAR_DECL,
+ get_identifier (".gomp_requires_mask"),
+ unsigned_type_node);
+ SET_DECL_ALIGN (maskvar, TYPE_ALIGN (unsigned_type_node));
Don't we want also DECL_USER_ALIGN (maskvar) = 1; so that
we never try to increase its alignment?
Probably yes.
Is it an allocated section, or should it be better non-allocated and then
dealt with by mkoffload?
Shouldn't the vars in that section be const, so that it is a read-only
section?
Is unsigned_type_node what we want (say wouldn't be just unsigned_char_node
be enough, currently we just need 3 bits).
Probably -that would be 8 bits, leaving 5 spare. I have not checked what
Andrew et al. do with the pinned-memory support by -f<some-flag>, but
that will likely use only 1 to 3 bits, if any.
Also, wonder if for HAVE_GAS_SHF_MERGE && flag_merge_constants
we shouldn't try to make that section mergeable. If it goes away during
linking and is replaced by something, then it doesn't matter, but otherwise,
as we don't record which TU had what flags, all we care about is that
there were some TUs which used device construct/routines (and device APIs?)
and used bitmask 7, other TUs that used bitmask 3 and others that used
bitmask 4.
(maybe – I am not sure about this, either.)
@@ -442,6 +463,14 @@ omp_finish_file (void)
}
else
{
+#ifndef ACCEL_COMPILER
+ if (flag_openmp
+ && (omp_requires_mask & OMP_REQUIRES_TARGET_USED)
+ && (omp_requires_mask & (OMP_REQUIRES_UNIFIED_ADDRESS
+ | OMP_REQUIRES_UNIFIED_SHARED_MEMORY
+ | OMP_REQUIRES_REVERSE_OFFLOAD)))
+ sorry ("OpenMP device offloading is not supported for this target");
+#endif
I don't understand this snippet. Without named sections on the host,
I bet we simply don't support offloading at all,
the record_offload_symbol target hook is only non-trivially defined
for nvptx and nvptx isn't typical host for OpenMP offloading,
because we don't remember it anywhere.
I thought that would address your: "This probably needs to sorry if the
target doesn't support named sections. We probably don't support LTO in
that case either though."
I wonder if we shouldn't rename it when we change the arguments,
so that if one mixes an older plugin with newer libgomp or vice versa
this is easily caught.
Ok.
+/* Start/end of .gnu.gomp.requires section of program, defined in
Isn't it .gnu.gomp_requires ?
Yes.
+ crtoffloadbegin/end.o. */
+__attribute__((weak))
+extern const unsigned int __requires_mask_table[];
+__attribute__((weak))
+extern const unsigned int __requires_mask_table_end[];
I must say it is unclear to me how this works.
It will only find one such array, say in the executable,
or if the executable doesn't have it, in one of the shared libraries.
I think we want some solution that will work with OpenMP code
at least in the executable and some shared libraries it is linked against
(obviously another case is when a library with certain #pragma omp requires
is dlopened after we've finalized the number of devices, bet the options
in that case are either warn or fatal error).
There is the general problem that GCC does not work well with having
device routines in a shared library; it works fine if the device part
is only in the library – but calling from an .exe program's target
region a declare-target function in a library won't work.
Thus, we may need to find a more general solution for this.
The choices could be e.g. make __requires_mask_table{,_end} .hidden
and in the crtoffloadbegin.o (or end) unconditionally call some new libgomp
routine to register the table, but the disadvantage of that is that we could
have many of those register calls even when there is nothing to register
(sure, the .ctor in crtoffloadbegin.o (or end) could compare the 2 addresses
and not call anything if the table is empty but there would be still code
executed during initialization of the library).
Yet another possibility is linker plugin case. We already use it for the
case where we actually have some offloading LTO bytecode, transform it into
a data section and register with GOMP_offload_register*.
So, if we could e.g. at the same time also process those requires arrays,
diagnose at link time if multiple TUs with that set disagree on the mask
value and in the target data provide that mask to the library, that would
be much nicer.
And the masks either could be gathered from .gnu.gomp_requires or it can be
somehow encoded in the offloading LTO or its associated data.
What is important though is that it will work even if we actually don't have
any "omp declare target" functions or variables in the TU or the whole
executable or library, just the requires mask. But that can be dealt with
e.g. by forcing the LTO sections even for that case or so.
Tobias
-----------------
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634
München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas
Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht
München, HRB 106955