Hi! Richard, as the original author of 'SSA_NAME_POINTS_TO_READONLY_MEMORY': 2018 commit 6214d5c7e7470bdd5ecbeae668c2522551bfebbc (Subversion r263958) "Move const_parm trick to generic code"; 'gcc/tree.h':
/* Nonzero if this SSA_NAME is known to point to memory that may not be written to. This is set for default defs of function parameters that have a corresponding r or R specification in the functions fn spec attribute. This is used by alias analysis. */ #define SSA_NAME_POINTS_TO_READONLY_MEMORY(NODE) \ SSA_NAME_CHECK (NODE)->base.deprecated_flag ..., may I ask you to please help review the following patch (full-quoted)? For context: this patch here ("second patch") depends on a first patch: <inbox.sourceware.org/d0e6013f-ca38-b98d-dc01-b30adbd59...@siemens.com> "[PATCH, OpenACC 2.7] readonly modifier support in front-ends". That one is still under review/rework; so you're not able to apply this second patch here. In a nutshell: a 'readonly' modifier has been added to the OpenACC 'copyin' clause (copy host to device memory, don't copy back at end of region): | If the optional 'readonly' modifier appears, then the implementation may assume that the data | referenced by _var-list_ is never written to within the applicable region. That is, for example (untested): #pragma acc routine void escape(int *); int x[32] = [...]; #pragma acc parallel copyin(readonly: x) { int a1 = x[3]; escape(x); int a2 = x[3]; // Per 'readonly', don't need to reload 'x[3]' here. //x[22] = 0; // Invalid -- but no diagnostic mandated. } What Chung-Lin's first patch does is mark the OMP clause for 'x' (not the 'x' decl itself!) as 'readonly', via a new 'OMP_CLAUSE_MAP_READONLY' flag. The actual optimization then is done in this second patch. Chung-Lin found that he could use 'SSA_NAME_POINTS_TO_READONLY_MEMORY' for that. I don't have much experience with most of the following generic code, so would appreciate a helping hand, whether that conceptually makes sense as well as from the implementation point of view: On 2023-07-25T23:52:06+0800, Chung-Lin Tang via Gcc-patches <gcc-patches@gcc.gnu.org> wrote: > On 2023/7/11 2:33 AM, Chung-Lin Tang via Gcc-patches wrote: >> As we discussed earlier, the work for actually linking this to middle-end >> points-to analysis is a somewhat non-trivial issue. This first patch allows >> the language feature to be used in OpenACC directives first (with no effect >> for now). >> The middle-end changes are probably going to be a later patch. > > This second patch tries to link the readonly modifier to points-to analysis. > > There already exists SSA_NAME_POINTS_TO_READONLY_MEMORY and it's support in > the > alias oracle routines in tree-ssa-alias.cc, so basically what this patch does > is > try to make the variables holding the array section base pointers to have this > flag set. > > There is an another OMP_CLAUSE_MAP_POINTS_TO_READONLY set by front-ends on the > associated pointer clauses if OMP_CLAUSE_MAP_READONLY is set. > Also a DECL_POINTS_TO_READONLY flag is set for VAR_DECLs when creating the tmp > vars carrying these receiver references on the offloaded side. These > eventually get translated to SSA_NAME_POINTS_TO_READONLY_MEMORY. > This still doesn't always work as expected in terms of optimization: > struct pointer fields and Fortran arrays (kind of like C structs) which have > several accesses to create the pointer access on the receive/offloaded side, > and SRA appears to not work on these sequences, so gets in the way of much > redundancy elimination. I understand correctly that this is left as future work? Please add the test cases you have, XFAILed in some reasonable way. > Currently have one testcase where we can demonstrate 'readonly' can avoid > a clobber by function call. :-) > --- a/gcc/c/c-typeck.cc > +++ b/gcc/c/c-typeck.cc > @@ -14258,6 +14258,8 @@ handle_omp_array_sections (tree c, enum > c_omp_region_type ort) > OMP_CLAUSE_SET_MAP_KIND (c2, GOMP_MAP_ATTACH_DETACH); > else > OMP_CLAUSE_SET_MAP_KIND (c2, GOMP_MAP_FIRSTPRIVATE_POINTER); > + if (OMP_CLAUSE_MAP_READONLY (c)) > + OMP_CLAUSE_MAP_POINTS_TO_READONLY (c2) = 1; > OMP_CLAUSE_MAP_IMPLICIT (c2) = OMP_CLAUSE_MAP_IMPLICIT (c); > if (OMP_CLAUSE_MAP_KIND (c2) != GOMP_MAP_FIRSTPRIVATE_POINTER > && !c_mark_addressable (t)) > --- a/gcc/cp/semantics.cc > +++ b/gcc/cp/semantics.cc > @@ -5872,6 +5872,8 @@ handle_omp_array_sections (tree c, enum > c_omp_region_type ort) > } > else > OMP_CLAUSE_SET_MAP_KIND (c2, GOMP_MAP_FIRSTPRIVATE_POINTER); > + if (OMP_CLAUSE_MAP_READONLY (c)) > + OMP_CLAUSE_MAP_POINTS_TO_READONLY (c2) = 1; > OMP_CLAUSE_MAP_IMPLICIT (c2) = OMP_CLAUSE_MAP_IMPLICIT (c); > if (OMP_CLAUSE_MAP_KIND (c2) != GOMP_MAP_FIRSTPRIVATE_POINTER > && !cxx_mark_addressable (t)) > --- a/gcc/fortran/trans-openmp.cc > +++ b/gcc/fortran/trans-openmp.cc > @@ -2524,6 +2524,8 @@ gfc_trans_omp_array_section (stmtblock_t *block, > gfc_exec_op op, > node3 = build_omp_clause (input_location, OMP_CLAUSE_MAP); > OMP_CLAUSE_SET_MAP_KIND (node3, ptr_kind); > OMP_CLAUSE_DECL (node3) = gfc_conv_descriptor_data_get (decl); > + if (n->u.readonly) > + OMP_CLAUSE_MAP_POINTS_TO_READONLY (node3) = 1; > /* This purposely does not include GOMP_MAP_ALWAYS_POINTER. The extra > cast prevents gimplify.cc from recognising it as being part of the > struct - and adding an 'alloc: for the 'desc.data' pointer, which > @@ -2559,6 +2561,8 @@ gfc_trans_omp_array_section (stmtblock_t *block, > gfc_exec_op op, > OMP_CLAUSE_MAP); > OMP_CLAUSE_SET_MAP_KIND (node3, ptr_kind); > OMP_CLAUSE_DECL (node3) = decl; > + if (n->u.readonly) > + OMP_CLAUSE_MAP_POINTS_TO_READONLY (node3) = 1; > } Could combine these two into one, after 'if (GFC_DESCRIPTOR_TYPE_P (TREE_TYPE (decl)))' reconverges here, like where 'OMP_CLAUSE_SIZE (node3)' is set: > ptr2 = fold_convert (ptrdiff_type_node, ptr2); > OMP_CLAUSE_SIZE (node3) = fold_build2 (MINUS_EXPR, ptrdiff_type_node, Is 'n->u.readonly == OMP_CLAUSE_MAP_READONLY (node)'? If yes, would the latter be clearer to use as the 'if' expression (like in C, C++ front ends)? I see further additional 'OMP_CLAUSE_MAP' clauses synthesized, for example in 'gcc/cp/semantics.cc:handle_omp_array_sections', or 'gcc/fortran/trans-openmp.cc:gfc_trans_omp_array_section', also 'gcc/gimplify.cc'. I assume these are not relevant to have 'OMP_CLAUSE_MAP_READONLY' -> 'OMP_CLAUSE_MAP_POINTS_TO_READONLY' propagated? Actually, per your changes (see below), there is one 'OMP_CLAUSE_MAP_POINTS_TO_READONLY' propagation in 'gcc/gimplify.cc:build_omp_struct_comp_nodes'. Is the current situation re flag setting/propagation what was empirically necessary to make the test case work, or is it a systematic review? (The former is fine; I'd just like to know.) > --- a/gcc/gimple-expr.cc > +++ b/gcc/gimple-expr.cc > @@ -376,6 +376,8 @@ copy_var_decl (tree var, tree name, tree type) > DECL_CONTEXT (copy) = DECL_CONTEXT (var); > TREE_USED (copy) = 1; > DECL_SEEN_IN_BIND_EXPR_P (copy) = 1; > + if (VAR_P (var)) > + DECL_POINTS_TO_READONLY (copy) = DECL_POINTS_TO_READONLY (var); > DECL_ATTRIBUTES (copy) = DECL_ATTRIBUTES (var); > if (DECL_USER_ALIGN (var)) > { > --- a/gcc/gimplify.cc > +++ b/gcc/gimplify.cc > @@ -221,6 +221,7 @@ struct gimplify_omp_ctx > splay_tree variables; > hash_set<tree> *privatized_types; > tree clauses; > + hash_set<tree_operand_hash> *pt_readonly_ptrs; > /* Iteration variables in an OMP_FOR. */ > vec<tree> loop_iter_var; > location_t location; > @@ -628,6 +629,15 @@ internal_get_tmp_var (tree val, gimple_seq *pre_p, > gimple_seq *post_p, > gimplify_expr (&val, pre_p, post_p, is_gimple_reg_rhs_or_call, > fb_rvalue); > > + bool pt_readonly = false; > + if (gimplify_omp_ctxp && gimplify_omp_ctxp->pt_readonly_ptrs) > + { > + tree ptr = val; > + if (TREE_CODE (ptr) == POINTER_PLUS_EXPR) > + ptr = TREE_OPERAND (ptr, 0); > + pt_readonly = gimplify_omp_ctxp->pt_readonly_ptrs->contains (ptr); > + } 'POINTER_PLUS_EXPR' is the only special thing we may run into, here? (Generally, I prefer 'if', 'else if, [...], 'else gcc_unreachable ()'.) > + > if (allow_ssa > && gimplify_ctxp->into_ssa > && is_gimple_reg_type (TREE_TYPE (val))) > @@ -639,9 +649,18 @@ internal_get_tmp_var (tree val, gimple_seq *pre_p, > gimple_seq *post_p, > if (name) > SET_SSA_NAME_VAR_OR_IDENTIFIER (t, create_tmp_var_name (name)); > } > + if (pt_readonly) > + SSA_NAME_POINTS_TO_READONLY_MEMORY (t) = 1; > } > else > - t = lookup_tmp_var (val, is_formal, not_gimple_reg); > + { > + t = lookup_tmp_var (val, is_formal, not_gimple_reg); > + if (pt_readonly) > + { > + DECL_POINTS_TO_READONLY (t) = 1; > + gimplify_omp_ctxp->pt_readonly_ptrs->add (t); > + } > + } > > mod = build2 (INIT_EXPR, TREE_TYPE (t), t, unshare_expr (val)); > > @@ -8906,6 +8925,8 @@ build_omp_struct_comp_nodes (enum tree_code code, tree > grp_start, tree grp_end, > OMP_CLAUSE_SET_MAP_KIND (c2, mkind); > OMP_CLAUSE_DECL (c2) = unshare_expr (OMP_CLAUSE_DECL (grp_end)); > OMP_CLAUSE_CHAIN (c2) = NULL_TREE; > + if (OMP_CLAUSE_MAP_POINTS_TO_READONLY (grp_end)) > + OMP_CLAUSE_MAP_POINTS_TO_READONLY (c2) = 1; > tree grp_mid = NULL_TREE; > if (OMP_CLAUSE_CHAIN (grp_start) != grp_end) > grp_mid = OMP_CLAUSE_CHAIN (grp_start); For my understanding, is this empirically necessary, or a systematic review? > @@ -11741,6 +11762,16 @@ gimplify_scan_omp_clauses (tree *list_p, gimple_seq > *pre_p, > > gimplify_omp_ctxp = outer_ctx; > } > + else if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_MAP > + && (code == OACC_PARALLEL > + || code == OACC_KERNELS > + || code == OACC_SERIAL) > + && OMP_CLAUSE_MAP_POINTS_TO_READONLY (c)) > + { > + if (ctx->pt_readonly_ptrs == NULL) > + ctx->pt_readonly_ptrs = new hash_set<tree_operand_hash> (); > + ctx->pt_readonly_ptrs->add (OMP_CLAUSE_DECL (c)); > + } > if (notice_outer) > goto do_notice; > break; Also need to 'delete ctx->pt_readonly_ptrs;' somewhere. > --- a/gcc/omp-low.cc > +++ b/gcc/omp-low.cc > @@ -14098,6 +14098,8 @@ lower_omp_target (gimple_stmt_iterator *gsi_p, > omp_context *ctx) > if (ref_to_array) > x = fold_convert_loc (clause_loc, TREE_TYPE (new_var), x); > gimplify_expr (&x, &new_body, NULL, is_gimple_val, fb_rvalue); > + if (OMP_CLAUSE_MAP_POINTS_TO_READONLY (c) && VAR_P (x)) > + DECL_POINTS_TO_READONLY (x) = 1; > if ((is_ref && !ref_to_array) > || ref_to_ptr) > { This is in the middle of the "Handle GOMP_MAP_FIRSTPRIVATE_{POINTER,REFERENCE} in second pass" code block. Again, for my understanding, is this empirically necessary, or a systematic review? > --- a/gcc/testsuite/c-c++-common/goacc/readonly-1.c > +++ b/gcc/testsuite/c-c++-common/goacc/readonly-1.c > @@ -19,8 +19,8 @@ int main (void) > return 0; > } > > -/* { dg-final { scan-tree-dump-times "(?n)#pragma acc parallel > map\\(readonly,to:\\*s.ptr \\\[len: 64\\\]\\) .+ map\\(readonly,to:x\\\[0\\\] > \\\[len: 128\\\]\\)" 1 "original" { target { c } } } } */ > -/* { dg-final { scan-tree-dump-times "(?n)#pragma acc parallel > map\\(readonly,to:\\*NON_LVALUE_EXPR <s.ptr> \\\[len: 64\\\]\\) .+ > map\\(readonly,to:x\\\[0\\\] \\\[len: 128\\\]\\)" 1 "original" { target { c++ > } } } } */ > +/* { dg-final { scan-tree-dump-times "(?n)#pragma acc parallel > map\\(readonly,to:\\*s.ptr \\\[len: 64\\\]\\) > map\\(pt_readonly,attach_detach:s.ptr \\\[bias: 0\\\]\\) > map\\(readonly,to:x\\\[0\\\] \\\[len: 128\\\]\\) > map\\(pt_readonly,firstprivate:x \\\[pointer assign, bias: 0\\\]\\)" 1 > "original" { target { c } } } } */ > +/* { dg-final { scan-tree-dump-times "(?n)#pragma acc parallel > map\\(readonly,to:\\*NON_LVALUE_EXPR <s.ptr> \\\[len: 64\\\]\\) > map\\(pt_readonly,attach_detach:s.ptr \\\[bias: 0\\\]\\) > map\\(readonly,to:x\\\[0\\\] \\\[len: 128\\\]\\) > map\\(pt_readonly,firstprivate:x \\\[pointer assign, bias: 0\\\]\\)" 1 > "original" { target { c++ } } } } */ > /* { dg-final { scan-tree-dump-times "(?n)#pragma acc cache > \\(readonly:x\\\[0\\\] \\\[len: 128\\\]\\);$" 1 "original" } } */ I suppose the new 'map(pt_readonly,attach_detach:s.ptr [bias: 0])' clause was previously "hidden" in '.+'? Please then change that in the first patch "[PATCH, OpenACC 2.7] readonly modifier support in front-ends", so that we can see here what actually is changing (only 'pt_readonly', I suppose). > --- /dev/null > +++ b/gcc/testsuite/c-c++-common/goacc/readonly-2.c > @@ -0,0 +1,15 @@ > +/* { dg-additional-options "-O -fdump-tree-fre" } */ > + > +#pragma acc routine > +extern void foo (int *ptr, int val); > + > +int main (void) > +{ > + int r, a[32]; > + #pragma acc parallel copyin(readonly: a[:32]) copyout(r) > + { > + foo (a, a[8]); > + r = a[8]; > + } > +} > +/* { dg-final { scan-tree-dump-times "r\.\[_0-9\]+ = > MEM\\\[\[^_\]+_\[0-9\]+\\(ptro\\)\\\]\\\[8\\\];" 1 "fre1" } } */ Please add a comment why 'fre1', and what generally is being checked here; that's not obvious to the casual reader. (That is, me in a few weeks.) ;-) Also add a scan for "before the optimization": two 'MEM's, I suppose? > --- a/gcc/testsuite/gfortran.dg/goacc/readonly-1.f90 > +++ b/gcc/testsuite/gfortran.dg/goacc/readonly-1.f90 > @@ -20,8 +20,8 @@ program main > !$acc end parallel > end program main > > -! { dg-final { scan-tree-dump-times "(?n)#pragma acc parallel > map\\(readonly,to:\\*\\(integer\\(kind=4\\)\\\[0:\\\] \\*\\) parm.*data > \\\[len: .+\\\]\\) .+ map\\(readonly,to:\\*\\(integer\\(kind=4\\)\\\[0:\\\] > \\*\\) parm.*data \\\[len: .+\\\]\\)" 1 "original" } } > -! { dg-final { scan-tree-dump-times "(?n)#pragma acc parallel > map\\(readonly,to:a\\\[\\(\\(integer\\(kind=8\\)\\) parm.*data - > \\(integer\\(kind=8\\)\\) &a\\) / 4\\\] \\\[len: .+\\\]\\) .+ > map\\(readonly,to:b\\\[\\(\\(integer\\(kind=8\\)\\) parm.*data - > \\(integer\\(kind=8\\)\\) &b\\) / 4\\\] \\\[len: .+\\\]\\)" 1 "original" } } > +! { dg-final { scan-tree-dump-times "(?n)#pragma acc parallel > map\\(readonly,to:\\*\\(integer\\(kind=4\\)\\\[0:\\\] \\*\\) parm.*data > \\\[len: .+\\\]\\) map\\(pt_readonly,alloc:a.0 \\\[pointer assign, bias: > \\(integer\\(kind=8\\)\\) parm.*data - \\(integer\\(kind=8\\)\\) a.0\\\]\\) > map\\(readonly,to:\\*\\(integer\\(kind=4\\)\\\[0:\\\] \\*\\) parm.*data > \\\[len: .+\\\]\\) map\\(pt_readonly,alloc:b \\\[pointer assign, bias: > \\(integer\\(kind=8\\)\\) parm.*data - \\(integer\\(kind=8\\)\\) b\\\]\\)" 1 > "original" } } > +! { dg-final { scan-tree-dump-times "(?n)#pragma acc parallel > map\\(readonly,to:a\\\[\\(\\(integer\\(kind=8\\)\\) parm.*data - > \\(integer\\(kind=8\\)\\) &a\\) / 4\\\] \\\[len: .+\\\]\\) > map\\(pt_readonly,alloc:a \\\[pointer assign, bias: \\(integer\\(kind=8\\)\\) > parm.*data - \\(integer\\(kind=8\\)\\) &a\\\]\\) > map\\(readonly,to:b\\\[\\(\\(integer\\(kind=8\\)\\) parm.*data - > \\(integer\\(kind=8\\)\\) &b\\) / 4\\\] \\\[len: .+\\\]\\) > map\\(pt_readonly,alloc:b \\\[pointer assign, bias: \\(integer\\(kind=8\\)\\) > parm.*data - \\(integer\\(kind=8\\)\\) &b\\\]\\)" 1 "original" } } > ! { dg-final { scan-tree-dump-times "(?n)#pragma acc cache > \\(readonly:\\*\\(integer\\(kind=4\\)\\\[0:\\\] \\*\\) parm.*data \\\[len: > .+\\\]\\) \\(readonly:\\*\\(integer\\(kind=4\\)\\\[0:\\\] \\*\\) parm.*data > \\\[len: .+\\\]\\);" 2 "original" } } Same comment as for 'c-c++-common/goacc/readonly-1.c'. > --- a/gcc/tree-pretty-print.cc > +++ b/gcc/tree-pretty-print.cc > @@ -907,6 +907,8 @@ dump_omp_clause (pretty_printer *pp, tree clause, int > spc, dump_flags_t flags) > pp_string (pp, "map("); > if (OMP_CLAUSE_MAP_READONLY (clause)) > pp_string (pp, "readonly,"); > + if (OMP_CLAUSE_MAP_POINTS_TO_READONLY (clause)) > + pp_string (pp, "pt_readonly,"); > switch (OMP_CLAUSE_MAP_KIND (clause)) > { > case GOMP_MAP_ALLOC: > @@ -3436,6 +3438,8 @@ dump_generic_node (pretty_printer *pp, tree node, int > spc, dump_flags_t flags, > pp_string (pp, "(D)"); > if (SSA_NAME_OCCURS_IN_ABNORMAL_PHI (node)) > pp_string (pp, "(ab)"); > + if (SSA_NAME_POINTS_TO_READONLY_MEMORY (node)) > + pp_string (pp, "(ptro)"); > break; > > case WITH_SIZE_EXPR: > --- a/gcc/tree-ssanames.cc > +++ b/gcc/tree-ssanames.cc > @@ -402,6 +402,9 @@ make_ssa_name_fn (struct function *fn, tree var, gimple > *stmt, > else > SSA_NAME_RANGE_INFO (t) = NULL; > > + if (VAR_P (var) && DECL_POINTS_TO_READONLY (var)) > + SSA_NAME_POINTS_TO_READONLY_MEMORY (t) = 1; > + > SSA_NAME_IN_FREE_LIST (t) = 0; > SSA_NAME_IS_DEFAULT_DEF (t) = 0; > init_ssa_name_imm_use (t); > --- a/gcc/tree.h > +++ b/gcc/tree.h > @@ -1021,6 +1021,13 @@ extern void omp_clause_range_check_failed (const_tree, > const char *, int, > #define DECL_HIDDEN_STRING_LENGTH(NODE) \ > (TREE_CHECK (NODE, PARM_DECL)->decl_common.decl_nonshareable_flag) > > +/* In a VAR_DECL, set for variables regarded as pointing to memory not > written > + to. SSA_NAME_POINTS_TO_READONLY_MEMORY gets set for SSA_NAMEs created from > + such VAR_DECLs. Currently used by OpenACC 'readonly' modifier in copyin > + clauses. */ > +#define DECL_POINTS_TO_READONLY(NODE) \ > + (TREE_CHECK (NODE, VAR_DECL)->decl_common.decl_not_flexarray) > + > /* In a CALL_EXPR, means that the call is the jump from a thunk to the > thunked-to function. Be careful to avoid using this macro when one of the > next two applies instead. */ > @@ -1815,6 +1822,10 @@ class auto_suppress_location_wrappers > #define OMP_CLAUSE_MAP_READONLY(NODE) \ > TREE_READONLY (OMP_CLAUSE_SUBCODE_CHECK (NODE, OMP_CLAUSE_MAP)) > > +/* Set if 'OMP_CLAUSE_DECL (NODE)' points to read-only memory. */ > +#define OMP_CLAUSE_MAP_POINTS_TO_READONLY(NODE) \ > + TREE_CONSTANT (OMP_CLAUSE_SUBCODE_CHECK (NODE, OMP_CLAUSE_MAP)) > + > /* Same as above, for use in OpenACC cache directives. */ > #define OMP_CLAUSE__CACHE__READONLY(NODE) \ > TREE_READONLY (OMP_CLAUSE_SUBCODE_CHECK (NODE, OMP_CLAUSE__CACHE_)) As in my "[PATCH, OpenACC 2.7] readonly modifier support in front-ends" review, please document how certain flags are used for OMP clauses. I note you're not actually using 'OMP_CLAUSE__CACHE__READONLY' anywhere -- but that's OK given the current 'gcc/gimplify.cc:gimplify_oacc_cache'. ;-) Grüße Thomas ----------------- Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955