Hi!
On 2021-03-17T08:46:16+0100, Richard Biener <[email protected]> wrote:
> On Wed, Mar 17, 2021 at 12:25 AM Thomas Schwinge
> <[email protected]> wrote:
>> This is a prototype/"initial hack" for a very simple implementation of
>> <https://gcc.gnu.org/PR90591> "Avoid unnecessary data transfer out of OMP
>> construct". (... to be completed and posted later.) So simple that it
>> will easily fail (gracefully, of course), but yet is effective for a lot
>> of real-world code:
>>
>> subroutine [...]
>> [...]
>> real([...]) xx !temporary variable, for distance calculation
>> [...]
>> !$acc kernels pcopyin(x, zii) reduction(+:eva) ! implicit 'copy(xx)' for
>> scalar used inside region; established during gimplification
>> do 100 i=0,n-1
>> evx=0.0d0
>> do 90 j=0,n-1
>> xx=abs(x(1,i)-x(1,j))
>> [...]
>> !$acc end kernels
>> [...]
>> ['xx' never used here]
>> end subroutine [...]
>>
>> Inside 'kernels', we'd like to automatically parallelize loops (we've
>> basically got that working; analysis by Graphite etc.), but the problem
>> is that given "implicit 'copy(xx)' for scalar used inside region", when
>> Graphite later looks at the outlined 'kernels' region's function, it must
>> assume that 'xx' is still live after the OpenACC 'kernels' construct --
>> and thus cannot treat it as a thread-private temporary, cannot
>> parallelize.
>>
>> Now, walking each function backwards (!), I'm taking note of any
>> variables' uses, and if I reach an 'kernels' construct, but have not seen
>> a use of 'xx', I may then optimize 'copy(xx)' -> 'firstprivate(xx)',
>> enabling Graphite to do its thing. (Other such optimizations may be
>> added later.) (This is inspired by Jakub's commit
>> 1a80d6b87d81c3f336ab199a901cf80ae349c335 "re PR tree-optimization/68128
>> (A huge regression in Parboil v2.5 OpenMP CUTCP test (2.5 times lower
>> performance))".)
>>
>> I've now got a simple 'callback_op', which for '!is_lhs' looks at
>> 'get_base_address ([op])', and if that 'var' is contained in the set of
>> current candidates (initialized per containg 'bind's, which we enter
>> first, even if walking a 'gimple_seq' backwards), removes that 'var' as a
>> candidate for such optimization. (Plus some "details", of couse.) This
>> seems to work fine, as far as I can tell. :-)
>
> It might then still fail for x = a[var] when you are interested in 'var'.
That already works as expected:
gimple_assign <array_ref, x, a[var], NULL, NULL>
..., and in the debug log ('OP' is 'get_base_address (OPERAND)'), I see:
##### LOOKING INTO OPERAND: <array_ref 0x7ffff67af3b8>
###### OP: <var_decl 0x7ffff7feeb40 a>
####### WASN'T CANDIDATE
##### LOOKING INTO OPERAND: <var_decl 0x7ffff7feeb40 a>
###### OP: <var_decl 0x7ffff7feeb40 a>
####### WASN'T CANDIDATE
##### LOOKING INTO OPERAND: <var_decl 0x7ffff7feebd0 var>
###### OP: <var_decl 0x7ffff7feebd0 var>
####### NO LONGER CANDIDATE
##### LOOKING INTO OPERAND: <var_decl 0x7ffff7feec60 x>
###### IGNORED: IS_LHS
Notice 'var' "NO LONGER CANDIDATE".
Well, I cross-checked, and I'm getting this behavior (deep scanning)
because I'm (intentionally) using (default) '*walk_subtrees = 1' in my
'callback_op'. Indeed if I specify '*walk_subtrees = 0', the debug log
changes as follows:
##### LOOKING INTO OPERAND: <array_ref 0x7ffff67af3b8>
###### OP: <var_decl 0x7ffff7feeb40 a>
####### WASN'T CANDIDATE
-##### LOOKING INTO OPERAND: <var_decl 0x7ffff7feeb40 a>
-###### OP: <var_decl 0x7ffff7feeb40 a>
-####### WASN'T CANDIDATE
-##### LOOKING INTO OPERAND: <var_decl 0x7ffff7feebd0 var>
-###### OP: <var_decl 0x7ffff7feebd0 var>
-####### NO LONGER CANDIDATE
##### LOOKING INTO OPERAND: <var_decl 0x7ffff7feec60 x>
###### IGNORED: IS_LHS
..., and your projected mis-optimization:
+##### OPTIMIZING map(force_tofrom:var [len: 4]) -> firstprivate(var)
If I continue to use (default) '*walk_subtrees = 1', then I might not
actually need to use 'get_base_address', because we get all the
individual sub operands visited.
> I think you want to use walk_gimple_stmt and provide walk_tree_fn which
> will recurse into the complex tree operands (also making get_base_address
> unnecessary).
I'm already using 'walk_gimple_stmt' via 'walk_gimple_seq', and I'll
(later) look into removing 'get_base_address', and also look up
'walk_tree_fn'. Ah, wait, what you suggested with 'walk_tree_fn' is
actually what I've already got; in my last email I said "I've now got a
simple 'callback_op' [...]". Seems we're converging! ;-)
Grüße
Thomas
>> Of course, the eventual IPA-based solution (see PR90591, PR94693, etc.)
>> will be much better -- but we need something now.
>>
>>
>> Grüße
>> Thomas
-----------------
Mentor Graphics (Deutschland) GmbH, Arnulfstrasse 201, 80634 München
Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Frank
Thürauf