On 27/02/23 9:58 pm, Segher Boessenkool wrote:
> Hi!
> 
> On Wed, Jan 04, 2023 at 01:58:19PM +0530, Surya Kumari Jangala wrote:
>> In the routine rs6000_analyze_swaps(), special handling of swappable
>> instructions is done even if the webs that contain the swappable
>> instructions are not optimized, i.e., the webs do not contain any
>> permuting load/store instructions along with the associated register
>> swap instructions. Doing special handling in such webs will result in
>> the extracted lane being adjusted unnecessarily for vec_extract.
>>
>> Modifying swappable instructions is also incorrect in webs where
>> loads/stores on quad word aligned addresses are changed to lvx/stvx.
>> Similarly, in webs where swap(load(vector constant)) instructions are
>> replaced with load(swapped vector constant), the swappable
>> instructions should not be modified.
>>
>> 2023-01-04  Surya Kumari Jangala  <jskum...@linux.ibm.com>
>>
>> gcc/
>>      PR rtl-optimization/106770
>>      * rs6000-p8swap.cc (rs6000_analyze_swaps): .
> 
> Please add an entry?  Or multiple ones, actually.  Describe all changes.
Ok

> 
>> --- a/gcc/config/rs6000/rs6000-p8swap.cc
>> +++ b/gcc/config/rs6000/rs6000-p8swap.cc
>> @@ -179,6 +179,9 @@ class swap_web_entry : public web_entry_base
>>    unsigned int special_handling : 4;
>>    /* Set if the web represented by this entry cannot be optimized.  */
>>    unsigned int web_not_optimizable : 1;
>> +  /* Set if the web represented by this entry has been optimized, ie,
> 
> s/ie/i.e./
> 
>> +     register swaps of permuting loads/stores have been removed.  */
> 
> If it really means only exactly this, then the name isn't so good.

There is another bit in this class named "web_not_optimizable". This stands for 
webs that cannot be optimized. I am reusing this name for the new bit I am 
adding, and I have named this bit as "web_is_optimized".

> 
>> +  unsigned int web_is_optimized : 1;
> 
> And if it is as general as the name suggests, then the comment is no
> good.  Which is it?  :-)
> 
>>    /* For each load and store in an optimizable web (which implies
>>       the loads and stores are permuting), find the associated
>>       register swaps and mark them for removal.  Due to various
>> -     optimizations we may mark the same swap more than once.  Also
>> -     perform special handling for swappable insns that require it.  */
>> +     optimizations we may mark the same swap more than once. Fix up
>> +     the non-permuting loads and stores by converting them into
>> +     permuting ones.  */
> 
> Two spaces after a full stop is correct.  Please put that back.
Ok

> 
> Is it a good idea convert from/to swapping load/stores in this pass at
> all?  Doesdn't that belong elsewhere?  Like, in combine, where we
> already should do this.  Why does that not work> 
>> -    if (!root_entry->web_not_optimizable)
>> +    if (!root_entry->web_not_optimizable) {
> 
> Blocks start on a new line, indented.
Ok

> 
>>        mark_swaps_for_removal (insn_entry, i);
>> +          root_entry->web_is_optimized = true;
> 
> Indent using tabs where possible.
Ok

> 
>> +        swap_web_entry* root_entry
>> +          = (swap_web_entry*)((&insn_entry[i])->unionfind_root ());
> 
> Space before *, in all cases. Space before the second (.  There are too
> many brackets here, too.
Ok

> 
>> +  /* Perform special handling for swappable insns that require it. 
> 
> No trailing spaces.
Ok

> 
>> +     Note that special handling should be done only for those 
>> +     swappable insns that are present in webs optimized above.  */
>> +  for (i = 0; i < e; ++i)
>> +    if (insn_entry[i].is_swappable && insn_entry[i].special_handling &&
>> +        !(insn_entry[i].special_handling == SH_NOSWAP_LD || 
>> +          insn_entry[i].special_handling == SH_NOSWAP_ST))
>>        {
>>      swap_web_entry* root_entry
>>        = (swap_web_entry*)((&insn_entry[i])->unionfind_root ());
>> -    if (!root_entry->web_not_optimizable)
>> +    if (root_entry->web_is_optimized)
>>        handle_special_swappables (insn_entry, i);
>>        }
> 
> Why this change?

The swap pass analyzes vector computations and removes unnecessary doubleword 
swaps (xxswapdi instructions). The swap pass first constructs webs and removes 
swap instructions if possible. If the web contains operations that are 
sensitive to element order (ie, insns requiring special handling), such as an 
extract, then such instructions should be modified. For example, for an extract 
operation the lane is changed.
However, the swap pass is changing element order of an extract operation that 
is present in unoptimized webs. Unoptimized webs are those for which register 
swaps were not removed, one of the reasons being that there are no loads/stores 
present in this web. For such webs, element order of extract operation should 
not be changed.
Hence, we first mark webs that can be optimized, and only for such webs we call 
the routine handle_special_swappables() to modify operations sensitive to 
element order.
One type of insns that are handled by handle_special_swappables() are 
non-permuting loads/stores. These insns are converted to permuting 
loads/stores. This conversion is needed because a permuting load/store is 
converted into a simple load/store during the final assembly code generation. 
Whereas, non-permuting loads/stores are converted into a load/store and a swap 
instruction.
So we first go thru all the webs, and for permuting loads/stores we find the 
associated register swaps and mark them for removal. And for non-permuting 
loads/stores, we convert them to permuting loads/stores. All such webs are 
marked as 'optimized'
Then we go thru all the webs again, and for those marked 'optimized', we call 
handle_special_swappables() to handle insns that require special handling.

> 
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/powerpc/pr106770.c
>> @@ -0,0 +1,20 @@
>> +/* { dg-do compile } */
>> +/* { dg-require-effective-target powerpc_p8vector_ok } */
>> +/* { dg-options "-mdejagnu-cpu=power8 -O3 " } */
> 
> Is -O3 required?  Use -O2 if you can.  And no trailing spaces please.
Let me check with -O2.

> 
>> +/* { dg-final { scan-assembler-times "xxpermdi" 2 } } */
> 
> Those two remaining are superfluous, so comment that please.
Ok

> 
> 
> Segher

Reply via email to