Claudiu Zissulescu-Ianculescu <claudiu.zissulescu-iancule...@oracle.com> writes:
>>> +  [(set (match_operand:TI 0 "aarch64_granule16_memory_operand" "=Umg")
>>> +      (unspec:TI
>>> +        [(match_operand:TI 1 "aarch64_granule16_memory_operand" "Umg")
>>> +         (match_operand:DI 2 "register_operand" "rk")]
>>> +        UNSPEC_TAG_SPACE))]
>>> +  "TARGET_MEMTAG && aarch64_check_memtag_ops (operands[0], operands[1])"
>>> +  "stg\\t%2, %0"
>>> +  [(set_attr "type" "memtag")]
>>> +)
>> 
>> In the previous reviews, I'd suggested we use things like:
>> 
>> (set (match_operand:TI 0 "aarch64_granule_memory_operand" "+Umg")
>>      (unspec_volatile:TI
>>        [(match_dup 0)
>>         (match_operand:DI 1 "register_operand" "rk")]
>>        UNSPECV...))
>> 
>> for the STG patterns.  Does that not work?  I remember there was some
>> follow-up discussion about the post-increment case, but I thought the
>> plan had been to use a separate pattern for those.
>
> First, I wanted to avoid using unspec_volatile constructions. Then,
> using match_dup 0 does not work well when we have POST/PRE
> INC/DEC/MODIFY, as the compiler sees two operands performing the
> PRE/POST operation. Hence, I introduced the aarch64_check_memtag_ops()
> function, which checks whether the operands are the same (ignoring the
> POST/PRE constructions in operand 0). Moreover, I wanted to reuse the
> machinery already in place for classifying an address (see
> aarch64_granule16_memory_address_p).
>
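For reference, I'm reading that check as doing something along these
lines (just a guess at its shape, with made-up names; I haven't checked
it against the patch):

  /* Hypothetical sketch: return true if DEST and SRC refer to the same
     address, treating a pre/post-modified address in DEST as equivalent
     to its base.  */
  static bool
  check_memtag_ops_sketch (rtx dest, rtx src)
  {
    rtx daddr = XEXP (dest, 0);
    rtx saddr = XEXP (src, 0);
    /* Strip any auto-increment/decrement/modify from the destination
       address before comparing.  */
    if (GET_RTX_CLASS (GET_CODE (daddr)) == RTX_AUTOINC)
      daddr = XEXP (daddr, 0);
    return rtx_equal_p (daddr, saddr);
  }
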
>> 
>> (Sorry if I'm misremembering, been a while since the last series.)
>> 
>> The problem with the pattern in the patch is that operands 0 and 1's
>> predicates and constraints treat the operands as entirely separate.
>> aarch64_check_memtag_ops would be applied during insn recognition but
>> it would have no effect on (for example) register allocation.
>
> Alternatively, I can "explode" the above pattern to handle the
> PRE/POST cases separately.
>
>> 
>> That said... I wonder if it would work to drop the aarch64_check_memtag_ops
>> condition and instead use something like:
>> 
>>   return (GET_RTX_CLASS (GET_CODE (XEXP (operands[1], 0))) == RTX_AUTOINC
>>           ? "stg\\t%2, %1"
>>           : "stg\\t%2, %0");
>> 
>> We know that operands 0 and 1 started out as the same address value and
>> it would be invalid for any later pass to change that value (rather than
>> the way that the value is computed).
>> 
>> I don't remember seeing that RTX_AUTOINC pattern used before, and it
>> feels a bit underhand, but I can't immediately think of a reason why
>> it's wrong in principle.
>> 
>> The unspec must be unspec_volatile though, for the reason in the
>> previous review:
>> 
>>   The unspec_volatile should ensure that nothing tries to remove the
>>   store as dead (which would especially be a problem when clearing tags).
>
> Indeed. To avoid DCE, I have added a memory barrier. Of course, I can
> revert to unspec_volatile.

Ah, I see.  Perhaps the implementation of the barrier as:

    asm volatile ("" ::: "memory");

would have the effect of keeping all reaching stores live, since the
memory clobber acts as both a read and a write.  But it seems surprising
that that would be the case even for things that are provably local
to the current function and are provably dead at the point of the asm.

For example, in:

void g() {
  int x = 1;
  asm volatile("// barrier" ::: "memory");
  x = 2;
  asm volatile("// use %0" :: "m" (x));
}

the barrier does have the effect of preserving the store of 1 to x,
and similarly for:

void g() {
  int x = 2;
  asm volatile("// use %0" :: "m" (x));
  x = 1;
  asm volatile("// barrier" ::: "memory");
}

But it's not clear to me whether that's really required or whether
it's just a limitation of the current optimisers.

[FTR,

void g() {
  int x = 2;
  asm volatile("// use %0" :: "a" (&x));
  x = 1;
  asm volatile("// barrier" ::: "memory");
}

is different in that the first asm could cache the address for the
second asm.]
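
Just to make the earlier suggestion concrete, I was imagining the
pattern ending up looking something like this (completely untested,
and the insn name and UNSPECV_TAG_SPACE are only placeholder names):

(define_insn "*stg_sketch"
  [(set (match_operand:TI 0 "aarch64_granule16_memory_operand" "=Umg")
        (unspec_volatile:TI
          [(match_operand:TI 1 "aarch64_granule16_memory_operand" "Umg")
           (match_operand:DI 2 "register_operand" "rk")]
          UNSPECV_TAG_SPACE))]
  "TARGET_MEMTAG"
  {
    /* If the auto-increment ended up on operand 1, print that operand
       so that the writeback isn't lost; otherwise print operand 0.  */
    return (GET_RTX_CLASS (GET_CODE (XEXP (operands[1], 0))) == RTX_AUTOINC
            ? "stg\\t%2, %1"
            : "stg\\t%2, %0");
  }
  [(set_attr "type" "memtag")]
)

i.e. the operands from the patch, but with unspec_volatile, without the
aarch64_check_memtag_ops condition, and with the output chosen by which
operand carries the auto-increment.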

Let's see what others think.

Richard
