On 5/26/25 01:18, Robin Dapp wrote:
>> 2. OK'ish: A bunch of testcases see more reads/writes as PRE of redundant
>> read/writes is punted to later passes which obviously needs more work.
>>
>> 3. NOK: We lose the ability to instrument local RM writes - especially in
>> the testsuite.
>>   e.g.
>>      a. intrinsic setting a static RM
>>      b. get_frm() to ensure that happened (inline asm to read out frm)
>>
>> The tightly coupled restore kicks in before get_frm could be emitted, which
>> fails to observe #a. This is a deal breaker for the testsuite, as many of the
>> frm tests report as fail even if the actual codegen is sane.
> I'd say that most of the tests we have right now are written with the
> existing behavior in mind and don't necessarily translate well to a changed
> behavior.
>
> We mostly test the proper LCM and backup update behavior and backup updates 
> don't happen with a local-only approach.
>
> I haven't really understood how the FRM-changing intrinsics are used.
>
> There are two extremes: 
>
> - A single intrinsic using a different rounding mode and a lot of other 
>   arithmetic before and after it.  In that case we cannot optimize anyway 
>   because the rest must operate with the global rounding mode.
>
> - A longer code sequence, like a function, that uses a different rounding
>   mode and every intrinsic being FRM-changing.  In that case we would need to
>   optimize a lot of saves and restores away until we only have a single save
>   at the beginning and a single restore at the end.
>
> I suppose we don't handle the latter case well right now.  But on the other 
> hand it's also not very interesting as explicit fegetround (), fesetround (), 
> fesetround () is what the user would/should have done anyway.
>
> So IMHO the only interesting cases are somewhere in the middle.  It would 
> really help to have some examples here that could tell us whether the simple 
> approach leaves a lot on the table (in terms of redundant save/restore).

As I mentioned earlier (3. above), the main issue with this approach is that the
get_frm () testsuite instrumentation is now broken.
FRM is already restored before it is read back (by inline asm), which cripples
most of the testsuite machinery.
e.g. float-point-frm-run-1.c won't even pass the check that the local RM was set
to some static value.
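
To make the ordering problem concrete, here is a toy model in plain C. It is
only a sketch: frm is modelled as an ordinary variable, get_frm () here is a
stand-in for the testsuite helper (the real one uses inline asm), and the names
op_tight, op_deferred, observe_tight and observe_deferred are made up for
illustration and are not GCC code. The "deferred" variant simply assumes the
restore happens some time after the read.

```c
#include <assert.h>

/* Toy model only: frm is a plain variable, not the real CSR.  */
enum { RM_RNE = 0, RM_RTZ = 1, RM_RDN = 2, RM_RUP = 3, RM_RMM = 4 };

static int frm = RM_RNE;	/* Modelled frm, starts in the global mode.  */

/* Stand-in for the testsuite's get_frm (); the real one uses inline asm.  */
static int
get_frm (void)
{
  return frm;
}

/* Tightly coupled scheme: backup, set, operate and restore as one unit.  */
static void
op_tight (int static_rm)
{
  int backup = frm;
  frm = static_rm;
  /* ... the rounding-mode-sensitive operation would run here ...  */
  frm = backup;
}

/* Deferred scheme (assumed): only backup and set; the caller restores.  */
static int
op_deferred (int static_rm)
{
  int backup = frm;
  frm = static_rm;
  return backup;
}

/* What get_frm () observes after a tightly coupled operation.  */
int
observe_tight (int static_rm)
{
  op_tight (static_rm);
  return get_frm ();
}

/* What get_frm () observes when the restore comes after the read.  */
int
observe_deferred (int static_rm)
{
  int backup = op_deferred (static_rm);
  int seen = get_frm ();
  frm = backup;
  return seen;
}

/* Self-check of the model: the tight scheme hides the static RM from the
   read-back, the deferred one does not, and both leave frm restored.  */
void
model_self_check (void)
{
  assert (observe_tight (RM_RUP) == RM_RNE);
  assert (observe_deferred (RM_RUP) == RM_RUP);
  assert (get_frm () == RM_RNE);
}
```

In this model a test that sets RUP and then reads frm back only sees RUP in the
deferred variant, which is the failure mode described above for the tight
save/restore.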

And indeed there are cases where the rest of the passes fail to eliminate
extraneous save/restores:
float-point-dynamic-frm-13.c now generates 3 pairs of save/restores vs. 1 save
and 3 restores.

I do have yet another implementation which is midway between the 2 extremes. It
is not as stateless as the tight save/restore and still transitions on calls and
jumps, and seems to be a better compromise.
I have a little implementation issue - the inline asm reg is clobbering the
backup reg - but get_frm () works as expected.

We can discuss some more in the call tomorrow.

-Vineet

