On 5/26/25 01:18, Robin Dapp wrote:
>> 2. OK'ish: A bunch of testcases see more reads/writes as PRE of redundant
>>    read/writes is punted to later passes which obviously needs more work.
>>
>> 3. NOK: We loose the ability to instrument local RM writes - especially in
>>    the testsuite.
>>    e.g.
>>      a. instrinsic setting a static RM
>>      b. get_frm() to ensure that happened (inline asm to read out frm)
>>
>>    The tightly coupled restore kicks in before get_frm could be emitted
>>    which fails to observe #a. This is a deal breaker for the testsuite as
>>    much of frm tests report as fail even if the actual codegen is sane.
>
> I'd say that most of the tests we have right now are written with the
> existing behavior in mind and don't necessarily translate well to a changed
> behavior.
>
> We mostly test the proper LCM and backup update behavior and backup updates
> don't happen with a local-only approach.
>
> I haven't really understood how the FRM-changing intrinsics are used.
>
> There are two extremes:
>
> - A single intrinsic using a different rounding mode and a lot of other
>   arithmetic before and after it. In that case we cannot optimize anyway
>   because the rest must operate with the global rounding mode.
>
> - A longer code sequence, like a function, that uses a different rounding
>   mode and every instrinsic being FRM-changing. In that case we would need
>   to optimize a lot of saves and restores away until we only have a single
>   save at the beginning and a single restore at the end.
>
> I suppose we don't handle the latter case well right now. But on the other
> hand it's also not very interesting as explicit fegetround (), fesetround (),
> fesetround () is what the user would/should have done anyway.
>
> So IMHO the only interesting cases are somewhere in the middle. It would
> really help to have some examples here that could tell us whether the simple
> approach leaves a lot on the table (in terms of redundant save/restore).
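To make point 3 concrete, here is a toy model in plain C. It is only a sketch and not GCC or testsuite code: the frm CSR is simulated by a global variable, and op_static_rm_tight (), op_static_rm_deferred () and restore_frm () are hypothetical names made up for illustration. Only get_frm () mirrors a name from the testsuite, and here it just reads the simulated variable instead of using inline asm.

```c
#include <assert.h>

/* RISC-V frm encodings for the five static rounding modes. */
enum { RM_RNE = 0, RM_RTZ = 1, RM_RDN = 2, RM_RUP = 3, RM_RMM = 4 };

/* Assumption: a plain variable stands in for the frm CSR. */
static unsigned frm = RM_RNE;

/* Stand-in for the testsuite helper that reads frm via inline asm. */
static unsigned get_frm (void)
{
  return frm;
}

/* Tightly coupled model: save, set, operate and restore in one unit.
   Nothing emitted after this point can see the static mode.  */
static void op_static_rm_tight (unsigned rm)
{
  assert (rm <= RM_RMM);
  unsigned backup = frm;
  frm = rm;
  /* The rounding-mode sensitive operation would run here.  */
  frm = backup;
}

/* Deferred model: only save and set here; the restore happens at a later
   transition point, modelled by an explicit restore_frm () call.  */
static unsigned backup_frm;

static void op_static_rm_deferred (unsigned rm)
{
  assert (rm <= RM_RMM);
  backup_frm = frm;
  frm = rm;
}

static void restore_frm (void)
{
  frm = backup_frm;
}
```

With frm starting at RM_RNE, calling op_static_rm_tight (RM_RUP) and then get_frm () yields RM_RNE again, so a check that expects RM_RUP fails even though the operation itself ran with the right mode. In the deferred model get_frm () sees RM_RUP until restore_frm () runs.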

As I mentioned earlier (3. above), the main issue with this approach is that
the get_frm () testsuite instrumentation is now broken. FRM is already
restored before it is read back (by inline asm), which cripples most of the
testsuite machinery.
e.g. float-point-frm-run-1.c won't even pass the check that a local RM was
set to some static value.

And indeed there are cases where the rest of the passes fail to eliminate
extraneous save/restores:
float-point-dynamic-frm-13.c now generates 3 pairs of save/restores vs.
1 save and 3 restores.

I do have yet another implementation which is midway between the 2 extremes.
It is not as stateless as the tight save/restore and still transitions on
calls and jumps, and it seems to be a better compromise. I have a little
implementation issue, where the inline asm reg is clobbering the backup reg,
but get_frm () works as expected.

We can discuss some more in the call tomorrow.

-Vineet