Hi Richard, Thanks for your comments.
On Thu, 16 May 2019 at 18:13, Richard Sandiford <richard.sandif...@arm.com> wrote: > > kugan.vivekanandara...@linaro.org writes: > > From: Kugan Vivekanandarajah <kugan.vivekanandara...@linaro.org> > > > > Inorder to fix this PR. > > * We need to change the whilelo pattern in backend > > * Change RTL CSE such that: > > - Add support for VEC_DUPLICATE > > - When handling PARALLEL rtx in cse_insn, we kill CSE defined by all the > > parallel rtx at the end. > > > > For example, with patch1, we now have rtl insn as follows: > > > > (insn 19 18 20 3 (parallel [ > > (set (reg:VNx4BI 93 [ next_mask_18 ]) > > (unspec:VNx4BI [ > > (const_int 0 [0]) > > (reg:DI 95 [ _33 ]) > > ] UNSPEC_WHILE_LO)) > > (set (reg:CC 66 cc) > > (compare:CC (unspec:SI [ > > (vec_duplicate:VNx4BI (const_int 1 [0x1])) > > (reg:VNx4BI 93 [ next_mask_18 ]) > > ] UNSPEC_PTEST_PTRUE) > > (const_int 0 [0]))) > > ]) 4244 {while_ultdivnx4bi} > > > > When cse_insn process the first, it records the CSE set in reg 93. Then > > after > > processing both the instruction in the parallel rtx, we invalidate all > > expression with reg 93 which means expression in the second instruction is > > invalidated for CSE. Attached patch relaxes this by invalidating before > > processing the > > second. > > As far as patch 1 goes: the traditional reason for using clobbers > to start with is that: > > - setting CC places a requirement on what CC must be after that instruction. > We then have to rely on REG_UNUSED notes to tell whether that value > actually matters or not. > > This was a bigger deal before df though. It might not matter as much now. > > - many passes find it harder to deal with multiple sets rather than > single sets, so it's better to keep a single_set unless we know > that both results are needed. > > It's currently combine's job to create a multiple set in cases > where both results are useful. The pattern for that already exists > (*while_ult<GPI:mode><PRED_ALL:mode>_cc), so if we do go for something > like patch 1, we should simply expand to that insn rather than adding a > new one. Note that: > > (vec_duplicate:PRED_ALL (const_int 1)) > > shouldn't appear in the insn stream. It should always be a const_vector > instead. > > From a quick check, I haven't yet found a case where setting CC up-front > hurts though, so maybe the above is out-of-date best practice and we > should set the register up-front after all, if only to reduce the number > of patterns. > > However, if possible, I think we should fix the PR in a way that works > for instructions that only optionally set the flags (which for AArch64 > is the common case). So it would be good if we could fix the PR without > needing patch 1. Do you think that combine should be able to set this. Sorry, I don't understand how we can let other passes know that this instruction will set the flags needed. Thanks, Kugan > > Thanks, > Richard > > > > > Bootstrap and regression testing for the current version is ongoing. > > > > Thanks, > > Kugan > > > > Kugan Vivekanandarajah (2): > > [PR88836][aarch64] Set CC_REGNUM instead of clobber > > [PR88836][aarch64] Fix CSE to process parallel rtx dest one by one > > > > gcc/config/aarch64/aarch64-sve.md | 9 +++- > > gcc/cse.c | 67 > > ++++++++++++++++++++++++++---- > > gcc/testsuite/gcc.target/aarch64/pr88836.c | 14 +++++++ > > 3 files changed, 80 insertions(+), 10 deletions(-) > > create mode 100644 gcc/testsuite/gcc.target/aarch64/pr88836.c