[TCWG CI] 400.perlbench slowed down by 6% after llvm: [AArch64] Remove redundant ORRWrs which is generated by zero-extend

2021-10-27 Thread ci_notify
After llvm commit a502436259307f95e9c95437d8a1d2d07294341c
Author: Jingu Kang
    [AArch64] Remove redundant ORRWrs which is generated by zero-extend
the following benchmarks slowed down by more than 2%:
- 400.perlbench slowed down by 6% from 9792 to 10354 perf samples
- 464.h264ref slowed down
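As a quick sanity check on the figures quoted in the report, the sketch below recomputes the relative slowdown from the two perf sample counts and compares it to the 2% threshold. The function names and the exact formula are assumptions made for illustration; the report does not say how the CI derives or rounds its percentages.

```python
# Hypothetical helper names; the actual TCWG CI scripts are not shown in the thread.

THRESHOLD_PERCENT = 2.0  # "slowed down by more than 2%" in the report


def slowdown_percent(before: int, after: int) -> float:
    """Relative increase in perf samples, as a percentage of the baseline."""
    if before <= 0:
        raise ValueError("baseline sample count must be positive")
    return (after - before) / before * 100.0


def is_regression(before: int, after: int,
                  threshold: float = THRESHOLD_PERCENT) -> bool:
    """True when the slowdown exceeds the reporting threshold."""
    return slowdown_percent(before, after) > threshold


if __name__ == "__main__":
    # Sample counts quoted for 400.perlbench in the report.
    pct = slowdown_percent(9792, 10354)
    print(f"400.perlbench: {pct:.2f}% slower, "
          f"flagged={is_regression(9792, 10354)}")
```

Under this formula the 400.perlbench numbers come out to roughly 5.7%, which is consistent with the "6%" in the subject line if the report rounds to the nearest integer.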

Re: [TCWG CI] 400.perlbench slowed down by 6% after llvm: [AArch64] Remove redundant ORRWrs which is generated by zero-extend

2021-10-27 Thread David Spickett
I think this is a false positive or a one-off disturbance in the benchmarking. Based on the contents of the saved temps, FastFullPelBlockMotionSearch has not changed at all (so unless perf is saying that time spent in that function and its callees went up, the cause must be something other than a code change). perl

Re: [TCWG CI] 400.perlbench slowed down by 6% after llvm: [AArch64] Remove redundant ORRWrs which is generated by zero-extend

2021-10-27 Thread Maxim Kuvyrkov
Hi David, Thanks for looking at this! I can't immediately say that this is a false positive; the performance difference reproduces in several independent builds. Looking at the save-temps, at least 400.perlbench's regexec.s (which hosts S_regmatch()) has 19 extra instructions, which are, if

Re: [TCWG CI] 433.milc:[.] mult_su3_mat_vec slowed down by 11% after llvm: [AMDGPU] Enable load clustering in the post-RA scheduler

2021-10-27 Thread Foad, Jay
> This benchmarking CI is work-in-progress, and we welcome feedback and
> suggestions at linaro-toolchain@lists.linaro.org

My feedback is: the commit in question only touched the AMDGPU backend, so I doubt it had any effect on your AArch64 benchmark.

Jay.