On 9/25/25 4:05 AM, Paul-Antoine Arras wrote:
Hi Jeff,

On 23/09/2025 22:39, Jeff Law wrote:
On 9/23/25 1:45 PM, Paul-Antoine Arras wrote:
I experimented with this patch, which makes it possible to remove a vfmv when a floating-point operand can be loaded directly from memory with a zero-stride vlse.

In terms of benchmarks, I measured the following reductions in icount:
* 503.bwaves: -4.0%
* 538.imagick: -3.3%
* 549.fotonik3d: -0.34%

However, the icount for 507.cactuBSSN increased by 0.43%. In addition, measurements on the BPI board show that the patch actually increases execution times by 5 to 11%.

This may still be beneficial for some uarchs but would have to be tunable, wouldn't it?
Is it worth proceeding with this?

It's probably worth investigating.  Do you happen to have A/B binaries handy still?  I could throw them onto our design.

Yes, you'll find attached the two binaries I built and tested on the BPI.

No clue why, but I'm getting faults running those binaries on our design. The instruction pointer seems to go off into never-never land. Given the binaries are statically linked, that's exceptionally weird.

Regardless, seems easier if I just build things here.

jeff
