The only thing I think we want for the patch (as Pan also raised last time) is the param to set those .vx costs to zero in order to ensure the tests test the right thing (--param=vx_preferred/gr2vr_cost or something).

I see, shall we start a new series for this? AFAIK, we may need some more
alignment for something like --param=xx that is exposed to the end user.

According to patchwork the tests you add pass but shouldn't they actually fail with a GR2VR cost of 2? I must be missing something.

For now the cost of GR2VR is 2. Take test vx_vadd-1-i64.c for example:
vec_dup + vadd.vv has a higher cost than vadd.vx, so late-combine
performs the combination.

Ah, I see, thanks. So vec_dup costs 1 + 2 and vadd.vv costs 1, totalling 4, while vadd.vx costs 1 + 2, making it cheaper?

IMHO vec_dup should just cost 2 (=GR2VR) rather than 3. All it does is broadcast (no additional operation), while vadd.vx performs the broadcast (cost 2) as well as an operation (cost 1). So vec_dup + vadd.vv should cost 3, the same as vadd.vx. In late combine, when comparing costs, we scale them by "frequency". The vadd.vx inside the loop should have a higher frequency, making it more costly by default.
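
To spell out the arithmetic, here is a minimal sketch in Python. It is not GCC code: the names (INSN, pair_costs) are made up for illustration, and only the values 1 and 2 come from the discussion above. It compares unscaled costs only and does not model the frequency scaling.

```python
# Illustrative only: hypothetical names, not the RISC-V backend's cost hooks.
GR2VR = 2  # cost of moving a scalar register into a vector register
INSN = 1   # base cost of a single vector operation

def pair_costs(vec_dup_cost):
    """Return (vec_dup + vadd.vv, vadd.vx) as unscaled costs."""
    vadd_vv = INSN          # plain vector-vector add
    vadd_vx = INSN + GR2VR  # add plus the implied broadcast
    return vec_dup_cost + vadd_vv, vadd_vx

current = pair_costs(INSN + GR2VR)  # vec_dup costed as 1 + 2
proposed = pair_costs(GR2VR)        # vec_dup costed as GR2VR only

print(current)   # (4, 3): vadd.vx is cheaper, so the combine happens
print(proposed)  # (3, 3): a tie before any frequency scaling
```

With the tie in the second case, the frequency scaling would decide the comparison.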

With such a change the tests wouldn't pass by default (AFAICT) and we would need a --param=xx. I wouldn't worry about exposing those details to the user for now as we're so early in the cycle and can easily iterate on it. I would suggest just adding something in order to make the tests work as expected and change things later (if needed).

--
Regards
Robin
