https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101908

--- Comment #46 from Hongtao.liu <crazylht at gmail dot com> ---
Another issue is splitting a vector load into halves or elements. The latter
requires scratch registers, which may not be available; the former doesn't
require an extra register but may still trigger STLF (store-to-load
forwarding) stalls. For the cray case, splitting into halves is equal to
splitting into elements.
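
To make the two load shapes concrete, here is a minimal sketch in C using GCC's
generic vector extension. The type name v2df and both function names are
hypothetical, chosen only for illustration; they are not taken from the cray
source or from GCC internals. With a two-element vector like this one, the
halves and the elements coincide. Whether the compiler keeps the loads split,
and whether a stall actually occurs, depends on the optimization level and the
target.

```c
#include <string.h>

/* Hypothetical 128-bit vector of two doubles (GCC/Clang vector extension). */
typedef double v2df __attribute__((vector_size(16)));

/* One 16-byte load. If p[0] and p[1] were just written by two separate
   8-byte stores, this single load spans both stores and may fail to be
   forwarded from the store buffer on some microarchitectures. */
static v2df load_whole(const double *p)
{
    v2df v;
    memcpy(&v, p, sizeof v);
    return v;
}

/* Two 8-byte loads, each the same size as one earlier scalar store.
   For a two-element vector, this is both "halves" and "elements". */
static v2df load_by_elements(const double *p)
{
    v2df v = { p[0], p[1] };
    return v;
}
```

Both functions return the same value; only the width of the memory accesses
in the source differs.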

For x86, there are the sse/256_unaligned_load_optimal tunings, which would
split 128/256-bit vector loads into halves.
