https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101908
--- Comment #37 from Hongtao.liu <crazylht at gmail dot com> ---
> There is not much value in the vectorization we do in this function
> (when manually fixing the STLF issue the speed is as good as with the
> scalar code). We cost
>
>   ray.dir.x 1 times scalar_load costs 12 in body
>   ray.dir.y 1 times scalar_load costs 12 in body

Still, from a target-related perspective, instead of adding a cost for the
STLF penalty, maybe we should just reduce the cost of scalar_load if it's
from a parm_decl, because there's probably STLF.
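
As a rough illustration of the store-to-load forwarding (STLF) pattern under discussion, here is a minimal sketch. It is not the benchmark source: the type and function names (vec3, ray, dir_len2) are hypothetical, chosen only to match the ray.dir.x / ray.dir.y accesses in the cost dump, and it assumes the x86-64 System V ABI, where a struct of this size is passed in memory.

```c
/* Hypothetical types, modelled on the field names in the cost dump. */
struct vec3 { double x, y, z; };
struct ray { struct vec3 orig, dir; };

/* ray is a by-value parameter (a parm_decl).  Under the x86-64 SysV ABI
   a 48-byte struct is passed in memory, so the caller may write its
   fields with separate 8-byte scalar stores just before the call. */
__attribute__((noinline))
double dir_len2(struct ray ray)
{
    /* Scalar loads of ray.dir.x and ray.dir.y can each be forwarded from
       the matching 8-byte store.  If the vectorizer merges them into one
       16-byte load, that load spans two smaller stores and typically
       cannot be forwarded, which is the STLF stall. */
    return ray.dir.x * ray.dir.x
         + ray.dir.y * ray.dir.y
         + ray.dir.z * ray.dir.z;
}
```

Calling dir_len2 on a ray whose dir is (1, 2, 2) returns 9; the interesting part is not the result but whether the loads from the parameter are emitted as scalar or vector loads, which can be checked in the generated assembly.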