http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51509

--- Comment #2 from rsandifo at gcc dot gnu.org <rsandifo at gcc dot gnu.org> 
2011-12-13 09:20:54 UTC ---
FWIW,

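  #include <arm_neon.h>

  void f(uint8_t *dst, const uint8_t *src)
  {
    uint8x8x4_t x;
    uint8x8x2_t y;

    x = vld4_dup_u8(src);      /* x.val[i] = src[i] in every lane */

    y.val[0] = x.val[1];       /* every lane of y is now initialised */
    y.val[1] = x.val[2];

    vst2_lane_u8(dst, y, 0);   /* stores lane 0 of y.val[0], y.val[1] */
  }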

does give the expected output.  I.e. the remaining inefficiency
from comment #1 is in the uninitialised parts of y.
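For reference, here is a portable scalar model of the values the snippet stores. It is a sketch written from the documented semantics of the two intrinsics: vld4_dup_u8 loads src[0..3] and broadcasts src[i] to all 8 lanes of x.val[i], and vst2_lane_u8 with lane 0 stores lane 0 of y.val[0] and y.val[1] to dst[0] and dst[1]. The function name model_ld4_dup_st2_lane0 is hypothetical. The model only describes the data that gets stored. It says nothing about the code GCC generates, and that generated code is what this bug is about.

```c
#include <stdint.h>

/* Hypothetical scalar model (not part of GCC or arm_neon.h) of the
   vld4_dup_u8 / vst2_lane_u8 snippet, assuming the documented
   intrinsic semantics described above.  */
static void model_ld4_dup_st2_lane0(uint8_t *dst, const uint8_t *src)
{
  uint8_t x[4][8];   /* models x.val[0..3], 8 lanes each */
  uint8_t y[2][8];   /* models y.val[0..1], fully initialised from x */

  /* vld4_dup_u8: broadcast src[i] into every lane of x.val[i].  */
  for (int i = 0; i < 4; i++)
    for (int lane = 0; lane < 8; lane++)
      x[i][lane] = src[i];

  /* y.val[0] = x.val[1]; y.val[1] = x.val[2];  */
  for (int lane = 0; lane < 8; lane++)
    {
      y[0][lane] = x[1][lane];
      y[1][lane] = x[2][lane];
    }

  /* vst2_lane_u8(dst, y, 0): store lane 0 of each vector.  */
  dst[0] = y[0][0];
  dst[1] = y[1][0];
}
```

With src = {10, 20, 30, 40}, the model stores dst[0] = 20 and dst[1] = 30, which are src[1] and src[2].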

Richard
