https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78821
--- Comment #14 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
(In reply to Uroš Bizjak from comment #13)
> (In reply to Uroš Bizjak from comment #12)
>
> > --cut here--
> > struct s { char a; char b; char c; char d; };
> >
> > void foo (struct s *__restrict a, struct s *__restrict b)
> > {
> >   a->a = b->a;
> >   a->b = b->b;
> >   a->c = ~b->c;
> >   a->d = b->d;
> > }
> > --cut here--
> >
> > This testcase can be optimized by inserting xorl mask between load and
> > store, as suggested above.
>
> Also,
>
>   a->a = 0;
>   a->b = 0;
>   a->c = b->c;
>   a->d = 0;
>
> could use andl mask, and similar
>
>   a->a = 0xff;
>   a->b = 0xff;
>   a->c = b->c;
>   a->d = 0xff;
>
> could use orl mask.

I'm not entirely sure if we can do this last thing, because the original just
reads from b->c, and b->d or b->{a,b} could trap while b->c doesn't (such as
for the 32-bit load not being aligned).

At least for the BIT_NOT_EXPR vs. missing BIT_NOT_EXPR cases, supporting it
wouldn't be that difficult with some effort; we'd need to replace the optional
BIT_NOT_EXPR with BIT_XOR_EXPR computed bitmasks based on what stores have them
and what don't (in any of the 3 spots with bit_not_p).

Trying to support something else, like

  a->a = b->a | 123;
  a->b = b->b & 12;
  a->c = b->c ^ 14;

would be harder, but in theory possible.  In any case, none of this needs the
bswap infrastructure, while some of the earlier testcases do need it.
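
For reference, the BIT_NOT_EXPR to BIT_XOR_EXPR replacement can be illustrated at the source level. This is only a minimal sketch of the equivalence such a transformation would rely on, not of how the pass would implement it. The names foo_bytewise and foo_merged are hypothetical, and the sketch assumes sizeof (struct s) == 4 and a compiler that accepts __restrict (GCC and Clang do). The mask has 0xff only in the byte whose store had the ~, and is built via offsetof so it does not depend on endianness.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct s { char a; char b; char c; char d; };

/* The testcase from comment #12: four byte-sized loads and stores,
   with a bitwise NOT on only one of them.  */
void foo_bytewise (struct s *__restrict a, struct s *__restrict b)
{
  a->a = b->a;
  a->b = b->b;
  a->c = ~b->c;
  a->d = b->d;
}

/* Hypothetical merged form: one 32-bit load, one XOR with a constant
   mask, one 32-bit store.  memcpy is used for the wide accesses so the
   sketch has no alignment or aliasing problems of its own.  */
void foo_merged (struct s *__restrict a, struct s *__restrict b)
{
  unsigned char mask_bytes[sizeof (struct s)] = { 0 };
  uint32_t mask, val;

  /* Assumption of this sketch: the struct is exactly 32 bits wide.  */
  assert (sizeof (struct s) == sizeof (uint32_t));

  /* 0xff in the byte that had the BIT_NOT_EXPR, 0 everywhere else.  */
  mask_bytes[offsetof (struct s, c)] = 0xff;
  memcpy (&mask, mask_bytes, sizeof mask);

  memcpy (&val, b, sizeof val);   /* wide load   */
  val ^= mask;                    /* the "xorl"  */
  memcpy (a, &val, sizeof val);   /* wide store  */
}
```

Both functions read all four bytes of *b, so the trapping concern raised for the andl/orl variants does not come up here; it would if foo_merged were used in place of code that only read b->c.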