https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79552
--- Comment #3 from Katsunori Kumatani <katsunori.kumatani at gmail dot com> ---
(In reply to Uroš Bizjak from comment #1)
> (In reply to Katsunori Kumatani from comment #0)
>
> > Things to note:
> >
> > This happens on GCC 6 and up to 7 only, GCC 5.4 generates correct output.
> > This happens once you turn on the -fschedule-insns option. So it's a bug
> > there.
> > If you remove the __restrict__ from the pointer in foo's parameter, the
> > problem is gone.
> > Using "asm volatile" instead of "asm" in memset_test generates correct code.
> > Using "memory" clobber in that asm also generates correct code.
> >
> >
> > Most of these workarounds are not valid in this context because they DISABLE
> > the optimizations, so it's like preventing the problem from popping up
> > instead of solving it. "memory" clobber is obviously the worst solution by
> > far as it will kill any cached memory in registers. "asm volatile" is
> > probably the least bad workaround, __restrict__ is definitely useful for
> > same types the compiler can't otherwise know they won't alias.
>
> "memory" clobber is the correct solution here, as it represents an implied
> compiler barrier. Without it, the compiler is free to schedule loads and
> stores around the "rep stosb".
>
> IOW, it is the "cached memory in registers" instructions that can be
> scheduled around the "rep stosb" without "memory" clobber.

I don't think it's the correct solution at all, as it implies arbitrary memory writes that the compiler can't know about. In my case, I explicitly told GCC which memory I was going to clobber (with the "=m"(*m) output operand). In fact, without telling it I was going to clobber that memory, GCC would remove the asm completely, because no other output is used at all! This struct-dereference trick is also described in the Extended Asm section of the manual.
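A minimal sketch of the pattern described above, assuming x86/x86-64 and GCC-compatible inline asm. `fill_bytes` is a hypothetical helper, not the testcase attached to this report, and it uses a pointer-to-array cast (the form the Extended Asm documentation shows for string operands) instead of a struct cast. Either way, the "=m" operand becomes an lvalue covering exactly the bytes `rep stosb` writes, so the asm states precisely which memory it modifies rather than using a "memory" clobber. Other targets fall back to `memset`.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: fill n bytes at dst with the byte value c. */
static inline void fill_bytes(unsigned char *dst, int c, size_t n)
{
    if (n == 0)
        return; /* avoid a zero-length variably modified array type below */
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    unsigned char *d = dst; /* rep stosb advances %rdi/%edi */
    size_t cnt = n;         /* rep stosb counts %rcx/%ecx down to zero */
    /* Casting dst to "pointer to array of n bytes" and dereferencing it
       gives an lvalue covering the whole region, so GCC knows exactly
       which bytes the asm overwrites, and only those. */
    __asm__("rep stosb"
            : "+D"(d), "+c"(cnt), "=m"(*(unsigned char (*)[n])dst)
            : "a"(c));
#else
    memset(dst, c, n); /* portable fallback on non-x86 targets */
#endif
}
```

Because the written region is an output operand, the asm is not dead as long as the caller later reads those bytes, and GCC does not need to assume that unrelated memory changed.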
Because the pointer is __restrict__, it cannot alias anything else, which is exactly why this is the optimal solution: GCC only has to reload values obtained by dereferencing that pointer (or a pointer derived from it). In this case, however, it doesn't seem to: it loads values through that pointer ahead of the asm whose "=m" output operand writes them, which is wrong.

Obviously the "dummy" asm in function foo isn't valid, since it modifies an input; it was only there for demonstration purposes.

Hope that makes sense. Richard, thank you for confirming it.
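A reduced sketch of the ordering requirement, assuming x86/x86-64 and GCC-compatible inline asm; `read_around_asm` is hypothetical and much simpler than the report's testcase. The load of `p[0]` after the asm must observe the value the asm wrote through its "+m" operand, even though `p` is `__restrict__`-qualified. Hoisting that load above the asm is the kind of reordering described above, so a correct compiler makes this function return 1.

```c
/* Hypothetical example: read p[0], let an asm modify it in place, read again. */
int read_around_asm(int *__restrict__ p)
{
    int before = p[0];
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    /* "+m" says the asm both reads and writes exactly p[0]; restrict does
       not permit moving later loads of p[0] above this statement. */
    __asm__("addl $1, %0" : "+m"(p[0]));
#else
    p[0] += 1; /* portable stand-in on non-x86 targets */
#endif
    int after = p[0]; /* must see the asm's write */
    return after - before;
}
```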