https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100865
--- Comment #5 from H.J. Lu <hjl.tools at gmail dot com> ---
A small benchmark:

https://gitlab.com/x86-benchmarks/microbenchmark/-/tree/memset/broadcast

shows that broadcast is slightly faster on Intel Core i7-8559U (about 2% by
the counts below):

[hjl@gnu-cfl-2 microbenchmark]$ make
gcc -g -I. -O2 -c -o test.o test.c
gcc -g -c -o memory.o memory.S
gcc -g -c -o broadcast.o broadcast.S
gcc -o test test.o memory.o broadcast.o
./test
memory   : 99333
broadcast: 97208
[hjl@gnu-cfl-2 microbenchmark]$
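
For readers who do not want to open the repository, the shape of such a comparison can be sketched in portable C. This is only an illustration under stated assumptions, not the linked benchmark: the real test uses hand-written assembly (memory.S and broadcast.S), presumably so the compiler cannot pick the instructions itself. The names fill_from_memory, fill_from_scalar, time_fill and pattern_in_memory, the 16-byte vector width and the 0x2a fill value are all hypothetical. At -O2 a compiler may well emit the same instructions for both variants, so timings from this sketch say nothing about the numbers above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <time.h>

#define VEC_BYTES 16  /* assumed vector width: one 128-bit register */

/* Hypothetical constant vector stored in read-only memory. */
static const uint8_t pattern_in_memory[VEC_BYTES] = {
    0x2a, 0x2a, 0x2a, 0x2a, 0x2a, 0x2a, 0x2a, 0x2a,
    0x2a, 0x2a, 0x2a, 0x2a, 0x2a, 0x2a, 0x2a, 0x2a
};

/* "memory" variant: copy the full 16-byte constant into each chunk. */
void fill_from_memory(uint8_t *dst, size_t n_chunks)
{
    assert(dst != NULL);
    for (size_t i = 0; i < n_chunks; i++)
        memcpy(dst + i * VEC_BYTES, pattern_in_memory, VEC_BYTES);
}

/* "broadcast" variant: replicate one scalar byte across each chunk. */
void fill_from_scalar(uint8_t *dst, size_t n_chunks, uint8_t value)
{
    assert(dst != NULL);
    for (size_t i = 0; i < n_chunks; i++)
        memset(dst + i * VEC_BYTES, value, VEC_BYTES);
}

/* Returns the CPU seconds spent on `iterations` fills of `dst`.
   use_scalar != 0 selects the scalar (broadcast-style) variant. */
double time_fill(uint8_t *dst, size_t n_chunks, long iterations, int use_scalar)
{
    volatile uint8_t sink = 0;  /* read the buffer so the fills have a visible use */
    clock_t start = clock();
    for (long it = 0; it < iterations; it++) {
        if (use_scalar)
            fill_from_scalar(dst, n_chunks, 0x2a);
        else
            fill_from_memory(dst, n_chunks);
        sink ^= dst[0];
    }
    clock_t end = clock();
    (void)sink;
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

A caller would allocate a buffer of n_chunks * VEC_BYTES bytes, call time_fill once with use_scalar set to 0 and once with it set to 1, and print the two results side by side, as ./test does above.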