https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63351
--- Comment #2 from Agner Fog <agner at agner dot org> ---
AVX512 allows the _memory_ source operand of almost all vector instructions to be broadcast from a scalar, for 128-, 256- and 512-bit vectors with 32- or 64-bit elements. See section 4.6.1 in "Intel® Architecture Instruction Set Extensions Programming Reference": https://software.intel.com/sites/default/files/managed/c6/a9/319433-020.pdf

This feature comes for free: there is no performance cost to broadcasting, other than making the instruction prefix longer for vector sizes smaller than 512 bits.

This feature has no explicit support in intrinsic functions, so the only way to exploit this excellent optimization opportunity without writing assembly is for the compiler to contract broadcast intrinsics into the subsequent instructions that use them. An obvious application is to store scalar constants as 32- or 64-bit values rather than as full vectors.

Often, the programmer does not know whether a variable is stored in memory or in a register. If a scalar variable is already in a register, it is better to use a broadcast instruction. If the scalar variable is in memory, it is better to contract the broadcast into the vector instruction that uses it, even if the broadcast value is used multiple times.
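A minimal C sketch of the source pattern in question, offered as an illustration rather than as GCC's actual behavior. The function name add_scalar_from_memory is hypothetical. When built with -mavx512f, the broadcast intrinsic _mm512_set1_ps(*c) feeds directly into _mm512_add_ps, which is exactly the situation where a compiler could fold the scalar load into the add as an embedded-broadcast memory operand (vaddps zmm, zmm, [mem]{1to16}) instead of materializing the broadcast in a register. Whether a given GCC version does so is not guaranteed. Without AVX-512F only the scalar loop is compiled, so the sketch builds and runs anywhere. It assumes c does not point into v.

```c
#include <stddef.h>
#if defined(__AVX512F__)
#include <immintrin.h>
#endif

/* Hypothetical example: adds the scalar *c to each of the n floats in v,
   in place. Assumes c does not point into v.
   With AVX-512F, the broadcast of *c is written inline at its single use,
   so the compiler MAY contract it into the add as a {1to16} broadcast
   memory operand; this depends on the compiler version and flags. */
void add_scalar_from_memory(float *v, size_t n, const float *c)
{
    size_t i = 0;
#if defined(__AVX512F__)
    /* Full 16-float (512-bit) chunks. */
    for (; i + 16 <= n; i += 16) {
        __m512 x = _mm512_loadu_ps(v + i);
        x = _mm512_add_ps(x, _mm512_set1_ps(*c)); /* broadcast candidate */
        _mm512_storeu_ps(v + i, x);
    }
#endif
    /* Scalar tail, or the whole array when AVX-512F is unavailable. */
    for (; i < n; i++)
        v[i] += *c;
}
```

For the constant-storage case, passing the address of a `static const float` (4 bytes) gives the compiler the option of keeping only that scalar in read-only data rather than a 64-byte vector constant, provided it contracts the broadcast as described above.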