https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63351

--- Comment #2 from Agner Fog <agner at agner dot org> ---
AVX512 allows all _memory_ source operands to broadcast from a scalar on almost
all vector instructions for 128-, 256- and 512-bit vectors with 32- or 64-bit
elements. See section 4.6.1 in "Intel® Architecture Instruction Set Extensions
Programming Reference"
https://software.intel.com/sites/default/files/managed/c6/a9/319433-020.pdf

This feature comes for free; there is no performance cost to broadcasting other
than making the instruction prefix longer for vector sizes smaller than 512.

This feature has no explicit support in intrinsic functions, so the only way to
utilize this excellent optimization opportunity without using assembly is to
contract broadcast intrinsics with subsequent instructions.
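
As a rough illustration of the pattern in question, here is a minimal sketch in
C. The function names (add_scalar, add_scalar_avx512, add_scalar_portable) are
made up for this example and are not taken from GCC or from Intel's
documentation. The target attribute and __builtin_cpu_supports are GCC
extensions, used here only so the file builds without -mavx512f and still runs
on CPUs without AVX512. Whether the broadcast is actually folded into the add
depends on the compiler; the instruction shown in the comment is what a
contracting compiler could emit, not a guaranteed result.

```c
#include <immintrin.h>
#include <stddef.h>

/* AVX512 path. The scalar lives in memory and is broadcast with
   _mm512_set1_ps right before its only use. A compiler that contracts
   the broadcast into the add could emit a single instruction such as
       vaddps zmm0, zmm1, dword ptr [rdx]{1to16}
   instead of a separate vbroadcastss followed by vaddps. */
__attribute__((target("avx512f")))
static void add_scalar_avx512(float *dst, const float *src,
                              const float *scalar, size_t n)
{
    size_t i = 0;
    for (; i + 16 <= n; i += 16) {
        __m512 v = _mm512_loadu_ps(src + i);  /* 16 floats from src      */
        __m512 b = _mm512_set1_ps(*scalar);   /* broadcast intrinsic     */
        _mm512_storeu_ps(dst + i, _mm512_add_ps(v, b));
    }
    for (; i < n; i++)                        /* remaining elements      */
        dst[i] = src[i] + *scalar;
}

/* Fallback for CPUs without AVX512F. */
static void add_scalar_portable(float *dst, const float *src,
                                const float *scalar, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i] + *scalar;
}

/* Adds *scalar to each of the n elements of src and stores into dst. */
void add_scalar(float *dst, const float *src, const float *scalar, size_t n)
{
    if (__builtin_cpu_supports("avx512f"))
        add_scalar_avx512(dst, src, scalar, n);
    else
        add_scalar_portable(dst, src, scalar, n);
}
```

Compiling this with optimization and inspecting the assembly for
add_scalar_avx512 shows whether the {1to16} memory operand form was chosen or a
separate broadcast instruction was kept.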

An obvious application is to store scalar constants as 32- or 64-bit constants
rather than as full vectors.

Often, it is not known to the programmer whether a variable is stored in memory
or in a register. If a scalar variable is already in a register then it is
better to use a broadcast instruction. If the scalar variable is in memory then
it is better to contract the broadcast into the vector instruction that uses
it, even if the broadcast value is used multiple times.
