https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84719

H.J. Lu <hjl.tools at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|WAITING                     |NEW

--- Comment #4 from H.J. Lu <hjl.tools at gmail dot com> ---
I compared __builtin_memcpy one size at a time.  Here are results in cycles:

clang 1 bytes: 17193410146
gcc 1 bytes: 15440244966
clang 2 bytes: 8997535880
gcc 2 bytes: 8147449530
clang 3 bytes: 6002276628
gcc 3 bytes: 5430387704
clang 4 bytes: 4497121282
gcc 4 bytes: 4069604454
clang 5 bytes: 3644879742
gcc 5 bytes: 3258094970
clang 6 bytes: 3045612708
gcc 6 bytes: 2728410608
clang 7 bytes: 2574110178
gcc 7 bytes: 2330365680
clang 8 bytes: 969894432
gcc 8 bytes: 6436950208

GCC is faster except for 8 byte size.
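
For anyone who wants to try a per-size comparison like this, here is a rough sketch of one way it could be set up. It is not the harness used for the numbers in this comment. It assumes a fixed total buffer copied in chunks of N bytes (the cycle counts shrinking roughly as 1/N is consistent with that, but it is a guess), and it uses clock() for CPU time rather than a cycle counter. The names DEFINE_COPY, copy_fns and run_benchmark are made up for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Hypothetical helper: copy `total` bytes in chunks of the constant size N,
   so each __builtin_memcpy call sees a compile-time-known length.
   The empty asm is a compiler barrier (GCC/clang syntax) that discourages
   the compiler from merging adjacent copies into one larger copy. */
#define DEFINE_COPY(N)                                              \
    static void copy_##N(char *dst, const char *src, size_t total)  \
    {                                                               \
        for (size_t off = 0; off + (N) <= total; off += (N)) {      \
            __builtin_memcpy(dst + off, src + off, (N));            \
            __asm__ volatile("" : : : "memory");                    \
        }                                                           \
    }

DEFINE_COPY(1)
DEFINE_COPY(2)
DEFINE_COPY(3)
DEFINE_COPY(4)
DEFINE_COPY(5)
DEFINE_COPY(6)
DEFINE_COPY(7)
DEFINE_COPY(8)

typedef void (*copy_fn)(char *, const char *, size_t);

static const copy_fn copy_fns[8] = {
    copy_1, copy_2, copy_3, copy_4, copy_5, copy_6, copy_7, copy_8
};

/* Times chunk sizes 1..8 and prints CPU seconds for each (not cycles).
   Returns how many sizes copied the data correctly, or -1 if the
   buffers could not be allocated. */
static int run_benchmark(size_t total, int reps)
{
    assert(reps > 0);

    char *src = malloc(total);
    char *dst = malloc(total);
    int ok = 0;

    if (!src || !dst) {
        free(src);
        free(dst);
        return -1;
    }

    /* Fill the source with a non-trivial pattern. */
    for (size_t i = 0; i < total; i++)
        src[i] = (char)(i * 31 + 7);

    for (int n = 1; n <= 8; n++) {
        memset(dst, 0, total);

        clock_t start = clock();
        for (int r = 0; r < reps; r++)
            copy_fns[n - 1](dst, src, total);
        double secs = (double)(clock() - start) / CLOCKS_PER_SEC;

        /* Only whole chunks are copied, so check that prefix. */
        size_t copied = total - total % (size_t)n;
        if (memcmp(dst, src, copied) == 0)
            ok++;

        printf("%d bytes: %.6f s\n", n, secs);
    }

    free(src);
    free(dst);
    return ok;
}
```

Calling run_benchmark(840 * 1000, 1000) from main, once in a binary built with gcc and once with clang at the same optimization level, gives one timing line per size. A total that is a multiple of 840 divides evenly by every size from 1 to 8, so each run copies the same number of bytes. The indirect call through copy_fns adds overhead of its own, so treat the output as a relative comparison only.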