[Bug c/92276] New: Embedded __attribute__ ((optimize("unroll-loops"))) is not working together with '__attribute__ ((__always_inline__))'
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92276

            Bug ID: 92276
           Summary: Embedded __attribute__ ((optimize("unroll-loops"))) is not working together with '__attribute__ ((__always_inline__))'
           Product: gcc
           Version: 8.3.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c
          Assignee: unassigned at gcc dot gnu.org
          Reporter: Lijian.Zhang at arm dot com
  Target Milestone: ---

Dear experts,

I'm trying to use '__attribute__ ((optimize("unroll-loops")))' to apply automatic loop unrolling to a static inline function declared with '__attribute__ ((__always_inline__))'. However, the assembly output shows that the loop is not unrolled.
The compile command is 'gcc -march=armv8-a+crc -O2 -W -Wall -mtune=cortex-a72 unroll.c -S'.

If I add the -funroll-loops option, i.e. compile with 'gcc -march=armv8-a+crc -O2 -W -Wall -mtune=cortex-a72 -funroll-loops unroll.c -S', the assembly output shows that the loop is unrolled.

If I compile without -funroll-loops and comment out '__attribute__ ((__always_inline__))', then '__attribute__ ((optimize("unroll-loops")))' also takes effect.

So it seems the two attributes do not work together, which seems unreasonable to me. I want some functions to be inlined and the loops inside those functions to be unrolled automatically, as the loop iteration count is fixed.

lijian@net-arm-d05-08:~/C/unroll$ gcc --version
gcc (Ubuntu 8.3.0-6ubuntu1~18.04.1) 8.3.0
Copyright (C) 2018 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
lijian@net-arm-d05-08:~/C/unroll$ cat /etc/os-release
NAME="Ubuntu"
VERSION="18.04.1 LTS (Bionic Beaver)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 18.04.1 LTS"
VERSION_ID="18.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=bionic
UBUNTU_CODENAME=bionic

lijian@net-arm-d05-08:~/C/unroll$ gcc -march=armv8-a+crc -O2 -W -Wall -mtune=cortex-a72 unroll.c -S

lijian@net-arm-d05-08:~/C/unroll$ lscpu
Architecture:        aarch64
Byte Order:          Little Endian
CPU(s):              64
On-line CPU(s) list: 0-63
Thread(s) per core:  1
Core(s) per socket:  32
Socket(s):           2
NUMA node(s):        4
Vendor ID:           ARM
Model:               2
Model name:          Cortex-A72
Stepping:            r0p2
BogoMIPS:            100.00
L1d cache:           32K
L1i cache:           48K
L2 cache:            1024K
L3 cache:            16384K
NUMA node0 CPU(s):   0-15
NUMA node1 CPU(s):   16-31
NUMA node2 CPU(s):   32-47
NUMA node3 CPU(s):   48-63
Flags:               fp asimd evtstrm aes pmull sha1 sha2 crc32 cpuid

#include <stdio.h>
#include <stdlib.h>
#include <arm_acle.h>

static inline __attribute__ ((__always_inline__)) __attribute__ ((optimize("unroll-loops")))
unsigned int clib_crc32c (unsigned int v, unsigned char * s, int len)
{
  for (; len >= 8; len -= 8, s += 8)
    v = __crc32cd (v, *((unsigned long *) s));

  for (; len >= 4; len -= 4, s += 4)
    v = __crc32cw (v, *((unsigned int *) s));

  for (; len >= 2; len -= 2, s += 2)
    v = __crc32ch (v, *((unsigned short *) s));

  for (; len >= 1; len -= 1, s += 1)
    v = __crc32cb (v, *((unsigned char *) s));

  return v;
}

int main (int argc, char *argv[])
{
  unsigned char s[40] = {argc, 0, argc, 0};
  unsigned char ss[32] = {argc, 0, argc, 0, argc, 0};
  unsigned int v = 0xbeefdead, vv = 0xdeadbeef;
  int len = strtol (argv[1], NULL, 10);

  for (int i = 0; i < len; i++)
    {
      v = clib_crc32c (v, s, 40);
      vv = clib_crc32c (vv, ss, 32);
    }

  printf ("%8X\n", v);
  printf ("%8X\n", vv);
  return 0;
}
[Bug c/92276] Embedded __attribute__ ((optimize("unroll-loops"))) is not working together with '__attribute__ ((__always_inline__))'
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92276

--- Comment #2 from Lijian Zhang ---
(In reply to Richard Biener from comment #1)
> Instead of trying to force the compiler to unroll with -funroll-loops you can
> use #pragma GCC unroll N on individual loops instead.
>
> The attributes should not conflict in any way.

Sorry, I made a mistake: in my case, '__attribute__ ((optimize("unroll-loops")))' should be applied to the caller, not the callee.
'#pragma GCC optimize ("unroll-loops")' also works.
Thanks for your suggestion!

#include <stdio.h>
#include <stdlib.h>
#include <arm_acle.h>

static inline __attribute__ ((__always_inline__))
unsigned int clib_crc32c (unsigned int v, unsigned char * s, int len)
{
  for (; len >= 8; len -= 8, s += 8)
    v = __crc32cd (v, *((unsigned long *) s));

  for (; len >= 4; len -= 4, s += 4)
    v = __crc32cw (v, *((unsigned int *) s));

  for (; len >= 2; len -= 2, s += 2)
    v = __crc32ch (v, *((unsigned short *) s));

  for (; len >= 1; len -= 1, s += 1)
    v = __crc32cb (v, *((unsigned char *) s));

  return v;
}

__attribute__ ((optimize("unroll-loops")))
int main (int argc, char *argv[])
{
  unsigned char s[40] = {argc, 0, argc, 0};
  unsigned char ss[32] = {argc, 0, argc, 0, argc, 0};
  unsigned int v = 0xbeefdead, vv = 0xdeadbeef;
  int len = strtol (argv[1], NULL, 10);

  v = clib_crc32c (v, s, 40);
  vv = clib_crc32c (vv, ss, 32);

  printf ("%8X\n", v);
  printf ("%8X\n", vv);
  return 0;
}
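
Comment #2 mentions that '#pragma GCC optimize ("unroll-loops")' also works but does not show it. The following is a minimal sketch of that variant, not the reporter's code. To build on any target it replaces the CRC intrinsics with a plain sum; the names sum_words and sum_fixed are hypothetical. Scoping the pragma to the caller with push_options/pop_options is a choice made for this sketch. The reporter may simply have placed the pragma at the top of the file.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for clib_crc32c: a plain sum, so the sketch does
   not depend on the AArch64 CRC intrinsics.  No optimize attribute here. */
static inline __attribute__ ((__always_inline__)) uint32_t
sum_words (uint32_t v, const uint32_t *w, size_t n)
{
  assert (w != NULL);
  for (size_t i = 0; i < n; i++)
    v += w[i];
  return v;
}

/* The optimize pragma covers only the caller defined between push_options
   and pop_options.  The always_inline callee is inlined into this function,
   so its loop is compiled as part of the caller's body. */
#pragma GCC push_options
#pragma GCC optimize ("unroll-loops")
uint32_t
sum_fixed (uint32_t v, const uint32_t *w)
{
  /* Fixed trip count of 8, mirroring the fixed lengths in the report. */
  return sum_words (v, w, 8);
}
#pragma GCC pop_options
```

Whether the loop is actually unrolled depends on the GCC version and target, so the assembly from 'gcc -O2 -S' should be inspected rather than assumed.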
[Bug c/92276] Embedded __attribute__ ((optimize("unroll-loops"))) is not working together with '__attribute__ ((__always_inline__))'
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92276

Lijian Zhang changed:

           What    |Removed                     |Added
             Status|UNCONFIRMED                 |RESOLVED
         Resolution|---                         |FIXED

--- Comment #3 from Lijian Zhang ---
In my case, the callee is defined with '__attribute__ ((__always_inline__))', and I want to apply automatic loop unrolling. The '__attribute__ ((optimize("unroll-loops")))' has to be added to the caller, not the callee.
[Bug c/92276] Embedded __attribute__ ((optimize("unroll-loops"))) is not working together with '__attribute__ ((__always_inline__))'
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92276

--- Comment #4 from Lijian Zhang ---
(In reply to Richard Biener from comment #1)
> Instead of trying to force the compiler to unroll with -funroll-loops you can
> use #pragma GCC unroll N on individual loops instead.
>
> The attributes should not conflict in any way.

Hi Richard,
Does it make sense to you that '__attribute__ ((optimize("unroll-loops")))' has to be moved to the caller if the callee is defined with '__attribute__ ((__always_inline__))'?
Thanks.
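
The per-loop pragma suggested in comment #1 might look roughly like the sketch below. This is an illustration under assumptions, not code from the bug report: mix_bytes is a hypothetical stand-in for clib_crc32c that uses a rotate-and-xor step instead of the CRC intrinsics so it builds on any target, and the unroll factor 8 is arbitrary. '#pragma GCC unroll' is documented for GCC 8 and later and must appear immediately before the loop it applies to.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for clib_crc32c: rotate left by 5, then xor in the
   next byte.  Chosen only so the sketch is portable. */
static inline __attribute__ ((__always_inline__)) uint32_t
mix_bytes (uint32_t v, const unsigned char *s, size_t len)
{
  assert (s != NULL);

  /* Request an unroll factor of 8 for this loop only.  The pragma is
     attached to the loop itself, so it does not depend on which function
     the loop ends up in after inlining. */
#pragma GCC unroll 8
  for (size_t i = 0; i < len; i++)
    v = ((v << 5) | (v >> 27)) ^ s[i];

  return v;
}
```

A caller would use it like the original, for example 'v = mix_bytes (v, s, 40);'. The design difference from the optimize attribute is granularity: the pragma targets one loop, while the attribute changes options for a whole function.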
[Bug c/87358] New: ICE when -mtune=thunderx2t99 applied
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87358

            Bug ID: 87358
           Summary: ICE when -mtune=thunderx2t99 applied
           Product: gcc
           Version: 7.3.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c
          Assignee: unassigned at gcc dot gnu.org
          Reporter: Lijian.Zhang at arm dot com
  Target Milestone: ---

lijian@armada8040-1:~/ICE.issue$ gcc --version
gcc (Ubuntu/Linaro 7.3.0-16ubuntu3) 7.3.0
Copyright (C) 2017 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

lijian@armada8040-1:~/ICE.issue$ cat /etc/os-release
NAME="Ubuntu"
VERSION="18.04.1 LTS (Bionic Beaver)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 18.04.1 LTS"
VERSION_ID="18.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=bionic
UBUNTU_CODENAME=bionic

lijian@armada8040-1:~/ICE.issue$ lscpu
Architecture:        aarch64
Byte Order:          Little Endian
CPU(s):              4
On-line CPU(s) list: 0-3
Thread(s) per core:  1
Core(s) per socket:  2
Socket(s):           2
Vendor ID:           ARM
Model:               1
Model name:          Cortex-A72
Stepping:            r0p1
CPU max MHz:         2000.
CPU min MHz:         100.
BogoMIPS:            50.00
Flags:               fp asimd evtstrm aes pmull sha1 sha2 crc32

lijian@armada8040-1:~/ICE.issue$ gcc -c l2_learn.i -O2 -march=armv8.1-a+crc+crypto -mtune=thunderx2t99
/home/lijian/tasks/dualQuad/origin/src/vnet/l2/l2_learn.c: In function ‘l2learn_node_fn_thunderx2t99’:
/home/lijian/tasks/dualQuad/origin/src/vnet/l2/l2_learn.c:430:1: internal compiler error: Segmentation fault
Please submit a full bug report,
with preprocessed source if appropriate.
See for instructions.
lijian@armada8040-1:~/ICE.issue$ gcc -c l2_learn.i -O2 -march=armv8.1-a+crc+crypto
lijian@armada8040-1:~/ICE.issue$
lijian@armada8040-1:~/ICE.issue$
[Bug c/87358] ICE when -mtune=thunderx2t99 applied
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87358

Lijian Zhang changed:

           What    |Removed                     |Added
                 CC|                            |Lijian.Zhang at arm dot com

--- Comment #1 from Lijian Zhang ---
Created attachment 44723
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=44723&action=edit
pre-process file
[Bug target/87358] [8/9 Regression] ICE when -mtune=thunderx2t99 applied
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87358

--- Comment #9 from Lijian Zhang ---
Hi Andrew,
I could only reproduce this issue with gcc-7.3.0; I was not able to reproduce the failure with gcc-8.1.0 or gcc-8.2.0.
But from your description, gcc-8.2.0 still has this issue, and it is targeted to be fixed in gcc-8.3.0?
Thanks.