[Bug c++/91043] New: GCC produces unaligned vmovdqa vector data access
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91043

Bug ID: 91043
Summary: GCC produces unaligned vmovdqa vector data access
Product: gcc
Version: 8.3.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c++
Assignee: unassigned at gcc dot gnu.org
Reporter: hhaim at cisco dot com
Target Milestone: ---

**The project**: https://github.com/cisco-system-traffic-generator/trex-core

**How to compile**: https://github.com/cisco-system-traffic-generator/trex-core/wiki#how-to-build-trex

The commit with a workaround: https://github.com/cisco-system-traffic-generator/trex-core/commit/39e7f535f96f0f5b4406db667be7bc775ce3e515

**The issue**: gcc 7/8 generate aligned vector instructions (vmovdqa) that access a variable laid out by gcc itself, and the address appears not to be aligned.

The struct is defined like this:

    static CGlobalTRex g_trex;

It includes

    CLatencyManager m_mg;

which includes

    CLatencyManagerPerPort m_ports[TREX_MAX_PORTS];

    class CLatencyManagerPerPort {
    public:
        CCPortLatency m_port;        << the crash is in the reset function of this object
        CPortLatencyHWBase * m_io;
        uint32_t m_flag;
    };

**Workaround**: Adding no-sse to this function solves the issue

    __attribute__((noinline,target("no-sse2"))) void CCPortLatency::reset(){

    void CCPortLatency::reset(){

    warning: bad breakpoint number at or near '0x585763'
    (gdb) disassemble 0x585763
    Dump of assembler code for function CCPortLatency::Create(unsigned char, unsigned short, unsigned short, unsigned short, CCPortLatency*, CLatencyPktMode*, CNatRxManager*):
       0x005856a0 <+0>:   push %rbp
       0x005856a1 <+1>:   mov %rsp,%rbp
       0x005856a4 <+4>:   push %r12
       0x005856a6 <+6>:   push %r10
       0x005856a8 <+8>:   lea 0x10(%rbp),%r10
       0x005856ac <+12>:  push %rbx
       0x005856ad <+13>:  mov %rdi,%rbx
       0x005856b0 <+16>:  sub $0x8,%rsp
       0x005856b4 <+20>:  mov (%r10),%rax
       0x005856b7 <+23>:  movb $0x0,0x3f(%rbx)
       0x005856bb <+27>:  mov 0x8(%r10),%rdi
       0x005856bf <+31>:  mov %rax,(%rbx)
       0x005856c2 <+34>:  test %rax,%rax
       0x005856c5 <+37>:  je 0x585795
       0x005856cb <+43>:  mov %esi,%eax
       0x005856cd <+45>:  mov %sil,0x31(%rbx)
       0x005856d1 <+49>:  movzbl %sil,%esi
       0x005856d5 <+53>:  not %eax
       0x005856d7 <+55>:  mov %rdi,0x8(%rbx)
       0x005856db <+59>:  and $0x1,%eax
       0x005856de <+62>:  movb $0x1,0x3e(%rbx)
       0x005856e2 <+66>:  movl $0x12345678,0x28(%rbx)
       0x005856e9 <+73>:  movl $0x1,0x38(%rbx)
       0x005856f0 <+80>:  mov %cx,0x34(%rbx)
       0x005856f4 <+84>:  mov %dx,0x32(%rbx)
       0x005856f8 <+88>:  mov %r8w,0x36(%rbx)
       0x005856fd <+93>:  mov %r9,0x10(%rbx)
       0x00585701 <+97>:  mov %al,0x19(%rbx)
       0x00585704 <+100>: mov %al,0x18(%rbx)
       0x00585707 <+103>: movq $0x0,0x1c(%rbx)
       0x0058570f <+111>: cmpb $0x0,0xc2e938(%rsi)
       0x00585716 <+118>: je 0x585721
       0x00585718 <+120>: movb $0x1,0x24(%rbx)
       0x0058571c <+124>: movb $0x1,0x24(%r9)
       0x00585721 <+129>: lea 0x100(%rbx),%r12
       0x00585728 <+136>: mov %r12,%rdi
       0x0058572b <+139>: callq 0x590320
       0x00585730 <+144>: mov 0x6a8449(%rip),%rdi   # 0xc2db80
       0x00585737 <+151>: callq 0x4c5be0
       0x0058573c <+156>: mov 0x28(%rbx),%eax
       0x0058573f <+159>: mov %r12,%rdi
       0x00585742 <+162>: vpxor %xmm0,%xmm0,%xmm0
       0x00585746 <+166>: movb $0x0,0x30(%rbx)
       0x0058574a <+170>: movq $0x0,0xc0(%rbx)
       0x00585755 <+181>: movq $0x0,0xc8(%rbx)
       0x00585760 <+192>: mov %eax,0x2c(%rbx)
    => 0x00585763 <+195>: vmovdqa %ymm0,0x40(%rbx)   << crash here
       0x00585768 <+200>: vmovdqa %ymm0,0x60(%rbx)
       0x0058576d <+205>: vmovdqa %ymm0,0x80(%rbx)
       0x00585775 <+213>: vmovdqa %ymm0,0xa0(%rbx)
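
For context: `vmovdqa` with a `%ymm` operand requires its memory operand to be 32-byte aligned and faults otherwise, so the store to `0x40(%rbx)` is only legal when `rbx + 0x40` is a multiple of 32. A minimal sketch of that check follows; the helper names are hypothetical and not part of TRex or gcc.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: true if `addr` is a multiple of `alignment` bytes.
inline bool is_aligned(std::uintptr_t addr, std::size_t alignment) {
    assert(alignment != 0);
    return addr % alignment == 0;
}

// Hypothetical helper: would `vmovdqa %ymm0, offset(base)` be a legal access?
// A 256-bit aligned move needs a 32-byte aligned effective address.
inline bool ymm_aligned_store_ok(std::uintptr_t base, std::uintptr_t offset) {
    return is_aligned(base + offset, 32);
}

// Same check for a real pointer, e.g. the `this` pointer of the object being reset.
inline bool ymm_aligned_store_ok(const void* base, std::size_t offset) {
    return ymm_aligned_store_ok(reinterpret_cast<std::uintptr_t>(base),
                                static_cast<std::uintptr_t>(offset));
}
```

In gdb, the equivalent check at the faulting instruction would be to print `$rbx + 0x40` and look at its low five bits.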
[Bug c++/91043] GCC produces unaligned vmovdqa vector data access
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91043

Hanoch Haim changed:

What|Removed |Added
Target| |x86
Host| |x86

--- Comment #1 from Hanoch Haim ---

    /usr/local/gcc-7.4/bin/gcc -v
    Using built-in specs.
    COLLECT_GCC=/usr/local/gcc-7.4/bin/gcc
    COLLECT_LTO_WRAPPER=/usr/local/gcc-7.4/libexec/gcc/x86_64-pc-linux-gnu/7.4.0/lto-wrapper
    Target: x86_64-pc-linux-gnu
    Configured with: ./configure --disable-multilib --enable-languages=c,c++ --prefix=/usr/local/gcc-7.4
    Thread model: posix
    gcc version 7.4.0 (GCC)
    [csi
[Bug c++/91043] GCC produces unaligned vmovdqa vector data access
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91043

--- Comment #2 from Hanoch Haim ---

    /usr/local/gcc-8.3/bin/gcc -v
    Using built-in specs.
    COLLECT_GCC=/usr/local/gcc-8.3/bin/gcc
    COLLECT_LTO_WRAPPER=/usr/local/gcc-8.3/libexec/gcc/x86_64-pc-linux-gnu/8.3.0/lto-wrapper
    Target: x86_64-pc-linux-gnu
    Configured with: ./configure --disable-multilib --enable-languages=c,c++ --prefix=/usr/local/gcc-8.3
    Thread model: posix
    gcc version 8.3.0 (GCC)
    [csi-kiwi-03]>
[Bug c++/91043] GCC produces unaligned vmovdqa vector data access
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91043

--- Comment #3 from Hanoch Haim ---

With the Ubuntu gcc 7.4 package, there is no bug. The gcc I built from source shows the issue. The two builds use different configuration values.
[Bug c++/91043] GCC produces unaligned vmovdqa vector data access
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91043

--- Comment #5 from Hanoch Haim ---

That was fast. The build instructions are here: https://github.com/cisco-system-traffic-generator/trex-core/wiki#how-to-build-trex

```
$ git clone g...@github.com:cisco-system-traffic-generator/trex-core.git
$ cd linux_dpdk
$ ./b configure
$ ./b build
```

With gcc 7.x/8.x, only this function is optimized incorrectly. If anything else is needed, I will provide it.
[Bug target/91043] GCC produces unaligned vmovdqa vector data access
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91043 --- Comment #7 from Hanoch Haim --- Created attachment 46541 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=46541&action=edit stateful_rx_core.ii compress ii
[Bug target/91043] GCC produces unaligned vmovdqa vector data access
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91043 --- Comment #8 from Hanoch Haim --- Created attachment 46542 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=46542&action=edit stateful_rx_core.ss
[Bug target/91043] GCC produces unaligned vmovdqa vector data access
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91043 --- Comment #9 from Hanoch Haim --- Attached. I hope this is what you are looking for.
[Bug target/91043] GCC produces unaligned vmovdqa vector data access
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91043

--- Comment #11 from Hanoch Haim ---

Thanks for the quick answer. The parent object is static (bss) and was not dynamically allocated using new/malloc. gcc sets the address of the parent object and of its children. Is there a way to solve this without removing the alignment?
[Bug target/91043] GCC produces unaligned vmovdqa vector data access
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91043

Hanoch Haim changed:

What|Removed |Added
Status|RESOLVED|REOPENED
Resolution|INVALID |---

--- Comment #12 from Hanoch Haim ---

Removing __rte_cache_aligned does not solve the issue

    diff --git a/src/time_histogram.h b/src/time_histogram.h
    index 07e66b49..26a37248 100755
    --- a/src/time_histogram.h
    +++ b/src/time_histogram.h
    @@ -133,10 +133,10 @@ private:
     uint32_t m_win_cnt;
     uint32_t m_hot_max;
     dsec_t m_max_ar[HISTOGRAM_QUEUE_SIZE]; // Array of maximum latencies for previous periods
    -uint64_t m_hcnt[HISTOGRAM_SIZE_LOG][HISTOGRAM_SIZE] __rte_cache_aligned ;
    +uint64_t m_hcnt[HISTOGRAM_SIZE_LOG][HISTOGRAM_SIZE] ;
     // Hdr histogram instance
     hdr_histogram *m_hdrh;
    -};
    +} __rte_cache_aligned;
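
A note on why this diff is not expected to change the outcome: moving the attribute from the member to the class changes the internal layout, but the class requires 64-byte alignment either way. The sketch below uses two hypothetical stand-in structs (not the real CTimeHistogram) and standard `alignas(64)` in place of `__rte_cache_aligned`, on the assumption that the macro expands to a 64-byte `aligned` attribute.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for the "before" layout: only the array member is aligned.
struct MemberAligned {
    std::uint32_t m_win_cnt;
    alignas(64) std::uint64_t m_hcnt[4];
};

// Hypothetical stand-in for the "after" layout: the whole class is aligned.
struct alignas(64) ClassAligned {
    std::uint32_t m_win_cnt;
    std::uint64_t m_hcnt[4];
};

// Both variants demand 64-byte alignment for every object of the type,
// so any code the compiler generates may rely on it in both cases.
static_assert(alignof(MemberAligned) == 64, "member alignment raises the class alignment");
static_assert(alignof(ClassAligned) == 64, "class alignment is explicit");

// Only the position of the array inside the object differs.
static_assert(offsetof(MemberAligned, m_hcnt) % 64 == 0, "array starts on a cache line");

// Runtime version of the same check for a concrete object.
template <typename T>
bool object_is_cache_aligned(const T& obj) {
    return reinterpret_cast<std::uintptr_t>(&obj) % 64 == 0;
}
```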
[Bug target/91043] GCC produces unaligned vmovdqa vector data access
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91043

--- Comment #13 from Hanoch Haim ---

One more thing: the parent object is defined with 64-byte alignment.

    class CGlobalTRex {
    ..
    } __rte_cache_aligned;

    static CGlobalTRex trex;
[Bug target/91043] GCC produces unaligned vmovdqa vector data access
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91043

--- Comment #16 from Hanoch Haim ---

The global/parent object CGlobalTRex is aligned (64B) as expected:

    (gdb) p &g_trex
    $1 = (CGlobalTRex *) 0xc365c0

Could you explain why it is a problem to define the internal objects with the same alignment as the parent (64B)?
[Bug target/91043] GCC produces unaligned vmovdqa vector data access
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91043

--- Comment #19 from Hanoch Haim ---

After some investigation, I think it is not a gcc issue; please verify. One of the internal objects is not declared with 64B alignment.

    #define __rte_cache_aligned __attribute__((__aligned__(64)))

    class CTimeHistogram {
    } __rte_cache_aligned;

    class CCPortLatency {
    public:
        CTimeHistogram m_hist;
    } __rte_cache_aligned;   <<= without this, it is not aligned, while the code generation assumed it is aligned!

    class Root {
        CCPortLatency port;
    } __rte_cache_aligned;

Is this valid? Why does the code generation assume that CCPortLatency is aligned just because one of its internal objects is aligned?
[Bug target/91043] GCC produces unaligned vmovdqa vector data access
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91043

--- Comment #20 from Hanoch Haim ---

One more thing. I would expect the issue to appear in the CTimeHistogram functions (the class defined as aligned), but the code generation issue was in the parent object (CCPortLatency). Why does the compiler assume that if one of the internal objects is defined as aligned, the parent is aligned too?
[Bug target/91043] GCC produces unaligned vmovdqa vector data access
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91043

--- Comment #22 from Hanoch Haim ---

"Of course it does, because without aligning the container you cannot have aligned members. Maximum alignment always propagates outwards."

Sorry, your answer is still not clear to me, so let me give a short example. In this case there is a discrepancy between two gcc modules:

1. The module that generates the code thinks that CCPortLatency is aligned.
2. However, the linker puts it at a non-aligned location.

    class CTimeHistogram {
    } __rte_cache_aligned;

    class CCPortLatency {
    public:
        CTimeHistogram m_hist;
    };

    class Root {
        CCPortLatency port;
    } __rte_cache_aligned;

    static Root root;

In this case, can I expect root.port to be aligned because its child (m_hist) was defined as aligned and the alignment propagates? Or should I explicitly request alignment for both?
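
The quoted rule ("maximum alignment always propagates outwards") can be checked at compile time. The following is a minimal sketch of the example above, assuming `__rte_cache_aligned` behaves like standard `alignas(64)`; the member contents (`m_hcnt`, `m_flag`, `tag`) are placeholders added only so the layout is non-trivial.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

struct alignas(64) CTimeHistogram {
    std::uint64_t m_hcnt[8];       // placeholder payload
};

struct CCPortLatency {             // no explicit alignment requested here
    std::uint32_t m_flag;          // placeholder member placed before m_hist
    CTimeHistogram m_hist;
};

struct alignas(64) Root {
    char tag;                      // placeholder member placed before port
    CCPortLatency port;
};

// The container inherits the strictest alignment of its members.
static_assert(alignof(CCPortLatency) == 64, "alignment propagates outwards");
// Inside each container, the aligned member sits at a 64-byte multiple.
static_assert(offsetof(CCPortLatency, m_hist) % 64 == 0, "m_hist offset");
static_assert(offsetof(Root, port) % 64 == 0, "port offset");

static Root root;

// Runtime confirmation for the statically allocated object.
inline bool root_port_is_aligned() {
    return reinterpret_cast<std::uintptr_t>(&root.port) % 64 == 0 &&
           reinterpret_cast<std::uintptr_t>(&root.port.m_hist) % 64 == 0;
}
```

So for a static `Root`, `root.port` is aligned without an explicit attribute on CCPortLatency; the guarantee only breaks when storage for such an object is obtained in a way that ignores `alignof`.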
[Bug target/91043] GCC produces unaligned vmovdqa vector data access
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91043

Hanoch Haim changed:

What|Removed |Added
Status|REOPENED|RESOLVED
Resolution|--- |INVALID

--- Comment #25 from Hanoch Haim ---

Hi Richard,

You were right all along. I was looking in the wrong place! I understand it now, and it is not a gcc issue. gcc 7/8 are just better than gcc 6 at code generation.

1. The alignment is contagious: gcc marks all the parent objects of such an object as aligned.
2. With a statically allocated object there is no issue.
3. The issue in my case was a dynamic allocation of a different object that includes the aligned object. The parent object is assumed to be aligned, but it was allocated dynamically (not aligned).

Thank you for the explanation.
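
Point 3 is the usual trap with over-aligned types: before C++17, a plain `new` expression is only required to return storage suitably aligned for fundamental types, which is less than 64 bytes. One possible way to allocate such a parent safely is sketched below using POSIX `posix_memalign` and placement new. This is not the fix applied in TRex (the thread does not say how it was fixed); `Stats`, `Parent`, `make_parent` and `destroy_parent` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <new>        // placement new
#include <stdlib.h>   // posix_memalign, free (POSIX)

struct alignas(64) Stats {         // hypothetical over-aligned child
    std::uint64_t counters[16];
};

struct Parent {                    // hypothetical parent; inherits alignof == 64
    std::uint32_t flag;
    Stats stats;
};

static_assert(alignof(Parent) == 64, "parent inherits the child's alignment");

// Allocate storage that honors alignof(Parent), then construct in place.
// Returns nullptr if the allocation fails.
inline Parent* make_parent() {
    void* mem = nullptr;
    if (posix_memalign(&mem, alignof(Parent), sizeof(Parent)) != 0) {
        return nullptr;
    }
    return new (mem) Parent();     // value-initialized, so all fields are zero
}

// Destroy and release an object obtained from make_parent().
inline void destroy_parent(Parent* p) {
    if (p == nullptr) {
        return;
    }
    p->~Parent();
    free(p);
}
```

With a C++17 compiler, `new Parent` itself is specified to use an alignment-aware allocation function for over-aligned types, which would make the helper pair unnecessary.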