Question about -moutline-atomics under -mcmodel=large
Hi,

Compiling the program below:

    #define STREAM_ARRAY_SIZE (1107296256)
    double a[STREAM_ARRAY_SIZE], b[STREAM_ARRAY_SIZE], c[STREAM_ARRAY_SIZE];

    typedef struct {
      volatile int locked;
    } spinlock_t;

    volatile int cnt32=0;
    volatile long cnt64=0;

    void atom(){
      __atomic_fetch_add(&cnt32, 1, __ATOMIC_RELAXED);
    }

    int main() {
      atom();
      a[13] = b[23] = c[17] = (double)cnt32;
      return 0;
    }

with a command line like:

    $ gcc -O2 a.c -o a.out -march=armv8-a -mcmodel=large

fails at link time:

    /usr/lib/gcc/aarch64-linux-gnu/11/libgcc.a(lse-init.o): in function `init_have_lse_atomics':
    (.text.startup+0x14): relocation truncated to fit: R_AARCH64_ADR_PREL_PG_HI21 against `.bss'
    /usr/lib/gcc/aarch64-linux-gnu/11/libgcc.a(ldadd_4_1.o): in function `__aarch64_ldadd4_relax':
    (.text+0x4): relocation truncated to fit: R_AARCH64_ADR_PREL_PG_HI21 against symbol `__aarch64_have_lse_atomics' defined in .bss section in /usr/lib/gcc/aarch64-linux-gnu/11/libgcc.a(lse-init.o)
    collect2: error: ld returned 1 exit status

R_AARCH64_ADR_PREL_PG_HI21 is generated against __aarch64_have_lse_atomics
in lse-init.c and lse.S. Is this a bug under -mcmodel=large, or is it
expected? The manual says:

    -mcmodel=large
        Generate code for the large code model. This makes no assumptions
        about addresses and sizes of sections. Programs can be statically
        linked only. The -mcmodel=large option is incompatible with
        -mabi=ilp32, -fpic and -fPIC.

What I am not sure about is the meaning of "statically linked".

On the other hand, the error can be fixed in lse.S by using adrp to the
:got: entry for __aarch64_have_lse_atomics, but not in lse-init.o, where
the symbol is considered a local definition in C code.

My last question is why we have __aarch64_have_lse_atomics (and some other
symbols) in both libgcc and glibc:
    #objdump -t /usr/lib64/libc.so.6 | grep "__aarch64_ldadd"
    00111460 l F .text 0030 __aarch64_ldadd8_acq
    00111370 l F .text 0030 __aarch64_ldadd8_relax
    001114c0 l F .text 0030 __aarch64_ldadd8_rel
    001113d0 l F .text 0030 __aarch64_ldadd4_acq

    #objdump -t /usr/lib/gcc/aarch64-linux-gnu/10/libgcc.a | grep "__aarch64_ldadd8"
    g F .text 0030 .hidden __aarch64_ldadd8_relax
    g F .text 0030 .hidden __aarch64_ldadd8_acq
    g F .text 0030 .hidden __aarch64_ldadd8_rel
    g F .text 0030 .hidden __aarch64_ldadd8_acq_rel

Any idea when each version will be used?

Thanks,
bin
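
For readers following the linker error, here is a minimal sketch of the runtime-dispatch shape that an outline-atomics helper has: it loads a flag set at startup and then takes one of two paths. The load of that flag is the access that needs a relocation against the flag's address, and R_AARCH64_ADR_PREL_PG_HI21 can only reach about +/-4 GiB, while the three arrays in the test program total roughly 24.75 GiB. The names `have_lse_atomics`, `init_have_lse` and `ldadd4_relax` are hypothetical stand-ins, not the libgcc code (the real helpers live in lse.S as assembly), and both branches below use GCC builtins purely for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for __aarch64_have_lse_atomics: a flag in .bss
   that every helper call has to load before choosing a path. */
static bool have_lse_atomics;

/* Hypothetical stand-in for the startup probe; here the caller simply
   tells us whether to pretend the CPU has LSE. */
static void init_have_lse(bool cpu_has_lse)
{
    have_lse_atomics = cpu_has_lse;
}

/* Sketch of a relaxed 4-byte fetch-and-add helper.
   Returns the value stored at *ptr before the addition. */
static int ldadd4_relax(int val, volatile int *ptr)
{
    assert(ptr != NULL);

    if (have_lse_atomics) {
        /* The real helper would issue a single LSE instruction here. */
        return __atomic_fetch_add(ptr, val, __ATOMIC_RELAXED);
    }

    /* The real helper would use a load/store-exclusive retry loop;
       a compare-and-swap loop plays that role in this sketch. */
    int old = __atomic_load_n(ptr, __ATOMIC_RELAXED);
    while (!__atomic_compare_exchange_n(ptr, &old, old + val, true,
                                        __ATOMIC_RELAXED, __ATOMIC_RELAXED))
        ;  /* on failure, old is refreshed with the current value */
    return old;
}
```

Calling `init_have_lse(false)` and then `ldadd4_relax(1, &counter)` exercises the fallback loop; with `init_have_lse(true)` the same call takes the other branch. Either way the flag has to be addressed first, which is where the truncated relocation arises.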
Re: cache optimization through sampling hardware event
On Tue, Nov 10, 2020 at 3:04 PM 172060045 <172060...@hdu.edu.cn> wrote:
>
> Hi,
>
> Recently, I was interested in GCC AutoFDO optimization, which works by
> sampling specific PMU events on production machines and using those profiles
> to guide optimization. In this way, information such as cache misses can also
> be obtained through sampling, so can we implement feedback-directed cache
> optimization according to this idea?

IIUC, the original AutoFDO doesn't do icache optimization based on
icache-miss perf data, but I think this is possible to do. One point is
that the linker needs to be involved in order to reorder functions, not
only GCC itself. TLB misses might be handled too.

Thanks,
bin

>
> ARMv8.2 provides SPE features, which can obtain accurate LLC miss, TLB miss,
> branch miss and remote access information through perf; it may be helpful to
> the idea.
>
>
> Is anyone doing relevant work? I would be grateful if someone could offer
> any advice, thx!
Re: State of AutoFDO in GCC
On Fri, Apr 23, 2021 at 4:16 AM Martin Liška wrote:
>
> On 4/22/21 9:58 PM, Eugene Rozenfeld via Gcc wrote:
> > GCC documentation for AutoFDO points to create_gcov tool that converts
> > perf.data file into gcov format that can be consumed by gcc with
> > -fauto-profile (https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html,
> > https://gcc.gnu.org/wiki/AutoFDO/Tutorial).
> >
> > I noticed that the source code for create_gcov has been deleted from
> > https://github.com/google/autofdo on April 7. I asked about that change in
> > that repo and got the following reply:
> >
> > https://github.com/google/autofdo/pull/107#issuecomment-819108738
> >
> > "Actually we didn't use create_gcov and haven't updated create_gcov for
> > years, and we also didn't have enough tests to guarantee it works (It was
> > gcc-4.8 when we used and verified create_gcov). If you need it, it is
> > welcomed to update create_gcov and add it to the repository."
> >
> > Does this mean that AutoFDO is currently dead in gcc?
>
> Hello.
>
> Yes. I know that even basic test cases have been broken for years in GCC.
> It's new to me that create_gcov was removed.
>
> I tend to send a patch to GCC that will remove AutoFDO from GCC.
> I know Bin spent some time working on AutoFDO; has he come up with something?

Hi Martin,

I haven't touched this part for quite some time. I have no objection to
removing it from GCC. However, I do have a general concern that, because
of fewer users/developers, it's less likely and harder for new features
to land in GCC. I have no idea if this is a real problem or how to fix
it. OTOH, maybe removing rotten features, making GCC more(?) concise, and
improving existing features that GCC is doing well is the right thing.

Thanks,
bin

>
> Martin
>
> >
> > Thanks,
> >
> > Eugene
> >
>
Question about builtin_free doesn't read memory
Hi,

In function ref_maybe_used_by_call_p_1, there is the code snippet below:

    /* The following builtins do not read from memory.  */
    case BUILT_IN_FREE:
    ...
      return false;

I am confused, because the free function does read from (and even write
to) the memory pointed to by the passed argument. I am thinking of DSE
optimizations like:

    *ptr = value;
    free(ptr);
    *ptr = undef;

Does GCC take advantage of UB to eliminate the first store to ptr if free
is considered not to read memory?

Thanks,
bin
Re: Question about builtin_free doesn't read memory
On Sun, Nov 28, 2021 at 4:11 PM Jan Hubicka wrote:
>
> > Hi,
> > In function ref_maybe_used_by_call_p_1, there is below code snippet
> > /* The following builtins do not read from memory. */
> > case BUILT_IN_FREE:
> > ...
> >    return false;
> >
> > I am confused because free function does read from (and even write to)
> > memory pointed to by passed argument?
>
> Free is a black box and makes the memory pointed to disappear without
> actually worrying what values it holds. We rely on the fact that we do not
> see the free implementation and do not worry about the details of its
> implementation (which probably has some sort of linked list before the
> address ptr points to).
> > I am thinking DSE optimizations like:
> > *ptr = value;
> > free(ptr);
> > *ptr = undef;
> > Does GCC take advantage of UB to eliminate the first store to ptr if
> > free is considered not reading memory?
>
> The aim here is to optimize out *ptr = value.
>
> Honza

Thanks very much for explaining.

Thanks,
bin
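
To make the point concrete, here is a small sketch of the two situations discussed in this thread. The function names are hypothetical and chosen only for illustration. In the first function nothing reads `*ptr` between the store and `free`, so, because `free` is modelled as not reading the pointed-to memory, the store is a candidate for dead store elimination. In the second, the value is read back before `free`, so the store is not dead. Whether a given GCC version actually removes the store depends on version and options; this only shows the source pattern.

```c
#include <assert.h>
#include <stdlib.h>

/* The store to *ptr is never read before the memory is freed.
   With free treated as not reading memory, the store is dead. */
int store_then_free(int value)
{
    int *ptr = malloc(sizeof *ptr);
    assert(ptr != NULL);   /* sketch: treat allocation failure as fatal */

    *ptr = value;          /* candidate for dead store elimination */
    free(ptr);
    return 0;
}

/* Here the stored value is read back before free, so the store
   feeds a later use and is not dead. */
int store_read_then_free(int value)
{
    int *ptr = malloc(sizeof *ptr);
    assert(ptr != NULL);

    *ptr = value;
    int seen = *ptr;       /* use of the stored value */
    free(ptr);
    return seen;
}
```

Both functions behave identically whether or not the first store is removed, which is what makes the elimination valid: no conforming program can observe the value held by freed memory.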