Question about -moutline-atomics under -mcmodel=large

2020-09-17 Thread Bin.Cheng via Gcc
Hi,
Compiling the program below:

#define STREAM_ARRAY_SIZE (1107296256)
double a[STREAM_ARRAY_SIZE],
   b[STREAM_ARRAY_SIZE],
   c[STREAM_ARRAY_SIZE];

typedef struct {
  volatile int locked;
} spinlock_t;

volatile int cnt32=0;
volatile long cnt64=0;

void atom(){
  __atomic_fetch_add(&cnt32, 1,__ATOMIC_RELAXED);
}

int main()
{
  atom();
  a[13] = b [23] = c [17] = (double)cnt32;
  return 0;
}

with command line like:
$ gcc -O2 a.c -o a.out -march=armv8-a -mcmodel=large
/usr/lib/gcc/aarch64-linux-gnu/11/libgcc.a(lse-init.o): in function
`init_have_lse_atomics':
(.text.startup+0x14): relocation truncated to fit:
R_AARCH64_ADR_PREL_PG_HI21 against `.bss'
/usr/lib/gcc/aarch64-linux-gnu/11/libgcc.a(ldadd_4_1.o): in function
`__aarch64_ldadd4_relax':
(.text+0x4): relocation truncated to fit: R_AARCH64_ADR_PREL_PG_HI21
against symbol `__aarch64_have_lse_atomics' defined in .bss section in
/usr/lib/gcc/aarch64-linux-gnu/11/libgcc.a(lse-init.o)
collect2: error: ld returned 1 exit status
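
For scale, a rough calculation (a sketch, not part of the original report): the three arrays alone need about 24.75 GiB of .bss, while an ADRP-based reference, which is what R_AARCH64_ADR_PREL_PG_HI21 relocates, can only reach about +/-4 GiB from the instruction. The helper names below are made up for illustration, and whether a particular libgcc object really ends up out of range depends on the final link layout.

```c
#include <stdint.h>

/* Element count from the test program above. */
#define STREAM_ARRAY_SIZE 1107296256ULL

/* ADRP forms a page address within +/-4 GiB of the instruction. */
#define ADRP_REACH_BYTES (1ULL << 32)

/* Hypothetical helper: bytes of .bss taken by a[], b[] and c[]. */
uint64_t stream_bss_bytes(void)
{
  return 3ULL * STREAM_ARRAY_SIZE * sizeof(double);
}

/* Hypothetical helper: nonzero if the arrays alone exceed ADRP's reach. */
int arrays_exceed_adrp_reach(void)
{
  return stream_bss_bytes() > ADRP_REACH_BYTES;
}
```

With 8-byte doubles the arrays take 26575110144 bytes, several times the ADRP reach, so any ADRP that has to cross them cannot be resolved.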

R_AARCH64_ADR_PREL_PG_HI21 is generated for the reference to
__aarch64_have_lse_atomics in lse-init.c and in lse.S (for example in
__aarch64_ldadd4_relax).  Is this a bug under -mcmodel=large, or is it
expected?  The documentation says:
-mcmodel=large
Generate code for the large code model. This makes no assumptions
about addresses and sizes of sections. Programs can be statically
linked only. The -mcmodel=large option is incompatible with
-mabi=ilp32, -fpic and -fPIC.

What I am not sure about is the meaning of "statically linked".

On the other hand, the error can be fixed in lse.S by using adrp with
the :got: entry for __aarch64_have_lse_atomics, but not in lse-init.o,
where the symbol is treated as a local definition in C code.
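
For context on why the helper references that symbol at all: as far as can be told from lse.S and the error output above, each outline helper first loads the flag __aarch64_have_lse_atomics (the adrp in that load is what carries the relocation) and then either issues the LSE instruction or falls back to a load-exclusive/store-exclusive loop. Below is a rough C model of that dispatch, not the real implementation. Names ending in _sketch are invented here, and the compare-and-swap loop merely stands in for the LDXR/STXR sequence.

```c
#include <stdbool.h>

/* Stand-in for __aarch64_have_lse_atomics, which libgcc sets at startup
   (init_have_lse_atomics in lse-init.c). */
bool have_lse_atomics_sketch;

/* Rough C model of __aarch64_ldadd4_relax: returns the old value. */
int ldadd4_relax_sketch(int val, int *ptr)
{
  if (have_lse_atomics_sketch)
    /* LSE path: a single LDADD instruction in the real helper. */
    return __atomic_fetch_add(ptr, val, __ATOMIC_RELAXED);

  /* Fallback path: the CAS loop stands in for LDXR/STXR. */
  int old = __atomic_load_n(ptr, __ATOMIC_RELAXED);
  while (!__atomic_compare_exchange_n(ptr, &old, old + val, true,
                                      __ATOMIC_RELAXED, __ATOMIC_RELAXED))
    ;  /* 'old' is refreshed on failure; retry */
  return old;
}
```

Both paths must be able to address the flag, which is why every helper object in libgcc.a carries the same relocation.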

The last question is: why do we have __aarch64_have_lse_atomics (and
some other symbols) in both libgcc and glibc?

#objdump -t /usr/lib64/libc.so.6 | grep "__aarch64_ldadd"

00111460 l F .text  0030 __aarch64_ldadd8_acq
00111370 l F .text  0030 __aarch64_ldadd8_relax
001114c0 l F .text  0030 __aarch64_ldadd8_rel
001113d0 l F .text  0030 __aarch64_ldadd4_acq


#objdump -t /usr/lib/gcc/aarch64-linux-gnu/10/libgcc.a | grep "__aarch64_ldadd8"

 g F .text  0030 .hidden __aarch64_ldadd8_relax
 g F .text  0030 .hidden __aarch64_ldadd8_acq
 g F .text  0030 .hidden __aarch64_ldadd8_rel
 g F .text  0030 .hidden __aarch64_ldadd8_acq_rel

Any idea when each version will be used?

Thanks,
bin


Re: cache optimization through sampling hardware event

2020-11-18 Thread Bin.Cheng via Gcc
On Tue, Nov 10, 2020 at 3:04 PM 172060045 <172060...@hdu.edu.cn> wrote:
>
> Hi,
>
> Recently, I was interested in GCC AutoFDO optimization, which works by 
> sampling specific PMU event on production machines and using those profiles 
> to guide optimization. In this way, information such as cache miss can also 
> be obtained through sampling, so can we implement feedback-directed cache 
> optimization according to this idea?
IIUC, the original AutoFDO doesn't do icache optimization based on
icache-miss perf data, but I think it is possible.  One point is that
the linker needs to be involved in order to reorder functions, not
only GCC itself.  TLB misses might be handled too.

Thanks,
bin
>
> ARMv8.2 provides SPE features, which can obtain accurate LLC miss, TLB miss, 
> branch miss and remote access information through perf, it may be helpful to 
> the idea.
>
>
> Is anyone doing relevant work? I would be grateful if someone could offer
> any advice, thx!


Re: State of AutoFDO in GCC

2021-04-22 Thread Bin.Cheng via Gcc
On Fri, Apr 23, 2021 at 4:16 AM Martin Liška  wrote:
>
> On 4/22/21 9:58 PM, Eugene Rozenfeld via Gcc wrote:
> > GCC documentation for AutoFDO points to create_gcov tool that converts 
> > perf.data file into gcov format that can be consumed by gcc with 
> > -fauto-profile (https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html, 
> > https://gcc.gnu.org/wiki/AutoFDO/Tutorial).
> >
> > I noticed that the source code for create_gcov has been deleted from 
> > https://github.com/google/autofdo on April 7. I asked about that change in 
> > that repo and got the following reply:
> >
> > https://github.com/google/autofdo/pull/107#issuecomment-819108738
> >
> > "Actually we didn't use create_gcov and haven't updated create_gcov for
> > years, and we also didn't have enough tests to guarantee it works (It was
> > gcc-4.8 when we used and verified create_gcov). If you need it, it is
> > welcomed to update create_gcov and add it to the repository."
> >
> > Does this mean that AutoFDO is currently dead in gcc?
>
> Hello.
>
> Yes. I know that even basic test cases have been broken for years in GCC.
> It's new to me that create_gcov was removed.
>
> I am inclined to send a patch that removes AutoFDO from GCC.
> I know Bin spent some time working on AutoFDO; has he come up with something?
Hi Martin,
I haven't touched this part for quite some time.  I have no objection
to removing it from GCC.  However, I do have a general concern that,
with fewer users/developers, it becomes less likely and harder for new
features to land in GCC.  I have no idea whether this is a real problem
or how to fix it.  OTOH, maybe removing rotten features, making GCC
more(?) concise, and improving the existing features that GCC does
well is the right thing.

Thanks,
bin
>
> Martin
>
> >
> > Thanks,
> >
> > Eugene
> >
>


Question about builtin_free doesn't read memory

2021-11-27 Thread Bin.Cheng via Gcc
Hi,
In function ref_maybe_used_by_call_p_1, there is the code snippet below:
 /* The following builtins do not read from memory.  */
 case BUILT_IN_FREE:
 ...
   return false;

I am confused, because the free function does read from (and even
write to) the memory pointed to by its argument.
I am thinking of DSE optimizations like:
  *ptr = value;
  free(ptr);
  *ptr = undef;
Does GCC take advantage of UB to eliminate the first store to ptr if
free is considered not reading memory?
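
A compilable form of the pattern, as a sketch only: the function names are invented here, and the store after free is left out because executing it would be undefined behaviour. In the first function nothing reads *ptr between the store and free(), which is the case the question is about; in the second the value is read back first. Whether a given GCC version actually deletes the first store is best checked by comparing the assembly at -O2.

```c
#include <stdlib.h>

/* Nothing reads *ptr between the store and free(), so if free() is
   modelled as not reading the pointed-to memory, the store is dead. */
void store_then_free(int *ptr, int value)
{
  *ptr = value;   /* candidate for dead store elimination */
  free(ptr);
}

/* Here the stored value is read back before free(), so it stays
   observable through the return value. */
int store_read_then_free(int *ptr, int value)
{
  *ptr = value;
  int seen = *ptr;
  free(ptr);
  return seen;
}
```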

Thanks,
bin


Re: Question about builtin_free doesn't read memory

2021-11-28 Thread Bin.Cheng via Gcc
On Sun, Nov 28, 2021 at 4:11 PM Jan Hubicka  wrote:
>
> > Hi,
> > In function ref_maybe_used_by_call_p_1, there is the code snippet below:
> >  /* The following builtins do not read from memory.  */
> >  case BUILT_IN_FREE:
> >  ...
> >    return false;
> >
> > I am confused, because the free function does read from (and even
> > write to) the memory pointed to by its argument.
>
> Free is a black box and makes the memory pointed to disappear without
> actually worrying about what values it holds. We rely on the fact that we
> do not see the free implementation and do not worry about the details of
> its implementation (which probably has some sort of linked list before the
> address ptr points to).
> > I am thinking of DSE optimizations like:
> >   *ptr = value;
> >   free(ptr);
> >   *ptr = undef;
> > Does GCC take advantage of UB to eliminate the first store to ptr if
> > free is considered not reading memory?
>
> The aim here is to optimize out *ptr = value.
>
> Honza
Thanks very much for explaining.

Thanks,
bin