Re: Some questions about the gcc __sync intrinsics

2016-03-10 Thread Edward Nevill
> /usr/local/linare-gcc-5.2/bin/gcc -S -O3 -march=armv8-a+lse test.c > > add_int: > ldaddal w0, w0, [x1] > add w2, w0, w0 > mov w0, w2 > ret Am I going mad, or does this just return the contents of the memory location * 2. ldaddal w0, w0, [x1] Returns the

Re: Some questions about the gcc __sync intrinsics

2016-03-10 Thread Edward Nevill
Hi Yvan, On 9 March 2016 at 13:22, Yvan Roux wrote: > Hi Ed, > > On 9 March 2016 at 14:02, Edward Nevill wrote: >> Hi, >> >> >> Why the extra (unnecessary?) memory barrier? > > This is because Linaro gcc-5-branch is in sync with FSF gcc-5-branch >

Some questions about the gcc __sync intrinsics

2016-03-09 Thread Edward Nevill
Hi, I have been comparing the stock gcc 5.2 and the Linaro 5.2 (Linaro GCC 5.2-2015.11-1) and have noticed a difference with the __sync intrinsics. Here is the simple test case --- cut here --- int add_int(int add_value, int *dest) { return __sync_add_and_fetch(dest, add_value); } --- cut here
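
For reference, a minimal sketch of the semantics this test case relies on, assuming only the documented behaviour of the GCC __sync builtins: __sync_add_and_fetch returns the new value (old contents plus add_value), while __sync_fetch_and_add returns the old contents. The names fetch_then_add and check_add_int_semantics are hypothetical helpers added for illustration and do not appear in the original post.

```c
#include <assert.h>

/* The test case from the post: atomically add add_value to *dest
   and return the NEW value stored in memory. */
int add_int(int add_value, int *dest)
{
    return __sync_add_and_fetch(dest, add_value);
}

/* Hypothetical companion for comparison: atomically add add_value
   to *dest but return the OLD contents of memory. */
int fetch_then_add(int add_value, int *dest)
{
    return __sync_fetch_and_add(dest, add_value);
}

/* Hypothetical self-check of the two return-value conventions. */
void check_add_int_semantics(void)
{
    int x = 5;
    assert(add_int(3, &x) == 8);        /* new value: 5 + 3 */
    assert(x == 8);
    assert(fetch_then_add(3, &x) == 8); /* old value, before the add */
    assert(x == 11);
}
```

Under these semantics, correct code for add_int must return the old memory contents plus add_value, never twice the old contents.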

Re: gcc 5.2 code quality

2016-03-02 Thread Edward Nevill
On Wed, 2016-03-02 at 14:25 +, Renato Golin wrote: > On 2 March 2016 at 11:35, Edward Nevill wrote: > > cmp x2, 8 <<< (1) > > (1) If count as a 64 bit unsigned is <= 8 then it is probably still <= 8 as > > a 32 bit unsigned. > >

gcc 5.2 code quality

2016-03-02 Thread Edward Nevill
Hi, I have just switched to gcc 5.2 from 4.9.2 and the code quality does seem to have improved significantly. For example, it now seems much better at using ldp/stp and it seems to have stopped gratuitous use of the SIMD registers. However, I still have a few whinges:-) See attached copy.c / co

-O2 faster than -O3

2015-04-02 Thread Edward Nevill
Hi, I did some tests on the following function --- CUT HERE --- int fibo(int n) { if (n < 2) return 1; return (fibo(n-2) + fibo(n-1)); } --- CUT HERE --- and I discovered that it is faster at -O2 than at -O3. This is with gcc 4.9.2. Looking at the disassembly I see it is using FP registers to hol

Does gcc know about ldp

2015-03-05 Thread Edward Nevill
Hi, I have been trying to persuade gcc to generate the ldp instruction without success. I have tried many combinations; below is an example. --- cut here --- #define LDP(x,y,p) { \ struct vec { long x, y; } data; \ data = *(struct vec *)p; p += 2; \ x = data.x; y = data.

Problems porting Boost.Context to aarch64 gcc

2015-02-24 Thread Edward Nevill
Hi, I am trying to port the Boost.Context library (from www.boost.org) to aarch64 gcc and have come across a gnarly problem. Boost.Context essentially does co-routine style context switching. It has a structure f_context which it uses to save and restore contexts. The structure f_context conta