Issue with __int128 in powerpc64le
Hi,

I'm observing weird behaviour on PowerPC64 Little Endian that does not
seem to occur on other architectures supporting __int128. The following
code, when compiled with -O1, generates wrong output.

-- test.c
#include <stdio.h>

typedef unsigned __int128 uint128_t;

#define PRINT(value) \
  { union u { uint128_t i; unsigned long long l[2]; } _t = { .i = value }; \
    fprintf(stderr, "%s => <%016llx, %016llx>\n", #value, _t.l[1], _t.l[0]); }

__attribute__((noinline))
uint128_t get_int(uint128_t value, unsigned int num_bytes)
{
  uint128_t mask = ~(uint128_t)0;
  mask <<= (uint128_t)(8 * num_bytes); /* assuming 1 byte = 8 bits */
  mask = ~mask;
  value &= mask;

  return value;
}

int main(int argc, char* argv[])
{
  uint128_t x = 0;
  x = get_int(10, /* num_bytes */ 1);

  PRINT(x);

  return 0;
}
-- end of test.c

$ gcc -v
Using built-in specs.
COLLECT_GCC=/home/Computational/rferrer/gcc/install/bin/gcc
COLLECT_LTO_WRAPPER=/home/Computational/rferrer/gcc/install/libexec/gcc/powerpc64le-unknown-linux-gnu/5.0.0/lto-wrapper
Target: powerpc64le-unknown-linux-gnu
Configured with: ../gcc-src/configure
--prefix=/home/Computational/rferrer/gcc/install
--enable-languages=c,c++,fortran
--with-gmp=/home/Computational/rferrer/gcc/install
--with-mpfr=/home/Computational/rferrer/gcc/install
--with-mpc=/home/Computational/rferrer/gcc/install --enable-multiarch
--disable-multilib
Thread model: posix
gcc version 5.0.0 20141218 (experimental) (GCC)

$ make
gcc -O0 -o test.O0 test.c
./test.O0
x => <, 000a>
gcc -O1 -o test.O1 test.c
./test.O1
x =>

It looks like GCC somehow forgets to perform the logical not in the
optimized version.

I'd file a PR but I'm not sure if I'm triggering some sort of
undefined behaviour in the shift/not/and sequence in 'get_int'.

Is this a bug in GCC or in the code above?

Kind regards,

-- 
Roger Ferrer Ibáñez
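On the undefined-behaviour question: as far as I know, a left shift of an
unsigned value is well defined in C as long as the shift count is smaller
than the width of the type, so num_bytes = 1 (a shift by 8 bits) should be
fine, while num_bytes >= 16 would not be. The following is only a minimal
sketch of a variant that guards that edge case and prints the two halves
using shifts instead of a union, so that the output does not depend on the
memory layout. The names get_int_checked and print_u128 are hypothetical and
not part of the report above, and the sketch assumes a compiler that supports
__int128.

```c
#include <stdio.h>

typedef unsigned __int128 uint128_t;

/* Hypothetical variant of get_int: keeps the low num_bytes bytes of value.
   Shifting a 128-bit value by 128 bits or more would be undefined, so
   num_bytes >= 16 is handled separately and returns value unchanged. */
uint128_t get_int_checked(uint128_t value, unsigned int num_bytes)
{
  if (num_bytes >= sizeof(uint128_t))
    return value;

  uint128_t mask = ~(uint128_t)0;
  mask <<= 8 * num_bytes; /* assuming 1 byte = 8 bits */
  return value & ~mask;
}

/* Hypothetical helper: prints the high and low 64-bit halves, extracted
   with shifts rather than through a union. */
void print_u128(const char *name, uint128_t v)
{
  unsigned long long hi = (unsigned long long)(v >> 64);
  unsigned long long lo = (unsigned long long)v;
  fprintf(stderr, "%s => <%016llx, %016llx>\n", name, hi, lo);
}
```

Calling print_u128("x", get_int_checked(10, 1)) from main at both -O0 and -O1
gives a second data point that does not rely on the union in PRINT.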
Re: Issue with __int128 in powerpc64le
On Fri, Dec 19, 2014 at 10:13 AM, Roger Ferrer Ibáñez wrote:
> I'd fill a PR but I'm not sure if I'm triggering some sort of
> undefined behaviour in the shift/not/and sequence in 'get_int'.
>
> Is this a bug in GCC or in the code above?

Please open a bug in either case.

Richard.
Re: Issue with __int128 in powerpc64le
Done, it is PR64358.

Kind regards,

2014-12-19 12:21 GMT+01:00 Richard Biener:
> Please open a bug in either case.
>
> Richard.

-- 
Roger Ferrer Ibáñez
GCC 4.8.4 Released
The GNU Compiler Collection version 4.8.4 has been released.

GCC 4.8.4 is the fourth bug-fix release containing important fixes for
regressions and serious bugs in GCC 4.8.3 with over 80 bugs fixed since
the previous release. This release is available from the FTP servers
listed at:

  http://www.gnu.org/order/ftp.html

Please do not contact me directly regarding questions or comments about
this release. Instead, use the resources available from
http://gcc.gnu.org.

As always, a vast number of people contributed to this GCC release --
far too many to thank them individually!
GCC 4.8.5 Status Report (2014-12-19)
Status
======

GCC 4.8.4 has been released, the branch is again open for regression
bugfixes and documentation fixes.

GCC 4.8.5 could be tentatively released in April next year.

Quality Data
============

Priority          #   Change from Last Report
--------        ---   -----------------------
P1                0   +-   0
P2              123   +-   0
P3                6   +-   0
--------        ---   -----------------------
Total           129   +-   0

Previous Report
===============

https://gcc.gnu.org/ml/gcc/2014-12/msg00080.html
Re: Missing git tags for released GCC
On Tue, Nov 25, 2014 at 3:12 AM, Jonathan Wakely wrote:
> On 16 November 2014 at 15:51, H.J. Lu wrote:
>> Hi,
>>
>> Git tags are missing for GCC 4.9.1, 4.9.2, 4.8.3 and 4.7.4.
>
> I can't create the tags but these are the release commits:
>
> git tag gcc-4_9_2-release c1283af40b65f1ad862cf5b27e2d9ed10b2076b6
> git tag gcc-4_9_1-release c6fa1b412663593960e6240eb66d82fa41a1fa0b
> git tag gcc-4_8_3-release 6bbf0dec66c0e719b06cd2fe67559fda6df09000
> git tag gcc-4_7_4-release ae10eb82fe34c18640ad65c2ab94ffc53f349315

I added them together with gcc-4_8_4-release.

Thanks.

-- 
H.J.
GCC document link broken and some document questions
Hi,

I found that the links for the "GNAT User's Guide" are broken for
versions 4.8.4 and 4.9.2. The links are
https://gcc.gnu.org/onlinedocs/gcc-4.9.2/gnat_ugn_unw/
and
https://gcc.gnu.org/onlinedocs/gcc-4.8.4/gnat_ugn_unw/

The guide can be found at
https://gcc.gnu.org/onlinedocs/gcc-4.9.2/gnat_ugn/
and
https://gcc.gnu.org/onlinedocs/gcc-4.8.4/gnat_ugn/
but the documents there have some Texinfo markup left over (HTML only;
the PDF seems OK).

All the other links, to the PDF/PostScript files and the HTML tarball,
have the same problem: "_unw" needs to be removed from the filename.

Also, for GCC 4.9.2:
https://gcc.gnu.org/onlinedocs/gcc-4.9.2/gnat_ugn/Switches-for-gcc.html#Switches-for-gcc
mentions the flag "-fdump-xref", but it does not work:

cc1: error: unrecognized command line option ‘-fdump-xref’

This seems to be a mismatch in the documentation.

For the generated PDF and PS files, I am not sure why the table of
contents is placed at the end of the file. It defeats its purpose.

Thanks
-- 
Qun-Ying
RE: Instruction scheduler with respect to prefetch instructions.
-----Original Message-----
From: paul_kon...@dell.com [mailto:paul_kon...@dell.com]
Sent: Saturday, December 13, 2014 9:46 PM
To: Ajit Kumar Agarwal
Cc: vmaka...@redhat.com; l...@redhat.com; richard.guent...@gmail.com; gcc@gcc.gnu.org; Vinod Kathail; Shail Aditya Gupta; Vidhumouli Hunsigida; Nagaraju Mekala
Subject: Re: Instruction scheduler with respect to prefetch instructions.

> On Dec 13, 2014, at 5:22 AM, Ajit Kumar Agarwal wrote:
>
> Hello All:
>
> Since the prefetch instruction have no direct consumers in the code
> stream, they provide considerable freedom to the Instruction scheduler.
> They are typically assigned lower priorities than most of the
> instructions in the code stream.
> This tends to cause all the prefetch instructions to be placed
> together in the final schedule. This causes the performance
> Degradations by placing them in clumps rather than evenly spreading
> the prefetch instructions.
>
> The evenly spreading the prefetch instruction gives better speed up
> ratios as compared to be placing in clumps for dirty Misses.

>>I can believe that’s true for some processors; is it true for all of
>>them? I have the impression that some MIPS processors don’t mind
>>clumped prefetches, so long as you don’t exceed the limit on total
>>number of concurrently pending memory accesses.

I think clumped prefetches are okay on architectures where prefetch
placement is decided based on prefetch distance, as long as they do not
exceed the limit on concurrently pending memory accesses. However,
because prefetches have no direct consumers in the code stream, the
clumps generated by the scheduler sometimes do exceed that limit unless
the scheduler has some special mechanism, such as information passed
from the prefetch generation phase to the scheduler.

Because prefetches have no direct consumers, the instruction scheduler
is free to clump them at the end of basic blocks, which invalidates the
placement that the prefetch algorithm chose based on the prefetch
distance. Prefetch algorithms based on prefetch distance take care of
the cases where clumped prefetches degrade performance due to dirty
misses.

My question: is there any special way of handling prefetch instructions
in the instruction scheduler to overcome the above?

Thanks & Regards
Ajit

paul
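To make the term "prefetch distance" concrete at the source level, here is a
minimal sketch using GCC's __builtin_prefetch. It only illustrates the
intended placement, one prefetch per iteration for data a fixed number of
iterations ahead; it says nothing about where a given scheduler will finally
place the instruction, which is the question raised above. The function name
sum_with_prefetch and the value of PREFETCH_DISTANCE are assumptions chosen
for illustration, not taken from the thread.

```c
#include <stddef.h>

/* Hypothetical tuning knob: how many elements ahead to prefetch. */
#define PREFETCH_DISTANCE 16

/* Sums an array, issuing one software prefetch per iteration for the
   element PREFETCH_DISTANCE iterations ahead, so prefetches are spread
   evenly through the loop rather than issued in a clump. */
long sum_with_prefetch(const long *a, size_t n)
{
  long sum = 0;
  for (size_t i = 0; i < n; i++) {
    if (i + PREFETCH_DISTANCE < n)
      __builtin_prefetch(&a[i + PREFETCH_DISTANCE], 0 /* read */, 3 /* keep in cache */);
    sum += a[i];
  }
  return sum;
}
```

The prefetch has no consumer in the loop body, which is why nothing in the
data dependences ties it to this position once instructions are reordered.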