clearing many bytes variables (could use one machine instruction)?
Hello All,

With a recently compiled gcc-trunk on x86-64/linux, I am compiling the following example:

#
/* file testmanychar.c */
extern void g (int, char *, char *, char *);

void
f (void)
{
  char x0, x1, x2, x3, x4, x5, x6, x7;
  /* assuming x0 is word aligned on a x86_64, and variables are bytes in
     memory, we could clear all the variables in one machine instruction */
  x0 = x1 = x2 = x3 = x4 = x5 = x6 = x7 = (char) 0;
  g (10, &x0, &x1, &x2);
  g (20, &x2, &x3, &x4);
  g (30, &x4, &x5, &x6);
  g (40, &x6, &x7, &x0);
}
#

My intuition was that GCC could place x0 at a 64-bit aligned address, x1 immediately after it, and so on, and then clear all eight bytes at once using a single machine instruction [clearing a 64-bit word].

But this is not the case, since

gcc-trunk -S -O3 -fverbose-asm testmanychar.c

gives the following code:

#
	.type	f, @function
f:
.LFB0:
	.cfi_startproc
	movq %rbx, -24(%rsp) #,
	movq %rbp, -16(%rsp) #,
	movl $10, %edi #,
	movq %r12, -8(%rsp) #,
	subq $40, %rsp #,
	.cfi_def_cfa_offset 48
	leaq 13(%rsp), %rbx #, tmp58
	.cfi_offset 12, -16
	.cfi_offset 6, -24
	.cfi_offset 3, -32
	leaq 15(%rsp), %rbp #, tmp60
	leaq 14(%rsp), %rdx #, tmp59
	leaq 11(%rsp), %r12 #, tmp61
	movb $0, 8(%rsp) #, x7
	movb $0, 9(%rsp) #, x6
	movq %rbx, %rcx # tmp58,
	movq %rbp, %rsi # tmp60,
	movb $0, 10(%rsp) #, x5
	movb $0, 11(%rsp) #, x4
	movb $0, 12(%rsp) #, x3
	movb $0, 13(%rsp) #, x2
	movb $0, 14(%rsp) #, x1
	movb $0, 15(%rsp) #, x0
	call g #
	leaq 12(%rsp), %rdx #, tmp62
	movq %r12, %rcx # tmp61,
	movq %rbx, %rsi # tmp58,
	movl $20, %edi #,
	leaq 9(%rsp), %rbx #, tmp64
	call g #
	leaq 10(%rsp), %rdx #, tmp65
	movq %rbx, %rcx # tmp64,
	movq %r12, %rsi # tmp61,
	movl $30, %edi #,
	call g #
	leaq 8(%rsp), %rdx #, tmp68
	movq %rbp, %rcx # tmp60,
	movq %rbx, %rsi # tmp64,
	movl $40, %edi #,
	call g #
	movq 16(%rsp), %rbx #,
	movq 24(%rsp), %rbp #,
	movq 32(%rsp), %r12 #,
	addq $40, %rsp #,
	.cfi_def_cfa_offset 8
	ret
	.cfi_endproc
.LFE0:
	.size f, .-f
	.ident "GCC: (GNU) 4.5.0 20100309 (experimental) [trunk revision 157303]"
#

With

gcc-trunk -S -O3 -fverbose-asm -march=core2 -mtune=core2 testmanychar.c

I am still getting

##
# options passed: testmanychar.c -march=core2 -mtune=core2 -O3
	.globl f
	.type f, @function
f:
.LFB0:
	.cfi_startproc
	movq %rbx, -24(%rsp) #,
	movq %rbp, -16(%rsp) #,
	movq %r12, -8(%rsp) #,
	movl $10, %edi #,
	subq $40, %rsp #,
	.cfi_def_cfa_offset 48
	leaq 13(%rsp), %rbx #, tmp58
	.cfi_offset 12, -16
	.cfi_offset 6, -24
	.cfi_offset 3, -32
	leaq 15(%rsp), %rbp #, tmp60
	leaq 11(%rsp), %r12 #, tmp61
	leaq 14(%rsp), %rdx #, tmp59
	movq %rbx, %rcx # tmp58,
	movq %rbp, %rsi # tmp60,
	movb $0, 8(%rsp) #, x7
	movb $0, 9(%rsp) #, x6
	movb $0, 10(%rsp) #, x5
	movb $0, 11(%rsp) #, x4
	movb $0, 12(%rsp) #, x3
	movb $0, 13(%rsp) #, x2
	movb $0, 14(%rsp) #, x1
	movb $0, 15(%rsp) #, x0
	call g #
	leaq 12(%rsp), %rdx #, tmp62
	movq %r12, %rcx # tmp61,
	movq %rbx, %rsi # tmp58,
	movl $20, %edi #,
	leaq 9(%rsp), %rbx #, tmp64
	call g #
	leaq 10(%rsp), %rdx #, tmp65
	movq %rbx, %rcx # tmp64,
	movq %r12, %rsi # tmp61,
	movl $30, %edi #,
	call g #
	leaq 8(%rsp), %rdx #, tmp68
	movq %rbp, %rcx # tmp60,
	movq %rbx, %rsi # tmp64,
	movl $40, %edi #,
	call g #
	movq 16(%rsp), %rbx #,
	movq 24(%rsp), %rbp #,
	movq 32(%rsp), %r12 #,
	addq $40, %rsp #,
	.cfi_def_cfa_offset 8
	ret
	.cfi_endproc
.LFE0:
	.size f, .-f
	.ident "GCC: (GNU) 4.5.0 20100309 (experimental) [trunk revision 157303]"

I was hoping that

	movb $0, 8(%rsp) #, x7
	movb $0, 9(%rsp) #, x6
	movb $0, 10(%rsp)
Re: legitimate parallel make check?
IainS writes: > on my mere 8 cores :-) I'm still getting sporadic Bus errors on: > > "make -k -j8 check RUNTESTFLAGS ... " > [from the command line on a bootstrapped clean trunk @157307] > > so there's something else wrong somewhere... Probably make gets SIGBUS, which obviously won't be in the runtest output. > ... I guess a target-related issue ({i686,powerpc}-apple-darwin9) Or simply a hardware issue? Rainer -- - Rainer Orth, Center for Biotechnology, Bielefeld University
legitimate parallel make check?
For the sake of keeping information segregated in log files (and balancing out test times), I tried the following (from within a regression-testing script):

make -k check-gcc-c RUNTESTFLAGS="$runTEST" >${STATE}/$svnRevision-check-c-log.txt 2>&1 &
sleep 2
make -k check-gcc-c++ RUNTESTFLAGS="$runTEST" >${STATE}/$svnRevision-check-c++-log.txt 2>&1 &
sleep 2
make -k check-gcc-fortran RUNTESTFLAGS="$runTEST" >${STATE}/$svnRevision-check-fortran-log.txt 2>&1 &
sleep 2
make -k check-gcc-objc RUNTESTFLAGS="$runTEST" >${STATE}/$svnRevision-check-objc-log.txt 2>&1 &
sleep 2
make -k check-gcc-obj-c++ RUNTESTFLAGS="$runTEST" >${STATE}/$svnRevision-check-obc++-log.txt 2>&1 &
sleep 2
make -k check-target-libstdc++-v3 RUNTESTFLAGS="$runTEST" >${STATE}/$svnRevision-check-target-libstdc++-log.txt 2>&1 &
sleep 2
make -k check-target-libgomp RUNTESTFLAGS="$runTEST" >${STATE}/$svnRevision-check-target-libgomp-log.txt 2>&1 &

Firstly, this actually breaks the make process permanently (I suspect a race to make site.exp) unless the parallel jobs are launched with a time delay. By "permanently" I mean that after trying this, "make check-gcc" fails from the command line until config.status is re-run in the gcc dir.

Secondly, if I use "-k -j3 check-gcc-c" together with "-k -j2 check-gcc-fortran" (for example), I get sporadic bus errors:

/bin/sh: line 1: 67618 Bus error `if [ -f ${srcdir}/../dejagnu/runtest ] ; then echo ${srcdir}/../dejagnu/runtest ; else echo runtest; fi` --tool gfortran --target_board=unix\{-m32,-m64\} $runtestflags
make[2]: [check-parallel-gfortran] Error 138 (ignored)

Large chunks of tests go silently missing at this point...

===
Am I trying something that is unsupported, or is this a bug?
===

"make -k -j8 check" is not particularly helpful because (a) it tests gmp/mpfr/mpc every time, and (b) any redirected output is hard to scan for problems.

Iain
Re: legitimate parallel make check?
On 9 Mar 2010, at 12:10, Rainer Orth wrote:

> IainS writes:
>
>> Am I trying something that is unsupported - or is this is a bug?
>>
>> ===
>>
>> "make -k -j8 check " is not particularly helpful (a) because it tests
>> gmp/mpfr/mpc every time (b) any redirected output is hard to scan for
>> problems.
>
> What you're trying is far too complicated: just use -k -j check from the
> toplevel and forget about the make check output, which, as you say, is
> interleaved and useless, just like the make -j bootstrap output.

OK.

> Instead, run make mail-report.log afterwards and check that.

Although, I note that contrib/test_summary only reports what gets into the *.sum files, i.e. tests that complete or time out. It doesn't log problems in the actual make process itself; to find those, one still has to trawl through the interleaved stuff.

It would be nice to allow the apparently independent targets [e.g. gcc-c, fortran, c++ etc.] to be (explicitly) make-checked in parallel.

thanks,
Iain
Re: legitimate parallel make check?
IainS writes:

>> Instead, run make mail-report.log afterwards and check that.
>
> Although, I note that contrib/test_summary only reports what gets into the
> *.sum files -- i.e. tests that complete or timeout.
> It doesn't log problems in the actual make process itself - to find those
> one still has to trawl through the interleaved stuff.

No, you don't have to. You need to look at the corresponding *.log files instead; most of this isn't in the regular make check output anyway.

> It would be nice to allow the apparently independent targets [e.g. gcc-c,
> fortran,c++ etc.] to be (explicitly) make-checked in parallel.

This is exactly what happens when you invoke make -j -k check, provided that the parallelism specified is big enough (and the machine can handle the load). Otherwise, you may end up running (say) the c testsuite in parallel, which isn't bad either.

As I wrote before, I'm going to use this on an (effectively) 64-core machine and hope to use all cores in parallel :-)

	Rainer

-- 
Rainer Orth, Center for Biotechnology, Bielefeld University
Re: clearing many bytes variables (could use one machine instruction)?
On Tue, Mar 9, 2010 at 3:58 PM, Basile Starynkevitch wrote: > Hello All, > > With a recently compiled gcc-trunk on x86-64/linux, I am compiling the > folllowing example: > > # > > /* file testmanychar.c */ > extern void g (int, char *, char *, char *); > > void > f (void) > { > char x0, x1, x2, x3, x4, x5, x6, x7; > /* assuming x0 is word aligned on a x86_64, and variables are bytes in > memory, we could clear all the variables in one machine instruction */ > x0 = x1 = x2 = x3 = x4 = x5 = x6 = x7 = (char) 0; > g (10, &x0, &x1, &x2); > g (20, &x2, &x3, &x4); > g (30, &x4, &x5, &x6); > g (40, &x6, &x7, &x0); > } > > # > > My intuition was that GCC could store x0 on a 64 bits aligned byte, and x1 > immediately after, and so one, and clear all the eight bytes at once using a > single machine instruction [clearing a 64 bits word]. > > But this is not the case, since > gcc-trunk -S -O3 -fverbose-asm testmanychar.c > gives the following code > > # > .type f, @function > f: > .LFB0: > .cfi_startproc > movq %rbx, -24(%rsp) #, > movq %rbp, -16(%rsp) #, > movl $10, %edi #, > movq %r12, -8(%rsp) #, > subq $40, %rsp #, > .cfi_def_cfa_offset 48 > leaq 13(%rsp), %rbx #, tmp58 > .cfi_offset 12, -16 > .cfi_offset 6, -24 > .cfi_offset 3, -32 > leaq 15(%rsp), %rbp #, tmp60 > leaq 14(%rsp), %rdx #, tmp59 > leaq 11(%rsp), %r12 #, tmp61 > movb $0, 8(%rsp) #, x7 > movb $0, 9(%rsp) #, x6 > movq %rbx, %rcx # tmp58, > movq %rbp, %rsi # tmp60, > movb $0, 10(%rsp) #, x5 > movb $0, 11(%rsp) #, x4 > movb $0, 12(%rsp) #, x3 > movb $0, 13(%rsp) #, x2 > movb $0, 14(%rsp) #, x1 > movb $0, 15(%rsp) #, x0 > call g # > leaq 12(%rsp), %rdx #, tmp62 > movq %r12, %rcx # tmp61, > movq %rbx, %rsi # tmp58, > movl $20, %edi #, > leaq 9(%rsp), %rbx #, tmp64 > call g # > leaq 10(%rsp), %rdx #, tmp65 > movq %rbx, %rcx # tmp64, > movq %r12, %rsi # tmp61, > movl $30, %edi #, > call g # > leaq 8(%rsp), %rdx #, tmp68 > movq %rbp, %rcx # tmp60, > movq %rbx, %rsi # tmp64, > movl $40, %edi #, > call g # > movq 
16(%rsp), %rbx #, > movq 24(%rsp), %rbp #, > movq 32(%rsp), %r12 #, > addq $40, %rsp #, > .cfi_def_cfa_offset 8 > ret > .cfi_endproc > .LFE0: > .size f, .-f > .ident "GCC: (GNU) 4.5.0 20100309 (experimental) [trunk revision > 157303]" > > # > > > With > gcc-trunk -S -O3 -fverbose-asm -march=core2 -mtune=core2 testmanychar.c > I am getting still > > ## > > # options passed: testmanychar.c -march=core2 -mtune=core2 -O3 > > .globl f > .type f, @function > f: > .LFB0: > .cfi_startproc > movq %rbx, -24(%rsp) #, > movq %rbp, -16(%rsp) #, > movq %r12, -8(%rsp) #, > movl $10, %edi #, > subq $40, %rsp #, > .cfi_def_cfa_offset 48 > leaq 13(%rsp), %rbx #, tmp58 > .cfi_offset 12, -16 > .cfi_offset 6, -24 > .cfi_offset 3, -32 > leaq 15(%rsp), %rbp #, tmp60 > leaq 11(%rsp), %r12 #, tmp61 > leaq 14(%rsp), %rdx #, tmp59 > movq %rbx, %rcx # tmp58, > movq %rbp, %rsi # tmp60, > movb $0, 8(%rsp) #, x7 > movb $0, 9(%rsp) #, x6 > movb $0, 10(%rsp) #, x5 > movb $0, 11(%rsp) #, x4 > movb $0, 12(%rsp) #, x3 > movb $0, 13(%rsp) #, x2 > movb $0, 14(%rsp) #, x1 > movb $0, 15(%rsp) #, x0 > call g # > leaq 12(%rsp), %rdx #, tmp62 > movq %r12, %rcx # tmp61, > movq %rbx, %rsi # tmp58, > movl $20, %edi #, > leaq 9(%rsp), %rbx #, tmp64 > call g # > leaq 10(%rsp), %rdx #, tmp65 > movq %rbx, %rcx # tmp64, > movq %r12, %rsi # tmp61, > movl $30, %edi #, > cal
Re: legitimate parallel make check?
On 3/9/2010 4:28 AM, IainS wrote:
> It would be nice to allow the apparently independent targets [e.g. gcc-c,fortran,c++ etc.] to be (explicitly) make-checked in parallel.

On certain targets, it has long been necessary to do this explicitly, submitting make check-gcc, make check-fortran and make check-g++ separately. Perhaps a script could be written that detects when the build is complete and then submits the separate serial make check jobs together.

-- 
Tim Prince
Re: Puzzle about mips pipeline description
"Amker.Cheng" writes: > In gcc internal, section 16.19.8, there is a rule about > "define_insn_reservation" like: > "`condition` defines what RTL insns are described by this > construction. You should re- > member that you will be in trouble if `condition` for two or more > different `define_insn_ > reservation` constructors if TRUE for an insn". > > While in mips.md, pipeline description for each processor are > included along with > generic.md, which providing a fallback for processor without specific > pipeline description. > > Here is the PUZZLE: Won't `define_insn_reservation` constructors >>From both specific > processor's and the generic md file break the rule mentioned before? > For example, It seems > conditions for the r3k_load(from 3000.md) and generic_load(from > generic.md) are both TRUE > for lw insn. Looks to me like the MIPS backend is relying on undocumented behaviour: when multiple define_insn_reservation statements match, the automata will match them in order. It would be better if the define_insn_reservation in generic.md checked "cpu". Ian
Re: legitimate parallel make check?
On 9 Mar 2010, at 12:46, Rainer Orth wrote:

> IainS writes:
>>> Instead, run make mail-report.log afterwards and check that.
> ...
> suite in parallel, which isn't bad either.
>
> As I wrote before, I'm going to use this on an (effectively) 64-core
> machine and hope to achieve to use all cores in parallel :-)

On my mere 8 cores :-) I'm still getting sporadic Bus errors on:

"make -k -j8 check RUNTESTFLAGS ... "
[from the command line on a bootstrapped clean trunk @157307]

so there's something else wrong somewhere...

... I guess a target-related issue ({i686,powerpc}-apple-darwin9)

(note that this error does *not* get into the .log files, AFAICS)

Iain
Re: legitimate parallel make check?
IainS writes: > Am I trying something that is unsupported - or is this is a bug? > > === > > "make -k -j8 check " is not particularly helpful (a) because it tests > gmp/mpfr/mpc every time (b) any redirected output is hard to scan for > problems. What you're trying is far too complicated: just use -k -j check from the toplevel and forget about the make check output, which, as you say, is interleaved and useless, just like the make -j bootstrap output. Instead, run make mail-report.log afterwards and check that. Rainer -- - Rainer Orth, Center for Biotechnology, Bielefeld University
Re: [trans-mem] __sync_add_and_fetch_8 on ia32
On 02/19/2010 04:18 AM, Paolo Carlini wrote: > On 02/19/2010 01:07 PM, Paolo Carlini wrote: >> On 02/19/2010 12:38 PM, Aldy Hernandez wrote: >> >>> Since the TM library on ia32 is built with -m486, which doesn't >>> have 64-bit atomic operations, should we... >>> >>> a) Build with -m586 and above. b) Have _ITM_transactionId_t be >>> 32-bit quantities. c) Come up with some locking solution for >>> archs that don't have 64-bit atomic operations. >>> >> My personal opinion on this is that unless we are 100% sure that >> performance are decent anyway (thus 3), or 32-bits are certainly >> enough (thus 2), we should not waste time supporting anything older >> than -m586 for the transactional memory work. >> > ... a slightly more sophisticated variant of b) would be using > uint64_t for 64-bit targets and uint32_t for 32-bit targets, should > make everybody happy. There's a correctness issue in the library interface -- there's no clean way to handle that value overflowing. Rather than make the library interface way more complex, it seemed far easier to simply use a 64-bit value, which we simply assume won't overflow. I'd be fine with -m586 (which is how things will get built on e.g. Fedora), but that isn't going to help other 32-bit targets that miss that operation. We'll need a configure test and a mutex if needed. r~
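A minimal sketch of the "configure test and a mutex if needed" idea, assuming a hypothetical next_transaction_id() helper and tid_counter global (the real libitm code differs). As a stand-in for a configure test, it keys off GCC's predefined __GCC_HAVE_SYNC_COMPARE_AND_SWAP_8 macro, which is defined when the target supports the 8-byte __sync builtins; a real configure check would more likely try to link a __sync_add_and_fetch on a uint64_t directly.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t transaction_id_t;   /* stands in for _ITM_transactionId_t */

static transaction_id_t tid_counter; /* hypothetical global counter */

#if defined (__GCC_HAVE_SYNC_COMPARE_AND_SWAP_8)
/* 8-byte __sync builtins available (e.g. i586 and later, x86-64):
   a single atomic add-and-fetch.  */
static transaction_id_t
next_transaction_id (void)
{
  transaction_id_t id = __sync_add_and_fetch (&tid_counter, 1);
  assert (id != 0);  /* 64 bits are assumed never to wrap */
  return id;
}
#else
/* Fallback for targets such as -march=i486: serialize with a mutex.  */
#include <pthread.h>

static pthread_mutex_t tid_lock = PTHREAD_MUTEX_INITIALIZER;

static transaction_id_t
next_transaction_id (void)
{
  pthread_mutex_lock (&tid_lock);
  transaction_id_t id = ++tid_counter;
  pthread_mutex_unlock (&tid_lock);
  assert (id != 0);  /* 64 bits are assumed never to wrap */
  return id;
}
#endif
```

Both branches keep the 64-bit interface rth argues for; only the mechanism used to make the increment atomic changes with the target.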
Re: clearing many bytes variables (could use one machine instruction)?
On Tue, Mar 09, 2010 at 05:25:35PM +0600, Alexey Salmin wrote:
> On Tue, Mar 9, 2010 at 3:58 PM, Basile Starynkevitch wrote:
> > My intuition was that GCC could store x0 on a 64 bits aligned byte, and x1
> > immediately after, and so one, and clear all the eight bytes at once using
> > a single machine instruction [clearing a 64 bits word].
>
> Thing you're talking about is a kind of vectorization. If you want to
> simplify the vectorizing for the compiler you should store your data
> in arrays instead of separate variables and use loops to process your
> data instead of separate operations.

I knew about vectorization (though I am not an expert on it), and I didn't mention it because, in my view, this is not exactly vectorization. And I don't want to use an array of bytes for that purpose; I want to have a rather large number of individual variables.

Perhaps I should give more context about this question. This is inside the MELT branch of GCC, which happens to generate C code (I won't explain again why; look in the wiki http://gcc.gnu.org/wiki/MiddleEndLispTranslator etc). In MELT I have a sophisticated pattern-matching machinery, and pattern-matching expressions are translated into a complex bunch of C code (which will be compiled by the C compiler on the host machine, probably some GCC version).
But I believe I have a design bug in today's translation of MELT pattern matching (the translator mostly works, except for "or" sub-patterns; in the unlikely case the MELT translator fails to translate a complex pattern match, it fails "nicely" with a fatal_error. AFAIK, it does not generate wrong C code). I am right now working on this bug (by implementing a new pattern-matching translator in gcc-melt-branch/gcc/melt/warmelt-normatch.melt).

The new scheme for pattern-matching translation involves having a boolean flag for each sub-pattern. Each such flag is initially cleared, may be set to true at most once, and may be tested several times. But I will have a lot of such flags (several dozens or hundreds). So basically, I intend to generate a big C block like

typedef char melt_flag_t; /* or perhaps bool */

{
  /* typically dozens or even a hundred cleared variables */
  melt_flag_t f0=0, f1=0, f2=0, f3=0, f4=0, f5=0, f6=0, f7=0, f8=0, f9=0;

  /* some complex code with conditionals and forward gotos,
     where each above flag is set at most once, and may be tested many times */
}

I don't want to use an array because, in some cases (at least with -O3 for the GCC compiling that generated code, and when the translated pattern match is simple enough that we have only a dozen or two flags), I can imagine that GCC will optimize that and put some variables in registers, etc. So I asked myself whether GCC can efficiently clear a set of variables (BTW, this is mandatory in Java), hence my initial question.

In my view, aggregating several small scalar variables inside a larger word is not exactly vectorization (my perception of vectorization is that it deals with arrays). But I am not an expert on these issues, so I may be wrong.

Still, my initial question is not that important. Having a hundred movb $0... instructions is not a big deal today.
But I was a bit surprised that GCC did not optimize that, since clearing local variables is common in many programs (and mandatory in Java).

Cheers.

PS. BTW, Zbigniew Chamski and I will give a talk in French about GCC plugins at Solutions Linux in Paris, March 17th.

-- 
Basile STARYNKEVITCH http://starynkevitch.net/Basile/
email: basilestarynkevitchnet mobile: +33 6 8501 2359
8, rue de la Faiencerie, 92340 Bourg La Reine, France
*** opinions {are only mines, sont seulement les miennes} ***
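One alternative a reader might weigh against dozens of separate melt_flag_t variables (this is not what MELT generates; flag_word_t, flag_test and flag_set_once are hypothetical names) is packing up to 64 flags into a single uint64_t. Clearing them all is then one assignment, and the "set at most once" rule can be checked with an assert.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical: up to 64 one-bit flags packed into one word.
   "flags = 0" clears all of them at once.  */
typedef uint64_t flag_word_t;

/* Nonzero if flag IDX is set.  */
static inline int
flag_test (flag_word_t w, unsigned idx)
{
  assert (idx < 64);
  return (int) ((w >> idx) & 1u);
}

/* Returns W with flag IDX set; asserts it was clear, mirroring the
   "set at most once" property of the generated code.  */
static inline flag_word_t
flag_set_once (flag_word_t w, unsigned idx)
{
  assert (idx < 64);
  assert (!flag_test (w, idx));
  return w | ((flag_word_t) 1 << idx);
}
```

Whether GCC keeps such a word in a register depends, as with individual chars, on its address never being taken. More than 64 flags would need several words, and testing a flag costs a shift and a mask instead of a byte load, so this trades one cost for another rather than being a clear win.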
Re: [trans-mem] __sync_add_and_fetch_8 on ia32
On Fri, Feb 19, 2010 at 3:38 AM, Aldy Hernandez wrote: > libitm.so won't build on ia32 because of an undefined reference to > __sync_add_and_fetch_8. > > This is the build failure Pearly encountered here: > > http://gcc.gnu.org/ml/gcc-patches/2009-09/msg01201.html > > What happens is that we try to do a __sync_and_fetch() on global_tid, > which is of type _ITM_transactionId_t, a 64-bit quantity: > > typedef uint64_t _ITM_transactionId_t; /* Transaction identifier */ > > Since the TM library on ia32 is built with -m486, which doesn't have > 64-bit atomic operations, should we... > > a) Build with -m586 and above. > b) Have _ITM_transactionId_t be 32-bit quantities. > c) Come up with some locking solution for archs that don't have > 64-bit atomic operations. > FWIW, as of 2010-02-28, gcc will default to i686 unless you configure gcc with i[345]86-os. -- H.J.
Incorrect casting?
Hi,

the following piece of code produces different output on svn trunk and gcc-4_4-branch:

#include <stdio.h>

int main()
{
  struct {
    unsigned bar:1;
  } foo;
  foo.bar = 0x1;
  printf("%08x\n", (unsigned char)(foo.bar * 0xfe));
  printf("%08x\n", (unsigned char)(foo.bar * 0xff));
  return 0;
}

monst...@yggdrasil /data/tmp $ gcc -v
Using built-in specs.
Target: x86_64-unknown-linux-gnu
Configured with: ../gcc-svn/configure --enable-stage1-languages-c --enable-languages=c,c++
Thread model: posix
gcc version 4.4.4 20100309 (prerelease) (GCC)
monst...@yggdrasil /data/tmp $ ./a.out
00fe
0001
monst...@yggdrasil /data/tmp $ gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/local/libexec/gcc/x86_64-unknown-linux-gnu/4.5.0/lto-wrapper
Target: x86_64-unknown-linux-gnu
Configured with: ../gcc-svn/configure --enable-stage1-languages-c --enable-languages=c,c++
Thread model: posix
gcc version 4.5.0 20100309 (experimental) (GCC)
monst...@yggdrasil /data/tmp $ ./a.out
00fe
00ff

Is there something illegal in this code, or is it a bug somewhere?

Thanks,
Marcin
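As a worked check of what the second printf should see: the 1-bit field holds 1, and whether it promotes to int (the integer promotions, C99 6.3.1.1) or to unsigned int, 1 * 0xff is 255, which an 8-bit unsigned char represents unchanged. The sketch below (function names are illustrative; CHAR_BIT == 8 is assumed) recomputes the value both through a bit-field and by hand; on that reading the expected output is ff, not 01.

```c
#include <assert.h>

struct bits
{
  unsigned bar : 1;
};

/* The expression from the report: (unsigned char)(foo.bar * 0xff).  */
static unsigned
via_bitfield (struct bits foo)
{
  return (unsigned char) (foo.bar * 0xff);
}

/* Same arithmetic with the promotion written out explicitly.  */
static unsigned
by_hand (unsigned bar_value)
{
  assert (bar_value <= 1);         /* a 1-bit field can only hold 0 or 1 */
  int promoted = (int) bar_value;  /* int can represent every 1-bit value */
  return (unsigned char) (promoted * 0xff);  /* 255 fits in unsigned char */
}
```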
Re: clearing many bytes variables (could use one machine instruction)?
On 09/03/2010 17:42, Basile Starynkevitch wrote: > I knew about vectorization (of which I am not an expert), and I didn't > mention it, because in my view this is not exactly vectorization. Agreed. > And I don't want to use an array of bytes for that purpose. I want to have a > rather large number of individual variables. > typedef char melt_flag_t; /* or perhaps bool */ > > { > /* typically dozens or even a hundred of cleared variables */ >melt_flag_t f0=0, f1=0, f2=0, f3=0, f4=0, f5=0, f6=0, f7=0, f8=0, f9=0; > > /* some complex code with conditionals and forward gotos, > where each above flag is set at most once, and may be tested many times */ > } > > I don't want to use an array > So I asked myself if GCC can clear efficiently a set of variables > (BTW, this is mandatory in Java), hence my initial question. > > In my view, aggregating several small scalar variables inside a larger word > data is not exactly vectorization (my perception of vectorization is that it > is dealing with arrays). Yes, I think so; aggregation is the right word for it. Or maybe scalarization. If you wrap all these chars in a struct, can SRA handle it? cheers, DaveK
Re: clearing many bytes variables (could use one machine instruction)?
Dave Korn wrote:

> Yes, I think so; aggregation is the right word for it. Or maybe
> scalarization. If you wrap all these chars in a struct, can SRA
> handle it?

I tried

###
/* file testmanychar.c */
extern void g (int, char *, char *, char *);

void
f (void)
{
  struct
  {
    char x0, x1, x2, x3, x4, x5, x6, x7;
  } s = {0,0,0,0,0,0,0};
  /* assuming s.x0 is word aligned on a x86_64, and variables are bytes in
     memory, we could clear all the variables in one machine instruction */
  g (10, &s.x0, &s.x1, &s.x2);
  g (20, &s.x2, &s.x3, &s.x4);
  g (30, &s.x4, &s.x5, &s.x6);
  g (40, &s.x6, &s.x7, &s.x0);
}
###

I compiled it with

gcc-trunk -S -std=gnu99 -O3 -fverbose-asm \
    -march=core2 -mtune=core2 testmanychar.c

and I got:

###
	.globl f
	.type f, @function
f:
.LFB0:
	.cfi_startproc
	movq %rbx, -24(%rsp) #,
	movq %rbp, -16(%rsp) #,
	movq %r12, -8(%rsp) #,
	movl $10, %edi #,
	subq $40, %rsp #,
	.cfi_def_cfa_offset 48
	leaq 2(%rsp), %rbp #, tmp59
	.cfi_offset 12, -16
	.cfi_offset 6, -24
	.cfi_offset 3, -32
	leaq 4(%rsp), %r12 #, tmp64
	leaq 1(%rsp), %rdx #, tmp61
	movq %rbp, %rcx # tmp59,
	movq %rsp, %rsi #,
	movq $0, (%rsp) #, s
	call g #
	leaq 3(%rsp), %rdx #, tmp66
	movq %r12, %rcx # tmp64,
	movq %rbp, %rsi # tmp59,
	movl $20, %edi #,
	leaq 6(%rsp), %rbp #, tmp70
	call g #
	leaq 5(%rsp), %rdx #, tmp72
	movq %rbp, %rcx # tmp70,
	movq %r12, %rsi # tmp64,
	movl $30, %edi #,
	call g #
	leaq 7(%rsp), %rdx #, tmp77
	movq %rsp, %rcx #,
	movq %rbp, %rsi # tmp70,
	movl $40, %edi #,
	call g #
	movq 16(%rsp), %rbx #,
	movq 24(%rsp), %rbp #,
	movq 32(%rsp), %r12 #,
	addq $40, %rsp #,
	.cfi_def_cfa_offset 8
	ret
	.cfi_endproc
.LFE0:
	.size f, .-f
	.ident "GCC: (GNU) 4.5.0 20100309 (experimental) [trunk revision 157303]"
###

So it seems that a structure can indeed be cleared efficiently (a single movq $0, (%rsp) above). However, having a structure probably also has some shortcomings: I would guess that it is much harder for GCC to keep my flags in registers only.

Regards.
-- Basile STARYNKEVITCH http://starynkevitch.net/Basile/ email: basilestarynkevitchnet mobile: +33 6 8501 2359 8, rue de la Faiencerie, 92340 Bourg La Reine, France *** opinions {are only mines, sont seulement les miennes} ***
Re: clearing many bytes variables (could use one machine instruction)?
On 09/03/2010 18:42, Basile Starynkevitch wrote: > Dave Korn wrote: >> >> Yes, I think so; aggregation is the right word for it. Or maybe >> scalarization. If you wrap all these chars in a struct, can SRA >> handle it? > > > I tried > So it seems indeed that a structure can be cleared efficiently. However, > having a structure has probably also some shortcommings: I would guess > that it is much harder for GCC to keep my flags in registers only. Dunno. Maybe you can get IPA-SRA to kick in and help you. cheers, DaveK
Re: legitimate parallel make check?
On 9 Mar 2010, at 19:13, Janis Johnson wrote:

> To run all of the compiler tests in parallel you can do "make -jn -k
> check-gcc" from the top level and let the existing build machinery take
> care of running chunks of tests in parallel and putting the results back
> together.

That's fine (I couldn't see why not from looking at Makefile.in).

(maybe Rainer is right and I have a memory fault on my 8-core box .. :-( ); I'm just backing up before a single-user check of that ...

.. I don't seem to get the bus errors on a 4CPU g5 or a Core 2 duo .. but .. the 8-core machine is faster .. so ... race conditions are more likely to manifest there.

> You'd still have to handle the library tests you want to run.

I assume that " make -jn -k check-target RUNTESTFLAGS " is equally reasonable?

> Perhaps what you want is a way to skip the host library testing,
> although if you're building them with your new compiler then testing
> them is a good part of testing that compiler.

I do build gmp/mpfr/mpc in-tree... given that I almost always run the whole testsuite, I'm torn between thinking it's a good idea... and thinking that I'm essentially proving little when the versions of gmp/mpfr/mpc are static.

-- 
FWIW: I'm trying to update the contrib/btest.sh script to handle m32 & m64 plus a few other things ...

Iain
Re: legitimate parallel make check?
On Tue, Mar 9, 2010 at 2:27 PM, IainS wrote: > I do build gmp/mpfr/mpc in-tree... How? Last I tried, it didn't work, as mpc used the system gmp/mpfr, not the just-built in-tree versions. Therefore, it's not really an "in-tree" build, and you can't build on a system that doesn't already have gmp/mpfr installed. At least, that's what it was... does that work now?
Re: legitimate parallel make check?
IainS writes:

> .. I don't seem to get the bus errors on a 4CPU g5 or a Core 2 duo .. but
> .. the 8-core machine is faster .. so ... race conditions are more likely
> to manifest there.

But race conditions don't manifest themselves in make SEGVs ;-( I'm regularly running make -k -j128 on a T5220, or -j32 on a Sun Fire X4450 with 4 4-core CPUs, without any problems.

>> You'd still have to handle the library tests you want to run.
>
> I assume that " make -jn -k check-target RUNTESTFLAGS " is equally
> reasonable?

Indeed, as well as using a site.exp and pointing the DEJAGNU environment variable at it, as I've learned from looking at the regression tester sources in contrib/regression. Here's what I use for that:

global target_list

case "$target_triplet" in {
    { "i?86-*-solaris2.1[0-9]" } {
        # FIXME: Disable multilib testing if the host cannot execute AMD64
        # binaries.  Should be exceedingly rare now (cf. erebus).
        set target_list { "unix{,-m64}" }
    }
    { "mips-sgi-irix6*" } {
        # FIXME: Disable multilib testing if the host cannot execute N64
        # binaries.  We cannot selectively disable only one multilib during
        # the build.
        set target_list { "unix{,-mabi=32,-mabi=64}" }
    }
    { "sparc*-*-solaris2*" } {
        set target_list { "unix{,-m64}" }
    }
    default {
        # Works for alpha*-*-osf*, i?86-*-solaris2.[89] and mips-sgi-irix5*
        # testing.
        set target_list { "unix" }
    }
}

> I do build gmp/mpfr/mpc in-tree...
> given that I almost always run the whole testsuite - I'm torn between
> thinking it's a good idea..
> ... and that I'm essentially proving little when the versions of gmp/
> mpfr/and mpc are static.

I'd keep gmp/mpfr/mpc at a fixed version, test them once, and be done with it. Testing the compiler and runtime libraries on its own takes enough time already.

> FWIW: I'm trying to update contrib/btest.sh script to handle m32 & m64 plus
> a few other things ...

No need: works already via DEJAGNU described above.

	Rainer

-- 
Rainer Orth, Center for Biotechnology, Bielefeld University
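To see which target triplets the site.exp snippet above sends to which multilib list, here is a C approximation of its first-match-wins glob dispatch. It uses POSIX fnmatch() as a stand-in for Tcl's glob-style matching (close, but not identical in every corner case); the rule table and function name are illustrative, and only the patterns and lists are taken from the snippet.

```c
#include <assert.h>
#include <fnmatch.h>
#include <stddef.h>

/* Patterns and target lists copied from the site.exp snippet.  */
struct board_rule
{
  const char *glob;
  const char *target_list;
};

static const struct board_rule rules[] = {
  { "i?86-*-solaris2.1[0-9]", "unix{,-m64}" },
  { "mips-sgi-irix6*", "unix{,-mabi=32,-mabi=64}" },
  { "sparc*-*-solaris2*", "unix{,-m64}" },
};

static const char default_list[] = "unix";

/* First matching glob wins, like the case command; otherwise the
   default branch applies.  */
static const char *
target_list_for (const char *triplet)
{
  assert (triplet != NULL);
  for (size_t k = 0; k < sizeof rules / sizeof rules[0]; k++)
    if (fnmatch (rules[k].glob, triplet, 0) == 0)
      return rules[k].target_list;
  return default_list;
}
```

For example, i386-pc-solaris2.10 gets "unix{,-m64}", while i386-pc-solaris2.9 falls through to the default "unix", matching the comment in the default branch.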
Re: legitimate parallel make check?
On 9 Mar 2010, at 19:31, NightStrike wrote:

> On Tue, Mar 9, 2010 at 2:27 PM, IainS wrote:
>> I do build gmp/mpfr/mpc in-tree...
>
> How? Last I tried, it didn't work, as mpc used the system gmp/mpfr, not
> the just-built in-tree versions. Therefore, it's not really an "in-tree"
> build, and you can't build on a system that doesn't already have gmp/mpfr
> installed.
>
> At least, that's what it was... does that work now?

It's been fixed for a while (resolution of PR42424).

I keep one copy of gmp, mpfr and mpc in a directory at the root of my source disk and ln -s to that from within the different gcc source trees.

Iain
Re: legitimate parallel make check?
On 9 Mar 2010, at 19:36, Rainer Orth wrote:

> IainS writes:
>> .. I don't seem to get the bus errors on a 4CPU g5 or a Core 2 duo .. but
>> .. the 8-core machine is faster .. so ... race conditions are more likely
>> to manifest there.
>
> But race conditions don't manifest themselves in make SEGVs ;-(
> I'm regularly running make -k -j128 on a T5220, or -j32 on a Sun Fire X4450

I'm running memchecks on my 8-core machine ..

> Indeed, as well as using a site.exp and pointing the DEJAGNU environment
> variable at it, as I've learned from looking at the regression tester
> sources in contrib/regression. Here's what I use for that:
>
> global target_list
>
> case "$target_triplet" in {
> ...
> }

That's really neat indeed; I should have spotted the potential when looking at the code in contrib... although the site.exp is hardwired in btest.sh at present. I guess one might be able to use .dejagnurc; I just need to check the order in which the files are processed.

>> FWIW: I'm trying to update contrib/btest.sh script to handle m32 & m64
>> plus a few other things ...
>
> No need: works already via DEJAGNU described above.

Hmm. I see that the make check would work using that neat method... however:

1/ I want a different language list from the default...
2/ also some other different configure options...
but mainly
3/ ... As I read it, btest.sh strips everything but the test name from the PASSES and FAILS files (the awk lines that print $2 for a match of /FAIL: / etc.).

This means (e.g.) that if a test passes at m32 and fails at m64, I think it will appear in both PASSES and FAILS... this seems destined to result in confusion. I would like to identify regressions separately for m32 and m64, esp. in obj-c/c++, where there are huge differences on NeXT runtime systems.

If I've missed the blindingly obvious (the day job is taking up most of my mental resources ;-)), please point it out :-)

Iain
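A minimal sketch of keying results by board as well as test name, so a test can PASS at -m32 and FAIL at -m64 without the two entries colliding. It assumes the .sum layout in which each multilib run is introduced by a "Running target <board>" line (e.g. "Running target unix/-m64") and results look like "FAIL: <test name>"; classify_line and enum sum_kind are hypothetical names, and this is not how btest.sh is implemented.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

enum sum_kind { SUM_OTHER = -1, SUM_FAIL = 0, SUM_PASS = 1 };

/* Classifies one .sum line (trailing newline already stripped).
   "Running target X" updates BOARD; "PASS: t" / "FAIL: t" write
   "BOARD t" into KEY.  Anything else (XFAIL:, UNSUPPORTED:, ...) is
   SUM_OTHER.  */
static enum sum_kind
classify_line (const char *line, char *board, size_t board_len,
               char *key, size_t key_len)
{
  static const char running[] = "Running target ";
  const size_t running_len = sizeof running - 1;

  assert (line != NULL && board_len > 0 && key_len > 0);
  if (strncmp (line, running, running_len) == 0)
    {
      snprintf (board, board_len, "%s", line + running_len);
      return SUM_OTHER;
    }
  if (strncmp (line, "PASS: ", 6) == 0)
    {
      snprintf (key, key_len, "%s %s", board, line + 6);
      return SUM_PASS;
    }
  if (strncmp (line, "FAIL: ", 6) == 0)
    {
      snprintf (key, key_len, "%s %s", board, line + 6);
      return SUM_FAIL;
    }
  return SUM_OTHER;
}
```

Writing these board-qualified keys to PASSES and FAILS instead of bare test names would let a later diff report an -m64-only regression as such.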
gcc-4.4-20100309 is now available
Snapshot gcc-4.4-20100309 is now available on
  ftp://gcc.gnu.org/pub/gcc/snapshots/4.4-20100309/
and on various mirrors, see http://gcc.gnu.org/mirrors.html for details.

This snapshot has been generated from the GCC 4.4 SVN branch
with the following options: svn://gcc.gnu.org/svn/gcc/branches/gcc-4_4-branch revision 157328

You'll find:

gcc-4.4-20100309.tar.bz2             Complete GCC (includes all of below)
gcc-core-4.4-20100309.tar.bz2        C front end and core compiler
gcc-ada-4.4-20100309.tar.bz2         Ada front end and runtime
gcc-fortran-4.4-20100309.tar.bz2     Fortran front end and runtime
gcc-g++-4.4-20100309.tar.bz2         C++ front end and runtime
gcc-java-4.4-20100309.tar.bz2        Java front end and runtime
gcc-objc-4.4-20100309.tar.bz2        Objective-C front end and runtime
gcc-testsuite-4.4-20100309.tar.bz2   The GCC testsuite

Diffs from 4.4-20100302 are available in the diffs/ subdirectory.

When a particular snapshot is ready for public consumption the LATEST-4.4
link is updated and a message is sent to the gcc list. Please do not use
a snapshot before it has been announced that way.