clearing many bytes variables (could use one machine instruction)?

2010-03-09 Thread Basile Starynkevitch
Hello All,

With a recently compiled gcc-trunk on x86-64/linux, I am compiling the 
following example:

#

/* file testmanychar.c */
extern void g (int, char *, char *, char *);

void
f (void)
{
  char x0, x1, x2, x3, x4, x5, x6, x7;
  /* assuming x0 is word aligned on a x86_64, and variables are bytes in
     memory, we could clear all the variables in one machine instruction */
  x0 = x1 = x2 = x3 = x4 = x5 = x6 = x7 = (char) 0;
  g (10, &x0, &x1, &x2);
  g (20, &x2, &x3, &x4);
  g (30, &x4, &x5, &x6);
  g (40, &x6, &x7, &x0);
}

#

My intuition was that GCC could store x0 at a 64-bit-aligned address, and x1 
immediately after, and so on, and clear all eight bytes at once using a 
single machine instruction [clearing a 64-bit word].

But this is not the case, since 
   gcc-trunk -S -O3 -fverbose-asm testmanychar.c
gives the following code

#
        .type   f, @function
f:
.LFB0:
        .cfi_startproc
        movq    %rbx, -24(%rsp) #,
        movq    %rbp, -16(%rsp) #,
        movl    $10, %edi       #,
        movq    %r12, -8(%rsp)  #,
        subq    $40, %rsp       #,
        .cfi_def_cfa_offset 48
        leaq    13(%rsp), %rbx  #, tmp58
        .cfi_offset 12, -16
        .cfi_offset 6, -24
        .cfi_offset 3, -32
        leaq    15(%rsp), %rbp  #, tmp60
        leaq    14(%rsp), %rdx  #, tmp59
        leaq    11(%rsp), %r12  #, tmp61
        movb    $0, 8(%rsp)     #, x7
        movb    $0, 9(%rsp)     #, x6
        movq    %rbx, %rcx      # tmp58,
        movq    %rbp, %rsi      # tmp60,
        movb    $0, 10(%rsp)    #, x5
        movb    $0, 11(%rsp)    #, x4
        movb    $0, 12(%rsp)    #, x3
        movb    $0, 13(%rsp)    #, x2
        movb    $0, 14(%rsp)    #, x1
        movb    $0, 15(%rsp)    #, x0
        call    g       #
        leaq    12(%rsp), %rdx  #, tmp62
        movq    %r12, %rcx      # tmp61,
        movq    %rbx, %rsi      # tmp58,
        movl    $20, %edi       #,
        leaq    9(%rsp), %rbx   #, tmp64
        call    g       #
        leaq    10(%rsp), %rdx  #, tmp65
        movq    %rbx, %rcx      # tmp64,
        movq    %r12, %rsi      # tmp61,
        movl    $30, %edi       #,
        call    g       #
        leaq    8(%rsp), %rdx   #, tmp68
        movq    %rbp, %rcx      # tmp60,
        movq    %rbx, %rsi      # tmp64,
        movl    $40, %edi       #,
        call    g       #
        movq    16(%rsp), %rbx  #,
        movq    24(%rsp), %rbp  #,
        movq    32(%rsp), %r12  #,
        addq    $40, %rsp       #,
        .cfi_def_cfa_offset 8
        ret
        .cfi_endproc
.LFE0:
        .size   f, .-f
        .ident  "GCC: (GNU) 4.5.0 20100309 (experimental) [trunk revision 157303]"

#


With  
  gcc-trunk -S -O3 -fverbose-asm -march=core2 -mtune=core2 testmanychar.c
I am getting still

##

# options passed:  testmanychar.c -march=core2 -mtune=core2 -O3

.globl f
        .type   f, @function
f:
.LFB0:
        .cfi_startproc
        movq    %rbx, -24(%rsp) #,
        movq    %rbp, -16(%rsp) #,
        movq    %r12, -8(%rsp)  #,
        movl    $10, %edi       #,
        subq    $40, %rsp       #,
        .cfi_def_cfa_offset 48
        leaq    13(%rsp), %rbx  #, tmp58
        .cfi_offset 12, -16
        .cfi_offset 6, -24
        .cfi_offset 3, -32
        leaq    15(%rsp), %rbp  #, tmp60
        leaq    11(%rsp), %r12  #, tmp61
        leaq    14(%rsp), %rdx  #, tmp59
        movq    %rbx, %rcx      # tmp58,
        movq    %rbp, %rsi      # tmp60,
        movb    $0, 8(%rsp)     #, x7
        movb    $0, 9(%rsp)     #, x6
        movb    $0, 10(%rsp)    #, x5
        movb    $0, 11(%rsp)    #, x4
        movb    $0, 12(%rsp)    #, x3
        movb    $0, 13(%rsp)    #, x2
        movb    $0, 14(%rsp)    #, x1
        movb    $0, 15(%rsp)    #, x0
        call    g       #
        leaq    12(%rsp), %rdx  #, tmp62
        movq    %r12, %rcx      # tmp61,
        movq    %rbx, %rsi      # tmp58,
        movl    $20, %edi       #,
        leaq    9(%rsp), %rbx   #, tmp64
        call    g       #
        leaq    10(%rsp), %rdx  #, tmp65
        movq    %rbx, %rcx      # tmp64,
        movq    %r12, %rsi      # tmp61,
        movl    $30, %edi       #,
        call    g       #
        leaq    8(%rsp), %rdx   #, tmp68
        movq    %rbp, %rcx      # tmp60,
        movq    %rbx, %rsi      # tmp64,
        movl    $40, %edi       #,
        call    g       #
        movq    16(%rsp), %rbx  #,
        movq    24(%rsp), %rbp  #,
        movq    32(%rsp), %r12  #,
        addq    $40, %rsp       #,
        .cfi_def_cfa_offset 8
        ret
        .cfi_endproc
.LFE0:
        .size   f, .-f
        .ident  "GCC: (GNU) 4.5.0 20100309 (experimental) [trunk revision 157303]"


I was hoping that 
        movb    $0, 8(%rsp)     #, x7
        movb    $0, 9(%rsp)     #, x6
        movb    $0, 10(%rsp)

Re: legitimate parallel make check?

2010-03-09 Thread Rainer Orth
IainS  writes:

> on my mere 8 cores :-)   I'm still getting sporadic Bus errors on:
>
> "make -k -j8 check RUNTESTFLAGS ... "
> [from the command line on a bootstrapped clean trunk @157307]
>
> so there's something else wrong somewhere...

Probably make gets SIGBUS, which obviously won't be in the runtest output.

> ... I guess a target-related issue ({i686,powerpc}-apple-darwin9)

Or simply a hardware issue?

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


legitimate parallel make check?

2010-03-09 Thread IainS
For the sake of keeping information segregated in log files (and
balancing out test times), I tried the following (from within a
regression testing script):

'make -k check-gcc-c RUNTESTFLAGS=$runTEST >${STATE}/$svnRevision-check-c-log.txt 2>&1' &
(sleep 2) 'make -k check-gcc-c++ RUNTESTFLAGS=$runTEST >${STATE}/$svnRevision-check-c++-log.txt 2>&1' &
(sleep 2) 'make -k check-gcc-fortran RUNTESTFLAGS=$runTEST >${STATE}/$svnRevision-check-fortran-log.txt 2>&1' &
(sleep 2) 'make -k check-gcc-objc RUNTESTFLAGS=$runTEST >${STATE}/$svnRevision-check-objc-log.txt 2>&1' &
(sleep 2) 'make -k check-gcc-obj-c++ RUNTESTFLAGS=$runTEST >${STATE}/$svnRevision-check-obc++-log.txt 2>&1' &
(sleep 2) 'make -k check-target-libstdc++-v3 RUNTESTFLAGS=$runTEST >${STATE}/$svnRevision-check-target-libstdc++-log.txt 2>&1' &
(sleep 2) 'make -k check-target-libgomp RUNTESTFLAGS=$runTEST >${STATE}/$svnRevision-check-target-libgomp-log.txt 2>&1' &


Firstly, this actually breaks the make process permanently (I suspect
a race to create site.exp) unless the parallel jobs are launched with a
time delay.
By "permanently" I mean that after trying this, "make check-gcc" fails
from the command line until config.status is re-run in the gcc dir.




Secondly, if I use "-k -j3 check-gcc-c"
   together with "-k -j2 check-gcc-fortran" (for example)

I get sporadic bus errors:
/bin/sh: line 1: 67618 Bus error   `if [ -f ${srcdir}/../dejagnu/runtest ] ; then echo ${srcdir}/../dejagnu/runtest ; else echo runtest; fi` --tool gfortran --target_board=unix\{-m32,-m64\} $runtestflags

make[2]: [check-parallel-gfortran] Error 138 (ignored)

large chunks of tests go silently missing at this point...

===

Am I trying something that is unsupported - or is this a bug?

===

"make -k -j8 check " is not particularly helpful (a) because it tests  
gmp/mpfr/mpc every time (b) any redirected output is hard to scan for  
problems.


Iain


Re: legitimate parallel make check?

2010-03-09 Thread IainS


On 9 Mar 2010, at 12:10, Rainer Orth wrote:


> IainS  writes:
>
>> Am I trying something that is unsupported - or is this is a bug?
>>
>> ===
>>
>> "make -k -j8 check " is not particularly helpful (a) because it tests
>> gmp/mpfr/mpc every time (b) any redirected output is hard to scan for
>> problems.
>
> What you're trying is far too complicated: just use -k -j check from
> the toplevel and forget about the make check output, which, as you say,
> is interleaved and useless, just like the make -j bootstrap output.

OK.

> Instead, run make mail-report.log afterwards and check that.


Although, I note that contrib/test_summary only reports what gets into  
the *.sum files -- i.e. tests that complete or timeout.
It doesn't log problems in the actual make process itself - to find  
those one still has to trawl through the interleaved stuff.


It would be nice to allow the apparently independent targets [e.g.
gcc-c, fortran, c++ etc.] to be (explicitly) make-checked in parallel.


thanks,
Iain


Re: legitimate parallel make check?

2010-03-09 Thread Rainer Orth
IainS  writes:

>> Instead, run make mail-report.log afterwards and check that.
>
> Although, I note that contrib/test_summary only reports what gets into the
> *.sum files -- i.e. tests that complete or timeout.
> It doesn't log problems in the actual make process itself - to find those
> one still has to trawl through the interleaved stuff.

No, you don't have to.  You need to look at the corresponding *.log files
instead; most of this isn't in the regular make check output anyway.

> It would be nice to allow the apparently independent targets [e.g. gcc-
> c,fortran,c++ etc.] to be (explicitly) make-checked in parallel.

This is exactly what happens when you invoke make -j -k check,
provided that the parallelism specified is big enough (and the machine
can handle the load).  Otherwise, you may end up running (say) the c
testsuite in parallel, which isn't bad either.

As I wrote before, I'm going to use this on an (effectively) 64-core
machine and hope to achieve to use all cores in parallel :-)

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


Re: clearing many bytes variables (could use one machine instruction)?

2010-03-09 Thread Alexey Salmin
On Tue, Mar 9, 2010 at 3:58 PM, Basile Starynkevitch
 wrote:
> Hello All,
>
> With a recently compiled gcc-trunk on x86-64/linux, I am compiling the 
> folllowing example:
>
> #
>
> /* file testmanychar.c */
> extern void g (int, char *, char *, char *);
>
> void
> f (void)
> {
>  char x0, x1, x2, x3, x4, x5, x6, x7;
>  /* assuming  x0 is word aligned on a x86_64, and variables are bytes in 
> memory, we could clear all the variables in one machine instruction */
>  x0 = x1 = x2 = x3 = x4 = x5 = x6 = x7 = (char) 0;
>  g (10, &x0, &x1, &x2);
>  g (20, &x2, &x3, &x4);
>  g (30, &x4, &x5, &x6);
>  g (40, &x6, &x7, &x0);
> }
>
> #
>
> My intuition was that GCC could store x0 on a 64 bits aligned byte, and x1 
> immediately after, and so one, and clear all the eight bytes at once using a 
> single machine instruction [clearing a 64 bits word].
>
> But this is not the case, since
>   gcc-trunk -S -O3 -fverbose-asm testmanychar.c
> gives the following code
>
> #
>        .type   f, @function
> f:
> .LFB0:
>        .cfi_startproc
>        movq    %rbx, -24(%rsp) #,
>        movq    %rbp, -16(%rsp) #,
>        movl    $10, %edi       #,
>        movq    %r12, -8(%rsp)  #,
>        subq    $40, %rsp       #,
>        .cfi_def_cfa_offset 48
>        leaq    13(%rsp), %rbx  #, tmp58
>        .cfi_offset 12, -16
>        .cfi_offset 6, -24
>        .cfi_offset 3, -32
>        leaq    15(%rsp), %rbp  #, tmp60
>        leaq    14(%rsp), %rdx  #, tmp59
>        leaq    11(%rsp), %r12  #, tmp61
>        movb    $0, 8(%rsp)     #, x7
>        movb    $0, 9(%rsp)     #, x6
>        movq    %rbx, %rcx      # tmp58,
>        movq    %rbp, %rsi      # tmp60,
>        movb    $0, 10(%rsp)    #, x5
>        movb    $0, 11(%rsp)    #, x4
>        movb    $0, 12(%rsp)    #, x3
>        movb    $0, 13(%rsp)    #, x2
>        movb    $0, 14(%rsp)    #, x1
>        movb    $0, 15(%rsp)    #, x0
>        call    g       #
>        leaq    12(%rsp), %rdx  #, tmp62
>        movq    %r12, %rcx      # tmp61,
>        movq    %rbx, %rsi      # tmp58,
>        movl    $20, %edi       #,
>        leaq    9(%rsp), %rbx   #, tmp64
>        call    g       #
>        leaq    10(%rsp), %rdx  #, tmp65
>        movq    %rbx, %rcx      # tmp64,
>        movq    %r12, %rsi      # tmp61,
>        movl    $30, %edi       #,
>        call    g       #
>        leaq    8(%rsp), %rdx   #, tmp68
>        movq    %rbp, %rcx      # tmp60,
>        movq    %rbx, %rsi      # tmp64,
>        movl    $40, %edi       #,
>        call    g       #
>        movq    16(%rsp), %rbx  #,
>        movq    24(%rsp), %rbp  #,
>        movq    32(%rsp), %r12  #,
>        addq    $40, %rsp       #,
>        .cfi_def_cfa_offset 8
>        ret
>        .cfi_endproc
> .LFE0:
>        .size   f, .-f
>        .ident  "GCC: (GNU) 4.5.0 20100309 (experimental) [trunk revision 
> 157303]"
>
> #
>
>
> With
>  gcc-trunk -S -O3 -fverbose-asm -march=core2 -mtune=core2 testmanychar.c
> I am getting still
>
> ##
>
> # options passed:  testmanychar.c -march=core2 -mtune=core2 -O3
>
> .globl f
>        .type   f, @function
> f:
> .LFB0:
>        .cfi_startproc
>        movq    %rbx, -24(%rsp) #,
>        movq    %rbp, -16(%rsp) #,
>        movq    %r12, -8(%rsp)  #,
>        movl    $10, %edi       #,
>        subq    $40, %rsp       #,
>        .cfi_def_cfa_offset 48
>        leaq    13(%rsp), %rbx  #, tmp58
>        .cfi_offset 12, -16
>        .cfi_offset 6, -24
>        .cfi_offset 3, -32
>        leaq    15(%rsp), %rbp  #, tmp60
>        leaq    11(%rsp), %r12  #, tmp61
>        leaq    14(%rsp), %rdx  #, tmp59
>        movq    %rbx, %rcx      # tmp58,
>        movq    %rbp, %rsi      # tmp60,
>        movb    $0, 8(%rsp)     #, x7
>        movb    $0, 9(%rsp)     #, x6
>        movb    $0, 10(%rsp)    #, x5
>        movb    $0, 11(%rsp)    #, x4
>        movb    $0, 12(%rsp)    #, x3
>        movb    $0, 13(%rsp)    #, x2
>        movb    $0, 14(%rsp)    #, x1
>        movb    $0, 15(%rsp)    #, x0
>        call    g       #
>        leaq    12(%rsp), %rdx  #, tmp62
>        movq    %r12, %rcx      # tmp61,
>        movq    %rbx, %rsi      # tmp58,
>        movl    $20, %edi       #,
>        leaq    9(%rsp), %rbx   #, tmp64
>        call    g       #
>        leaq    10(%rsp), %rdx  #, tmp65
>        movq    %rbx, %rcx      # tmp64,
>        movq    %r12, %rsi      # tmp61,
>        movl    $30, %edi       #,
>        cal

Re: legitimate parallel make check?

2010-03-09 Thread Tim Prince

On 3/9/2010 4:28 AM, IainS wrote:



> It would be nice to allow the apparently independent targets [e.g.
> gcc-c,fortran,c++ etc.] to be (explicitly) make-checked in parallel.



On certain targets, it has been necessary to do this explicitly for a 
long time, submitting make check-gcc, make check-fortran, make check-g++ 
separately.  Perhaps a script could be made which would detect when the 
build is complete, then submit the separate make check serial jobs together.



--
Tim Prince



Re: Puzzle about mips pipeline description

2010-03-09 Thread Ian Lance Taylor
"Amker.Cheng"  writes:

>   In gcc internal, section 16.19.8, there is a rule about
> "define_insn_reservation" like:
> "`condition` defines what RTL insns are described by this
> construction. You should remember that you will be in trouble if
> `condition` for two or more different `define_insn_reservation`
> constructors is TRUE for an insn".
>
>   While in mips.md, the pipeline descriptions for each processor are
> included along with generic.md, which provides a fallback for
> processors without a specific pipeline description.
>
>   Here is the PUZZLE: Won't `define_insn_reservation` constructors
> from both the specific processor's and the generic md file break the
> rule mentioned before? For example, it seems the conditions for
> r3k_load (from 3000.md) and generic_load (from generic.md) are both
> TRUE for the lw insn.

Looks to me like the MIPS backend is relying on undocumented
behaviour: when multiple define_insn_reservation statements match, the
automata will match them in order.  It would be better if the
define_insn_reservation in generic.md checked "cpu".

Ian


Re: legitimate parallel make check?

2010-03-09 Thread IainS


On 9 Mar 2010, at 12:46, Rainer Orth wrote:


> IainS  writes:
>
>>> Instead, run make mail-report.log afterwards and check that.

...

> suite in parallel, which isn't bad either.
>
> As I wrote before, I'm going to use this on an (effectively) 64-core
> machine and hope to achieve to use all cores in parallel :-)


on my mere 8 cores :-)   I'm still getting sporadic Bus errors on:

"make -k -j8 check RUNTESTFLAGS ... "
[from the command line on a bootstrapped clean trunk @157307]

so there's something else wrong somewhere...
... I guess a target-related issue ({i686,powerpc}-apple-darwin9)

(note that this error does *not* get into the .log files, AFAICS)

Iain



Re: legitimate parallel make check?

2010-03-09 Thread Rainer Orth
IainS  writes:

> Am I trying something that is unsupported - or is this is a bug?
>
> ===
>
> "make -k -j8 check " is not particularly helpful (a) because it tests
> gmp/mpfr/mpc every time (b) any redirected output is hard to scan for
> problems.

What you're trying is far too complicated: just use -k -j check from
the toplevel and forget about the make check output, which, as you say,
is interleaved and useless, just like the make -j bootstrap output.

Instead, run make mail-report.log afterwards and check that.

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


Re: [trans-mem] __sync_add_and_fetch_8 on ia32

2010-03-09 Thread Richard Henderson
On 02/19/2010 04:18 AM, Paolo Carlini wrote:
> On 02/19/2010 01:07 PM, Paolo Carlini wrote:
>> On 02/19/2010 12:38 PM, Aldy Hernandez wrote:
>> 
>>> Since the TM library on ia32 is built with -m486, which doesn't
>>> have 64-bit atomic operations, should we...
>>> 
>>> a) Build with -m586 and above. b) Have _ITM_transactionId_t be
>>> 32-bit quantities. c) Come up with some locking solution for
>>> archs that don't have 64-bit atomic operations.
>>> 
>> My personal opinion on this is that unless we are 100% sure that 
>> performance are decent anyway (thus 3), or 32-bits are certainly
>> enough (thus 2), we should not waste time supporting anything older
>> than -m586 for the transactional memory work.
>> 
> ... a slightly more sophisticated variant of b) would be using
> uint64_t for 64-bit targets and uint32_t for 32-bit targets, should
> make everybody happy.

There's a correctness issue in the library interface -- there's no
clean way to handle that value overflowing.  Rather than make the
library interface way more complex, it seemed far easier to simply
use a 64-bit value, which we simply assume won't overflow.

I'd be fine with -m586 (which is how things will get built on e.g.
Fedora), but that isn't going to help other 32-bit targets that miss
that operation.  We'll need a configure test and a mutex if needed.
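
A minimal sketch of the "configure test and a mutex" fallback, not taken from libitm itself: `HAVE_SYNC_ADD_AND_FETCH_8` is a hypothetical macro that a configure test would define, and `txn_id_t` / `next_txn_id` are stand-in names for the transaction-id type and its increment.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for _ITM_transactionId_t; the name txn_id_t is hypothetical.  */
typedef uint64_t txn_id_t;

#ifdef HAVE_SYNC_ADD_AND_FETCH_8  /* hypothetical macro from a configure test */

/* The target has a native 64-bit atomic add: use the builtin directly.  */
static txn_id_t
next_txn_id (txn_id_t *counter)
{
  return __sync_add_and_fetch (counter, 1);
}

#else

/* No 64-bit atomic add: serialize the increment with a mutex instead.  */
static pthread_mutex_t txn_id_lock = PTHREAD_MUTEX_INITIALIZER;

static txn_id_t
next_txn_id (txn_id_t *counter)
{
  txn_id_t id;

  assert (counter != NULL);
  pthread_mutex_lock (&txn_id_lock);
  id = ++*counter;               /* 64-bit increment done under the lock */
  pthread_mutex_unlock (&txn_id_lock);
  return id;
}

#endif
```

Either branch keeps the 64-bit interface, so callers do not need to care which one was selected at configure time.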


r~


Re: clearing many bytes variables (could use one machine instruction)?

2010-03-09 Thread Basile Starynkevitch
On Tue, Mar 09, 2010 at 05:25:35PM +0600, Alexey Salmin wrote:
> On Tue, Mar 9, 2010 at 3:58 PM, Basile Starynkevitch
>  wrote:
> > Hello All,
> >
> > With a recently compiled gcc-trunk on x86-64/linux, I am compiling the 
> > folllowing example:
> >
> > #
> >
> > /* file testmanychar.c */
> > extern void g (int, char *, char *, char *);
> >
> > void
> > f (void)
> > {
> >  char x0, x1, x2, x3, x4, x5, x6, x7;
> >  /* assuming  x0 is word aligned on a x86_64, and variables are bytes in 
> > memory, we could clear all the variables in one machine instruction */
> >  x0 = x1 = x2 = x3 = x4 = x5 = x6 = x7 = (char) 0;
> >  g (10, &x0, &x1, &x2);
> >  g (20, &x2, &x3, &x4);
> >  g (30, &x4, &x5, &x6);
> >  g (40, &x6, &x7, &x0);
> > }
> >
> > #
> >
> > My intuition was that GCC could store x0 on a 64 bits aligned byte, and x1 
> > immediately after, and so one, and clear all the eight bytes at once using 
> > a single machine instruction [clearing a 64 bits word].
> 
> Thing you're talking about is a kind of vectorization. If you want to
> simplify the vectorizing for the compiler you should store your data
> in arrays instead of separate variables and use loops to process your
> data instead of separate operations.


I knew about vectorization (of which I am not an expert), and I didn't
mention it, because in my view this is not exactly vectorization.


And I don't want to use an array of bytes for that purpose. I want to have a
rather large number of individual variables.


Perhaps I should give more context about this question. This is inside the
MELT branch of GCC, which happens to generate C code (I won't explain again
why, look in the wiki http://gcc.gnu.org/wiki/MiddleEndLispTranslator etc).
In MELT I have a sophisticated pattern matching machinery, and pattern
matching expressions are translated into a complex bunch of C code (which
will be compiled by the C compiler on the host machine, which probably would
be some GCC version).

But I believe I have a design bug in today's implementation of the
translation of MELT pattern matching (the translator mostly works, except
for "or" sub-patterns; in the unlikely case the MELT translator fails to
translate a complex pattern matching, it fails "nicely" with a fatal_error.
AFAIK, it does not generate wrong C code.). I am right now working on this
bug (by implementing a new pattern-matcher translator in
gcc-melt-branch/gcc/melt/warmelt-normatch.melt).

The new scheme for pattern-matching translation involves having a boolean
flag for each sub-pattern. Each such flag is initially cleared, may be set to
true at most once, and may be tested several times. But I will have a lot of
such flags (several dozens or hundreds).

So basically, I intend to generate a big C block like

typedef char melt_flag_t; /* or perhaps bool */

{
 /* typically dozens or even a hundred of cleared variables */
   melt_flag_t f0=0, f1=0, f2=0, f3=0, f4=0, f5=0, f6=0, f7=0, f8=0, f9=0;

 /* some complex code with conditionals and forward gotos, 
where each above flag is set at most once, and may be tested many times */
}

I don't want to use an array, because in some cases (at least with -O3 for
the GCC compiling that generated code, and when the translated pattern-match
is simple enough so that we have only a dozen or two flags), I can imagine
that GCC will optimize that and put some variables in registers, etc.

So I asked myself if GCC can clear efficiently a set of variables 
(BTW, this is mandatory in Java), hence my initial question.

In my view, aggregating several small scalar variables inside a larger data
word is not exactly vectorization (my perception of vectorization is that it
is dealing with arrays). But I am not an expert on these issues so I may be 
wrong.

Still, my initial question is not that important. Having a hundred movb
$0... instructions is not a big deal today. But I was a bit surprised that
GCC did not optimize that, since clearing local variables is common in many
programs (and mandatory in Java).
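
One alternative that keeps the flags scalar, sketched here only as an illustration: pack up to 64 flags into a single integer, so that clearing them all is one assignment and the whole set can live in a register. The names `flagset_t`, `flag_set` and `flag_test` are hypothetical and are not part of MELT.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical container for up to 64 sub-pattern flags.  */
typedef uint64_t flagset_t;

/* Set flag number ix; flags are only ever set, never cleared one by one.  */
static inline void
flag_set (flagset_t *fs, unsigned ix)
{
  assert (ix < 64);
  *fs |= (uint64_t) 1 << ix;
}

/* Return 1 if flag number ix is set, 0 otherwise.  */
static inline int
flag_test (flagset_t fs, unsigned ix)
{
  assert (ix < 64);
  return (int) ((fs >> ix) & 1);
}
```

A generated block would then start with `flagset_t flags = 0;`, which clears every flag with a single store (or none at all, if the variable stays in a register). Setting a flag becomes a read-modify-write rather than a plain byte store, which is the price of the packing.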

Cheers.

PS. BTW, Zbigniew Chamski & myself will give a talk in French about GCC plugins
at Solutions Linux in Paris, March 17th.

-- 
Basile STARYNKEVITCH http://starynkevitch.net/Basile/
email: basilestarynkevitchnet mobile: +33 6 8501 2359
8, rue de la Faiencerie, 92340 Bourg La Reine, France
*** opinions {are only mines, sont seulement les miennes} ***


Re: [trans-mem] __sync_add_and_fetch_8 on ia32

2010-03-09 Thread H.J. Lu
On Fri, Feb 19, 2010 at 3:38 AM, Aldy Hernandez  wrote:
> libitm.so won't build on ia32 because of an undefined reference to
> __sync_add_and_fetch_8.
>
> This is the build failure Pearly encountered here:
>
>        http://gcc.gnu.org/ml/gcc-patches/2009-09/msg01201.html
>
> What happens is that we try to do a __sync_and_fetch() on global_tid,
> which is of type _ITM_transactionId_t, a 64-bit quantity:
>
>        typedef uint64_t _ITM_transactionId_t;  /* Transaction identifier */
>
> Since the TM library on ia32 is built with -m486, which doesn't have
> 64-bit atomic operations, should we...
>
>        a) Build with -m586 and above.
>        b) Have _ITM_transactionId_t be 32-bit quantities.
>        c) Come up with some locking solution for archs that don't have
>           64-bit atomic operations.
>


FWIW, as of 2010-02-28, gcc will default to i686 unless you configure
gcc with i[345]86-os.


-- 
H.J.


Incorrect casting?

2010-03-09 Thread Marcin Baczyński
Hi,
the following piece of code produces different output on svn trunk and
gcc-4_4-branch:

#include <stdio.h>
int main()
{
    struct { unsigned bar:1; } foo;

    foo.bar = 0x1;

    printf("%08x\n", (unsigned char)(foo.bar * 0xfe));
    printf("%08x\n", (unsigned char)(foo.bar * 0xff));

    return 0;
}

monst...@yggdrasil /data/tmp $ gcc -v
Using built-in specs.
Target: x86_64-unknown-linux-gnu
Configured with: ../gcc-svn/configure --enable-stage1-languages-c
--enable-languages=c,c++
Thread model: posix
gcc version 4.4.4 20100309 (prerelease) (GCC)
monst...@yggdrasil /data/tmp $ ./a.out
00fe
0001

monst...@yggdrasil /data/tmp $ gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/local/libexec/gcc/x86_64-unknown-linux-gnu/4.5.0/lto-wrapper
Target: x86_64-unknown-linux-gnu
Configured with: ../gcc-svn/configure --enable-stage1-languages-c
--enable-languages=c,c++
Thread model: posix
gcc version 4.5.0 20100309 (experimental) (GCC)
monst...@yggdrasil /data/tmp $ ./a.out
00fe
00ff

Is there something illegal in this code or is it a bug somewhere?
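
For reference, a sketch that spells out each conversion the expression implies under the C integer promotion rules; this is a reading of the standard, not something stated in the thread, and `struct flagbits` and `scale_bit` are names made up for the illustration. A 1-bit unsigned bit-field promotes to int, the multiplication is done in int, and a product in the range 0..255 converts to unsigned char unchanged, which would make 0xff the expected result for the second line.

```c
#include <assert.h>

struct flagbits { unsigned bar : 1; };

/* Spell out each step of (unsigned char)(foo.bar * factor).  */
static unsigned char
scale_bit (struct flagbits foo, int factor)
{
  int promoted = foo.bar;           /* the 1-bit field promotes to int */
  int product = promoted * factor;  /* the multiplication is done in int */

  assert (product >= 0 && product <= 255);
  return (unsigned char) product;   /* in range, so the value is unchanged */
}
```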

Thanks,
Marcin


Re: clearing many bytes variables (could use one machine instruction)?

2010-03-09 Thread Dave Korn
On 09/03/2010 17:42, Basile Starynkevitch wrote:

> I knew about vectorization (of which I am not an expert), and I didn't
> mention it, because in my view this is not exactly vectorization.

  Agreed.

> And I don't want to use an array of bytes for that purpose. I want to have a
> rather large number of individual variables.

> typedef char melt_flag_t; /* or perhaps bool */
> 
> {
>  /* typically dozens or even a hundred of cleared variables */
>    melt_flag_t f0=0, f1=0, f2=0, f3=0, f4=0, f5=0, f6=0, f7=0, f8=0, f9=0;
> 
>  /* some complex code with conditionals and forward gotos, 
> where each above flag is set at most once, and may be tested many times */
> }
> 
> I don't want to use an array

> So I asked myself if GCC can clear efficiently a set of variables 
> (BTW, this is mandatory in Java), hence my initial question.
> 
> In my view, aggregating several small scalar variables inside a larger word
> data is not exactly vectorization (my perception of vectorization is that it
> is dealing with arrays). 

  Yes, I think so; aggregation is the right word for it.  Or maybe
scalarization.  If you wrap all these chars in a struct, can SRA handle it?

cheers,
  DaveK


Re: clearing many bytes variables (could use one machine instruction)?

2010-03-09 Thread Basile Starynkevitch

Dave Korn wrote:


  Yes, I think so; aggregation is the right word for it.  Or maybe
scalarization.  If you wrap all these chars in a struct, can SRA handle it?



I tried

###
/* file testmanychar.c */
extern void g (int, char *, char *, char *);

void
f (void)
{
  struct {
    char x0, x1, x2, x3, x4, x5, x6, x7;
  } s = {0,0,0,0,0,0,0};
  /* assuming s.x0 is word aligned on a x86_64, and variables are
     bytes in memory, we could clear all the variables in one machine
     instruction */
  g (10, &s.x0, &s.x1, &s.x2);
  g (20, &s.x2, &s.x3, &s.x4);
  g (30, &s.x4, &s.x5, &s.x6);
  g (40, &s.x6, &s.x7, &s.x0);
}
###

I compiled it with
 gcc-trunk -S -std=gnu99 -O3 -fverbose-asm \
-march=core2 -mtune=core2 testmanychar.c

and I got:
###
.globl f
        .type   f, @function
f:
.LFB0:
        .cfi_startproc
        movq    %rbx, -24(%rsp) #,
        movq    %rbp, -16(%rsp) #,
        movq    %r12, -8(%rsp)  #,
        movl    $10, %edi       #,
        subq    $40, %rsp       #,
        .cfi_def_cfa_offset 48
        leaq    2(%rsp), %rbp   #, tmp59
        .cfi_offset 12, -16
        .cfi_offset 6, -24
        .cfi_offset 3, -32
        leaq    4(%rsp), %r12   #, tmp64
        leaq    1(%rsp), %rdx   #, tmp61
        movq    %rbp, %rcx      # tmp59,
        movq    %rsp, %rsi      #,
        movq    $0, (%rsp)      #, s
        call    g       #
        leaq    3(%rsp), %rdx   #, tmp66
        movq    %r12, %rcx      # tmp64,
        movq    %rbp, %rsi      # tmp59,
        movl    $20, %edi       #,
        leaq    6(%rsp), %rbp   #, tmp70
        call    g       #
        leaq    5(%rsp), %rdx   #, tmp72
        movq    %rbp, %rcx      # tmp70,
        movq    %r12, %rsi      # tmp64,
        movl    $30, %edi       #,
        call    g       #
        leaq    7(%rsp), %rdx   #, tmp77
        movq    %rsp, %rcx      #,
        movq    %rbp, %rsi      # tmp70,
        movl    $40, %edi       #,
        call    g       #
        movq    16(%rsp), %rbx  #,
        movq    24(%rsp), %rbp  #,
        movq    32(%rsp), %r12  #,
        addq    $40, %rsp       #,
        .cfi_def_cfa_offset 8
        ret
        .cfi_endproc
.LFE0:
        .size   f, .-f
        .ident  "GCC: (GNU) 4.5.0 20100309 (experimental) [trunk revision 157303]"


###

So it seems indeed that a structure can be cleared efficiently. However, 
having a structure probably also has some shortcomings: I would guess 
that it is much harder for GCC to keep my flags in registers only.
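
A small self-contained sketch of the same struct idiom, for checking the zero-initialization without reading assembly. `struct flags8`, `make_cleared` and `all_clear` are hypothetical names; whether the compiler emits a single 64-bit store for the initializer still depends on the target and optimization level, as in the listing above.

```c
#include <assert.h>
#include <string.h>

/* Eight byte-sized flags grouped so they can be initialized as one object.  */
struct flags8 { char x0, x1, x2, x3, x4, x5, x6, x7; };

/* Return a struct with every member cleared.  */
static struct flags8
make_cleared (void)
{
  struct flags8 s = { 0 };  /* members without an initializer are zeroed too */
  return s;
}

/* Return 1 if every flag in *s is zero, 0 otherwise.  */
static int
all_clear (const struct flags8 *s)
{
  static const struct flags8 zero;  /* static storage: all members are 0 */

  assert (s != NULL);
  return memcmp (s, &zero, sizeof zero) == 0;
}
```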


Regards.

--
Basile STARYNKEVITCH http://starynkevitch.net/Basile/
email: basilestarynkevitchnet mobile: +33 6 8501 2359
8, rue de la Faiencerie, 92340 Bourg La Reine, France
*** opinions {are only mines, sont seulement les miennes} ***


Re: clearing many bytes variables (could use one machine instruction)?

2010-03-09 Thread Dave Korn
On 09/03/2010 18:42, Basile Starynkevitch wrote:
> Dave Korn wrote:
>>
>>   Yes, I think so; aggregation is the right word for it.  Or maybe
>> scalarization.  If you wrap all these chars in a struct, can SRA
>> handle it?
> 
> 
> I tried

> So it seems indeed that a structure can be cleared efficiently. However,
> having a structure has probably also some shortcomings: I would guess
> that it is much harder for GCC to keep my flags in registers only.

  Dunno.  Maybe you can get IPA-SRA to kick in and help you.

cheers,
  DaveK



Re: legitimate parallel make check?

2010-03-09 Thread IainS


On 9 Mar 2010, at 19:13, Janis Johnson wrote:

> To run all of the compiler tests in parallel you can do "make -jn -k
> check-gcc" from the top level and let the existing build machinery
> take care of running chunks of tests in parallel and putting the
> results back together.

that's fine (I couldn't see why not from looking at Makefile.in)
 (maybe Rainer is right and I have a memory fault on my 8-core
box .. :-(  ) -- just backing up prior to a single user check of
that ...

..  I don't seem to get the bus errors on a 4CPU g5 or a Core 2 duo ..
but .. the 8-core machine is faster .. so ... race conditions are more
likely to manifest there.

> You'd still have to handle the library tests you want to run.

I assume that " make -jn -k check-target RUNTESTFLAGS " is equally
reasonable?

> Perhaps what you want is a way to skip the host library testing,
> although if you're building them with your new compiler then testing
> them is a good part of testing that compiler.

I do build gmp/mpfr/mpc in-tree...
given that I almost always run the whole testsuite - I'm torn between
thinking it's a good idea..
... and that I'm essentially proving little when the versions of
gmp/mpfr/and mpc are static.


--

FWIW: I'm trying to update contrib/btest.sh script to handle m32 & m64  
plus a few other things ...


Iain


Re: legitimate parallel make check?

2010-03-09 Thread NightStrike
On Tue, Mar 9, 2010 at 2:27 PM, IainS  wrote:
> I do build gmp/mpfr/mpc in-tree...

How?  Last I tried, it didn't work, as mpc used the system gmp/mpfr,
not the just-built in-tree versions.  Therefore, it's not really an
"in-tree" build, and you can't build on a system that doesn't already
have gmp/mpfr installed.  At least, that's what it was... does that
work now?


Re: legitimate parallel make check?

2010-03-09 Thread Rainer Orth
IainS  writes:

> ..  I don't seem to get the bus errors on a 4CPU g5 or a Core 2 duo .. but
> .. the 8-core machine is faster .. so ... race conditions are more  likely
> to manifest there.

But race conditions don't manifest themselves in make SEGVs ;-(  I'm
regularly running make -k -j128 on a T5220, or -j32 on a Sun Fire X4450
with 4 4-core CPUs, without any problems.

>> You'd still have to handle the library tests you want to run.
>
> I assume that " make -jn -k check-target RUNTESTFLAGS " is equally
> reasonable?

Indeed, as well as using a site.exp and pointing the DEJAGNU environment
variable at it, as I've learned from looking at the regression tester
sources in contrib/regression.  Here's what I use for that:

global target_list

case "$target_triplet" in {
    { "i?86-*-solaris2.1[0-9]" } {
        # FIXME: Disable multilib testing if the host cannot execute AMD64
        # binaries.  Should be exceedingly rare now (cf. erebus).
        set target_list { "unix{,-m64}" }
    }
    { "mips-sgi-irix6*" } {
        # FIXME: Disable multilib testing if the host cannot execute N64
        # binaries.  We cannot selectively disable only one multilib during
        # the build.
        set target_list { "unix{,-mabi=32,-mabi=64}" }
    }
    { "sparc*-*-solaris2*" } {
        set target_list { "unix{,-m64}" }
    }
    default {
        # Works for alpha*-*-osf*, i?86-*-solaris2.[89] and mips-sgi-irix5*
        # testing.
        set target_list { "unix" }
    }
}

> I do build gmp/mpfr/mpc in-tree...
> given that I almost always run the whole testsuite - I'm torn between
> thinking it's a good idea..
> ... and that I'm essentially proving little when the versions of gmp/
> mpfr/and mpc are static.

I'd keep gmp/mpfr/mpc at a fixed version, test them once, and be done
with it.  Testing the compiler and runtime libraries on its own takes
enough time already.

> FWIW: I'm trying to update contrib/btest.sh script to handle m32 & m64 plus
> a few other things ...

No need: works already via DEJAGNU described above.

Rainer

-- 
Rainer Orth, Center for Biotechnology, Bielefeld University


Re: legitimate parallel make check?

2010-03-09 Thread IainS


On 9 Mar 2010, at 19:31, NightStrike wrote:

> On Tue, Mar 9, 2010 at 2:27 PM, IainS wrote:
>
>> I do build gmp/mpfr/mpc in-tree...
>
> How?  Last I tried, it didn't work, as mpc used the system gmp/mpfr,
> not the just-built in-tree versions.  Therefore, it's not really an
> "in-tree" build, and you can't build on a system that doesn't already
> have gmp/mpfr installed.  At least, that's what it was... does that
> work now?


It's been fixed for a while (resolution of PR42424).
I keep one copy of gmp, mpfr and mpc in a directory at the root of my
source disk and ln -s to that from within the different gcc source
trees.
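
The symlink arrangement described above could be sketched roughly as follows.
The function name link_math_libs and both directory arguments are
hypothetical; the sketch assumes the shared directory contains unpacked
sources in subdirectories named gmp, mpfr and mpc, which are the names GCC's
top-level configure looks for inside the source tree.

```shell
#!/bin/sh
# Sketch: link one shared copy of gmp/mpfr/mpc into a gcc source tree.
# Assumptions (illustrative, not from the thread):
#   $1 = directory holding gmp/, mpfr/ and mpc/ source directories
#   $2 = top of a gcc source tree
link_math_libs() {
    libs_dir=$1
    gcc_src=$2
    for lib in gmp mpfr mpc; do
        if [ ! -d "$libs_dir/$lib" ]; then
            echo "missing $libs_dir/$lib" >&2
            return 1
        fi
        # Leave anything that is already there (directory or symlink) alone.
        if [ -e "$gcc_src/$lib" ] || [ -L "$gcc_src/$lib" ]; then
            continue
        fi
        ln -s "$libs_dir/$lib" "$gcc_src/$lib"
    done
}
```

Running it once per gcc source tree leaves every tree pointing at the same
copy of the three libraries.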


Iain




Re: legitimate parallel make check?

2010-03-09 Thread IainS


On 9 Mar 2010, at 19:36, Rainer Orth wrote:

> IainS writes:
>
>> ..  I don't seem to get the bus errors on a 4CPU g5 or a Core 2 duo .. but
>> .. the 8-core machine is faster .. so ... race conditions are more likely
>> to manifest there.
>
> But race conditions don't manifest themselves in make SEGVs ;-(  I'm
> regularly running make -k -j128 on a T5220, or -j32 on a Sun Fire X4450


I'm running memchecks on my 8-core machine ..

> Indeed, as well as using a site.exp and pointing the DEJAGNU environment
> variable at it, as I've learned from looking at the regression tester
> sources in contrib/regression.  Here's what I use for that:
>
> global target_list
>
> case "$target_triplet" in {
>     { "i?86-*-solaris2.1[0-9]" } {
>         # FIXME: Disable multilib testing if the host cannot execute AMD64
>         # binaries.  Should be exceedingly rare now (cf. erebus).
>         set target_list { "unix{,-m64}" }
>     }
>     { "mips-sgi-irix6*" } {
>         # FIXME: Disable multilib testing if the host cannot execute N64
>         # binaries.  We cannot selectively disable only one multilib during
>         # the build.
>         set target_list { "unix{,-mabi=32,-mabi=64}" }
>     }
>     { "sparc*-*-solaris2*" } {
>         set target_list { "unix{,-m64}" }
>     }
>     default {
>         # Works for alpha*-*-osf*, i?86-*-solaris2.[89] and mips-sgi-irix5*
>         # testing.
>         set target_list { "unix" }
>     }
> }


That's really neat indeed - I should have spotted the potential
looking at the code in contrib

... although the site.exp is hardwired in btest.sh at present;
I guess one might be able to use .dejagnurc - just need to check on
the order in which the files are processed.


>> FWIW: I'm trying to update contrib/btest.sh script to handle m32 & m64 plus
>> a few other things ...
>
> No need: works already via DEJAGNU described above.


Hmm.  I see that make check would work using that neat method above...

... however,

1/ I want a different language list than the default...
2/ also some other different configure options...

but mainly

3/ As I read it, btest.sh strips everything but the test name from
the PASSES and FAILS files (the awk lines that print $2 for a match
of /FAIL: / etc.).
This means (e.g.) that if a test passes at m32 and fails at m64, I
think it will appear in both PASSES and FAILS...

... this seems destined to result in confusion...

I would like to identify regressions separately for m32 and m64,
esp. in obj-c/c++ where there are huge differences on NeXT runtime
systems.
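
One way to keep the two multilibs apart is to prefix each test name with the
variation it ran under. The sketch below is not taken from btest.sh: the
function name fails_by_variant is hypothetical, and it assumes the DejaGnu
.sum file announces each variation with a line of the form
"Running target unix/-m64" before the results for that variation.

```shell
#!/bin/sh
# Sketch: list failing tests from a DejaGnu .sum file, each prefixed with
# the multilib variation it ran under.
# Assumption: every variation starts with a "Running target <board>" line.
#   $1 = path to a .sum file
fails_by_variant() {
    awk '
        # Remember the most recent variation, e.g. "unix/-m64".
        /^Running target / { variant = $3 }
        # Anchoring at line start keeps XFAIL lines out of the match.
        /^FAIL: /          { print variant ": " $2 }
    ' "$1"
}
```

With the variation kept in the key, a test that passes at m32 and fails at
m64 shows up only as "unix/-m64: <testname>", so the two result sets can be
compared separately.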


If I've missed the blindingly obvious (the day job is taking up most
of my mental resources ;-)) please point it out :-)


Iain


gcc-4.4-20100309 is now available

2010-03-09 Thread gccadmin
Snapshot gcc-4.4-20100309 is now available on
  ftp://gcc.gnu.org/pub/gcc/snapshots/4.4-20100309/
and on various mirrors, see http://gcc.gnu.org/mirrors.html for details.

This snapshot has been generated from the GCC 4.4 SVN branch
with the following options: svn://gcc.gnu.org/svn/gcc/branches/gcc-4_4-branch 
revision 157328

You'll find:

gcc-4.4-20100309.tar.bz2            Complete GCC (includes all of below)

gcc-core-4.4-20100309.tar.bz2       C front end and core compiler

gcc-ada-4.4-20100309.tar.bz2        Ada front end and runtime

gcc-fortran-4.4-20100309.tar.bz2    Fortran front end and runtime

gcc-g++-4.4-20100309.tar.bz2        C++ front end and runtime

gcc-java-4.4-20100309.tar.bz2       Java front end and runtime

gcc-objc-4.4-20100309.tar.bz2       Objective-C front end and runtime

gcc-testsuite-4.4-20100309.tar.bz2  The GCC testsuite

Diffs from 4.4-20100302 are available in the diffs/ subdirectory.

When a particular snapshot is ready for public consumption the LATEST-4.4
link is updated and a message is sent to the gcc list.  Please do not use
a snapshot before it has been announced that way.