Fleaser

2010-01-02 Thread Fleaser
Hello,
we are a small team and we need your help; just one click and you have
already helped.  Thanks in advance.

Look at our website:
http://www.fleaser.com

Follow us on Twitter
http://twitter.com/fleaser


Send this message to your friends

If you have already received this mail, please delete it


Thanks for your help

Greetings

Fleaser Team


Re: Big regression showing up on darwin

2010-01-02 Thread Dave Korn
Andrew Pinski wrote:
> On Fri, Jan 1, 2010 at 7:07 AM, FX  wrote:
>> I know something is going on with section names, so I thought I'd mention 
>> that there is a big regression on darwin (most "-flto -fwhopr -O2" tests 
>> fail) at rev. 155544. An example is:
> 
> Really, LTO should be disabled when targeting Darwin.  See PR 41529.

  Also this particular error is caused by the asterisks-in-DECL_ASSEMBLER_NAME
problem (PR42531) that should be fixed at r.15.

cheers,
  DaveK



Re: Please update GNU GCC mirror list

2010-01-02 Thread Gerald Pfeifer
On Wed, 16 Dec 2009, JohnT wrote:
> Some of the sites listed on the mirror list 
> http://gcc.gnu.org/mirrors.html aren't up to date and some aren't 
> accessible. LaffeyComputer.com doesn't allow access, and used to
> require a password for access. This isn't the way a GNU mirror site
> ought to operate. There should be free public access.

Thanks for the report, John.  I regularly run a link checker that
also covers our mirror sites.  That is not as easy as it may seem
at first, since some mirrors only provide access within their own
geographic region, and removing them based on my testing from one
place on the planet would be premature.

That said, mirrors.laffeycomputer.com may indeed not be active 
anymore.  Let me include mirrormas...@laffey.biz, the documented
contact for that mirror.

mirrormas...@laffey.biz, would you mind letting us know about the
status of your GCC mirror site?  I have not been able to access it
from any machine I have tried; I always run into a timeout.

Gerald


Re: WTF?

2010-01-02 Thread Gerald Pfeifer
On Wed, 25 Nov 2009, Dave Korn wrote:
> But does it, though?  From http://gcc.gnu.org/svnwrite.html:
>[...]
> So, where are whitespace changes to non-comment parts of .c and .h 
> source files covered?  I think that there may be a bit of a common 
> assumption that "obvious" extends somewhat further than the wording of 
> the documentation actually implies - not just in the context of this 
> incident, but the question has occurred to me in other cases too, and 
> maybe now would be a good time to clear it up.

So...

On Wed, 25 Nov 2009, Kaveh R. Ghazi wrote:
> I agree the wording could be better.

...does one of you have a suggestion on how to improve the wording?

The svnwrite.html page was never meant to be "the law"; it is more like
documentation of best practices and some rules of thumb, but of course
improvements will be welcome.

Gerald


Re: The "right way" to handle alignment of pointer targets in the compiler?

2010-01-02 Thread Benjamin Redelings I

Thanks for the information!


> How many people would take advantage of special machinery for some old
> CPU, if that's your goal?

Some, but I suppose the old machinery will be gone eventually.  But,
yes, I am most interested in current processors.





> On CPUs introduced in the last 2 years, movupd should be as fast as
> movapd,

OK, I didn't know this.  Thanks for the information!

> and -mtune=barcelona should work well in general, not only in
> this example.
> The bigger difference in performance, for longer loops, would come with
> further batching of sums, favoring loop lengths of multiples of 4 (or 8,
> with unrolling). That alignment already favors a fairly long loop.

> As you're using C++, it seems you could have used inner_product() rather
> than writing out a function.


That was a reduced test case. The code that I'm modifying is doing two 
simultaneous inner products with the same number of iterations:


for (int j = 0; j < kStateCount; j++) {
    sum1 += matrices1w[j] * partials1v[j];
    sum2 += matrices2w[j] * partials2v[j];
}

I tried using two separate calls to inner_product, and it turns out to 
be slightly slower.  GCC does not fuse the loops.



> My Core i7 showed matrix multiply 25x25 times 25x100 producing 17 Gflops
> with gfortran in-line code.  g++ produces about 80% of that.


So, one reason I incorrectly assumed that movapd is necessary for good
performance is that the SSE code is actually being matched in performance
by non-SSE code, on a Core 2 processor with the x86_64 ABI.  I expected
the SSE code to be two times faster, if vectorization was working, since
I am using double precision.  But perhaps SSE should not be expected to
give much of a performance advantage here?


For a recent gcc 4.5 with CXXFLAGS="-O3 -ffast-math -fno-tree-vectorize 
-march=native -mno-sse2 -mno-sse3 -mno-sse4" I got this code for the 
inner loop:


be00:   dd 04 07fldl   (%rdi,%rax,1)
be03:   dc 0c 01fmull  (%rcx,%rax,1)
be06:   de c1   faddp  %st,%st(1)
be08:   dd 04 06fldl   (%rsi,%rax,1)
be0b:   dc 0c 02fmull  (%rdx,%rax,1)
be0e:   48 83 c0 08 add$0x8,%rax
be12:   de c2   faddp  %st,%st(2)
be14:   4c 39 c0cmp%r8,%rax
be17:   75 e7   jnebe00

Using alternative CXXFLAGS="-O3 -march=native -g -ffast-math 
-mtune=generic" I get:


 1f1:   66 0f 57 c9 xorpd  %xmm1,%xmm1
 1f5:   31 c0   xor%eax,%eax
 1f7:   31 d2   xor%edx,%edx
 1f9:   66 0f 28 d1 movapd %xmm1,%xmm2
 1fd:   0f 1f 00nopl   (%rax)

 200:   f2 42 0f 10 1c 10   movsd  (%rax,%r10,1),%xmm3
 206:   83 c2 01add$0x1,%edx
 209:   f2 42 0f 10 24 00   movsd  (%rax,%r8,1),%xmm4
 20f:   66 41 0f 16 5c 02 08movhpd 0x8(%r10,%rax,1),%xmm3
 216:   66 42 0f 16 64 00 08movhpd 0x8(%rax,%r8,1),%xmm4
 21d:   66 0f 28 c3 movapd %xmm3,%xmm0
 221:   f2 41 0f 10 1c 03   movsd  (%r11,%rax,1),%xmm3
 227:   66 0f 59 c4 mulpd  %xmm4,%xmm0
 22b:   66 41 0f 16 5c 03 08movhpd 0x8(%r11,%rax,1),%xmm3
 232:   f2 42 0f 10 24 08   movsd  (%rax,%r9,1),%xmm4
 238:   66 42 0f 16 64 08 08movhpd 0x8(%rax,%r9,1),%xmm4
 23f:   48 83 c0 10 add$0x10,%rax
 243:   39 ea   cmp%ebp,%edx
 245:   66 0f 58 d0 addpd  %xmm0,%xmm2
 249:   66 0f 28 c3 movapd %xmm3,%xmm0
 24d:   66 0f 59 c4 mulpd  %xmm4,%xmm0
 251:   66 0f 58 c8 addpd  %xmm0,%xmm1
 255:   72 a9   jb 200

 257:   44 39 f3cmp%r14d,%ebx
 25a:   66 0f 7c c9 haddpd %xmm1,%xmm1
 25e:   44 89 f0mov%r14d,%eax
 261:   66 0f 7c d2 haddpd %xmm2,%xmm2

(Note the presence of movsd / movhpd instead of movupd.)

So... should I expect the SSE code to be any faster?  If not, could you 
possibly say why not?  Are there other operations (besides inner 
products) where SSE code would actually be expected to be faster?


-BenRI


Re: The "right way" to handle alignment of pointer targets in the compiler?

2010-01-02 Thread Tim Prince

Benjamin Redelings I wrote:
> Thanks for the information!


Here are several reasons (there are more) why gcc uses 64-bit loads by 
default:
1) For a single dot product, the rate of 64-bit data loads roughly 
balances the latency of adds to the same register. Parallel dot products 
(using 2 accumulators) would take advantage of faster 128-bit loads.
2) run-time checks to adjust alignment, if possible, don't pay off for 
loop counts < about 40.
3) several obsolete CPU architectures implemented 128-bit loads by pairs 
of 64-bit loads.
4) 64-bit loads were generally more efficient than movupd, prior to 
barcelona.


In the case you quote, with parallel dot products, 128-bit loads would 
be required in order to show much performance gain over x87.


GCC aliasing rules: more aggressive than C99?

2010-01-02 Thread Joshua Haberman
The aliasing policies that GCC implements seem to be more strict than
what is in the C99 standard.  I am wondering if this is true or whether
I am mistaken (I am not an expert on the standard, so the latter is
definitely possible).

The relevant text is:

  An object shall have its stored value accessed only by an lvalue
  expression that has one of the following types:

  * a type compatible with the effective type of the object,
  [...]
  * an aggregate or union type that includes one of the aforementioned
types among its members (including, recursively, a member of a
subaggregate or contained union), or

To me this allows the following:

  int i;
  union u { int x; } *pu = (union u*)&i;
  printf("%d\n", pu->x);

In this example, the object "i", which is of type "int", is having its
stored value accessed by an lvalue expression of type "union u", which
includes the type "int" among its members.

I have seen other articles that interpret the standard in this way.
See section "Casting through a union (2)" from this article, which
claims that casts of this sort are legal and that GCC's warnings
against them are false positives:
  
http://cellperformance.beyond3d.com/articles/2006/06/understanding-strict-aliasing.html

However, this appears to be contrary to GCC's documentation.  From the
manpage:

  Similarly, access by taking the address, casting the resulting
  pointer and dereferencing the result has undefined behavior, even
  if the cast uses a union type, e.g.:

  int f() {
double d = 3.0; 
return ((union a_union *) &d)->i;
  }

I have also been able to experimentally verify that GCC will mis-compile
this fragment if we expect the behavior the standard specifies:
  int g;
  struct A { int x; };
  int foo(struct A *a) {
    if (g) a->x = 5;
    return g;
  }

With GCC 4.3.3 -O3 on x86-64 (Ubuntu), g is only loaded once:

 :
   0:   8b 05 00 00 00 00   moveax,DWORD PTR [rip+0x0]# 6 

   6:   85 c0   test   eax,eax
   8:   74 06   je 10 
   a:   c7 07 05 00 00 00   movDWORD PTR [rdi],0x5
  10:   f3 c3   repz ret

But this is incorrect if foo() was called as:
  
  foo((struct A*)&g);

Here is another example:
  
  struct A { int x; };
  struct B { int x; };
  int foo(struct A *a, struct B *b) {
    if (a->x) b->x = 5;
    return a->x;
  }

When I compile this, a->x is only loaded once, even though foo()
could have been called like this:
  
  int i;
  foo((struct A*)&i, (struct B*)&i);

From this I conclude that GCC diverges from the standard, in that it does not
allow casts of this sort.  In one sense this is good (because the policy GCC
implements is more aggressive, and yet still reasonable), but on the other hand
it means (if I am not mistaken) that GCC will incorrectly optimize strictly
conforming programs.

Clarifications are most welcome!

Josh