[Bug c++/35117] Vectorization on power PC

2008-02-14 Thread victork at gcc dot gnu dot org
--- Comment #34 from victork at gcc dot gnu dot org 2008-02-14 13:41 --- > How do I resolve those issues, which might prevent the vectorized code from > running, so that I don't see a bigger performance improvement? > I'd appreciate any assistance... This note is just information an

[Bug c++/35117] Vectorization on power PC

2008-02-13 Thread eyal at geomage dot com
--- Comment #33 from eyal at geomage dot com 2008-02-13 16:06 --- Hi All, I've done some changes that hopefully prevent the memory from being a performance bottleneck. I see a perf gain of ~10%. However the compiler still gives me the warnings in comment #19 - Test.cpp:24: note: versi

[Bug c++/35117] Vectorization on power PC

2008-02-12 Thread eyal at geomage dot com
--- Comment #32 from eyal at geomage dot com 2008-02-12 11:28 --- (In reply to comment #31) > > I would appreciate, however, a further explanation about this issue. > The explanation has to do with CPU architecture and is not related to > compilers. In case of a cache miss the memory l

[Bug c++/35117] Vectorization on power PC

2008-02-12 Thread victork at gcc dot gnu dot org
--- Comment #31 from victork at gcc dot gnu dot org 2008-02-12 10:51 --- > I would appreciate, however, a further explanation about this issue. The explanation has to do with CPU architecture and is not related to compilers. In case of a cache miss, the memory load and store take tens

[Bug c++/35117] Vectorization on power PC

2008-02-12 Thread eyal at geomage dot com
--- Comment #30 from eyal at geomage dot com 2008-02-12 08:43 --- Hi, Thanks a lot for the input about a potential memory bottleneck. I was indeed under the impression that once I got the loop vectorized, I'd immediately see a performance boost. I would appreciate, however, a further

[Bug c++/35117] Vectorization on power PC

2008-02-11 Thread dberlin at gcc dot gnu dot org
--- Comment #29 from dberlin at gcc dot gnu dot org 2008-02-11 16:29 --- Vectorization is not magic. I'm also not sure where you got the idea that vectorization = magic speedup. There is no real "expected performance gain" on memory bound applications because the processor spends all of

[Bug c++/35117] Vectorization on power PC

2008-02-11 Thread victork at gcc dot gnu dot org
--- Comment #28 from victork at gcc dot gnu dot org 2008-02-11 14:21 --- > As for the last email, Victor: > 1. Using a smaller number of iterations doesn't help me. This is not what > the > real world code runs. Looks like in your example the memory subsystem is a performance bott

[Bug c++/35117] Vectorization on power PC

2008-02-11 Thread eyal at geomage dot com
--- Comment #27 from eyal at geomage dot com 2008-02-11 14:00 --- Hi, I am a bit lost and would appreciate your guidance. Up till now, after all those emails, I still have no clue as to why such a simple test case doesn't work. As far as I understood the vectorization should have shown betw

[Bug c++/35117] Vectorization on power PC

2008-02-11 Thread victork at gcc dot gnu dot org
--- Comment #26 from victork at gcc dot gnu dot org 2008-02-11 13:41 --- Probably, the small difference between vectorized and non-vectorized versions can be explained by the fact that big arrays do not fit the memory cache. Here is the version of the original program which shows that m
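
The cache-size explanation in comment #26 can be probed with a small timing harness. The following is only a sketch, not the program referred to in the comment (which is cut off above): the names SweepResult and time_sweeps are hypothetical, and the kernel is a generic multiply-accumulate rather than the reporter's loop.

```cpp
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical helper type: elapsed wall-clock time plus a checksum that
// keeps the compiler from discarding the computation.
struct SweepResult {
    double seconds;
    float checksum;
};

// Sweeps `passes` times over three arrays of `n` floats, accumulating
// c[i] += a[i] * b[i]. The working set is roughly 3 * n * sizeof(float)
// bytes, so `n` decides whether the arrays can stay in cache between passes.
SweepResult time_sweeps(std::size_t n, std::size_t passes)
{
    std::vector<float> a(n, 1.0f), b(n, 2.0f), c(n, 0.0f);

    const auto start = std::chrono::steady_clock::now();
    for (std::size_t p = 0; p < passes; ++p)
        for (std::size_t i = 0; i < n; ++i)
            c[i] += a[i] * b[i];
    const auto stop = std::chrono::steady_clock::now();

    float checksum = 0.0f;
    for (std::size_t i = 0; i < n; ++i)
        checksum += c[i];

    return {std::chrono::duration<double>(stop - start).count(), checksum};
}
```

One way to use it is to call time_sweeps with a small n and many passes, then with a large n and proportionally fewer passes, so that n * passes stays constant, and compare the two timings for a vectorized and a non-vectorized build. An aggressive optimizer may restructure the loop nest, so the numbers are indicative at best.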

[Bug c++/35117] Vectorization on power PC

2008-02-11 Thread irar at il dot ibm dot com
--- Comment #25 from irar at il dot ibm dot com 2008-02-11 13:35 --- (In reply to comment #21) > (In reply to comment #14) > > Giving it another thought, this is not necessarily an alias analysis issue, > > even > > though it fails to tell that the pointers do not alias. Since in this case the

[Bug c++/35117] Vectorization on power PC

2008-02-11 Thread victork at gcc dot gnu dot org
--- Comment #24 from victork at gcc dot gnu dot org 2008-02-11 12:23 --- Hi, Here are some more of my observations. 1. For some unclear reason there is indeed not much difference between vectorized and non-vectorized versions for long runs like "time ./TestNoVec 92200 8 89720 1000", but

[Bug c++/35117] Vectorization on power PC

2008-02-10 Thread eyal at geomage dot com
--- Comment #23 from eyal at geomage dot com 2008-02-10 15:47 --- (In reply to comment #22) > 1. It looks like the vectorizer was enabled in both cases, since -O3 enables the > vectorizer by default. You need to add -fno-tree-vectorize to disable it > explicitly. > 2. To get better resul

[Bug c++/35117] Vectorization on power PC

2008-02-10 Thread victork at gcc dot gnu dot org
--- Comment #22 from victork at gcc dot gnu dot org 2008-02-10 15:06 --- 1. It looks like the vectorizer was enabled in both cases, since -O3 enables the vectorizer by default. You need to add -fno-tree-vectorize to disable it explicitly. 2. To get better results from vectorized versio
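
Point 1 of comment #22 suggests building the same source twice so that only the vectorizer setting differs. A minimal sketch of that comparison follows; the kernel multiply_arrays and the binary name TestVec are placeholders rather than the reporter's code, while -O3, -maltivec and -fno-tree-vectorize are the flags mentioned in this thread.

```cpp
#include <cstddef>

// Illustrative build lines (file and binary names are placeholders):
//   g++ -O3 -maltivec                     Test.cpp -o TestVec
//   g++ -O3 -maltivec -fno-tree-vectorize Test.cpp -o TestNoVec
// Per comment #22, the first line already has the vectorizer enabled via
// -O3; the second switches it off explicitly for the baseline.

// Placeholder kernel: a simple element-wise loop of the kind a vectorizer
// typically targets.
void multiply_arrays(float* out, const float* a, const float* b,
                     std::size_t n)
{
    for (std::size_t i = 0; i < n; ++i)
        out[i] = a[i] * b[i];
}
```

Timing both binaries on identical arguments then isolates the effect of vectorization from the other -O3 optimizations.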

[Bug c++/35117] Vectorization on power PC

2008-02-10 Thread eyal at geomage dot com
--- Comment #21 from eyal at geomage dot com 2008-02-10 13:48 --- (In reply to comment #14) > Giving it another thought, this is not necessarily an alias analysis issue, even > though it fails to tell that the pointers do not alias. Since in this case the > pointers do differ, the runtime test

[Bug c++/35117] Vectorization on power PC

2008-02-09 Thread eyal at geomage dot com
--- Comment #20 from eyal at geomage dot com 2008-02-10 07:56 --- Hi, I've tried putting the loop to be vectorized in a different method and the compiler output looks better, but the performance is still the same as the non-vectorized code. #include #include #include typedef float

[Bug c++/35117] Vectorization on power PC

2008-02-09 Thread eyal at geomage dot com
--- Comment #19 from eyal at geomage dot com 2008-02-10 07:42 --- Hi, This is the simplest test I have. #include #include #include typedef float ARRTYPE; int main ( int argc, char *argv[] ) { int m_nSamples = atoi( argv[1] ); int itBegin = atoi( argv[2] );

[Bug c++/35117] Vectorization on power PC

2008-02-09 Thread eres at il dot ibm dot com
--- Comment #18 from eres at il dot ibm dot com 2008-02-10 07:30 --- > To further optimize this loop we would probably want to overlap the store with > subsequent loads using -fmodulo-sched; perhaps the new export-ddg can help > with > that. I intend to test the impact of -fmodulo-sche

[Bug c++/35117] Vectorization on power PC

2008-02-08 Thread eyal at geomage dot com
--- Comment #17 from eyal at geomage dot com 2008-02-08 08:58 --- > Using malloc instead of new does generate better code and improves performance > slightly for me, admittedly not as much as we would like; the kernel becomes: > (using only -O3 -S -m64 -maltivec) > .L29: > lvx 13

[Bug c++/35117] Vectorization on power PC

2008-02-08 Thread eyal at geomage dot com
--- Comment #16 from eyal at geomage dot com 2008-02-08 08:55 --- Thanks a lot Ira, I appreciate it. If you need the full test code with .vect file and makefiles, please let me know. Thanks, eyal -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=35117

[Bug c++/35117] Vectorization on power PC

2008-02-08 Thread zaks at il dot ibm dot com
--- Comment #15 from zaks at il dot ibm dot com 2008-02-08 08:49 --- (In reply to comment #5) > (In reply to comment #3) > > I think this is a dup of another bug I filed with respect to the builtin > > operator new getting the malloc attribute. > Are you referring to using malloc ins

[Bug c++/35117] Vectorization on power PC

2008-02-07 Thread irar at il dot ibm dot com
--- Comment #14 from irar at il dot ibm dot com 2008-02-07 20:44 --- Giving it another thought, this is not necessarily an alias analysis issue, even though it fails to tell that the pointers do not alias. Since in this case the pointers do differ, the runtime test should take the flow to the v

[Bug c++/35117] Vectorization on power PC

2008-02-07 Thread irar at il dot ibm dot com
--- Comment #13 from irar at il dot ibm dot com 2008-02-07 13:22 --- CC'ing Daniel and Diego, maybe they can help with the alias analysis issues. -- irar at il dot ibm dot com changed: What|Removed |Added ---

[Bug c++/35117] Vectorization on power PC

2008-02-07 Thread eyal at geomage dot com
--- Comment #12 from eyal at geomage dot com 2008-02-07 13:07 --- (In reply to comment #11) > (In reply to comment #10) > > Is there some pragma or a coding convention I can use to make the compiler > > understand those pointers have nothing to do with each other? > There is __restrict__

[Bug c++/35117] Vectorization on power PC

2008-02-07 Thread irar at il dot ibm dot com
--- Comment #11 from irar at il dot ibm dot com 2008-02-07 13:04 --- (In reply to comment #10) > Is there some pragma or a coding convention I can use to make the compiler > understand those pointers have nothing to do with each other? There is __restrict__, but it is useful only for fu
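
For readers unfamiliar with the __restrict__ qualifier named in comment #11, here is a minimal sketch of it on pointer parameters. The function add_arrays is hypothetical and not taken from the bug report; __restrict__ itself is a GCC extension (also accepted by Clang).

```cpp
#include <cstddef>

// Hypothetical kernel: element-wise sum of two arrays.
// Each __restrict__ is a promise by the caller that, inside this function,
// the memory reached through that pointer is not also reached through the
// other pointers. Breaking the promise is undefined behaviour.
void add_arrays(float* __restrict__ dst,
                const float* __restrict__ a,
                const float* __restrict__ b,
                std::size_t n)
{
    for (std::size_t i = 0; i < n; ++i)
        dst[i] = a[i] + b[i];
}
```

With the qualifiers in place the compiler is entitled to assume the three arrays are disjoint; whether that removes the runtime alias checks discussed in this thread depends on the compiler version.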

[Bug c++/35117] Vectorization on power PC

2008-02-07 Thread eyal at geomage dot com
--- Comment #10 from eyal at geomage dot com 2008-02-07 12:58 --- (In reply to comment #9) > (In reply to comment #8) > > { > > float *pTempSumPhase_Temp_cre_angle = (float*) malloc (sizeof(float) > > *m_nSamples); > > float *pTempSum2Phase_Temp_cre_angle = (float*) mallo

[Bug c++/35117] Vectorization on power PC

2008-02-07 Thread irar at il dot ibm dot com
--- Comment #9 from irar at il dot ibm dot com 2008-02-07 12:54 --- (In reply to comment #8) > { > float *pTempSumPhase_Temp_cre_angle = (float*) malloc (sizeof(float) > *m_nSamples); > float *pTempSum2Phase_Temp_cre_angle = (float*) malloc (sizeof(float) > *m_nSamples);

[Bug c++/35117] Vectorization on power PC

2008-02-07 Thread eyal at geomage dot com
--- Comment #8 from eyal at geomage dot com 2008-02-07 12:16 --- Hi Ira, Here is the compiler output for the real code. Crs/CEE_CRE_2DSearch.cpp:1285: note: create runtime check for data references *D.86651_134 and *D.8_160 Crs/CEE_CRE_2DSearch.cpp:1285: note: create runtime check

[Bug c++/35117] Vectorization on power PC

2008-02-07 Thread eyal at geomage dot com
--- Comment #7 from eyal at geomage dot com 2008-02-07 11:06 --- (In reply to comment #6) > (In reply to comment #2) > > Yes the loop is vectorized. > ... > > Eyal.cpp:34: note: created 9 versioning for alias checks. > > Eyal.cpp:34: note: LOOP VECTORIZED.(get_loop_exit_condition > The

[Bug c++/35117] Vectorization on power PC

2008-02-07 Thread irar at il dot ibm dot com
--- Comment #6 from irar at il dot ibm dot com 2008-02-07 10:53 --- (In reply to comment #2) > Yes the loop is vectorized. ... > Eyal.cpp:34: note: created 9 versioning for alias checks. > Eyal.cpp:34: note: LOOP VECTORIZED.(get_loop_exit_condition The vectorizer created runtime check
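
The "versioning for alias checks" notes quoted in comment #6 refer to loop versioning: two copies of the loop guarded by a runtime overlap test. The source-level sketch below only illustrates the idea and is not what GCC literally emits; ranges_overlap and scale_versioned are hypothetical names.

```cpp
#include <cstddef>
#include <cstdint>

// True if the n-float ranges starting at p and q share at least one byte.
// Addresses are compared as integers to avoid relational comparison of
// unrelated pointers.
bool ranges_overlap(const float* p, const float* q, std::size_t n)
{
    const std::uintptr_t pb = reinterpret_cast<std::uintptr_t>(p);
    const std::uintptr_t qb = reinterpret_cast<std::uintptr_t>(q);
    const std::uintptr_t bytes = n * sizeof(float);
    return pb < qb + bytes && qb < pb + bytes;
}

// Two versions of the same loop, selected at run time.
void scale_versioned(float* dst, const float* src, float k, std::size_t n)
{
    if (!ranges_overlap(dst, src, n)) {
        // Disjoint ranges: the restrict-qualified locals tell the compiler
        // this copy of the loop may be vectorized freely.
        float* __restrict__ d = dst;
        const float* __restrict__ s = src;
        for (std::size_t i = 0; i < n; ++i)
            d[i] = s[i] * k;
    } else {
        // Possible overlap: keep strict element-by-element order.
        for (std::size_t i = 0; i < n; ++i)
            dst[i] = src[i] * k;
    }
}
```

Each pair of data references that the compiler cannot prove independent needs one such test, which is why the quoted note reports several checks for a single loop.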

[Bug c++/35117] Vectorization on power PC

2008-02-07 Thread eyal at geomage dot com
--- Comment #5 from eyal at geomage dot com 2008-02-07 10:43 --- (In reply to comment #3) > I think this is a dup of another bug I filed with respect to the builtin > operator new getting the malloc attribute. Are you referring to using malloc instead of new? Using malloc didn't mak

[Bug c++/35117] Vectorization on power PC

2008-02-07 Thread pinskia at gcc dot gnu dot org
--- Comment #4 from pinskia at gcc dot gnu dot org 2008-02-07 10:40 --- That is PR 23383. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=35117

[Bug c++/35117] Vectorization on power PC

2008-02-07 Thread pinskia at gcc dot gnu dot org
--- Comment #3 from pinskia at gcc dot gnu dot org 2008-02-07 10:37 --- I think this is a dup of another bug I filed with respect to the builtin operator new getting the malloc attribute. -- pinskia at gcc dot gnu dot org changed: What|Removed

[Bug c++/35117] Vectorization on power PC

2008-02-07 Thread eyal at geomage dot com
--- Comment #2 from eyal at geomage dot com 2008-02-07 10:36 --- Yes, the loop is vectorized. What do you mean by memory bound? Don't you think that vectorization can help here? I see around 20% performance gain in the real application. Below is the compiler output: Eyal.cpp:34: note: de

[Bug c++/35117] Vectorization on power PC

2008-02-07 Thread rguenth at gcc dot gnu dot org
--- Comment #1 from rguenth at gcc dot gnu dot org 2008-02-07 10:29 --- The testcase looks completely memory bound. Does the compiler tell you it does vectorization at all? Have you tried without -fprefetch-loop-arrays (with today's HW prefetchers and the simple access patterns it's pro