--- Comment #34 from victork at gcc dot gnu dot org 2008-02-14 13:41 ---
> How do I resolve those issues, which might prevent the vectorized code from
> running and so keep me from seeing a bigger performance improvement?
> I'd appreciate any assistance...
This note is just information an
--- Comment #33 from eyal at geomage dot com 2008-02-13 16:06 ---
Hi All,
I've done some changes that hopefully prevent the memory from being a
performance bottleneck. I see a perf gain of ~10%. However the compiler still
gives me the warnings in comment #19 -
Test.cpp:24: note: versi
--- Comment #32 from eyal at geomage dot com 2008-02-12 11:28 ---
(In reply to comment #31)
> > I would appreciate, however, a further explanation about this issue.
> The explanation has to do with CPU architecture and is not related to
> compilers. In case of cache miss the memory l
--- Comment #31 from victork at gcc dot gnu dot org 2008-02-12 10:51 ---
> I would appreciate, however, a further explanation about this issue.
The explanation has to do with CPU architecture and is not related to
compilers. In case of a cache miss the memory load and store take tens
--- Comment #30 from eyal at geomage dot com 2008-02-12 08:43 ---
Hi,
Thanks a lot for the input about a potential memory bottleneck. I indeed was
under the impression that once I got the loop vectorized, I'd immediately see a
performance boost.
I would appreciate, however, a further
--- Comment #29 from dberlin at gcc dot gnu dot org 2008-02-11 16:29 ---
Vectorization is not magic.
I'm also not sure where you got the idea that vectorization = magic speedup.
There is no real "expected performance gain" on memory-bound applications
because the processor spends all of
--- Comment #28 from victork at gcc dot gnu dot org 2008-02-11 14:21 ---
> As for the last email, Victor:
> 1. Using a smaller number of iterations doesn't help me. This is not what
> the real-world code runs.
Looks like in your example the memory subsystem is a performance bott
--- Comment #27 from eyal at geomage dot com 2008-02-11 14:00 ---
Hi,
I am a bit lost and would appreciate your guidance. Up till now, after all
those emails, I still have no clue as to why such a simple test case doesn't
work. As
far as I understood the vectorization should have shown betw
--- Comment #26 from victork at gcc dot gnu dot org 2008-02-11 13:41 ---
Probably, the small difference between vectorized and non-vectorized versions
can be explained by the fact that big arrays do not fit in the memory cache.
Here is the version of the original program which shows that m
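The cache effect described above can be probed with a rough sketch like the following. This is not the program from this thread: the function name time_updates and its parameters are hypothetical. It performs about the same number of element updates over a working set of a chosen size, so a small set is reused from cache while a large one has to stream from memory.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical micro-benchmark, not the test case from this thread.
// Performs roughly total_updates element updates over arrays of n floats.
// Returns the elapsed time in seconds. *checksum receives a value derived
// from the data, which makes it less likely that the work is optimized away.
double time_updates(std::size_t n, std::size_t total_updates, float *checksum)
{
    assert(n > 0 && checksum != nullptr);
    std::vector<float> a(n, 1.0f), b(n, 2.0f);
    const std::size_t passes = total_updates / n;

    const auto start = std::chrono::steady_clock::now();
    for (std::size_t p = 0; p < passes; ++p)
        for (std::size_t i = 0; i < n; ++i)
            a[i] += b[i];
    const auto stop = std::chrono::steady_clock::now();

    *checksum = a[0];
    return std::chrono::duration<double>(stop - start).count();
}
```

Calling it once with a small n (a few thousand floats) and once with a large n (tens of millions), keeping total_updates the same, and building both with and without the vectorizer, should show whether the vectorized build only pulls ahead when the data fits in cache. The actual numbers depend on the machine.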
--- Comment #25 from irar at il dot ibm dot com 2008-02-11 13:35 ---
(In reply to comment #21)
> (In reply to comment #14)
> > Giving it another thought, this is not necessarily an alias analysis issue,
> > even though it fails to tell that the pointers do not alias. Since in this
> > case the
--- Comment #24 from victork at gcc dot gnu dot org 2008-02-11 12:23 ---
Hi,
Here are some more of my observations.
1. For some unclear reason there is indeed not much difference between
vectorized and non-vectorized versions for long runs like "time ./TestNoVec
92200 8 89720 1000", but
--- Comment #23 from eyal at geomage dot com 2008-02-10 15:47 ---
(In reply to comment #22)
> 1. It looks like the vectorizer was enabled in both cases, since -O3 enables
> the vectorizer by default. You need to add -fno-tree-vectorize to disable it
> explicitly.
> 2. To get better resul
--- Comment #22 from victork at gcc dot gnu dot org 2008-02-10 15:06 ---
1. It looks like the vectorizer was enabled in both cases, since -O3 enables
the vectorizer by default. You need to add -fno-tree-vectorize to disable it
explicitly.
2. To get better results from vectorized versio
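A minimal sketch of point 1, assuming a standalone kernel file. The file name kernel.cpp, the object file names and the function saxpy are illustrative and do not come from the test case in this thread. Only the two flags are taken from the comment above.

```cpp
#include <cassert>
#include <cstddef>

// Assumed build commands (file names are illustrative):
//   g++ -O3 -c kernel.cpp -o kernel_vec.o                        vectorizer on (default at -O3)
//   g++ -O3 -fno-tree-vectorize -c kernel.cpp -o kernel_novec.o  vectorizer off
//
// Hypothetical kernel, present only so the two builds have a loop to compare.
void saxpy(float *y, const float *x, float k, std::size_t n)
{
    assert(n == 0 || (y != nullptr && x != nullptr));
    for (std::size_t i = 0; i < n; ++i)
        y[i] += k * x[i];
}
```

Timing the same driver linked against each object file gives a comparison in which the vectorizer is the only difference between the two builds.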
--- Comment #21 from eyal at geomage dot com 2008-02-10 13:48 ---
(In reply to comment #14)
> Giving it another thought, this is not necessarily an alias analysis issue,
> even though it fails to tell that the pointers do not alias. Since in this
> case the
> pointers do differ, the runtime test
--- Comment #20 from eyal at geomage dot com 2008-02-10 07:56 ---
Hi,
I've tried putting the loop to be vectorized in a different method and the
compiler output looks better, but the performance is still the same as the
non-vectorized code.
#include
#include
#include
typedef float
--- Comment #19 from eyal at geomage dot com 2008-02-10 07:42 ---
Hi,
This is the simplest test I have.
#include
#include
#include
typedef float ARRTYPE;
int main ( int argc, char *argv[] )
{
int m_nSamples = atoi( argv[1] );
int itBegin = atoi( argv[2] );
--- Comment #18 from eres at il dot ibm dot com 2008-02-10 07:30 ---
> To further optimize this loop we would probably want to overlap the store with
> subsequent loads using -fmodulo-sched; perhaps the new export-ddg can help
> with that.
I intend to test the impact of -fmodulo-sche
--- Comment #17 from eyal at geomage dot com 2008-02-08 08:58 ---
> Using malloc instead of new does generate better code and improves performance
> slightly for me, admittedly not as much as we would like; the kernel becomes:
> (using only -O3 -S -m64 -maltivec)
> .L29:
> lvx 13
--- Comment #16 from eyal at geomage dot com 2008-02-08 08:55 ---
Thanks a lot Ira, I appreciate it.
If you need the full test code with .vect file and makefiles, please let me
know.
thanks,
eyal
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=35117
--- Comment #15 from zaks at il dot ibm dot com 2008-02-08 08:49 ---
(In reply to comment #5)
> (In reply to comment #3)
> > I think this is a dup of another bug I filed with respect to the builtin
> > operator new getting the malloc attribute.
> Are you referring to using malloc ins
--- Comment #14 from irar at il dot ibm dot com 2008-02-07 20:44 ---
Giving it another thought, this is not necessarily an alias analysis issue,
even though it fails to tell that the pointers do not alias. Since in this
case the
pointers do differ, the runtime test should take the flow to the v
--- Comment #13 from irar at il dot ibm dot com 2008-02-07 13:22 ---
CC'ing Daniel and Diego, maybe they can help with the alias analysis issues.
--
irar at il dot ibm dot com changed:
What|Removed |Added
---
--- Comment #12 from eyal at geomage dot com 2008-02-07 13:07 ---
(In reply to comment #11)
> (In reply to comment #10)
> > Is there some pragma or a coding convention I can use to make the compiler
> > understand those pointers have nothing to do with each other?
> There is __restrict__
--- Comment #11 from irar at il dot ibm dot com 2008-02-07 13:04 ---
(In reply to comment #10)
> Is there some pragma or a coding convention I can use to make the compiler
> understand those pointers have nothing to do with each other?
There is __restrict__, but it is useful only for fu
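For illustration only, here is how __restrict__ is typically applied to pointer parameters. The function add_arrays and its parameter names are hypothetical and are not taken from the code discussed in this thread. The qualifier is a GCC extension in C++ and is a promise made by the programmer: the compiler does not check it.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper, not from the original test case.
// __restrict__ tells the compiler that dst, a and b never overlap, so it
// may vectorize the loop without first testing the pointers for aliasing.
// Passing overlapping arrays breaks that promise and is undefined behavior.
void add_arrays(float *__restrict__ dst,
                const float *__restrict__ a,
                const float *__restrict__ b,
                std::size_t n)
{
    assert(n == 0 || (dst != nullptr && a != nullptr && b != nullptr));
    for (std::size_t i = 0; i < n; ++i)
        dst[i] = a[i] + b[i];
}
```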
--- Comment #10 from eyal at geomage dot com 2008-02-07 12:58 ---
(In reply to comment #9)
> (In reply to comment #8)
> > {
> > float *pTempSumPhase_Temp_cre_angle = (float*) malloc (sizeof(float)
> > *m_nSamples);
> > float *pTempSum2Phase_Temp_cre_angle = (float*) mallo
--- Comment #9 from irar at il dot ibm dot com 2008-02-07 12:54 ---
(In reply to comment #8)
> {
> float *pTempSumPhase_Temp_cre_angle = (float*) malloc (sizeof(float)
> *m_nSamples);
> float *pTempSum2Phase_Temp_cre_angle = (float*) malloc (sizeof(float)
> *m_nSamples);
--- Comment #8 from eyal at geomage dot com 2008-02-07 12:16 ---
Hi Ira,
Here is the compiler output for the real code.
Crs/CEE_CRE_2DSearch.cpp:1285: note: create runtime check for data references
*D.86651_134 and *D.8_160
Crs/CEE_CRE_2DSearch.cpp:1285: note: create runtime check
--- Comment #7 from eyal at geomage dot com 2008-02-07 11:06 ---
(In reply to comment #6)
> (In reply to comment #2)
> > Yes the loop is vectorized.
> ...
> > Eyal.cpp:34: note: created 9 versioning for alias checks.
> > Eyal.cpp:34: note: LOOP VECTORIZED.(get_loop_exit_condition
> The
--- Comment #6 from irar at il dot ibm dot com 2008-02-07 10:53 ---
(In reply to comment #2)
> Yes the loop is vectorized.
...
> Eyal.cpp:34: note: created 9 versioning for alias checks.
> Eyal.cpp:34: note: LOOP VECTORIZED.(get_loop_exit_condition
The vectorizer created runtime check
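As a conceptual sketch of what "versioning for alias checks" means: the loop is kept in two copies, and a runtime overlap test selects between them. The code below is a hand-written approximation with hypothetical names (ranges_disjoint, scale_versioned), not the code GCC actually generates.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Returns true when the n-float ranges starting at p and q do not overlap.
static bool ranges_disjoint(const float *p, const float *q, std::size_t n)
{
    const std::uintptr_t a = reinterpret_cast<std::uintptr_t>(p);
    const std::uintptr_t b = reinterpret_cast<std::uintptr_t>(q);
    const std::uintptr_t bytes = n * sizeof(float);
    return a + bytes <= b || b + bytes <= a;
}

// Hypothetical illustration of loop versioning.
// Returns true if the no-alias version of the loop was taken.
bool scale_versioned(float *dst, const float *src, std::size_t n, float k)
{
    assert(n == 0 || (dst != nullptr && src != nullptr));
    if (ranges_disjoint(dst, src, n)) {
        // No overlap: iterations are independent, so this copy is the one
        // a compiler could replace with vector instructions.
        for (std::size_t i = 0; i < n; ++i)
            dst[i] = src[i] * k;
        return true;
    }
    // Possible overlap: keep the original scalar, in-order loop.
    for (std::size_t i = 0; i < n; ++i)
        dst[i] = src[i] * k;
    return false;
}
```

When the pointers really are distinct at run time, the check passes and the fast copy executes, so the check itself adds only a small fixed cost per loop entry.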
--- Comment #5 from eyal at geomage dot com 2008-02-07 10:43 ---
(In reply to comment #3)
> I think this is a dup of another bug I filed with respect to the builtin
> operator new getting the malloc attribute.
Are you referring to using malloc instead of new?
using malloc didn't mak
--- Comment #4 from pinskia at gcc dot gnu dot org 2008-02-07 10:40 ---
That is PR 23383.
--- Comment #3 from pinskia at gcc dot gnu dot org 2008-02-07 10:37 ---
I think this is a dup of another bug I filed with respect to the builtin
operator new getting the malloc attribute.
--
pinskia at gcc dot gnu dot org changed:
What|Removed
--- Comment #2 from eyal at geomage dot com 2008-02-07 10:36 ---
Yes, the loop is vectorized. What do you mean by memory bound? Don't you think
that vectorization can help here? I see around 20% performance gain in the real
application.
Below is the compiler output:
Eyal.cpp:34: note: de
--- Comment #1 from rguenth at gcc dot gnu dot org 2008-02-07 10:29 ---
The testcase looks completely memory bound. Does the compiler tell you it
does vectorization at all? Have you tried without -fprefetch-loop-arrays
(with today's HW prefetchers and the simple access patterns it's pro