------- Comment #28 from victork at gcc dot gnu dot org  2008-02-11 14:21 -------
> As for the last email, Victor:
> 1. Using a smaller number of iterations doesn't help me. This is not what the
> real world code runs.

It looks like the memory subsystem is the performance bottleneck in your
example, so vectorization alone does not help. You probably need to think about
how to partition your arrays so that they fit in the data cache.

> 2. new/malloc almost didn't do anything, maybe a gain of 20%

With data allocated by malloc, the compiler is able to prove independence
statically. So it would be better to allocate memory with malloc.

> 3. The difference between 1.738sec and 0.781sec can either be a 2 times
> performance gain or simply a 1 second gain that would remain 1 second for more
> intensive calculations. Therefore I can't use/rely on the test you did.

See the example in my previous comment. It shows a performance gain of about
2.4 times.

-- Victor

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=35117
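
For illustration only, the suggestion to partition the arrays so that they fit
in the data cache could look roughly like the sketch below. The kernel, the
function names (two_pass_plain, two_pass_blocked) and the block size are
hypothetical and are not taken from the test case in this bug. The restrict
qualifiers are an assumption that the arrays do not overlap, which is the same
kind of independence the comment says the compiler can prove for malloc'ed
data.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical kernel: two element-wise passes over the same arrays.
   Each pass streams the whole array, so for large n the data touched by
   the first pass may already be evicted when the second pass needs it. */
void two_pass_plain(float *restrict dst, const float *restrict src, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i] * 2.0f;
    for (size_t i = 0; i < n; i++)
        dst[i] = dst[i] + src[i];
}

/* Blocked variant: run both passes on one chunk before moving on, so the
   chunk can stay in the data cache between passes. `block` is a tuning
   parameter (an assumption here); pick it so that one chunk of src plus
   one chunk of dst fits in the cache level you are targeting. */
void two_pass_blocked(float *restrict dst, const float *restrict src,
                      size_t n, size_t block)
{
    assert(block > 0);
    for (size_t start = 0; start < n; start += block) {
        /* Clamp the last chunk without overflowing start + block. */
        size_t end = (n - start < block) ? n : start + block;
        for (size_t i = start; i < end; i++)
            dst[i] = src[i] * 2.0f;
        for (size_t i = start; i < end; i++)
            dst[i] = dst[i] + src[i];
    }
}
```

Both functions compute the same values; only the order in which memory is
visited differs. Whether the blocked form is actually faster depends on the
real loop, the array sizes and the cache sizes, so it has to be measured.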