One case where ICC can sometimes generate much faster code is the
nontemporal pragma [https://software.intel.com/en-us/node/524559]
applied to loops.
AFAIK, gcc has no equivalent pragma
[https://gcc.gnu.org/ml/gcc/2012-01/msg00028.html].
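For illustration only, here is a rough sketch of how non-temporal stores
can be requested by hand in gcc on x86, using the SSE2 intrinsic
_mm_stream_pd. The function name square_stream and the element-wise
squaring body are assumptions made for this sketch; it is not a
drop-in equivalent of the ICC pragma, and whether it helps depends on
array size and hardware.

```cpp
#include <cstddef>
#include <cstdint>
#if defined(__SSE2__)
#include <emmintrin.h>  // SSE2: _mm_loadu_pd, _mm_mul_pd, _mm_stream_pd, _mm_sfence
#endif

// Hypothetical kernel (assumption for illustration): y[i] = x[i] * x[i],
// written so that the results bypass the cache where SSE2 is available.
void square_stream(const double* x, std::size_t n, double* y) {
  std::size_t i = 0;
#if defined(__SSE2__)
  // _mm_stream_pd requires a 16-byte-aligned destination, so handle
  // leading elements with ordinary stores until y + i is aligned.
  while (i < n && (reinterpret_cast<std::uintptr_t>(y + i) % 16) != 0) {
    y[i] = x[i] * x[i];
    ++i;
  }
  // Main loop: two doubles per iteration, written with non-temporal stores.
  for (; i + 2 <= n; i += 2) {
    __m128d v = _mm_loadu_pd(x + i);
    _mm_stream_pd(y + i, _mm_mul_pd(v, v));
  }
  // Make the streaming stores globally visible before returning.
  _mm_sfence();
#endif
  // Remainder (or the whole range when SSE2 is not available).
  for (; i < n; ++i) {
    y[i] = x[i] * x[i];
  }
}
```

Non-temporal stores tend to pay off only when the output is much larger
than the cache and is not read again soon; for small arrays they can be
slower than ordinary stores.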
When I tried this simple example,
https://github.com/rnburn/square_timing/blob/master/bench.cpp, which
measures times for this loop:
void compute(const double* x, index_t N, double* y) {
#pragma vector nontemporal
for(index_t i=0; i [...]

[...] wrote:
> Dear Paul,
>
> The opinion you've mentioned is common in the scientific community.
> However, on closer inspection it often turns out that the set of GCC
> compiler options used simply does not correspond to the "fast" Intel
> configuration. For instance, "-O3" for Intel actually corresponds to (at
> least) "-O3 -ffast-math -march=native" for GCC. Omitting "-ffast-math"
> obviously introduces a significant performance gap.
>
> Kind regards,
> - Dmitry Mikushin | Applied Parallel Computing LLC |
> https://parallel-computing.pro
>
>
> 2018-06-06 18:51 GMT+03:00 Paul Menzel:
>
>> Dear GCC folks,
>>
>>
>> Some scientists in our organization still want to use the Intel compiler
>> because, they say, it produces faster code, which is then executed on
>> clusters. Some resources on the Web [1][2] confirm this. (I am aware that
>> it’s heavily dependent on the actual program.)
>>
>> My question is: is it realistic that GCC could catch up and that the
>> scientists will start to use it instead of Intel’s compiler? Or will Intel
>> developers always have the lead, because they have secret documentation and
>> direct contact with the processor designers?
>>
>> If it is realistic, how can we get there? Would the program be written
>> first, and the compiler then be optimized for it? Or are more GCC
>> developers simply needed?
>>
>>
>> Kind regards,
>>
>> Paul
>>
>>
>> [1]: https://colfaxresearch.com/compiler-comparison/
>> [2]: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.679.1280&rep=rep1&type=pdf
>>
>>