On Thu, Apr 19, 2018 at 8:33 AM, Thomas Koenig <[email protected]> wrote:
> Hi Matt,
> [timings]
>
>> Intel AVX2:
>>
>> C_SW 1.4931
>> D_SW 5.4254
>> PG_D 1.0878
>> TRACER_2D 24.7418
>> REMAPPING 27.2644
>
>
>> Now I looked at GNU Fortran (7.3.0). Here my "stock" flags are quite
>> boring (and all flags, not just the optimization ones):
>
>
> [Various options elided, the best was]:
>
>> GNU Haswell NoFMA Repack:
>> C_SW 2.4350
>> D_SW 9.7109
>> PG_D 0.7869
>> TRACER_2D 163.6474
>> REMAPPING 100.6820
>>
>> So, my questions to you gurus are: Is there something I could try adding
>> to my gfortran options that might help with this discrepancy between Intel
>> AVX2 and GCC? Or perhaps I need to *remove* something (some flag kills the
>> vectorizer)?
>
> The gcc 8 release is just around the corner, and a lot of improvements
> have been made to code generation, also for AVX2. You might want to give
> the current trunk, the soon-to-be-released release candidate, or the
> then newly released gcc 8 a spin.
>
> Second, this performance gap with respect to Intel (a factor of 6.6 for
> your TRACER_2D routine) is dramatic. If anything like this persists in gcc8,
> the only way to get this fixed is to submit a bug report.
> Profile the code and try to reduce it to something that shows
> the problem (and that you can put in a bug report).
Depending on what those routines do (do they call math intrinsics like
sin or cos?), ICC has an advantage with its highly optimized vectorized
math library. You can use that from gfortran as well by using
-mveclibabi=svml and linking against libsvml.{a,so}, which comes with ICC.
Unfortunately, gfortran cannot exercise glibc's libmvec at the moment.
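
A minimal sketch of how that could be tried, assuming a made-up test
kernel (svml_demo is not one of the routines from the original post) and
a placeholder library path. Per the GCC documentation, -mveclibabi only
takes effect when both -ftree-vectorize and -funsafe-math-optimizations
are enabled, which -O3 -ffast-math covers:

```fortran
! Hypothetical test kernel: a plain loop over sin() that the vectorizer
! can map onto vector math library entry points.
!
! Possible build line (the -L path is a placeholder for wherever your
! ICC installation keeps libsvml):
!   gfortran -O3 -march=haswell -ffast-math -mveclibabi=svml \
!            -fopt-info-vec svml_demo.f90 -L/path/to/icc/lib -lsvml
! -fopt-info-vec reports which loops were vectorized.
program svml_demo
  implicit none
  integer, parameter :: n = 1024
  real(8) :: x(n), y(n)
  integer :: i

  ! Fill the input with some arbitrary values.
  do i = 1, n
     x(i) = 1.0d-3 * i
  end do

  ! Candidate loop for vectorization with a vector math library.
  do i = 1, n
     y(i) = sin(x(i))
  end do

  ! Use the result so the loop is not optimized away.
  print *, sum(y)
end program svml_demo
```

If the loop is reported as vectorized and the link step resolves against
libsvml, the same flags are worth trying on the real code. Note that
-ffast-math relaxes floating-point semantics, so results should be
compared against the stock build.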
Richard.
> Regards
>
> Thomas