On Wed, Jun 20, 2012 at 12:47 AM, Walter Landry <wlan...@caltech.edu> wrote:
> Richard Guenther <richard.guent...@gmail.com> wrote:
>> On Fri, Jun 15, 2012 at 12:54 AM, Walter Landry <wlan...@caltech.edu> wrote:
>>> Hello Everyone,
>>>
>>> I thought you might be interested in some C++ expression template
>>> benchmarks I have done.
>>>
>>>  http://www.wlandry.net/Projects/FTensor#Benchmarks
>>>
>>> I found that GCC optimized the expression template code better than
>>> unrolling expressions by hand.  In fact, GCC was far, far better at
>>> optimizing code with expression templates than any other compiler.  I
>>> ran the same benchmarks back in 2003, and GCC has improved quite a lot
>>> since then.
>>
>> Heh, yeah - quite possibly because I myself was working with a POOMA
>> based CFD code during my PhD which made me start working on improving
>> GCC for expression template code ;)  It is btw interesting to try to enable
>> profile-feedback for the compilers - for some compilers you'll see that
>> the profile-generating executables are so slow as to be unusable (as they
>> seem to keep all calls of the expression templates).
>
> I got around to trying profile guided optimization.  For GCC it did
> not make much difference, but for Intel it made a huge improvement for
> the expression template code.  Of course, the training executable ran
> 20 times slower.  But that was better than the Open64 compiler which
> was too slow for me to get results.

That's good to hear - my experience with ICC (I think it was 9.x) was even
worse, a slowdown of a factor of 1000 or so which made PGO impractical, too.
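Impractically slow PGO is usually a sign that the PGO instrumentation is
inserted before any inlining happens, so every small expression template
call is kept as a real, instrumented call in the training executable.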
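
To illustrate why this matters, here is a minimal expression template
sketch.  It is not taken from FTensor or POOMA; the namespace et and the
names Vec, Sum and et_demo_first are hypothetical and exist only for this
example.  Evaluating d = a + b + c goes through several tiny operator[]
calls per element.  With inlining they collapse into one loop; if each of
them is instrumented as a separate call first, the training run pays for
all of them.

```cpp
#include <cstddef>
#include <vector>

namespace et {  // hypothetical namespace, for illustration only

// Node representing l + r; nothing is computed until operator[] is called.
template <class L, class R>
struct Sum {
    const L& l;
    const R& r;
    double operator[](std::size_t i) const { return l[i] + r[i]; }
};

struct Vec {
    std::vector<double> data;
    explicit Vec(std::size_t n, double v = 0.0) : data(n, v) {}
    double operator[](std::size_t i) const { return data[i]; }

    // Assigning from an expression evaluates it element by element.
    // Each e[i] is a chain of small calls that the optimizer is
    // expected to inline into a single loop body.
    template <class E>
    Vec& operator=(const E& e) {
        for (std::size_t i = 0; i < data.size(); ++i) data[i] = e[i];
        return *this;
    }
};

// Builds an expression node instead of a temporary Vec.
// Found via argument-dependent lookup for types in this namespace.
template <class L, class R>
Sum<L, R> operator+(const L& l, const R& r) { return Sum<L, R>{l, r}; }

}  // namespace et

// Usage: the type of a + b + c is Sum<Sum<Vec, Vec>, Vec>.
double et_demo_first() {
    et::Vec a(4, 1.0), b(4, 2.0), c(4, 3.0), d(4);
    d = a + b + c;
    return d[0];
}
```

Here each d[i] costs two Sum::operator[] calls and three Vec::operator[]
calls before inlining, and real tensor expressions nest much deeper.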

>  I have added a section on PGO.
>
>  http://www.wlandry.net/Projects/FTensor#PGO
>
> I also added results from Open64 and Pathscale's ENZO.

Thanks,
Richard.
