On Wed, Jun 20, 2012 at 12:47 AM, Walter Landry <wlan...@caltech.edu> wrote:
> Richard Guenther <richard.guent...@gmail.com> wrote:
>> On Fri, Jun 15, 2012 at 12:54 AM, Walter Landry <wlan...@caltech.edu> wrote:
>>> Hello Everyone,
>>>
>>> I thought you might be interested in some C++ expression template
>>> benchmarks I have done.
>>>
>>> http://www.wlandry.net/Projects/FTensor#Benchmarks
>>>
>>> I found that GCC optimized the expression template code better than
>>> unrolling expressions by hand. In fact, GCC was far, far better at
>>> optimizing code with expression templates than any other compiler. I
>>> ran the same benchmarks back in 2003, and GCC has improved quite a lot
>>> since then.
>>
>> Heh, yeah - quite possibly because I myself was working with a POOMA
>> based CFD code during my PhD which made me start working on improving
>> GCC for expression template code ;) It is btw interesting to try to enable
>> profile-feedback for the compilers - for some compilers you'll see that
>> the profile-generating executables are so slow as to be unusable (as they
>> seem to keep all calls of the expression templates).
>
> I got around to trying profile guided optimization. For GCC it did
> not make much difference, but for Intel it made a huge improvement for
> the expression template code. Of course, the training executable ran
> 20 times slower. But that was better than the Open64 compiler which
> was too slow for me to get results.

That's good to hear - my experience with ICC (I think it was 9.x) was
even worse, a slowdown of a factor of 1000 or so which made PGO
impractical, too. Impractical PGO is usually a sign that PGO
instrumentation is done before any inlining happens.

> I have added a section on PGO.
>
> http://www.wlandry.net/Projects/FTensor#PGO
>
> I also added results from Open64 and Pathscale's ENZO.

Thanks,
Richard.