Hello,

I'm a master's student in high-performance computing at the Barcelona Supercomputing Center, and I'm working on my thesis, which implements the OpenMP accelerator model in our compiler (OmpSs). I have almost finished implementing all the new directives to generate CUDA code, and given my design, the same implementation for OpenCL should not take much effort. I haven't yet tried Intel MIC, APUs, or other hardware accelerators :) I'm now benchmarking the kernel code generated by my compiler. Although the generated kernels are generally naive, the speedup is not bad. Compared with the HMPP OpenACC 3.2.x compiler, the speedups are almost the same, and in some cases my results are slightly better. That's why, this term, I'm going to work on compiler-level or runtime-level optimizations for GPUs.
When I looked at the GCC OpenMP 4.0 project, I couldn't find anything about code generation. Are you going to announce it later, or should I apply to GSoC with my idea about code generation and device code optimizations?

Güray Özen
~grypp