I'll be submitting our current ptx backend in a series of 23 patches in reply to this mail. This is currently a work-in-progress and still rough around the edges. We'd like to do all our OpenACC work on the gomp4 branch, so I'm submitting this as a proposal to see if it would be acceptable for this branch in its current state.

Beyond these patches, some other pieces are necessary for it to be useful: a post-processor for gcc output that reorders it in such a way that ptxas can process it, a small "library" assembly file that provides a _main entry point that calls main as expected, and a modified version of the CUDA ptxjit example program that can take an assembly file produced by the toolchain and execute that _main entry. This makes it possible to run a fair number of gcc testcases.

The first 22 patches are preliminary. Some fix problems exposed by the port and could be applied to mainline now if someone wanted to approve them. Others deal with a number of issues unique to ptx:
 * It's a virtual target, which means register allocation happens in the "assembler". We changed gcc to have a target hook to disable most everything from IRA onwards.
 * The assembly syntax is sufficiently different from what is normal that we need extra hooks to print out variables. We also need to print declarations for all referenced functions and variables in the output file.
 * Everything must live in an address space. There are several for global variables, constant data, and local variables. We have C frontend changes to apply and deal with implicit address spaces. It's not clear whether we'll want the C frontend changes in the final version of this - we may not care about compiling C directly to ptx without going through OpenACC. For the moment it's very useful to be able to run parts of the gcc testsuite.

Bootstrapped and tested on x86_64-linux with all patches applied to gomp4-branch.


Bernd