On Tue, Jan 05, 2021 at 12:13:59PM +0000, Julian Brown wrote:
> Just to check, does my reply below address your concerns --
> particularly with regards to the current usage of CUDA streams
> serializing kernel executions from different host threads? Given that
> situation, and the observed speed improvement with OpenMP offloading to
> NVPTX with the patch, I'm not sure how much sense it makes to do
> anything more sophisticated than this -- especially without a test case
> that demonstrates a performance regression (or an exacerbated
> out-of-memory condition) with the patch.

I guess I can live with it for GCC 11, but would like this to be
reconsidered for GCC 12. People do run OpenMP offloading code from multiple,
often concurrent, threads and we shouldn't serialize it unnecessarily.

        Jakub
