https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122280

--- Comment #16 from Benjamin Schulz <schulz.benjamin at googlemail dot com> ---
Hi, I see that gcc has recently had some new patches for OpenMP nvptx added.


I have now checked this again.

With clang++ and options

export LIBRARY_PATH=/usr/lib64/nvptx64-nvidia-cuda/
clang++-21  -fopenmp -fopenmp-targets=nvptx64-nvidia-cuda   main.cpp

the output is: 


Process returned 0 (0x0)   execution time : 0.346 s
Press ENTER to continue.

So for the clang-compiled binary, all the multiplications agree: single
threaded on host, multi threaded on host with collapse(2), multi threaded on
gpu with target teams distribute for the first loop and parallel for for the
second loop, and target teams distribute parallel for collapse(2).
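For readers without the attachment, here is a minimal sketch of the kind of comparison being made. This is not the main.cpp from this report: the function names, the raw-pointer row-major layout and the map clauses are assumptions chosen for illustration only. It contrasts a serial reference loop with the target teams distribute parallel for collapse(2) variant and reports the first element where they differ.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical reference: serial row-major multiply, C = A * B (all n x n).
void multiply_serial(const double* A, const double* B, double* C, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < n; ++j) {
            double sum = 0.0;
            for (std::size_t k = 0; k < n; ++k)
                sum += A[i * n + k] * B[k * n + j];
            C[i * n + j] = sum;
        }
}

// Hypothetical offloaded variant: the i and j loops are collapsed into one
// iteration space and distributed over teams and threads on the device.
// Without -fopenmp the pragma is ignored and this runs serially on the host.
void multiply_target_collapse(const double* A, const double* B, double* C, std::size_t n) {
    #pragma omp target teams distribute parallel for collapse(2) \
        map(to: A[0:n*n], B[0:n*n]) map(from: C[0:n*n])
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < n; ++j) {
            double sum = 0.0;
            for (std::size_t k = 0; k < n; ++k)
                sum += A[i * n + k] * B[k * n + j];
            C[i * n + j] = sum;
        }
}

// Returns the index of the first differing element, or -1 if both agree.
// Exact comparison is only meaningful for small integer-valued inputs like
// the matrices in this report, where every product and sum is exact.
long first_mismatch(const std::vector<double>& x, const std::vector<double>& y) {
    assert(x.size() == y.size());
    for (std::size_t i = 0; i < x.size(); ++i)
        if (x[i] != y[i]) return static_cast<long>(i);
    return -1;
}
```

Calling both functions on the same inputs and passing the two results to first_mismatch should give -1 with a correct compiler and runtime. Anything else corresponds to a "result1 != result3" message like the one in the gcc output.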


With gcc, where gcc --version reports
gcc (Gentoo 16.0.9999 p, commit dfcc7b24a3954a0417f7f03651f6de52d5d796e3)
16.0.0 20251115 (experimental) aaa625a51cff750b40bc98f6555adc3f3f5f297b


and options

g++-16 -fopenmp -foffload=nvptx-none -save-temps -fno-stack-protector

I get:


A
1 2 3 4 5 6 7 8 9 10 11 12 
12 11 10 9 8 7 6 5 4 3 2 1 
2 4 6 8 10 12 1 3 5 7 9 11 
11 9 7 5 3 1 12 10 8 6 4 2 
3 6 9 12 2 5 8 11 1 4 7 10 
10 7 4 1 11 8 5 2 12 9 6 3 
4 8 12 3 7 11 2 6 10 1 5 9 
9 5 1 7 3 11 8 4 12 6 2 10 
5 10 3 8 1 6 11 4 9 2 7 12 
12 7 2 9 4 11 6 1 8 3 10 5 
6 1 8 3 10 5 12 7 2 9 4 11 
11 2 9 4 12 7 3 10 5 1 8 6 


B
12 11 10 9 8 7 6 5 4 3 2 1 
1 2 3 4 5 6 7 8 9 10 11 12 
3 6 9 12 2 5 8 11 1 4 7 10 
10 7 4 1 11 8 5 2 12 9 6 3 
5 10 3 8 1 6 11 4 9 2 7 12 
12 9 6 3 10 7 4 1 8 5 2 11 
2 4 6 8 10 12 1 3 5 7 9 11 
11 8 5 2 9 6 3 12 7 4 1 10 
3 6 9 12 2 5 8 11 1 4 7 10 
10 7 4 1 11 8 5 2 12 9 6 3 
4 8 12 3 7 11 2 6 10 1 5 9 
9 5 1 7 3 11 8 4 12 6 2 10 


multiplication of A and B: result1 (single threaded) != result3 (target teams
distribute parallel for collapse(2)) at attempt=1
541 529 457 422 516 648 414 438 640 401 389 689 
525 550 479 488 511 548 470 459 530 431 456 637 
575 564 433 415 486 607 477 382 669 399 388 689 
491 515 503 495 541 589 407 515 501 433 457 637 
557 508 435 395 560 631 397 456 633 449 400 663 
509 571 501 515 467 565 487 441 537 383 445 663 
500 530 476 531 413 551 499 517 519 382 412 754 
587 537 451 475 539 609 439 401 573 441 391 641 
485 473 449 466 516 648 414 438 596 457 445 697 
561 566 523 448 551 616 418 387 586 403 408 617 
549 548 427 484 509 640 442 405 598 403 402 677 
572 613 510 507 457 570 474 491 537 318 359 676 


520 520 434 434 653 653 438 438 442 442 549 549 
550 550 488 488 576 576 459 459 622 622 609 609 
564 564 453 453 659 659 410 410 505 505 813 813 
515 515 474 474 600 600 607 607 412 412 655 655 
577 577 368 368 625 625 456 456 422 422 666 666 
559 559 515 515 569 569 441 441 478 478 663 663 
496 496 476 476 413 413 450 450 523 523 403 403 
589 589 451 451 962 962 444 444 647 647 412 412 
480 480 454 454 594 594 409 409 596 596 486 486 
561 561 530 530 544 544 418 418 574 574 500 500 
598 598 427 427 510 510 442 442 622 622 396 396 
572 572 510 510 446 446 476 476 526 526 359 359 



Process returned 0 (0x0)   execution time : 0.223 s
Press ENTER to continue.


So the collapse(2) result for the gcc-16 compiled file still does not agree.


And this is with the driver NVIDIA-SMI 580.95.05, which Gentoo recently
downgraded to.

However, some of the memory problems reported by valgrind appear for both gcc
and clang.



Below are the valgrind problems when I run it with clang (which at least
gives me the correct output).

A gpu that allows unsolicited screenshots of my activities is usually not what
I want.


I guess I'll mention this at NVIDIA's forum tomorrow, and then, if they can
neither convince me that this is a false positive from valgrind nor fix it,
switch to an AMD gpu.

But actually, I think the wrong values in the gcc output are even more
problematic, and they are still there after the driver downgrade. I am using
linux-6.17.8-gentoo-dist, and my card is an NVIDIA GeForce RTX 5060 Ti.





==8776== Memcheck, a memory error detector
==8776== Copyright (C) 2002-2024, and GNU GPL'd, by Julian Seward et al.
==8776== Using Valgrind-3.26.0 and LibVEX; rerun with -h for copyright info
==8776== Command: ./a.out
==8776== 
==8776== Warning: noted but unhandled ioctl 0x30000001 with no direction hints.
==8776==    This could cause spurious value errors to appear.
==8776==    See README_MISSING_SYSCALL_OR_IOCTL for guidance on writing a
proper wrapper.
==8776== Warning: set address range perms: large range [0x59d2b000,
0x25dd2b000) (noaccess)
==8776== Warning: set address range perms: large range [0x59d2b000,
0x25dd2b000) (noaccess)
==8776== Warning: set address range perms: large range [0x10000000000,
0x10204000000) (noaccess)
==8776== Warning: set address range perms: large range [0x200000000,
0x300200000) (noaccess)
==8776== 
==8776== HEAP SUMMARY:
==8776==     in use at exit: 3,924,389 bytes in 2,688 blocks
==8776==   total heap usage: 116,356 allocs, 113,668 frees, 84,020,911 bytes
allocated
==8776== 
==8776== 72 bytes in 1 blocks are possibly lost in loss record 2,581 of 2,674
==8776==    at 0x48C08D8: malloc (vg_replace_malloc.c:447)
==8776==    by 0x12FF676B: ??? (in /usr/lib64/libcuda.so.580.95.05)
==8776==    by 0x12349B56: ??? (in /usr/lib64/libcuda.so.580.95.05)
==8776==    by 0x1235CEFA: ??? (in /usr/lib64/libcuda.so.580.95.05)
==8776==    by 0x12319265: ??? (in /usr/lib64/libcuda.so.580.95.05)
==8776==    by 0x123576D1: ??? (in /usr/lib64/libcuda.so.580.95.05)
==8776==    by 0x12348D1F: cuInit (in /usr/lib64/libcuda.so.580.95.05)
==8776==    by 0x4F7733E: cuInit(unsigned int) (in
/usr/lib64/libomptarget.so.21.1)
==8776==    by 0x4F6CA83: llvm::omp::target::plugin::CUDAPluginTy::initImpl()
(in /usr/lib64/libomptarget.so.21.1)
==8776==    by 0x4F29250: llvm::omp::target::plugin::GenericPluginTy::init()
(in /usr/lib64/libomptarget.so.21.1)
==8776==    by 0x4F1535C: PluginManager::registerLib(__tgt_bin_desc*) (in
/usr/lib64/libomptarget.so.21.1)
==8776==    by 0x40021EC: .omp_offloading.descriptor_reg (in
/home/benni/projects/openmptestnew/openmpoffloatest/a.out)
==8776== 
==8776== 80 bytes in 1 blocks are possibly lost in loss record 2,587 of 2,674
==8776==    at 0x48C08D8: malloc (vg_replace_malloc.c:447)
==8776==    by 0x1249778A: ??? (in /usr/lib64/libcuda.so.580.95.05)
==8776==    by 0x123657B1: ??? (in /usr/lib64/libcuda.so.580.95.05)
==8776==    by 0x1233FAFC: ??? (in /usr/lib64/libcuda.so.580.95.05)
==8776==    by 0x123188F7: ??? (in /usr/lib64/libcuda.so.580.95.05)
==8776==    by 0x123576D1: ??? (in /usr/lib64/libcuda.so.580.95.05)
==8776==    by 0x12348D1F: cuInit (in /usr/lib64/libcuda.so.580.95.05)
==8776==    by 0x4F7733E: cuInit(unsigned int) (in
/usr/lib64/libomptarget.so.21.1)
==8776==    by 0x4F6CA83: llvm::omp::target::plugin::CUDAPluginTy::initImpl()
(in /usr/lib64/libomptarget.so.21.1)
==8776==    by 0x4F29250: llvm::omp::target::plugin::GenericPluginTy::init()
(in /usr/lib64/libomptarget.so.21.1)
==8776==    by 0x4F1535C: PluginManager::registerLib(__tgt_bin_desc*) (in
/usr/lib64/libomptarget.so.21.1)
==8776==    by 0x40021EC: .omp_offloading.descriptor_reg (in
/home/benni/projects/openmptestnew/openmpoffloatest/a.out)
==8776== 
==8776== 104 bytes in 1 blocks are possibly lost in loss record 2,593 of 2,674
==8776==    at 0x48C08D8: malloc (vg_replace_malloc.c:447)
==8776==    by 0x4D6FC6B: ___kmp_allocate_align(unsigned long, unsigned long,
char const*, int) (in /usr/lib64/libomp.so)
==8776==    by 0x4D74630: ___kmp_allocate (in /usr/lib64/libomp.so)
==8776==    by 0x4E12456: KMPNativeAffinity::allocate_mask() (in
/usr/lib64/libomp.so)
==8776==    by 0x4E09504: __kmp_aux_affinity_initialize_masks(kmp_affinity_t&)
[clone .constprop.0] (in /usr/lib64/libomp.so)
==8776==    by 0x4E0D910: __kmp_aux_affinity_initialize(kmp_affinity_t&) (in
/usr/lib64/libomp.so)
==8776==    by 0x4DAB412: __kmp_do_middle_initialize() (in
/usr/lib64/libomp.so)
==8776==    by 0x4DAB916: __kmp_parallel_initialize (in /usr/lib64/libomp.so)
==8776==    by 0x4DAC444: __kmp_fork_call (in /usr/lib64/libomp.so)
==8776==    by 0x4D89C94: __kmpc_fork_call (in /usr/lib64/libomp.so)
==8776==    by 0x4004884: void matrix_multiply_dot_w<double>(DataBlock<double>
const&, DataBlock<double> const&, DataBlock<double>&) (in
/home/benni/projects/openmptestnew/openmpoffloatest/a.out)
==8776==    by 0x4002804: main (in
/home/benni/projects/openmptestnew/openmpoffloatest/a.out)
==8776== 
==8776== 112 bytes in 1 blocks are possibly lost in loss record 2,594 of 2,674
==8776==    at 0x48C08D8: malloc (vg_replace_malloc.c:447)
==8776==    by 0x4D6FC6B: ___kmp_allocate_align(unsigned long, unsigned long,
char const*, int) (in /usr/lib64/libomp.so)
==8776==    by 0x4D74630: ___kmp_allocate (in /usr/lib64/libomp.so)
==8776==    by 0x4E1242A: KMPNativeAffinity::allocate_mask() (in
/usr/lib64/libomp.so)
==8776==    by 0x4E09504: __kmp_aux_affinity_initialize_masks(kmp_affinity_t&)
[clone .constprop.0] (in /usr/lib64/libomp.so)
==8776==    by 0x4E0D910: __kmp_aux_affinity_initialize(kmp_affinity_t&) (in
/usr/lib64/libomp.so)
==8776==    by 0x4DAB412: __kmp_do_middle_initialize() (in
/usr/lib64/libomp.so)
==8776==    by 0x4DAB916: __kmp_parallel_initialize (in /usr/lib64/libomp.so)
==8776==    by 0x4DAC444: __kmp_fork_call (in /usr/lib64/libomp.so)
==8776==    by 0x4D89C94: __kmpc_fork_call (in /usr/lib64/libomp.so)
==8776==    by 0x4004884: void matrix_multiply_dot_w<double>(DataBlock<double>
const&, DataBlock<double> const&, DataBlock<double>&) (in
/home/benni/projects/openmptestnew/openmpoffloatest/a.out)
==8776==    by 0x4002804: main (in
/home/benni/projects/openmptestnew/openmpoffloatest/a.out)
==8776== 
==8776== 152 bytes in 1 blocks are possibly lost in loss record 2,620 of 2,674
==8776==    at 0x48C8613: calloc (vg_replace_malloc.c:1678)
==8776==    by 0x1242AF58: ??? (in /usr/lib64/libcuda.so.580.95.05)
==8776==    by 0x12369655: ??? (in /usr/lib64/libcuda.so.580.95.05)
==8776==    by 0x124C25F8: ??? (in /usr/lib64/libcuda.so.580.95.05)
==8776==    by 0x12F98DAC: ??? (in /usr/lib64/libcuda.so.580.95.05)
==8776==    by 0x12317BDD: ??? (in /usr/lib64/libcuda.so.580.95.05)
==8776==    by 0x123576D1: ??? (in /usr/lib64/libcuda.so.580.95.05)
==8776==    by 0x12348D1F: cuInit (in /usr/lib64/libcuda.so.580.95.05)
==8776==    by 0x4F7733E: cuInit(unsigned int) (in
/usr/lib64/libomptarget.so.21.1)
==8776==    by 0x4F6CA83: llvm::omp::target::plugin::CUDAPluginTy::initImpl()
(in /usr/lib64/libomptarget.so.21.1)
==8776==    by 0x4F29250: llvm::omp::target::plugin::GenericPluginTy::init()
(in /usr/lib64/libomptarget.so.21.1)
==8776==    by 0x4F1535C: PluginManager::registerLib(__tgt_bin_desc*) (in
/usr/lib64/libomptarget.so.21.1)
==8776== 
==8776== 152 bytes in 1 blocks are possibly lost in loss record 2,621 of 2,674
==8776==    at 0x48C8613: calloc (vg_replace_malloc.c:1678)
==8776==    by 0x1242AF58: ??? (in /usr/lib64/libcuda.so.580.95.05)
==8776==    by 0x12318859: ??? (in /usr/lib64/libcuda.so.580.95.05)
==8776==    by 0x123576D1: ??? (in /usr/lib64/libcuda.so.580.95.05)
==8776==    by 0x12348D1F: cuInit (in /usr/lib64/libcuda.so.580.95.05)
==8776==    by 0x4F7733E: cuInit(unsigned int) (in
/usr/lib64/libomptarget.so.21.1)
==8776==    by 0x4F6CA83: llvm::omp::target::plugin::CUDAPluginTy::initImpl()
(in /usr/lib64/libomptarget.so.21.1)
==8776==    by 0x4F29250: llvm::omp::target::plugin::GenericPluginTy::init()
(in /usr/lib64/libomptarget.so.21.1)
==8776==    by 0x4F1535C: PluginManager::registerLib(__tgt_bin_desc*) (in
/usr/lib64/libomptarget.so.21.1)
==8776==    by 0x40021EC: .omp_offloading.descriptor_reg (in
/home/benni/projects/openmptestnew/openmpoffloatest/a.out)
==8776==    by 0x500D72E: __libc_start_main@@GLIBC_2.34 (in
/usr/lib64/libc.so.6)
==8776==    by 0x4002234: (below main) (in
/home/benni/projects/openmptestnew/openmpoffloatest/a.out)
==8776== 
==8776== 152 bytes in 1 blocks are possibly lost in loss record 2,622 of 2,674
==8776==    at 0x48C8613: calloc (vg_replace_malloc.c:1678)
==8776==    by 0x1242AF58: ??? (in /usr/lib64/libcuda.so.580.95.05)
==8776==    by 0x12318888: ??? (in /usr/lib64/libcuda.so.580.95.05)
==8776==    by 0x123576D1: ??? (in /usr/lib64/libcuda.so.580.95.05)
==8776==    by 0x12348D1F: cuInit (in /usr/lib64/libcuda.so.580.95.05)
==8776==    by 0x4F7733E: cuInit(unsigned int) (in
/usr/lib64/libomptarget.so.21.1)
==8776==    by 0x4F6CA83: llvm::omp::target::plugin::CUDAPluginTy::initImpl()
(in /usr/lib64/libomptarget.so.21.1)
==8776==    by 0x4F29250: llvm::omp::target::plugin::GenericPluginTy::init()
(in /usr/lib64/libomptarget.so.21.1)
==8776==    by 0x4F1535C: PluginManager::registerLib(__tgt_bin_desc*) (in
/usr/lib64/libomptarget.so.21.1)
==8776==    by 0x40021EC: .omp_offloading.descriptor_reg (in
/home/benni/projects/openmptestnew/openmpoffloatest/a.out)
==8776==    by 0x500D72E: __libc_start_main@@GLIBC_2.34 (in
/usr/lib64/libc.so.6)
==8776==    by 0x4002234: (below main) (in
/home/benni/projects/openmptestnew/openmpoffloatest/a.out)
==8776== 
==8776== 152 bytes in 1 blocks are possibly lost in loss record 2,623 of 2,674
==8776==    at 0x48C8613: calloc (vg_replace_malloc.c:1678)
==8776==    by 0x1242AF58: ??? (in /usr/lib64/libcuda.so.580.95.05)
==8776==    by 0x124B1738: ??? (in /usr/lib64/libcuda.so.580.95.05)
==8776==    by 0x124B198D: ??? (in /usr/lib64/libcuda.so.580.95.05)
==8776==    by 0x124B1B1B: ??? (in /usr/lib64/libcuda.so.580.95.05)
==8776==    by 0x1231857B: ??? (in /usr/lib64/libcuda.so.580.95.05)
==8776==    by 0x123576D1: ??? (in /usr/lib64/libcuda.so.580.95.05)
==8776==    by 0x12348D1F: cuInit (in /usr/lib64/libcuda.so.580.95.05)
==8776==    by 0x4F7733E: cuInit(unsigned int) (in
/usr/lib64/libomptarget.so.21.1)
==8776==    by 0x4F6CA83: llvm::omp::target::plugin::CUDAPluginTy::initImpl()
(in /usr/lib64/libomptarget.so.21.1)
==8776==    by 0x4F29250: llvm::omp::target::plugin::GenericPluginTy::init()
(in /usr/lib64/libomptarget.so.21.1)
==8776==    by 0x4F1535C: PluginManager::registerLib(__tgt_bin_desc*) (in
/usr/lib64/libomptarget.so.21.1)
==8776== 
==8776== 152 bytes in 1 blocks are possibly lost in loss record 2,624 of 2,674
==8776==    at 0x48C8613: calloc (vg_replace_malloc.c:1678)
==8776==    by 0x1242AF58: ??? (in /usr/lib64/libcuda.so.580.95.05)
==8776==    by 0x124B1738: ??? (in /usr/lib64/libcuda.so.580.95.05)
==8776==    by 0x124B198D: ??? (in /usr/lib64/libcuda.so.580.95.05)
==8776==    by 0x124B1B42: ??? (in /usr/lib64/libcuda.so.580.95.05)
==8776==    by 0x1231857B: ??? (in /usr/lib64/libcuda.so.580.95.05)
==8776==    by 0x123576D1: ??? (in /usr/lib64/libcuda.so.580.95.05)
==8776==    by 0x12348D1F: cuInit (in /usr/lib64/libcuda.so.580.95.05)
==8776==    by 0x4F7733E: cuInit(unsigned int) (in
/usr/lib64/libomptarget.so.21.1)
==8776==    by 0x4F6CA83: llvm::omp::target::plugin::CUDAPluginTy::initImpl()
(in /usr/lib64/libomptarget.so.21.1)
==8776==    by 0x4F29250: llvm::omp::target::plugin::GenericPluginTy::init()
(in /usr/lib64/libomptarget.so.21.1)
==8776==    by 0x4F1535C: PluginManager::registerLib(__tgt_bin_desc*) (in
/usr/lib64/libomptarget.so.21.1)
==8776== 
==8776== 152 bytes in 1 blocks are possibly lost in loss record 2,625 of 2,674
==8776==    at 0x48C8613: calloc (vg_replace_malloc.c:1678)
==8776==    by 0x1242AF58: ??? (in /usr/lib64/libcuda.so.580.95.05)
==8776==    by 0x1231879B: ??? (in /usr/lib64/libcuda.so.580.95.05)
==8776==    by 0x123576D1: ??? (in /usr/lib64/libcuda.so.580.95.05)
==8776==    by 0x12348D1F: cuInit (in /usr/lib64/libcuda.so.580.95.05)
==8776==    by 0x4F7733E: cuInit(unsigned int) (in
/usr/lib64/libomptarget.so.21.1)
==8776==    by 0x4F6CA83: llvm::omp::target::plugin::CUDAPluginTy::initImpl()
(in /usr/lib64/libomptarget.so.21.1)
==8776==    by 0x4F29250: llvm::omp::target::plugin::GenericPluginTy::init()
(in /usr/lib64/libomptarget.so.21.1)
==8776==    by 0x4F1535C: PluginManager::registerLib(__tgt_bin_desc*) (in
/usr/lib64/libomptarget.so.21.1)
==8776==    by 0x40021EC: .omp_offloading.descriptor_reg (in
/home/benni/projects/openmptestnew/openmpoffloatest/a.out)
==8776==    by 0x500D72E: __libc_start_main@@GLIBC_2.34 (in
/usr/lib64/libc.so.6)
==8776==    by 0x4002234: (below main) (in
/home/benni/projects/openmptestnew/openmpoffloatest/a.out)
==8776== 
==8776== 152 bytes in 1 blocks are possibly lost in loss record 2,626 of 2,674
==8776==    at 0x48C8613: calloc (vg_replace_malloc.c:1678)
==8776==    by 0x1242AF58: ??? (in /usr/lib64/libcuda.so.580.95.05)
==8776==    by 0x1234A2E6: ??? (in /usr/lib64/libcuda.so.580.95.05)
==8776==    by 0x1235CF14: ??? (in /usr/lib64/libcuda.so.580.95.05)
==8776==    by 0x12319265: ??? (in /usr/lib64/libcuda.so.580.95.05)
==8776==    by 0x123576D1: ??? (in /usr/lib64/libcuda.so.580.95.05)
==8776==    by 0x12348D1F: cuInit (in /usr/lib64/libcuda.so.580.95.05)
==8776==    by 0x4F7733E: cuInit(unsigned int) (in
/usr/lib64/libomptarget.so.21.1)
==8776==    by 0x4F6CA83: llvm::omp::target::plugin::CUDAPluginTy::initImpl()
(in /usr/lib64/libomptarget.so.21.1)
==8776==    by 0x4F29250: llvm::omp::target::plugin::GenericPluginTy::init()
(in /usr/lib64/libomptarget.so.21.1)
==8776==    by 0x4F1535C: PluginManager::registerLib(__tgt_bin_desc*) (in
/usr/lib64/libomptarget.so.21.1)
==8776==    by 0x40021EC: .omp_offloading.descriptor_reg (in
/home/benni/projects/openmptestnew/openmpoffloatest/a.out)
==8776== 
==8776== 152 bytes in 1 blocks are possibly lost in loss record 2,627 of 2,674
==8776==    at 0x48C8613: calloc (vg_replace_malloc.c:1678)
==8776==    by 0x1242AF58: ??? (in /usr/lib64/libcuda.so.580.95.05)
==8776==    by 0x1234A304: ??? (in /usr/lib64/libcuda.so.580.95.05)
==8776==    by 0x1235CF14: ??? (in /usr/lib64/libcuda.so.580.95.05)
==8776==    by 0x12319265: ??? (in /usr/lib64/libcuda.so.580.95.05)
==8776==    by 0x123576D1: ??? (in /usr/lib64/libcuda.so.580.95.05)
==8776==    by 0x12348D1F: cuInit (in /usr/lib64/libcuda.so.580.95.05)
==8776==    by 0x4F7733E: cuInit(unsigned int) (in
/usr/lib64/libomptarget.so.21.1)
==8776==    by 0x4F6CA83: llvm::omp::target::plugin::CUDAPluginTy::initImpl()
(in /usr/lib64/libomptarget.so.21.1)
==8776==    by 0x4F29250: llvm::omp::target::plugin::GenericPluginTy::init()
(in /usr/lib64/libomptarget.so.21.1)
==8776==    by 0x4F1535C: PluginManager::registerLib(__tgt_bin_desc*) (in
/usr/lib64/libomptarget.so.21.1)
==8776==    by 0x40021EC: .omp_offloading.descriptor_reg (in
/home/benni/projects/openmptestnew/openmpoffloatest/a.out)
==8776== 
==8776== 152 bytes in 1 blocks are possibly lost in loss record 2,628 of 2,674
==8776==    at 0x48C8613: calloc (vg_replace_malloc.c:1678)
==8776==    by 0x1242AF58: ??? (in /usr/lib64/libcuda.so.580.95.05)
==8776==    by 0x12318A8F: ??? (in /usr/lib64/libcuda.so.580.95.05)
==8776==    by 0x123576D1: ??? (in /usr/lib64/libcuda.so.580.95.05)
==8776==    by 0x12348D1F: cuInit (in /usr/lib64/libcuda.so.580.95.05)
==8776==    by 0x4F7733E: cuInit(unsigned int) (in
/usr/lib64/libomptarget.so.21.1)
==8776==    by 0x4F6CA83: llvm::omp::target::plugin::CUDAPluginTy::initImpl()
(in /usr/lib64/libomptarget.so.21.1)
==8776==    by 0x4F29250: llvm::omp::target::plugin::GenericPluginTy::init()
(in /usr/lib64/libomptarget.so.21.1)
==8776==    by 0x4F1535C: PluginManager::registerLib(__tgt_bin_desc*) (in
/usr/lib64/libomptarget.so.21.1)
==8776==    by 0x40021EC: .omp_offloading.descriptor_reg (in
/home/benni/projects/openmptestnew/openmpoffloatest/a.out)
==8776==    by 0x500D72E: __libc_start_main@@GLIBC_2.34 (in
/usr/lib64/libc.so.6)
==8776==    by 0x4002234: (below main) (in
/home/benni/projects/openmptestnew/openmpoffloatest/a.out)
==8776== 
==8776== 152 bytes in 1 blocks are possibly lost in loss record 2,629 of 2,674
==8776==    at 0x48C8613: calloc (vg_replace_malloc.c:1678)
==8776==    by 0x1242AF58: ??? (in /usr/lib64/libcuda.so.580.95.05)
==8776==    by 0x12467D51: ??? (in /usr/lib64/libcuda.so.580.95.05)
==8776==    by 0x12318B1F: ??? (in /usr/lib64/libcuda.so.580.95.05)
==8776==    by 0x123576D1: ??? (in /usr/lib64/libcuda.so.580.95.05)
==8776==    by 0x12348D1F: cuInit (in /usr/lib64/libcuda.so.580.95.05)
==8776==    by 0x4F7733E: cuInit(unsigned int) (in
/usr/lib64/libomptarget.so.21.1)
==8776==    by 0x4F6CA83: llvm::omp::target::plugin::CUDAPluginTy::initImpl()
(in /usr/lib64/libomptarget.so.21.1)
==8776==    by 0x4F29250: llvm::omp::target::plugin::GenericPluginTy::init()
(in /usr/lib64/libomptarget.so.21.1)
==8776==    by 0x4F1535C: PluginManager::registerLib(__tgt_bin_desc*) (in
/usr/lib64/libomptarget.so.21.1)
==8776==    by 0x40021EC: .omp_offloading.descriptor_reg (in
/home/benni/projects/openmptestnew/openmpoffloatest/a.out)
==8776==    by 0x500D72E: __libc_start_main@@GLIBC_2.34 (in
/usr/lib64/libc.so.6)
==8776== 
==8776== 152 bytes in 1 blocks are possibly lost in loss record 2,630 of 2,674
==8776==    at 0x48C8613: calloc (vg_replace_malloc.c:1678)
==8776==    by 0x1242AF58: ??? (in /usr/lib64/libcuda.so.580.95.05)
==8776==    by 0x12467D71: ??? (in /usr/lib64/libcuda.so.580.95.05)
==8776==    by 0x12318B1F: ??? (in /usr/lib64/libcuda.so.580.95.05)
==8776==    by 0x123576D1: ??? (in /usr/lib64/libcuda.so.580.95.05)
==8776==    by 0x12348D1F: cuInit (in /usr/lib64/libcuda.so.580.95.05)
==8776==    by 0x4F7733E: cuInit(unsigned int) (in
/usr/lib64/libomptarget.so.21.1)
==8776==    by 0x4F6CA83: llvm::omp::target::plugin::CUDAPluginTy::initImpl()
(in /usr/lib64/libomptarget.so.21.1)
==8776==    by 0x4F29250: llvm::omp::target::plugin::GenericPluginTy::init()
(in /usr/lib64/libomptarget.so.21.1)
==8776==    by 0x4F1535C: PluginManager::registerLib(__tgt_bin_desc*) (in
/usr/lib64/libomptarget.so.21.1)
==8776==    by 0x40021EC: .omp_offloading.descriptor_reg (in
/home/benni/projects/openmptestnew/openmpoffloatest/a.out)
==8776==    by 0x500D72E: __libc_start_main@@GLIBC_2.34 (in
/usr/lib64/libc.so.6)
==8776== 
==8776== 152 bytes in 1 blocks are possibly lost in loss record 2,631 of 2,674
==8776==    at 0x48C8613: calloc (vg_replace_malloc.c:1678)
==8776==    by 0x1242AF58: ??? (in /usr/lib64/libcuda.so.580.95.05)
==8776==    by 0x12467D91: ??? (in /usr/lib64/libcuda.so.580.95.05)
==8776==    by 0x12318B1F: ??? (in /usr/lib64/libcuda.so.580.95.05)
==8776==    by 0x123576D1: ??? (in /usr/lib64/libcuda.so.580.95.05)
==8776==    by 0x12348D1F: cuInit (in /usr/lib64/libcuda.so.580.95.05)
==8776==    by 0x4F7733E: cuInit(unsigned int) (in
/usr/lib64/libomptarget.so.21.1)
==8776==    by 0x4F6CA83: llvm::omp::target::plugin::CUDAPluginTy::initImpl()
(in /usr/lib64/libomptarget.so.21.1)
==8776==    by 0x4F29250: llvm::omp::target::plugin::GenericPluginTy::init()
(in /usr/lib64/libomptarget.so.21.1)
==8776==    by 0x4F1535C: PluginManager::registerLib(__tgt_bin_desc*) (in
/usr/lib64/libomptarget.so.21.1)
==8776==    by 0x40021EC: .omp_offloading.descriptor_reg (in
/home/benni/projects/openmptestnew/openmpoffloatest/a.out)
==8776==    by 0x500D72E: __libc_start_main@@GLIBC_2.34 (in
/usr/lib64/libc.so.6)
==8776== 
==8776== 160 bytes in 1 blocks are possibly lost in loss record 2,635 of 2,674
==8776==    at 0x48C08D8: malloc (vg_replace_malloc.c:447)
==8776==    by 0x4D6FC6B: ___kmp_allocate_align(unsigned long, unsigned long,
char const*, int) (in /usr/lib64/libomp.so)
==8776==    by 0x4D74630: ___kmp_allocate (in /usr/lib64/libomp.so)
==8776==    by 0x4E369A2: __kmp_init_dynamic_user_locks (in
/usr/lib64/libomp.so)
==8776==    by 0x4DC909B: __kmp_env_initialize(char const*) (in
/usr/lib64/libomp.so)
==8776==    by 0x4DA8836: __kmp_do_serial_initialize() (in
/usr/lib64/libomp.so)
==8776==    by 0x4DA8DE6: __kmp_serial_initialize (in /usr/lib64/libomp.so)
==8776==    by 0x4E75D06: ompt_libomp_connect (in /usr/lib64/libomp.so)
==8776==    by 0x4F22CFB: llvm::omp::target::ompt::connectLibrary() (in
/usr/lib64/libomptarget.so.21.1)
==8776==    by 0x4F10B20: initRuntime() (in /usr/lib64/libomptarget.so.21.1)
==8776==    by 0x4F0284F: __tgt_register_lib (in
/usr/lib64/libomptarget.so.21.1)
==8776==    by 0x40021EC: .omp_offloading.descriptor_reg (in
/home/benni/projects/openmptestnew/openmpoffloatest/a.out)
==8776== 
==8776== 352 bytes in 1 blocks are possibly lost in loss record 2,645 of 2,674
==8776==    at 0x48C8613: calloc (vg_replace_malloc.c:1678)
==8776==    by 0x408D333: _dl_allocate_tls (in /usr/lib64/ld-linux-x86-64.so.2)
==8776==    by 0x508F245: pthread_create@@GLIBC_2.34 (in /usr/lib64/libc.so.6)
==8776==    by 0x12360F63: ??? (in /usr/lib64/libcuda.so.580.95.05)
==8776==    by 0x124479CB: ??? (in /usr/lib64/libcuda.so.580.95.05)
==8776==    by 0x12317B33: ??? (in /usr/lib64/libcuda.so.580.95.05)
==8776==    by 0x123576D1: ??? (in /usr/lib64/libcuda.so.580.95.05)
==8776==    by 0x12348D1F: cuInit (in /usr/lib64/libcuda.so.580.95.05)
==8776==    by 0x4F7733E: cuInit(unsigned int) (in
/usr/lib64/libomptarget.so.21.1)
==8776==    by 0x4F6CA83: llvm::omp::target::plugin::CUDAPluginTy::initImpl()
(in /usr/lib64/libomptarget.so.21.1)
==8776==    by 0x4F29250: llvm::omp::target::plugin::GenericPluginTy::init()
(in /usr/lib64/libomptarget.so.21.1)
==8776==    by 0x4F1535C: PluginManager::registerLib(__tgt_bin_desc*) (in
/usr/lib64/libomptarget.so.21.1)
==8776== 
==8776== 2,455 bytes in 1 blocks are possibly lost in loss record 2,661 of
2,674
==8776==    at 0x48C08D8: malloc (vg_replace_malloc.c:447)
==8776==    by 0x4D6FC6B: ___kmp_allocate_align(unsigned long, unsigned long,
char const*, int) (in /usr/lib64/libomp.so)
==8776==    by 0x4D74630: ___kmp_allocate (in /usr/lib64/libomp.so)
==8776==    by 0x4DCA15F: __kmp_env_dump() (in /usr/lib64/libomp.so)
==8776==    by 0x4DA852C: __kmp_do_serial_initialize() (in
/usr/lib64/libomp.so)
==8776==    by 0x4DA8DE6: __kmp_serial_initialize (in /usr/lib64/libomp.so)
==8776==    by 0x4E75D06: ompt_libomp_connect (in /usr/lib64/libomp.so)
==8776==    by 0x4F22CFB: llvm::omp::target::ompt::connectLibrary() (in
/usr/lib64/libomptarget.so.21.1)
==8776==    by 0x4F10B20: initRuntime() (in /usr/lib64/libomptarget.so.21.1)
==8776==    by 0x4F0284F: __tgt_register_lib (in
/usr/lib64/libomptarget.so.21.1)
==8776==    by 0x40021EC: .omp_offloading.descriptor_reg (in
/home/benni/projects/openmptestnew/openmpoffloatest/a.out)
==8776==    by 0x500D72E: __libc_start_main@@GLIBC_2.34 (in
/usr/lib64/libc.so.6)
==8776== 
==8776== 5,416 (616 direct, 4,800 indirect) bytes in 1 blocks are definitely
lost in loss record 2,665 of 2,674
==8776==    at 0x48C10D3: operator new(unsigned long) (vg_replace_malloc.c:488)
==8776==    by 0x4F6462E: createPlugin_cuda (in
/usr/lib64/libomptarget.so.21.1)
==8776==    by 0x4F13427: PluginManager::init() (in
/usr/lib64/libomptarget.so.21.1)
==8776==    by 0x4F10B28: initRuntime() (in /usr/lib64/libomptarget.so.21.1)
==8776==    by 0x4F0284F: __tgt_register_lib (in
/usr/lib64/libomptarget.so.21.1)
==8776==    by 0x40021EC: .omp_offloading.descriptor_reg (in
/home/benni/projects/openmptestnew/openmpoffloatest/a.out)
==8776==    by 0x500D72E: __libc_start_main@@GLIBC_2.34 (in
/usr/lib64/libc.so.6)
==8776==    by 0x4002234: (below main) (in
/home/benni/projects/openmptestnew/openmpoffloatest/a.out)
==8776== 
==8776== LEAK SUMMARY:
==8776==    definitely lost: 616 bytes in 1 blocks
==8776==    indirectly lost: 4,800 bytes in 4 blocks
==8776==      possibly lost: 5,159 bytes in 19 blocks
==8776==    still reachable: 3,913,814 bytes in 2,664 blocks
==8776==         suppressed: 0 bytes in 0 blocks
==8776== Reachable blocks (those to which a pointer was found) are not shown.
==8776== To see them, rerun with: --leak-check=full --show-leak-kinds=all
==8776== 
==8776== For lists of detected and suppressed errors, rerun with: -s
==8776== ERROR SUMMARY: 20 errors from 20 contexts (suppressed: 0 from 0)
