https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122280
Benjamin Schulz <schulz.benjamin at googlemail dot com> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
  Attachment #62556|0                           |1
        is obsolete|                            |
--- Comment #14 from Benjamin Schulz <schulz.benjamin at googlemail dot com> ---
Created attachment 62750
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=62750&action=edit
gpu_compiler_test-nvptx-none.ii
Hi, out of curiosity I ran valgrind on the attached main.cpp, compiled with
gcc-16. The options were
-fopenmp -foffload=nvptx-none -foffload-options=nvptx-none=-march=sm_89
-save-temps -fno-stack-protector
(Unfortunately, I have no higher sm available; my RTX 5060 would need sm_120.)
I attached the .ii file produced by -save-temps.
The valgrind output follows after my comments below.
As you can see from main.cpp, I call the exit data release pragma for all
mapped memory, and the STL vectors on the host should free themselves at
program exit.
Still, valgrind reports a "definitely lost" block of 16 bytes (allocated
inside libcuda.so, see the report for process 17772 below), and this shows up
in all my other programs that use nvptx offload with my NVIDIA RTX 5060 Ti.
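
For readers without the attachment, here is a minimal sketch of the enter
data / exit data release pattern referred to above. It is not taken from
main.cpp: the function name scale_on_device and the use of a plain
std::vector are assumptions made for illustration only.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper (not from main.cpp): doubles every element of v,
// on the device if one is available. Without -fopenmp the pragmas are
// ignored and the loop simply runs on the host.
void scale_on_device(std::vector<double>& v)
{
    double* p = v.data();
    const std::size_t n = v.size();

    // Create the device copy and transfer the host data to it.
    #pragma omp target enter data map(to: p[0:n])

    #pragma omp target teams distribute parallel for
    for (std::size_t i = 0; i < n; ++i)
        p[i] *= 2.0;

    // Copy the result back, then release the device copy. "release"
    // decrements the reference count; the device memory is freed once
    // the count reaches zero.
    #pragma omp target update from(p[0:n])
    #pragma omp target exit data map(release: p[0:n])
}
```

With every mapping released like this, no device mapping made by the program
itself should outlive the call, which is why the remaining "definitely lost"
block is surprising.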
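You can also see that the multiplication with collapse(2) on the first and
second loops produces wrong and random results on the device (which is a
severe problem), whereas on the host collapse(2) can (and indeed always
should) be usable on the two outer loops of a matrix multiplication without
any problem. In the program output below, the first 12x12 block after the
mismatch message is the correct single-threaded result; the second block is
the offloaded result, in which values appear in duplicated pairs.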
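
To make the claim concrete, here is a minimal sketch of the kind of comparison
described above. It is not the code from main.cpp (which uses a
DataBlock<double> type whose definition is in the attachment); the function
names, the raw row-major pointers and the map clauses are assumptions. Each
(i, j) pair writes a distinct element of C, so the collapsed iterations are
independent and both variants should give identical results.

```cpp
#include <cstddef>
#include <vector>

// Reference: plain triple loop on the host. A is n x m, B is m x k,
// C is n x k, all stored row-major.
void matmul_serial(const double* A, const double* B, double* C,
                   std::size_t n, std::size_t m, std::size_t k)
{
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < k; ++j) {
            double sum = 0.0;
            for (std::size_t l = 0; l < m; ++l)
                sum += A[i * m + l] * B[l * k + j];
            C[i * k + j] = sum;
        }
}

// Offloaded variant with the two outer loops collapsed. Without -fopenmp,
// or without an available device, this runs on the host.
void matmul_collapse2(const double* A, const double* B, double* C,
                      std::size_t n, std::size_t m, std::size_t k)
{
    #pragma omp target teams distribute parallel for collapse(2) \
        map(to: A[0:n * m], B[0:m * k]) map(from: C[0:n * k])
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < k; ++j) {
            double sum = 0.0;  // private to each (i, j) iteration
            for (std::size_t l = 0; l < m; ++l)
                sum += A[i * m + l] * B[l * k + j];
            C[i * k + j] = sum;
        }
}

// Exact comparison is fine here: both variants add the same terms in the
// same order for each element.
bool same_result(const std::vector<double>& x, const std::vector<double>& y)
{
    return x == y;
}
```

Calling both functions on the same inputs and checking same_result on the two
outputs is the test that fails in the log below when the second variant runs
on the nvptx device.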
valgrind --leak-check=yes ./gpu_compiler_test
==17747== Memcheck, a memory error detector
==17747== Copyright (C) 2002-2024, and GNU GPL'd, by Julian Seward et al.
==17747== Using Valgrind-3.26.0 and LibVEX; rerun with -h for copyright info
==17747== Command: ./gpu_compiler_test
==17747==
==17772==
==17772== HEAP SUMMARY:
==17772== in use at exit: 235,106 bytes in 129 blocks
==17772== total heap usage: 1,521 allocs, 1,392 frees, 909,766 bytes allocated
==17772==
==17772== 16 bytes in 1 blocks are definitely lost in loss record 10 of 77
==17772== at 0x490F8D8: malloc (vg_replace_malloc.c:447)
==17772== by 0x11C36E47: ??? (in /usr/lib64/libcuda.so.580.105.08)
==17772== by 0x11D26356: ??? (in /usr/lib64/libcuda.so.580.105.08)
==17772== by 0x11C22C22: ??? (in /usr/lib64/libcuda.so.580.105.08)
==17772== by 0x52875BE: start_thread (in /usr/lib64/libc.so.6)
==17772== by 0x531A283: clone (in /usr/lib64/libc.so.6)
==17772==
==17772== 384 bytes in 8 blocks are possibly lost in loss record 60 of 77
==17772== at 0x490F8D8: malloc (vg_replace_malloc.c:447)
==17772== by 0x11E34865: ??? (in /usr/lib64/libcuda.so.580.105.08)
==17772== by 0x11E3430C: ??? (in /usr/lib64/libcuda.so.580.105.08)
==17772== by 0x11A827E4: ??? (in /usr/lib64/libcuda.so.580.105.08)
==17772== by 0x12A01D41: ??? (in /usr/lib64/libcuda.so.580.105.08)
==17772== by 0x11A80012: ??? (in /usr/lib64/libcuda.so.580.105.08)
==17772==
==17772== 400 bytes in 1 blocks are possibly lost in loss record 61 of 77
==17772== at 0x4917613: calloc (vg_replace_malloc.c:1678)
==17772== by 0x40DC333: _dl_allocate_tls (in /usr/lib64/ld-linux-x86-64.so.2)
==17772== by 0x5288245: pthread_create@@GLIBC_2.34 (in /usr/lib64/libc.so.6)
==17772== by 0x11C38013: ??? (in /usr/lib64/libcuda.so.580.105.08)
==17772== by 0x11D1E72B: ??? (in /usr/lib64/libcuda.so.580.105.08)
==17772== by 0x11BEEB23: ??? (in /usr/lib64/libcuda.so.580.105.08)
==17772== by 0x11C2E4F1: ??? (in /usr/lib64/libcuda.so.580.105.08)
==17772== by 0x11C1FCCF: cuInit (in /usr/lib64/libcuda.so.580.105.08)
==17772== by 0x495F825: nvptx_get_num_devices (plugin-nvptx.c:679)
==17772== by 0x4960FC6: GOMP_OFFLOAD_get_num_devices (plugin-nvptx.c:1327)
==17772== by 0x50D7046: gomp_target_init.part.0 (target.c:6079)
==17772== by 0x528CBA3: __pthread_once_slow.isra.0 (in /usr/lib64/libc.so.6)
==17772==
==17772== 9,200 bytes in 23 blocks are possibly lost in loss record 75 of 77
==17772== at 0x4917613: calloc (vg_replace_malloc.c:1678)
==17772== by 0x40DC333: _dl_allocate_tls (in /usr/lib64/ld-linux-x86-64.so.2)
==17772== by 0x5288245: pthread_create@@GLIBC_2.34 (in /usr/lib64/libc.so.6)
==17772== by 0x50D007B: gomp_team_start (team.c:859)
==17772== by 0x50C0C21: GOMP_parallel (parallel.c:176)
==17772== by 0x4004121: void matrix_multiply_dot_w<double>(DataBlock<double> const&, DataBlock<double> const&, DataBlock<double>&) (in /home/benni/projects/openmptestnew/openmpoffloatest/gpu_compiler_test)
==17772== by 0x400279E: main (in /home/benni/projects/openmptestnew/openmpoffloatest/gpu_compiler_test)
==17772==
==17772== LEAK SUMMARY:
==17772== definitely lost: 16 bytes in 1 blocks
==17772== indirectly lost: 0 bytes in 0 blocks
==17772== possibly lost: 9,984 bytes in 32 blocks
==17772== still reachable: 225,106 bytes in 96 blocks
==17772== suppressed: 0 bytes in 0 blocks
==17772== Reachable blocks (those to which a pointer was found) are not shown.
==17772== To see them, rerun with: --leak-check=full --show-leak-kinds=all
==17772==
==17772== For lists of detected and suppressed errors, rerun with: -s
==17772== ERROR SUMMARY: 4 errors from 4 contexts (suppressed: 0 from 0)
==17747== Warning: noted but unhandled ioctl 0x30000001 with no direction hints.
==17747== This could cause spurious value errors to appear.
==17747== See README_MISSING_SYSCALL_OR_IOCTL for guidance on writing a proper wrapper.
==17747== Warning: set address range perms: large range [0x59d2b000, 0x25dd2b000) (noaccess)
==17747== Warning: set address range perms: large range [0x59d2b000, 0x25dd2b000) (noaccess)
==17747== Warning: set address range perms: large range [0x10000000000, 0x10204000000) (noaccess)
==17747== Warning: set address range perms: large range [0x200000000, 0x300200000) (noaccess)
==17747== Conditional jump or move depends on uninitialised value(s)
==17747== at 0x128B0ABB: ??? (in /usr/lib64/libcuda.so.580.105.08)
==17747== by 0x11E2AC07: ??? (in /usr/lib64/libcuda.so.580.105.08)
==17747== by 0x11BF1433: ??? (in /usr/lib64/libcuda.so.580.105.08)
==17747== by 0x11BF16F4: ??? (in /usr/lib64/libcuda.so.580.105.08)
==17747== by 0x11BF1F4E: ??? (in /usr/lib64/libcuda.so.580.105.08)
==17747== by 0x11C56B57: ??? (in /usr/lib64/libcuda.so.580.105.08)
==17747== by 0x11C45EA1: cuLinkAddData_v2 (in /usr/lib64/libcuda.so.580.105.08)
==17747== by 0x4960A20: link_ptx (plugin-nvptx.c:795)
==17747== by 0x49630F0: GOMP_OFFLOAD_load_image (plugin-nvptx.c:1569)
==17747== by 0x50D9D40: gomp_load_image_to_device (target.c:2604)
==17747== by 0x50E5907: gomp_init_device (target.c:3007)
==17747== by 0x50E5A84: resolve_device (target.c:186)
==17747==
==17747== Conditional jump or move depends on uninitialised value(s)
==17747== at 0x128B0AC8: ??? (in /usr/lib64/libcuda.so.580.105.08)
==17747== by 0x11E2AC07: ??? (in /usr/lib64/libcuda.so.580.105.08)
==17747== by 0x11BF1433: ??? (in /usr/lib64/libcuda.so.580.105.08)
==17747== by 0x11BF16F4: ??? (in /usr/lib64/libcuda.so.580.105.08)
==17747== by 0x11BF1F4E: ??? (in /usr/lib64/libcuda.so.580.105.08)
==17747== by 0x11C56B57: ??? (in /usr/lib64/libcuda.so.580.105.08)
==17747== by 0x11C45EA1: cuLinkAddData_v2 (in /usr/lib64/libcuda.so.580.105.08)
==17747== by 0x4960A20: link_ptx (plugin-nvptx.c:795)
==17747== by 0x49630F0: GOMP_OFFLOAD_load_image (plugin-nvptx.c:1569)
==17747== by 0x50D9D40: gomp_load_image_to_device (target.c:2604)
==17747== by 0x50E5907: gomp_init_device (target.c:3007)
==17747== by 0x50E5A84: resolve_device (target.c:186)
==17747==
==17747== Conditional jump or move depends on uninitialised value(s)
==17747== at 0x128B0AEE: ??? (in /usr/lib64/libcuda.so.580.105.08)
==17747== by 0x11E2AC07: ??? (in /usr/lib64/libcuda.so.580.105.08)
==17747== by 0x11BF1433: ??? (in /usr/lib64/libcuda.so.580.105.08)
==17747== by 0x11BF16F4: ??? (in /usr/lib64/libcuda.so.580.105.08)
==17747== by 0x11BF1F4E: ??? (in /usr/lib64/libcuda.so.580.105.08)
==17747== by 0x11C56B57: ??? (in /usr/lib64/libcuda.so.580.105.08)
==17747== by 0x11C45EA1: cuLinkAddData_v2 (in /usr/lib64/libcuda.so.580.105.08)
==17747== by 0x4960A20: link_ptx (plugin-nvptx.c:795)
==17747== by 0x49630F0: GOMP_OFFLOAD_load_image (plugin-nvptx.c:1569)
==17747== by 0x50D9D40: gomp_load_image_to_device (target.c:2604)
==17747== by 0x50E5907: gomp_init_device (target.c:3007)
==17747== by 0x50E5A84: resolve_device (target.c:186)
==17747==
A
1 2 3 4 5 6 7 8 9 10 11 12
12 11 10 9 8 7 6 5 4 3 2 1
2 4 6 8 10 12 1 3 5 7 9 11
11 9 7 5 3 1 12 10 8 6 4 2
3 6 9 12 2 5 8 11 1 4 7 10
10 7 4 1 11 8 5 2 12 9 6 3
4 8 12 3 7 11 2 6 10 1 5 9
9 5 1 7 3 11 8 4 12 6 2 10
5 10 3 8 1 6 11 4 9 2 7 12
12 7 2 9 4 11 6 1 8 3 10 5
6 1 8 3 10 5 12 7 2 9 4 11
11 2 9 4 12 7 3 10 5 1 8 6
B
12 11 10 9 8 7 6 5 4 3 2 1
1 2 3 4 5 6 7 8 9 10 11 12
3 6 9 12 2 5 8 11 1 4 7 10
10 7 4 1 11 8 5 2 12 9 6 3
5 10 3 8 1 6 11 4 9 2 7 12
12 9 6 3 10 7 4 1 8 5 2 11
2 4 6 8 10 12 1 3 5 7 9 11
11 8 5 2 9 6 3 12 7 4 1 10
3 6 9 12 2 5 8 11 1 4 7 10
10 7 4 1 11 8 5 2 12 9 6 3
4 8 12 3 7 11 2 6 10 1 5 9
9 5 1 7 3 11 8 4 12 6 2 10
multiplication of A and B: result1 (single threaded) != result3 (target teams distribute parallel for collapse(2)) at attempt=1
541 529 457 422 516 648 414 438 640 401 389 689
525 550 479 488 511 548 470 459 530 431 456 637
575 564 433 415 486 607 477 382 669 399 388 689
491 515 503 495 541 589 407 515 501 433 457 637
557 508 435 395 560 631 397 456 633 449 400 663
509 571 501 515 467 565 487 441 537 383 445 663
500 530 476 531 413 551 499 517 519 382 412 754
587 537 451 475 539 609 439 401 573 441 391 641
485 473 449 466 516 648 414 438 596 457 445 697
561 566 523 448 551 616 418 387 586 403 408 617
549 548 427 484 509 640 442 405 598 403 402 677
572 613 510 507 457 570 474 491 537 318 359 676
530 530 422 422 653 653 493 493 506 506 689 689
673 673 512 512 618 618 459 459 487 487 693 693
568 568 487 487 628 628 382 382 552 552 689 689
515 515 563 563 589 589 515 515 433 433 637 637
535 535 395 395 763 763 516 516 449 449 723 723
655 655 515 515 565 565 441 441 383 383 663 663
500 500 476 476 413 413 499 499 563 563 726 726
587 587 451 451 539 539 444 444 576 576 391 391
621 621 513 513 516 516 392 392 642 642 445 445
567 567 511 511 551 551 418 418 574 574 415 415
549 549 427 427 510 510 409 409 532 532 402 402
572 572 574 574 502 502 474 474 564 564 421 421
==17747==
==17747== HEAP SUMMARY:
==17747== in use at exit: 3,808,483 bytes in 189 blocks
==17747== total heap usage: 31,770 allocs, 31,581 frees, 83,420,818 bytes allocated
==17747==
==17747== 32 bytes in 1 blocks are definitely lost in loss record 27 of 141
==17747== at 0x490F8D8: malloc (vg_replace_malloc.c:447)
==17747== by 0x50B539C: gomp_malloc (alloc.c:38)
==17747== by 0x4963186: GOMP_OFFLOAD_load_image (plugin-nvptx.c:1595)
==17747== by 0x50D9D40: gomp_load_image_to_device (target.c:2604)
==17747== by 0x50E5907: gomp_init_device (target.c:3007)
==17747== by 0x50E5A84: resolve_device (target.c:186)
==17747== by 0x50EA01F: GOMP_target_enter_exit_data (target.c:4409)
==17747== by 0x40042BE: void matrix_multiply_dot_g<double>(DataBlock<double> const&, DataBlock<double> const&, DataBlock<double>&, int) (in /home/benni/projects/openmptestnew/openmpoffloatest/gpu_compiler_test)
==17747== by 0x4002859: main (in /home/benni/projects/openmptestnew/openmpoffloatest/gpu_compiler_test)
==17747==
==17747== 48 bytes in 1 blocks are possibly lost in loss record 31 of 141
==17747== at 0x490F8D8: malloc (vg_replace_malloc.c:447)
==17747== by 0x50B539C: gomp_malloc (alloc.c:38)
==17747== by 0x4963176: GOMP_OFFLOAD_load_image (plugin-nvptx.c:1590)
==17747== by 0x50D9D40: gomp_load_image_to_device (target.c:2604)
==17747== by 0x50E5907: gomp_init_device (target.c:3007)
==17747== by 0x50E5A84: resolve_device (target.c:186)
==17747== by 0x50EA01F: GOMP_target_enter_exit_data (target.c:4409)
==17747== by 0x40042BE: void matrix_multiply_dot_g<double>(DataBlock<double> const&, DataBlock<double> const&, DataBlock<double>&, int) (in /home/benni/projects/openmptestnew/openmpoffloatest/gpu_compiler_test)
==17747== by 0x4002859: main (in /home/benni/projects/openmptestnew/openmpoffloatest/gpu_compiler_test)
==17747==
==17747== 72 bytes in 1 blocks are possibly lost in loss record 54 of 141
==17747== at 0x490F8D8: malloc (vg_replace_malloc.c:447)
==17747== by 0x128CD86B: ??? (in /usr/lib64/libcuda.so.580.105.08)
==17747== by 0x11C20BC6: ??? (in /usr/lib64/libcuda.so.580.105.08)
==17747== by 0x11C33FAA: ??? (in /usr/lib64/libcuda.so.580.105.08)
==17747== by 0x11BF0255: ??? (in /usr/lib64/libcuda.so.580.105.08)
==17747== by 0x11C2E4F1: ??? (in /usr/lib64/libcuda.so.580.105.08)
==17747== by 0x11C1FCCF: cuInit (in /usr/lib64/libcuda.so.580.105.08)
==17747== by 0x495F825: nvptx_get_num_devices (plugin-nvptx.c:679)
==17747== by 0x4960FC6: GOMP_OFFLOAD_get_num_devices (plugin-nvptx.c:1327)
==17747== by 0x50D7046: gomp_target_init.part.0 (target.c:6079)
==17747== by 0x528CBA3: __pthread_once_slow.isra.0 (in /usr/lib64/libc.so.6)
==17747== by 0x528CC28: pthread_once@@GLIBC_2.34 (in /usr/lib64/libc.so.6)
==17747==
==17747== 72 bytes in 1 blocks are possibly lost in loss record 55 of 141
==17747== at 0x490F8D8: malloc (vg_replace_malloc.c:447)
==17747== by 0x50B539C: gomp_malloc (alloc.c:38)
==17747== by 0x50D9D6C: gomp_load_image_to_device (target.c:2622)
==17747== by 0x50E5907: gomp_init_device (target.c:3007)
==17747== by 0x50E5A84: resolve_device (target.c:186)
==17747== by 0x50EA01F: GOMP_target_enter_exit_data (target.c:4409)
==17747== by 0x40042BE: void matrix_multiply_dot_g<double>(DataBlock<double> const&, DataBlock<double> const&, DataBlock<double>&, int) (in /home/benni/projects/openmptestnew/openmpoffloatest/gpu_compiler_test)
==17747== by 0x4002859: main (in /home/benni/projects/openmptestnew/openmpoffloatest/gpu_compiler_test)
==17747==
==17747== 80 bytes in 1 blocks are possibly lost in loss record 57 of 141
==17747== at 0x490F8D8: malloc (vg_replace_malloc.c:447)
==17747== by 0x11D6E4DA: ??? (in /usr/lib64/libcuda.so.580.105.08)
==17747== by 0x11C3C861: ??? (in /usr/lib64/libcuda.so.580.105.08)
==17747== by 0x11C16AAC: ??? (in /usr/lib64/libcuda.so.580.105.08)
==17747== by 0x11BEF8E7: ??? (in /usr/lib64/libcuda.so.580.105.08)
==17747== by 0x11C2E4F1: ??? (in /usr/lib64/libcuda.so.580.105.08)
==17747== by 0x11C1FCCF: cuInit (in /usr/lib64/libcuda.so.580.105.08)
==17747== by 0x495F825: nvptx_get_num_devices (plugin-nvptx.c:679)
==17747== by 0x4960FC6: GOMP_OFFLOAD_get_num_devices (plugin-nvptx.c:1327)
==17747== by 0x50D7046: gomp_target_init.part.0 (target.c:6079)
==17747== by 0x528CBA3: __pthread_once_slow.isra.0 (in /usr/lib64/libc.so.6)
==17747== by 0x528CC28: pthread_once@@GLIBC_2.34 (in /usr/lib64/libc.so.6)
==17747==
==17747== 152 bytes in 1 blocks are possibly lost in loss record 90 of 141
==17747== at 0x4917613: calloc (vg_replace_malloc.c:1678)
==17747== by 0x11D01CB8: ??? (in /usr/lib64/libcuda.so.580.105.08)
==17747== by 0x11C40705: ??? (in /usr/lib64/libcuda.so.580.105.08)
==17747== by 0x11D99498: ??? (in /usr/lib64/libcuda.so.580.105.08)
==17747== by 0x1286FDCC: ??? (in /usr/lib64/libcuda.so.580.105.08)
==17747== by 0x11BEEBCD: ??? (in /usr/lib64/libcuda.so.580.105.08)
==17747== by 0x11C2E4F1: ??? (in /usr/lib64/libcuda.so.580.105.08)
==17747== by 0x11C1FCCF: cuInit (in /usr/lib64/libcuda.so.580.105.08)
==17747== by 0x495F825: nvptx_get_num_devices (plugin-nvptx.c:679)
==17747== by 0x4960FC6: GOMP_OFFLOAD_get_num_devices (plugin-nvptx.c:1327)
==17747== by 0x50D7046: gomp_target_init.part.0 (target.c:6079)
==17747== by 0x528CBA3: __pthread_once_slow.isra.0 (in /usr/lib64/libc.so.6)
==17747==
==17747== 152 bytes in 1 blocks are possibly lost in loss record 91 of 141
==17747== at 0x4917613: calloc (vg_replace_malloc.c:1678)
==17747== by 0x11D01CB8: ??? (in /usr/lib64/libcuda.so.580.105.08)
==17747== by 0x11BEF849: ??? (in /usr/lib64/libcuda.so.580.105.08)
==17747== by 0x11C2E4F1: ??? (in /usr/lib64/libcuda.so.580.105.08)
==17747== by 0x11C1FCCF: cuInit (in /usr/lib64/libcuda.so.580.105.08)
==17747== by 0x495F825: nvptx_get_num_devices (plugin-nvptx.c:679)
==17747== by 0x4960FC6: GOMP_OFFLOAD_get_num_devices (plugin-nvptx.c:1327)
==17747== by 0x50D7046: gomp_target_init.part.0 (target.c:6079)
==17747== by 0x528CBA3: __pthread_once_slow.isra.0 (in /usr/lib64/libc.so.6)
==17747== by 0x528CC28: pthread_once@@GLIBC_2.34 (in /usr/lib64/libc.so.6)
==17747== by 0x50E595E: gomp_init_targets_once (target.c:132)
==17747== by 0x50E595E: gomp_get_num_devices (target.c:138)
==17747== by 0x50E595E: resolve_device (target.c:147)
==17747== by 0x50EA01F: GOMP_target_enter_exit_data (target.c:4409)
==17747==
==17747== 152 bytes in 1 blocks are possibly lost in loss record 92 of 141
==17747== at 0x4917613: calloc (vg_replace_malloc.c:1678)
==17747== by 0x11D01CB8: ??? (in /usr/lib64/libcuda.so.580.105.08)
==17747== by 0x11BEF878: ??? (in /usr/lib64/libcuda.so.580.105.08)
==17747== by 0x11C2E4F1: ??? (in /usr/lib64/libcuda.so.580.105.08)
==17747== by 0x11C1FCCF: cuInit (in /usr/lib64/libcuda.so.580.105.08)
==17747== by 0x495F825: nvptx_get_num_devices (plugin-nvptx.c:679)
==17747== by 0x4960FC6: GOMP_OFFLOAD_get_num_devices (plugin-nvptx.c:1327)
==17747== by 0x50D7046: gomp_target_init.part.0 (target.c:6079)
==17747== by 0x528CBA3: __pthread_once_slow.isra.0 (in /usr/lib64/libc.so.6)
==17747== by 0x528CC28: pthread_once@@GLIBC_2.34 (in /usr/lib64/libc.so.6)
==17747== by 0x50E595E: gomp_init_targets_once (target.c:132)
==17747== by 0x50E595E: gomp_get_num_devices (target.c:138)
==17747== by 0x50E595E: resolve_device (target.c:147)
==17747== by 0x50EA01F: GOMP_target_enter_exit_data (target.c:4409)
==17747==
==17747== 152 bytes in 1 blocks are possibly lost in loss record 93 of 141
==17747== at 0x4917613: calloc (vg_replace_malloc.c:1678)
==17747== by 0x11D01CB8: ??? (in /usr/lib64/libcuda.so.580.105.08)
==17747== by 0x11D885A8: ??? (in /usr/lib64/libcuda.so.580.105.08)
==17747== by 0x11D887FD: ??? (in /usr/lib64/libcuda.so.580.105.08)
==17747== by 0x11D8898B: ??? (in /usr/lib64/libcuda.so.580.105.08)
==17747== by 0x11BEF56B: ??? (in /usr/lib64/libcuda.so.580.105.08)
==17747== by 0x11C2E4F1: ??? (in /usr/lib64/libcuda.so.580.105.08)
==17747== by 0x11C1FCCF: cuInit (in /usr/lib64/libcuda.so.580.105.08)
==17747== by 0x495F825: nvptx_get_num_devices (plugin-nvptx.c:679)
==17747== by 0x4960FC6: GOMP_OFFLOAD_get_num_devices (plugin-nvptx.c:1327)
==17747== by 0x50D7046: gomp_target_init.part.0 (target.c:6079)
==17747== by 0x528CBA3: __pthread_once_slow.isra.0 (in /usr/lib64/libc.so.6)
==17747==
==17747== 152 bytes in 1 blocks are possibly lost in loss record 94 of 141
==17747== at 0x4917613: calloc (vg_replace_malloc.c:1678)
==17747== by 0x11D01CB8: ??? (in /usr/lib64/libcuda.so.580.105.08)
==17747== by 0x11D885A8: ??? (in /usr/lib64/libcuda.so.580.105.08)
==17747== by 0x11D887FD: ??? (in /usr/lib64/libcuda.so.580.105.08)
==17747== by 0x11D889B2: ??? (in /usr/lib64/libcuda.so.580.105.08)
==17747== by 0x11BEF56B: ??? (in /usr/lib64/libcuda.so.580.105.08)
==17747== by 0x11C2E4F1: ??? (in /usr/lib64/libcuda.so.580.105.08)
==17747== by 0x11C1FCCF: cuInit (in /usr/lib64/libcuda.so.580.105.08)
==17747== by 0x495F825: nvptx_get_num_devices (plugin-nvptx.c:679)
==17747== by 0x4960FC6: GOMP_OFFLOAD_get_num_devices (plugin-nvptx.c:1327)
==17747== by 0x50D7046: gomp_target_init.part.0 (target.c:6079)
==17747== by 0x528CBA3: __pthread_once_slow.isra.0 (in /usr/lib64/libc.so.6)
==17747==
==17747== 152 bytes in 1 blocks are possibly lost in loss record 95 of 141
==17747== at 0x4917613: calloc (vg_replace_malloc.c:1678)
==17747== by 0x11D01CB8: ??? (in /usr/lib64/libcuda.so.580.105.08)
==17747== by 0x11BEF78B: ??? (in /usr/lib64/libcuda.so.580.105.08)
==17747== by 0x11C2E4F1: ??? (in /usr/lib64/libcuda.so.580.105.08)
==17747== by 0x11C1FCCF: cuInit (in /usr/lib64/libcuda.so.580.105.08)
==17747== by 0x495F825: nvptx_get_num_devices (plugin-nvptx.c:679)
==17747== by 0x4960FC6: GOMP_OFFLOAD_get_num_devices (plugin-nvptx.c:1327)
==17747== by 0x50D7046: gomp_target_init.part.0 (target.c:6079)
==17747== by 0x528CBA3: __pthread_once_slow.isra.0 (in /usr/lib64/libc.so.6)
==17747== by 0x528CC28: pthread_once@@GLIBC_2.34 (in /usr/lib64/libc.so.6)
==17747== by 0x50E595E: gomp_init_targets_once (target.c:132)
==17747== by 0x50E595E: gomp_get_num_devices (target.c:138)
==17747== by 0x50E595E: resolve_device (target.c:147)
==17747== by 0x50EA01F: GOMP_target_enter_exit_data (target.c:4409)
==17747==
==17747== 152 bytes in 1 blocks are possibly lost in loss record 96 of 141
==17747== at 0x4917613: calloc (vg_replace_malloc.c:1678)
==17747== by 0x11D01CB8: ??? (in /usr/lib64/libcuda.so.580.105.08)
==17747== by 0x11C21356: ??? (in /usr/lib64/libcuda.so.580.105.08)
==17747== by 0x11C33FC4: ??? (in /usr/lib64/libcuda.so.580.105.08)
==17747== by 0x11BF0255: ??? (in /usr/lib64/libcuda.so.580.105.08)
==17747== by 0x11C2E4F1: ??? (in /usr/lib64/libcuda.so.580.105.08)
==17747== by 0x11C1FCCF: cuInit (in /usr/lib64/libcuda.so.580.105.08)
==17747== by 0x495F825: nvptx_get_num_devices (plugin-nvptx.c:679)
==17747== by 0x4960FC6: GOMP_OFFLOAD_get_num_devices (plugin-nvptx.c:1327)
==17747== by 0x50D7046: gomp_target_init.part.0 (target.c:6079)
==17747== by 0x528CBA3: __pthread_once_slow.isra.0 (in /usr/lib64/libc.so.6)
==17747== by 0x528CC28: pthread_once@@GLIBC_2.34 (in /usr/lib64/libc.so.6)
==17747==
==17747== 152 bytes in 1 blocks are possibly lost in loss record 97 of 141
==17747== at 0x4917613: calloc (vg_replace_malloc.c:1678)
==17747== by 0x11D01CB8: ??? (in /usr/lib64/libcuda.so.580.105.08)
==17747== by 0x11C21374: ??? (in /usr/lib64/libcuda.so.580.105.08)
==17747== by 0x11C33FC4: ??? (in /usr/lib64/libcuda.so.580.105.08)
==17747== by 0x11BF0255: ??? (in /usr/lib64/libcuda.so.580.105.08)
==17747== by 0x11C2E4F1: ??? (in /usr/lib64/libcuda.so.580.105.08)
==17747== by 0x11C1FCCF: cuInit (in /usr/lib64/libcuda.so.580.105.08)
==17747== by 0x495F825: nvptx_get_num_devices (plugin-nvptx.c:679)
==17747== by 0x4960FC6: GOMP_OFFLOAD_get_num_devices (plugin-nvptx.c:1327)
==17747== by 0x50D7046: gomp_target_init.part.0 (target.c:6079)
==17747== by 0x528CBA3: __pthread_once_slow.isra.0 (in /usr/lib64/libc.so.6)
==17747== by 0x528CC28: pthread_once@@GLIBC_2.34 (in /usr/lib64/libc.so.6)
==17747==
==17747== 152 bytes in 1 blocks are possibly lost in loss record 98 of 141
==17747== at 0x4917613: calloc (vg_replace_malloc.c:1678)
==17747== by 0x11D01CB8: ??? (in /usr/lib64/libcuda.so.580.105.08)
==17747== by 0x11BEFA7F: ??? (in /usr/lib64/libcuda.so.580.105.08)
==17747== by 0x11C2E4F1: ??? (in /usr/lib64/libcuda.so.580.105.08)
==17747== by 0x11C1FCCF: cuInit (in /usr/lib64/libcuda.so.580.105.08)
==17747== by 0x495F825: nvptx_get_num_devices (plugin-nvptx.c:679)
==17747== by 0x4960FC6: GOMP_OFFLOAD_get_num_devices (plugin-nvptx.c:1327)
==17747== by 0x50D7046: gomp_target_init.part.0 (target.c:6079)
==17747== by 0x528CBA3: __pthread_once_slow.isra.0 (in /usr/lib64/libc.so.6)
==17747== by 0x528CC28: pthread_once@@GLIBC_2.34 (in /usr/lib64/libc.so.6)
==17747== by 0x50E595E: gomp_init_targets_once (target.c:132)
==17747== by 0x50E595E: gomp_get_num_devices (target.c:138)
==17747== by 0x50E595E: resolve_device (target.c:147)
==17747== by 0x50EA01F: GOMP_target_enter_exit_data (target.c:4409)
==17747==
==17747== 152 bytes in 1 blocks are possibly lost in loss record 99 of 141
==17747== at 0x4917613: calloc (vg_replace_malloc.c:1678)
==17747== by 0x11D01CB8: ??? (in /usr/lib64/libcuda.so.580.105.08)
==17747== by 0x11D3EAB1: ??? (in /usr/lib64/libcuda.so.580.105.08)
==17747== by 0x11BEFB0F: ??? (in /usr/lib64/libcuda.so.580.105.08)
==17747== by 0x11C2E4F1: ??? (in /usr/lib64/libcuda.so.580.105.08)
==17747== by 0x11C1FCCF: cuInit (in /usr/lib64/libcuda.so.580.105.08)
==17747== by 0x495F825: nvptx_get_num_devices (plugin-nvptx.c:679)
==17747== by 0x4960FC6: GOMP_OFFLOAD_get_num_devices (plugin-nvptx.c:1327)
==17747== by 0x50D7046: gomp_target_init.part.0 (target.c:6079)
==17747== by 0x528CBA3: __pthread_once_slow.isra.0 (in /usr/lib64/libc.so.6)
==17747== by 0x528CC28: pthread_once@@GLIBC_2.34 (in /usr/lib64/libc.so.6)
==17747== by 0x50E595E: gomp_init_targets_once (target.c:132)
==17747== by 0x50E595E: gomp_get_num_devices (target.c:138)
==17747== by 0x50E595E: resolve_device (target.c:147)
==17747==
==17747== 152 bytes in 1 blocks are possibly lost in loss record 100 of 141
==17747== at 0x4917613: calloc (vg_replace_malloc.c:1678)
==17747== by 0x11D01CB8: ??? (in /usr/lib64/libcuda.so.580.105.08)
==17747== by 0x11D3EAD1: ??? (in /usr/lib64/libcuda.so.580.105.08)
==17747== by 0x11BEFB0F: ??? (in /usr/lib64/libcuda.so.580.105.08)
==17747== by 0x11C2E4F1: ??? (in /usr/lib64/libcuda.so.580.105.08)
==17747== by 0x11C1FCCF: cuInit (in /usr/lib64/libcuda.so.580.105.08)
==17747== by 0x495F825: nvptx_get_num_devices (plugin-nvptx.c:679)
==17747== by 0x4960FC6: GOMP_OFFLOAD_get_num_devices (plugin-nvptx.c:1327)
==17747== by 0x50D7046: gomp_target_init.part.0 (target.c:6079)
==17747== by 0x528CBA3: __pthread_once_slow.isra.0 (in /usr/lib64/libc.so.6)
==17747== by 0x528CC28: pthread_once@@GLIBC_2.34 (in /usr/lib64/libc.so.6)
==17747== by 0x50E595E: gomp_init_targets_once (target.c:132)
==17747== by 0x50E595E: gomp_get_num_devices (target.c:138)
==17747== by 0x50E595E: resolve_device (target.c:147)
==17747==
==17747== 152 bytes in 1 blocks are possibly lost in loss record 101 of 141
==17747== at 0x4917613: calloc (vg_replace_malloc.c:1678)
==17747== by 0x11D01CB8: ??? (in /usr/lib64/libcuda.so.580.105.08)
==17747== by 0x11D3EAF1: ??? (in /usr/lib64/libcuda.so.580.105.08)
==17747== by 0x11BEFB0F: ??? (in /usr/lib64/libcuda.so.580.105.08)
==17747== by 0x11C2E4F1: ??? (in /usr/lib64/libcuda.so.580.105.08)
==17747== by 0x11C1FCCF: cuInit (in /usr/lib64/libcuda.so.580.105.08)
==17747== by 0x495F825: nvptx_get_num_devices (plugin-nvptx.c:679)
==17747== by 0x4960FC6: GOMP_OFFLOAD_get_num_devices (plugin-nvptx.c:1327)
==17747== by 0x50D7046: gomp_target_init.part.0 (target.c:6079)
==17747== by 0x528CBA3: __pthread_once_slow.isra.0 (in /usr/lib64/libc.so.6)
==17747== by 0x528CC28: pthread_once@@GLIBC_2.34 (in /usr/lib64/libc.so.6)
==17747== by 0x50E595E: gomp_init_targets_once (target.c:132)
==17747== by 0x50E595E: gomp_get_num_devices (target.c:138)
==17747== by 0x50E595E: resolve_device (target.c:147)
==17747==
==17747== 216 bytes in 1 blocks are possibly lost in loss record 105 of 141
==17747== at 0x490F8D8: malloc (vg_replace_malloc.c:447)
==17747== by 0x50B539C: gomp_malloc (alloc.c:38)
==17747== by 0x50D9D7C: gomp_load_image_to_device (target.c:2624)
==17747== by 0x50E5907: gomp_init_device (target.c:3007)
==17747== by 0x50E5A84: resolve_device (target.c:186)
==17747== by 0x50EA01F: GOMP_target_enter_exit_data (target.c:4409)
==17747== by 0x40042BE: void matrix_multiply_dot_g<double>(DataBlock<double> const&, DataBlock<double> const&, DataBlock<double>&, int) (in /home/benni/projects/openmptestnew/openmpoffloatest/gpu_compiler_test)
==17747== by 0x4002859: main (in /home/benni/projects/openmptestnew/openmpoffloatest/gpu_compiler_test)
==17747==
==17747== 400 bytes in 1 blocks are possibly lost in loss record 112 of 141
==17747== at 0x4917613: calloc (vg_replace_malloc.c:1678)
==17747== by 0x40DC333: _dl_allocate_tls (in /usr/lib64/ld-linux-x86-64.so.2)
==17747== by 0x5288245: pthread_create@@GLIBC_2.34 (in /usr/lib64/libc.so.6)
==17747== by 0x11C38013: ??? (in /usr/lib64/libcuda.so.580.105.08)
==17747== by 0x11D1E72B: ??? (in /usr/lib64/libcuda.so.580.105.08)
==17747== by 0x11BEEB23: ??? (in /usr/lib64/libcuda.so.580.105.08)
==17747== by 0x11C2E4F1: ??? (in /usr/lib64/libcuda.so.580.105.08)
==17747== by 0x11C1FCCF: cuInit (in /usr/lib64/libcuda.so.580.105.08)
==17747== by 0x495F825: nvptx_get_num_devices (plugin-nvptx.c:679)
==17747== by 0x4960FC6: GOMP_OFFLOAD_get_num_devices (plugin-nvptx.c:1327)
==17747== by 0x50D7046: gomp_target_init.part.0 (target.c:6079)
==17747== by 0x528CBA3: __pthread_once_slow.isra.0 (in /usr/lib64/libc.so.6)
==17747==
==17747== 9,200 bytes in 23 blocks are possibly lost in loss record 133 of 141
==17747== at 0x4917613: calloc (vg_replace_malloc.c:1678)
==17747== by 0x40DC333: _dl_allocate_tls (in /usr/lib64/ld-linux-x86-64.so.2)
==17747== by 0x5288245: pthread_create@@GLIBC_2.34 (in /usr/lib64/libc.so.6)
==17747== by 0x50D007B: gomp_team_start (team.c:859)
==17747== by 0x50C0C21: GOMP_parallel (parallel.c:176)
==17747== by 0x4004121: void matrix_multiply_dot_w<double>(DataBlock<double> const&, DataBlock<double> const&, DataBlock<double>&) (in /home/benni/projects/openmptestnew/openmpoffloatest/gpu_compiler_test)
==17747== by 0x400279E: main (in /home/benni/projects/openmptestnew/openmpoffloatest/gpu_compiler_test)
==17747==
==17747== LEAK SUMMARY:
==17747== definitely lost: 32 bytes in 1 blocks
==17747== indirectly lost: 0 bytes in 0 blocks
==17747== possibly lost: 11,912 bytes in 41 blocks
==17747== still reachable: 3,796,539 bytes in 147 blocks
==17747== suppressed: 0 bytes in 0 blocks
==17747== Reachable blocks (those to which a pointer was found) are not shown.
==17747== To see them, rerun with: --leak-check=full --show-leak-kinds=all
==17747==
==17747== Use --track-origins=yes to see where uninitialised values come from
==17747== For lists of detected and suppressed errors, rerun with: -s
==17747== ERROR SUMMARY: 155 errors from 23 contexts (suppressed: 0 from 0)