https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122280

Benjamin Schulz <schulz.benjamin at googlemail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  Attachment #62678|0                           |1
        is obsolete|                            |

--- Comment #9 from Benjamin Schulz <schulz.benjamin at googlemail dot com> ---
Created attachment 62693
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=62693&action=edit
main.cpp

I have now reduced the test case to a single file, which does 3 matrix
multiplications and displays the result.

Compile e.g. with  

-fopenmp -foffload=nvptx-none -fno-stack-protector -O3 -Wall


Below is the output for my system: 

gcc (Gentoo 15.2.1_p20251018 p1) 15.2.1 20251018

NVIDIA-SMI 580.95.05   Driver Version: 580.95.05   CUDA Version: 13.0
NVIDIA GeForce RTX 5060 Ti 

The first output is single-threaded on the host (1).

The second output is multi-threaded on the host, with parallel for collapse(2).
It agrees with (1).

The third output is on the GPU, with target teams distribute on the first loop,
followed by parallel for on the second loop. It agrees with (1).


The fourth output uses target teams distribute parallel for collapse(2), and it
is wrong: note the 911 in row 11, column 3, where the other three outputs have
427. It should be possible to use the target teams distribute parallel for
collapse(2) construct here.
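
For readers without the attachment at hand, the two GPU loop structures
described above could look roughly like the sketch below. This is only an
illustration under assumptions: the function names matmul_nested and
matmul_collapsed, the plain int arrays, and the row-major n x n layout are
hypothetical and are not taken from main.cpp, which uses its own mapping
macros and the DataBlock class.

```cpp
#include <cstddef>

// Hypothetical sketch, not the code from main.cpp.
// Both functions compute C = A * B for row-major n x n matrices.

// Third variant: teams distribute on the outer loop,
// parallel for on the inner loop.
void matmul_nested(const int* A, const int* B, int* C, std::size_t n)
{
    #pragma omp target teams distribute \
        map(to: A[0:n*n], B[0:n*n]) map(from: C[0:n*n])
    for (std::size_t i = 0; i < n; ++i) {
        #pragma omp parallel for
        for (std::size_t j = 0; j < n; ++j) {
            int sum = 0;
            for (std::size_t k = 0; k < n; ++k)
                sum += A[i * n + k] * B[k * n + j];
            C[i * n + j] = sum;
        }
    }
}

// Fourth variant: one combined construct with collapse(2)
// over the two perfectly nested outer loops.
void matmul_collapsed(const int* A, const int* B, int* C, std::size_t n)
{
    #pragma omp target teams distribute parallel for collapse(2) \
        map(to: A[0:n*n], B[0:n*n]) map(from: C[0:n*n])
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = 0; j < n; ++j) {
            int sum = 0;
            for (std::size_t k = 0; k < n; ++k)
                sum += A[i * n + k] * B[k * n + j];
            C[i * n + j] = sum;
        }
    }
}
```

Each element of C is written by exactly one (i, j) iteration, so both
variants are expected to give identical results. Without -fopenmp the pragmas
are ignored and both functions simply run serially on the host.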


The attached GPU code just uses simple mapping macros and a class like this:


template<typename T>
class DataBlock
{
public:
    size_t* dpextents;
    size_t* dpstrides;
    T* dpdata;
    size_t dpdatalength;
    DataBlock(size_t *ext, size_t *str, T *dat, size_t datlength) :
        dpextents(ext), dpstrides(str), dpdata(dat), dpdatalength(datlength) {}
};
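
As a rough illustration of how such a class can address a matrix element, here
is a minimal sketch. The helper at() is hypothetical and not part of the
attachment, and it assumes a rank-2 block whose element (i, j) lives at offset
i * dpstrides[0] + j * dpstrides[1]; the real main.cpp may index differently.

```cpp
#include <cstddef>

// Same layout as the class above, repeated so the sketch is self-contained.
template<typename T>
class DataBlock
{
public:
    std::size_t* dpextents;
    std::size_t* dpstrides;
    T* dpdata;
    std::size_t dpdatalength;
    DataBlock(std::size_t* ext, std::size_t* str, T* dat, std::size_t datlength) :
        dpextents(ext), dpstrides(str), dpdata(dat), dpdatalength(datlength) {}
};

// Hypothetical helper (an assumption, not from main.cpp):
// stride-based access to element (i, j) of a rank-2 block.
template<typename T>
T& at(DataBlock<T>& m, std::size_t i, std::size_t j)
{
    return m.dpdata[i * m.dpstrides[0] + j * m.dpstrides[1]];
}
```

For a row-major 2 x 3 block, the strides would be {3, 1}, so at(m, 1, 2)
refers to the last of the six elements.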



541 529 457 422 516 648 414 438 640 401 389 689 
525 550 479 488 511 548 470 459 530 431 456 637 
575 564 433 415 486 607 477 382 669 399 388 689 
491 515 503 495 541 589 407 515 501 433 457 637 
557 508 435 395 560 631 397 456 633 449 400 663 
509 571 501 515 467 565 487 441 537 383 445 663 
500 530 476 531 413 551 499 517 519 382 412 754 
587 537 451 475 539 609 439 401 573 441 391 641 
485 473 449 466 516 648 414 438 596 457 445 697 
561 566 523 448 551 616 418 387 586 403 408 617 
549 548 427 484 509 640 442 405 598 403 402 677 
572 613 510 507 457 570 474 491 537 318 359 676 


541 529 457 422 516 648 414 438 640 401 389 689 
525 550 479 488 511 548 470 459 530 431 456 637 
575 564 433 415 486 607 477 382 669 399 388 689 
491 515 503 495 541 589 407 515 501 433 457 637 
557 508 435 395 560 631 397 456 633 449 400 663 
509 571 501 515 467 565 487 441 537 383 445 663 
500 530 476 531 413 551 499 517 519 382 412 754 
587 537 451 475 539 609 439 401 573 441 391 641 
485 473 449 466 516 648 414 438 596 457 445 697 
561 566 523 448 551 616 418 387 586 403 408 617 
549 548 427 484 509 640 442 405 598 403 402 677 
572 613 510 507 457 570 474 491 537 318 359 676 


541 529 457 422 516 648 414 438 640 401 389 689 
525 550 479 488 511 548 470 459 530 431 456 637 
575 564 433 415 486 607 477 382 669 399 388 689 
491 515 503 495 541 589 407 515 501 433 457 637 
557 508 435 395 560 631 397 456 633 449 400 663 
509 571 501 515 467 565 487 441 537 383 445 663 
500 530 476 531 413 551 499 517 519 382 412 754 
587 537 451 475 539 609 439 401 573 441 391 641 
485 473 449 466 516 648 414 438 596 457 445 697 
561 566 523 448 551 616 418 387 586 403 408 617 
549 548 427 484 509 640 442 405 598 403 402 677 
572 613 510 507 457 570 474 491 537 318 359 676 


541 529 457 422 516 648 414 438 640 401 389 689 
525 550 479 488 511 548 470 459 530 431 456 637 
575 564 433 415 486 607 477 382 669 399 388 689 
491 515 503 495 541 589 407 515 501 433 457 637 
557 508 435 395 560 631 397 456 633 449 400 663 
509 571 501 515 467 565 487 441 537 383 445 663 
500 530 476 531 413 551 499 517 519 382 412 754 
587 537 451 475 539 609 439 401 573 441 391 641 
485 473 449 466 516 648 414 438 596 457 445 697 
561 566 523 448 551 616 418 387 586 403 408 617 
549 548 911 484 509 640 442 405 598 403 402 677 
572 613 510 507 457 570 474 491 537 318 359 676 



Process returned 0 (0x0)   execution time : 0.303 s
Press ENTER to continue.


This should not happen. It is just embarrassing, I think.
