Hi Matt,
I used these command lines to generate the cachefiles (run from
gem5/gem5-resources/src/gpu/DNNMark/):
docker run --rm -v ${PWD}:${PWD} -w ${PWD} -u $UID:$GID \
    gcr.io/gem5-test/gcn-gpu:v21-2 ./setup.sh HIP
docker run --rm -v ${PWD}:${PWD} -w ${PWD}/build -u $UID:$GID \
    gcr.io/gem5-test/gcn-gpu:v21-2 make
docker run --rm -v ${PWD}:${PWD} -v ${PWD}/cachefiles:/root/.cache/miopen/2.9.0 \
    -w ${PWD} gcr.io/gem5-test/gcn-gpu:v21-2 python3 generate_cachefiles.py \
    cachefiles.csv --gfx-version=gfx801 --num-cus=128
Maybe the --num-cus=128 option is not supported?
How can I confirm that --num-cus=128 was actually applied to the generated file(s)?
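One way to check this, as a rough sketch: list what ended up in the mounted `cachefiles` directory and look for the target configuration. The helper `check_cachefiles` is hypothetical, and it assumes (this is not confirmed anywhere in the thread) that MIOpen names its cache files after the device and CU count, along the lines of `gfx801_128.<ext>`. Compare against your own directory listing before relying on it.

```shell
# Hypothetical helper: find MIOpen cache files whose names mention a given
# gfx version and CU count.
# ASSUMPTION: file names encode "<gfx>_<cus>", e.g. gfx801_128.<ext>.
check_cachefiles() {
    local dir="$1" gfx="$2" cus="$3"
    local matches
    # Require the CU count to be followed by '.' or the end of the name,
    # so that 128 does not also match 1280.
    matches=$(find "$dir" \( -name "${gfx}_${cus}" -o -name "${gfx}_${cus}.*" \) -print)
    if [ -z "$matches" ]; then
        echo "no cache file for ${gfx} with ${cus} CUs under ${dir}" >&2
        echo "files present:" >&2
        find "$dir" -type f -print >&2
        return 1
    fi
    echo "$matches"
}

# Usage, from gem5/gem5-resources/src/gpu/DNNMark:
#   check_cachefiles cachefiles gfx801 128
```

If nothing matches, the listing printed on stderr shows which configuration the files were actually generated for.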
Thanks,
David
From: Matt Sinclair <[email protected]>
Sent: Wednesday, March 9, 2022 1:13 PM
To: gem5 users mailing list <[email protected]>; Poremba, Matthew <[email protected]>; Kyle Roarty <[email protected]>
Cc: David Fong <[email protected]>
Subject: RE: gem5 : X86 + APU (gfx801) with CUs128 error with DNNMark test_fwd_softmax
That error in #2 means MIOpen can't find the kernel again. Did you change the
number of CUs to 128 (or whatever number of CUs you are using) when you
generated the cachefiles?
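Matt's question amounts to checking that two numbers agree: the `--num-cus` passed to `generate_cachefiles.py` and the `--num-compute-units` passed to `apu_se.py`. The gem5 side can be read back from the run log, which prints a line such as `Num SQC = 32 Num scalar caches = 32 Num CU = 128`. A minimal sketch; `check_log_cus` is a hypothetical name, and the parsing assumes that exact line format:

```shell
# Hypothetical helper: compare the CU count reported in a gem5 log
# ("... Num CU = N") with the count the cachefiles were generated for.
check_log_cus() {
    local log="$1" expected="$2"
    local actual
    # Extract N from the first line containing "Num CU = N".
    actual=$(sed -n 's/.*Num CU = \([0-9][0-9]*\).*/\1/p' "$log" | head -n 1)
    if [ -z "$actual" ]; then
        echo "no 'Num CU =' line found in $log" >&2
        return 2
    fi
    if [ "$actual" != "$expected" ]; then
        echo "gem5 ran with $actual CUs, but cachefiles were generated for $expected" >&2
        return 1
    fi
    echo "ok: $actual CUs"
}

# Usage:
#   check_log_cus gem5_apu_cu128_run_dnnmark_test_fwd_softmax_50latency.log 128
```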
Matt
From: David Fong via gem5-users <[email protected]>
Sent: Wednesday, March 9, 2022 12:50 PM
To: Poremba, Matthew <[email protected]>; gem5 users mailing list <[email protected]>
Cc: David Fong <[email protected]>
Subject: [gem5-users] Re: gem5 : X86 + APU (gfx801) with CUs128 error with DNNMark test_fwd_softmax
Hi Matt,
Thanks for your quick response.
The hack is not working:
1. I had to start from scratch (a clean rebuild); otherwise I get the same error.
2. After running the same steps plus the hack before compiling gem5, I get
these error messages:
build/GCN3_X86/sim/syscall_emul.cc:74: warn: ignoring syscall mprotect(...)
sh: 1: Cannot fork
MIOpen Error: /root/driver/MLOpen/src/hipoc/hipoc_program.cpp:195: Cant find file: /tmp/miopen-MIOpenSoftmax.cl-96e7-d3d7-ce59-9759/MIOpenSoftmax.cl.o
MIOpen Error: 7 at /home/dfong/work/ext_ips/gem5-apu-cu128-dnn/gem5/gem5-resources/src/gpu/DNNMark/core/include/dnn_wrapper.h485Ticks: 574458882500
Am I missing some other setting?
David
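A likely reason a clean start was needed (an assumption about gem5's SCons setup of this era, not something confirmed in the thread): once a build directory has been configured, SCons keeps that variant's settings in `build/variables/<variant>` and no longer re-reads `build_opts/<variant>`. A sketch of resetting just that state; `reset_gem5_variant` is a hypothetical helper. Passing `NUMBER_BITS_PER_SET=128` directly on the scons command line should be an alternative, but verify that on your gem5 version.

```shell
# Hypothetical helper: drop the cached build variables (and optionally the
# build output) for one gem5 variant so the next scons run re-reads
# build_opts/<variant>.
# ASSUMPTION: this gem5 version caches variables in build/variables/<variant>.
reset_gem5_variant() {
    local root="$1" variant="$2" full="${3:-}"
    rm -f "$root/build/variables/$variant"
    if [ "$full" = "--full" ]; then
        # Also remove all previously built objects for this variant.
        rm -rf "$root/build/$variant"
    fi
}

# Usage, from the directory that contains the gem5 checkout:
#   reset_gem5_variant gem5 GCN3_X86 --full
# then re-run the scons command for build/GCN3_X86/gem5.opt.
```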
Full output (repeated lines elided with ". . ." to reduce size):
docker run --rm -v ${PWD}:${PWD} \
    -v ${PWD}/gem5/gem5-resources/src/gpu/DNNMark/cachefiles:/root/.cache/miopen/2.9.0 \
    -w ${PWD} gcr.io/gem5-test/gcn-gpu:v21-2 gem5/build/GCN3_X86/gem5.opt \
    gem5/configs/example/apu_se.py --num-compute-units 128 -n3 \
    --benchmark-root=gem5/gem5-resources/src/gpu/DNNMark/build/benchmarks/test_fwd_softmax \
    -cdnnmark_test_fwd_softmax \
    --options="-config gem5/gem5-resources/src/gpu/DNNMark/config_example/softmax_config.dnnmark -mmap gem5/gem5-resources/src/gpu/DNNMark/mmap.bin" \
    |& tee gem5_apu_cu128_run_dnnmark_test_fwd_softmax_50latency.log
Global frequency set at 1000000000000 ticks per second
build/GCN3_X86/mem/mem_interface.cc:791: warn: DRAM device capacity (8192 Mbytes) does not match the address range assigned (512 Mbytes)
build/GCN3_X86/base/stats/storage.hh:279: warn: Bucket size (5) does not divide range [1:75] into equal-sized buckets. Rounding up.
build/GCN3_X86/base/stats/storage.hh:279: warn: Bucket size (2) does not divide range [1:10] into equal-sized buckets. Rounding up.
build/GCN3_X86/base/stats/storage.hh:279: warn: Bucket size (2) does not divide range [1:64] into equal-sized buckets. Rounding up.
build/GCN3_X86/base/stats/storage.hh:279: warn: Bucket size (10000) does not divide range [1:1e+06] into equal-sized buckets. Rounding up.
. . .
build/GCN3_X86/base/stats/storage.hh:279: warn: Bucket size (10000) does not divide range [1:1e+06] into equal-sized buckets. Rounding up.
build/GCN3_X86/base/stats/storage.hh:279: warn: Bucket size (10000) does not divide range [1:1.6e+06] into equal-sized buckets. Rounding up.
build/GCN3_X86/base/stats/storage.hh:279: warn: Bucket size (10000) does not divide range [1:1.6e+06] into equal-sized buckets. Rounding up.
build/GCN3_X86/base/stats/storage.hh:279: warn: Bucket size (10000) does not divide range [1:1.6e+06] into equal-sized buckets. Rounding up.
build/GCN3_X86/base/stats/storage.hh:279: warn: Bucket size (10000) does not divide range [1:1.6e+06] into equal-sized buckets. Rounding up.
. . .
Forcing maxCoalescedReqs to 32 (TLB assoc.)
Forcing maxCoalescedReqs to 32 (TLB assoc.)
Forcing maxCoalescedReqs to 32 (TLB assoc.)
Forcing maxCoalescedReqs to 32 (TLB assoc.)
. . .
build/GCN3_X86/base/remote_gdb.cc:381: warn: Sockets disabled, not accepting gdb connections
warn: dir_cntrl0.memory is deprecated. The request port for Ruby memory output to the main memory is now called `memory_out_port`
warn: system.ruby.network adopting orphan SimObject param 'ext_links'
warn: system.ruby.network adopting orphan SimObject param 'int_links'
warn: failed to generate dot output from m5out/config.dot
build/GCN3_X86/sim/simulate.cc:194: info: Entering event queue @ 0. Starting simulation...
build/GCN3_X86/mem/ruby/system/Sequencer.cc:573: warn: Replacement policy updates recently became the responsibility of SLICC state machines. Make sure to setMRU() near callbacks in .sm files!
gem5 Simulator System.
http://gem5.org
gem5 is copyrighted software; use the --copyright option for details.
gem5 version 21.2.1.0
gem5 compiled Mar 9 2022 18:21:02
gem5 started Mar 9 2022 18:27:12
gem5 executing on dc013b3a89f5, pid 1
command line: gem5/build/GCN3_X86/gem5.opt gem5/configs/example/apu_se.py --num-compute-units 128 -n3 --benchmark-root=gem5/gem5-resources/src/gpu/DNNMark/build/benchmarks/test_fwd_softmax -cdnnmark_test_fwd_softmax '--options=-config gem5/gem5-resources/src/gpu/DNNMark/config_example/softmax_config.dnnmark -mmap gem5/gem5-resources/src/gpu/DNNMark/mmap.bin'
info: Standard input is not a terminal, disabling listeners.
Num SQC = 32 Num scalar caches = 32 Num CU = 128
incrementing idx on 4
incrementing idx on 8
incrementing idx on 12
incrementing idx on 16
incrementing idx on 20
incrementing idx on 24
incrementing idx on 28
incrementing idx on 32
incrementing idx on 36
incrementing idx on 40
incrementing idx on 44
incrementing idx on 48
incrementing idx on 52
incrementing idx on 56
incrementing idx on 60
incrementing idx on 64
incrementing idx on 68
incrementing idx on 72
incrementing idx on 76
incrementing idx on 80
incrementing idx on 84
incrementing idx on 88
incrementing idx on 92
incrementing idx on 96
incrementing idx on 100
incrementing idx on 104
incrementing idx on 108
incrementing idx on 112
incrementing idx on 116
incrementing idx on 120
incrementing idx on 124
. . .
"dot" with args ['-Tsvg', '/tmp/tmped75d08r'] returned code: 1
stdout, stderr:
b''
b'Error: /tmp/tmped75d08r: syntax error in line 119533 scanning a quoted string (missing endquote? longer than 16384?)\nString starting:"clk_domain=system.ruby.clk_domain \\eventq_index=0 \\latency=1\n'
build/GCN3_X86/sim/mem_state.cc:443: info: Increasing stack size by one page.
build/GCN3_X86/sim/syscall_emul.cc:74: warn: ignoring syscall mprotect(...)
build/GCN3_X86/sim/syscall_emul.cc:74: warn: ignoring syscall mprotect(...)
. . .
build/GCN3_X86/sim/syscall_emul.cc:74: warn: ignoring syscall set_robust_list(...)
build/GCN3_X86/sim/syscall_emul.cc:85: warn: ignoring syscall rt_sigaction(...)
(further warnings will be suppressed)
build/GCN3_X86/sim/syscall_emul.cc:85: warn: ignoring syscall rt_sigprocmask(...)
(further warnings will be suppressed)
build/GCN3_X86/sim/syscall_emul.cc:74: warn: ignoring syscall get_mempolicy(...)
build/GCN3_X86/arch/generic/debugfaults.hh:145: warn: MOVNTDQ: Ignoring non-temporal hint, modeling as cacheable!
build/GCN3_X86/arch/x86/generated/exec-ns.cc.inc:27: warn: instruction 'frndint' unimplemented
build/GCN3_X86/sim/syscall_emul.cc:74: warn: ignoring syscall mprotect(...)
build/GCN3_X86/sim/mem_state.cc:443: info: Increasing stack size by one page.
build/GCN3_X86/gpu-compute/gpu_compute_driver.cc:704: warn: unimplemented ioctl: AMDKFD_IOC_ACQUIRE_VM
build/GCN3_X86/sim/syscall_emul.hh:1862: warn: mmap: writing to shared mmap region is currently unsupported. The write succeeds on the target, but it will not be propagated to the host or shared mappings
build/GCN3_X86/gpu-compute/gpu_compute_driver.cc:455: warn: Signal events are only supported currently
build/GCN3_X86/sim/syscall_emul.cc:74: warn: ignoring syscall mprotect(...)
build/GCN3_X86/sim/power_state.cc:105: warn: PowerState: Already in the requested power state, request ignored
build/GCN3_X86/sim/syscall_emul.cc:74: warn: ignoring syscall set_robust_list(...)
build/GCN3_X86/sim/syscall_emul.cc:74: warn: ignoring syscall mprotect(...)
build/GCN3_X86/gpu-compute/gpu_compute_driver.cc:599: warn: unimplemented ioctl: AMDKFD_IOC_SET_SCRATCH_BACKING_VA
build/GCN3_X86/gpu-compute/gpu_compute_driver.cc:609: warn: unimplemented ioctl: AMDKFD_IOC_SET_TRAP_HANDLER
build/GCN3_X86/sim/mem_state.cc:443: info: Increasing stack size by one page.
build/GCN3_X86/sim/mem_state.cc:443: info: Increasing stack size by one page.
build/GCN3_X86/sim/mem_state.cc:443: info: Increasing stack size by one page.
build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6
build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6
build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6
build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6
build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6
build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6
build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6
build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6
build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6
build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6
build/GCN3_X86/sim/syscall_emul.cc:74: warn: ignoring syscall fdatasync(...)
build/GCN3_X86/sim/syscall_emul.cc:74: warn: ignoring syscall fdatasync(...)
build/GCN3_X86/sim/syscall_emul.cc:74: warn: ignoring syscall fdatasync(...)
build/GCN3_X86/sim/syscall_emul.cc:74: warn: ignoring syscall fdatasync(...)
. . .
build/GCN3_X86/sim/mem_state.cc:443: info: Increasing stack size by one page.
build/GCN3_X86/sim/syscall_emul.cc:74: warn: ignoring syscall mprotect(...)
build/GCN3_X86/sim/syscall_emul.cc:74: warn: ignoring syscall mprotect(...)
build/GCN3_X86/sim/syscall_emul.cc:74: warn: ignoring syscall mprotect(...)
build/GCN3_X86/sim/syscall_emul.cc:74: warn: ignoring syscall mprotect(...)
sh: 1: Cannot fork
MIOpen Error: /root/driver/MLOpen/src/hipoc/hipoc_program.cpp:195: Cant find file: /tmp/miopen-MIOpenSoftmax.cl-96e7-d3d7-ce59-9759/MIOpenSoftmax.cl.o
MIOpen Error: 7 at /home/dfong/work/ext_ips/gem5-apu-cu128-dnn/gem5/gem5-resources/src/gpu/DNNMark/core/include/dnn_wrapper.h485Ticks: 574458882500
Exiting because exiting with last active thread context
David
From: Poremba, Matthew <[email protected]>
Sent: Tuesday, March 8, 2022 4:23 PM
To: gem5 users mailing list <[email protected]>
Cc: David Fong <[email protected]>
Subject: RE: gem5 : X86 + APU (gfx801) with CUs128 error with DNNMark test_fwd_softmax
Hi David,
You are hitting the limit on the number of controllers of the same MachineType
in a Ruby network. You can raise it by editing the `build_opts/GCN3_X86` file,
adding a new line `NUMBER_BITS_PER_SET = '128'` (or higher), and then
recompiling gem5. As far as I know, there is no limit on the number of CUs.
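The edit described above, as a small idempotent shell sketch. `set_bits_per_set` is a hypothetical helper; the path assumes the `GCN3_X86` build used throughout this thread, and `sed -i` assumes GNU sed. David's follow-up reports that he had to start from scratch before the change took effect, so an already-configured build may also need to be redone.

```shell
# Hypothetical helper: set NUMBER_BITS_PER_SET in a gem5 build_opts file,
# replacing an existing setting rather than adding a duplicate line.
set_bits_per_set() {
    local opts="$1" bits="$2"
    if grep -q '^NUMBER_BITS_PER_SET' "$opts"; then
        sed -i "s/^NUMBER_BITS_PER_SET.*/NUMBER_BITS_PER_SET = '${bits}'/" "$opts"
    else
        # Make sure the new setting starts on its own line.
        if [ -s "$opts" ] && [ -n "$(tail -c 1 "$opts")" ]; then
            echo >> "$opts"
        fi
        echo "NUMBER_BITS_PER_SET = '${bits}'" >> "$opts"
    fi
}

# Usage, from the gem5 source root, before (re)building build/GCN3_X86/gem5.opt:
#   set_bits_per_set build_opts/GCN3_X86 128
```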
-Matt
From: David Fong via gem5-users <[email protected]>
Sent: Tuesday, March 8, 2022 3:51 PM
To: David Fong via gem5-users <[email protected]>
Cc: David Fong <[email protected]>
Subject: [gem5-users] gem5 : X86 + APU (gfx801) with CUs128 error with DNNMark test_fwd_softmax
Hi,
I built gem5 (X86 + APU, gfx801) and tried to run DNNMark test_fwd_softmax
with 128 CUs. The steps and the output of the run are below.
Is there a limit on the number of CUs (compute units) for the APU (gfx801), or
do I need to pass the number of compute units (128) on one of the command
lines below?
Thanks,
David
git clone https://gem5.googlesource.com/public/gem5
git clone https://gem5.googlesource.com/public/gem5-resources gem5/gem5-resources
# COMPILE DNNMARK TESTS
cd gem5/gem5-resources/src/gpu/DNNMark
docker run --rm -v ${PWD}:${PWD} -w ${PWD} -u $UID:$GID \
    gcr.io/gem5-test/gcn-gpu:v21-2 ./setup.sh HIP
docker run --rm -v ${PWD}:${PWD} -w ${PWD}/build -u $UID:$GID \
    gcr.io/gem5-test/gcn-gpu:v21-2 make
docker run --rm -v ${PWD}:${PWD} -v ${PWD}/cachefiles:/root/.cache/miopen/2.9.0 \
    -w ${PWD} gcr.io/gem5-test/gcn-gpu:v21-2 python3 generate_cachefiles.py \
    cachefiles.csv --gfx-version=gfx801 --num-cus=128
g++ -std=c++0x generate_rand_data.cpp -o generate_rand_data
./generate_rand_data
# BUILD GEM5
cd ../../../..
docker run --rm -v ${PWD}:${PWD} -w ${PWD} -u $UID:$GID \
    gcr.io/gem5-test/gcn-gpu:v21-2 scons -sQ -j$(nproc) build/GCN3_X86/gem5.opt
# RUN TEST
cd ../
docker run --rm -v ${PWD}:${PWD} \
    -v ${PWD}/gem5/gem5-resources/src/gpu/DNNMark/cachefiles:/root/.cache/miopen/2.9.0 \
    -w ${PWD} gcr.io/gem5-test/gcn-gpu:v21-2 gem5/build/GCN3_X86/gem5.opt \
    gem5/configs/example/apu_se.py --num-compute-units 128 -n3 \
    --benchmark-root=gem5/gem5-resources/src/gpu/DNNMark/build/benchmarks/test_fwd_softmax \
    -cdnnmark_test_fwd_softmax \
    --options="-config gem5/gem5-resources/src/gpu/DNNMark/config_example/softmax_config.dnnmark -mmap gem5/gem5-resources/src/gpu/DNNMark/mmap.bin" \
    |& tee gem5_apu_cu128_run_dnnmark_test_fwd_softmax_50latency.log
Global frequency set at 1000000000000 ticks per second
build/GCN3_X86/mem/mem_interface.cc:791: warn: DRAM device capacity (8192 Mbytes) does not match the address range assigned (512 Mbytes)
build/GCN3_X86/base/stats/storage.hh:279: warn: Bucket size (5) does not divide range [1:75] into equal-sized buckets. Rounding up.
build/GCN3_X86/base/stats/storage.hh:279: warn: Bucket size (2) does not divide range [1:10] into equal-sized buckets. Rounding up.
build/GCN3_X86/base/stats/storage.hh:279: warn: Bucket size (2) does not divide range [1:64] into equal-sized buckets. Rounding up.
build/GCN3_X86/base/stats/storage.hh:279: warn: Bucket size (10000) does not divide range [1:1e+06] into equal-sized buckets. Rounding up.
. . .
build/GCN3_X86/base/stats/storage.hh:279: warn: Bucket size (10000) does not divide range [1:1.6e+06] into equal-sized buckets. Rounding up.
build/GCN3_X86/base/stats/storage.hh:279: warn: Bucket size (10000) does not divide range [1:1.6e+06] into equal-sized buckets. Rounding up.
build/GCN3_X86/base/stats/storage.hh:279: warn: Bucket size (10000) does not divide range [1:1.6e+06] into equal-sized buckets. Rounding up.
build/GCN3_X86/base/stats/storage.hh:279: warn: Bucket size (10000) does not divide range [1:1.6e+06] into equal-sized buckets. Rounding up.
build/GCN3_X86/base/stats/storage.hh:279: warn: Bucket size (10000) does not divide range [1:1.6e+06] into equal-sized buckets. Rounding up.
build/GCN3_X86/base/stats/storage.hh:279: warn: Bucket size (10000) does not divide range [1:1.6e+06] into equal-sized buckets. Rounding up.
build/GCN3_X86/base/stats/storage.hh:279: warn: Bucket size (10000) does not divide range [1:1.6e+06] into equal-sized buckets. Rounding up.
build/GCN3_X86/base/statistics.hh:280: warn: One of the stats is a legacy stat. Legacy stat is a stat that does not belong to any statistics::Group. Legacy stat is deprecated.
. . .
Forcing maxCoalescedReqs to 32 (TLB assoc.)
Forcing maxCoalescedReqs to 32 (TLB assoc.)
Forcing maxCoalescedReqs to 32 (TLB assoc.)
Forcing maxCoalescedReqs to 32 (TLB assoc.)
. . .
build/GCN3_X86/base/statistics.hh:280: warn: One of the stats is a legacy stat. Legacy stat is a stat that does not belong to any statistics::Group. Legacy stat is deprecated.
build/GCN3_X86/mem/ruby/common/Set.hh:214: fatal: Number of bits(64) < size specified(65). Increase the number of bits and recompile.
Memory Usage: 2359940 Kbytes
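The two failures in this thread leave distinct signatures in the log, so a quick first-pass triage can be scripted. A sketch with a hypothetical `triage_gem5_log` helper; the match strings are copied from the logs in this thread, and the suggested fixes are the ones given in the replies: raise `NUMBER_BITS_PER_SET` for the Ruby `Set.hh` fatal error, and make the cachefiles' `--num-cus` match `--num-compute-units` for the missing MIOpen kernel.

```shell
# Hypothetical triage helper: map the failure signatures seen in this thread
# to the fix discussed for each. Match strings come from the logs above.
triage_gem5_log() {
    local log="$1"
    if grep -q 'Set.hh:.*fatal: Number of bits' "$log"; then
        echo "ruby-set-too-small: raise NUMBER_BITS_PER_SET and rebuild gem5"
    elif grep -q 'MIOpen Error: .*Cant find' "$log"; then
        echo "missing-kernel: regenerate cachefiles with --num-cus matching --num-compute-units"
    else
        echo "unknown"
    fi
}

# Usage:
#   triage_gem5_log gem5_apu_cu128_run_dnnmark_test_fwd_softmax_50latency.log
```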
_______________________________________________
gem5-users mailing list -- [email protected]
To unsubscribe send an email to [email protected]