Hello,

We (Intel) would like to submit patches to enable fundamental debug support for Intel GPU devices. In the future, we plan to add more patches that improve performance and the user experience. These patches are already available in the downstream "Intel Distribution for GDB" debugger at
  https://github.com/intel/gdb

v1 of this submission is available at

  https://sourceware.org/pipermail/gdb-patches/2024-July/210264.html

and v2 is available at

  https://sourceware.org/pipermail/gdb-patches/2024-December/214029.html

This revision (v3) makes the following changes:

- The comments received so far have been addressed.
- The patches have been rebased on the master branch.
- A number of patches have been refactored to improve the code.

GPU threads operate in a SIMD/SIMT (single instruction, multiple data / single instruction, multiple threads) manner: they are vectorized, and each lane (also known as an "execution channel") executes the same instruction on different data values. Lanes of the same thread execute in lock-step. Displaying the value of a source program variable therefore requires not only a thread context but also a lane context. GDB currently does not have this knowledge built in. Furthermore, some DWARF extensions are necessary to express data locations in a lane-relative way; these are currently under discussion with, or yet to be submitted to, the DWARF committee. Hence, with this submission, variables may appear with an error like "<error reading variable: Unhandled dwarf expression opcode 0xed>". Similar restrictions also apply to the AMD ROCm (AMDGPU) target in upstream GDB, for the same reasons. The downstream "Intel Distribution for GDB" debugger implements lane support as well as the DWARF extensions, and is hence able to print lane-relative values properly. Lane support is a future GDB topic; if interested, see the BoF hosted by Intel and AMD at GNU Tools Cauldron 2024 for more details.

We provide a gdbserver low target definition.
The target uses the Level-Zero debug API:

  https://spec.oneapi.io/level-zero/latest/tools/PROG.html#program-debug
  https://spec.oneapi.io/level-zero/latest/tools/api.html#debug

The user-space implementation of the Level-Zero Debug API comes from "Intel(R) Graphics Compute Runtime for oneAPI Level Zero and OpenCL(TM) Driver":

  https://github.com/intel/compute-runtime

The kernel-space implementation of the Level-Zero Debug API, i.e. the 'eudebug' feature of the "Xe Intel graphics driver", is under submission:

  https://lists.freedesktop.org/archives/intel-xe/2024-December/061476.html (v3)
  https://lists.freedesktop.org/archives/intel-xe/2024-October/052260.html (v2)
  https://lists.freedesktop.org/archives/intel-xe/2024-July/043605.html (v1)

For Level-Zero based devices, we model hardware threads: there is one GDB thread for each hardware thread on the device. We opted for this model for the following reasons:

- Programs that use GPUs to accelerate computation typically offload many computation kernels. Hence, software threads on GPUs have much shorter lifetimes than threads in multi-threaded CPU programs. In real-world cases, the data processed by the GPU is typically large, so the number of software threads is usually higher than the number of available hardware threads. Therefore, modeling software threads may cause a proliferation of threads. Modeling hardware threads, on the other hand, means that they are created once at the beginning of the debug session, after which the thread list stays stable.

- As of today, Intel GPUs do not switch context for threads. That is, once a software thread is assigned to a particular hardware thread, it runs on that hardware thread until termination. Therefore, focusing on a hardware thread does not cause the context-switch confusion that the user would otherwise experience with, e.g., CPU threads.
Hardware threads may be idle in between computation kernel executions, or when a kernel does not fully utilize the GPU. They may also be used by applications other than the one currently being debugged. During these times, the current debug user cannot interact with those hardware threads (e.g. cannot interrupt them), and they appear as unavailable. To handle this case, we introduce an UNAVAILABLE wait kind and also model it as a thread execution state. In particular, UNAVAILABLE means that we have tried to stop the thread and failed.

The Intel GPU target can be used in combination with a native target, relying on GDB's multi-target feature, to debug the GPU and the host application in the same debug session. To do this, bring the native app (e.g. a SYCL [https://www.khronos.org/sycl/] program) to a state where the Level-Zero backend for the GPU has been initialized (e.g. after the first queue has been created in SYCL), then create a gdbserver instance and connect to it from a second inferior.

At GNU Tools Cauldron 2024, we gave a talk presenting the approach. Interested parties are welcome to watch the recording:

  https://www.youtube.com/watch?v=sYep57kjvHM

Below is a sample session that shows how to set up the inferiors and targets manually. In the downstream debugger, a Python script performs these steps automatically for a better user experience.

  $ gdb demo
  ...
  (gdb) maintenance set target-non-stop on
  (gdb) tbreak 60
  Temporary breakpoint 1 at 0x4049c8: file demo.cpp, line 60.
  (gdb) run
  ...
  [SYCL] Using device: [Intel(R) Graphics ...] from [Intel(R) oneAPI Unified Runtime over Level-Zero]

  Thread 1 "demo" hit Temporary breakpoint 1, main (argc=1, argv=0x7fffffffd9b8) at demo.cpp:60
  60        range data_range{length};
  (gdb) # Connect the Intel GT gdbserver by specifying the host inferior PID.
  (gdb) add-inferior -no-connection
  [New inferior 2]
  Added inferior 2
  (gdb) inferior 2
  [Switching to inferior 2 [<null>] (<noexec>)]
  (gdb) info inferiors
    Num  Description       Connection           Executable
    1    process 16458     1 (native)           /temp/demo
  * 2    <null>
  (gdb) target remote | gdbserver-intelgt --attach - 16458
  Remote debugging using | gdbserver-intelgt --attach - 16458
  Attached; given pid = 16458, updated to 1
  Remote debugging using stdio
  <unavailable> in ?? ()
  (gdb)

We also include patches for the testsuite, where we introduce the infrastructure and a number of test cases using SYCL.

For convenience, the patches in this series are available at

  https://github.com/intel/gdb/tree/upstream/intelgt-mvp

For those who want to try the debugger, we also provide

  https://github.com/intel/gdb/tree/upstream/intelgt-mvp-plus

with a number of additional patches (not yet upstreamed) that bring
  (1) lane support,
  (2) the ability to make GPU threads do inferior calls for expression evaluation,
  (3) a minimal Python script that starts gdbserver-intelgt and connects to it automatically, for more convenience.

We plan to submit these additional features (and more) in the future, after the fundamental debug support is accepted and after aligning with the upstream community on GPU support (e.g. lanes) in GDB.

For those who might be interested, the link below leads to instructions that explain how to build the debug software stack (the kernel-mode and user-mode drivers, plus GDB) from the patches submitted upstream.

  https://gitlab.freedesktop.org/miku/kernel

Best regards,
Baris

---
Albertano Caruso (2):
      gdb, intelgt: add disassemble feature for the Intel GT architecture.
      testsuite, arch, intelgt: add a disassembly test

Klaus Gerlicher (1):
      gdb, ze: on a whole process stop, mark all threads as not_resumed

Markus Metzger (13):
      gdb, arch, intelgt: add intelgt arch definitions
      gdb, gdbserver, ze: in-memory libraries
      gdb, gdbserver, rsp, ze: acknowledge libraries
      gdb, solib, ze: update target_solib_ops::bfd_open_from_target_memory
      gdb, infrun, ze: allow saving process events
      gdb, ze: add TARGET_WAITKIND_UNAVAILABLE
      gdb, infrun, ze: handle stopping unavailable threads
      gdb, infrun, ze: allow resuming unavailable threads
      gdb, gdbserver, ze: add U stop reply
      gdb, gdbserver, ze: add library notification to U stop reply
      gdbserver: wait for stopped threads in queue_stop_reply_callback
      gdb, dwarf, ze: add DW_OP_INTEL_regval_bits
      gdbserver, ze, intelgt: introduce ze-low and intel-ze-low targets

Natalia Saiapova (2):
      bfd: add intelgt target to BFD
      gdb: do not create a thread after a process event.

Nils-Christian Kempke (1):
      gdb, gdbserver, gdbsupport: add 'device' tag to XML target description

Tankut Baris Aktemur (25):
      gdb, intelgt: add intelgt as a basic machine
      ld: add intelgt as a target configuration
      opcodes: add intelgt as a configuration
      gdb, intelgt: add the target-dependent definitions for the Intel GT architecture
      gdbserver, ze: report TARGET_WAITKIND_UNAVAILABLE events
      gdb, ze: handle TARGET_WAITKIND_UNAVAILABLE in stop_all_threads
      gdb, remote: handle thread unavailability in print_one_stopped_thread
      gdb, remote: do 'remote_add_inferior' in 'remote_notice_new_inferior' earlier
      gdb, remote: handle a generic process PID in remote_notice_new_inferior
      gdb, remote: handle a generic process PID in process_stop_reply
      gdb: use the pid from inferior in setup_inferior
      gdb: revise the pid_to_exec_file target op
      gdb: load solibs if the target does not have the notion of an exec file
      gdbserver: import AC_LIB_HAVE_LINKFLAGS macro into the autoconf script
      gdbserver: add a pointer to the owner thread in regcache
      gdbserver: adjust pid after the target attaches
      gdbserver: allow configuring for a heterogeneous target
      testsuite, sycl: add SYCL support
      testsuite, sycl: add test for backtracing inside a kernel
      testsuite, sycl: add test for 'info locals' and 'info args'
      testsuite, sycl: add tests for stepping and accessing data elements
      testsuite, sycl: add test for 1-D and 2-D parallel_for kernels
      testsuite, sycl: add test for scheduler-locking
      testsuite, arch, intelgt: add intelgt-program-bp.exp
      testsuite, sycl: test canceling a stepping flow

 bfd/Makefile.am | 2 +
 bfd/Makefile.in | 4 +
 bfd/archures.c | 4 +
 bfd/bfd-in2.h | 6 +
 bfd/config.bfd | 13 +-
 bfd/configure | 1 +
 bfd/configure.ac | 1 +
 bfd/cpu-intelgt.c | 57 +
 bfd/elf64-intelgt.c | 195 ++
 bfd/libbfd.h | 2 +
 bfd/reloc.c | 7 +
 bfd/targets.c | 2 +
 binutils/dwarf.c | 6 +
 binutils/readelf.c | 9 +
 config.sub | 4 +-
 gdb/Makefile.in | 8 +-
 gdb/NEWS | 25 +
 gdb/README | 8 +
 gdb/arch/intelgt.c | 191 ++
 gdb/arch/intelgt.h | 186 ++
 gdb/config.in | 3 +
 gdb/configure | 559 ++++-
 gdb/configure.ac | 52 +
 gdb/configure.tgt | 5 +
 gdb/disasm-selftests.c | 12 +
 gdb/doc/gdb.texinfo | 162 +-
 gdb/dwarf2/expr.c | 37 +
 gdb/dwarf2/expr.h | 5 +
 gdb/dwarf2/loc.c | 2 +
 gdb/exec.c | 6 +
 gdb/features/gdb-target.dtd | 19 +-
 gdb/features/library-list.dtd | 22 +-
 gdb/fork-child.c | 10 +-
 gdb/gdbthread.h | 12 +-
 gdb/infcmd.c | 14 +-
 gdb/inferior.h | 4 +
 gdb/infrun.c | 124 +-
 gdb/intelgt-tdep.c | 986 ++++++++
 gdb/nat/fork-inferior.c | 10 +
 gdb/remote.c | 227 +-
 gdb/selftest-arch.c | 6 +-
 gdb/solib-target.c | 153 +-
 gdb/solib-target.h | 3 +
 gdb/solib.c | 81 +-
 gdb/solib.h | 19 +-
 gdb/target-delegates-gen.c | 50 +
 gdb/target-descriptions.c | 19 +
 gdb/target.c | 16 +
 gdb/target.h | 24 +
 gdb/target/waitstatus.c | 1 +
 gdb/target/waitstatus.h | 22 +
 gdb/testsuite/README | 9 +
 gdb/testsuite/boards/intel-offload.exp | 36 +
 gdb/testsuite/gdb.arch/intelgt-disassemble.exp | 82 +
 gdb/testsuite/gdb.arch/intelgt-program-bp.exp | 83 +
 gdb/testsuite/gdb.arch/sycl-simple.cpp | 42 +
 gdb/testsuite/gdb.sycl/break.exp | 63 +
 gdb/testsuite/gdb.sycl/break2.exp | 66 +
 gdb/testsuite/gdb.sycl/call-stack.cpp | 92 +
 gdb/testsuite/gdb.sycl/call-stack.exp | 179 ++
 gdb/testsuite/gdb.sycl/info-locals-and-args.exp | 78 +
 gdb/testsuite/gdb.sycl/parallel-for-1D.cpp | 72 +
 gdb/testsuite/gdb.sycl/parallel-for-1D.exp | 55 +
 gdb/testsuite/gdb.sycl/parallel-for-2D.cpp | 73 +
 gdb/testsuite/gdb.sycl/parallel-for-2D.exp | 55 +
 gdb/testsuite/gdb.sycl/scheduler-locking.exp | 67 +
 gdb/testsuite/gdb.sycl/single-task.cpp | 50 +
 gdb/testsuite/gdb.sycl/step-canceled.exp | 86 +
 gdb/testsuite/gdb.sycl/step-into-function.exp | 47 +
 gdb/testsuite/gdb.sycl/step-parallel-for.exp | 63 +
 gdb/testsuite/gdb.sycl/step.exp | 51 +
 gdb/testsuite/gdb.threads/killed-outside.exp | 4 +
 gdb/testsuite/lib/gdb.exp | 17 +-
 gdb/testsuite/lib/intelgt-utils.exp | 43 +
 gdb/testsuite/lib/sycl-devices.cpp | 107 +
 gdb/testsuite/lib/sycl-hello.cpp | 43 +
 gdb/testsuite/lib/sycl-util.cpp | 135 +
 gdb/testsuite/lib/sycl.exp | 410 ++++
 gdb/thread.c | 2 +-
 gdb/top.c | 10 +
 gdb/xml-tdesc.c | 76 +
 gdbserver/Makefile.in | 4 +-
 gdbserver/acinclude.m4 | 5 +
 gdbserver/config.in | 6 +
 gdbserver/configure | 500 ++++
 gdbserver/configure.ac | 18 +
 gdbserver/configure.srv | 15 +-
 gdbserver/dll.cc | 175 +-
 gdbserver/dll.h | 54 +-
 gdbserver/gdbthread.h | 2 +-
 gdbserver/intelgt-ze-low.cc | 1016 ++++++++
 gdbserver/linux-low.cc | 6 +-
 gdbserver/linux-low.h | 2 +-
 gdbserver/netbsd-low.cc | 2 +-
 gdbserver/netbsd-low.h | 2 +-
 gdbserver/regcache.cc | 1 +
 gdbserver/regcache.h | 3 +
 gdbserver/remote-utils.cc | 21 +
 gdbserver/server.cc | 319 ++-
 gdbserver/server.h | 7 +
 gdbserver/target.cc | 14 +
 gdbserver/target.h | 31 +-
 gdbserver/tdesc.cc | 16 +
 gdbserver/tdesc.h | 3 +
 gdbserver/win32-low.cc | 4 +-
 gdbserver/win32-low.h | 2 +-
 gdbserver/ze-low.cc | 2996 +++++++++++++++++++++++
 gdbserver/ze-low.h | 496 ++++
 gdbsupport/tdesc.cc | 48 +
 gdbsupport/tdesc.h | 90 +
 include/dwarf2.def | 4 +
 include/elf/intelgt.h | 39 +
 ld/configure.tgt | 2 +
 opcodes/configure | 1 +
 opcodes/configure.ac | 1 +
 115 files changed, 11351 insertions(+), 146 deletions(-)

---
base-commit: 9af083a959a03ef068e1b7869263dddb4fb913c3
change-id: 20241213-upstream-intelgt-mvp-684d5f2f6730

Best regards,
-- 
Tankut Baris Aktemur <[email protected]>

Intel Deutschland GmbH
Registered Address: Am Campeon 10, 85579 Neubiberg, Germany
Tel: +49 89 99 8853-0, www.intel.de
Managing Directors: Sean Fennelly, Jeffrey Schneiderman, Tiffany Doon Silva
Chairperson of the Supervisory Board: Nicole Lau
Registered Office: Munich
Commercial Register: Amtsgericht Muenchen HRB 186928
