Hello,

We (Intel) would like to submit patches to enable fundamental debug
support for Intel GPU devices.  In the future, we plan to add more
patches that improve the performance and the user experience.
Those patches are already available in the downstream "Intel
Distribution for GDB" debugger at

  https://github.com/intel/gdb

v1 of this submission is available at

  https://sourceware.org/pipermail/gdb-patches/2024-July/210264.html

and v2 is available at

  https://sourceware.org/pipermail/gdb-patches/2024-December/214029.html

This revision (v3) makes the following changes:

  - The review comments received so far have been addressed.

  - The patches have been rebased on the master branch.

  - A number of patches have been refactored to improve the code.

GPU threads operate in a SIMD/SIMT (single instruction multiple data,
single instruction multiple thread) manner: they are vectorized, and
each lane (also known as "execution channel") executes the same
instruction on different data values.  Lanes of the same thread
execute in lock-step.  Displaying the value of a source program
variable therefore requires not only a thread context but also a lane
context.  GDB currently does not have this knowledge built in.
Furthermore, some DWARF extensions are necessary to express data
locations in a lane-relative way; these are currently under discussion
with, or yet to be submitted to, the DWARF committee.  Hence, with this
submission, variables may appear with an error like "<error reading
variable: Unhandled dwarf expression opcode 0xed>".  For the same
reasons, similar restrictions also apply to the AMD ROCm (AMDGPU)
target in upstream GDB.  The downstream "Intel Distribution for GDB"
debugger implements lane support as well as the DWARF extensions and
hence is able to print lane-relative values properly.  Lane support is
a future GDB topic; if interested, see the BoF hosted by Intel and AMD
at GNU Tools Cauldron 2024 for more details.
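
To illustrate why a lane context is needed, here is a minimal C++
sketch, assuming a SIMD8 thread that keeps a 32-bit source variable in
one 256-bit vector register, one element per lane.  The names
`simd_thread`, `lane_offset`, `read_int_var` and `fill_ids` are
hypothetical; they are not GDB, DWARF or Level-Zero APIs, and this is
not how the patches implement lane support.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>

// Hypothetical model for illustration only (not a GDB or Level-Zero API).
constexpr std::size_t kLanes = 8;      // SIMD width of the thread
constexpr std::size_t kRegBytes = 32;  // one 256-bit vector register

struct simd_thread
{
  // Raw bytes of the vector register that holds the source variable.
  std::array<std::uint8_t, kRegBytes> reg {};
};

// A lane-relative location: the variable's bytes for LANE start at
// LANE * ELEM_SIZE within the register.  A plain register location
// cannot express this dependency on the lane.
std::size_t
lane_offset (std::size_t lane, std::size_t elem_size)
{
  if (lane >= kLanes || (lane + 1) * elem_size > kRegBytes)
    throw std::out_of_range ("lane outside the register");
  return lane * elem_size;
}

// Reading the variable needs both a thread and a lane context.
std::int32_t
read_int_var (const simd_thread &thread, std::size_t lane)
{
  std::int32_t value;
  std::memcpy (&value, thread.reg.data () + lane_offset (lane, sizeof value),
               sizeof value);
  return value;
}

// Store per-lane values, e.g. work-item IDs FIRST, FIRST + 1, ...
void
fill_ids (simd_thread &thread, std::int32_t first)
{
  for (std::size_t lane = 0; lane < kLanes; ++lane)
    {
      std::int32_t id = first + static_cast<std::int32_t> (lane);
      std::memcpy (thread.reg.data () + lane_offset (lane, sizeof id), &id,
                   sizeof id);
    }
}
```

With `fill_ids (t, 16)`, the same variable reads as 16 in lane 0 and
23 in lane 7 of the same thread, so a thread context alone does not
determine the value.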

We provide a gdbserver low target definition.  The target uses the
Level-Zero debug API:

  https://spec.oneapi.io/level-zero/latest/tools/PROG.html#program-debug
  https://spec.oneapi.io/level-zero/latest/tools/api.html#debug

The user-space implementation of the Level-Zero Debug API comes from
"Intel(R) Graphics Compute Runtime for oneAPI Level Zero and
OpenCL(TM) Driver":

  https://github.com/intel/compute-runtime

The kernel-space implementation of the Level-Zero Debug API, i.e. the
'eudebug' feature of the "Xe Intel graphics driver", is under review:

  https://lists.freedesktop.org/archives/intel-xe/2024-December/061476.html (v3)
  https://lists.freedesktop.org/archives/intel-xe/2024-October/052260.html (v2)
  https://lists.freedesktop.org/archives/intel-xe/2024-July/043605.html (v1)

For Level-Zero based devices, we model hardware threads.  There is one
GDB thread for each hardware thread on the device.  We opted for this
model for the following reasons:

  - Programs that use GPUs to accelerate computation typically offload
    many computation kernels.  Hence, software threads in GPUs have
    much shorter lifetimes than threads in multi-threaded CPU programs.
    In real-world cases, the data processed by the GPU is typically
    large, so the number of software threads is usually higher than
    the number of available hardware threads.  Therefore, modeling
    software threads would cause a proliferation of GDB threads.
    Modeling hardware threads, on the other hand, means that the
    threads are created once at the beginning of the debug session
    and the thread list then stays stable.

  - As of today, Intel GPUs do not context-switch threads.  That is,
    once a software thread is assigned to a particular hardware
    thread, it runs on that hardware thread until it terminates.
    Therefore, focusing on a hardware thread does not cause the
    context-switch confusion that users may otherwise experience
    with, e.g., CPU threads.
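
The trade-off can be illustrated with a toy C++ simulation, assuming
that a kernel's software threads are dispatched in waves onto a fixed
pool of hardware threads and stay bound to their hardware thread until
they terminate.  `hw_thread` and `device_model` are hypothetical names;
this is not GDB or gdbserver code, and real dispatch is not modeled.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical simulation, not GDB code: count how many debugger
// threads each model would create while a device runs kernels.
struct hw_thread
{
  std::size_t id;
  // Software thread currently bound to this hardware thread, if any.
  // It stays bound until it terminates (no context switching).
  std::optional<std::size_t> sw_thread;
};

struct device_model
{
  std::vector<hw_thread> hw_threads;
  std::size_t gdb_threads_hw_model = 0;  // threads created, HW model
  std::size_t gdb_threads_sw_model = 0;  // threads created, SW model

  explicit device_model (std::size_t n_hw)
  {
    // Hardware-thread model: one GDB thread per hardware thread,
    // created once when the debug session starts.
    for (std::size_t i = 0; i < n_hw; ++i)
      hw_threads.push_back ({i, std::nullopt});
    gdb_threads_hw_model = n_hw;
  }

  // Run a kernel with N_SW software threads, dispatched in waves of at
  // most hw_threads.size () threads each.
  void run_kernel (std::size_t n_sw)
  {
    if (hw_threads.empty ())
      return;
    std::size_t next = 0;
    while (next < n_sw)
      {
        for (hw_thread &hw : hw_threads)
          if (next < n_sw)
            {
              hw.sw_thread = next++;
              // Software-thread model: every dispatch is a new thread.
              ++gdb_threads_sw_model;
            }
        // The wave completes; its software threads terminate and the
        // hardware threads become idle.
        for (hw_thread &hw : hw_threads)
          hw.sw_thread.reset ();
      }
  }
};
```

In this toy setup, a device with 64 hardware threads running two
kernels of 1000 software threads each yields 64 GDB threads under the
hardware-thread model versus 2000 under the software-thread model.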

Hardware threads may be idle in between computation kernel executions
or when a kernel does not fully utilize the GPU.  They may also be
used by applications other than the one currently being debugged.
During these times, those hardware threads cannot be interacted with
(e.g. they cannot be interrupted) by the current debug user and appear
as unavailable.  To handle this case, we introduce an UNAVAILABLE wait
kind and also model it as a thread execution state.  In particular,
UNAVAILABLE means that we have tried to stop the thread and failed.
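
The stop semantics described above can be sketched in C++ as follows.
This is a minimal illustration, assuming a thread is stoppable only
while it runs a kernel of the debugged application.  `wait_kind`,
`exec_state`, `hw_activity`, `thread_model`, `try_stop` and `resume`
are hypothetical names that only loosely mirror the
TARGET_WAITKIND_UNAVAILABLE wait kind added in this series; they are
not the actual GDB or gdbserver types.

```cpp
#include <cassert>

// Hypothetical sketch; names are illustrative, not GDB's real types.
enum class wait_kind { stopped, unavailable };
enum class exec_state { running, stopped, unavailable };

// What a hardware thread is doing when we try to interrupt it.
enum class hw_activity { running_our_kernel, idle, other_application };

struct thread_model
{
  hw_activity activity;
  exec_state state = exec_state::running;
};

// Try to stop THREAD.  In this sketch a stop succeeds only while the
// hardware thread executes a kernel of the debugged application;
// otherwise the attempt fails and the thread is reported, and
// recorded, as unavailable.
wait_kind
try_stop (thread_model &thread)
{
  if (thread.activity == hw_activity::running_our_kernel)
    {
      thread.state = exec_state::stopped;
      return wait_kind::stopped;
    }
  thread.state = exec_state::unavailable;
  return wait_kind::unavailable;
}

// In this sketch, resuming a thread, including an unavailable one,
// just marks it as running again so that a later stop can retry.
void
resume (thread_model &thread)
{
  thread.state = exec_state::running;
}
```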

The Intel GPU target can be used in combination with a native target,
relying on GDB's multi-target feature, to debug the GPU and the host
application in the same debug session.  For this, bring the native app
(e.g. a SYCL [https://www.khronos.org/sycl/] program) to a state
where the Level-Zero backend for the GPU has been initialized (e.g.
after the first queue has been created in SYCL), then create a
gdbserver instance and connect to it from a second inferior.

We presented this approach in a talk at GNU Tools Cauldron 2024.
Interested parties are welcome to watch the recording:

  https://www.youtube.com/watch?v=sYep57kjvHM

Below is a sample session that shows how to set up inferiors and
targets manually.  In the downstream debugger, a Python script takes
these steps automatically for a better user experience.

  $ gdb demo
  ...
  (gdb) maintenance set target-non-stop on
  (gdb) tbreak 60
  Temporary breakpoint 1 at 0x4049c8: file demo.cpp, line 60.
  (gdb) run
  ...
  [SYCL] Using device: [Intel(R) Graphics ...] from [Intel(R) oneAPI Unified Runtime over Level-Zero]

  Thread 1 "demo" hit Temporary breakpoint 1, main (argc=1, argv=0x7fffffffd9b8) at demo.cpp:60
  60          range data_range{length};
  (gdb)

  # Connect to the Intel GT gdbserver by specifying the host inferior PID.

  (gdb) add-inferior -no-connection
  [New inferior 2]
  Added inferior 2
  (gdb) inferior 2
  [Switching to inferior 2 [<null>] (<noexec>)]
  (gdb) info inferiors
    Num  Description       Connection           Executable
    1    process 16458     1 (native)           /temp/demo
  * 2    <null>
  (gdb) target remote | gdbserver-intelgt --attach - 16458
  Remote debugging using | gdbserver-intelgt --attach - 16458
  Attached; given pid = 16458, updated to 1
  Remote debugging using stdio
  <unavailable> in ?? ()
  (gdb)

We also include patches for the testsuite, where we introduce the
infrastructure and a number of test cases using SYCL.

For convenience, the patches in this series are available at

  https://github.com/intel/gdb/tree/upstream/intelgt-mvp

For those who want to try the debugger, we also provide

  https://github.com/intel/gdb/tree/upstream/intelgt-mvp-plus

with a number of additional patches (not yet upstreamed) that bring
(1) lane support, (2) the ability to make GPU threads do inferior calls
for expression evaluation, and (3) a minimal Python script that starts
and connects to gdbserver-intelgt automatically for more convenience.
We plan to submit these additional features (and more) in the future,
after the fundamental debug support is accepted and after aligning
with the upstream community on GPU support (e.g. lanes) in GDB.

For those who might be interested, below is a link to instructions
that explain how to build the debug software stack (the kernel-mode
and user-mode drivers, plus GDB) from the patches submitted upstream.

  https://gitlab.freedesktop.org/miku/kernel

Best regards,
Baris

---
Albertano Caruso (2):
      gdb, intelgt: add disassemble feature for the Intel GT architecture.
      testsuite, arch, intelgt: add a disassembly test

Klaus Gerlicher (1):
      gdb, ze: on a whole process stop, mark all threads as not_resumed

Markus Metzger (13):
      gdb, arch, intelgt: add intelgt arch definitions
      gdb, gdbserver, ze: in-memory libraries
      gdb, gdbserver, rsp, ze: acknowledge libraries
      gdb, solib, ze: update target_solib_ops::bfd_open_from_target_memory
      gdb, infrun, ze: allow saving process events
      gdb, ze: add TARGET_WAITKIND_UNAVAILABLE
      gdb, infrun, ze: handle stopping unavailable threads
      gdb, infrun, ze: allow resuming unavailable threads
      gdb, gdbserver, ze: add U stop reply
      gdb, gdbserver, ze: add library notification to U stop reply
      gdbserver: wait for stopped threads in queue_stop_reply_callback
      gdb, dwarf, ze: add DW_OP_INTEL_regval_bits
      gdbserver, ze, intelgt: introduce ze-low and intel-ze-low targets

Natalia Saiapova (2):
      bfd: add intelgt target to BFD
      gdb: do not create a thread after a process event.

Nils-Christian Kempke (1):
      gdb, gdbserver, gdbsupport: add 'device' tag to XML target description

Tankut Baris Aktemur (25):
      gdb, intelgt: add intelgt as a basic machine
      ld: add intelgt as a target configuration
      opcodes: add intelgt as a configuration
      gdb, intelgt: add the target-dependent definitions for the Intel GT architecture
      gdbserver, ze: report TARGET_WAITKIND_UNAVAILABLE events
      gdb, ze: handle TARGET_WAITKIND_UNAVAILABLE in stop_all_threads
      gdb, remote: handle thread unavailability in print_one_stopped_thread
      gdb, remote: do 'remote_add_inferior' in 'remote_notice_new_inferior' earlier
      gdb, remote: handle a generic process PID in remote_notice_new_inferior
      gdb, remote: handle a generic process PID in process_stop_reply
      gdb: use the pid from inferior in setup_inferior
      gdb: revise the pid_to_exec_file target op
      gdb: load solibs if the target does not have the notion of an exec file
      gdbserver: import AC_LIB_HAVE_LINKFLAGS macro into the autoconf script
      gdbserver: add a pointer to the owner thread in regcache
      gdbserver: adjust pid after the target attaches
      gdbserver: allow configuring for a heterogeneous target
      testsuite, sycl: add SYCL support
      testsuite, sycl: add test for backtracing inside a kernel
      testsuite, sycl: add test for 'info locals' and 'info args'
      testsuite, sycl: add tests for stepping and accessing data elements
      testsuite, sycl: add test for 1-D and 2-D parallel_for kernels
      testsuite, sycl: add test for scheduler-locking
      testsuite, arch, intelgt: add intelgt-program-bp.exp
      testsuite, sycl: test canceling a stepping flow

 bfd/Makefile.am                                 |    2 +
 bfd/Makefile.in                                 |    4 +
 bfd/archures.c                                  |    4 +
 bfd/bfd-in2.h                                   |    6 +
 bfd/config.bfd                                  |   13 +-
 bfd/configure                                   |    1 +
 bfd/configure.ac                                |    1 +
 bfd/cpu-intelgt.c                               |   57 +
 bfd/elf64-intelgt.c                             |  195 ++
 bfd/libbfd.h                                    |    2 +
 bfd/reloc.c                                     |    7 +
 bfd/targets.c                                   |    2 +
 binutils/dwarf.c                                |    6 +
 binutils/readelf.c                              |    9 +
 config.sub                                      |    4 +-
 gdb/Makefile.in                                 |    8 +-
 gdb/NEWS                                        |   25 +
 gdb/README                                      |    8 +
 gdb/arch/intelgt.c                              |  191 ++
 gdb/arch/intelgt.h                              |  186 ++
 gdb/config.in                                   |    3 +
 gdb/configure                                   |  559 ++++-
 gdb/configure.ac                                |   52 +
 gdb/configure.tgt                               |    5 +
 gdb/disasm-selftests.c                          |   12 +
 gdb/doc/gdb.texinfo                             |  162 +-
 gdb/dwarf2/expr.c                               |   37 +
 gdb/dwarf2/expr.h                               |    5 +
 gdb/dwarf2/loc.c                                |    2 +
 gdb/exec.c                                      |    6 +
 gdb/features/gdb-target.dtd                     |   19 +-
 gdb/features/library-list.dtd                   |   22 +-
 gdb/fork-child.c                                |   10 +-
 gdb/gdbthread.h                                 |   12 +-
 gdb/infcmd.c                                    |   14 +-
 gdb/inferior.h                                  |    4 +
 gdb/infrun.c                                    |  124 +-
 gdb/intelgt-tdep.c                              |  986 ++++++++
 gdb/nat/fork-inferior.c                         |   10 +
 gdb/remote.c                                    |  227 +-
 gdb/selftest-arch.c                             |    6 +-
 gdb/solib-target.c                              |  153 +-
 gdb/solib-target.h                              |    3 +
 gdb/solib.c                                     |   81 +-
 gdb/solib.h                                     |   19 +-
 gdb/target-delegates-gen.c                      |   50 +
 gdb/target-descriptions.c                       |   19 +
 gdb/target.c                                    |   16 +
 gdb/target.h                                    |   24 +
 gdb/target/waitstatus.c                         |    1 +
 gdb/target/waitstatus.h                         |   22 +
 gdb/testsuite/README                            |    9 +
 gdb/testsuite/boards/intel-offload.exp          |   36 +
 gdb/testsuite/gdb.arch/intelgt-disassemble.exp  |   82 +
 gdb/testsuite/gdb.arch/intelgt-program-bp.exp   |   83 +
 gdb/testsuite/gdb.arch/sycl-simple.cpp          |   42 +
 gdb/testsuite/gdb.sycl/break.exp                |   63 +
 gdb/testsuite/gdb.sycl/break2.exp               |   66 +
 gdb/testsuite/gdb.sycl/call-stack.cpp           |   92 +
 gdb/testsuite/gdb.sycl/call-stack.exp           |  179 ++
 gdb/testsuite/gdb.sycl/info-locals-and-args.exp |   78 +
 gdb/testsuite/gdb.sycl/parallel-for-1D.cpp      |   72 +
 gdb/testsuite/gdb.sycl/parallel-for-1D.exp      |   55 +
 gdb/testsuite/gdb.sycl/parallel-for-2D.cpp      |   73 +
 gdb/testsuite/gdb.sycl/parallel-for-2D.exp      |   55 +
 gdb/testsuite/gdb.sycl/scheduler-locking.exp    |   67 +
 gdb/testsuite/gdb.sycl/single-task.cpp          |   50 +
 gdb/testsuite/gdb.sycl/step-canceled.exp        |   86 +
 gdb/testsuite/gdb.sycl/step-into-function.exp   |   47 +
 gdb/testsuite/gdb.sycl/step-parallel-for.exp    |   63 +
 gdb/testsuite/gdb.sycl/step.exp                 |   51 +
 gdb/testsuite/gdb.threads/killed-outside.exp    |    4 +
 gdb/testsuite/lib/gdb.exp                       |   17 +-
 gdb/testsuite/lib/intelgt-utils.exp             |   43 +
 gdb/testsuite/lib/sycl-devices.cpp              |  107 +
 gdb/testsuite/lib/sycl-hello.cpp                |   43 +
 gdb/testsuite/lib/sycl-util.cpp                 |  135 +
 gdb/testsuite/lib/sycl.exp                      |  410 ++++
 gdb/thread.c                                    |    2 +-
 gdb/top.c                                       |   10 +
 gdb/xml-tdesc.c                                 |   76 +
 gdbserver/Makefile.in                           |    4 +-
 gdbserver/acinclude.m4                          |    5 +
 gdbserver/config.in                             |    6 +
 gdbserver/configure                             |  500 ++++
 gdbserver/configure.ac                          |   18 +
 gdbserver/configure.srv                         |   15 +-
 gdbserver/dll.cc                                |  175 +-
 gdbserver/dll.h                                 |   54 +-
 gdbserver/gdbthread.h                           |    2 +-
 gdbserver/intelgt-ze-low.cc                     | 1016 ++++++++
 gdbserver/linux-low.cc                          |    6 +-
 gdbserver/linux-low.h                           |    2 +-
 gdbserver/netbsd-low.cc                         |    2 +-
 gdbserver/netbsd-low.h                          |    2 +-
 gdbserver/regcache.cc                           |    1 +
 gdbserver/regcache.h                            |    3 +
 gdbserver/remote-utils.cc                       |   21 +
 gdbserver/server.cc                             |  319 ++-
 gdbserver/server.h                              |    7 +
 gdbserver/target.cc                             |   14 +
 gdbserver/target.h                              |   31 +-
 gdbserver/tdesc.cc                              |   16 +
 gdbserver/tdesc.h                               |    3 +
 gdbserver/win32-low.cc                          |    4 +-
 gdbserver/win32-low.h                           |    2 +-
 gdbserver/ze-low.cc                             | 2996 +++++++++++++++++++++++
 gdbserver/ze-low.h                              |  496 ++++
 gdbsupport/tdesc.cc                             |   48 +
 gdbsupport/tdesc.h                              |   90 +
 include/dwarf2.def                              |    4 +
 include/elf/intelgt.h                           |   39 +
 ld/configure.tgt                                |    2 +
 opcodes/configure                               |    1 +
 opcodes/configure.ac                            |    1 +
 115 files changed, 11351 insertions(+), 146 deletions(-)
---
base-commit: 9af083a959a03ef068e1b7869263dddb4fb913c3
change-id: 20241213-upstream-intelgt-mvp-684d5f2f6730

-- 
Tankut Baris Aktemur <[email protected]>
