Package: rocblas
Version: 5.5.1+dfsg-5
Tags: patch

When compiling llama.cpp with ROCm support and running it, I get an
illegal instruction crash in the binary.  The cause seems to be that
rocblas is built with -mf16c, so the library executes F16C instructions
that my CPU does not support.
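
One way to check whether the host CPU has the F16C extension at all is
to ask the compiler's runtime feature detection.  The following is just
a minimal, self-contained sketch for that check (not part of llama.cpp
or rocblas):

  // f16c-check.cpp: report whether this CPU supports the F16C extension.
  // Build with: g++ -O2 f16c-check.cpp -o f16c-check
  #include <cstdio>

  int main() {
      // __builtin_cpu_supports() is available in both GCC and Clang on x86.
      if (__builtin_cpu_supports("f16c")) {
          std::puts("F16C supported; code built with -mf16c is safe here");
      } else {
          std::puts("No F16C; code built with -mf16c can crash with SIGILL");
      }
      return 0;
  }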

I built llama.cpp using this command line:

  HIPCXX=clang-17 cmake -H. -Bbuild -DGGML_HIPBLAS=ON \
    -DCMAKE_HIP_ARCHITECTURES="gfx803;gfx900;gfx906;gfx908;gfx90a;gfx942;gfx1010;gfx1030;gfx1100;gfx1101;gfx1102" \
    -DCMAKE_BUILD_TYPE=Release -DGGML_NATIVE=ON

I see the crash after downloading a model from Hugging Face and
starting bin/llama-cli with this model.  Running under valgrind, I get
this report at the point of the crash:

   ==27243== Warning: set address range perms: large range [0x221c55000,
   0x231e56000) (noaccess)
   llama_kv_cache_init:      ROCm0 KV buffer size =   256,00 MiB
   llama_new_context_with_model: KV self size  =  256,00 MiB, K (f16):  128,00
   MiB, V (f16):  128,00 MiB
   llama_new_context_with_model:  ROCm_Host  output buffer size =     0,12 MiB
   llama_new_context_with_model:      ROCm0 compute buffer size =   164,00 MiB
   llama_new_context_with_model:  ROCm_Host compute buffer size =    12,01 MiB
   llama_new_context_with_model: graph nodes  = 1030
   llama_new_context_with_model: graph splits = 2
   vex amd64->IR: unhandled instruction bytes: 0xC4 0xE2 0x79 0x13 0xC0 0xC5
   0xF0 0x57 0xC9 0xC5
   vex amd64->IR:   REX=0 REX.W=0 REX.R=0 REX.X=0 REX.B=0
   vex amd64->IR:   VEX=1 VEX.L=0 VEX.nVVVV=0x0 ESC=0F38
   vex amd64->IR:   PFX.66=1 PFX.F2=0 PFX.F3=0
   ==27243== valgrind: Unrecognised instruction at address 0x1331a8a8.
   ==27243==    at 0x1331A8A8: ??? (in
   /usr/lib/x86_64-linux-gnu/librocblas.so.0.1)
   ==27243==    by 0x13326E28: ??? (in
   /usr/lib/x86_64-linux-gnu/librocblas.so.0.1)
   ==27243==    by 0x13157CBA: ??? (in
   /usr/lib/x86_64-linux-gnu/librocblas.so.0.1)
   ==27243==    by 0x13155D51: ??? (in
   /usr/lib/x86_64-linux-gnu/librocblas.so.0.1)
   ==27243==    by 0x1314DB31: ??? (in
   /usr/lib/x86_64-linux-gnu/librocblas.so.0.1)
   ==27243==    by 0x1314B477: rocblas_gemm_batched_ex (in
   /usr/lib/x86_64-linux-gnu/librocblas.so.0.1)
   ==27243==    by 0x1305CD09: hipblasGemmBatchedEx (in
   /usr/lib/x86_64-linux-gnu/libhipblas.so.0.1)
   ==27243==    by 0x4AA55CD:
   ggml_cuda_mul_mat_batched_cublas(ggml_backend_cuda_context&, ggml_tensor
   const*, ggml_tensor const*, ggml_tensor*) (in
   /home/pere/src/ki/llama.cpp/build/ggml/src/libggml.so)
   ==27243==    by 0x4A94C71: ggml_backend_cuda_graph_compute(ggml_backend*,
   ggml_cgraph*) (in /home/pere/src/ki/llama.cpp/build/ggml/src/libggml.so)
   ==27243==    by 0x4A1E61C: ggml_backend_sched_graph_compute_async (in
   /home/pere/src/ki/llama.cpp/build/ggml/src/libggml.so)
   ==27243==    by 0x48D1A32: llama_decode (in
   /home/pere/src/ki/llama.cpp/build/src/libllama.so)
   ==27243==    by 0x13C4EC: llama_init_from_gpt_params(gpt_params&) (in
   /home/pere/src/ki/llama.cpp/build/bin/llama-cli)
   ==27243== Your program just tried to execute an instruction that Valgrind
   ==27243== did not recognise.  There are two possible reasons for this.
   ==27243== 1. Your program has a bug and erroneously jumped to a non-code
   ==27243==    location.  If you are running Memcheck and you just saw a
   ==27243==    warning about a bad jump, it's probably your program's fault.
   ==27243== 2. The instruction is legitimate but Valgrind doesn't handle it,
   ==27243==    i.e. it's Valgrind's fault.  If you think this is the case or
   ==27243==    you are not sure, please let us know and we'll try to fix it.
   ==27243== Either way, Valgrind will now raise a SIGILL signal which will
   ==27243== probably kill your program.
   ==27243== 
   ==27243== Process terminating with default action of signal 4 (SIGILL)
   ==27243==  Illegal opcode at address 0x1331A8A8
   ==27243==    at 0x1331A8A8: ??? (in
   /usr/lib/x86_64-linux-gnu/librocblas.so.0.1)
   ==27243==    by 0x13326E28: ??? (in
   /usr/lib/x86_64-linux-gnu/librocblas.so.0.1)
   ==27243==    by 0x13157CBA: ??? (in
   /usr/lib/x86_64-linux-gnu/librocblas.so.0.1)
   ==27243==    by 0x13155D51: ??? (in
   /usr/lib/x86_64-linux-gnu/librocblas.so.0.1)
   ==27243==    by 0x1314DB31: ??? (in
   /usr/lib/x86_64-linux-gnu/librocblas.so.0.1)
   ==27243==    by 0x1314B477: rocblas_gemm_batched_ex (in
   /usr/lib/x86_64-linux-gnu/librocblas.so.0.1)
   ==27243==    by 0x1305CD09: hipblasGemmBatchedEx (in
   /usr/lib/x86_64-linux-gnu/libhipblas.so.0.1)
   ==27243==    by 0x4AA55CD:
   ggml_cuda_mul_mat_batched_cublas(ggml_backend_cuda_context&, ggml_tensor
   const*, ggml_tensor const*, ggml_tensor*) (in
   /home/pere/src/ki/llama.cpp/build/ggml/src/libggml.so)
   ==27243==    by 0x4A94C71: ggml_backend_cuda_graph_compute(ggml_backend*,
   ggml_cgraph*) (in /home/pere/src/ki/llama.cpp/build/ggml/src/libggml.so)
   ==27243==    by 0x4A1E61C: ggml_backend_sched_graph_compute_async (in
   /home/pere/src/ki/llama.cpp/build/ggml/src/libggml.so)
   ==27243==    by 0x48D1A32: llama_decode (in
   /home/pere/src/ki/llama.cpp/build/src/libllama.so)
   ==27243==    by 0x13C4EC: llama_init_from_gpt_params(gpt_params&) (in
   /home/pere/src/ki/llama.cpp/build/bin/llama-cli)
   ==27243== 
   ==27243== HEAP SUMMARY:
   ==27243==     in use at exit: 659,260,263 bytes in 3,380,913 blocks
   ==27243==   total heap usage: 19,712,537 allocs, 16,331,624 frees,
   5,271,145,975 bytes allocated
   ==27243== 
   ==27243== LEAK SUMMARY:
   ==27243==    definitely lost: 120 bytes in 3 blocks
   ==27243==    indirectly lost: 2,422 bytes in 45 blocks
   ==27243==      possibly lost: 18,964 bytes in 160 blocks
   ==27243==    still reachable: 659,238,757 bytes in 3,380,705 blocks
   ==27243==                       of which reachable via heuristic:
   ==27243==                         multipleinheritance: 1,056 bytes in 12
   blocks
   ==27243==         suppressed: 0 bytes in 0 blocks
   ==27243== Rerun with --leak-check=full to see details of leaked memory
   ==27243== 
   ==27243== For lists of detected and suppressed errors, rerun with: -s
   ==27243== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
   Ulovlig instruksjon (SIGILL)  [Norwegian locale; English: Illegal instruction]

According to Cory Bloor, the disassembly of those bytes shows that it
is the vcvtph2ps instruction causing the crash.  vcvtph2ps converts
half-precision floats to single precision and is part of the F16C
extension, i.e. exactly what -mf16c enables:

  0:  c4 e2 79 13 c0          vcvtph2ps xmm0,xmm0
  5:  c5 f0 57 c9             vxorps xmm1,xmm1,xmm1
  9:  c5                      .byte 0xc5
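
As far as I understand, the usual way to keep F16C code from crashing
older CPUs is to compile only the half-precision conversion path with a
per-function target attribute and pick it at run time.  The sketch
below just illustrates that pattern with made-up function names; it is
not the actual rocBLAS code or the upstream fix:

  #include <immintrin.h>
  #include <cstdint>
  #include <cstring>

  // F16C path: the target attribute lets this one function use the
  // extension without building the whole library with -mf16c.  It must
  // only be called when the CPU actually reports F16C support.
  __attribute__((target("f16c")))
  static float half_to_float_f16c(uint16_t h) {
      return _cvtsh_ss(h);   // executes vcvtph2ps under the hood
  }

  // Portable fallback: decode the IEEE 754 binary16 fields by hand.
  static float half_to_float_scalar(uint16_t h) {
      uint32_t sign = (uint32_t)(h >> 15) << 31;
      uint32_t exp  = (h >> 10) & 0x1f;
      uint32_t mant = h & 0x3ff;
      uint32_t bits;
      if (exp == 0) {                    // zero or subnormal
          float f = mant * (1.0f / 16777216.0f);   // mant * 2^-24
          std::memcpy(&bits, &f, sizeof bits);
          bits |= sign;
      } else if (exp == 31) {            // infinity or NaN
          bits = sign | 0x7f800000u | (mant << 13);
      } else {                           // normal number
          bits = sign | ((exp + 112) << 23) | (mant << 13);
      }
      float out;
      std::memcpy(&out, &bits, sizeof out);
      return out;
  }

  // Dispatch once, based on what the CPU running the binary supports.
  float half_to_float(uint16_t h) {
      static const bool has_f16c = __builtin_cpu_supports("f16c");
      return has_f16c ? half_to_float_f16c(h) : half_to_float_scalar(h);
  }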

I managed to avoid the crash and get llama.cpp working by applying the
following patch and rebuilding rocblas:

--- rocblas-5.5.1+dfsg.orig/library/src/CMakeLists.txt
+++ rocblas-5.5.1+dfsg/library/src/CMakeLists.txt
@@ -411,7 +411,7 @@ endif()
 #  -fno-gpu-rdc compiler option was used with hcc, so revisit feature at some point
 
 # GCC or hip-clang needs specific flags to turn on f16c intrinsics
-target_compile_options( rocblas PRIVATE -mf16c )
+#target_compile_options( rocblas PRIVATE -mf16c )
 
 # Do not allow Variable Length Arrays (use unique_ptr instead)
 target_compile_options( rocblas PRIVATE -Werror=vla )

Please consider including it in an upload to Debian.

According to https://github.com/ROCm/rocBLAS/issues/1422 and
<URL: https://github.com/ROCm/rocBLAS/commit/c6bc09073959a2881a701b88ae1ed9de469354f1 >,
the issue might already be fixed upstream, but I have not tested that
version.

See also <URL: https://lists.debian.org/debian-ai/2024/07/msg00007.html >.

-- 
Happy hacking
Petter Reinholdtsen
