Package: rocblas
Version: 5.5.1+dfsg-5
Tags: patch
When compiling llama.cpp with ROCm support and running the resulting
binary, I get an illegal instruction crash.  The cause seems to be that
rocblas is built with -mf16c.  I built llama.cpp using this command
line:

  HIPCXX=clang-17 cmake -H. -Bbuild -DGGML_HIPBLAS=ON \
    -DCMAKE_HIP_ARCHITECTURES="gfx803;gfx900;gfx906;gfx908;gfx90a;gfx942;gfx1010;gfx1030;gfx1100;gfx1101;gfx1102" \
    -DCMAKE_BUILD_TYPE=Release -DGGML_NATIVE=ON

I see the crash after downloading a model from huggingface and starting
bin/llama-cli with this model.
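Code built with -mf16c uses the F16C half-precision conversion
instructions unconditionally, so it can crash on any CPU lacking that
extension.  Whether a given machine is affected can be checked with a
small test program like the following sketch, which uses the
__builtin_cpu_supports() builtin available in both GCC and clang
(looking for the f16c flag in /proc/cpuinfo gives the same answer):

  #include <stdio.h>

  int main(void)
  {
      /* Initialise the CPU feature data consulted by
         __builtin_cpu_supports(); calling it explicitly is always
         safe. */
      __builtin_cpu_init();

      if (__builtin_cpu_supports("f16c")) {
          printf("CPU supports F16C, vcvtph2ps is available.\n");
          return 0;
      }
      printf("CPU lacks F16C, code built with -mf16c will SIGILL.\n");
      return 1;
  }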
Using valgrind, I get this report from the crash:

==27243== Warning: set address range perms: large range [0x221c55000, 0x231e56000) (noaccess)
llama_kv_cache_init:      ROCm0 KV buffer size =   256,00 MiB
llama_new_context_with_model: KV self size  =  256,00 MiB, K (f16):  128,00 MiB, V (f16):  128,00 MiB
llama_new_context_with_model:  ROCm_Host  output buffer size =     0,12 MiB
llama_new_context_with_model:      ROCm0 compute buffer size =   164,00 MiB
llama_new_context_with_model:  ROCm_Host compute buffer size =    12,01 MiB
llama_new_context_with_model: graph nodes  = 1030
llama_new_context_with_model: graph splits = 2
vex amd64->IR: unhandled instruction bytes: 0xC4 0xE2 0x79 0x13 0xC0 0xC5 0xF0 0x57 0xC9 0xC5
vex amd64->IR:   REX=0 REX.W=0 REX.R=0 REX.X=0 REX.B=0
vex amd64->IR:   VEX=1 VEX.L=0 VEX.nVVVV=0x0 ESC=0F38
vex amd64->IR:   PFX.66=1 PFX.F2=0 PFX.F3=0
==27243== valgrind: Unrecognised instruction at address 0x1331a8a8.
==27243==    at 0x1331A8A8: ??? (in /usr/lib/x86_64-linux-gnu/librocblas.so.0.1)
==27243==    by 0x13326E28: ??? (in /usr/lib/x86_64-linux-gnu/librocblas.so.0.1)
==27243==    by 0x13157CBA: ??? (in /usr/lib/x86_64-linux-gnu/librocblas.so.0.1)
==27243==    by 0x13155D51: ??? (in /usr/lib/x86_64-linux-gnu/librocblas.so.0.1)
==27243==    by 0x1314DB31: ??? (in /usr/lib/x86_64-linux-gnu/librocblas.so.0.1)
==27243==    by 0x1314B477: rocblas_gemm_batched_ex (in /usr/lib/x86_64-linux-gnu/librocblas.so.0.1)
==27243==    by 0x1305CD09: hipblasGemmBatchedEx (in /usr/lib/x86_64-linux-gnu/libhipblas.so.0.1)
==27243==    by 0x4AA55CD: ggml_cuda_mul_mat_batched_cublas(ggml_backend_cuda_context&, ggml_tensor const*, ggml_tensor const*, ggml_tensor*) (in /home/pere/src/ki/llama.cpp/build/ggml/src/libggml.so)
==27243==    by 0x4A94C71: ggml_backend_cuda_graph_compute(ggml_backend*, ggml_cgraph*) (in /home/pere/src/ki/llama.cpp/build/ggml/src/libggml.so)
==27243==    by 0x4A1E61C: ggml_backend_sched_graph_compute_async (in /home/pere/src/ki/llama.cpp/build/ggml/src/libggml.so)
==27243==    by 0x48D1A32: llama_decode (in /home/pere/src/ki/llama.cpp/build/src/libllama.so)
==27243==    by 0x13C4EC: llama_init_from_gpt_params(gpt_params&) (in /home/pere/src/ki/llama.cpp/build/bin/llama-cli)
==27243== Your program just tried to execute an instruction that Valgrind
==27243== did not recognise.  There are two possible reasons for this.
==27243== 1. Your program has a bug and erroneously jumped to a non-code
==27243==    location.  If you are running Memcheck and you just saw a
==27243==    warning about a bad jump, it's probably your program's fault.
==27243== 2. The instruction is legitimate but Valgrind doesn't handle it,
==27243==    i.e. it's Valgrind's fault.  If you think this is the case or
==27243==    you are not sure, please let us know and we'll try to fix it.
==27243== Either way, Valgrind will now raise a SIGILL signal which will
==27243== probably kill your program.
==27243==
==27243== Process terminating with default action of signal 4 (SIGILL)
==27243==  Illegal opcode at address 0x1331A8A8
==27243==    at 0x1331A8A8: ??? (in /usr/lib/x86_64-linux-gnu/librocblas.so.0.1)
==27243==    by 0x13326E28: ??? (in /usr/lib/x86_64-linux-gnu/librocblas.so.0.1)
==27243==    by 0x13157CBA: ??? (in /usr/lib/x86_64-linux-gnu/librocblas.so.0.1)
==27243==    by 0x13155D51: ??? (in /usr/lib/x86_64-linux-gnu/librocblas.so.0.1)
==27243==    by 0x1314DB31: ??? (in /usr/lib/x86_64-linux-gnu/librocblas.so.0.1)
==27243==    by 0x1314B477: rocblas_gemm_batched_ex (in /usr/lib/x86_64-linux-gnu/librocblas.so.0.1)
==27243==    by 0x1305CD09: hipblasGemmBatchedEx (in /usr/lib/x86_64-linux-gnu/libhipblas.so.0.1)
==27243==    by 0x4AA55CD: ggml_cuda_mul_mat_batched_cublas(ggml_backend_cuda_context&, ggml_tensor const*, ggml_tensor const*, ggml_tensor*) (in /home/pere/src/ki/llama.cpp/build/ggml/src/libggml.so)
==27243==    by 0x4A94C71: ggml_backend_cuda_graph_compute(ggml_backend*, ggml_cgraph*) (in /home/pere/src/ki/llama.cpp/build/ggml/src/libggml.so)
==27243==    by 0x4A1E61C: ggml_backend_sched_graph_compute_async (in /home/pere/src/ki/llama.cpp/build/ggml/src/libggml.so)
==27243==    by 0x48D1A32: llama_decode (in /home/pere/src/ki/llama.cpp/build/src/libllama.so)
==27243==    by 0x13C4EC: llama_init_from_gpt_params(gpt_params&) (in /home/pere/src/ki/llama.cpp/build/bin/llama-cli)
==27243==
==27243== HEAP SUMMARY:
==27243==     in use at exit: 659,260,263 bytes in 3,380,913 blocks
==27243==   total heap usage: 19,712,537 allocs, 16,331,624 frees, 5,271,145,975 bytes allocated
==27243==
==27243== LEAK SUMMARY:
==27243==    definitely lost: 120 bytes in 3 blocks
==27243==    indirectly lost: 2,422 bytes in 45 blocks
==27243==      possibly lost: 18,964 bytes in 160 blocks
==27243==    still reachable: 659,238,757 bytes in 3,380,705 blocks
==27243==                       of which reachable via heuristic:
==27243==                         multipleinheritance: 1,056 bytes in 12 blocks
==27243==         suppressed: 0 bytes in 0 blocks
==27243== Rerun with --leak-check=full to see details of leaked memory
==27243==
==27243== For lists of detected and suppressed errors, rerun with: -s
==27243== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
Illegal instruction (SIGILL)

According to Cory Bloor, the disassembly of those bytes shows that it
is the vcvtph2ps instruction causing the crash:

   0:   c4 e2 79 13 c0          vcvtph2ps xmm0,xmm0
   5:   c5 f0 57 c9             vxorps xmm1,xmm1,xmm1
   9:   c5                      .byte 0xc5

I managed to avoid the crash and get llama.cpp working by applying the
following patch and rebuilding rocblas:

--- rocblas-5.5.1+dfsg.orig/library/src/CMakeLists.txt
+++ rocblas-5.5.1+dfsg/library/src/CMakeLists.txt
@@ -411,7 +411,7 @@ endif()
 
 # -fno-gpu-rdc compiler option was used with hcc, so revisit feature at some point
 # GCC or hip-clang needs specific flags to turn on f16c intrinsics
-target_compile_options( rocblas PRIVATE -mf16c )
+#target_compile_options( rocblas PRIVATE -mf16c )
 
 # Do not allow Variable Length Arrays (use unique_ptr instead)
 target_compile_options( rocblas PRIVATE -Werror=vla )

Please consider including it in an upload to Debian.

According to https://github.com/ROCm/rocBLAS/issues/1422 and
<URL: https://github.com/ROCm/rocBLAS/commit/c6bc09073959a2881a701b88ae1ed9de469354f1 >,
the issue might already be fixed upstream, but I have not tested that
version.  See also
<URL: https://lists.debian.org/debian-ai/2024/07/msg00007.html >.

--
Happy hacking
Petter Reinholdtsen