Hi,

Seems to work with my Radeon card.

ggml_vulkan: Found 1 Vulkan devices:
ggml_vulkan: 0 = AMD Radeon RX 6900 XT (RADV NAVI21) (radv) | uma: 0 | fp16: 1 | warp size: 64 | matrix cores: none
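
In case anyone else wants to try the diff, this is roughly the procedure I'd use (the patch path below is only an example, use whatever you saved it as; the ports framework pulls in graphics/shaderc and graphics/vulkan-loader by itself):

cd /usr/ports/misc/llama.cpp
patch -p0 < ~/llama-vulkan.diff
make clean=all
doas make install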

llama-cli -m ./mistral-7b-instruct-v0.2.Q8_0.gguf
[...]
llama_perf_sampler_print:    sampling time =       7.18 ms /   235 runs   (    0.03 ms per token, 32720.69 tokens per second)
llama_perf_context_print:        load time =    1298.46 ms
llama_perf_context_print: prompt eval time =    4730.92 ms /    23 tokens (  205.69 ms per token,     4.86 tokens per second)
llama_perf_context_print:        eval time =   33557.39 ms /   221 runs   (  151.84 ms per token,     6.59 tokens per second)
llama_perf_context_print:       total time =  157345.06 ms /   244 tokens


Using -ngl to offload some layers to the GPU:

llama-cli -m ./mistral-7b-instruct-v0.2.Q8_0.gguf -ngl 32
[...]
llama_perf_sampler_print:    sampling time =       7.44 ms /   257 runs   (    0.03 ms per token, 34543.01 tokens per second)
llama_perf_context_print:        load time =    2210.50 ms
llama_perf_context_print: prompt eval time =    3417.91 ms /    22 tokens (  155.36 ms per token,     6.44 tokens per second)
llama_perf_context_print:        eval time =    4990.24 ms /   244 runs   (   20.45 ms per token,    48.90 tokens per second)
llama_perf_context_print:       total time =   11833.75 ms /   266 tokens
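
Mistral 7B should be 32 repeating layers plus the output layer, so something like the following ought to put everything on the card (I haven't re-run with it, so take the layer count with a grain of salt):

llama-cli -m ./mistral-7b-instruct-v0.2.Q8_0.gguf -ngl 33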



On Sat, Feb 1, 2025, at 05:31, Chris Cappuccio wrote:
> This might be a way to do it. Does anyone have a card to test against?
>
> Index: Makefile
> ===================================================================
> RCS file: /cvs/ports/misc/llama.cpp/Makefile,v
> retrieving revision 1.1
> diff -u -p -u -r1.1 Makefile
> --- Makefile  30 Jan 2025 22:55:11 -0000      1.1
> +++ Makefile  1 Feb 2025 04:20:02 -0000
> @@ -10,6 +10,7 @@ SHARED_LIBS +=              ggml-cpu 0.0
>  SHARED_LIBS +=               ggml 0.0
>  SHARED_LIBS +=               llama 0.0
>  SHARED_LIBS +=               llava_shared 0.0
> +SHARED_LIBS +=               ggml-vulkan 0.0
> 
>  CATEGORIES =         misc
> 
> @@ -18,11 +19,15 @@ HOMEPAGE =                https://github.com/ggerganov
>  # MIT
>  PERMIT_PACKAGE =     Yes
> 
> -WANTLIB +=           m pthread ${COMPILER_LIBCXX}
> +WANTLIB +=           m pthread vulkan ${COMPILER_LIBCXX}
> 
>  MODULES =            devel/cmake
> 
> +LIB_DEPENDS =                graphics/vulkan-loader
> +BUILD_DEPENDS =              graphics/shaderc
> +
>  CONFIGURE_ARGS =     -DGGML_CCACHE=Off \
> -                     -DGGML_NATIVE=Off
> +                     -DGGML_NATIVE=Off \
> +                     -DGGML_VULKAN=On
> 
>  .include <bsd.port.mk>
> Index: pkg/PLIST
> ===================================================================
> RCS file: /cvs/ports/misc/llama.cpp/pkg/PLIST,v
> retrieving revision 1.1
> diff -u -p -u -r1.1 PLIST
> --- pkg/PLIST 30 Jan 2025 22:55:11 -0000      1.1
> +++ pkg/PLIST 1 Feb 2025 04:20:02 -0000
> @@ -58,6 +58,7 @@ bin/convert_hf_to_gguf.py
>  @bin bin/test-tokenizer-0
>  @bin bin/test-tokenizer-1-bpe
>  @bin bin/test-tokenizer-1-spm
> +@bin bin/vulkan-shaders-gen
>  include/ggml-alloc.h
>  include/ggml-backend.h
>  include/ggml-blas.h
> @@ -74,7 +75,6 @@ include/ggml.h
>  include/gguf.h
>  include/llama-cpp.h
>  include/llama.h
> -lib/cmake/
>  lib/cmake/ggml/
>  lib/cmake/ggml/ggml-config.cmake
>  lib/cmake/ggml/ggml-version.cmake
> @@ -83,6 +83,7 @@ lib/cmake/llama/llama-config.cmake
>  lib/cmake/llama/llama-version.cmake
>  @lib lib/libggml-base.so.${LIBggml-base_VERSION}
>  @lib lib/libggml-cpu.so.${LIBggml-cpu_VERSION}
> +@lib lib/libggml-vulkan.so.${LIBggml-vulkan_VERSION}
>  @lib lib/libggml.so.${LIBggml_VERSION}
>  @lib lib/libllama.so.${LIBllama_VERSION}
>  @lib lib/libllava_shared.so.${LIBllava_shared_VERSION}
