On 2025/11/03 15:15, Kirill A. Korinsky wrote:
> We don't have a GPU, but with -t 32 I ran the Qwen3 VL 30B model CPU-only
> on an AMD Ryzen 9 7950X3D at around 2 tokens/second, which is more or
> less usable. But it requires memory: 120G of :datasize is enough.
>
> Because we use libggml as a dedicated port, it must be updated to the
> latest version as well; the current release contains a bug which breaks
> large models under a large number of threads:
> https://github.com/ggml-org/llama.cpp/issues/16960
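FWIW the run described above translates to roughly this (untested here;
llama-cli and its flags come from upstream llama.cpp, the model path is
hypothetical):

    # raise the per-process data size limit (ksh; value in KB, 120G here);
    # datasize-max in login.conf for your login class may need raising too
    $ ulimit -d $((120 * 1024 * 1024))
    # CPU-only run with 32 threads
    $ llama-cli -m ./Qwen3-VL-30B.gguf -t 32 -p "hello"
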
I was hoping to hold off updating llama until there was a new ggml
release (implying upstream thinks it's stable-ish) rather than follow
the bleeding edge, but if you want to then do it... please keep an eye
on the repo for fixes for any breakage though.
(Of course, with GH_COMMIT we also lose portroach notifications of
new versions.)
> +GH_TAGNAME = b6934
> PKGNAME = llama.cpp-0.0.${GH_TAGNAME:S/b//}
>
> SHARED_LIBS += llama 2.0
Usual case with C++ library updates: many symbol changes, added and
removed. Please bump the llama and mtmd majors, along these lines.
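i.e. something like this in the llama.cpp port Makefile (mtmd's current
value isn't in the quoted diff, so that line is a guess; take whatever
is there now and +1 the major):

    SHARED_LIBS +=  llama   3.0  # was 2.0; C++ symbols changed, bump major
    SHARED_LIBS +=  mtmd    1.0  # hypothetical value; bump its major too
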
> GH_ACCOUNT= ggml-org
> GH_PROJECT= ggml
> -GH_TAGNAME= v0.9.4
> +GH_COMMIT= 09aa758381718f7731c148238574a7e169001f13
> +DISTNAME= ggml-0.9.4.20251101
Please use '0.9.4pl20251101' so that in the event of a 0.9.4.1
release we don't need to mess with EPOCH.
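i.e. the hunk becomes:

    GH_COMMIT=  09aa758381718f7731c148238574a7e169001f13
    # 'pl' lets a later 0.9.4.1 supersede this version without EPOCH
    DISTNAME=   ggml-0.9.4pl20251101
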
> SHARED_LIBS += ggml 2.0
There are new functions in libggml-base, and new enums affecting at
least libggml. At least a minor bump is needed, but I don't want
to read enough code to decide whether minor is enough, so I'd go
for major in this case.
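For reference, this sort of thing shows the symbol churn (untested as
written; library paths and .so versions assumed, with $WRKINST standing
in for the port's fake install directory):

    $ nm -D /usr/local/lib/libggml-base.so.2.0 | sort > old.syms
    $ nm -D $WRKINST/usr/local/lib/libggml-base.so.3.0 | sort > new.syms
    $ diff -u old.syms new.syms
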
Whisper still works, so with those changes it's OK with me.