Package: wnpp
Severity: wishlist

* Package name    : vosk-api
  Version         : 0.3.45
  Upstream Contact: Alpha Cephei <https://github.com/alphacep/>
* URL             : https://alphacephei.com/vosk/
* License         : Apache-2.0
  Programming Lang: Jupyter, C++, Python, Java, ...
  Description     : Offline speech recognition API

Vosk is an offline open source speech recognition toolkit. It enables speech recognition for 20+ languages and dialects - English, Indian English, German, French, Spanish, Portuguese, Chinese, Russian, Turkish, Vietnamese, Italian, Dutch, Catalan, Arabic, Greek, Farsi, Filipino, Ukrainian, Kazakh, Swedish, Japanese, Esperanto, Hindi, Czech, Polish.

Vosk models are small (50 MB) but provide continuous large vocabulary transcription, zero-latency response with streaming API, reconfigurable vocabulary and speaker identification.

Speech recognition bindings are implemented for various programming languages like Python, Java, Node.JS, C#, C++, Rust, Go and others.

Vosk supplies speech recognition for chatbots, smart home appliances, virtual assistants. It can also create subtitles for movies, transcription for lectures and interviews.

Vosk scales from small devices like Raspberry Pi or Android smartphone to big clusters.

----

Debian has been shipping speech recognition software for a while, mostly in the form of Sphinx, which is... well, it's not as good as one would imagine those things to be. Historically, such programs used to be extremely inaccurate and largely in the realm of sci-fi and playthings, but recent advances in machine learning have shown tremendous progress in this area, which makes it possible to make use of (free!) software to enable voice-driven applications of all sorts.

Vosk is an API layer that can be used by other programs to implement such solutions, and I think it would be a great addition to Debian.

The models are small and all free, although the licenses vary:

https://alphacephei.com/vosk/models

Also, it could be possible to just package the API bits without shipping the models in Debian, which of course would be less useful, but more useful than nothing.
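
For context on what "the API bits" look like in practice, here is a rough sketch of the Python binding as I understand it from upstream's examples. The `transcribe` and `main` helpers, the chunk size, and the file paths are my own illustrative choices, not anything upstream ships; the input is assumed to be a mono 16-bit PCM WAV file. Note that the recognizer cannot be built without a model directory, which is why packaging the API alone is of limited use.

```python
import json
import wave


def transcribe(wav_path, recognizer, chunk_frames=4000):
    """Stream a WAV file to a Vosk-style recognizer and return the text.

    `recognizer` only needs AcceptWaveform(), Result() and FinalResult(),
    which is the interface vosk.KaldiRecognizer exposes.
    """
    segments = []
    with wave.open(wav_path, "rb") as wf:
        while True:
            data = wf.readframes(chunk_frames)
            if not data:
                break
            # AcceptWaveform() is truthy once an utterance has been finalized.
            if recognizer.AcceptWaveform(data):
                segments.append(json.loads(recognizer.Result()).get("text", ""))
    # Flush whatever audio is still buffered in the recognizer.
    segments.append(json.loads(recognizer.FinalResult()).get("text", ""))
    return " ".join(s for s in segments if s)


def main(model_dir, wav_path):
    # Imported here so the helper above does not require vosk at import time.
    from vosk import Model, KaldiRecognizer

    with wave.open(wav_path, "rb") as wf:
        sample_rate = wf.getframerate()
    # model_dir is a hypothetical path to an unpacked model from the URL above.
    recognizer = KaldiRecognizer(Model(model_dir), sample_rate)
    print(transcribe(wav_path, recognizer))
```

Something like `main("path/to/model", "recording.wav")` would then print the transcript, assuming the model has been downloaded separately.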

I'm not exactly sure what our policy is on models, actually: the license of the models above is "free" in the sense that you can get the binary and do what you want with it, but I'm not sure it would pass the smell test of "wait, but where's the training data" kind of stuff. I leave that to people more familiar with those sticky issues and focus this RFP on the software side of things.

Also be warned that I'm only peripherally familiar with ML and current developments in AI. I mostly stumbled (again) upon vosk because of Numen:

https://numenvoice.org/

There are other models out there that might be better targets. For example, <https://ggml.ai/> "is a tensor library for machine learning to enable large models and high performance on commodity hardware. It is used by llama.cpp and whisper.cpp". But the latter two are on relatively shakier legal grounds, as far as Debian is concerned.

There's also Mozilla's <https://github.com/mozilla/DeepSpeech>, "an open source embedded (offline, on-device) speech-to-text engine which can run in real time on devices ranging from a Raspberry Pi 4 to high power GPU servers."

And there's a whole area of "home assistants" that have their own way of doing things. Vosk is just one of the options, and I would be happy to hear what the best solution is for this problem space.