https://bugs.kde.org/show_bug.cgi?id=502530
Bug ID: 502530
Summary: When trying to use the transcribe feature I get an error and no subtitles
Classification: Applications
Product: kdenlive
Version: 24.12.3
Platform: Homebrew (macOS)
OS: macOS
Status: REPORTED
Severity: normal
Priority: NOR
Component: Title Clips & Subtitles
Assignee: j...@kdenlive.org
Reporter: roy432002...@gmail.com
Target Milestone: ---

SUMMARY
I wanted to use the transcribe feature to add subtitles to my home movie, but I keep getting "No speech detected", and an error appears when I press "Show log". I had to download the Whisper model myself because the downloader in Kdenlive seems to be stuck; I don't know whether this is relevant.

STEPS TO REPRODUCE
1. Go to a sequence
2. Select a clip in it
3. Press "Transcribe" in the "Speech Editor" menu

OBSERVED RESULT
"No speech detected" and the following error:

/Applications/kdenlive.app/Contents/Resources/scripts/whisper/whispertotext.py:75: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
  checkpoint = torch.load(fp, map_location=device)
Traceback (most recent call last):
  File "/Users/<My Username>/Library/Application Support/kdenlive/venv/lib/python3.9/site-packages/whisper/audio.py", line 58, in load_audio
    out = run(cmd, capture_output=True, check=True).stdout
  File "/Applications/Xcode.app/Contents/Developer/Library/Frameworks/Python3.framework/Versions/3.9/lib/python3.9/subprocess.py", line 528, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['ffmpeg', '-nostdin', '-threads', '0', '-i', '/private/var/folders/_t/3_t8tnnx3cb0j7bdgw3hsdd40000gn/T/kdenlive-ZcKKVn.wav', '-f', 's16le', '-ac', '1', '-acodec', 'pcm_s16le', '-ar', '16000', '-']' returned non-zero exit status 183.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/Applications/kdenlive.app/Contents/Resources/scripts/whisper/whispertotext.py", line 176, in <module>
    sys.exit(main())
  File "/Applications/kdenlive.app/Contents/Resources/scripts/whisper/whispertotext.py", line 158, in main
    result = run_whisper(source, model, device, task, language)
  File "/Applications/kdenlive.app/Contents/Resources/scripts/whisper/whispertotext.py", line 140, in run_whisper
    result = loadedModel.transcribe(source, **transcribe_kwargs)
  File "/Users/<My Username>/Library/Application Support/kdenlive/venv/lib/python3.9/site-packages/whisper/transcribe.py", line 133, in transcribe
    mel = log_mel_spectrogram(audio, model.dims.n_mels, padding=N_SAMPLES)
  File "/Users/<My Username>/Library/Application Support/kdenlive/venv/lib/python3.9/site-packages/whisper/audio.py", line 140, in log_mel_spectrogram
    audio = load_audio(audio)
  File "/Users/<My Username>/Library/Application Support/kdenlive/venv/lib/python3.9/site-packages/whisper/audio.py", line 60, in load_audio
    raise RuntimeError(f"Failed to load audio: {e.stderr.decode()}") from e
RuntimeError: Failed to load audio: ffmpeg version 7.1 Copyright (c) 2000-2024 the FFmpeg developers
  built with Apple clang version 15.0.0 (clang-1500.3.9.4)
  configuration: --enable-libmp3lame --cc=/usr/bin/clang --cxx=/usr/bin/clang++ --enable-libopus --enable-libvorbis --enable-libvpx --enable-libass --enable-libaom --enable-libdav1d --enable-libzimg --arch=arm64 --disable-debug --disable-doc --enable-gpl --enable-version3 --enable-nonfree --enable-openssl --disable-xlib --disable-libxcb --enable-libx264 --enable-libx265 --enable-rpath --install-name-dir='@rpath' --prefix=/Users/gitlab/ws/builds/GZwHuM5x/0/sysadmin/ci-management/macos-arm-clang --libdir=/Users/gitlab/ws/builds/GZwHuM5x/0/sysadmin/ci-management/macos-arm-clang/lib --disable-static --enable-shared
  libavutil      59. 39.100 / 59. 39.100
  libavcodec     61. 19.100 / 61. 19.100
  libavformat    61.  7.100 / 61.  7.100
  libavdevice    61.  3.100 / 61.  3.100
  libavfilter    10.  4.100 / 10.  4.100
  libswscale      8.  3.100 /  8.  3.100
  libswresample   5.  3.100 /  5.  3.100
  libpostproc    58.  3.100 / 58.  3.100
[in#0 @ 0x60000313c200] Error opening input: Invalid data found when processing input
Error opening input file /private/var/folders/_t/3_t8tnnx3cb0j7bdgw3hsdd40000gn/T/kdenlive-ZcKKVn.wav.
Error opening input files: Invalid data found when processing input

EXPECTED RESULT
Some subtitles for my clip

ADDITIONAL INFORMATION
Even though I downloaded the model manually, it passed the "Check model integrity" test in the "Manage models" menu. I have also run "Check Configuration" and updated the dependencies, all from within Kdenlive.
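
The ffmpeg message above ("Invalid data found when processing input") suggests that the temporary WAV file Kdenlive handed to Whisper may be empty or malformed. A minimal sketch for checking such a file by hand, assuming a copy of it is still on disk (Kdenlive may delete it after the run): `inspect_wav` is a hypothetical helper written for this report, not part of Kdenlive or Whisper, and it uses only the Python standard library.

```python
import os
import struct
import wave


def inspect_wav(path):
    """Return a short diagnosis of whether `path` looks like a RIFF/WAVE file.

    Hypothetical helper for manual debugging; not Kdenlive or Whisper code.
    """
    size = os.path.getsize(path)
    if size < 12:
        # A RIFF/WAVE file needs at least a 12-byte container header.
        return f"too short to be a WAV file ({size} bytes)"

    with open(path, "rb") as fh:
        riff, _declared_size, fmt = struct.unpack("<4sI4s", fh.read(12))
    if riff != b"RIFF" or fmt != b"WAVE":
        return f"missing RIFF/WAVE header (starts with {riff!r} ... {fmt!r})"

    try:
        # The stdlib wave module only understands PCM data; other encodings
        # raise wave.Error and are reported as unreadable here.
        with wave.open(path, "rb") as w:
            frames = w.getnframes()
            rate = w.getframerate()
    except (wave.Error, EOFError) as exc:
        return f"RIFF/WAVE header present, but unreadable: {exc}"

    if frames == 0:
        return "valid header, but no audio frames"
    return f"looks valid: {frames} frames at {rate} Hz"
```

Calling `inspect_wav` on the path shown in the log (or on a copy of that file) would indicate whether the problem lies in the audio export step rather than in Whisper or the manually downloaded model.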