Linux command
vosk command
After copying a command, replace the bracketed file names, directories, or parameters as needed.
Common examples
Transcribe audio to text file
vosk-transcriber -i [audio.mp4] -o [output.txt]
Generate SRT subtitles
vosk-transcriber -i [video.mp4] -t srt -o [subtitles.srt]
Transcribe in specific language
vosk-transcriber -l [fr] -i [audio.m4a] -o [output.txt]
List available languages
vosk-transcriber --list-languages
Transcribe from microphone (vosk-cli)
vosk mic -o [output.txt] -t english
Transcribe audio file (vosk-cli)
vosk rec -c english [audio.mp3]
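The file-based examples above can be applied to several files at once. The following is a minimal sketch, not part of vosk itself: `transcribe_cmd` is a hypothetical helper name, the file names are placeholders, and it assumes each input name ends in an extension. It only prints the commands (a dry run), so nothing is transcribed until you run the printed lines yourself.

```shell
# Hypothetical dry-run helper: prints a vosk-transcriber command for one input.
# $1 = input file, $2 = output format (defaults to txt).
transcribe_cmd() {
  input=$1
  fmt=${2:-txt}
  # Strip the last extension and append the output format instead.
  output="${input%.*}.${fmt}"
  printf 'vosk-transcriber -i "%s" -t %s -o "%s"\n' "$input" "$fmt" "$output"
}

# Placeholder file names; replace with your own.
for f in talk.mp4 interview.m4a; do
  transcribe_cmd "$f" srt
done
```

Printing first makes it easy to check the derived output names before any long-running transcription starts.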
Description
vosk is an offline speech recognition toolkit supporting 20+ languages including English, German, French, Spanish, Chinese, Russian, and many others. It provides automatic speech recognition without requiring internet connectivity. The vosk-transcriber CLI processes audio and video files, automatically downloading appropriate language models on first use. Output formats include plain text and subtitle formats (SRT, VTT) with timestamps. The toolkit can also run as a WebSocket server for real-time streaming recognition. Models range from small (50MB) for mobile/embedded use to large models for maximum accuracy. The underlying engine supports continuous transcription, speaker identification, and vocabulary customization.
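To illustrate what the timestamps in subtitle output look like, SRT cue times use the form `HH:MM:SS,mmm`. The sketch below is only an illustration of that format, not code taken from vosk; `srt_time` is a hypothetical name and it assumes the time is given as a whole number of milliseconds.

```shell
# Hypothetical helper: format a millisecond count as an SRT timestamp.
srt_time() {
  ms=$1
  printf '%02d:%02d:%02d,%03d\n' \
    $((ms / 3600000)) \
    $((ms % 3600000 / 60000)) \
    $((ms % 60000 / 1000)) \
    $((ms % 1000))
}

srt_time 3723004   # prints 01:02:03,004
```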
Parameters
- -i _file_
- Input audio or video file.
- -o _file_
- Output file for transcription.
- -t _format_
- Output format: txt, srt, vtt.
- -l _lang_
- Language code (en, fr, de, es, ru, etc.).
- --list-languages
- Show available language models.
- --model _path_
- Use custom model directory.
- --show-words
- Include word-level timestamps.
- --server
- Start WebSocket recognition server.
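When the `-t` value comes from a script or user input, it can help to check it before starting a transcription. A minimal sketch, assuming the format values are exactly those listed on this page (txt, srt, vtt); `check_format` is a hypothetical helper, not a vosk command, and your installed version may accept a different set.

```shell
# Hypothetical guard: succeed only for the format values listed on this page.
check_format() {
  case "$1" in
    txt|srt|vtt) return 0 ;;
    *) echo "unsupported format: $1" >&2; return 1 ;;
  esac
}

fmt=srt
if check_format "$fmt"; then
  echo "format ok: $fmt"
fi
```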
FAQ
What is the vosk command used for?
vosk is an offline speech recognition toolkit: it converts speech in audio or video files, or from a microphone, into text without an internet connection. Its vosk-transcriber CLI writes plain text or timestamped subtitles (SRT, VTT) and supports 20+ languages.
How do I run a basic vosk example?
Run `vosk-transcriber -i [audio.mp4] -o [output.txt]` in a terminal, replacing the bracketed placeholders with your own input and output file names.
What does -i _file_ do in vosk-transcriber?
It specifies the input audio or video file to transcribe.