vosk Command: Examples, Options, and Usage

Common Examples

Transcribe audio to text file

vosk-transcriber -i [audio.mp4] -o [output.txt]

Generate SRT subtitles

vosk-transcriber -i [video.mp4] -t srt -o [subtitles.srt]

Transcribe in specific language

vosk-transcriber -l [fr] -i [audio.m4a] -o [output.txt]

List available languages

vosk-transcriber --list-languages

Transcribe from microphone (vosk-cli)

vosk mic -o [output.txt] -t english

Transcribe audio file (vosk-cli)

vosk rec -c english [audio.mp3]
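
The file-based examples above can be wrapped in a loop to process a whole directory. The following is a minimal sketch, not part of vosk itself: `transcribe_dir` and the `VOSK_BIN` variable are hypothetical names, and only the `-i`, `-t`, and `-o` flags shown above are used. Setting `VOSK_BIN` to `echo` prints the commands instead of running them.

```shell
# Hypothetical helper: write one SRT file next to each .mp4 in a directory.
# VOSK_BIN is an assumed override so the commands can be previewed with echo.
transcribe_dir() {
    dir=$1
    for input in "$dir"/*.mp4; do
        # Skip the unexpanded pattern when the directory has no .mp4 files.
        [ -e "$input" ] || continue
        output="${input%.mp4}.srt"
        "${VOSK_BIN:-vosk-transcriber}" -i "$input" -t srt -o "$output"
    done
}
```

For a dry run, set `VOSK_BIN=echo` and call `transcribe_dir ./videos`; unset the variable to run the real transcriber.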

Description

vosk is an offline speech recognition toolkit supporting 20+ languages including English, German, French, Spanish, Chinese, Russian, and many others. It provides automatic speech recognition without requiring internet connectivity.

The vosk-transcriber CLI processes audio and video files, automatically downloading appropriate language models on first use. Output formats include plain text and subtitle formats (SRT, VTT) with timestamps.

The toolkit can also run as a WebSocket server for real-time streaming recognition. Models range from small (50MB) for mobile/embedded use to large models for maximum accuracy. The underlying engine supports continuous transcription, speaker identification, and vocabulary customization.
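
SRT subtitles mark each cue with start and end times written as `HH:MM:SS,mmm`. As an illustration of that layout only (this is not vosk's own code, and `srt_time` is a hypothetical helper that takes a whole number of milliseconds), a small shell function can format one timestamp:

```shell
# Hypothetical helper: format whole milliseconds as an SRT timestamp.
srt_time() {
    ms=$1
    printf '%02d:%02d:%02d,%03d\n' \
        $((ms / 3600000)) \
        $((ms / 60000 % 60)) \
        $((ms / 1000 % 60)) \
        $((ms % 1000))
}
```

For example, `srt_time 3723004` prints `01:02:03,004`.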

Options

-i _file_: Input audio or video file.
-o _file_: Output file for transcription.
-t _format_: Output format: txt, srt, vtt.
-l _lang_: Language code (en, fr, de, es, ru, etc.).
--list-languages: Show available language models.
--model _path_: Use custom model directory.
--show-words: Include word-level timestamps.
--server: Use a running WebSocket recognition server for transcription instead of recognizing locally.
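
Because `-t` selects the output format separately from the `-o` file name, a wrapper script may want to derive one from the other. A minimal sketch, assuming the three format names listed above (`txt`, `srt`, `vtt`); `format_for` is a hypothetical helper, not a vosk feature:

```shell
# Hypothetical helper: pick a -t value from the output file's extension.
# Falls back to txt for any extension other than .srt or .vtt.
format_for() {
    case $1 in
        *.srt) echo srt ;;
        *.vtt) echo vtt ;;
        *)     echo txt ;;
    esac
}
```

It could then be used as `vosk-transcriber -i talk.mp4 -t "$(format_for talk.srt)" -o talk.srt`, which keeps the format and the file extension in agreement.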

FAQ

What is the vosk command used for?

vosk is an offline speech recognition toolkit. Its vosk-transcriber CLI converts audio and video files to plain text or subtitle files (SRT, VTT), and it needs no internet connectivity once the language model has been downloaded on first use.

How do I run a basic vosk example?

Run `vosk-transcriber -i [audio.mp4] -o [output.txt]` in a terminal, replacing the bracketed placeholders with your own input and output file names.

What does -i _file_ do in vosk-transcriber?

It specifies the input audio or video file to transcribe.
