
Linux command

vosk command


After copying a command, replace the file names, directories, or arguments as needed.

Common examples

Transcribe audio to text file

vosk-transcriber -i [audio.mp4] -o [output.txt]

Generate SRT subtitles

vosk-transcriber -i [video.mp4] -t srt -o [subtitles.srt]

Transcribe in specific language

vosk-transcriber -l [fr] -i [audio.m4a] -o [output.txt]

List available languages

vosk-transcriber --list-languages

Transcribe from microphone (vosk-cli)

vosk mic -o [output.txt] -t english

Transcribe audio file (vosk-cli)

vosk rec -c english [audio.mp3]
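To apply the same command to a whole folder, a small Python driver can call vosk-transcriber once per file. This is a sketch, assuming vosk-transcriber is installed and on PATH, and using only the -l, -i, -t and -o flags from the examples; the helper names build_command and transcribe_folder are hypothetical.

```python
import subprocess
from pathlib import Path
from typing import List, Optional


def build_command(src: Path, out_dir: Path, fmt: str = "srt",
                  lang: Optional[str] = None) -> List[str]:
    """Build one vosk-transcriber call from the -l, -i, -t and -o flags."""
    # The output keeps the input's base name, with the format as its extension.
    out_file = out_dir / src.with_suffix("." + fmt).name
    cmd = ["vosk-transcriber"]
    if lang is not None:
        cmd += ["-l", lang]
    cmd += ["-i", str(src), "-t", fmt, "-o", str(out_file)]
    return cmd


def transcribe_folder(in_dir: Path, out_dir: Path, pattern: str = "*.mp4",
                      fmt: str = "srt", lang: Optional[str] = None) -> None:
    """Run vosk-transcriber once per matching file, stopping at the first failure."""
    out_dir.mkdir(parents=True, exist_ok=True)
    for src in sorted(in_dir.glob(pattern)):
        # check=True raises CalledProcessError if the transcriber exits non-zero.
        subprocess.run(build_command(src, out_dir, fmt, lang), check=True)
```

For example, `transcribe_folder(Path("videos"), Path("subtitles"), lang="fr")` would write one .srt file per video; the folder names are placeholders. Passing an argument list to subprocess.run (rather than a shell string) avoids quoting problems with file names that contain spaces.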

Description

vosk is an offline speech recognition toolkit supporting 20+ languages including English, German, French, Spanish, Chinese, Russian, and many others. It provides automatic speech recognition without requiring internet connectivity. The vosk-transcriber CLI processes audio and video files, automatically downloading appropriate language models on first use. Output formats include plain text and subtitle formats (SRT, VTT) with timestamps. The toolkit can also run as a WebSocket server for real-time streaming recognition. Models range from small (50MB) for mobile/embedded use to large models for maximum accuracy. The underlying engine supports continuous transcription, speaker identification, and vocabulary customization.
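Timestamped subtitles are built from per-word timings. The sketch below is not vosk-transcriber's own code; it assumes Vosk's JSON results with word timings enabled (as returned by KaldiRecognizer after SetWords(True)): a "result" list of objects carrying "word", "start" and "end" in seconds. It groups words into fixed-size cues; the function names and the seven-words-per-cue default are illustrative choices.

```python
import json
from typing import Iterable


def srt_time(seconds: float) -> str:
    """Format seconds as an SRT timestamp HH:MM:SS,mmm."""
    ms = int(round(seconds * 1000))
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def words_to_srt(results: Iterable[str], words_per_cue: int = 7) -> str:
    """Turn Vosk-style JSON result strings (with word timings) into SRT text."""
    words = []
    for raw in results:
        # Results with no recognized speech carry only {"text": ""}.
        words.extend(json.loads(raw).get("result", []))
    cues = []
    for i in range(0, len(words), words_per_cue):
        chunk = words[i:i + words_per_cue]
        start, end = chunk[0]["start"], chunk[-1]["end"]
        text = " ".join(w["word"] for w in chunk)
        cues.append(f"{len(cues) + 1}\n{srt_time(start)} --> {srt_time(end)}\n{text}\n")
    # A blank line separates consecutive SRT cues.
    return "\n".join(cues)
```

Each cue spans from the first word's start to the last word's end, so shorter cues follow the speech more closely at the cost of more subtitle entries.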

Parameters

-i _file_
Input audio or video file.
-o _file_
Output file for transcription.
-t _format_
Output format: txt, srt, vtt.
-l _lang_
Language code of the model to use (for example en-us, fr, de, es, ru); run --list-languages to see the codes your version accepts.
--list-languages
Show available language models.
--model _path_
Use custom model directory.
--show-words
Include word-level timestamps.
--server _url_
Send audio to a running Vosk WebSocket recognition server (for example ws://localhost:2700) instead of recognizing locally.
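SRT and WebVTT subtitles differ mainly in the "WEBVTT" header and the separator before milliseconds (comma in SRT, period in VTT), so a file produced with `-t srt` can also be turned into VTT with a few lines of Python. This is a minimal sketch, assuming well-formed SRT input; the function name srt_to_vtt is hypothetical.

```python
import re

# Matches SRT timestamps such as 00:01:02,345 (comma before milliseconds).
_SRT_TIME = re.compile(r"(\d{2}:\d{2}:\d{2}),(\d{3})")


def srt_to_vtt(srt_text: str) -> str:
    """Convert SRT text to WebVTT: add the header and use '.' before milliseconds."""
    body = srt_text.replace("\r\n", "\n").strip()
    lines = []
    for line in body.split("\n"):
        # Only timing lines are rewritten; subtitle text is left untouched.
        if "-->" in line:
            line = _SRT_TIME.sub(r"\1.\2", line)
        lines.append(line)
    # SRT cue numbers remain valid as WebVTT cue identifiers.
    return "WEBVTT\n\n" + "\n".join(lines) + "\n"
```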

FAQ

What is the vosk command used for?

vosk transcribes speech from audio and video files into plain text or timestamped subtitles (SRT, VTT) entirely offline, using language models that are downloaded on first use.

How do I run a basic vosk example?

Run `vosk-transcriber -i [audio.mp4] -o [output.txt]` in a terminal, replacing the bracketed file names with your own. The first run for a language downloads its model, so it needs network access once.

What does -i _file_ do in vosk?

It specifies the input audio or video file to transcribe.