faster-whisper Command: Examples, Options, and Usage

Common Examples

Transcribe an audio file

faster-whisper [audio.mp3]

Transcribe with a specific model

faster-whisper [audio.mp3] --model [large-v3]

Transcribe with a language hint

faster-whisper [audio.mp3] --language [en]

Output as SRT subtitles

faster-whisper [audio.mp3] --output_format srt
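
The `srt` output format is the simplest route. If you use the Python API instead (installed with `pip install faster-whisper`), you can build SRT text yourself from the segments that `WhisperModel.transcribe()` yields. This is a minimal sketch that assumes only that each segment exposes `start` and `end` in seconds and a `text` string. The function names `srt_timestamp` and `segments_to_srt` are hypothetical helpers, not part of the library.

```python
from typing import Any, Iterable


def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, millis = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def segments_to_srt(segments: Iterable[Any]) -> str:
    """Render segments (objects with .start, .end, .text) as numbered SRT cues."""
    cues = []
    for index, seg in enumerate(segments, start=1):
        cues.append(
            f"{index}\n"
            f"{srt_timestamp(seg.start)} --> {srt_timestamp(seg.end)}\n"
            f"{seg.text.strip()}\n"
        )
    # A blank line separates consecutive cues.
    return "\n".join(cues)
```

With the library installed, you would pass the `segments` returned by `model.transcribe("audio.mp3")` straight to `segments_to_srt`. Because `segments` is a generator, transcription only runs while it is being consumed.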

Translate to English

faster-whisper [audio.mp3] --task translate

Save output to a directory

faster-whisper [audio.mp3] --output_dir [/path/to/output]

Transcribe with word timestamps

faster-whisper [audio.mp3] --word_timestamps true
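
In the Python API, the equivalent option is `word_timestamps=True` on `WhisperModel.transcribe()`. Each segment then carries a `words` list whose items have `start`, `end`, `word`, and `probability` fields. The hypothetical helper below flattens those into `(start, end, word)` rows, for example to write them to a CSV file. It is a sketch that has been checked here against stand-in objects rather than real model output.

```python
from typing import Any, Iterable, Iterator, Tuple


def word_rows(segments: Iterable[Any]) -> Iterator[Tuple[float, float, str]]:
    """Yield (start, end, word) for every word in every segment.

    Assumes the segments were produced with word_timestamps=True;
    segments whose words attribute is None are skipped.
    """
    for seg in segments:
        for w in seg.words or []:
            # Word text often includes a leading space; strip it for clean output.
            yield (w.start, w.end, w.word.strip())
```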

Description

faster-whisper is a reimplementation of OpenAI's Whisper using CTranslate2, a fast inference engine for Transformer models. It transcribes up to 4x faster than the original Whisper at the same accuracy while using less memory. It supports all Whisper model sizes; larger models are more accurate but slower. The compute type controls numeric precision: int8 is typically the fastest and most memory-efficient, float16 is a good balance on GPU, and float32 gives the highest precision. Voice activity detection (VAD) filtering skips silent sections, which usually improves both speed and accuracy. Language detection is automatic, but specifying the language skips the detection step.

The base library, installed with `pip install faster-whisper`, provides only a Python API. For command-line use, install a wrapper such as `pip install faster-whisper-cli` or `pip install whisper-ctranslate2`. The executable name depends on the wrapper (whisper-ctranslate2, for example, installs a `whisper-ctranslate2` command), so substitute it for `faster-whisper` in the examples if needed. Models already converted to the CTranslate2 format are downloaded automatically from the Hugging Face Hub on first use, so no manual conversion is required. GPU acceleration requires NVIDIA's CUDA libraries (cuBLAS and cuDNN).
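
Because the base package exposes only a Python API, the CLI examples map onto `WhisperModel` and its `transcribe()` method. The sketch below assumes a recent faster-whisper release. `pick_compute_type` and `transcribe` are hypothetical helpers: the first encodes the rule of thumb above (float16 on a CUDA GPU, int8 otherwise), and the library is imported inside the second so the module loads even where faster-whisper is not installed.

```python
from typing import Optional


def pick_compute_type(device: str) -> str:
    """Rule of thumb: float16 on a CUDA GPU, int8 everywhere else."""
    return "float16" if device == "cuda" else "int8"


def transcribe(path: str, model_size: str = "small", device: str = "cpu",
               language: Optional[str] = None, task: str = "transcribe") -> str:
    """Transcribe (or translate) one file and return the joined text."""
    # Lazy import: only needed when this function is actually called.
    from faster_whisper import WhisperModel

    model = WhisperModel(model_size, device=device,
                         compute_type=pick_compute_type(device))
    segments, info = model.transcribe(
        path,
        language=language,  # None lets the model auto-detect the language
        task=task,          # "transcribe" or "translate" (into English)
        beam_size=5,
        vad_filter=True,    # skip silent sections with Silero VAD
    )
    print(f"Detected language: {info.language} ({info.language_probability:.2f})")
    # segments is a generator: decoding happens while this join iterates it.
    return "".join(seg.text for seg in segments).strip()
```

For example, `transcribe("audio.mp3", model_size="large-v3", device="cuda", language="en")` roughly corresponds to combining the `--model`, `--device`, and `--language` examples.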

Options

--model _SIZE_: Model size: tiny, base, small, medium, large-v1, large-v2, large-v3 (default: small).
--language _LANG_: Language code (en, de, fr, etc.); omit to auto-detect.
--task _TASK_: Task: transcribe (default) or translate (into English).
--output_format _FORMAT_: Output format: txt, vtt, srt, tsv, json, all.
--output_dir _DIR_: Output directory for results.
--word_timestamps _BOOL_: Include word-level timestamps (true or false).
--device _DEVICE_: Device: cpu, cuda, auto (default: auto).
--compute_type _TYPE_: Compute type: int8, float16, float32 (default: int8 on CPU).
--beam_size _N_: Beam search size (default: 5).
--vad_filter _BOOL_: Enable voice activity detection filter (uses Silero VAD).
--initial_prompt _TEXT_: Optional text to provide as initial prompt for the decoder.
--threads _N_: Number of CPU threads.
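
When you drive the CLI from a script, it can help to build the argument list from the options above and pass it to `subprocess.run`. The helper below is a hypothetical sketch. It only emits the flags listed in this section, assumes the executable is named `faster-whisper` as in the examples (the real name depends on the wrapper you installed), and does not check values against what your wrapper accepts.

```python
import subprocess
from typing import List

# Options documented above; each key doubles as the CLI flag name.
KNOWN_OPTIONS = {
    "model", "language", "task", "output_format", "output_dir",
    "word_timestamps", "device", "compute_type", "beam_size",
    "vad_filter", "initial_prompt", "threads",
}


def build_argv(audio: str, executable: str = "faster-whisper", **options) -> List[str]:
    """Return an argv list such as ['faster-whisper', 'a.mp3', '--model', 'small']."""
    argv = [executable, audio]
    for name, value in options.items():
        if name not in KNOWN_OPTIONS:
            raise ValueError(f"unknown option: {name}")
        if value is None:
            continue  # leave unset options to the tool's defaults
        if isinstance(value, bool):
            value = "true" if value else "false"  # booleans passed as text, as in the examples
        argv += [f"--{name}", str(value)]
    return argv


def run(audio: str, **options) -> None:
    """Run the CLI; raises CalledProcessError on a non-zero exit status."""
    subprocess.run(build_argv(audio, **options), check=True)
```

Keyword arguments keep their order, so `build_argv("audio.mp3", model="large-v3", language="en")` yields the same argument list as the corresponding examples, which is easy to log or compare.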

FAQ

What is the faster-whisper command used for?

It transcribes speech in audio files to text or subtitles (txt, vtt, srt, tsv, or json), or translates it into English. Under the hood, faster-whisper reimplements OpenAI's Whisper on the CTranslate2 inference engine, running up to 4x faster than the original while using less memory.

How do I run a basic faster-whisper example?

Run `faster-whisper [audio.mp3]` in a terminal, replacing `[audio.mp3]` with the path to your audio file and adding any of the flags listed above as needed.
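
To apply the basic example to many files, a short script can loop over a folder and issue the same command for each one. This sketch assumes the `faster-whisper` executable name used in the examples and a hypothetical set of audio extensions. With `dry_run=True` (the default) it only prints and returns the commands, so you can check them before anything runs.

```python
import shlex
import subprocess
from pathlib import Path
from typing import List

AUDIO_SUFFIXES = {".mp3", ".wav", ".m4a", ".flac"}  # adjust to your files


def transcribe_folder(folder: str, *extra_args: str, dry_run: bool = True) -> List[List[str]]:
    """Build (and optionally run) `faster-whisper <file> [extra args]` per audio file."""
    commands = [
        ["faster-whisper", str(path), *extra_args]
        for path in sorted(Path(folder).iterdir())
        if path.is_file() and path.suffix.lower() in AUDIO_SUFFIXES
    ]
    for cmd in commands:
        print(shlex.join(cmd))  # shell-quoted, safe to paste into a terminal
        if not dry_run:
            subprocess.run(cmd, check=True)
    return commands
```

For example, `transcribe_folder("recordings", "--output_format", "srt", dry_run=False)` would write subtitles for every matching file in `recordings`.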

What does --model _SIZE_ do in faster-whisper?

It selects which Whisper model to load: tiny, base, small, medium, large-v1, large-v2, or large-v3 (default: small). Larger models are more accurate but slower and need more memory.
