Linux command
whisper command
Copy an example and replace the file names, directories, or arguments as needed.
Common examples
Transcribe audio file
whisper [audio.mp3]
Transcribe with specific model
whisper --model [medium] [audio.mp3]
Transcribe with language hint
whisper --language [en] [audio.mp3]
Output specific format
whisper --output_format [srt] [audio.mp3]
Translate to English (requires a multilingual model; turbo does not translate)
whisper --model [medium] --task translate [audio.mp3]
Output to specific directory
whisper --output_dir [/path/to/output] [audio.mp3]
Transcribe multiple files
whisper [audio1.mp3] [audio2.wav]
Use GPU with float16
whisper --device cuda --fp16 True [audio.mp3]
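The examples above run one command at a time. As a sketch, assuming a POSIX shell and recordings that use one of the listed extensions (an assumption; adjust the list as needed), the hypothetical function transcribe_dir below collects the audio files in a directory and passes them to a single whisper call. Because whisper accepts several input files, the model is loaded once instead of once per file.

```sh
# Hypothetical helper (not part of whisper): transcribe every audio file
# in a directory to SRT subtitles with a single whisper invocation.
transcribe_dir() {
  in_dir=$1                 # directory containing the recordings
  out_dir=${2:-$in_dir}     # where whisper writes its output files
  model=${3:-turbo}         # whisper model name (CLI default: turbo)

  set --                    # reuse the positional parameters as the file list
  for f in "$in_dir"/*; do
    [ -f "$f" ] || continue # skip subdirectories and an unmatched glob
    case $f in
      # Assumed set of audio extensions; extend as needed.
      *.mp3|*.wav|*.m4a|*.flac|*.ogg) set -- "$@" "$f" ;;
    esac
  done

  if [ "$#" -eq 0 ]; then
    echo "no audio files found in $in_dir" >&2
    return 1
  fi

  # One call for all files, so the model is loaded only once.
  whisper --model "$model" --output_format srt --output_dir "$out_dir" "$@"
}
```

For example, `transcribe_dir ./recordings ./subtitles medium` writes one .srt file per recording into ./subtitles.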
Description
Whisper is OpenAI's automatic speech recognition (ASR) system. It transcribes speech in many languages and can translate it into English.
Model sizes trade accuracy for speed: tiny is the fastest and large the most accurate. The default turbo model offers a good balance, running about 8x faster than large with only a small loss in accuracy. The .en suffix (tiny.en, base.en, and so on) marks English-only models, which tend to do slightly better on English audio. The turbo model is not trained for translation: with --task translate it returns text in the original language, so use a multilingual model (tiny, base, small, medium, large) to translate.
The spoken language is detected automatically, but you can set it with --language; for non-English audio, this can improve accuracy. Translation mode (--task translate) turns speech in any supported language into English text.
Output formats include plain text, subtitles (SRT, VTT), TSV, and JSON with timing data. Word-level timestamps make karaoke-style highlighting possible.
Whisper runs on a CUDA GPU when one is available, which is much faster than the CPU. With --fp16 True (the default), GPU inference uses half precision; on CPU, Whisper falls back to float32. Input audio is decoded with FFmpeg, so any format FFmpeg can read is accepted. Long recordings are processed in 30-second windows, and by default each window uses the text decoded from the previous one as context.
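Because turbo is not trained for translation, a script that supports both tasks has to swap models when translating. The hypothetical helper pick_model below sketches one way to do that; falling back to medium is an arbitrary choice, and any multilingual model (tiny, base, small, medium, large) would work. It assumes turbo and large-v3-turbo name the same model.

```sh
# Hypothetical helper (not part of whisper): choose a model name for a task.
# turbo cannot translate, so translation falls back to a multilingual model.
pick_model() {
  task=$1             # transcribe | translate
  model=${2:-turbo}   # requested model (CLI default: turbo)

  case $task:$model in
    translate:turbo|translate:large-v3-turbo)
      echo medium ;;  # assumed fallback; any multilingual model works
    *)
      echo "$model" ;;
  esac
}
```

Usage: `whisper --model "$(pick_model translate)" --task translate audio.mp3`.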
Parameters
- --model _SIZE_
- Model size: tiny, base, small, medium, large, turbo (default: turbo). English-only variants: tiny.en, base.en, small.en, medium.en.
- --language _LANG_
- Language spoken in the audio, as a code (en, de, fr, etc.) or a language name. Omit it to detect the language automatically.
- --task _TASK_
- Task: transcribe or translate.
- --output_format _FORMAT_
- Output format: txt, vtt, srt, tsv, json, or all (default: all).
- --output_dir _DIR_
- Directory to write output files to (default: current directory).
- --device _DEVICE_
- PyTorch device, such as cpu or cuda (default: cuda if available, otherwise cpu).
- --fp16 _BOOL_
- Run inference in float16 (default: True); set False to use float32. On CPU, Whisper always uses float32.
- --temperature _TEMP_
- Sampling temperature (default: 0).
- --best_of _NUM_
- Number of candidates to sample when the temperature is non-zero (default: 5).
- --beam_size _NUM_
- Number of beams in beam search, used only when the temperature is zero (default: 5).
- --word_timestamps _BOOL_
- Extract word-level timestamps (experimental; default: False).
- --condition_on_previous_text _BOOL_
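- Use the previous window's output as a prompt for the next window (default: True). Setting False makes the model less prone to repetition loops, but the text may be less consistent across windows.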
- --verbose _BOOL_
- Print progress and the decoded text (default: True).
- --threads _NUM_
- Number of threads PyTorch uses for CPU inference (default: 0, which keeps PyTorch's default).
- --model_dir _DIR_
- Directory to save and load models (default: ~/.cache/whisper).
- --initial_prompt _TEXT_
- Optional text to use as a prompt for the first window.
- --clip_timestamps _TIMESTAMPS_
- Comma-separated start,end,start,end,... timestamps, in seconds, of the clips to process; the last end defaults to the end of the file (default: 0, the whole file).
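--clip_timestamps expects seconds in a flat start,end,start,end list. The hypothetical helpers below (to_seconds, clip_arg) sketch a conversion from human-readable ranges such as 1:30-2:00 into that form. They assume awk is available and that each range is written as START-END, with each timestamp given as SS, MM:SS, or HH:MM:SS.

```sh
# Hypothetical helper: convert SS, MM:SS or HH:MM:SS to seconds.
# awk parses each field as a decimal number, so "08" is read as 8.
to_seconds() {
  echo "$1" | awk -F: 'BEGIN { OFMT = "%.3f" }
    { t = 0; for (i = 1; i <= NF; i++) t = t * 60 + $i; print t }'
}

# Hypothetical helper: turn START-END ranges into a --clip_timestamps value.
clip_arg() {
  out=
  for range in "$@"; do
    start=$(to_seconds "${range%-*}")   # text before the last "-"
    end=$(to_seconds "${range#*-}")     # text after the first "-"
    out=${out:+$out,}$start,$end        # append, comma-separated
  done
  echo "$out"
}
```

Usage: `whisper --clip_timestamps "$(clip_arg 1:30-2:00 5:00-5:45)" audio.mp3` processes only those two clips, passing `90,120,300,345` to whisper.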
FAQ
What is the whisper command used for?
Whisper transcribes speech in audio files to text in many languages and can translate that speech into English. It writes the result as plain text, subtitles (SRT, VTT), TSV, or JSON.
How do I run a basic whisper example?
Run `whisper [audio.mp3]` in a terminal, replacing audio.mp3 with the path to your own file. By default, the transcript is written to the current directory in every supported format (txt, vtt, srt, tsv, json).
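By default, whisper writes every output format (txt, vtt, srt, tsv, json) into the current directory, naming each file after the input with its extension replaced. The hypothetical helper expected_outputs below mirrors that naming rule, which can be handy for checking results in scripts; it is a sketch, not part of whisper.

```sh
# Hypothetical helper: print the paths whisper should write for one input,
# following the rule <output_dir>/<basename without extension>.<format>.
expected_outputs() {
  audio=$1
  out_dir=${2:-.}       # whisper's default output directory
  format=${3:-all}      # whisper's default output format

  stem=$(basename "$audio")
  stem=${stem%.*}       # drop only the last extension

  if [ "$format" = all ]; then
    format="txt vtt srt tsv json"
  fi
  for ext in $format; do
    echo "$out_dir/$stem.$ext"
  done
}
```

For example, `expected_outputs lecture.m4a transcripts srt` prints `transcripts/lecture.srt`.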
What does --model _SIZE_ do in whisper?
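It selects which model to load: tiny, base, small, medium, large, or turbo (the default). Smaller models run faster but are less accurate; the English-only variants (tiny.en, base.en, small.en, medium.en) can do slightly better on English audio.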