deepspeech command
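After copying a command, replace the bracketed placeholders with your own file names, directories, or values as needed.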
Common examples
Transcribe an audio file
deepspeech --model [model.pbmm] --audio [audio.wav]
Transcribe with an external scorer
deepspeech --model [model.pbmm] --scorer [scorer.scorer] --audio [audio.wav]
Transcribe with extended output
deepspeech --model [model.pbmm] --audio [audio.wav] --extended
Transcribe using a TFLite model
deepspeech --model [model.tflite] --audio [audio.wav]
Set beam width
deepspeech --model [model.pbmm] --audio [audio.wav] --beam_width [500]
Description
DeepSpeech is an open-source speech-to-text engine based on deep learning. It uses an end-to-end neural network architecture to convert audio into text transcriptions. The system requires a trained model and optionally an external scorer (language model) for improved accuracy. Pre-trained English models are available, and the toolkit supports training custom models for other languages or domains. Audio input must be 16kHz, 16-bit, mono WAV format. The tool supports both batch transcription of files and real-time streaming transcription through its API.
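Because the model expects 16 kHz, 16-bit, mono WAV input, it can help to check a file before passing it to `--audio`. The following is a minimal sketch that uses only Python's standard `wave` module. `check_wav_format` is a hypothetical helper and is not part of DeepSpeech.

```python
import wave

# Input format required by the models, per the description above.
EXPECTED_RATE = 16000      # samples per second (16 kHz)
EXPECTED_SAMPLE_WIDTH = 2  # bytes per sample (16-bit)
EXPECTED_CHANNELS = 1      # mono


def check_wav_format(source):
    """Return a list of format mismatches; an empty list means the file matches.

    `source` may be a path or a binary file-like object. Hypothetical helper,
    not part of DeepSpeech. wave.open raises wave.Error for files it cannot
    read, such as compressed WAV.
    """
    problems = []
    with wave.open(source, "rb") as wav:
        if wav.getframerate() != EXPECTED_RATE:
            problems.append(
                f"sample rate is {wav.getframerate()} Hz, expected {EXPECTED_RATE} Hz")
        if wav.getsampwidth() != EXPECTED_SAMPLE_WIDTH:
            problems.append(
                f"sample width is {wav.getsampwidth() * 8} bits, expected 16 bits")
        if wav.getnchannels() != EXPECTED_CHANNELS:
            problems.append(
                f"file has {wav.getnchannels()} channels, expected 1 (mono)")
    return problems
```

For example, `check_wav_format("audio.wav")` returns an empty list for a file that already has the expected format. Each entry in a non-empty list describes one property that needs converting.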
Options
- --model _file_
- Path to the model file (.pbmm or .tflite).
- --scorer _file_
- Path to the external scorer (language model) package.
- --audio _file_
- Audio file to transcribe (16kHz, 16-bit, mono WAV).
- --extended
- Output word timing and confidence.
- --json
- Output results as JSON.
- --beam_width _n_
- Beam width for the CTC decoder. Larger values keep more hypotheses during the search, at the cost of slower decoding.
- --lm_alpha _value_
- Language model weight. If not specified, uses default from the scorer package.
- --lm_beta _value_
- Word insertion bonus. If not specified, uses default from the scorer package.
- --candidate_transcripts _n_
- Number of candidate transcripts to include in JSON output (default: 3).
- --hot_words _words_
- Hot-words and their probability boosts, given as comma-separated `word:boost` pairs.
- --version
- Print version and exit.
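To drive the CLI from a script, for example to batch-transcribe many files, the options above can be assembled into an argument list and passed to `subprocess.run`. The following is a minimal sketch. `build_deepspeech_args` and `transcribe` are hypothetical helpers, and only flags from the list above are used. It assumes that the `deepspeech` executable is on `PATH` and that it prints the transcript to standard output.

```python
import subprocess


def build_deepspeech_args(model, audio, scorer=None, beam_width=None,
                          extended=False, json_output=False):
    """Assemble a deepspeech command line from the options listed above."""
    args = ["deepspeech", "--model", model, "--audio", audio]
    if scorer is not None:
        args += ["--scorer", scorer]
    if beam_width is not None:
        args += ["--beam_width", str(beam_width)]
    if extended:
        args.append("--extended")
    if json_output:
        args.append("--json")
    return args


def transcribe(model, audio, **options):
    """Run deepspeech on one file and return its standard output, stripped.

    Assumes the `deepspeech` executable is on PATH; raises
    subprocess.CalledProcessError if it exits with a non-zero status.
    """
    result = subprocess.run(
        build_deepspeech_args(model, audio, **options),
        capture_output=True,  # stdout and stderr are captured separately
        text=True,
        check=True,
    )
    return result.stdout.strip()
```

For batch transcription, call `transcribe` once per file, for instance over `Path("recordings").glob("*.wav")` from `pathlib`. Passing a list rather than a single shell string avoids quoting problems with paths that contain spaces.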
FAQ
What is the deepspeech command used for?
It transcribes speech in audio files to text using a trained DeepSpeech model. You can optionally add an external scorer (language model) to improve accuracy. Input audio must be 16 kHz, 16-bit, mono WAV.
How do I run a basic deepspeech example?
Run `deepspeech --model [model.pbmm] --audio [audio.wav]` in a terminal. Replace the bracketed placeholders with the path to your model file and the path to a 16 kHz, 16-bit, mono WAV file.
What does --model _file_ do in deepspeech?
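It specifies the path to the trained model file. This is either a memory-mapped TensorFlow model (.pbmm) or a TensorFlow Lite model (.tflite).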