deepspeech Command: Examples, Options, and Usage

Common Examples

Transcribe an audio file

deepspeech --model [model.pbmm] --audio [audio.wav]

Transcribe with an external scorer

deepspeech --model [model.pbmm] --scorer [scorer.scorer] --audio [audio.wav]

Transcribe with extended output

deepspeech --model [model.pbmm] --audio [audio.wav] --extended

Transcribe using a TFLite model

deepspeech --model [model.tflite] --audio [audio.wav]

Set beam width

deepspeech --model [model.pbmm] --audio [audio.wav] --beam_width [500]

Description

DeepSpeech is an open-source speech-to-text engine based on deep learning. It uses an end-to-end neural network to convert audio into text transcriptions. It requires a trained model and can optionally use an external scorer (language model) for improved accuracy. Pre-trained English models are available, and the toolkit supports training custom models for other languages or domains. Audio input must be in 16kHz, 16-bit, mono WAV format. The command-line tool transcribes audio files; real-time streaming transcription is available through the API.
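The format requirement above (16kHz, 16-bit, mono WAV) can be checked before a file is passed to `--audio`. The following is a minimal sketch using only Python's standard `wave` module; the function name `check_wav` is hypothetical and is not part of DeepSpeech.

```python
import wave


def check_wav(path):
    """Return a list of reasons the file is not 16 kHz, 16-bit, mono WAV.

    An empty list means the file matches the format described above.
    `check_wav` is an illustrative helper, not a DeepSpeech function.
    """
    problems = []
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        width_bytes = wav.getsampwidth()
        channels = wav.getnchannels()
    if rate != 16000:
        problems.append(f"sample rate is {rate} Hz, expected 16000 Hz")
    if width_bytes != 2:
        problems.append(f"sample width is {width_bytes * 8}-bit, expected 16-bit")
    if channels != 1:
        problems.append(f"{channels} channels, expected 1 (mono)")
    return problems
```

A file that returns a non-empty list would need to be converted with an audio tool of your choice before transcription.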

Parameters

--model _file_: Path to the model file (.pbmm or .tflite).
--scorer _file_: Path to external scorer/language model.
--audio _file_: Audio file to transcribe (16kHz, 16-bit, mono WAV).
--extended: Output the transcript string built from extended metadata.
--json: Output results as JSON built from metadata, including a timestamp for each word.
--beam_width _n_: Beam width for the CTC decoder.
--lm_alpha _value_: Language model weight. If not specified, uses default from the scorer package.
--lm_beta _value_: Word insertion bonus. If not specified, uses default from the scorer package.
--candidate_transcripts _n_: Number of candidate transcripts to include in JSON output (default: 3).
--hot_words _words_: Hot-words and their probability boosts.
--version: Print version and exit.
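When the command is driven from a script, the options above can be assembled into an argument list. This is a rough sketch, not an official wrapper: the function `build_command` and its parameter names are hypothetical, and the `word:boost` comma-separated format used for `--hot_words` is an assumption that should be checked against `deepspeech --help` for your version.

```python
import shlex


def build_command(model, audio, scorer=None, beam_width=None,
                  lm_alpha=None, lm_beta=None, hot_words=None,
                  json_output=False, candidate_transcripts=None):
    """Build an argv list for the deepspeech CLI from the options listed above.

    Illustrative helper only; it does not run or validate anything.
    """
    argv = ["deepspeech", "--model", model, "--audio", audio]
    if scorer is not None:
        argv += ["--scorer", scorer]
    if beam_width is not None:
        argv += ["--beam_width", str(beam_width)]
    if lm_alpha is not None:
        argv += ["--lm_alpha", str(lm_alpha)]
    if lm_beta is not None:
        argv += ["--lm_beta", str(lm_beta)]
    if hot_words:
        # Assumed format: comma-separated word:boost pairs.
        pairs = ",".join(f"{word}:{boost}" for word, boost in hot_words.items())
        argv += ["--hot_words", pairs]
    if json_output:
        argv.append("--json")
        if candidate_transcripts is not None:
            argv += ["--candidate_transcripts", str(candidate_transcripts)]
    return argv


cmd = build_command("model.pbmm", "audio.wav",
                    scorer="scorer.scorer", beam_width=500)
print(shlex.join(cmd))  # shell-quoted form, handy for logging
```

The returned list can be passed to `subprocess.run(cmd, capture_output=True, text=True)`; using a list rather than a single string avoids quoting problems with paths that contain spaces.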

FAQ

What is the deepspeech command used for?

It transcribes speech in audio files to text using a trained DeepSpeech model, optionally with an external scorer (language model) for improved accuracy. Input audio must be in 16kHz, 16-bit, mono WAV format.

How do I run a basic deepspeech example?

Run `deepspeech --model [model.pbmm] --audio [audio.wav]` in a terminal, replacing the bracketed placeholders with the paths to your model and audio files.

What does --model _file_ do in deepspeech?

It sets the path to the model file (.pbmm or .tflite) used for transcription.
