Linux command

deepspeech command

Replace the bracketed file names, directories, and parameter values as needed.

Common examples

Transcribe an audio file

deepspeech --model [model.pbmm] --audio [audio.wav]

Transcribe with an external scorer

deepspeech --model [model.pbmm] --scorer [scorer.scorer] --audio [audio.wav]

Transcribe with extended output

deepspeech --model [model.pbmm] --audio [audio.wav] --extended

Transcribe using a TFLite model

deepspeech --model [model.tflite] --audio [audio.wav]

Set beam width

deepspeech --model [model.pbmm] --audio [audio.wav] --beam_width [500]

Description

DeepSpeech is an open-source speech-to-text engine that uses an end-to-end neural network to convert audio into text. It requires a trained model and can optionally use an external scorer (language model) to improve accuracy. Pre-trained English models are available, and the toolkit supports training custom models for other languages or domains. Audio input should be a 16 kHz, 16-bit, mono WAV file. The deepspeech command transcribes audio files, and real-time streaming transcription is available through the API.
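
Because the audio format is fixed, it can help to check a file before transcribing it. The following is a minimal sketch using only Python's standard `wave` module; the function names `format_problems` and `check_wav` are hypothetical helpers for illustration and are not part of DeepSpeech.

```python
import wave

# Format stated in the description: 16 kHz, 16-bit (2 bytes per sample), mono.
EXPECTED_RATE_HZ = 16000
EXPECTED_SAMPLE_WIDTH_BYTES = 2
EXPECTED_CHANNELS = 1


def format_problems(channels, sample_width, frame_rate):
    """Return a list of readable mismatches; an empty list means the format fits."""
    problems = []
    if channels != EXPECTED_CHANNELS:
        problems.append(f"expected {EXPECTED_CHANNELS} channel, got {channels}")
    if sample_width != EXPECTED_SAMPLE_WIDTH_BYTES:
        problems.append(f"expected 16-bit samples, got {sample_width * 8}-bit")
    if frame_rate != EXPECTED_RATE_HZ:
        problems.append(f"expected {EXPECTED_RATE_HZ} Hz, got {frame_rate} Hz")
    return problems


def check_wav(path):
    """Read the WAV header of `path` and report any format mismatches."""
    with wave.open(path, "rb") as wav:
        return format_problems(
            wav.getnchannels(), wav.getsampwidth(), wav.getframerate()
        )
```

Calling `check_wav("audio.wav")` returns an empty list when the file matches the expected format, and otherwise a list of messages describing what to convert.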

Options

--model _file_
Path to the model file (.pbmm or .tflite).
--scorer _file_
Path to external scorer/language model.
--audio _file_
Audio file to transcribe (16kHz, 16-bit, mono WAV).
--extended
Output the transcript string built from the extended metadata.
--json
Output results as JSON, with a timestamp for each word.
--beam_width _n_
Beam width for the CTC decoder.
--lm_alpha _value_
Language model weight. If not specified, uses default from the scorer package.
--lm_beta _value_
Word insertion bonus. If not specified, uses default from the scorer package.
--candidate_transcripts _n_
Number of candidate transcripts to include in JSON output (default: 3).
--hot_words _words_
Hot-words and their probability boosts, given as comma-separated word:boost pairs.
--version
Print version and exit.
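
When calling the command from a script, the options above can be assembled into an argument list rather than a shell string. This is a rough sketch, not an official wrapper: `build_command` and `transcribe` are hypothetical names, only flags listed above are used, and the `word:boost` hot-words format is an assumption that should be checked against your installed version.

```python
import subprocess


def build_command(model, audio, scorer=None, beam_width=None,
                  json_output=False, hot_words=None):
    """Assemble a deepspeech argument list from the options listed above."""
    cmd = ["deepspeech", "--model", model, "--audio", audio]
    if scorer is not None:
        cmd += ["--scorer", scorer]
    if beam_width is not None:
        cmd += ["--beam_width", str(beam_width)]
    if json_output:
        cmd.append("--json")
    if hot_words:
        # Assumption: hot-words are passed as comma-separated word:boost pairs.
        pairs = ",".join(f"{word}:{boost}" for word, boost in hot_words.items())
        cmd += ["--hot_words", pairs]
    return cmd


def transcribe(model, audio, **options):
    """Run deepspeech and return whatever it writes to standard output."""
    result = subprocess.run(
        build_command(model, audio, **options),
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()
```

For example, `transcribe("model.pbmm", "audio.wav", scorer="scorer.scorer")` runs the scorer example from the list of common examples, provided `deepspeech` is on the `PATH`. Passing a list to `subprocess.run` avoids shell quoting problems with file names that contain spaces.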

FAQ

What is the deepspeech command used for?

It transcribes speech in a WAV audio file to text using a trained DeepSpeech model, optionally with an external scorer (language model) to improve accuracy.

How do I run a basic deepspeech example?

Run `deepspeech --model [model.pbmm] --audio [audio.wav]` in a terminal, replacing the bracketed placeholders with the paths to your model and audio file.

What does --model _file_ do in deepspeech?

It sets the path to the trained model file (.pbmm or .tflite) that deepspeech loads for transcription.