Linux command

yapsnap command

Files

After copying a command, replace the file names, directories, or parameters as needed.

Common examples

Transcribe a local audio or video file

yapsnap [path/to/file.mp4]

Transcribe a video from a URL

yapsnap "[https://www.youtube.com/watch?v=...]"

Include timestamps

yapsnap [input.mp4] --timestamps

Write the transcript to a specific file

yapsnap [input.mp4] -o [path/to/transcript.txt]

Preserve the downloaded audio

yapsnap "[url]" --keep-audio

Disable the audio speedup

yapsnap [input.mp4] --speed 1.0

Use a custom model directory

yapsnap [input.mp4] --model [path/to/model_dir]
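
The single-file examples above can be combined into a batch run. The following is a minimal sketch, not part of yapsnap itself: the function name transcribe_dir and the YAPSNAP override variable are hypothetical conveniences of this example, and only flags documented on this page (--timestamps, -o) are used.

```shell
# Transcribe every .mp4 in a directory, writing one timestamped transcript per file.
# YAPSNAP may be overridden (e.g. YAPSNAP=echo) for a dry run; yapsnap itself
# does not read this variable, it is only a convention of this sketch.
transcribe_dir() {
  in_dir="$1"
  out_dir="$2"
  mkdir -p "$out_dir" || return 1
  for f in "$in_dir"/*.mp4; do
    [ -e "$f" ] || continue                 # skip when the glob matches nothing
    name=$(basename -- "$f" .mp4)           # e.g. talk.mp4 -> talk
    ${YAPSNAP:-yapsnap} "$f" --timestamps -o "$out_dir/$name.txt"
  done
}
```

Running `YAPSNAP=echo transcribe_dir ./videos ./transcripts` prints the commands that would be executed without transcribing anything.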

Description

yapsnap is a single-command speech-to-text tool that transcribes audio from local files or remote video URLs into plain text. It runs entirely on the CPU using a streaming Zipformer transducer (Kroko English). The model (about 80 MB) is downloaded once on first run; after that, transcription works fully offline, needs no API keys, and sends no audio off the machine.

URL inputs are fetched with yt-dlp and decoded with ffmpeg, so anything those tools can handle (YouTube, YouTube Shorts, X/Twitter, TikTok, Instagram Reels, direct media URLs) can be used as input. Local files cover the common containers and codecs (mp3, mp4, m4a, wav, webm, mov, mkv, ...).

By default the audio is sped up 1.5x with pitch preserved before transcription, which keeps quality usable while bringing wall-clock time well below real time on a laptop. Output is a single paragraph of plain text; --timestamps switches to one sentence per line.
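
To make the effect of the speedup concrete, the sketch below computes how much audio is left to decode after applying a speed factor. It is plain arithmetic (duration divided by factor) and says nothing about actual wall-clock time, which depends on the hardware; the function name sped_up_seconds is hypothetical.

```shell
# Audio length in seconds after a speedup: original duration / factor.
# The factor defaults to 1.5, matching the documented default of --speed.
sped_up_seconds() {
  awk -v d="$1" -v f="${2:-1.5}" 'BEGIN { printf "%.0f\n", d / f }'
}

sped_up_seconds 1800        # a 30-minute recording at the default 1.5x
sped_up_seconds 1800 1.0    # speedup disabled
```

With the default factor, a 30-minute recording leaves 20 minutes of audio for the recognizer to process.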

Options

-o, --output _PATH_
Write the transcript to _PATH_ instead of the default ./transcripts/<input>_transcript.txt.
--timestamps
Emit one sentence per line prefixed with MM:SS instead of a single paragraph.
--speed _FACTOR_
Speed up audio by _FACTOR_ before transcription (default 1.5, pitch preserved). Set to 1.0 to disable.
--keep-audio
Keep the extracted audio file after transcribing a URL.
--model _DIR_
Override the model directory; falls back to the KROKO_MODEL environment variable, then to the cached default.
--help
Show the full option list and exit.
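
The --model fallback order can be sketched as a small shell function. This only illustrates the documented precedence (flag, then KROKO_MODEL, then the cached default) and is not yapsnap's actual code; the name resolve_model_dir and the cache path are hypothetical, since this page does not say where the model is cached.

```shell
# Documented lookup order: --model value, then $KROKO_MODEL, then a cached default.
resolve_model_dir() {
  flag_value="$1"                                # value passed to --model, may be empty
  default_dir="${HOME}/.cache/yapsnap/model"     # placeholder path, an assumption
  if [ -n "$flag_value" ]; then
    printf '%s\n' "$flag_value"
  elif [ -n "${KROKO_MODEL:-}" ]; then
    printf '%s\n' "$KROKO_MODEL"
  else
    printf '%s\n' "$default_dir"
  fi
}
```

In practice this means `KROKO_MODEL=/path/to/model_dir yapsnap [input.mp4]` selects a model without repeating --model on every call, while an explicit --model still takes priority.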

FAQ

What is the yapsnap command used for?

It transcribes speech from local audio or video files, or from remote video URLs, into plain text. Transcription runs on the CPU with a locally cached model, so no API key is needed and no audio leaves the machine; see the description above for details.

How do I run a basic yapsnap example?

Run `yapsnap [path/to/file.mp4]` in a terminal, then adjust file names, paths, flags, or remote targets for your system.

What does -o, --output _PATH_ do in yapsnap?

It writes the transcript to _PATH_ instead of the default ./transcripts/<input>_transcript.txt.
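
The default output location follows the pattern ./transcripts/<input>_transcript.txt. The sketch below shows one plausible reading of that pattern, assuming <input> is the input file's base name with its extension removed; the page does not spell this out, and the function name default_transcript_path is hypothetical.

```shell
# Derive the default transcript path for a given input file.
# Assumption: <input> means the file name without directory or extension.
default_transcript_path() {
  name=$(basename -- "$1")     # strip the directory part
  stem="${name%.*}"            # strip the last extension, if any
  printf './transcripts/%s_transcript.txt\n' "$stem"
}

default_transcript_path videos/talk.mp4
```

Under that assumption, videos/talk.mp4 maps to ./transcripts/talk_transcript.txt; pass -o to choose a different location.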