yapsnap Command: Examples, Options, and Usage

Common Examples

Transcribe a local audio or video file

yapsnap [path/to/file.mp4]

Transcribe a remote video from its URL

yapsnap "[https://www.youtube.com/watch?v=...]"

Include timestamps

yapsnap [input.mp4] --timestamps

Write the transcript to a specific file

yapsnap [input.mp4] -o [path/to/transcript.txt]

Preserve the downloaded audio

yapsnap "[url]" --keep-audio

Disable the audio speedup

yapsnap [input.mp4] --speed 1.0

Use a custom model directory

yapsnap [input.mp4] --model [path/to/model_dir]

Description

yapsnap is a single-command speech-to-text tool that transcribes audio from local files or remote video URLs into plain text. It runs entirely on CPU using a streaming Zipformer transducer (Kroko English). The ~80 MB model is downloaded once on first run; after that, yapsnap operates fully offline, with no API keys and no audio leaving the machine.

URL inputs are fetched with yt-dlp and decoded with ffmpeg, so anything those tools can handle (YouTube, YouTube Shorts, X/Twitter, TikTok, Instagram Reels, direct media URLs) is supported. Local files cover the common containers and codecs (mp3, mp4, m4a, wav, webm, mov, mkv, ...).

By default the audio is sped up by 1.5x (pitch preserved) before transcription, which keeps quality usable while bringing wall-clock time well below real time on a laptop. Output is a single paragraph of plain text; --timestamps switches to a per-sentence layout.
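
The speedup step described above can be approximated by hand. This is a minimal sketch, not yapsnap's actual implementation: it assumes ffmpeg's `atempo` filter (which changes tempo without changing pitch) as a stand-in for whatever yapsnap does internally, and the function names `speed_up_audio` and `sped_up_seconds` are hypothetical.

```shell
# Hypothetical helpers, not part of yapsnap.

# Re-encode the audio of INPUT to OUTPUT at FACTOR times the original tempo.
# Assumption: ffmpeg's atempo filter is used as the pitch-preserving speedup.
# Usage: speed_up_audio INPUT OUTPUT [FACTOR]
speed_up_audio() {
  ffmpeg -i "$1" -vn -filter:a "atempo=${3:-1.5}" "$2"
}

# Print how many seconds of audio remain after the speedup,
# rounded to the nearest second.
# Usage: sped_up_seconds SECONDS [FACTOR]
sped_up_seconds() {
  awk -v d="$1" -v f="${2:-1.5}" 'BEGIN { printf "%.0f\n", d / f }'
}
```

For example, `sped_up_seconds 1800` prints `1200`: at the default factor, a 30-minute recording leaves 20 minutes of audio to transcribe.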

Options

-o, --output _PATH_: Write the transcript to _PATH_ instead of the default ./transcripts/<input>_transcript.txt.
--timestamps: Emit one sentence per line prefixed with MM:SS instead of a single paragraph.
--speed _FACTOR_: Speed up audio by _FACTOR_ before transcription (default 1.5, pitch preserved). Set to 1.0 to disable.
--keep-audio: Keep the extracted audio file after transcribing a URL.
--model _DIR_: Override the model directory; falls back to the KROKO_MODEL environment variable, then to the cached default.
--help: Show the full option list and exit.
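
The lookup order described for --model (flag first, then KROKO_MODEL, then the cached default) can be sketched as a small shell function. This is an illustration of the precedence only: `resolve_model_dir` is a hypothetical name, and the fallback path in the code is a placeholder assumption, since the real cache location is not documented here.

```shell
# Hypothetical re-implementation of the --model lookup order; not yapsnap code.
# Usage: resolve_model_dir FLAG_VALUE [DEFAULT_DIR]
#   FLAG_VALUE   value passed to --model, or "" if the flag was not given
#   DEFAULT_DIR  cached default; the path below is an assumed placeholder
resolve_model_dir() {
  if [ -n "$1" ]; then
    # 1. An explicit --model value wins.
    printf '%s\n' "$1"
  elif [ -n "${KROKO_MODEL:-}" ]; then
    # 2. Otherwise fall back to the KROKO_MODEL environment variable.
    printf '%s\n' "$KROKO_MODEL"
  else
    # 3. Otherwise use the cached default directory.
    printf '%s\n' "${2:-$HOME/.cache/yapsnap/model}"
  fi
}
```

In practice this means that exporting KROKO_MODEL once in a shell profile sets the model for every run, while --model still overrides it for a single invocation.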

FAQ

What is the yapsnap command used for?

yapsnap transcribes audio from local files or remote video URLs into plain text. It runs entirely on CPU, downloads its ~80 MB model once on first run, and then works fully offline with no API keys and no audio leaving the machine.

How do I run a basic yapsnap example?

Run `yapsnap [path/to/file.mp4]` in a terminal, then adjust file names, paths, flags, or remote targets for your system.

What does -o, --output _PATH_ do in yapsnap?

It writes the transcript to _PATH_ instead of the default ./transcripts/<input>_transcript.txt.
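
To make the default location concrete, the sketch below derives it from an input path. It assumes that `<input>` means the input file's base name with its last extension removed, which the text above does not state explicitly; how the name is derived for URL inputs is not covered. `default_transcript_path` is a hypothetical helper name.

```shell
# Hypothetical helper, not part of yapsnap.
# Assumption: <input> is the file's base name without its last extension.
# Usage: default_transcript_path INPUT_FILE
default_transcript_path() {
  name=${1##*/}   # strip any leading directories
  stem=${name%.*} # strip the last extension, if any
  printf './transcripts/%s_transcript.txt\n' "$stem"
}
```

Under that assumption, `default_transcript_path path/to/file.mp4` prints `./transcripts/file_transcript.txt`.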
