yapsnap command
Copy an example, then replace file names, directories, or arguments as needed.
Common examples
Transcribe a local audio or video file
yapsnap [path/to/file.mp4]
Transcribe a video from a URL
yapsnap "[https://www.youtube.com/watch?v=...]"
Include timestamps
yapsnap [input.mp4] --timestamps
Write the transcript to a specific file
yapsnap [input.mp4] -o [path/to/transcript.txt]
Preserve the downloaded audio
yapsnap "[url]" --keep-audio
Disable the audio speedup
yapsnap [input.mp4] --speed 1.0
Use a custom model directory
yapsnap [input.mp4] --model [path/to/model_dir]
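The examples above can also be driven from a script. The following is a minimal sketch, assuming yapsnap is on the PATH and exits with a non-zero status on failure (the page does not say so). The helper names build_yapsnap_args and transcribe are hypothetical; only the flags listed on this page are used.

```python
import subprocess
from typing import List, Optional


def build_yapsnap_args(
    source: str,
    output: Optional[str] = None,
    timestamps: bool = False,
    speed: Optional[float] = None,
    keep_audio: bool = False,
    model_dir: Optional[str] = None,
) -> List[str]:
    """Build an argv list for yapsnap from the flags documented on this page."""
    args = ["yapsnap", source]
    if output is not None:
        args += ["-o", output]
    if timestamps:
        args.append("--timestamps")
    if speed is not None:
        args += ["--speed", str(speed)]
    if keep_audio:
        args.append("--keep-audio")
    if model_dir is not None:
        args += ["--model", model_dir]
    return args


def transcribe(source: str, **options) -> None:
    """Run yapsnap; check=True raises CalledProcessError on a non-zero exit."""
    # Assumption: yapsnap is installed and signals failure via its exit status.
    subprocess.run(build_yapsnap_args(source, **options), check=True)
```

For instance, transcribe("input.mp4", timestamps=True, speed=1.0) would run yapsnap input.mp4 --timestamps --speed 1.0. Passing the arguments as a list avoids shell quoting problems with URLs that contain ? or &.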
Description
yapsnap is a single-command speech-to-text tool that transcribes audio from local files or remote video URLs into plain text. It runs entirely on the CPU using a streaming Zipformer transducer (Kroko English). The ~80 MB model is downloaded once on first run; after that, transcription runs fully offline, needs no API keys, and sends no audio off the machine. URL inputs are fetched with yt-dlp and decoded with ffmpeg, so any source those tools can handle (YouTube, YouTube Shorts, X/Twitter, TikTok, Instagram Reels, direct media URLs) is supported. Local files cover the common containers and codecs (mp3, mp4, m4a, wav, webm, mov, mkv, ...). By default the audio is sped up 1.5x with pitch preserved before transcription, which keeps transcription quality usable while bringing wall-clock time well below real time on a laptop. Output is a single paragraph of plain text; --timestamps switches to one sentence per line, each prefixed with MM:SS.
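To make the effect of the speedup concrete, here is a small arithmetic sketch. It only computes how much audio the recognizer has to process after the speedup; actual wall-clock time depends on the CPU and is not specified on this page. The function name is hypothetical.

```python
def audio_seconds_after_speedup(duration_s: float, factor: float = 1.5) -> float:
    """Length of the audio fed to the recognizer after speeding it up by `factor`."""
    if factor <= 0:
        raise ValueError("speed factor must be positive")
    return duration_s / factor


# A one-hour recording at the default factor leaves 40 minutes of audio to decode.
one_hour = audio_seconds_after_speedup(3600)  # 2400.0 seconds
```

With --speed 1.0 the factor is 1 and the full duration is processed unchanged.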
Options
- -o, --output _PATH_
- Write the transcript to _PATH_ instead of the default ./transcripts/<input>_transcript.txt.
- --timestamps
- Emit one sentence per line prefixed with MM:SS instead of a single paragraph.
- --speed _FACTOR_
- Speed up audio by _FACTOR_ before transcription (default 1.5, pitch preserved). Set to 1.0 to disable.
- --keep-audio
- Keep the extracted audio file after transcribing a URL.
- --model _DIR_
- Override the model directory; falls back to the KROKO_MODEL environment variable, then to the cached default.
- --help
- Show the full option list and exit.
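Two behaviours in the option list can be sketched in a few lines: the --model fallback order and the --timestamps line layout. This is an illustration, not yapsnap's own code. The location of the cached default model is not given here, so it is passed in as a parameter, and the parser assumes each line looks like "MM:SS sentence" (optionally with square brackets around the time), which is a guess at the exact format.

```python
import os
import re
from typing import List, Mapping, Optional, Tuple


def resolve_model_dir(
    cli_value: Optional[str],
    cached_default: str,
    env: Mapping[str, str] = os.environ,
) -> str:
    """Apply the documented order: --model, then KROKO_MODEL, then the cached default."""
    if cli_value:
        return cli_value
    env_value = env.get("KROKO_MODEL")
    if env_value:
        return env_value
    return cached_default


# Assumed line layout: "MM:SS sentence" or "[MM:SS] sentence".
_TIMESTAMP_LINE = re.compile(r"^\[?(\d+):([0-5]\d)\]?\s+(.*)$")


def parse_timestamped(text: str) -> List[Tuple[int, str]]:
    """Turn --timestamps output into (offset_in_seconds, sentence) pairs."""
    entries = []
    for line in text.splitlines():
        match = _TIMESTAMP_LINE.match(line.strip())
        if match:  # lines that do not fit the assumed layout are skipped
            minutes, seconds, sentence = match.groups()
            entries.append((int(minutes) * 60 + int(seconds), sentence))
    return entries
```

Taking the environment as a parameter keeps the fallback order easy to check without touching the real KROKO_MODEL variable.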
FAQ
What is the yapsnap command used for?
yapsnap transcribes audio from local files or remote video URLs into plain text. It runs on the CPU with a locally cached model, so transcription needs no API keys and no audio leaves the machine.
How do I run a basic yapsnap example?
Run `yapsnap [path/to/file.mp4]` in a terminal, then adjust file names, paths, flags, or remote targets for your system.
What does -o, --output _PATH_ do in yapsnap?
It writes the transcript to _PATH_ instead of the default ./transcripts/<input>_transcript.txt.
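The default location can be worked out ahead of time, for example to check whether a transcript already exists. The sketch below assumes that <input> in the default path means the input file name without its extension; the page does not define it, and how URL inputs are named is not covered. The function name is hypothetical.

```python
from pathlib import Path
from typing import Optional


def transcript_path(input_path: str, output: Optional[str] = None) -> Path:
    """Return where the transcript would be written under the stated assumption."""
    if output is not None:
        # -o / --output takes precedence over the default location.
        return Path(output)
    # Assumption: <input> is the input file's stem (name without extension).
    return Path("transcripts") / f"{Path(input_path).stem}_transcript.txt"
```

Under that assumption, transcript_path("talks/demo.mp4") points at transcripts/demo_transcript.txt relative to the current directory.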