Linux command
textsnap command
Text
After copying an example, replace the file names, directories, or arguments as needed.
Common examples
OCR a local image
textsnap [path/to/image.png]
OCR an image at a URL
textsnap [https://example.com/image.png]
OCR a webpage
textsnap [https://example.com/article]
OCR an image already on the clipboard
textsnap
Write output to a specific file
textsnap [image.png] -o [out.txt]
Output plain text instead of Markdown
textsnap [image.png] --plaintext
Cap the decoder at 1024 generated tokens
textsnap [image.png] --max-tokens 1024
Use a custom local model directory
textsnap [image.png] --model-dir [path/to/model]
Show progress diagnostics
textsnap -v [image.png]
Description
textsnap is a command-line OCR utility built around the PaddleOCR-VL-1.5 vision-language model exported to ONNX. It reads an image from a file path, a URL, or the system clipboard and writes the recognized text to a file inside ./textsnaps/. It prints only the output path on stdout, so it composes cleanly in shell pipelines. The model runs entirely on the CPU, with no GPU and no cloud calls. The default output is Markdown, so structure such as tables, headings, and lists is preserved; --plaintext flattens it for callers that only want raw text. Webpage URLs are rendered before OCR, which makes the tool usable as a "screenshot to text" pipeline for content that is hard to copy directly.
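Because only the output path is printed on stdout, a wrapper can capture that path and read the file it names. The following is a minimal sketch under that assumption; the function name `ocr_cat` is hypothetical and not part of textsnap.

```shell
# Hypothetical helper (not part of textsnap): OCR one input and print
# the recognized text instead of the path.
# Assumes only the documented behavior: textsnap prints the output
# file path, and nothing else, on stdout.
ocr_cat() {
    local out_path
    # Capture the path printed by textsnap; stop if the command fails.
    out_path=$(textsnap "$@") || return 1
    # Make sure the path is non-empty and names a real file before reading.
    [ -n "$out_path" ] && [ -f "$out_path" ] || return 1
    cat -- "$out_path"
}
```

With the function loaded in the current shell, `ocr_cat path/to/image.png | grep -i total` would search the recognized text directly. Any diagnostics enabled with -v go to stderr, so they should not interfere with the captured path.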
Options
- -o, --output _PATH_
  Write the OCR text to _PATH_. Default: ./textsnaps/_name_\_ocr.txt.
- -v, --verbose
  Print progress diagnostics to stderr.
- --plaintext
  Convert the default Markdown output into plain text (no tables, no headings).
- --model-dir _PATH_
  Use ONNX model files from _PATH_ instead of the cached download.
- --max-tokens _N_
  Cap the decoder at _N_ generated tokens (default 2048).
- --max-pixels _N_
  Limit the vision encoder pixel budget per image to _N_.
- --no-verify
  Skip SHA-256 verification of downloaded model files.
- --generate-checksums
  Re-download the model and rewrite the checksum manifest.
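As an illustration of how these options might be combined, the sketch below OCRs every PNG in a directory into plain-text files. The function name `ocr_dir` and the directory layout are hypothetical, and it assumes that --plaintext and -o can be passed together in a single invocation, which the examples above do not show explicitly.

```shell
# Hypothetical helper (not part of textsnap): OCR every .png in SRC_DIR
# and write one plain-text file per image into DEST_DIR.
# Assumption: --plaintext and -o may be used in the same invocation.
ocr_dir() {
    local src_dir=$1
    local dest_dir=$2
    local img name
    mkdir -p "$dest_dir" || return 1
    for img in "$src_dir"/*.png; do
        # If nothing matched, the glob stays literal; skip it.
        [ -e "$img" ] || continue
        # Derive "scan01" from "/some/dir/scan01.png".
        name=$(basename "$img" .png)
        textsnap "$img" --plaintext -o "$dest_dir/$name.txt" || return 1
    done
}
```

Calling `ocr_dir ./scans ./scans_txt` would then leave one .txt file per image in ./scans_txt. The loop stops at the first failure so that a broken input is noticed rather than silently skipped.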
FAQ
What is the textsnap command used for?
textsnap is a command-line OCR utility. It reads an image from a file path, a URL, or the system clipboard, recognizes the text on the CPU with the PaddleOCR-VL-1.5 model, and writes the result to a file inside ./textsnaps/, printing only the output path on stdout.
How do I run a basic textsnap example?
Run `textsnap [path/to/image.png]` in a terminal, then adjust file names, paths, flags, or remote targets for your system.
What does -o, --output _PATH_ do in textsnap?
It writes the OCR text to _PATH_ instead of the default location, ./textsnaps/_name_\_ocr.txt.