
Linux command

textsnap command

Text

After copying a command, replace the bracketed file names, directories, or parameters as needed.

Common examples

OCR a local image

textsnap [path/to/image.png]

OCR an image at a URL

textsnap [https://example.com/image.png]

OCR a webpage

textsnap [https://example.com/article]

OCR an image already on the clipboard

textsnap

Write output to a specific file

textsnap [image.png] -o [out.txt]

Strip Markdown formatting and write plain text

textsnap [image.png] --plaintext

Cap the decoder at 1024 generated tokens

textsnap [image.png] --max-tokens 1024

Use a custom local model directory

textsnap [image.png] --model-dir [path/to/model]

Show progress diagnostics

textsnap -v [image.png]

Description

textsnap is a command-line OCR utility built around the PaddleOCR-VL-1.5 vision-language model exported to ONNX. It reads an image from a file path, a URL, or the system clipboard and writes the recognized text to a file inside ./textsnaps/. Only the output path is printed on stdout, so the command composes cleanly in shell pipelines. The model runs entirely on the CPU, with no GPU and no cloud calls. The default output is Markdown, so structure such as tables, headings, and lists is preserved; --plaintext flattens it for callers that only want raw text. Webpage URLs are rendered before OCR, which makes the tool usable as a "screenshot to text" pipeline for content that is hard to copy directly.
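Because only the output path is printed on stdout, a wrapper can capture that path and hand the file to other tools. The following is a minimal sketch under that assumption; the helper name `ocr_cat` is hypothetical and not part of textsnap.

```shell
# ocr_cat is a hypothetical helper, not part of textsnap.
# It relies on the documented behavior that textsnap prints only the
# path of the output file on stdout.
ocr_cat() {
  local out
  # Capture the output path; any diagnostics stay on stderr.
  out=$(textsnap "$@") || return 1
  # Guard against an empty or missing path before reading it.
  [ -n "$out" ] && [ -f "$out" ] || return 1
  cat -- "$out"
}
```

With this in place, something like `ocr_cat [image.png] --plaintext | grep -i total` would search the recognized text directly instead of opening the output file by hand.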

Options

-o, --output _PATH_
Write the OCR text to _PATH_. Default: ./textsnaps/_name_\_ocr.txt.
-v, --verbose
Print progress diagnostics to stderr.
--plaintext
Convert the default Markdown output into plain text (no tables, no headings).
--model-dir _PATH_
Use ONNX model files from _PATH_ instead of the cached download.
--max-tokens _N_
Cap the decoder at _N_ generated tokens (default 2048).
--max-pixels _N_
Limit the vision encoder pixel budget per image to _N_.
--no-verify
Skip SHA-256 verification of downloaded model files.
--generate-checksums
Re-download the model and rewrite the checksum manifest.
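The options above can be combined in a small batch loop. This is a sketch rather than a recommended workflow: the helper name `ocr_dir`, the `.png` filter, and the 1024-token cap are illustrative assumptions, while `-o`, `--plaintext`, and `--max-tokens` are the options listed above.

```shell
# ocr_dir is a hypothetical helper, not part of textsnap.
# Usage: ocr_dir SRC_DIR DEST_DIR
# Runs textsnap on every .png in SRC_DIR and writes plain text files
# named after each image into DEST_DIR.
ocr_dir() {
  local src="$1" dest="$2" img name failed=0
  mkdir -p "$dest" || return 1
  for img in "$src"/*.png; do
    # If nothing matches, the glob stays literal; skip it.
    [ -e "$img" ] || continue
    name=$(basename "$img" .png)
    # textsnap prints the output path on stdout; discard it here because
    # the destination is already chosen with -o.
    if ! textsnap "$img" -o "$dest/$name.txt" --plaintext --max-tokens 1024 >/dev/null; then
      echo "failed: $img" >&2
      failed=1
    fi
  done
  return "$failed"
}
```

Calling `ocr_dir [path/to/images] [path/to/text]` returns a non-zero status if any image failed, so a calling script can detect partial runs.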

FAQ

What is the textsnap command used for?

textsnap performs OCR on an image taken from a file path, a URL, or the system clipboard, using the PaddleOCR-VL-1.5 vision-language model exported to ONNX and running entirely on the CPU. It writes the recognized text to a file inside ./textsnaps/ and prints only the output path on stdout. Output is Markdown by default; --plaintext flattens it to raw text.

How do I run a basic textsnap example?

Run `textsnap [path/to/image.png]` in a terminal, then adjust file names, paths, flags, or remote targets for your system.

What does -o, --output _PATH_ do in textsnap?

Write the OCR text to _PATH_. Default: ./textsnaps/_name_\_ocr.txt.