Linux command
pup command
Files
Involves pipes, overwriting, or deletion; confirm paths and arguments before running.
Common examples
Filter elements by selector
cat [file.html] | pup '[selector]'
Extract text content
cat [file.html] | pup '[selector] text{}'
Extract an attribute value
cat [file.html] | pup '[selector] attr{href}'
Read from a file instead of stdin
pup -f [file.html] '[selector]'
Parse HTML fetched from a URL
curl -s [url] | pup '[selector]'
Output matching elements as JSON
cat [file.html] | pup '[selector] json{}'
Print the number of matching elements
cat [file.html] | pup -n '[selector]'
Pretty-print with 4-space indent and color
cat [file.html] | pup -c --indent 4 '[selector]'
Limit printed nesting depth
cat [file.html] | pup -l [2] '[selector]'
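The placeholders above can be tried against a small local document. The following is a minimal sketch, not a canonical recipe: the file name `pup-sample.html`, its contents, and the `#links` selector are made up for illustration, and the script skips the pup calls when pup is not installed.

```shell
#!/bin/sh
# Hypothetical sample document; the path and markup are illustrative only.
sample="${TMPDIR:-/tmp}/pup-sample.html"
cat > "$sample" <<'EOF'
<html><body>
<ul id="links">
  <li><a href="https://example.com/a">First</a></li>
  <li><a href="https://example.com/b">Second</a></li>
</ul>
</body></html>
EOF

if command -v pup >/dev/null 2>&1; then
  # Text content of each link inside the list
  pup -f "$sample" '#links a text{}'
  # href attribute of each link
  pup -f "$sample" '#links a attr{href}'
  # Number of <li> elements matched by the selector
  pup -n -f "$sample" '#links li'
else
  echo "pup is not installed; skipping the examples" >&2
fi
```

Reading with `-f` avoids the extra `cat` process; piping the same file through stdin should behave the same way.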
Description
pup is the HTML counterpart to jq: it reads an HTML document from stdin (or from a file via `-f`), filters elements with a CSS-style selector, and optionally applies a display function (`text{}`, `attr{…}`, `json{}`) to project the matches into the form you want. It is a single static Go binary with no runtime dependencies, which suits scraping pipelines and Makefiles. Because it understands most CSS3 selectors (including common pseudo-classes), many scraping problems reduce to a single pipeline: `curl | pup 'selector json{}' | jq`.
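As a rough sketch of that pipeline shape, the script below feeds a local file to pup and hands the `json{}` output to jq. The file name `pup-links.html` and its markup are assumptions made for this example, and a `curl -s [url]` stage could take the place of the local file. The jq filter relies on `json{}` emitting an array of objects whose attributes (here `href`) appear as keys.

```shell
#!/bin/sh
# Hypothetical input file standing in for a fetched page.
page="${TMPDIR:-/tmp}/pup-links.html"
cat > "$page" <<'EOF'
<ul>
  <li><a href="https://example.com/a">First</a></li>
  <li><a href="https://example.com/b">Second</a></li>
</ul>
EOF

if command -v pup >/dev/null 2>&1 && command -v jq >/dev/null 2>&1; then
  # json{} turns each matched <a> into a JSON object;
  # jq -r then prints the href value of every object in the array.
  pup -f "$page" 'a json{}' | jq -r '.[].href'
else
  echo "pup and jq are both required; skipping the pipeline" >&2
fi
```

Keeping the selection in pup and the reshaping in jq means each tool handles only the format it was built for.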
Options
- -f, --file _FILE_
- Read HTML from _FILE_ instead of stdin.
- -c, --color
- Colorize output.
- -p, --plain
- Do not HTML-escape the output.
- --pre
- Preserve whitespace (useful inside `<pre>`/`<code>`).
- -i, --indent _N_|_CHAR_
- Indent by _N_ spaces (or by the given character).
- -l, --limit _N_
- Limit output nesting depth to _N_ levels.
- -n, --number
- Print the number of matching elements instead of the elements themselves.
- --charset _ENCODING_
- Force input character encoding (default: auto-detect).
- -h, --help
- Show help.
- --version
- Show version.
FAQ
What is the pup command used for?
pup filters HTML on the command line, much as jq filters JSON. It reads a document from stdin (or from a file via `-f`), selects elements with a CSS-style selector, and can print their text, an attribute value, or a JSON representation via `text{}`, `attr{…}`, or `json{}`.
How do I run a basic pup example?
Run `cat [file.html] | pup '[selector]'` in a terminal, replacing `[file.html]` with your HTML file and `[selector]` with a CSS selector such as `a` or `div.content`.
What does -f, --file _FILE_ do in pup?
Read HTML from _FILE_ instead of stdin.