
Linux command

pup command

Files

Involves pipes, overwriting, or deletion: confirm paths and arguments before running.

Common examples

Filter elements by selector

cat [file.html] | pup '[selector]'

Extract text content

cat [file.html] | pup '[selector] text{}'

Extract an attribute value

cat [file.html] | pup '[selector] attr{href}'

Read from a file instead of stdin

pup -f [file.html] '[selector]'

Parse HTML fetched from a URL

curl -s [url] | pup '[selector]'

Output matching elements as JSON

cat [file.html] | pup '[selector] json{}'

Count the matching elements

cat [file.html] | pup -n '[selector]'

Pretty-print with 4-space indent and color

cat [file.html] | pup -c --indent 4 '[selector]'

Limit printed nesting depth

cat [file.html] | pup -l [2] '[selector]'

Description

pup is the HTML counterpart to jq. It reads an HTML document from stdin (or from a file via `-f`) and filters elements with a CSS-style selector. It can optionally apply a display function (`text{}`, `attr{…}`, or `json{}`) to turn the matches into the form you want. It ships as a single static Go binary with no runtime dependencies, so it is well suited to scraping pipelines and Makefiles. It understands most of CSS3, including common pseudo-classes, so many scraping tasks reduce to a single pipe: `curl | pup 'selector json{}' | jq`.

Options

-f, --file _FILE_
Read HTML from _FILE_ instead of stdin.
-c, --color
Colorize output.
-p, --plain
Do not HTML-escape the output.
--pre
Preserve whitespace (useful inside `<pre>`/`<code>`).
-i, --indent _N_|_CHAR_
Indent by _N_ spaces (or by the given character).
-l, --limit _N_
Limit output nesting depth to _N_ levels.
-n, --number
Print the number of matching elements instead of the elements themselves.
--charset _ENCODING_
Force input character encoding (default: auto-detect).
-h, --help
Show help.
--version
Show version.

FAQ

What is the pup command used for?

It filters an HTML document with CSS selectors. It prints the matching elements or projects them with a display function: their text (`text{}`), an attribute (`attr{…}`), or a JSON rendering (`json{}`). This mirrors what jq does for JSON.

How do I run a basic pup example?

Run `cat [file.html] | pup '[selector]'` in a terminal. Replace `[file.html]` with an HTML file and `[selector]` with a CSS selector such as `title`. To read a remote page instead, pipe `curl -s [url]` into pup.

What does -f, --file _FILE_ do in pup?

Read HTML from _FILE_ instead of stdin.