
Linux command

htmlq command

Text

These examples involve pipes, overwriting, or deletion. Confirm paths and arguments before running them.

Common examples

Extract elements by selector

cat [page.html] | htmlq [.class-name]

Get attribute value

htmlq -a href [a.link] < [page.html]

Get text content

htmlq -t [p] < [page.html]

Pretty print HTML

htmlq -p [body] < [page.html]

Remove nodes before extracting

htmlq [div.content] --remove-nodes [.unwanted] < [page.html]

From URL via curl

curl -s [url] | htmlq [selector]
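
The curl pipeline above can be wrapped in a small helper, sketched here under stated assumptions. The function name `page_links` and the `PAGE_URL` environment variable are hypothetical names chosen for illustration and are not part of htmlq. The sketch lists the unique `href` values found on a page.

```shell
# page_links is a hypothetical helper name, not an htmlq feature.
page_links() {
  # curl: -f fails on HTTP errors, -sS hides progress but keeps error messages,
  # -L follows redirects. htmlq then prints the href of every <a> element.
  curl -fsSL "$1" | htmlq -a href 'a' | sort -u
}

# Runs only when PAGE_URL is set (an assumed variable name), for example:
#   PAGE_URL=https://example.com/ sh links.sh
if [ -n "${PAGE_URL:-}" ]; then
  page_links "$PAGE_URL"
fi
```

Keeping the network call behind a variable means the script does nothing until a target is chosen deliberately.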

Description

htmlq extracts data from HTML using CSS selectors, much as jq does for JSON. It reads HTML from stdin, or from a file given with -f, and writes the matching elements to stdout, or to a file given with -o. The output can be the selected elements as HTML (optionally pretty-printed), only their text content, or the value of a single attribute.
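
As a minimal sketch of the three output modes described above, the following script builds a small sample page and queries it. The file name, the page contents, and the `a.link` selector are made up for illustration. The script skips the queries if htmlq is not installed.

```shell
# Hypothetical sample page, created in a temporary directory for this illustration.
workdir=$(mktemp -d)
page="$workdir/sample.html"
cat > "$page" <<'EOF'
<html>
  <body>
    <div class="content">
      <p>Hello <a class="link" href="/docs">docs</a></p>
    </div>
  </body>
</html>
EOF

if command -v htmlq >/dev/null 2>&1; then
  htmlq 'a.link' < "$page"          # the matching element, as HTML
  htmlq -t 'a.link' < "$page"       # only its text content
  htmlq -a href 'a.link' < "$page"  # only the value of its href attribute
else
  echo "htmlq not found; skipping the examples" >&2
fi
```

Quoting each selector keeps the shell from interpreting characters such as `#`, `>`, or `*` that are common in CSS selectors.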

Options

-a, --attribute _ATTR_
Only return this attribute's value from selected elements.
-t, --text
Output only the text content of selected elements.
-p, --pretty
Pretty-print the serialised HTML output.
-b, --base _URL_
Use this URL as the base for relative links.
-B, --detect-base
Detect the base URL from the document's `<base>` tag.
-f, --filename _FILE_
Input file (defaults to stdin).
-o, --output _FILE_
Output file (defaults to stdout).
-r, --remove-nodes _SELECTOR_
Remove matching nodes before output. May be specified multiple times.
-w, --ignore-whitespace
When printing text nodes, ignore whitespace-only nodes.
-h, --help
Display help information.
-V, --version
Show version information.
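
The options above can be combined. The sketch below is illustrative only: the sample page, the `.unwanted` class, and the `https://example.com/` base URL are placeholders, and the script does nothing beyond creating the sample file if htmlq is not installed.

```shell
# Hypothetical sample page and output path, created in a temporary directory.
workdir=$(mktemp -d)
page="$workdir/page.html"
links="$workdir/links.txt"
cat > "$page" <<'EOF'
<html>
  <body>
    <div class="content">
      <p>   Read the <a href="/guide">guide</a>.   </p>
      <aside class="unwanted">Sponsored</aside>
    </div>
  </body>
</html>
EOF

if command -v htmlq >/dev/null 2>&1; then
  # Read from a file (-f), use a base URL for links (-b), write to a file (-o).
  htmlq -f "$page" -b 'https://example.com/' -a href 'a' -o "$links"
  # Text only (-t), ignoring whitespace-only text nodes (-w).
  htmlq -f "$page" -t -w 'div.content'
  # Pretty-print (-p) the selection after removing unwanted nodes (-r).
  htmlq -f "$page" -p 'div.content' -r '.unwanted'
fi
```

Using -f and -o instead of shell redirection keeps the input and output paths visible in the command itself, which can help when reviewing a script before running it.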

FAQ

What is the htmlq command used for?

It selects elements from an HTML document with CSS selectors and prints them as HTML, as plain text, or as a single attribute's value.

How do I run a basic htmlq example?

Run `cat [page.html] | htmlq [.class-name]` in a terminal, replacing the bracketed placeholders with your own file name and CSS selector.

What does -a, --attribute _ATTR_ do in htmlq?

Only return this attribute's value from selected elements.