htmlq Command: Examples, Options, and Usage

Common Examples

Extract elements by selector

cat [page.html] | htmlq [.class-name]

Get attribute value

htmlq -a href [a.link] < [page.html]

Get text content

htmlq -t [p] < [page.html]

Pretty print HTML

htmlq -p [body] < [page.html]

Remove nodes before extracting

htmlq [div.content] --remove-nodes [.unwanted] < [page.html]

From URL via curl

curl -s [url] | htmlq [selector]

Description

htmlq extracts data from HTML using CSS selectors, much as jq does for JSON. It reads HTML from stdin by default, or from a file given with -f, and prints the elements that match the selector. Options narrow the output to text content (-t), a single attribute value (-a), or pretty-printed HTML (-p).
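The behaviour described above can be tried on a small throwaway page. This is a minimal sketch: the sample markup, the `sample` variable, and the class names are made up for illustration, and the htmlq calls are skipped if the tool is not installed.

```shell
# Write a tiny sample page to a temporary file (contents are illustrative only).
sample="$(mktemp)"
cat > "$sample" <<'EOF'
<html><body>
  <p class="intro">Hello</p>
  <a class="link" href="/docs">Docs</a>
  <a class="link" href="https://example.com/">Example</a>
</body></html>
EOF

if command -v htmlq >/dev/null 2>&1; then
  # Print the matching element as HTML.
  htmlq 'p.intro' < "$sample"
  # Print only the href value of each matching link, one per line.
  htmlq -a href 'a.link' < "$sample"
  # Print only the text inside the matching element.
  htmlq -t 'p.intro' < "$sample"
else
  echo "htmlq is not installed; skipping the queries" >&2
fi
```

Quoting the selectors keeps the shell from interpreting characters such as `#`, `>` or `*`. Delete the temporary file with `rm "$sample"` when finished.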

Options

-a, --attribute _ATTR_: Only return this attribute's value from selected elements.
-t, --text: Output only the text content of selected elements.
-p, --pretty: Pretty-print the serialised HTML output.
-b, --base _URL_: Use this URL as the base for relative links.
-B, --detect-base: Detect the base URL from the document's `<base>` tag.
-f, --filename _FILE_: Input file (defaults to stdin).
-o, --output _FILE_: Output file (defaults to stdout).
-r, --remove-nodes _SELECTOR_: Remove matching nodes before output. May be specified multiple times.
-w, --ignore-whitespace: When printing text nodes, ignore whitespace-only nodes.
-h, --help: Display help information.
-V, --version: Show version information.
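Several of the options listed above are often combined. The following is a rough sketch rather than a definitive recipe: the sample markup and the `page` and `out` variables are hypothetical, and the comments describe what each option is documented to do, not verified output.

```shell
# Temporary input and output files (names and contents are illustrative only).
page="$(mktemp)"
out="$(mktemp)"
cat > "$page" <<'EOF'
<html><body>
  <nav><a href="/home">Home</a></nav>
  <div class="content">
    <p>First paragraph.</p>
    <a href="/docs">Docs</a>
  </div>
</body></html>
EOF

if command -v htmlq >/dev/null 2>&1; then
  # -f and -o replace shell redirection for input and output.
  htmlq -f "$page" -o "$out" -a href 'div.content a'
  # -b supplies a base URL for the relative links in the document.
  htmlq -f "$page" -b 'https://example.com/' -a href 'a'
  # -r may be repeated; -t with -w prints text and skips whitespace-only nodes.
  htmlq -f "$page" -t -w 'body' -r nav -r a
else
  echo "htmlq is not installed; skipping the queries" >&2
fi
```

Putting the main selector before `-r` avoids any ambiguity about which arguments belong to `--remove-nodes`.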

FAQ

What is the htmlq command used for?

It extracts elements, text, or attribute values from HTML on the command line using CSS selectors, much as jq does for JSON.

How do I run a basic htmlq example?

Run `cat [page.html] | htmlq [.class-name]` in a terminal, replacing the bracketed placeholders with your own HTML file and CSS selector.

What does -a, --attribute _ATTR_ do in htmlq?

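It prints only the value of the named attribute for each selected element that has it, instead of the element's HTML. For example, `-a href` with the selector `a` prints each link's target.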
