htmlq command
These commands involve pipes, overwriting, or deletion; check paths and arguments before running them.
Common examples
Extract elements by selector
cat [page.html] | htmlq [.class-name]
Get attribute value
htmlq -a href [a.link] < [page.html]
Get text content
htmlq -t [p] < [page.html]
Pretty print HTML
htmlq -p [body] < [page.html]
Remove nodes before extracting
htmlq [div.content] --remove-nodes [.unwanted] < [page.html]
From URL via curl
curl -s [url] | htmlq [selector]
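Because htmlq prints one attribute value per line, its output can be fed into an ordinary shell loop. The sketch below numbers every link target read from stdin. `number_links` is a hypothetical helper name used only for illustration; the same function can sit at the end of the curl pipeline above.

```shell
# Hypothetical helper: read HTML on stdin and print each <a> href
# prefixed with a running number, in document order.
number_links() {
  htmlq --attribute href a | {
    n=0
    while IFS= read -r href; do
      n=$((n + 1))
      printf '%d %s\n' "$n" "$href"
    done
  }
}
```

For example, `curl -s [url] | number_links` lists the page's links as `1 ...`, `2 ...`, and so on.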
Description
htmlq extracts data from HTML using CSS selectors; it is like jq, but for HTML. It reads HTML from stdin (or from a file given with -f) and prints the elements that match the selector. Instead of the full HTML of each match, it can print only the text content (-t) or a single attribute value (-a), and it can pretty-print the HTML it outputs (-p).
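The three output modes can be compared on one small input. The sketch below assumes a hypothetical sample document stored in `SAMPLE_HTML` and wraps each mode in a hypothetical helper function; only the htmlq flags themselves come from this page.

```shell
# Hypothetical sample document, used only to compare the output modes.
SAMPLE_HTML='<div class="post"><a class="link" href="/docs">Docs</a><p>Hello</p></div>'

# Default mode: print each matching element as serialised HTML.
show_elements() { printf '%s' "$SAMPLE_HTML" | htmlq "$1"; }

# --text: print only the text inside each matching element.
show_text() { printf '%s' "$SAMPLE_HTML" | htmlq --text "$1"; }

# --attribute: print one attribute's value from each matching element.
show_attr() { printf '%s' "$SAMPLE_HTML" | htmlq --attribute "$1" "$2"; }
```

With this input, `show_elements p` prints `<p>Hello</p>`, `show_text a.link` prints `Docs`, and `show_attr href a.link` prints `/docs`.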
Options
- -a, --attribute _ATTR_
- Only return this attribute's value from selected elements.
- -t, --text
- Output only the text content of selected elements.
- -p, --pretty
- Pretty-print the serialised HTML output.
- -b, --base _URL_
- Use this URL as the base for relative links.
- -B, --detect-base
- Detect the base URL from the document's `<base>` tag, falling back to the value of --base if no tag is found.
- -f, --filename _FILE_
- Input file (defaults to stdin).
- -o, --output _FILE_
- Output file (defaults to stdout).
- -r, --remove-nodes _SELECTOR_
- Remove matching nodes before output. May be specified multiple times.
- -w, --ignore-whitespace
- When printing text nodes, ignore whitespace-only nodes.
- -h, --help
- Display help information.
- -V, --version
- Show version information.
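Several of these options combine naturally when cleaning up a saved page. The sketch below reads from a file with --filename, writes to a file with --output, strips `script` and `style` nodes with repeated --remove-nodes, and prints only non-blank text. The function name `extract_clean_text` and its argument order are hypothetical. The selector is passed before the options as a precaution, so it cannot be mistaken for another --remove-nodes value.

```shell
# Hypothetical helper: save the readable text of one region of an HTML file.
#   $1 = input HTML file   (read via --filename instead of stdin)
#   $2 = output text file  (written via --output instead of stdout)
#   $3 = CSS selector for the region to keep
extract_clean_text() {
  htmlq "$3" \
    --filename "$1" \
    --output "$2" \
    --text \
    --ignore-whitespace \
    --remove-nodes script \
    --remove-nodes style
}
```

For example, `extract_clean_text page.html main.txt '#main'` writes the text of the `#main` element, without any inline script or style contents, to `main.txt`.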
FAQ
What is the htmlq command used for?
It runs CSS selectors against HTML read from stdin or a file and prints the matching elements, their text content (-t), or a chosen attribute value (-a), which makes it useful for scraping and for HTML processing in shell pipelines.
How do I run a basic htmlq example?
Run `cat [page.html] | htmlq [.class-name]` in a terminal, replacing the bracketed placeholders with your own file name and CSS selector.
What does -a, --attribute _ATTR_ do in htmlq?
Only return this attribute's value from selected elements.