Linux command
xidel command
Files
After copying, replace file names, directories, or arguments as needed.
Common examples
Extract with XPath
xidel [file.html] -e "//title"
Extract from URL
xidel [https://example.com] -e "//h1"
CSS selector
xidel [file.html] --css "div.content"
Extract JSON
xidel [file.json] -e '$json/key'
Multiple extractions
xidel [file.html] -e "//title" -e "//h1"
Output JSON
xidel [file.html] -e "//a/@href" --output-format=json-wrapped
Follow links
xidel [url] -f "//a/@href" -e "//title"
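Expressions that begin with `$`, such as the JSON example above, pass through the shell before xidel sees them, so the quoting style decides what xidel actually receives. The sketch below shows the difference using plain shell variables. The `data.json` file and its `key` field are made up for illustration, and the final call assumes a xidel build in which `$json` holds the parsed input.

```shell
#!/bin/sh
# Make sure no shell variable named "json" exists for this demonstration.
unset json

# Double quotes: the shell expands $json (unset, so empty) before xidel runs.
double_quoted="$json/key"
# Single quotes: the text reaches xidel unchanged.
single_quoted='$json/key'

printf 'double-quoted argument: %s\n' "$double_quoted"
printf 'single-quoted argument: %s\n' "$single_quoted"

# Hypothetical sample file, created only for this sketch.
workdir=$(mktemp -d)
printf '{"key": "value"}\n' > "$workdir/data.json"

# Run the real query only when xidel is installed.
if command -v xidel >/dev/null 2>&1; then
  xidel -s "$workdir/data.json" -e "$single_quoted" || echo "xidel query failed" >&2
else
  echo "xidel not found; skipping the query" >&2
fi
```

The first line printed shows `/key`, which is what a double-quoted `$json/key` turns into when no shell variable named `json` is set; the second shows the expression intact.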
Description
xidel is a command-line tool for extracting and querying data from HTML, XML, and JSON documents, read from local files or remote URLs. It supports XPath, XQuery, and CSS selectors: XPath and XQuery navigate the document structure precisely, while CSS selectors offer a syntax familiar from web development. JSON input is navigated with path-style expressions such as $json/key. Several extraction expressions can be combined in a single invocation. A link-following mode supports web spidering: xidel follows the links selected on a page and applies the extraction expressions to each page it visits. Output can be plain text, JSON, or another structured format, which makes xidel suitable for data processing pipelines.
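As a rough illustration of combining several extractions in one invocation, the sketch below writes a small sample page and queries it three ways. The file name `page.html` and its contents are invented for the example; the xidel call uses only the `-s`, `-e`, and `--css` options listed on this page and is skipped when xidel is not installed.

```shell
#!/bin/sh
# Hypothetical sample page, created only for this sketch.
workdir=$(mktemp -d)
page="$workdir/page.html"
cat > "$page" <<'EOF'
<html>
  <head><title>Demo page</title></head>
  <body>
    <h1>Main heading</h1>
    <div class="content">Body text</div>
  </body>
</html>
EOF

if command -v xidel >/dev/null 2>&1; then
  # Two XPath extractions and one CSS selector in a single run.
  xidel -s "$page" -e '//title' -e '//h1' --css 'div.content' \
    || echo "xidel query failed" >&2
else
  echo "xidel not found; skipping the query" >&2
fi
```

Single quotes are used around every expression so the shell passes them to xidel unchanged.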
Options
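- -e, --extract _EXPR_
- XPath or XQuery expression to evaluate against each input document; may be given more than once.
- --css _SELECTOR_
- CSS selector whose matching elements are extracted.
- -f, --follow _EXPR_
- Expression selecting the links to follow; extractions are then applied to each followed page.
- --output-format _FORMAT_
- Output format, such as adhoc, xml, html, or json-wrapped.
- --input-format _FORMAT_
- Input format, such as auto, html, xml, or json.
- -s, --silent
- Suppress status messages.
- --user-agent _UA_
- User-Agent string sent with HTTP requests.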
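The options above can be combined on one command line. The sketch below is one possible combination, assuming a hypothetical file `page.txt` whose extension does not reveal that it contains HTML: `--input-format` asks for HTML parsing and `-s` hides status output. The value `html` is stated from memory of xidel's help text and should be checked against `xidel --help` on your system.

```shell
#!/bin/sh
# Hypothetical input: HTML stored under a .txt name, created only for this sketch.
workdir=$(mktemp -d)
input="$workdir/page.txt"
cat > "$input" <<'EOF'
<html>
  <head><title>Untyped page</title></head>
  <body><a href="next.html">Next</a></body>
</html>
EOF

if command -v xidel >/dev/null 2>&1; then
  # Options are placed before the input they should apply to.
  xidel -s --input-format=html "$input" -e '//title' -e '//a/@href' \
    || echo "xidel query failed" >&2
else
  echo "xidel not found; skipping the query" >&2
fi
```

For remote targets, `--user-agent` can be added in the same position, before the URL.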
FAQ
What is the xidel command used for?
xidel extracts and queries data from HTML, XML, and JSON documents, read from local files or remote URLs, using XPath, XQuery, or CSS selectors. It can also follow links between pages and emit results as plain text or in structured formats.
How do I run a basic xidel example?
Run `xidel [file.html] -e "//title"` in a terminal, then adjust file names, paths, flags, or remote targets for your system.
What does -e, --extract _EXPR_ do in xidel?
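It supplies an XPath or XQuery expression that xidel evaluates against the input. Repeat the option to run several extractions in one invocation.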