Linux command
beautifulsoup command
Files
Some commands involve pipes, overwriting, or deletion; confirm paths and arguments before running them.
Common examples
Parse HTML and find all links
python3 -c "from bs4 import BeautifulSoup; import requests; print([a['href'] for a in BeautifulSoup(requests.get('[url]').text, 'html.parser').find_all('a', href=True)])"
Extract text from HTML file
python3 -c "from bs4 import BeautifulSoup; print(BeautifulSoup(open('[file.html]'), 'html.parser').get_text())"
Find elements by CSS class
python3 -c "from bs4 import BeautifulSoup; soup=BeautifulSoup(open('[file.html]'), 'html.parser'); print(soup.find_all(class_='[classname]'))"
Find element by ID
python3 -c "from bs4 import BeautifulSoup; soup=BeautifulSoup(open('[file.html]'), 'html.parser'); print(soup.find(id='[element_id]'))"
Select elements with CSS selector
python3 -c "from bs4 import BeautifulSoup; soup=BeautifulSoup(open('[file.html]'), 'html.parser'); print(soup.select('[div.class > p]'))"
Pretty print parsed HTML
python3 -c "from bs4 import BeautifulSoup; print(BeautifulSoup(open('[file.html]'), 'html.parser').prettify())"
Description
Beautiful Soup is a Python library for parsing HTML and XML documents. It is not a command-line tool itself, but it is commonly used in Python one-liners and scripts for web scraping, data extraction, and HTML manipulation. The library builds a parse tree from a document, which can then be navigated, searched, and modified. It works with multiple parsers (Python's built-in html.parser, plus lxml and html5lib when they are installed) and tolerates malformed markup, which makes it well suited to scraping real-world websites. Beautiful Soup provides Pythonic idioms for working with the parse tree, including iteration, attribute access, and CSS selector support. Combined with requests for HTTP, it is a common basis for Python web scraping workflows.
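The navigation, search, and modification described above can be sketched as a short script. This is a minimal illustration, assuming the beautifulsoup4 package is installed; the sample markup and the variable names are made up for the example.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, used only for illustration.
html = """
<div id="main" class="content">
  <p class="intro">Hello <b>world</b></p>
  <ul>
    <li><a href="/a">First</a></li>
    <li><a href="/b">Second</a></li>
  </ul>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# Search: look up one element by id, and all <a> tags that carry an href.
main = soup.find(id="main")
hrefs = [a["href"] for a in soup.find_all("a", href=True)]

# CSS selector: direct-child chain from the div down to the links.
labels = [a.get_text() for a in soup.select("div.content > ul > li > a")]

# Navigation: attribute access returns the first matching descendant tag.
bold = main.p.b
parent_name = bold.parent.name

# Modification: replace the text inside <b>, then read the paragraph back.
bold.string = "there"
intro_text = main.p.get_text()

print(hrefs)
print(labels)
print(parent_name, intro_text)
```

Working on an inline string keeps the sketch independent of any file or network access; the same calls apply to a document loaded with open() or fetched with requests.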
FAQ
What is the beautifulsoup command used for?
There is no standalone beautifulsoup command. Beautiful Soup is a Python library for parsing HTML and XML documents, and it is typically run through python3 one-liners or scripts for web scraping, data extraction, and HTML manipulation. See the Description section for details.
How do I run a basic beautifulsoup example?
Install the libraries first with `pip install beautifulsoup4 requests`, then run `python3 -c "from bs4 import BeautifulSoup; import requests; print([a['href'] for a in BeautifulSoup(requests.get('[url]').text, 'html.parser').find_all('a', href=True)])"` in a terminal, replacing `[url]` with the address of the page you want to parse.
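For repeated use, the same link extraction may be easier to read as a small function. The sketch below is one possible shape, not part of Beautiful Soup itself: the name extract_links, the base_url parameter, and the sample markup are hypothetical, and resolving relative links with urljoin is an addition that the one-liner does not perform.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup


def extract_links(html, base_url=""):
    """Return the href of every <a> tag in html, resolved against base_url.

    Hypothetical helper for illustration; <a> tags without href are skipped.
    """
    soup = BeautifulSoup(html, "html.parser")
    return [urljoin(base_url, a["href"]) for a in soup.find_all("a", href=True)]


# Hypothetical sample markup so the sketch runs without network access.
sample = (
    '<p><a href="/docs">Docs</a> <a>no href</a> '
    '<a href="https://example.org/x">X</a></p>'
)

links = extract_links(sample, base_url="https://example.com/start/")
print(links)
```

To use it on a live page, pass requests.get(url).text as html and the same url as base_url.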
Where can I find more beautifulsoup examples?
This page includes 6 examples for beautifulsoup, plus related commands for nearby Linux tasks.