Linux command

camelot command

Text

After copying, replace file names, directories, or parameters as needed.

Common examples

Extract tables from page 1 and save as CSV (lattice mode, for bordered tables)

camelot -p [1] -f csv -o [output.csv] lattice [document.pdf]

Extract tables from multiple pages

camelot -p [1,2,3] -f csv -o [output.csv] lattice [document.pdf]

Extract using stream mode (for borderless tables)

camelot -p [1] -f csv -o [output.csv] stream [document.pdf]

Extract with table area specification

camelot -p [1] -f csv -o [output.csv] lattice -T [50,700,500,100] [document.pdf]

Generate a visual debugging plot

camelot -p [1] lattice -plot text [document.pdf]

Export to another format such as JSON

camelot -p [1] -f [json] -o [output.json] lattice [document.pdf]

Description

Camelot is a Python library and CLI tool for extracting tabular data from PDF files. It uses computer vision and lattice detection algorithms to identify tables and extract their contents into structured formats. Two extraction methods are available: lattice mode detects tables with visible borders by looking for intersecting lines, while stream mode finds tables based on whitespace patterns, suitable for borderless tables. The tool handles multi-page extraction, merged cells, and various output formats. Visual debugging helps understand how tables are detected and tune extraction parameters for difficult PDFs.
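
The same extraction can be driven from Python instead of the shell. The following is a minimal sketch, assuming the `camelot-py` package is installed and that `camelot.read_pdf` accepts `pages` and `flavor` keyword arguments and returns tables exposing a `.df` DataFrame. The helper names `build_read_kwargs` and `extract_tables` are hypothetical and exist only for illustration.

```python
from pathlib import Path


def build_read_kwargs(pages="1", flavor="lattice"):
    """Validate the inputs and return keyword arguments for camelot.read_pdf.

    Hypothetical helper: only the two modes described above are accepted.
    """
    if flavor not in ("lattice", "stream"):
        raise ValueError(f"unsupported flavor: {flavor!r}")
    return {"pages": pages, "flavor": flavor}


def extract_tables(pdf_path, pages="1", flavor="lattice"):
    """Return one pandas DataFrame per table detected in the PDF."""
    # Imported lazily so build_read_kwargs can be used without camelot-py.
    import camelot

    tables = camelot.read_pdf(str(Path(pdf_path)), **build_read_kwargs(pages, flavor))
    return [table.df for table in tables]


# Example call (requires camelot-py and a real PDF on disk):
#   frames = extract_tables("document.pdf", pages="1", flavor="stream")
```

Use `flavor="lattice"` when the table has ruling lines, and `flavor="stream"` when columns are separated only by whitespace.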

Parameters

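lattice
Subcommand: extract bordered tables by detecting ruling lines.
stream
Subcommand: extract borderless tables from whitespace between text.
-p, --pages _pages_
Page numbers to process (e.g., "1", "1-5", "1,3,5"). Placed before the subcommand.
-o, --output _file_
Output file path. Placed before the subcommand.
-f, --format _format_
Output format: csv, excel, html, json, markdown, sqlite. Placed before the subcommand.
-z, --zip
Compress the output into a ZIP archive.
-split, --split_text
Split text that spans multiple cells.
-T, --table_areas _coords_
Table boundaries as x1,y1,x2,y2, where (x1, y1) is the top-left corner and (x2, y2) the bottom-right corner in PDF coordinate space. Placed after the subcommand.
-C, --columns _coords_
X coordinates of column separators (stream only). Placed after the subcommand.
-plot, --plot_type _type_
Generate debug plot: text, grid, contour, joint, line (joint and line apply to lattice only). Placed after the subcommand.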
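
To make the page and table-area value formats concrete, here is a small illustrative sketch. It is not Camelot's own parser: `expand_pages` and `parse_table_area` are hypothetical helpers, and the sketch assumes that (x1, y1) is the top-left corner and (x2, y2) the bottom-right corner in PDF coordinate space, where the origin sits at the bottom-left of the page.

```python
def expand_pages(spec):
    """Expand a page spec such as "1", "1-5", or "1,3,5" into a sorted list."""
    pages = set()
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            start, end = (int(n) for n in part.split("-", 1))
            if start > end:
                raise ValueError(f"descending range: {part!r}")
            pages.update(range(start, end + 1))
        else:
            pages.add(int(part))
    return sorted(pages)


def parse_table_area(coords):
    """Parse "x1,y1,x2,y2" into a tuple of four floats and sanity-check it."""
    values = [float(v) for v in coords.split(",")]
    if len(values) != 4:
        raise ValueError(f"expected 4 coordinates, got {len(values)}")
    x1, y1, x2, y2 = values
    # Assumption: y grows upward, so the top edge (y1) is larger than the bottom (y2).
    if not (x1 < x2 and y1 > y2):
        raise ValueError("expected x1 < x2 and y1 > y2")
    return x1, y1, x2, y2
```

Under that assumption, the area `50,700,500,100` from the examples spans x from 50 to 500 and y from 700 (top) down to 100 (bottom).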
