camelot Command: Examples, Options, and Usage

Common examples

Extract tables from a PDF and save them as CSV (-o and -f are both required unless -plot is used)

camelot -p [1] -f csv -o [output.csv] lattice [document.pdf]

Extract tables from multiple pages

camelot -p [1,2,3] -f csv -o [output.csv] lattice [document.pdf]

Extract using stream mode (for borderless tables)

camelot -p [1] -f csv -o [output.csv] stream [document.pdf]

Extract only from a given table area (x1,y1,x2,y2)

camelot -p [1] -f csv -o [output.csv] lattice -T [50,700,500,100] [document.pdf]

Plot the detected text for visual debugging

camelot -p [1] lattice -plot text [document.pdf]

Export in another format, such as JSON

camelot -p [1] -f [json] -o [output.json] lattice [document.pdf]

Description

Camelot is a Python library and CLI tool for extracting tabular data from PDF files into structured formats. It offers two extraction methods: lattice mode uses image processing to find ruling lines and their intersections, so it suits tables with visible borders; stream mode groups text by the whitespace between words, so it suits borderless tables. Camelot can process multiple pages in one run, handles cells that span rows or columns, and exports to several output formats. Visual debugging plots show what was detected on a page, which helps when tuning extraction parameters for difficult PDFs.
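Because Camelot is also a Python library, the same extraction can be sketched through its `camelot.read_pdf` function. The helper names `pick_flavor` and `extract_tables` below are illustrative and not part of Camelot; the sketch assumes the `camelot-py` package is installed and imports it lazily, so the flavor helper can be used on its own.

```python
from typing import Any


def pick_flavor(has_ruling_lines: bool) -> str:
    """Rule of thumb: bordered tables -> lattice, borderless tables -> stream."""
    return "lattice" if has_ruling_lines else "stream"


def extract_tables(pdf_path: str, pages: str = "1", has_ruling_lines: bool = True) -> Any:
    """Read tables from pdf_path and return Camelot's table list.

    pdf_path is a caller-supplied path; nothing here ships a sample PDF.
    """
    # Imported lazily so pick_flavor() works without camelot installed.
    import camelot

    return camelot.read_pdf(
        pdf_path,
        pages=pages,
        flavor=pick_flavor(has_ruling_lines),
    )
```

With Camelot installed, `extract_tables("document.pdf")` returns a table list. Each table exposes a pandas DataFrame as `.df` and quality metrics as `.parsing_report`, and `tables.export("output.csv", f="csv")` writes the results to disk.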

Options

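lattice: Subcommand that extracts tables using the ruling lines between cells (bordered tables).
stream: Subcommand that extracts tables using the whitespace between text (borderless tables).
General options (-p, -o, -f, -z, -split) go before the subcommand; -T, -C, and -plot go after it.
-p, --pages _pages_: Page numbers to process (e.g., "1", "1-5", "1,3,5", "1,4-end", "all"). Defaults to page 1.
-o, --output _file_: Output file path.
-f, --format _format_: Output format: csv, excel, html, json, markdown, sqlite.
-z, --zip: Compress the output into a ZIP archive.
-split, --split_text: Split text that spans multiple cells.
-T, --table_areas _coords_: Table boundaries as x1,y1,x2,y2, where (x1,y1) is the top-left corner and (x2,y2) the bottom-right corner in PDF coordinates.
-C, --columns _coords_: Comma-separated x-coordinates of column separators (stream only).
-plot, --plot_type _type_: Plot detected elements for debugging: text, grid, contour, joint, line (lattice) or text, grid, contour, textedge (stream).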
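The page and table-area values are plain comma-separated strings. A minimal sketch of how they can be interpreted follows, assuming Camelot's documented page syntax (numbers, ranges, `end`, `all`) and its convention that table areas are given as top-left then bottom-right corners in PDF coordinates. `expand_pages` and `parse_table_area` are hypothetical helpers for illustration, not Camelot functions.

```python
from __future__ import annotations


def expand_pages(spec: str, page_count: int) -> list[int]:
    """Expand a page spec such as "1,3-5" or "1,4-end" into page numbers."""
    if spec == "all":
        return list(range(1, page_count + 1))
    pages: set[int] = set()
    for part in spec.split(","):
        if "-" in part:
            start, end = part.split("-")
            # "end" stands for the last page of the document.
            last = page_count if end == "end" else int(end)
            pages.update(range(int(start), last + 1))
        else:
            pages.add(int(part))
    return sorted(pages)


def parse_table_area(area: str) -> tuple[float, float, float, float]:
    """Parse "x1,y1,x2,y2" and check it describes top-left then bottom-right."""
    x1, y1, x2, y2 = (float(value) for value in area.split(","))
    # The PDF origin is the bottom-left corner of the page,
    # so the top edge of the area has the larger y value.
    if not (x1 < x2 and y1 > y2):
        raise ValueError(f"expected top-left then bottom-right corners, got {area!r}")
    return x1, y1, x2, y2
```

For instance, the area `50,700,500,100` used in the examples passes this check because 700 (top) is greater than 100 (bottom), while swapping the two y values would raise an error.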

FAQ

What is the camelot command used for?

It extracts tables from PDF files into structured formats such as CSV, Excel, HTML, JSON, Markdown, or SQLite. Lattice mode handles tables with visible borders, and stream mode handles borderless tables.

How do I run a basic camelot example?

Run `camelot -p [1] -f csv -o [output.csv] lattice [document.pdf]` in a terminal, replacing the bracketed placeholders with your own page numbers and file paths.

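What do lattice and stream do in camelot?

They are the two extraction subcommands. lattice reads tables by detecting the lines between cells, and stream reads tables by using the whitespace between text.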
