Linux command
pdftotext command
Files
After copying, replace file names, directories, or parameters as needed.
Common examples
Extract text and print it to standard output
pdftotext [file.pdf] -
Extract text and save it to a file
pdftotext [file.pdf] [output.txt]
Extract text preserving layout
pdftotext -layout [file.pdf] [output.txt]
Extract text from specific pages
pdftotext -f [1] -l [5] [file.pdf] [output.txt]
Extract text in content stream order (raw mode)
pdftotext -raw [file.pdf] [output.txt]
Extract text from a password-protected PDF
pdftotext -upw [password] [file.pdf] [output.txt]
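The page-range example above can be wrapped in a small shell function that checks the range before calling pdftotext. This is a minimal sketch: `extract_pages` and `is_positive_int` are hypothetical helper names, not part of pdftotext, and only the `-f` and `-l` options listed on this page are used.

```shell
# Hypothetical helper: succeeds only for whole numbers >= 1 without leading zeros.
is_positive_int() {
    case "$1" in
        ''|*[!0-9]*|0*) return 1 ;;
        *) return 0 ;;
    esac
}

# Hypothetical helper: extract_pages FIRST LAST input.pdf output.txt
extract_pages() {
    local first="$1" last="$2" pdf="$3" out="$4"

    if ! is_positive_int "$first" || ! is_positive_int "$last"; then
        echo "extract_pages: page numbers must be positive integers" >&2
        return 2
    fi
    if [ "$first" -gt "$last" ]; then
        echo "extract_pages: first page is after last page" >&2
        return 2
    fi

    # -f sets the first page and -l the last page to convert.
    pdftotext -f "$first" -l "$last" "$pdf" "$out"
}
```

For example, `extract_pages 1 5 file.pdf output.txt` runs the same command as the page-range example, while `extract_pages 5 1 file.pdf output.txt` is rejected before pdftotext is called.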
Description
pdftotext converts Portable Document Format (PDF) files to plain text. It extracts the text content from PDF documents while optionally attempting to preserve the visual layout of the original document. The program is part of the poppler-utils package (or xpdf-utils on older systems) and handles most PDF text extraction needs. It can process encrypted PDFs when provided with the appropriate password and supports various output encodings. Common use cases include making PDF content searchable, extracting text for further processing, creating accessible versions of documents, and feeding PDF content into text analysis pipelines.
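As an illustration of the "searchable content" and "text analysis pipeline" use cases, the sketch below writes the extracted text to standard output and pipes it into grep. `pdf_count_matches` is a hypothetical helper name, and the case-insensitive line count is just one possible downstream step.

```shell
# Hypothetical helper: pdf_count_matches PATTERN input.pdf
# Prints the number of extracted text lines that match PATTERN (case-insensitive).
pdf_count_matches() {
    local pattern="$1" pdf="$2"

    # "-" as the output file makes pdftotext write to standard output;
    # -q suppresses messages so only the extracted text enters the pipe.
    pdftotext -q "$pdf" - | grep -i -c -e "$pattern"
}
```

Note that grep exits with a non-zero status when no line matches, so a count of 0 also makes the function return failure.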
Options
- -f _number_: First page to convert (default: 1)
- -l _number_: Last page to convert (default: last page)
- -layout: Maintain original physical layout of the text
- -simple: Simple one-column page layout
- -table: Table mode, similar to layout but optimized for tables
- -lineprinter: Line printer mode with fixed-pitch font metrics
- -raw: Keep strings in content stream order
- -fixed _number_: Assume fixed-pitch font with specified character width
- -enc _encoding_: Output text encoding (Latin1, UTF-8, etc.)
- -nopgbrk: Don't insert page breaks between pages
- -opw _password_: Owner password for encrypted PDF
- -upw _password_: User password for encrypted PDF
- -q: Quiet mode, suppress messages and errors
- -v: Print version information
- -h: Print usage information
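Several of the options above can be combined in one call. The following sketch converts every PDF in a directory, keeping the layout, writing UTF-8, and omitting page breaks. `convert_dir` is a hypothetical helper name, and writing each `.txt` file next to its PDF is an assumption made for the example.

```shell
# Hypothetical helper: convert_dir DIRECTORY
# Writes name.txt next to each name.pdf found directly in DIRECTORY.
convert_dir() {
    local dir="$1" pdf

    for pdf in "$dir"/*.pdf; do
        # Skip the unexpanded pattern when the directory contains no PDFs.
        [ -e "$pdf" ] || continue

        # ${pdf%.pdf} strips the .pdf suffix so the output gets a .txt extension.
        pdftotext -layout -enc UTF-8 -nopgbrk "$pdf" "${pdf%.pdf}.txt"
    done
}
```

Quoting `"$pdf"` keeps file names with spaces intact. Swap `-layout` for `-raw` if content stream order matters more than visual layout.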
FAQ
What is the pdftotext command used for?
It converts PDF files to plain text, optionally attempting to preserve the visual layout of the original document. It is part of the poppler-utils package (or xpdf-utils on older systems) and can process encrypted PDFs when given the appropriate password.
How do I run a basic pdftotext example?
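Run `pdftotext [file.pdf] -` in a terminal to print the extracted text to standard output, replacing `[file.pdf]` with the path to your PDF. To save the text instead, replace `-` with an output file name such as `[output.txt]`.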
What does -f _number_ do in pdftotext?
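It sets the first page to convert (default: 1). Combine it with -l _number_ to limit extraction to a page range, for example `pdftotext -f [1] -l [5] [file.pdf] [output.txt]`.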