
Linux command

pdftotext command

Files

Copy a command, then replace the file names, directories, or arguments as needed.

Common examples

Extract text and print it to standard output

pdftotext [file.pdf] -

Extract text to a named file (if the output name is omitted, file.pdf is converted to file.txt)

pdftotext [file.pdf] [output.txt]

Extract text preserving layout

pdftotext -layout [file.pdf] [output.txt]

Extract text from specific pages

pdftotext -f [1] -l [5] [file.pdf] [output.txt]

Extract raw text

pdftotext -raw [file.pdf] [output.txt]

Extract text from a password-protected PDF

pdftotext -upw [password] [file.pdf] [output.txt]
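The examples above convert one file at a time. To convert a whole folder, the same command can be wrapped in a loop. This is a minimal sketch, not part of pdftotext itself: the function name pdf_dir_to_text is hypothetical, and it assumes POSIX sh and that each output file should sit next to its PDF with a .txt extension.

```shell
# pdf_dir_to_text: hypothetical helper (not part of poppler-utils).
# Converts every *.pdf in a directory (default: current directory) to a .txt
# file next to it, using -layout to keep the physical page layout.
pdf_dir_to_text() {
  dir=${1:-.}
  for pdf in "$dir"/*.pdf; do
    # If no file matches, the glob stays literal and does not exist; skip it.
    [ -e "$pdf" ] || continue
    # ${pdf%.pdf} strips the .pdf suffix so report.pdf becomes report.txt.
    pdftotext -layout "$pdf" "${pdf%.pdf}.txt"
  done
}
```

Usage: pdf_dir_to_text ~/Documents/reports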

Description

pdftotext converts Portable Document Format (PDF) files to plain text. It extracts the text embedded in a PDF and can optionally try to preserve the visual layout of the original pages. The program ships in the poppler-utils package (older systems may provide the Xpdf version through xpdf-utils) and covers most text-extraction tasks, although it cannot recover text that exists only as scanned images; such files need OCR first. It can open encrypted PDFs when given the correct password, and it can write its output in several encodings. Common uses include making PDF content searchable, extracting text for further processing, producing accessible versions of documents, and feeding PDF content into text-analysis pipelines.
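Because an output name of - sends the text to standard output, pdftotext fits naturally into pipelines. The sketch below searches the text of a PDF with grep; the function name pdf_grep is hypothetical, and the choice of case-insensitive matching with line numbers is just one reasonable default.

```shell
# pdf_grep: hypothetical helper that searches the text layer of a PDF.
# Usage: pdf_grep PATTERN FILE.pdf
pdf_grep() {
  # -q silences pdftotext messages; "-" writes the extracted text to stdout.
  # grep -n prefixes each match with its line number in the extracted text,
  # -i ignores case, and -- protects patterns that start with a dash.
  pdftotext -q "$2" - | grep -n -i -- "$1"
}
```

Usage: pdf_grep "invoice number" statement.pdf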

Options

-f _number_
First page to convert (default: 1)
-l _number_
Last page to convert (default: last page)
-layout
Maintain original physical layout of the text
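-simple
Simple one-column page layout (Xpdf version only; not available in Poppler's pdftotext)
-table
Table mode, similar to -layout but optimized for keeping table rows and columns aligned (Xpdf version only)
-lineprinter
Line printer mode with strict fixed character pitch and height (Xpdf version only)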
-raw
Keep strings in content stream order
-fixed _number_
Assume fixed-pitch (or tabular) text with the specified character width, in points; this forces physical layout mode
-enc _encoding_
Output text encoding (Latin1, UTF-8, etc.)
-nopgbrk
Don't insert page breaks (form feed characters) between pages
-opw _password_
Owner password for encrypted PDF
-upw _password_
User password for encrypted PDF
-q
Quiet mode, suppress messages and errors
-v
Print version information
-h
Print usage information
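Unless -nopgbrk is given, each page in the output ends with a form feed character, so the number of pages that made it into a text file can be counted from the output alone. A minimal sketch, assuming the file was produced without -nopgbrk; the function name count_text_pages is hypothetical.

```shell
# count_text_pages: hypothetical helper that counts pages in pdftotext output.
# Assumes the text file was written WITHOUT -nopgbrk, so every page ends
# with a form feed (\f) character.
count_text_pages() {
  # tr -cd deletes everything except form feeds, wc -c counts them,
  # and the final tr strips the padding some wc implementations add.
  tr -cd '\f' < "$1" | wc -c | tr -d ' '
}
```

Usage: pdftotext report.pdf report.txt && count_text_pages report.txt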

FAQ

What is the pdftotext command used for?

It extracts the text of PDF files as plain text, written either to a .txt file or, with `-` as the output name, to standard output, so the content can be searched, processed, or analyzed with other tools.

How do I run a basic pdftotext example?

Run `pdftotext [file.pdf] -` in a terminal to print the text of the PDF to standard output. Replace [file.pdf] with your own file, give an output file name instead of `-` to save the text, and add options as needed.

What does -f _number_ do in pdftotext?

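It sets the first page to convert (default: 1). Combine it with -l to extract a page range, for example `pdftotext -f 1 -l 5 file.pdf output.txt` converts pages 1 through 5.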
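Setting -f and -l to the same number extracts a single page, which makes it possible to write one text file per page. A minimal sketch: the function name split_pdf_text and the page-N.txt naming are hypothetical, and it assumes pdfinfo (also part of poppler-utils) is installed and prints a "Pages:" line.

```shell
# split_pdf_text: hypothetical helper that writes each page of a PDF to its
# own text file (page-1.txt, page-2.txt, ...) in the current directory.
# Assumes pdfinfo from poppler-utils is available.
split_pdf_text() {
  pdf=$1
  # pdfinfo prints a line such as "Pages:          12"; keep only the number.
  pages=$(pdfinfo "$pdf" | awk '/^Pages:/ {print $2}')
  [ -n "$pages" ] || return 1
  i=1
  while [ "$i" -le "$pages" ]; do
    # -f and -l set to the same value extract exactly one page.
    pdftotext -f "$i" -l "$i" "$pdf" "page-$i.txt" || return 1
    i=$((i + 1))
  done
}
```

Usage: split_pdf_text manual.pdf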