
Linux command

pdftohtml command

Files

Replace the file names, directories, or options in the examples as needed.

Common examples

Convert a PDF file to HTML

pdftohtml [path/to/file.pdf] [path/to/output_file.html]

Convert a PDF file to HTML, ignoring images

pdftohtml -i [path/to/file.pdf] [path/to/output_file.html]

Generate a single HTML document containing all pages

pdftohtml -s [path/to/file.pdf] [path/to/output_file.html]

Convert a PDF file to XML

pdftohtml -xml [path/to/file.pdf] [path/to/output_file.xml]
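When several files need the same treatment, the single-document example can be wrapped in a loop. The following is a minimal sketch, not part of pdftohtml itself: the helper name convert_all is hypothetical, and it assumes the PDFs sit directly in one directory and that the output base name is the input path without its .pdf suffix.

```shell
# Hypothetical helper: run "pdftohtml -s" on every .pdf file in a directory.
convert_all() {
    dir=$1
    for pdf in "$dir"/*.pdf; do
        # Skip the literal pattern when the directory holds no PDF files.
        [ -e "$pdf" ] || continue
        # Assumption: use the input path minus ".pdf" as the output base name.
        pdftohtml -s "$pdf" "${pdf%.pdf}"
    done
}
```

Call it as convert_all path/to/directory; the generated files land next to the source PDFs.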

Description

pdftohtml converts PDF files to HTML or XML and writes embedded images as separate PNG files. It is part of the poppler-utils package and attempts to preserve the visual layout of PDF pages in the resulting HTML. By default, it generates a frameset index alongside the converted page content; in complex mode (-c), each page gets its own HTML file. The -s option creates a single document containing all pages. Images are extracted unless -i is specified. The XML output mode records the position and font of each piece of text, which is useful for further processing or text extraction.
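To illustrate how the XML output can feed further processing, here is a rough sketch that pulls text coordinates out of it. The helper name text_positions is hypothetical, and the sketch assumes that each text element sits on its own line and that its attributes begin with top and left, in that order; check a sample of your own output before relying on it.

```shell
# Hypothetical helper: read pdftohtml XML on stdin and print "top left"
# for every <text> element.
# Assumption: one <text> element per line, attributes starting with top, left.
text_positions() {
    sed -n 's/.*<text top="\([0-9]*\)" left="\([0-9]*\)".*/\1 \2/p'
}
```

One possible use is pdftohtml -xml -i -stdout path/to/file.pdf | text_positions, where -stdout sends the XML to standard output instead of a file.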

Options

-i
Ignore images
-s
Generate single HTML file for all pages
-xml
Output as XML instead of HTML
-c
Generate complex output (more accurate layout)
-hidden
Force extraction of hidden text
-f _n_
First page to convert
-l _n_
Last page to convert
-zoom _factor_
Zoom factor (default: 1.5)
-noframes
Generate output without frames (not supported in complex output mode)
-enc _encoding_
Output encoding (default: UTF-8)
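
The options above can be combined. As a minimal sketch, the hypothetical helper below (convert_range is not part of pdftohtml) converts only a page range to frameless, image-free HTML using -f, -l, -noframes, and -i:

```shell
# Hypothetical helper: convert a page range to frameless HTML without images.
# $1 = input PDF, $2 = first page, $3 = last page, $4 = output base name
convert_range() {
    # -f/-l select the page range, -noframes disables the frameset,
    # and -i skips image extraction.
    pdftohtml -f "$2" -l "$3" -noframes -i "$1" "$4"
}
```

For example, convert_range path/to/file.pdf 2 5 path/to/output_file converts pages 2 through 5.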

FAQ

What is the pdftohtml command used for?

pdftohtml converts PDF files to HTML or XML while attempting to preserve the visual layout of each page. It is part of the poppler-utils package; the description above covers the output files and the main options.

How do I run a basic pdftohtml example?

Run `pdftohtml [path/to/file.pdf] [path/to/output_file.html]` in a terminal, replacing the placeholder paths with your own and adding options such as -i or -s as needed.

What does -i do in pdftohtml?

The -i option tells pdftohtml to ignore images in the PDF, so no image files are extracted.