Linux command

csvclean command

Text

Copy a command and replace the bracketed file names, directories, or parameters as needed.

Common examples

Check for rows with length mismatches

csvclean --length-mismatch [data.csv]

Report length mismatches and omit the offending rows from the output

csvclean --length-mismatch --omit-error-rows [data.csv]

Report empty columns

csvclean --empty-columns [data.csv]

Enable all checks

csvclean -a [data.csv]

Fix short rows by joining

csvclean --join-short-rows [data.csv]

Fill short rows

csvclean --fill-short-rows --fillvalue "N/A" [data.csv]

Validate with a custom delimiter and input encoding

csvclean --length-mismatch -d "[;]" -e [latin1] [data.csv]
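
When these commands are scripted rather than typed, the clean rows and the error report can be captured separately. The following is a minimal Python sketch, assuming csvkit 2.0 or later is installed and `csvclean` is on the PATH. The helper names `build_csvclean_args` and `run_csvclean` are hypothetical and not part of csvkit.

```python
import subprocess


def build_csvclean_args(path, checks=("--length-mismatch",), delimiter=None, encoding=None):
    """Assemble a csvclean argument list (hypothetical helper, not part of csvkit)."""
    args = ["csvclean", *checks]
    if delimiter is not None:
        args += ["-d", delimiter]
    if encoding is not None:
        args += ["-e", encoding]
    args.append(path)
    return args


def run_csvclean(path, **options):
    """Run csvclean and return (stdout_text, stderr_text, exit_status)."""
    result = subprocess.run(
        build_csvclean_args(path, **options),
        capture_output=True,  # keep standard output and standard error apart
        text=True,
        check=False,  # leave the exit status for the caller to inspect
    )
    return result.stdout, result.stderr, result.returncode
```

For instance, `run_csvclean("data.csv", delimiter=";", encoding="latin1")` would mirror the last example above, with `data.csv` standing in for a real file.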

Description

csvclean is the csvkit tool for validating and cleaning CSV files. It detects common structural problems such as inconsistent column counts and empty columns, and it can repair short rows. Since csvkit 2.0, csvclean no longer reports or fixes errors by default: you must explicitly enable checks (such as --length-mismatch or --empty-columns) or fixes (such as --join-short-rows or --fill-short-rows). The cleaned CSV is written to standard output and the error report to standard error, so the two streams can be redirected separately. The tool accepts input that uses different delimiters, quote characters, and encodings, which makes it useful for preprocessing messy data before analysis.
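
To make the length check concrete, here is a small pure-Python sketch of the idea behind --length-mismatch and --omit-error-rows. It is an illustration only, not csvkit's implementation: the function name `check_length_mismatch`, the numbering of data rows from 1, and the exact message wording are assumptions.

```python
import csv
import io


def check_length_mismatch(text, omit_error_rows=False):
    """Return (output_rows, errors) for CSV text; an illustration, not csvkit code."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    output_rows = [header]
    errors = []
    # Assumption: data rows are numbered from 1 and the header is not counted.
    for line_number, row in enumerate(reader, start=1):
        mismatch = len(row) != len(header)
        if mismatch:
            message = f"Expected {len(header)} columns, found {len(row)} columns"
            errors.append((line_number, message, row))
        # Rows with errors stay in the output unless omit_error_rows is set.
        if not (mismatch and omit_error_rows):
            output_rows.append(row)
    return output_rows, errors


sample = "a,b,c\n1,2,3\n4,5\n6,7,8,9\n"
rows, errors = check_length_mismatch(sample, omit_error_rows=True)
for line_number, message, row in errors:
    print(line_number, message, row)
```

In this sketch the two return values play the roles of standard output and standard error: `rows` is the data that would be passed along, and `errors` is the report.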

Options

--length-mismatch
Report rows that are shorter or longer than the header row.
--empty-columns
Report empty columns as errors.
-a, --enable-all-checks
Enable all error reporting checks.
--join-short-rows
Merge consecutive short rows into a single row.
--separator _SEPARATOR_
String used to join short rows (default: newline).
--fill-short-rows
Pad rows that are shorter than the header row with the missing cells.
--fillvalue _VALUE_
Value used to fill short rows (default: empty string).
--omit-error-rows
Exclude rows containing errors from standard output.
--label _LABEL_
Add a label column to the error report on standard error; useful in automated workflows.
--header-normalize-space
Strip leading and trailing whitespace from headers and collapse runs of whitespace into a single space.
-d _CHAR_, --delimiter _CHAR_
Field delimiter (default: comma).
-t, --tabs
Use tabs as delimiter.
-q _CHAR_, --quotechar _CHAR_
Quote character (default: double quote).
-p _CHAR_, --escapechar _CHAR_
Escape character for the delimiter or quote character.
-e _ENCODING_, --encoding _ENCODING_
Input file encoding.
-S, --skipinitialspace
Ignore whitespace immediately following the delimiter.
-H, --no-header-row
Specify that the input file has no header row.
-K _N_, --skip-lines _N_
Skip the first N lines of the input file.
-v, --verbose
Print detailed tracebacks when errors occur.
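
The effect of --fill-short-rows with --fillvalue can be pictured with a short Python sketch. This is an illustration of the padding idea under the descriptions above, not csvkit's code; the function name `fill_short_rows` is hypothetical.

```python
def fill_short_rows(rows, fillvalue=""):
    """Pad rows shorter than the header (rows[0]) with fillvalue; an illustration only."""
    width = len(rows[0])
    # For rows that are already long enough, the padding list is empty,
    # so they pass through unchanged.
    return [row + [fillvalue] * (width - len(row)) for row in rows]


table = [["id", "name", "city"], ["1", "Ada"], ["2", "Linus", "Helsinki"]]
for row in fill_short_rows(table, fillvalue="N/A"):
    print(row)
```

Rows that are longer than the header are not touched by this sketch; reporting those is the job of the length check.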

FAQ

How do I run a basic csvclean example?

Run `csvclean --length-mismatch [data.csv]` in a terminal, replacing `[data.csv]` with the path to your file, then add or change flags as needed.

What does --length-mismatch do in csvclean?

It reports data rows that are shorter or longer than the header row.