Linux command
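datamash command
Text
These examples use pipes; if you adapt them to overwrite or delete files, confirm the paths and arguments before running them.
Common examples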
Example: get the max, min, mean, and median of a single column of numbers
seq 3 | datamash max 1 min 1 mean 1 median 1
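Example: get the mean of a column of floats in a locale that uses "," as the decimal separator (tr converts "." to ",")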
echo -e '1.0\n2.5\n3.1' | tr '.' ',' | datamash mean 1
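Example: get the mean of a single column, rounded to a given number of decimal places (replace [decimals] with a number)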
echo -e '1\n2\n3' | datamash -R [decimals] mean 1
Example: get the mean of a single column, ignoring "Na" and "NaN" entries
echo -e '1\n2\nNa\n3\nNaN' | datamash --narm mean 1
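The float and rounding examples above depend on the locale and on a placeholder value. The following is a minimal sketch of locale-independent variants, assuming datamash is installed and recent enough to support -R; the sample numbers and variable names are made up for illustration.

```shell
# Force the C locale so "." is the decimal separator regardless of user settings.
mean_c=$(printf '1.0\n2.5\n3.1\n' | LC_ALL=C datamash mean 1)
echo "$mean_c"

# Concrete value in place of [decimals]: round the mean to 2 decimal places.
mean_rounded=$(printf '1\n2\n2\n' | LC_ALL=C datamash -R 2 mean 1)
echo "$mean_rounded"
```

Using printf instead of echo -e avoids differences in how shells interpret backslash escapes.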
Description
datamash performs basic numeric, textual, and statistical operations on input data from the command line. It is designed for quick data analysis tasks that would otherwise require scripting or statistical software, and supports operations such as sum, mean, median, standard deviation, and variance. Input is read from stdin or files; fields are separated by tabs by default, by runs of whitespace with -W, or by a delimiter given with -t. The tool can group data by fields and compute aggregate statistics for each group, similar to SQL's GROUP BY. datamash is part of the GNU project and is well suited to one-liners for data exploration. It is commonly used in pipelines to analyze CSV files, log data, or other tabular text. Besides numeric operations it offers textual ones, including counting unique values, string operations, and picking a random value from a group.
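The grouping behavior described above can be sketched as follows. This is an illustrative example, not taken from the original page: the sample rows and variable names are hypothetical, and -s is added so that the input is sorted by the group field before aggregation.

```shell
# Tab-separated "category<TAB>value" rows (the default field separator is TAB).
# -s sorts the input, -g 1 groups by field 1, then sum and mean run on field 2.
grouped=$(printf 'a\t10\nb\t5\na\t20\n' | LC_ALL=C datamash -s -g 1 sum 2 mean 2)
echo "$grouped"

# Space-separated input needs -W so runs of whitespace count as separators.
counted=$(printf 'a 10\nb 5\na 20\n' | LC_ALL=C datamash -W -s -g 1 count 2)
echo "$counted"
```

Each output line holds the group key followed by one column per requested operation, much like the result of a GROUP BY query.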
Options
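- -R, --round _digits_: Round numeric output to the specified number of decimal places
- --narm: Skip NA and NaN values
- -t, --field-separator _char_: Use _char_ as the field separator instead of TAB
- -g, --group _fields_: Group by the specified comma-separated fields (input must be sorted by those fields, or add -s)
- -H, --headers: Treat the first input line as a header and print a header line in the output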
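As a rough illustration of how these options combine, the sketch below summarizes a small CSV with a header row. The data, column names, and variable names are hypothetical, and the example assumes a datamash version that supports -R.

```shell
# Hypothetical CSV with a header line.
csv='dept,salary
eng,100
ops,80
eng,110'

# -t ',' : comma-separated fields     -H : header on input and output
# -s     : sort before grouping       -g 1 : group by the first field
# -R 1   : round numeric output to 1 decimal place
summary=$(printf '%s\n' "$csv" | LC_ALL=C datamash -t ',' -H -s -g 1 -R 1 mean 2)
echo "$summary"
```

With -H, the first output line is a generated header that names the grouping field and each operation, followed by one line per group.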
FAQ
What is the datamash command used for?
datamash performs basic numeric, textual, and statistical operations (such as sum, mean, median, standard deviation, and variance) on tabular text from stdin or files, optionally grouped by one or more fields. See the Description section above for details.
How do I run a basic datamash example?
Run `seq 3 | datamash max 1 min 1 mean 1 median 1` in a terminal; it prints the maximum, minimum, mean, and median of the numbers 1 to 3. Then replace `seq 3` with your own input and adjust the operations and field numbers as needed.
What does -R, --round _digits_ do in datamash?
It rounds numeric output to the specified number of decimal places.