Linux command
vcftools command
Files
Copy a command, then replace the file names, directories, or parameters as needed.
Common examples
Keep only variants on one chromosome
vcftools --vcf [input.vcf] --chr [chr1] --recode --out [output]
Calculate allele frequency
vcftools --vcf [input.vcf] --freq --out [output]
Extract specific individuals
vcftools --vcf [input.vcf] --keep [individuals.txt] --recode --out [output]
Filter by minimum quality score
vcftools --vcf [input.vcf] --minQ [30] --recode --out [output]
Calculate mean depth per individual
vcftools --vcf [input.vcf] --depth --out [output]
Filter by minor allele frequency
vcftools --vcf [input.vcf] --maf [0.05] --recode --out [output]
Read compressed VCF
vcftools --gzvcf [input.vcf.gz] --freq --out [output]
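The --keep example above expects a plain-text file with one sample ID per line. The following is a minimal sketch of one way to build that file from the sample columns of the VCF's #CHROM header line. The toy file example.vcf, its samples S1 to S3, and the names all_samples.txt and individuals.txt are hypothetical stand-ins for your own data. For a gzipped input, pipe `gzip -dc input.vcf.gz` into the same grep instead of reading the file directly.

```shell
# Hypothetical toy VCF with three samples (S1, S2, S3). It is written with
# printf so that the tab separators VCF requires are preserved exactly.
{
  printf '##fileformat=VCFv4.2\n'
  printf '##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">\n'
  printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2\tS3\n'
  printf 'chr1\t100\t.\tA\tG\t50\tPASS\t.\tGT\t0/1\t0/0\t1/1\n'
} > example.vcf

# Sample IDs are the tab-separated columns after FORMAT (column 10 onward)
# on the #CHROM header line; convert them to one ID per line.
grep -m1 '^#CHROM' example.vcf | cut -f10- | tr '\t' '\n' > all_samples.txt

# Drop S3 (whole-line match) to get the list of individuals to keep.
grep -vx 'S3' all_samples.txt > individuals.txt

cat individuals.txt
```

The resulting individuals.txt can be passed to --keep, or used as-is with --remove to exclude those samples instead.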
Description
VCFtools is a suite of utilities for analyzing Variant Call Format (VCF) and Binary Call Format (BCF) files, the standard formats for storing genomic sequence variation. It filters, manipulates, and computes statistics from variant data. Variants can be filtered by quality score, allele frequency, missing data, genomic region, and individual sample. Supported population-genetics statistics include allele frequencies, nucleotide diversity, Fst, linkage disequilibrium, and relatedness. VCFtools can also convert VCF data to other formats, compare VCF files, and extract subsets of the data for downstream analysis. Each run writes its results to files named after the --out prefix plus an analysis-specific extension (for example, --freq writes prefix.frq), together with a prefix.log file.
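Because every analysis follows the same --out naming scheme, several analyses on one input can be run back to back. The following is a minimal sketch that assumes a hypothetical two-sample toy.vcf with GT and DP fields and hypothetical run_* prefixes. Each analysis gets its own prefix so that its prefix.log is not overwritten by the next run. The vcftools calls are skipped when the binary is not on PATH.

```shell
# Hypothetical toy input: one biallelic site, two samples, with per-sample
# depth (DP) so that --depth has something to average.
{
  printf '##fileformat=VCFv4.2\n'
  printf '##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">\n'
  printf '##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Read depth">\n'
  printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2\n'
  printf 'chr1\t100\t.\tA\tG\t50\tPASS\t.\tGT:DP\t0/1:12\t1/1:9\n'
} > toy.vcf

if command -v vcftools >/dev/null 2>&1; then
  # One prefix per analysis, so each run keeps its own .log file:
  #   --freq  -> run_freq.frq     + run_freq.log
  #   --depth -> run_depth.idepth + run_depth.log
  for analysis in freq depth; do
    vcftools --vcf toy.vcf "--$analysis" --out "run_$analysis"
  done
  ls run_freq.* run_depth.*
fi
```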
Parameters
- --vcf _file_
- Input VCF file (v4.0, v4.1, or v4.2).
- --gzvcf _file_
- Input compressed (gzipped) VCF file.
- --bcf _file_
- Input BCF2 format file.
- --out _prefix_
- Output file prefix (default: out). Results are written to prefix.extension, and a run log to prefix.log.
- --recode
- Write a new VCF file containing only the sites and individuals that pass the filters (prefix.recode.vcf).
- --recode-INFO-all
- Keep all INFO fields in the --recode output. By default, --recode drops INFO fields.
- --chr _name_
- Include only sites on the specified chromosome. Can be given more than once.
- --keep _file_
- Retain only individuals listed in file (one ID per line).
- --remove _file_
- Remove individuals listed in file.
- --maf _float_
- Include only sites with a minor allele frequency greater than or equal to this value.
- --minQ _float_
- Include only sites whose QUAL value is above this threshold.
- --freq
- Output the allele frequency of each site (prefix.frq).
- --depth
- Output the mean depth per individual (prefix.idepth).
- --relatedness
- Output a relatedness statistic for each pair of individuals (prefix.relatedness).
- --hap-r2
- Output the linkage disequilibrium statistics r2, D, and D' computed from phased haplotypes (prefix.hap.ld).
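Filters from the list above can be combined in a single run; a site or individual must pass all of them to appear in the output. The following is a sketch under stated assumptions: the file names toy.vcf.gz and keep.txt, the samples S1 to S3, the prefix toy_filtered, and the threshold values are hypothetical and chosen only for illustration.

```shell
# Hypothetical gzipped toy VCF: two chr1 sites (QUAL 50 and 10) and one chr2 site.
{
  printf '##fileformat=VCFv4.2\n'
  printf '##INFO=<ID=DP,Number=1,Type=Integer,Description="Total depth">\n'
  printf '##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">\n'
  printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2\tS3\n'
  printf 'chr1\t100\t.\tA\tG\t50\tPASS\tDP=30\tGT\t0/1\t0/0\t1/1\n'
  printf 'chr1\t200\t.\tC\tT\t10\tPASS\tDP=8\tGT\t0/1\t0/1\t0/0\n'
  printf 'chr2\t300\t.\tG\tA\t60\tPASS\tDP=25\tGT\t0/1\t1/1\t0/0\n'
} | gzip -c > toy.vcf.gz

# --keep list: one sample ID per line.
printf 'S1\nS2\n' > keep.txt

if command -v vcftools >/dev/null 2>&1; then
  # Restrict to chr1 and samples S1, S2; require QUAL to pass --minQ 30 and
  # MAF >= 0.05; write a new VCF and carry over every INFO field (here DP).
  vcftools --gzvcf toy.vcf.gz \
    --chr chr1 \
    --keep keep.txt \
    --minQ 30 \
    --maf 0.05 \
    --recode --recode-INFO-all \
    --out toy_filtered
fi
```

With these inputs, the chr2 record and sample S3 are excluded, and the low-quality chr1 site at position 200 is expected to be removed by --minQ. The result is written to toy_filtered.recode.vcf, with the DP values kept because of --recode-INFO-all.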
FAQ
What is the vcftools command used for?
VCFtools filters and subsets VCF and BCF files and computes summary and population-genetics statistics from them, such as allele frequencies, depth, Fst, linkage disequilibrium, and relatedness.
How do I run a basic vcftools example?
Run `vcftools --vcf [input.vcf] --chr [chr1] --recode --out [output]` in a terminal, replacing the input file, chromosome name, and output prefix with your own. It writes the sites on that chromosome to output.recode.vcf and a run log to output.log.
What does --vcf _file_ do in vcftools?
It specifies the uncompressed input VCF file (VCF v4.0, v4.1, or v4.2). Use --gzvcf for gzip-compressed input and --bcf for BCF files.