tabix command
Files
After copying an example, replace the file names, directories, or parameters as needed.
Common examples
Index a VCF file
tabix -p vcf [file.vcf.gz]
Index a BED file
tabix -p bed [file.bed.gz]
Index a GFF file
tabix -p gff [file.gff.gz]
Query a region
tabix [file.vcf.gz] [chr1:1000000-2000000]
Query with header output
tabix -h [file.vcf.gz] [chr1:1000000-2000000]
List chromosomes in index
tabix -l [file.vcf.gz]
Query regions from file
tabix -R [regions.bed] [file.vcf.gz]
Create CSI index
tabix -C -p vcf [file.vcf.gz]
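Region strings such as chr1:1000000-2000000 are 1-based and inclusive, while BED intervals are 0-based and half-open, so a BED interval needs its start shifted by one before it can be typed as a region. The following is a minimal sketch of that conversion with awk; `bed_to_region` is a hypothetical helper name used only for illustration and is not part of tabix.

```shell
# Hypothetical helper: turn BED lines (0-based, half-open) read from stdin
# into region strings (1-based, inclusive) of the form chr:start-end.
bed_to_region() {
  awk 'BEGIN { FS = "\t" } { printf "%s:%d-%d\n", $1, $2 + 1, $3 }'
}

# The BED interval chr1 999999 2000000 covers bases 1000000 through 2000000.
region=$(printf 'chr1\t999999\t2000000\n' | bed_to_region)
echo "$region"   # chr1:1000000-2000000
```

The resulting string can be passed as the region argument, for example `tabix [file.vcf.gz] "$region"`.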
Description
tabix is a generic indexer for TAB-delimited genome position files. It builds an index that allows fast retrieval of the data lines overlapping a given genomic region. Input files must be sorted by position and compressed with bgzip. The index file (.tbi by default, .csi with -C) gives random access to the compressed data without decompressing the whole file. Typical uses are indexing VCF variant files, BED annotation files, and GFF/GTF gene annotation files, which makes tabix a standard component of bioinformatics pipelines that handle large genomic datasets. Region queries use 1-based inclusive coordinates in the format chr:start-end.
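The sort, compress, and index requirements can be sketched as a small script. It is an illustration under stated assumptions rather than a canonical pipeline: the sample records, the temporary directory, and the file names in.vcf and sorted.vcf are all hypothetical, and the bgzip and tabix steps run only if both tools are installed.

```shell
workdir=$(mktemp -d)
tab=$(printf '\t')

# Hypothetical unsorted VCF: two header lines and three records.
printf '##fileformat=VCFv4.2\n#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n' > "$workdir/in.vcf"
printf 'chr2\t50\t.\tA\tG\t.\t.\t.\n' >> "$workdir/in.vcf"
printf 'chr1\t300\t.\tC\tT\t.\t.\t.\n' >> "$workdir/in.vcf"
printf 'chr1\t100\t.\tG\tA\t.\t.\t.\n' >> "$workdir/in.vcf"

# Keep the header lines first, then sort records by chromosome and position.
{
  grep '^#' "$workdir/in.vcf"
  grep -v '^#' "$workdir/in.vcf" | LC_ALL=C sort -t "$tab" -k1,1 -k2,2n
} > "$workdir/sorted.vcf"

# Compress with bgzip (plain gzip output cannot be indexed), then index and query.
if command -v bgzip >/dev/null 2>&1 && command -v tabix >/dev/null 2>&1; then
  bgzip -c "$workdir/sorted.vcf" > "$workdir/sorted.vcf.gz"
  tabix -p vcf "$workdir/sorted.vcf.gz"
  tabix "$workdir/sorted.vcf.gz" chr1:100-250
fi
```

Using `bgzip -c` writes the compressed copy to a new file and leaves the sorted text file in place, which is convenient for inspecting the sort result.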
Options
- -p, --preset _format_: Input format preset: gff, bed, sam, or vcf.
- -s, --sequence _col_: Column of the sequence name (default: 1).
- -b, --begin _col_: Column of the start position (default: 4).
- -e, --end _col_: Column of the end position (default: 5). It may be the same as the start column.
- -S, --skip-lines _n_: Skip the first n lines of the data file (default: 0).
- -c, --comment _char_: Skip lines starting with char (default: #).
- -0, --zero-based: Positions in the data file are 0-based and half-open rather than 1-based.
- -C, --csi: Create a CSI index instead of a TBI index.
- -f, --force: Overwrite an existing index.
- -h, --print-header: Print the header lines along with the query output.
- -H, --only-header: Print only the header/meta lines.
- -l, --list-chroms: List the sequence names stored in the index file.
- -r, --reheader _file_: Replace the header with the content of file.
- -R, --regions _file_: Restrict the query to the regions listed in file, which may be a BED file or a TAB-delimited file.
- -T, --targets _file_: Similar to -R, but reads the entire input sequentially and skips regions not listed in file.
- -m, --min-shift _INT_: Set the minimal interval size for CSI indices to 2^INT (default: 14).
- -D: Do not download the index file before opening (remote files only).
- --separate-regions: Insert a line with the region name before each group of records in the output.
- --cache _INT_: Set the BGZF block cache size in megabytes (default: 10).
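When a file does not match any preset, the column options can describe its layout directly. The sketch below assumes a hypothetical table genes.tsv with one header line and the columns name, chromosome, start, and end, so it uses -s 2, -b 3, -e 4, and -S 1. The file names and records are invented for illustration, and the bgzip and tabix steps run only if both tools are installed.

```shell
workdir=$(mktemp -d)
tab=$(printf '\t')

# Hypothetical table: one header line, then name, chromosome, start, end.
printf 'name\tchrom\tstart\tend\n' > "$workdir/genes.tsv"
printf 'geneB\tchr1\t500\t900\n' >> "$workdir/genes.tsv"
printf 'geneA\tchr1\t100\t400\n' >> "$workdir/genes.tsv"

# Keep the header on top and sort the records by chromosome, then start.
{
  head -n 1 "$workdir/genes.tsv"
  tail -n +2 "$workdir/genes.tsv" | LC_ALL=C sort -t "$tab" -k2,2 -k3,3n
} > "$workdir/genes.sorted.tsv"

if command -v bgzip >/dev/null 2>&1 && command -v tabix >/dev/null 2>&1; then
  bgzip -c "$workdir/genes.sorted.tsv" > "$workdir/genes.sorted.tsv.gz"
  # Sequence name in column 2, start in column 3, end in column 4; skip 1 header line.
  tabix -f -s 2 -b 3 -e 4 -S 1 "$workdir/genes.sorted.tsv.gz"
  tabix "$workdir/genes.sorted.tsv.gz" chr1:450-600
fi
```

The header line does not start with the comment character #, which is why the sketch skips it with -S 1 instead of relying on -c.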