Linux command

dvc command

Copy a command, then replace the file names, directories, or arguments as needed.

Common examples

Initialize a DVC repository

dvc init

Track a data file or directory

dvc add [data/dataset.csv]

Push tracked data to remote storage

dvc push

Pull tracked data from remote storage

dvc pull

Reproduce a pipeline

dvc repro

Show pipeline DAG

dvc dag

Configure remote storage

dvc remote add -d [myremote] [s3://bucket/path]

Show differences in tracked data

dvc diff
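The examples above are usually run in sequence. Below is a minimal sketch of a first track-and-share round trip. It assumes that DVC and Git are installed and that the shell is at the root of an existing Git repository with no DVC project yet. The dataset path and remote URL are placeholders you supply. The function name dvc_share_dataset and the remote name myremote are illustrative only.

```shell
# Hypothetical helper: first-time DVC setup for one dataset, then upload.
# Assumes: current directory is the root of a Git repo without DVC yet,
# $1 is a path such as data/dataset.csv (no trailing slash),
# $2 is a remote URL such as s3://bucket/path.
dvc_share_dataset() {
  local data_path remote_url
  data_path=$1
  remote_url=$2

  dvc init || return 1                  # creates .dvc/ and .dvcignore
  dvc add "$data_path" || return 1      # caches the data, writes $data_path.dvc,
                                        # and lists the data in a .gitignore next to it
  dvc remote add -d myremote "$remote_url" || return 1   # default remote, saved in .dvc/config

  # Commit only the small metadata files; the data itself stays out of Git.
  git add .dvc .dvcignore "$data_path.dvc" "$(dirname "$data_path")/.gitignore" || return 1
  git commit -m "Track $data_path with DVC" || return 1

  dvc push                              # uploads the cached data to the default remote
}
```

Usage: dvc_share_dataset data/dataset.csv s3://bucket/path. A collaborator can then clone the Git repository and run dvc pull to retrieve the data.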

Description

DVC (Data Version Control) is a version control system for machine learning projects. It versions large files, datasets, and models alongside Git without storing their contents in the Git repository. Git holds only small metadata files (.dvc files, plus dvc.yaml and dvc.lock for pipelines). The data itself is kept in a local cache and can be pushed to configurable remote storage (S3, GCS, Azure, SSH, etc.). This makes it practical to version large files and share datasets across teams. Pipelines define reproducible ML workflows by recording each stage's command, dependencies, and outputs, which also supports experiment management.
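The Git/cache split can be made concrete. In DVC 3.x, the md5 field in a single file's .dvc file is the MD5 of the file's bytes, and the cached copy is stored under a path derived from that hash. The sketch below computes that path. It assumes DVC 3.x's default layout: .dvc/cache/files/md5/, then the first two hex digits as a directory, then the remaining 30 as the file name. It also assumes GNU md5sum, or macOS md5 as a fallback. DVC 2.x used .dvc/cache/<2>/<30> instead, and directories are hashed differently (via a .dir manifest). Treat this as an illustration, not a reimplementation. The function name dvc_cache_path is hypothetical.

```shell
# Hypothetical helper: print where DVC 3.x would cache a single file.
# Assumes the default cache directory (.dvc/cache) and the 3.x layout
# files/md5/<first 2 hex chars>/<remaining 30>; not valid for directories.
dvc_cache_path() {
  local file hash
  file=$1
  [ -f "$file" ] || return 1
  # Hash the raw bytes via stdin, so the output never contains the file name.
  if command -v md5sum >/dev/null 2>&1; then
    hash=$(md5sum < "$file" | cut -d' ' -f1)   # GNU coreutils
  else
    hash=$(md5 -q < "$file")                   # macOS/BSD fallback
  fi
  printf '.dvc/cache/files/md5/%s/%s\n' \
    "$(printf '%s' "$hash" | cut -c1-2)" \
    "$(printf '%s' "$hash" | cut -c3-)"
}
```

After dvc add data/dataset.csv with default settings, the path printed by dvc_cache_path data/dataset.csv should point at the cached copy of that file.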

Subcommands and options

init
Initialize DVC in a Git repository.
add _FILE_
Track a file or directory with DVC.
push
Upload tracked data to remote storage.
pull
Download tracked data from remote storage and check it out into the workspace (equivalent to fetch followed by checkout).
repro
Reproduce pipeline stages.
diff
Show changes to DVC-tracked data between two commits (by default, between HEAD and the workspace).
fetch
Download tracked data from remote storage into the local cache without checking it out into the workspace.
checkout
Update workspace data files from the local cache to match the current .dvc and dvc.lock files.
gc
Garbage-collect unused cache files.
remote add _NAME_ _URL_
Add remote storage.
config _OPTION_ _VALUE_
Get or set DVC configuration options.
dag
Visualize pipeline stages as a directed acyclic graph.
destroy
Remove all DVC files and directories from the project.
--cd _dir_
Change to directory before executing the command.
-v, --verbose
Increase output verbosity.
-q, --quiet
Suppress output.
--version
Show DVC version.
-h, --help
Display help information.
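The repro and dag commands operate on pipeline stages recorded in dvc.yaml. Below is a minimal sketch of a one-stage pipeline. It assumes DVC 2.0 or later (which provides dvc stage add) and uses hypothetical files prepare.py, data/raw.csv, and data/clean.csv. The function name dvc_pipeline_demo is illustrative only.

```shell
# Hypothetical helper: define a one-stage pipeline, run it, and show the graph.
# prepare.py, data/raw.csv and data/clean.csv are placeholder names.
dvc_pipeline_demo() {
  # Record the stage in dvc.yaml: name (-n), dependencies (-d), output (-o),
  # and the command to run. This only writes dvc.yaml; nothing runs yet.
  dvc stage add -n prepare \
    -d prepare.py -d data/raw.csv \
    -o data/clean.csv \
    python prepare.py data/raw.csv data/clean.csv || return 1

  # Run every stage whose dependencies changed (all of them the first time)
  # and record the resulting hashes in dvc.lock.
  dvc repro || return 1

  # Print the stage graph; with one stage it is a single node.
  dvc dag
}
```

After a successful run, commit dvc.yaml and dvc.lock to Git. A later dvc repro typically reruns the prepare stage only if prepare.py or data/raw.csv has changed, or if data/clean.csv is missing.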

FAQ

How do I run a basic dvc example?

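Inside a Git repository, run `dvc init`, track a file with `dvc add data/dataset.csv`, and commit the generated `data/dataset.csv.dvc` and `data/.gitignore` files to Git. Adjust file names, paths, flags, and remote URLs for your system.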

What does init do in dvc?

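It initializes a DVC project in the current directory, which is expected to be the root of a Git repository. It creates the .dvc/ directory (configuration and, by default, the local cache) and a .dvcignore file, which you then commit to Git. Use --subdir to initialize inside a subdirectory of a Git repository, or --no-scm to use DVC without Git.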