dvc Command: Examples, Options, and Usage

Common Examples

Initialize DVC repository

dvc init

Track a data file or directory

dvc add [data/dataset.csv]

Push tracked data to remote storage

dvc push

Pull tracked data from remote storage

dvc pull

Reproduce a pipeline

dvc repro

Show pipeline DAG

dvc dag

Configure remote storage

dvc remote add -d [myremote] [s3://bucket/path]

Show differences in tracked data

dvc diff
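
The setup-related examples above can be chained into a first-time setup routine. The following is a minimal sketch, not an official recipe: the function name `setup_dvc` is hypothetical, the remote name and URL are placeholders passed in by the caller, and it assumes the current directory is already a Git repository.

```shell
# Sketch: first-time DVC setup inside an existing Git repository.
# setup_dvc is a hypothetical helper; both arguments are placeholders.
setup_dvc() {
    remote_name="$1"   # e.g. myremote
    remote_url="$2"    # e.g. s3://bucket/path

    # Create the .dvc/ directory and its internal files.
    dvc init || return 1

    # Register the remote; -d marks it as the default for push and pull.
    dvc remote add -d "$remote_name" "$remote_url" || return 1

    # The remote definition is written to .dvc/config, so commit that file.
    git add .dvc/config || return 1
    git commit -m "Initialize DVC with remote $remote_name"
}
```

Calling `setup_dvc myremote s3://bucket/path` would run the same commands as the individual examples, in order, and record the remote configuration in Git.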

Description

DVC (Data Version Control) is a version control system for machine learning projects. It works alongside Git to track large files, datasets, and models without storing them in the Git repository. Small metadata files (.dvc files) are committed to Git, while the data itself is kept in a local cache and synced to configurable remote storage (S3, GCS, Azure, SSH, etc.). This makes it possible to version large files and share datasets across teams. DVC pipelines define reproducible ML workflows by recording the dependencies and outputs of each stage, which supports experiment management.
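
The split between metadata in Git and data in remote storage can be sketched as a small shell function. This is an illustration under stated assumptions: `track_dataset` is a hypothetical helper name, the dataset path is supplied by the caller, and a default remote is assumed to be configured already.

```shell
# Sketch: version one dataset with DVC next to Git.
# track_dataset is a hypothetical helper; a default remote must already exist.
track_dataset() {
    data_path="$1"   # e.g. data/dataset.csv (placeholder)

    # dvc add writes "<path>.dvc" and adds the data file to a .gitignore
    # in the same directory, so Git never sees the data itself.
    dvc add "$data_path" || return 1

    # Only the small metadata file and the .gitignore go into Git.
    git add "$data_path.dvc" "$(dirname "$data_path")/.gitignore" || return 1
    git commit -m "Track $data_path with DVC" || return 1

    # The data itself is uploaded to the default remote.
    dvc push
}
```

A teammate who clones the Git repository would then receive only the .dvc file and could retrieve the data with `dvc pull`.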

Parameters

init: Initialize DVC in a Git repository.
add _FILE_: Track a file or directory with DVC.
push: Upload tracked data to remote storage.
pull: Download tracked data from remote storage.
repro: Reproduce pipeline stages.
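diff: Show changes in tracked data between commits, or between a commit and the workspace.
fetch: Download tracked data from remote storage into the local cache without updating the workspace.
checkout: Update data files in the workspace to match the current .dvc files.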
gc: Garbage-collect unused cache files.
remote add _NAME_ _URL_: Add remote storage.
config _OPTION_ _VALUE_: Get or set DVC configuration options.
dag: Visualize pipeline stages as a directed acyclic graph.
destroy: Remove all DVC files and directories from the project.
--cd _dir_: Change to directory before executing the command.
-v, --verbose: Increase output verbosity.
-q, --quiet: Suppress output.
--version: Show DVC version.
-h, --help: Display help information.
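
The relationship between fetch, checkout, and pull listed above can be made explicit: running fetch followed by checkout has the same effect as pull. The sketch below combines that with the --cd option; `sync_data` is a hypothetical helper name and the project path is a placeholder.

```shell
# Sketch: the two steps behind `dvc pull`, run against a project elsewhere on disk.
# sync_data is a hypothetical helper; the directory is supplied by the caller.
sync_data() {
    project_dir="$1"   # path to a DVC project (placeholder)

    # Step 1: download tracked data from the remote into the local cache.
    dvc --cd "$project_dir" fetch || return 1

    # Step 2: place the cached files into the workspace.
    dvc --cd "$project_dir" checkout
}
```

Splitting the steps can be useful when the download should happen ahead of time and the workspace should be updated later.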

FAQ

What is the dvc command used for?

dvc is the command-line interface to DVC (Data Version Control), a version control system for machine learning projects. It is used to track large files, datasets, and models alongside Git, to sync them with remote storage, and to define and reproduce ML pipelines.

How do I run a basic dvc example?

Run `dvc init` in a terminal inside an existing Git repository. For the other examples, replace the bracketed placeholders (file names, paths, remote names, and URLs) with values for your project.

What does init do in dvc?

`dvc init` initializes DVC in a Git repository.
