Linux command
dvc command
After copying a command, replace the file names, directories, or arguments as needed.
Common examples
Initialize a DVC repository (run inside an existing Git repository)
dvc init
Track a data file or directory
dvc add [data/dataset.csv]
Push tracked data to remote storage
dvc push
Pull tracked data from remote storage
dvc pull
Reproduce a pipeline
dvc repro
Show pipeline DAG
dvc dag
Add a remote storage location and make it the default (-d)
dvc remote add -d [myremote] [s3://bucket/path]
Show changes in tracked data between the workspace and the last commit
dvc diff
Description
DVC (Data Version Control) is a version control system for machine learning projects. It tracks large files, datasets, and models alongside Git without storing their contents in the Git repository. Git holds only small metadata files (.dvc files, plus dvc.yaml and dvc.lock for pipelines), while the data itself is kept in a local cache and can be pushed to configurable remote storage (S3, Google Cloud Storage, Azure Blob Storage, SSH, and others). This makes it possible to version large files and share datasets across a team. Pipelines define reproducible ML workflows: each stage declares its dependencies and outputs, so DVC can re-run only the stages whose inputs changed, which supports experiment management.
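The split between Git-tracked metadata and separately stored data rests on content addressing: bytes are stored under a name derived from their hash, and the small file kept in Git records only that hash. DVC's default cache keys files by MD5, and each .dvc file records an `md5` field. The toy script below imitates the idea with `md5sum`; it is not DVC's code, and the `cache/` directory and `.ptr` pointer file are hypothetical stand-ins for `.dvc/cache` and a .dvc file.

```shell
#!/bin/sh
# Toy content-addressed store illustrating the idea behind DVC's cache.
# NOT DVC's implementation: "cache/" and the ".ptr" format are hypothetical.
set -eu

work=$(mktemp -d)
cd "$work"
printf 'id,value\n1,42\n' > dataset.csv

# The MD5 of the content, not the file name, decides where the bytes live.
hash=$(md5sum dataset.csv | cut -d' ' -f1)
prefix=$(printf '%s' "$hash" | cut -c1-2)   # first two hex characters
rest=$(printf '%s' "$hash" | cut -c3-)      # remaining thirty characters

# Store the data under cache/<prefix>/<rest>; identical content is stored once.
mkdir -p "cache/$prefix"
cp dataset.csv "cache/$prefix/$rest"

# Only this small pointer file would be committed to Git.
printf 'md5: %s\npath: dataset.csv\n' "$hash" > dataset.csv.ptr

# "Checkout": rebuild the workspace file from the cache using the pointer alone.
rm dataset.csv
stored=$(sed -n 's/^md5: //p' dataset.csv.ptr)
cp "cache/$(printf '%s' "$stored" | cut -c1-2)/$(printf '%s' "$stored" | cut -c3-)" dataset.csv
```

Because the storage location depends only on content, switching to another version of the data only requires checking out a different pointer file, provided the matching cache entry exists locally or can be fetched from a remote.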
Parameters
- init
- Initialize DVC in a Git repository.
- add _FILE_
- Track a file or directory with DVC.
- push
- Upload tracked data to remote storage.
- pull
- Download tracked data from remote storage and check it out into the workspace (equivalent to fetch followed by checkout).
- repro
- Reproduce pipeline stages, re-running only those whose dependencies or commands have changed.
- diff
- Show changes in tracked data between commits.
- fetch
- Download tracked data from remote storage into the cache without checking it out into the workspace.
- checkout
- Restore workspace data files from the cache so they match the current .dvc files and dvc.lock.
- gc
- Garbage-collect unused cache files.
- remote add _NAME_ _URL_
- Add a remote storage location; -d makes it the default remote.
- config _OPTION_ _VALUE_
- Get or set DVC configuration options.
- dag
- Visualize pipeline stages as a directed acyclic graph.
- destroy
- Remove all DVC files and directories from the project.
- --cd _dir_
- Change to directory before executing the command.
- -v, --verbose
- Increase output verbosity.
- -q, --quiet
- Suppress output.
- --version
- Show DVC version.
- -h, --help
- Display help information.
FAQ
What is the dvc command used for?
DVC versions large data files, datasets, and ML models alongside Git: small metadata files are committed to Git, while the data itself is kept in a cache and shared through remote storage. It also defines reproducible ML pipelines.
How do I run a basic dvc example?
Inside a Git repository, run `dvc init`, then try the examples above, replacing file names, paths, flags, or remote URLs with ones that fit your setup.
What does init do in dvc?
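It initializes DVC in the current Git repository by creating the `.dvc/` directory (which holds the configuration and, by default, the cache) and a `.dvcignore` file. Commit these to Git afterwards.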