Linux command
streambed command
Network
Requires network access or remote resources.
Common examples
Stream changes to Iceberg on S3 and serve queries on port 5433
streambed sync --source-url=[postgres://user:pass@host:5432/db] --s3-bucket=[bucket] --s3-endpoint=[https://s3] --s3-prefix=[path] --query-addr=:5433
Run a one-shot backfill
streambed resync --source-url=[postgres://...] --s3-bucket=[bucket] --s3-prefix=[path]
Serve queries
streambed query --s3-bucket=[bucket] --s3-prefix=[path] --query-addr=:5433
Delete the S3 objects and state for a table
streambed cleanup --s3-bucket=[bucket] --s3-prefix=[path] --table=[name]
Show help for the sync subcommand
streambed sync --help
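Once a query server is listening (see the `Serve queries` example), a standard Postgres client such as `psql` should be able to connect to it. The following is a minimal sketch under stated assumptions: the host, the table name `public.orders`, and the query itself are hypothetical, and this page does not document how tables are named on the endpoint or which SQL features are supported. By default the script only prints the command it would run.

```shell
#!/bin/sh
# Sketch: query the streambed endpoint with psql.
# Host, port, and table name are placeholders (assumptions), not streambed defaults.
QUERY_HOST="${QUERY_HOST:-localhost}"
QUERY_PORT="${QUERY_PORT:-5433}"     # matches --query-addr=:5433 in the examples
TABLE="${TABLE:-public.orders}"      # hypothetical table name

SQL="SELECT count(*) FROM ${TABLE};"

if command -v psql >/dev/null 2>&1 && [ "${DRY_RUN:-1}" = "0" ]; then
  # Real run: requires a reachable streambed query server.
  psql -h "$QUERY_HOST" -p "$QUERY_PORT" -c "$SQL"
else
  # Default: only show what would be executed.
  printf 'psql -h %s -p %s -c "%s"\n' "$QUERY_HOST" "$QUERY_PORT" "$SQL"
fi
```

Set `DRY_RUN=0` to actually execute the query.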
Description
streambed is a change data capture (CDC) tool written in Go. It tails the PostgreSQL write-ahead log using logical replication, writes the resulting changes as Parquet files to S3, and commits Apache Iceberg metadata so the data is queryable as a versioned analytical table.
The `sync` subcommand runs as a long-lived daemon and can optionally expose a query endpoint compatible with the PostgreSQL wire protocol, so existing Postgres clients can read Iceberg tables without changes. `resync` performs a one-shot backfill under a consistent snapshot using `COPY`. `query` runs the wire-protocol server alone against tables that were already populated. `cleanup` deletes S3 objects and tracking state for a given table.
streambed targets the use case of offloading analytical workloads from a production Postgres instance to an Iceberg-on-S3 lakehouse while still letting tools that speak the Postgres protocol query the result.
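The subcommands described above could be combined roughly as follows. This is a minimal sketch, not a documented workflow: the ordering (backfill first, then stream), the placeholder values, and the `run` dry-run helper are assumptions. Only the subcommands and flags already shown in the examples are used. By default the script prints the commands instead of executing them.

```shell
#!/bin/sh
# Sketch: backfill once, then start the streaming daemon.
# All values are placeholders (assumptions), not defaults of streambed.
SOURCE_URL="${SOURCE_URL:-postgres://user:pass@host:5432/db}"
BUCKET="${BUCKET:-my-bucket}"
PREFIX="${PREFIX:-warehouse}"
QUERY_ADDR="${QUERY_ADDR:-:5433}"

# Hypothetical helper: print the command unless DRY_RUN=0 is set.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# One-shot backfill of existing rows.
run streambed resync --source-url="$SOURCE_URL" \
  --s3-bucket="$BUCKET" --s3-prefix="$PREFIX"

# Long-lived daemon: stream WAL changes and serve queries.
run streambed sync --source-url="$SOURCE_URL" \
  --s3-bucket="$BUCKET" --s3-prefix="$PREFIX" --query-addr="$QUERY_ADDR"
```

Keeping the settings in variables means both commands are guaranteed to point at the same bucket and prefix.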
Parameters
- sync
- Stream WAL changes, write Iceberg, optionally serve queries.
- resync
- One-shot backfill via `COPY` under a consistent snapshot.
- query
- Postgres-wire query server for existing Iceberg tables.
- cleanup
- Delete S3 objects and state for a table.
- --source-url _URL_
- Postgres connection string (or `STREAMBED_SOURCE_URL`).
- --s3-bucket _NAME_
- Destination S3 bucket (or `STREAMBED_S3_BUCKET`).
- --s3-endpoint _URL_
- Custom S3 endpoint URL, for MinIO or other S3-compatible storage.
- --s3-prefix _PATH_
- Key prefix within the bucket.
- --query-addr _HOST:PORT_
- Bind address for the Postgres-wire query server.
- --table _NAME_
- Table whose S3 objects and state `cleanup` deletes.
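The parameter list mentions `STREAMBED_SOURCE_URL` and `STREAMBED_S3_BUCKET` as alternatives to the corresponding flags. A minimal sketch of using them is shown below. The values are placeholders, and the assumption that the remaining options must still be passed as flags comes from the fact that no other environment variables are listed here. The script prints the command unless `DRY_RUN=0` is set and `streambed` is installed.

```shell
#!/bin/sh
# Sketch: supply connection settings through the environment instead of flags.
export STREAMBED_SOURCE_URL="postgres://user:pass@host:5432/db"  # placeholder
export STREAMBED_S3_BUCKET="my-bucket"                           # placeholder

# Remaining options are passed as flags (prefix value is a placeholder).
CMD="streambed sync --s3-prefix=warehouse --query-addr=:5433"

if command -v streambed >/dev/null 2>&1 && [ "${DRY_RUN:-1}" = "0" ]; then
  # Word splitting is intentional: CMD contains no arguments with spaces.
  $CMD
else
  printf 'would run: %s\n' "$CMD"
fi
```

Passing the connection string through the environment keeps the password out of the process argument list, which is generally visible to other users on the same host.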
FAQ
What is the streambed command used for?
streambed is a change data capture (CDC) tool that streams PostgreSQL changes to S3 as Parquet files with Apache Iceberg metadata, and can serve the resulting tables over the PostgreSQL wire protocol. It is aimed at offloading analytical workloads from a production Postgres instance. See the description above for what each subcommand does.
How do I run a basic streambed example?
Run `streambed sync --source-url=[postgres://user:pass@host:5432/db] --s3-bucket=[bucket] --s3-endpoint=[https://s3] --s3-prefix=[path] --query-addr=:5433` in a terminal, replacing each bracketed placeholder with your own connection string, bucket, endpoint, and key prefix.
What does sync do in streambed?
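`sync` runs as a long-lived daemon that streams WAL changes, writes them as Iceberg tables on S3, and optionally serves queries over the Postgres wire protocol.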