Changelog

The changelog summarizes bugfixes that are deemed relevant for users and package maintainers. Developers should consult the git commit log or GitHub issue tracker.

Next Release

The plan for the next release is to revisit how zrepl does snapshot management. High-level goals:

  • Make it easy to decouple snapshot management (snapshotting, pruning) from replication.

  • Ability to include/exclude snapshots from replication. This is useful for aforementioned decoupling, e.g., separate snapshot prefixes for local & remote replication. Also, it makes explicit that by default, zrepl replicates all snapshots, and that replication has no concept of “zrepl-created snapshots”, which is a common misconception.

  • Use of zfs snapshot comma syntax or channel programs to take snapshots of multiple datasets atomically.

  • Provide an alternative to the grid pruning policy. Most likely something based on hourly/daily/weekly/monthly “trains” plus a count.

  • Ability to prune at the granularity of the group of snapshots created at a given time, as opposed to the individual snapshots within a dataset. Maybe this will be addressed by the alternative to the grid pruning policy, as it will likely be more predictable.

Those changes will likely come with some breakage in the config. However, I want to avoid breaking use cases that are satisfied by the current design. There will be beta/RC releases to give users a chance to evaluate.

0.6.1

  • [FEATURE] add metric to detect filesystems rules that don’t match any local dataset (thanks, @gmekicaxcient).

  • [BUG] zrepl status: hide progress bar once all filesystems reach terminal state (thanks, @0x3333).

  • [BUG] handling of tenative cursor presence if protection strategy doesn’t use it (issue #714).

  • [DOCS] address setup with two or more external disks (thanks, @se-jaeger).

  • [DOCS] document replication and conflict_resolution options (thanks, @InsanePrawn).

  • [DOCS] docs: talks: add note on keep_bookmarks option (thanks, @skirmess).

  • [MAINT] dist: add openrc service file (thanks, @gramosg).

  • [MAINT] grafana: update dashboard to Grafana 9.3.6.

  • [MAINT] run platform tests as part of CI.

  • [MAINT] build: upgrade to Go 1.21 and update golangci-lint; minimum Go version for builds is now 1.20

Note

zrepl is a spare-time project primarily developed by Christian Schwarz.
You can support maintenance and feature development through one of the following services:
Donate via Patreon Donate via GitHub Sponsors Donate via Liberapay Donate via PayPal
Note that PayPal processing fees are relatively high for small donations.
For SEPA wire transfer and commercial support, please contact Christian directly.

0.6

  • [FEATURE] Schedule-based snapshotting using cron syntax instead of an interval.

  • [FEATURE] Configurable initial replication policy. When a filesystem is first replicated to a receiver, this control whether just the newest snapshot will be replicated vs. all existing snapshots. Learn more in the docs.

  • [FEATURE] Configurable timestamp format for snapshot names via timestamp_format (Thanks, @ydylla).

  • [FEATURE] Add ZREPL_DESTROY_MAX_BATCH_SIZE env var (default 0=unlimited) (Thanks, @3nprob).

  • [FEATURE] Add zrepl configcheck --skip-cert-check flag (Thanks, @cole-h).

  • [BUG] Fix resuming from interrupted replications that use send.raw on unencrypted datasets.

    • The send options introduced in zrepl 0.4 allow users to specify additional zfs send flags for zrepl to use. Before this fix, when setting send.raw=true on a job that replicates unencrypted datasets, zrepl would not allow an interrupted replication to resume. The reason were overly cautious checks to support the send.encrypted option.

    • This bugfix removes these checks from the replication planner. This makes send.encrypted a sender-side-only concern, much like all other send.* flags.

    • However, this means that the zrepl status UI no longer indicates whether a replication step uses encrypted sends or not. The setting is still effective though.

  • [BREAK] convert Prometheus metric zrepl_version_daemon to zrepl_start_time metric

    • The metric still reports the zrepl version in a label. But the metric value is now the Unix timestamp at the time the daemon was started. The Grafana dashboard in dist/grafana has been updated.

  • [BUG] transient zrepl status error: Post "http://unix/status": EOF

  • [BUG] don’t treat receive-side bookmarks as a replication conflict. This facilitates chaining of replication jobs. See issue #490.

  • [BUG] workaround for Go/gRPC problem on Illumos where zrepl would crash when using the local transport type (issue #598).

  • [BUG] fix active child tasks panic that cold occur during replication plannig (issue #193abbe)

  • [BUG] zrepl status off-by-one error in display of completed step count (commit ce6701f)

  • [BUG] Allow using day & week units for snapshotting.interval (commit ffb1d89)

  • [DOCS] docs/overview improvements (Thanks, @jtagcat).

  • [MAINT] Update to Go 1.19.

0.5

  • [FEATURE] Bandwidth limiting (Thanks, Prominic.NET, Inc.)

  • [FEATURE] zrepl status: use a * to indicate which filesystem is currently replicating

  • [FEATURE] include daemon environment variables in zrepl status (currently only in --raw)

  • [BUG] fix encrypt-on-receive + placeholders use case (issue #504)

    • Before this fix, plain sends to a receiver with an encrypted root_fs could be received unencrypted if zrepl needed to create placeholders on the receiver.

    • Existing zrepl users should read the docs and check zfs get -r encryption,zrepl:placeholder PATH_TO_ROOTFS on the receiver.

    • Thanks to @mologie and @razielgn for reporting and testing!

  • [BUG] Rename mis-spelled send option embbeded_data to embedded_data.

  • [BUG] zrepl status: replication step numbers should start at 1

  • [BUG] incorrect bandwidth averaging in zrepl status.

  • [BUG] FreeBSD with OpenZFS 2.0: zrepl would wait indefinitely for zfs send to exit on timeouts.

  • [BUG] fix strconv.ParseInt: value out of range bug (and use the control RPCs).

  • [DOCS] improve description of multiple pruning rules.

  • [DOCS] document platform tests.

  • [DOCS] quickstart: make users aware that prune rules apply to all snapshots.

  • [MAINT] some platformtests were broken.

  • [MAINT] FreeBSD: release armv7 and arm64 binaries.

  • [MAINT] apt repo: update instructions due to apt-key deprecation.

Note to all users: please read up on the following OpenZFS bugs, as you might be affected:

Finally, I’d like to point you to the GitHub discussion about which bugfixes and features should be prioritized in zrepl 0.6 and beyond!

0.4.0

  • [FEATURE] support setting zfs send / recv flags in the config (send: -wLcepbS , recv: -ox ). Config docs here and here .

  • [FEATURE] parallel replication is now configurable (disabled by default, config docs here ).

  • [FEATURE] New zrepl status UI:

    • Interactive job selection.

    • Interactively zrepl signal jobs.

    • Filter filesystems in the job view by name.

    • An approximation of the old UI is still included as –mode legacy but will be removed in a future release of zrepl.

  • [BUG] Actually use concurrency when listing zrepl abstractions & doing size estimation. These operations were accidentally made sequential in zrepl 0.3.

  • [BUG] Job hang-up during second replication attempt.

  • [BUG] Data races conditions in the dataconn rpc stack.

  • [MAINT] Update to protobuf v1.25 and grpc 1.35.

For users who skipped the 0.3.1 update: please make sure your pruning grid config is correct. The following bugfix in 0.3.1 caused problems for some users:

  • [BUG] pruning: grid: add all snapshots that do not match the regex to the rule’s destroy list.

0.3.1

Mostly a bugfix release for zrepl 0.3.

  • [FEATURE] pruning: add optional regex field to last_n rule

  • [DOCS] pruning: grid : improve documentation and add an example

  • [BUG] pruning: grid: add all snapshots that do not match the regex to the rule’s destroy list. This brings the implementation in line with the docs.

  • [BUG] easyrsa script in docs

  • [BUG] platformtest: fix skipping encryption-only tests on systems that don’t support encryption

  • [BUG] replication: report AttemptDone if no filesystems are replicated

  • [FEATURE] status + replication: warning if replication succeeeded without any filesystem being replicated

  • [DOCS] update multi-job & multi-host setup section

  • RPM Packaging

  • CI infrastructure rework

  • Continuous deployment of that new stable branch to zrepl.github.io.

0.3

This is a big one! Headlining features:

  • Resumable Send & Recv Support No knobs required, automatically used where supported.

  • Encrypted Send & Recv Support for OpenZFS native encryption, configurable at the job level, i.e., for all filesystems a job is responsible for.

  • Replication Guarantees Automatic use of ZFS holds and bookmarks to protect a replicated filesystem from losing synchronization between sender and receiver. By default, zrepl guarantees that incremental replication will always be possible and interrupted steps will always be resumable.

Tip

We highly recommend studying the updated overview section of the configuration chapter to understand how replication works.

Tip

Go 1.15 changed the default TLS validation policy to require Subject Alternative Names (SAN) in certificates. The openssl commands we provided in the quick-start guides up to and including the zrepl 0.3 docs seem not to work properly. If you encounter certificate validation errors regarding SAN and wish to continue to use your old certificates, start the zrepl daemon with env var GODEBUG=x509ignoreCN=0. Alternatively, generate new certificates with SANs (see both options int the TLS transport docs ).

Quick-start guides:

Additional changelog:

  • [BREAK] Go 1.15 TLS changes mentioned above.

  • [BREAK] [CONFIG] more restrictive job names than in prior zrepl versions Starting with this version, job names are going to be embedded into ZFS holds and bookmark names (see this section for details). Therefore you might need to adjust your job names. Note that jobs cannot be renamed easily once you start using zrepl 0.3.

  • [BREAK] [MIGRATION] replication cursor representation changed

    • zrepl now manages the replication cursor bookmark per job-filesystem tuple instead of a single replication cursor per filesystem. In the future, this will permit multiple sending jobs to send from the same filesystems.

    • ZFS does not allow bookmark renaming, thus we cannot migrate the old replication cursors.

    • zrepl 0.3 will automatically create cursors in the new format for new replications, and warn if it still finds ones in the old format.

    • Run zrepl migrate replication-cursor:v1-v2 to safely destroy old-format cursors. The migration will ensure that only those old-format cursors are destroyed that have been superseeded by new-format cursors.

  • [FEATURE] New option listen_freebind (tcp, tls, prometheus listener)

  • [FEATURE] issue #341 Prometheus metric for failing replications + corresponding Grafana panel

  • [FEATURE] issue #265 transport/tcp: support for CIDR masks in client IP whitelist

  • [FEATURE] documented subcommand to generate bash and zsh completions

  • [FEATURE] issue #307 chrome://trace -compatible activity tracing of zrepl daemon activity

  • [FEATURE] logging: trace IDs for better log entry correlation with concurrent replication jobs

  • [FEATURE] experimental environment variable for parallel replication (see issue #306 )

  • [BUG] missing logger context vars in control connection handlers

  • [BUG] improved error messages on zfs send errors

  • [BUG] [DOCS] snapshotting: clarify sync-up behavior and warn about filesystems

  • [BUG] transport/ssh: do not leak zombie ssh process on connection failures that will not be snapshotted until the sync-up phase is over

  • [DOCS] Installation: FreeBSD jail with iocage

  • [DOCS] Document new replication features in the config overview and replication/design.md.

  • [MAINTAINER NOTICE] New platform tests in this version, please make sure you run them for your distro!

  • [MAINTAINER NOTICE] Please add the shell completions to the zrepl packages.

0.2.1

  • [FEATURE] Illumos (and Solaris) compatibility and binary builds (thanks, MNX.io )

  • [FEATURE] 32bit binaries for Linux and FreeBSD (untested, though)

  • [BUG] better error messages in ssh+stdinserver transport

  • [BUG] systemd + ssh+stdinserver: automatically create /var/run/zrepl/stdinserver

  • [BUG] crash if Prometheus listening socket cannot be opened

  • [MAINTAINER NOTICE] Makefile refactoring, see commit 080f2c0

0.2

  • [FEATURE] Pre- and Post-Snapshot Hooks with built-in support for MySQL and Postgres checkpointing as well as custom scripts (thanks, @overhacked!)

  • [FEATURE] Use zfs destroy pool/fs@snap1,snap2,... CLI feature if available

  • [FEATURE] Linux ARM64 Docker build support & binary builds

  • [FEATURE] zrepl status now displays snapshotting reports

  • [FEATURE] zrepl status --job <JOBNAME> filter flag

  • [BUG] i386 build

  • [BUG] early validation of host:port tuples in config

  • [BUG] zrepl status now supports TERM=screen (tmux on FreeBSD / FreeNAS)

  • [BUG] ignore connection reset by peer errors when shutting down connections

  • [BUG] correct error messages when receive-side pool or root_fs dataset is not imported

  • [BUG] fail fast for misconfigured local transport

  • [BUG] race condition in replication report generation would crash the daemon when running zrepl status

  • [BUG] rpc goroutine leak in push mode if zfs recv fails on the sink side

  • [MAINTAINER NOTICE] Go modules for dependency management both inside and outside of GOPATH (lazy.sh and Makefile force GO111MODULE=on)

  • [MAINTAINER NOTICE] make platformtest target to check zrepl’s ZFS abstractions (screen scraping, etc.). These tests only work on a system with ZFS installed, and must be run as root because they create a file-backed pool for each test case. The pool name zreplplatformtest is reserved for this use case. Only run make platformtest on test systems, e.g. a FreeBSD VM image.

0.1.1

  • [BUG] issue #162 commit d6304f4 : fix I/O timeout errors on variable receive rate

    • A significant reduction or sudden stall of the receive rate (e.g. recv pool has other I/O to do) would cause a writev I/O timeout error after approximately ten seconds.

0.1

This release is a milestone for zrepl and required significant refactoring if not rewrites of substantial parts of the application. It breaks both configuration and transport format, and thus requires manual intervention and updates on both sides of a replication setup.

Danger

The changes in the pruning system for this release require you to explicitly define keep rules: for any snapshot that you want to keep, at least one rule must match. This is different from previous releases where pruning only affected snapshots with the configured snapshotting prefix. Make sure that snapshots to be kept or ignored by zrepl are covered, e.g. by using the regex keep rule. Learn more in the config docs…

Notes to Package Maintainers

  • Notify users about config changes and migrations (see changes attributed with [BREAK] and [MIGRATION] below)

  • If the daemon crashes, the stack trace produced by the Go runtime and possibly diagnostic output of zrepl will be written to stderr. This behavior is independent from the stdout outlet type. Please make sure the stderr output of the daemon is captured somewhere. To conserve precious stack traces, make sure that multiple service restarts do not directly discard previous stderr output.

  • Make it obvious for users how to set the GOTRACEBACK environment variable to GOTRACEBACK=crash. This functionality will cause SIGABRT on panics and can be used to capture a coredump of the panicking process. To that extend, make sure that your package build system, your OS’s coredump collection and the Go delve debugger work together. Use your build system to package the Go program in this tutorial on Go coredumps and the delve debugger , and make sure the symbol resolution etc. work on coredumps captured from the binary produced by your build system. (Special focus on symbol stripping, etc.)

  • Consider using the zrepl configcheck subcommand in startup scripts to abort a restart that would fail due to an invalid config.

Changes

  • [BREAK] [MIGRATION] Placeholder property representation changed

    • The placeholder property now uses on|off as values instead of hashes of the dataset path. This permits renames of the sink filesystem without updating all placeholder properties.

    • Relevant for 0.0.X-0.1-rc* to 0.1 migrations

    • Make sure your config is valid with zrepl configcheck

    • Run zrepl migrate 0.0.X:0.1:placeholder

  • [FEATURE] issue #55 : Push replication (see push job and sink job)

  • [FEATURE] TCP Transport

  • [FEATURE] TCP + TLS client authentication transport

  • [FEATURE] issue #111: RPC protocol rewrite

    • [BREAK] Protocol breakage; Update and restart of all zrepl daemons is required.

    • Use gRPC for control RPCs and a custom protocol for bulk data transfer.

    • Automatic retries for network-temporary errors

      • Limited to errors during replication for this release. Addresses the common problem of ISP-forced reconnection at night, but will become way more useful with resumable send & recv support. Pruning errors are handled per FS, i.e., a prune RPC is attempted at least once per FS.

  • [FEATURE] Proper timeout handling for the SSH transport

    • [BREAK] Requires Go 1.11 or later.

  • [BREAK] [CONFIG]: mappings are no longer supported

    • Receiving sides (pull and sink job) specify a single root_fs. Received filesystems are then stored per client in ${root_fs}/${client_identity}. See Jobs & How They Work Together for details.

  • [FEATURE] [BREAK] [CONFIG] Manual snapshotting + triggering of replication

    • [FEATURE] issue #69: include manually created snapshots in replication

    • [CONFIG] manual and periodic snapshotting types

    • [FEATURE] zrepl signal wakeup JOB subcommand to trigger replication + pruning

    • [FEATURE] zrepl signal reset JOB subcommand to abort current replication + pruning

  • [FEATURE] [BREAK] [CONFIG] New pruning system

    • The active side of a replication (pull or push) decides what to prune for both sender and receiver. The RPC protocol is used to execute the destroy operations on the remote side.

    • New pruning policies (see configuration documentation )

      • The decision what snapshots shall be pruned is now made based on keep rules

      • [FEATURE] issue #68: keep rule not_replicated prevents divergence of sender and receiver

    • [FEATURE] [BREAK] Bookmark pruning is no longer necessary

      • Per filesystem, zrepl creates a single bookmark (#zrepl_replication_cursor) and moves it forward with the most recent successfully replicated snapshot on the receiving side.

      • Old bookmarks created by prior versions of zrepl (named like their corresponding snapshot) must be deleted manually.

      • [CONFIG] keep_bookmarks parameter of the grid keep rule has been removed

  • [FEATURE] zrepl status for live-updating replication progress (it’s really cool!)

  • [FEATURE] Snapshot- & pruning-only job type (for local snapshot management)

  • [FEATURE] issue #67: Expose Prometheus metrics via HTTP (config docs)

    • Compatible Grafana dashboard shipping in dist/grafana

  • [CONFIG] Logging outlet types must be specified using the type instead of outlet key

  • [BREAK] issue #53: CLI: zrepl control * subcommands have been made direct subcommands of zrepl *

  • [BUG] Goroutine leak on ssh transport connection timeouts

  • [BUG] issue #81 issue #77 : handle failed accepts correctly (source job)

  • [BUG] issue #100: fix incompatibility with ZoL 0.8

  • [FEATURE] issue #115: logging: configurable syslog facility

  • [FEATURE] Systemd unit file in dist/systemd

Previous Releases

Note

Due to limitations in our documentation system, we only show the changelog since the last release and the time this documentation is built. For the changelog of previous releases, use the version selection in the hosted version of these docs at zrepl.github.io.