Changelog
The changelog summarizes bugfixes that are deemed relevant for users and package maintainers. Developers should consult the git commit log or GitHub issue tracker.
Next Release
The plan for the next release is to revisit how zrepl does snapshot management. High-level goals:
Make it easy to decouple snapshot management (snapshotting, pruning) from replication.
Ability to include/exclude snapshots from replication. This is useful for aforementioned decoupling, e.g., separate snapshot prefixes for local & remote replication. Also, it makes explicit that by default, zrepl replicates all snapshots, and that replication has no concept of “zrepl-created snapshots”, which is a common misconception.
Use of zfs snapshot comma syntax or channel programs to take snapshots of multiple datasets atomically.
Provide an alternative to the grid pruning policy. Most likely something based on hourly/daily/weekly/monthly “trains” plus a count.
Ability to prune at the granularity of the group of snapshots created at a given time, as opposed to the individual snapshots within a dataset. Maybe this will be addressed by the alternative to the grid pruning policy, as it will likely be more predictable.
Those changes will likely come with some breakage in the config. However, I want to avoid breaking use cases that are satisfied by the current design. There will be beta/RC releases to give users a chance to evaluate.
0.6.1
[FEATURE] add metric to detect filesystem rules that don’t match any local dataset (thanks, @gmekicaxcient).
[BUG] zrepl status: hide progress bar once all filesystems reach terminal state (thanks, @0x3333).
[BUG] handling of tentative cursor presence if protection strategy doesn’t use it (issue #714).
[DOCS] address setup with two or more external disks (thanks, @se-jaeger).
[DOCS] document replication and conflict_resolution options (thanks, @InsanePrawn).
[DOCS] docs: talks: add note on keep_bookmarks option (thanks, @skirmess).
[MAINT] dist: add openrc service file (thanks, @gramosg).
[MAINT] grafana: update dashboard to Grafana 9.3.6.
[MAINT] run platform tests as part of CI.
[MAINT] build: upgrade to Go 1.21 and update golangci-lint; minimum Go version for builds is now 1.20
0.6
[FEATURE] Schedule-based snapshotting using cron syntax instead of an interval.
[FEATURE] Configurable initial replication policy. When a filesystem is first replicated to a receiver, this controls whether just the newest snapshot will be replicated vs. all existing snapshots. Learn more in the docs.
[FEATURE] Configurable timestamp format for snapshot names via timestamp_format (Thanks, @ydylla).
[FEATURE] Add ZREPL_DESTROY_MAX_BATCH_SIZE env var (default 0=unlimited) (Thanks, @3nprob).
[FEATURE] Add zrepl configcheck --skip-cert-check flag (Thanks, @cole-h).
[BUG] Fix resuming from interrupted replications that use send.raw on unencrypted datasets.
The send options introduced in zrepl 0.4 allow users to specify additional zfs send flags for zrepl to use. Before this fix, when setting send.raw=true on a job that replicates unencrypted datasets, zrepl would not allow an interrupted replication to resume. The reason was a set of overly cautious checks to support the send.encrypted option.
This bugfix removes these checks from the replication planner. This makes send.encrypted a sender-side-only concern, much like all other send.* flags.
However, this means that the zrepl status UI no longer indicates whether a replication step uses encrypted sends or not. The setting is still effective though.
[BREAK] convert Prometheus metric zrepl_version_daemon to zrepl_start_time metric
The metric still reports the zrepl version in a label. But the metric value is now the Unix timestamp at the time the daemon was started. The Grafana dashboard in dist/grafana has been updated.
[BUG] transient zrepl status error: Post "http://unix/status": EOF
[BUG] don’t treat receive-side bookmarks as a replication conflict. This facilitates chaining of replication jobs. See issue #490.
[BUG] workaround for Go/gRPC problem on Illumos where zrepl would crash when using the local transport type (issue #598).
[BUG] fix active child tasks panic that could occur during replication planning (issue #193abbe)
[BUG] zrepl status off-by-one error in display of completed step count (commit ce6701f)
[BUG] Allow using day & week units for snapshotting.interval (commit ffb1d89)
[DOCS] docs/overview improvements (Thanks, @jtagcat).
[MAINT] Update to Go 1.19.
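Since the value of the zrepl_start_time metric in this release is the Unix timestamp of the daemon's start, uptime can be derived by subtraction. The following is a minimal sketch, assuming the metric value has already been scraped into a float; the function name and its parameters are illustrative and not part of zrepl.

```python
import time
from typing import Optional


def daemon_uptime_seconds(start_time_metric: float, now: Optional[float] = None) -> float:
    """Return the seconds elapsed since the daemon start reported by the metric.

    start_time_metric is assumed to be the scraped value of zrepl_start_time
    (a Unix timestamp in seconds). `now` can be injected to keep this testable.
    """
    if now is None:
        now = time.time()
    # Clamp at zero to tolerate small clock skew between scraper and daemon.
    return max(0.0, now - start_time_metric)
```

Dashboards that previously relied on the value of zrepl_version_daemon would need a comparable adjustment; the version itself is still available from the metric's label.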
0.5
[FEATURE] Bandwidth limiting (Thanks, Prominic.NET, Inc.)
[FEATURE] zrepl status: use a * to indicate which filesystem is currently replicating
[FEATURE] include daemon environment variables in zrepl status (currently only in --raw)
[BUG] fix encrypt-on-receive + placeholders use case (issue #504)
Before this fix, plain sends to a receiver with an encrypted root_fs could be received unencrypted if zrepl needed to create placeholders on the receiver.
Existing zrepl users should read the docs and check zfs get -r encryption,zrepl:placeholder PATH_TO_ROOTFS on the receiver.
[BUG] Rename mis-spelled send option embbeded_data to embedded_data.
[BUG] zrepl status: replication step numbers should start at 1
[BUG] incorrect bandwidth averaging in zrepl status.
[BUG] FreeBSD with OpenZFS 2.0: zrepl would wait indefinitely for zfs send to exit on timeouts.
[BUG] fix strconv.ParseInt: value out of range bug (and use the control RPCs).
[DOCS] improve description of multiple pruning rules.
[DOCS] document platform tests.
[DOCS] quickstart: make users aware that prune rules apply to all snapshots.
[MAINT] some platformtests were broken.
[MAINT] FreeBSD: release armv7 and arm64 binaries.
[MAINT] apt repo: update instructions due to apt-key deprecation.
Note to all users: please read up on the following OpenZFS bugs, as you might be affected:
Various bugs with encrypted send/recv (Leadership meeting notes)
Finally, I’d like to point you to the GitHub discussion about which bugfixes and features should be prioritized in zrepl 0.6 and beyond!
0.4.0
[FEATURE] support setting zfs send / recv flags in the config (send: -wLcepbS, recv: -ox). Config docs here and here.
[FEATURE] parallel replication is now configurable (disabled by default, config docs here).
[FEATURE] New zrepl status UI:
Interactive job selection.
Interactively zrepl signal jobs.
Filter filesystems in the job view by name.
An approximation of the old UI is still included as --mode legacy but will be removed in a future release of zrepl.
[BUG] Actually use concurrency when listing zrepl abstractions & doing size estimation. These operations were accidentally made sequential in zrepl 0.3.
[BUG] Job hang-up during second replication attempt.
[BUG] Data race conditions in the dataconn rpc stack.
[MAINT] Update to protobuf v1.25 and grpc 1.35.
For users who skipped the 0.3.1 update: please make sure your pruning grid config is correct. The following bugfix in 0.3.1 caused problems for some users:
[BUG] pruning: grid: add all snapshots that do not match the regex to the rule’s destroy list.
0.3.1
Mostly a bugfix release for zrepl 0.3.
[FEATURE] pruning: add optional regex field to last_n rule
[DOCS] pruning: grid: improve documentation and add an example
[BUG] pruning: grid: add all snapshots that do not match the regex to the rule’s destroy list. This brings the implementation in line with the docs.
[BUG] easyrsa script in docs
[BUG] platformtest: fix skipping encryption-only tests on systems that don’t support encryption
[BUG] replication: report AttemptDone if no filesystems are replicated
[FEATURE] status + replication: warning if replication succeeded without any filesystem being replicated
[DOCS] update multi-job & multi-host setup section
RPM Packaging
CI infrastructure rework
Continuous deployment of that new stable branch to zrepl.github.io.
0.3
This is a big one! Headlining features:
Resumable Send & Recv Support No knobs required, automatically used where supported.
Encrypted Send & Recv Support for OpenZFS native encryption, configurable at the job level, i.e., for all filesystems a job is responsible for.
Replication Guarantees Automatic use of ZFS holds and bookmarks to protect a replicated filesystem from losing synchronization between sender and receiver. By default, zrepl guarantees that incremental replication will always be possible and interrupted steps will always be resumable.
Tip
We highly recommend studying the updated overview section of the configuration chapter to understand how replication works.
Tip
Go 1.15 changed the default TLS validation policy to require Subject Alternative Names (SAN) in certificates.
The openssl commands we provided in the quick-start guides up to and including the zrepl 0.3 docs seem not to work properly.
If you encounter certificate validation errors regarding SAN and wish to continue to use your old certificates, start the zrepl daemon with env var GODEBUG=x509ignoreCN=0.
Alternatively, generate new certificates with SANs (see both options in the TLS transport docs).
Quick-start guides:
We have added another quick-start guide for a typical workstation use case for zrepl. Check it out to learn how you can use zrepl to back up your workstation’s OpenZFS natively-encrypted root filesystem to an external disk.
Additional changelog:
[BREAK] Go 1.15 TLS changes mentioned above.
[BREAK] [CONFIG] more restrictive job names than in prior zrepl versions
Starting with this version, job names are going to be embedded into ZFS holds and bookmark names (see this section for details). Therefore you might need to adjust your job names. Note that jobs cannot be renamed easily once you start using zrepl 0.3.
[BREAK] [MIGRATION] replication cursor representation changed
zrepl now manages the replication cursor bookmark per job-filesystem tuple instead of a single replication cursor per filesystem. In the future, this will permit multiple sending jobs to send from the same filesystems.
ZFS does not allow bookmark renaming, thus we cannot migrate the old replication cursors.
zrepl 0.3 will automatically create cursors in the new format for new replications, and warn if it still finds ones in the old format.
Run zrepl migrate replication-cursor:v1-v2 to safely destroy old-format cursors. The migration will ensure that only those old-format cursors are destroyed that have been superseded by new-format cursors.
[FEATURE] New option listen_freebind (tcp, tls, prometheus listener)
[FEATURE] issue #341 Prometheus metric for failing replications + corresponding Grafana panel
[FEATURE] issue #265 transport/tcp: support for CIDR masks in client IP whitelist
[FEATURE] documented subcommand to generate bash and zsh completions
[FEATURE] issue #307 chrome://trace-compatible activity tracing of zrepl daemon activity
[FEATURE] logging: trace IDs for better log entry correlation with concurrent replication jobs
[FEATURE] experimental environment variable for parallel replication (see issue #306)
[BUG] missing logger context vars in control connection handlers
[BUG] improved error messages on zfs send errors
[BUG] [DOCS] snapshotting: clarify sync-up behavior and warn about filesystems that will not be snapshotted until the sync-up phase is over
[BUG] transport/ssh: do not leak zombie ssh process on connection failures
[DOCS] Installation: FreeBSD jail with iocage
[DOCS] Document new replication features in the config overview and replication/design.md.
[MAINTAINER NOTICE] New platform tests in this version, please make sure you run them for your distro!
[MAINTAINER NOTICE] Please add the shell completions to the zrepl packages.
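The CIDR support for the transport/tcp client whitelist listed in this release (issue #265) comes down to testing whether a connecting address falls inside any configured network. The sketch below illustrates that check with Python's standard ipaddress module. It is not zrepl's implementation (zrepl is written in Go), and the function name and sample networks are made up for illustration.

```python
import ipaddress
from typing import Iterable


def client_allowed(client_ip: str, allowed_cidrs: Iterable[str]) -> bool:
    """Return True if client_ip lies within any of the allowed networks.

    Entries may be CIDR masks ("192.168.1.0/24") or single addresses
    ("10.0.0.5"), which are treated as one-address networks.
    """
    addr = ipaddress.ip_address(client_ip)
    for cidr in allowed_cidrs:
        # strict=False tolerates host bits being set in the mask entry.
        network = ipaddress.ip_network(cidr, strict=False)
        # An IPv4 address is never contained in an IPv6 network and vice versa.
        if addr in network:
            return True
    return False
```

For example, client_allowed("192.168.1.42", ["192.168.1.0/24"]) accepts the client, while an address from 192.168.2.0/24 would be rejected by the same list.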
0.2.1
[FEATURE] Illumos (and Solaris) compatibility and binary builds (thanks, MNX.io)
[FEATURE] 32bit binaries for Linux and FreeBSD (untested, though)
[BUG] better error messages in ssh+stdinserver transport
[BUG] systemd + ssh+stdinserver: automatically create /var/run/zrepl/stdinserver
[BUG] crash if Prometheus listening socket cannot be opened
[MAINTAINER NOTICE] Makefile refactoring, see commit 080f2c0
0.2
[FEATURE] Pre- and Post-Snapshot Hooks with built-in support for MySQL and Postgres checkpointing as well as custom scripts (thanks, @overhacked!)
[FEATURE] Use zfs destroy pool/fs@snap1,snap2,... CLI feature if available
[FEATURE] Linux ARM64 Docker build support & binary builds
[FEATURE] zrepl status now displays snapshotting reports
[FEATURE] zrepl status --job <JOBNAME> filter flag
[BUG] i386 build
[BUG] early validation of host:port tuples in config
[BUG] zrepl status now supports TERM=screen (tmux on FreeBSD / FreeNAS)
[BUG] ignore connection reset by peer errors when shutting down connections
[BUG] correct error messages when receive-side pool or root_fs dataset is not imported
[BUG] fail fast for misconfigured local transport
[BUG] race condition in replication report generation would crash the daemon when running zrepl status
[BUG] rpc goroutine leak in push mode if zfs recv fails on the sink side
[MAINTAINER NOTICE] Go modules for dependency management both inside and outside of GOPATH (lazy.sh and Makefile force GO111MODULE=on)
[MAINTAINER NOTICE] make platformtest target to check zrepl’s ZFS abstractions (screen scraping, etc.). These tests only work on a system with ZFS installed, and must be run as root because they create a file-backed pool for each test case. The pool name zreplplatformtest is reserved for this use case. Only run make platformtest on test systems, e.g. a FreeBSD VM image.
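The zfs destroy comma syntax listed in this release lets several snapshots of one filesystem be destroyed in a single invocation. Below is a rough sketch of how such arguments could be assembled, with an optional batch-size cap where 0 means unlimited (mirroring the default stated for ZREPL_DESTROY_MAX_BATCH_SIZE under 0.6). The helper name and its batching behavior are assumptions for illustration, not zrepl's code.

```python
from typing import List


def destroy_args(filesystem: str, snapshots: List[str], max_batch_size: int = 0) -> List[str]:
    """Build `zfs destroy` arguments of the form pool/fs@snap1,snap2,...

    max_batch_size == 0 means "no limit": all snapshots share one argument.
    Otherwise the snapshots are split into chunks of at most that size.
    """
    if max_batch_size < 0:
        raise ValueError("max_batch_size must be >= 0")
    if not snapshots:
        return []
    size = max_batch_size or len(snapshots)
    return [
        f"{filesystem}@{','.join(snapshots[i:i + size])}"
        for i in range(0, len(snapshots), size)
    ]
```

Each returned string would be passed to its own zfs destroy call, so a cap of 2 over three snapshots yields two invocations.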
0.1.1
[BUG] issue #162 commit d6304f4: fix I/O timeout errors on variable receive rate
A significant reduction or sudden stall of the receive rate (e.g. recv pool has other I/O to do) would cause a writev I/O timeout error after approximately ten seconds.
0.1
This release is a milestone for zrepl and required significant refactoring if not rewrites of substantial parts of the application. It breaks both configuration and transport format, and thus requires manual intervention and updates on both sides of a replication setup.
Danger
The changes in the pruning system for this release require you to explicitly define keep rules:
for any snapshot that you want to keep, at least one rule must match.
This is different from previous releases where pruning only affected snapshots with the configured snapshotting prefix.
Make sure that snapshots to be kept or ignored by zrepl are covered, e.g. by using the regex keep rule.
Learn more in the config docs…
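The warning above can be restated as: a snapshot survives pruning only if at least one keep rule matches it. The following sketch shows that logic under simplifying assumptions. The two rule helpers are hypothetical stand-ins, loosely modeled on a regex rule and a keep-newest-N rule; they do not reproduce zrepl's actual rule semantics or config format.

```python
import re
from typing import Callable, List, Set

# A keep rule receives all snapshot names (oldest first) and returns
# the subset of names it wants to keep.
KeepRule = Callable[[List[str]], Set[str]]


def keep_regex(pattern: str) -> KeepRule:
    """Hypothetical rule: keep every snapshot whose name matches the pattern."""
    rx = re.compile(pattern)
    return lambda snaps: {s for s in snaps if rx.search(s)}


def keep_last_n(n: int) -> KeepRule:
    """Hypothetical rule: keep the n newest snapshots."""
    return lambda snaps: set(snaps[-n:]) if n > 0 else set()


def snapshots_to_destroy(snaps: List[str], rules: List[KeepRule]) -> List[str]:
    """Return the snapshots that no keep rule matched, in their original order."""
    kept: Set[str] = set()
    for rule in rules:
        kept |= rule(snaps)
    return [s for s in snaps if s not in kept]
```

With an empty rule list every snapshot ends up in the destroy list, which is why snapshots that zrepl should leave alone need a matching keep rule.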
Notes to Package Maintainers
Notify users about config changes and migrations (see changes attributed with [BREAK] and [MIGRATION] below)
If the daemon crashes, the stack trace produced by the Go runtime and possibly diagnostic output of zrepl will be written to stderr. This behavior is independent of the stdout outlet type. Please make sure the stderr output of the daemon is captured somewhere. To conserve precious stack traces, make sure that multiple service restarts do not directly discard previous stderr output.
Make it obvious for users how to set the GOTRACEBACK environment variable to GOTRACEBACK=crash. This functionality will cause SIGABRT on panics and can be used to capture a coredump of the panicking process. To that end, make sure that your package build system, your OS’s coredump collection and the Go delve debugger work together. Use your build system to package the Go program in this tutorial on Go coredumps and the delve debugger, and make sure the symbol resolution etc. work on coredumps captured from the binary produced by your build system. (Special focus on symbol stripping, etc.)
Consider using the zrepl configcheck subcommand in startup scripts to abort a restart that would fail due to an invalid config.
Changes
[BREAK] [MIGRATION] Placeholder property representation changed
The placeholder property now uses on|off as values instead of hashes of the dataset path. This permits renames of the sink filesystem without updating all placeholder properties.
Relevant for 0.0.X-0.1-rc* to 0.1 migrations
Make sure your config is valid with zrepl configcheck
Run zrepl migrate 0.0.X:0.1:placeholder
[FEATURE] issue #55: Push replication (see push job and sink job)
[FEATURE] TCP Transport
[FEATURE] issue #111: RPC protocol rewrite
[BREAK] Protocol breakage; Update and restart of all zrepl daemons is required.
Use gRPC for control RPCs and a custom protocol for bulk data transfer.
Automatic retries for network-temporary errors
Limited to errors during replication for this release. Addresses the common problem of ISP-forced reconnection at night, but will become way more useful with resumable send & recv support. Pruning errors are handled per FS, i.e., a prune RPC is attempted at least once per FS.
[FEATURE] Proper timeout handling for the SSH transport
[BREAK] Requires Go 1.11 or later.
[BREAK] [CONFIG]: mappings are no longer supported
Receiving sides (pull and sink job) specify a single root_fs. Received filesystems are then stored per client in ${root_fs}/${client_identity}. See Jobs & How They Work Together for details.
[FEATURE] [BREAK] [CONFIG] Manual snapshotting + triggering of replication
[FEATURE] issue #69: include manually created snapshots in replication
[CONFIG] manual and periodic snapshotting types
[FEATURE] zrepl signal wakeup JOB subcommand to trigger replication + pruning
[FEATURE] zrepl signal reset JOB subcommand to abort current replication + pruning
[FEATURE] [BREAK] [CONFIG] New pruning system
The active side of a replication (pull or push) decides what to prune for both sender and receiver. The RPC protocol is used to execute the destroy operations on the remote side.
New pruning policies (see configuration documentation)
The decision which snapshots to prune is now made based on keep rules
[FEATURE] issue #68: keep rule not_replicated prevents divergence of sender and receiver
[FEATURE] [BREAK] Bookmark pruning is no longer necessary
Per filesystem, zrepl creates a single bookmark (#zrepl_replication_cursor) and moves it forward with the most recent successfully replicated snapshot on the receiving side.
Old bookmarks created by prior versions of zrepl (named like their corresponding snapshot) must be deleted manually.
[CONFIG] keep_bookmarks parameter of the grid keep rule has been removed
[FEATURE] zrepl status for live-updating replication progress (it’s really cool!)
[FEATURE] Snapshot- & pruning-only job type (for local snapshot management)
[FEATURE] issue #67: Expose Prometheus metrics via HTTP (config docs)
Compatible Grafana dashboard shipping in dist/grafana
[CONFIG] Logging outlet types must be specified using the type instead of outlet key
[BREAK] issue #53: CLI: zrepl control * subcommands have been made direct subcommands of zrepl *
[BUG] Goroutine leak on ssh transport connection timeouts
[BUG] issue #81 issue #77: handle failed accepts correctly (source job)
[BUG] issue #100: fix incompatibility with ZoL 0.8
[FEATURE] issue #115: logging: configurable syslog facility
[FEATURE] Systemd unit file in dist/systemd
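For the replacement of mappings described in the [BREAK] [CONFIG] entry above, the receive-side location follows the pattern ${root_fs}/${client_identity}. The helper below is a hypothetical illustration of that pattern. The first two path components come from this changelog; appending the sender's dataset path underneath them is an assumption made for the example, and the Jobs & How They Work Together docs remain the authority on the exact layout.

```python
def receive_path(root_fs: str, client_identity: str, sender_fs: str = "") -> str:
    """Compose the receive-side dataset path for one client.

    root_fs and client_identity follow the ${root_fs}/${client_identity}
    pattern; sender_fs is an assumed extra component for illustration only.
    """
    if not client_identity or "/" in client_identity:
        raise ValueError("client_identity must be a single, non-empty path component")
    parts = [root_fs.strip("/"), client_identity]
    if sender_fs:
        parts.append(sender_fs.strip("/"))
    return "/".join(parts)
```

Keeping each client under its own subtree of a single root_fs is what removes the need for per-filesystem mappings on the receiving side.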
Previous Releases
Note
Due to limitations in our documentation system, we only show the changelog between the last release and the time this documentation is built. For the changelog of previous releases, use the version selection in the hosted version of these docs at zrepl.github.io.