Creating the Pool

You need a volume attached to the droplet; the droplet's own disk won't do, because the block device has to be reformatted. If you had DigitalOcean format the block storage as ext4 during creation, add the -f option to the commands below to force overwriting the existing filesystem.

This is for the sending computer

zpool create -o ashift=12 -O compression=lz4 -O encryption=on -O keyformat=passphrase tank /dev/sda

This is for the receiving computer

zpool create -o ashift=12 -m none tank /dev/sda

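The receiving command needs fewer options because the raw stream from zfs send -w arrives already compressed and encrypted, so those properties don't have to be set on the receiving pool.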

Options:

-o Sets pool properties

-O Sets root dataset properties.

ashift sets the pool's sector size as a power of two, so ashift=12 means 2^12 = 4096 bytes (like how hard drives tend to have a 4k block size). Anyway, the internet says to use ashift=12.
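As far as I can tell, ashift is the base-2 logarithm of the sector size, so the arithmetic is easy to check in the shell. This is just a sketch; ashift_to_bytes is a made-up helper name, not a zfs command.

```shell
# Convert an ashift value to the sector size in bytes (2^ashift).
# ashift_to_bytes is a hypothetical helper, not part of zfs.
ashift_to_bytes() {
    echo $((1 << $1))
}

ashift_to_bytes 9    # prints 512
ashift_to_bytes 12   # prints 4096
ashift_to_bytes 13   # prints 8192
```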

I want compression because PSTs seem to contain raw text, which is highly compressible. As for the compression type, I tried lz4 and gzip (at level 6 out of 9). I went with lz4 because it compressed almost as well but was like 10x faster.

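The encryption is on because I want to use netcat instead of ssh to send the data. netcat sends everything in the clear, so the data has to be encrypted before it leaves the pool. The reason for netcat is that it's like twice as fast.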

Sending

Pregame

The first thing that needs to happen is making a snapshot. You only send snapshots, not live datasets.

zfs snapshot tank@first

Now you gotta get the amount of data you’re gonna hurl over the line.

zfs send -wnvP tank@first

-w: Send the raw blocks. Without it, zfs would decrypt the data before sending, and decompress it too (unless -c was passed).

-nvP: The n is for --dry-run. The v is for verbose. The P changes “10G” to “10000000000”. All together it prints out how much data it would send over the line if I did zfs send for real.
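The byte count from the dry run is what gets fed to pv -s later, so it helps to pull it out automatically. Here's a rough sketch; send_size is a made-up helper, and it assumes the parsable output contains a line of the form "size<TAB>bytes" (check that against what your zfs version actually prints). The sample text stands in for a real dry run so the snippet runs without a pool.

```shell
# Pull the byte count out of `zfs send -nvP` output read from stdin.
# Assumption: the parsable output has a line like "size<TAB>29257892360".
# send_size is a hypothetical helper, not part of zfs.
send_size() {
    awk '$1 == "size" { print $2 }'
}

# Sample output stands in for a real dry run.
printf 'full\ttank@first\t29257892360\nsize\t29257892360\n' | send_size

# Real usage (needs the pool from above):
#   size=$(zfs send -wnvP tank@first | send_size)
#   zfs send -w tank@first | pv -s "$size" | nc 143.198.188.138 6969
```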

Actually sending

Receiving side:

nc -l -p 6969 | zfs recv -s tank/data

This opens up port 6969 and listens. Anything netcat hears on that port is piped into zfs.

You might notice that I don’t receive into tank but instead opt for tank/data. This is because zfs does not like overwriting the root dataset.

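-s: This flag is what allows resuming. If the receive gets interrupted, zfs keeps the partial data and a resume token instead of throwing it away.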

Sending side:

zfs send -w tank@first | pv -s 29257892360 | nc 143.198.188.138 6969

This launches all that zfs send data at the specified IP address and port. The pv command gives a progress bar; the number after -s is the size from the dry run in Pregame. Refer to Pregame for the rest.

Resuming send:

On the receiving end use this to get the token.

zfs get receive_resume_token tank/data

Then copy and paste that token into this zfs send on the sending side to get the size.

zfs send -nPt <token>

Finally, use the same snippet of code as before on the receive end and send with this:

zfs send -t <token> | pv -s 21257438360 | nc 143.198.188.138 6969

You’ll notice that the only argument to this send is the token. That’s because it takes care of all the other command line arguments as well as the pool/dataset.
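Before trying a resume it's worth checking that there actually is a token. This is a sketch under one assumption: that zfs get prints "-" for the value when receive_resume_token is unset. has_resume_token is a made-up helper, and the token variable is hard-coded so the snippet runs without a pool.

```shell
# Decide whether a receive_resume_token value means there is something to resume.
# Assumption: `zfs get -H -o value` prints "-" when the property is unset.
# has_resume_token is a hypothetical helper, not part of zfs.
has_resume_token() {
    [ -n "$1" ] && [ "$1" != "-" ]
}

# Real usage on the receiving side (needs the pool):
#   token=$(zfs get -H -o value receive_resume_token tank/data)
token="-"   # stand-in value so the sketch runs without a pool

if has_resume_token "$token"; then
    echo "resume with: zfs send -t $token"
else
    echo "nothing to resume"
fi
```

The -H flag drops the header line and -o value prints only the value column, which makes the output easy to capture in a variable.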

Performance

Tool                  Source      Destination                Bandwidth (MB/s)
rclone                b2          droplet                    90 (high of 120)
rclone                b2          zfs (gzip-6)               35
cp                    droplet     zfs (gzip-6)               28
cp                    droplet     zfs (lz4)                  321
cp                    droplet     zfs (lz4 + encryption)     299
zfs send/recv (ssh)   zfs (lz4)   zfs                        105
zfs send/recv (nc)    zfs (lz4)   zfs                        210
  • It should be noted that zfs with gzip only uses 1 core for compression.

  • Gzip had a compression ratio of about 1.3 whereas lz4 had 1.2. I think lz4 should definitely be used here over gzip.

  • If you change a file just a little then try to zfs send again, it has to resend the whole thing.
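To put the ssh and nc numbers from the table in perspective, here's a rough estimate of how long the 29257892360-byte snapshot from the Sending section would take at each speed. It assumes "MB" means 10^6 bytes and that the bandwidth holds steady for the whole transfer; transfer_seconds is a made-up helper.

```shell
# Estimate transfer time in whole seconds: bytes / (MB/s * 10^6).
# Assumption: "MB" means 10^6 bytes. Integer division rounds down.
# transfer_seconds is a hypothetical helper.
transfer_seconds() {
    echo $(( $1 / ($2 * 1000000) ))
}

transfer_seconds 29257892360 105   # ssh: prints 278
transfer_seconds 29257892360 210   # nc:  prints 139
```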

Questions

  • How much compression before the speed gets too shitty?

  • How do snapshots work?

  • How do I send and receive?

  • How do I do this when shit fucks up real bad?

  • Ctrl C in the middle

  • Sending/receiving Droplet dies

  • If I send Outlook.pst then copy a new Outlook.pst (slightly changed) does it have to resend the whole thing?

  • Does compression affect this?

  • Encryption?