tlaronde%polynum.com@localhost writes:

>> I am using bup for this, and others use borgbackup.  Surely there are
>> others.
>
> Thanks! for the pointers to bup and borgbackup. I was precisely looking
> for a deduplication storage or backup facility (in some sense, when you
> can have text files, cvs diff like/ed scripts are a diff and history
> facility; with binary data, you are out of luck).

You are not at all out of luck with binary.  bup does this and I think
borg does too.  The basic idea is a rolling hash, similar to rsync,
where bytes are accumulated until some number of least significant bits
of the hash are either all 0 (or all 1, I forget, doesn't matter).  bup
uses 13 bits by default and I use 16.  The bytes from the start of the
chunk up to and including the byte where those 13 bits match are then
put into a blob and stored.  Because the split points depend only on
nearby content, large binary files that are mostly the same end up with
mostly the same blobs.

And, perhaps obvious, but this scheme leads to 100% deduplication of
whole files that have not changed.  And this is a pretty common case.

I know this sounds a bit crazy, and I may not have described it quite
right, but it really works.  On the server that handles my mail -- so a
lot of coming and going -- there might be 12G of backed up data total
and maybe 300M of new blobs every week.  On a machine that just runs a
server without a lot of data changing, it can be far less.

The rolling checksum scheme really matters for VM images and database
storage.

> FWIW, Plan9 has/had a WORM filesystem: Write Once Read Many, where
> storage was made with deduplication of blocks, meaning one could
> have too history of files, only saving the differences. Furthermore,
> in such a system, an attack from ransomware would be useless: data
> is never changed once written, just a new version added; this
> protects from blunder deletions or malignity. Unfortunately, this
> part of Plan9 did not find its way in the Unix world the same as other
> bits of it did...

To protect against ransomware there needs to NOT be an administrative interface to clean up old versions. And for long-term usability you do need such an interface. I don't really purge bup backups. Instead I just get new, bigger disks every few years and start over and set the old ones aside.
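
Going back to the rolling hash splitting: a minimal Python sketch of the
idea, in case it helps.  This is not bup's actual code.  The names
(chunks, blob_ids), the 64-byte window, the rsync-style checksum and the
choice of "all ones" as the split condition are assumptions made for
illustration only.

```python
import hashlib
import random
from typing import Iterator, Set


def chunks(data: bytes, bits: int = 13, window: int = 64) -> Iterator[bytes]:
    """Split data wherever the low `bits` bits of a rolling checksum are all 1.

    The checksum covers only the last `window` bytes, so split points
    depend on nearby content and not on the offset within the file.
    `bits` must be at most 16 because the checksum is kept to 16 bits.
    """
    mask = (1 << bits) - 1
    a = b = 0          # rsync-style running sums over the window
    start = 0          # start of the chunk currently being accumulated
    for i, new in enumerate(data):
        # Byte leaving the window (treated as 0 until the window is full).
        old = data[i - window] if i >= window else 0
        a = (a + new - old) & 0xFFFF
        b = (b + a - window * old) & 0xFFFF
        if (b & mask) == mask:
            yield data[start:i + 1]
            start = i + 1
    if start < len(data):
        yield data[start:]          # whatever is left at the end


def blob_ids(data: bytes, bits: int = 13) -> Set[str]:
    """Return the set of SHA-1 names of the blobs a file splits into."""
    return {hashlib.sha1(c).hexdigest() for c in chunks(data, bits)}


if __name__ == "__main__":
    rng = random.Random(0)
    original = bytes(rng.getrandbits(8) for _ in range(1 << 18))
    # Insert a few bytes in the middle, as an edit to a large binary would.
    edited = original[:100000] + b"a few new bytes" + original[100000:]
    old, new = blob_ids(original), blob_ids(edited)
    print(len(new), "blobs in the edited file,", len(new - old), "of them new")
```

Only the blobs around the inserted bytes should come out new; once the
window has moved past the edit, the split points line up with the old
ones again.  An unchanged file produces exactly the same set of blob
names, which is the 100% case mentioned above.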