File Compression methods

Dan Egli ddavidegli at gmail.com
Sat Oct 12 02:23:41 MDT 2013


On October 10, 2013, Nicholas Leippe wrote:



> Why are you cloning drives over GigE? And why 600GB--that's a lot. I
> should think most distros could install in under 20GB easily.



As to why cloning over GigE: that's the best method I know of to do what
this guy wants. Now if you know of a better way to clone the drive without
going to the hassle of installing it in one machine, duping over the SATA
bus, then removing the drive and placing it in another machine (or using
one of those HDD duplicating machines), I'm all ears.
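
For concreteness, the sort of network clone I mean is roughly the
following (the hostname and port are made up, and nc option syntax
differs between netcat variants):

    # On the receiving machine: listen and write the stream to the target
    # drive (/dev/sdb is only an example -- triple-check the device first)
    nc -l 9000 | dd of=/dev/sdb bs=64K

    # On the machine with the master drive: read the disk and stream it out
    dd if=/dev/sda bs=64K | nc receiving-host 9000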



And I grant that 600GB is a lot. But that's not just the distribution. It's
the distro plus the data files plus the extra software that would be
involved. A lot of it would compress REALLY well though, which is why I was
thinking of using the archiver/compression idea.



> Anyways, there are two distinct issues here:
> - archive format
> - compression method



True, but zip and rar don't work because (as far as I know) neither meets
both of my requirements: handling Unix special files (symlinks, devices,
etc.) and permissions, and compressing from/decompressing to stdin/stdout.
cpio, tar, and shar will handle the special files and permissions, but
provide no compression of their own.
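
That's exactly why the pipe-friendly combination appeals to me. Creating
the image would look something like this (paths are placeholders, and in
practice you'd exclude /proc, /sys, and the NFS mount itself):

    # tar itself records symlinks, device nodes, ownership, and permissions;
    # compression is layered on through a pipe rather than built in
    tar -cf - -C / . | bzip2 -9 > /mnt/nfs/root-image.tar.bz2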



A dedicated remote FS won't work properly in this case because, while each
machine will start out identical, they will slowly drift apart in
operation. I was planning on using something like NFS to store the archive
that would be decompressed/extracted to the local HDD, but I don't think a
pure NFS setup would work.
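
What I pictured was keeping the archive on the NFS share and extracting it
onto each machine's local drive, roughly like this (paths again invented):

    # New machine, local drive partitioned and mounted at /mnt/target:
    bzcat /mnt/nfs/root-image.tar.bz2 | tar -xpf - -C /mnt/target --numeric-owner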



Although, this does bring up a point for a personal project. I was thinking
I'd do something similar for my project, using this setup to deploy some
virtual HDD images for KVM or another hypervisor. But now that I think on
it, if it's possible to use copy-on-write with the HDD image files, then I
could (potentially) store the original image on an NFS share, then symlink
the file to the HDD image that the hypervisor would use. The hypervisor
would then just store changes in the COW file locally. Anyone know if
that's a viable idea? If so, that would save me some massive headaches on
my personal setup. :)
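
If I understand qemu-img's backing-file support correctly, the setup would
be something like this (paths invented):

    # Master image sits read-only on the NFS share; each guest gets a small
    # local qcow2 overlay that records only that guest's changes (COW)
    qemu-img create -f qcow2 -b /mnt/nfs/base.qcow2 /var/local/vm/guest1.qcow2

    # Then point KVM at the local overlay instead of the base image, e.g.:
    # qemu-kvm -hda /var/local/vm/guest1.qcow2 ...

And if that's right, no symlink would even be needed, since the overlay
itself records the path to its base image.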



So, on the non-personal front, it looks like I'm being pointed back to
bzip2? With all the compression engines out there these days, you'd think
at least one tighter compressor would support piping input/output to/from
the program like compress, gzip, and bzip2 do. I'll keep looking for a
little while, but unless I find something better, I guess I'll set up the
system using bzip2.
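
If bzip2 is what I end up with, the whole deployment reduces to pipes end
to end, something along these lines (host and port invented, and again nc
flags vary by netcat flavor):

    # Receiver (new machine, target filesystem mounted at /mnt/target):
    nc -l 9000 | bunzip2 | tar -xpf - -C /mnt/target

    # Sender (master machine):
    tar -cf - -C / . | bzip2 | nc new-machine 9000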



--- Dan


On Thu, Oct 10, 2013 at 7:34 PM, Nicholas Leippe <nick at leippe.com> wrote:

> Why are you cloning drives over GigE? And why 600GB--that's a lot. I should
> think most distros could install in under 20GB easily.
>
> Anyways, there are two distinct issues here:
>
> - archive format
> - compression method
>
> Both can be solved independently.
>
> Some archive formats are:
> - zip
> - tar
> - cpio
> - rar
> - shar
>
> Some compression formats are:
> - compress
> - zip
> - rar
> - bzip2
> - gzip
>
> You might also consider simply doing rsync.
>
> Or get fancier: boot the first time with an md raid1, with the two mirrors
> being the local drive and a drbd of the remote drive set as mostly-write,
> with the bitmap option enabled. It will sync and track its progress in the
> case it doesn't finish the first time the machine is turned on. Once it has
> finished, you could remove the drbd remote half out of it. This however
> would copy every block across the network.
>
> Or, just use a dedicated remote fs such as NFS.
>
> Or, clone the drives on a local machine using faster, native drive
> interfaces (SATA or a USB3/firewire enclosure) before deploying them--could
> be up to 3 times faster depending on your drive capabilities.
>
>

