File Compression methods
dfussell at byu.edu
Thu Oct 10 11:58:22 MDT 2013
On 10/10/2013 11:20 AM, Levi Pearson wrote:
> On Thu, Oct 10, 2013 at 10:53 AM, Rich <rich at dranek.com> wrote:
>> On Thu, Oct 10, 2013 at 10:48:09AM -0600, Rich wrote:
>>> On Thu, Oct 10, 2013 at 01:55:03PM +0530, Dan Egli wrote:
>>>> And a two
>>>> step process is unfortunately out of the question. The machines will only
>>>> have either 750GB or 1TB hdds, which obviously won't work for extracting
>>>> the tar to disk then extracting from the tar on disk. tar's extraction
>>>> process would run out of space before it finished.
>> Whoops, I misunderstood what you meant, forget what I said about that
>> (except that it's still true, it just doesn't address your concern).
> Because tar and gzip can both take input from stdin and write output
> to stdout, you can compose them in such a way that they become,
> effectively, a single step. And then you can compose them with
> rsh/ssh/etc. in order to eliminate the entire intermediate file. So,
> the composition of an archiving codec, a compression codec, and a
> remote shell process is *effectively* a single-step image transfer
> system, at least as long as you choose codecs that are capable of
> operating in a chunked manner rather than requiring random access or
> the entire input/output at once.
> This idea is one of the pillars of the UNIX programming philosophy,
> and also is given a more general and rigorous treatment in the basis
> of functional programming.
Furthermore, if you are on a trusted network (and I'm assuming you are),
you can skip the ssh pipe and use netcat (nc). Just set one of the
machines to listen, and the other to send and you're set. If you are
moving a ton of small files across the network, you'll find a tar-over-netcat
stream is faster than NFS because it avoids the per-file network round
trips to stat/open/close each file.
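A minimal sketch of the netcat variant; port number, host, and paths are placeholders, and note that flag syntax differs between netcat variants (traditional netcat wants "nc -l -p 9000"). A named pipe stands in for the socket so the demonstration runs without a network:

```shell
# Receiver (start first): nc -l 9000 | tar -xf - -C /dest
# Sender:                 tar -cf - /source/dir | nc receiver 9000
# Local demonstration, with a FIFO standing in for the TCP socket:
src=$(mktemp -d); dst=$(mktemp -d); fifo=$(mktemp -u)
mkfifo "$fifo"
echo "payload" > "$src/data.txt"
tar -xf - -C "$dst" < "$fifo" &      # plays the receiver
tar -C "$src" -cf - . > "$fifo"      # plays the sender
wait
cat "$dst/data.txt"                  # prints "payload"
```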
There are a few more modern variants of netcat, one of which does
multicast, which works best if you are imaging all the machines at the
same time. If not, the multicast would be harmful, as most switches turn
multicast into broadcast. Higher-end switches will do IGMP snooping and
send multicast packets only to the ports that are participating. If one
of the machines fails during a multicast session, though, you'll have to
image it separately.
If this is a one-time thing, the other guys are right on; just use an eSATA
or USB3 drive (not USB1/2, the polling will kill you) and sneakernet
it. Then use rsync if things need to be updated periodically.