File Compression methods

Daniel Fussell dfussell at byu.edu
Thu Oct 10 11:58:22 MDT 2013


On 10/10/2013 11:20 AM, Levi Pearson wrote:
> On Thu, Oct 10, 2013 at 10:53 AM, Rich <rich at dranek.com> wrote:
>> On Thu, Oct 10, 2013 at 10:48:09AM -0600, Rich wrote:
>>> On Thu, Oct 10, 2013 at 01:55:03PM +0530, Dan Egli wrote:
>>>> And a two
>>>> step process is unfortunately out of the question. The machines will only
>>>> have either 750GB or 1TB hdds, which obviously won't work for extracting
>>>> the tar to disk then extracting from the tar on disk. tar's extraction
>>>> process would run out of space before it finished.
>>
>> Whoops, I misunderstood what you meant, forget what I said about that
>> (except that it's still true, it just doesn't address your concern).
> Because tar and gzip can both take input from stdin and write output
> to stdout, you can compose them in such a way that they become,
> effectively, a single step. And then you can compose them with
> rsh/ssh/etc. in order to eliminate the entire intermediate file. So,
> the composition of an archiving codec, a compression codec, and a
> remote shell process is *effectively* a single-step image transfer
> system, at least as long as you choose codecs that are capable of
> operating in a chunked manner rather than requiring random access or
> the entire input/output at once.
>
> This idea is one of the pillars of the UNIX programming philosophy,
> and also is given a more general and rigorous treatment in the basis
> of functional programming.
>
>
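For concreteness, the single-step pipeline Levi describes would look 
roughly like this (hostname and paths are just placeholders, so adjust 
to taste):

    # on the machine holding the source tree
    tar -cf - /path/to/source | gzip -c | \
        ssh target-host 'gunzip -c | tar -xf - -C /path/to/dest'

Nothing ever lands on disk in between; the archive and the compressed 
stream only exist as data flowing through the pipes and the ssh 
connection.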
Furthermore, if you are on a trusted network (and I'm assuming you are), 
you can skip the ssh pipe and use netcat (nc).  Just set one of the 
machines to listen and the other to send, and you're set.  If you are 
moving a ton of small files across the network, you'll find a 
tar-over-netcat stream is faster than NFS, because the stream doesn't 
suffer the frequent network round trips to stat/open/close each file.
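As a rough sketch (the port number and paths are arbitrary, and the 
listen syntax varies between netcat implementations; some want 
nc -l -p 9999, others just nc -l 9999):

    # on the machine being imaged (the listener)
    nc -l -p 9999 | tar -xf - -C /path/to/dest

    # on the machine holding the files (the sender)
    tar -cf - /path/to/source | nc listener-host 9999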

There are a few more modern variants of netcat, one of which does 
multicast; that works best if you are imaging all the machines at the 
same time.  If not, the multicast can actually hurt, since most switches 
turn multicast into broadcast.  Higher-end switches will do IGMP snooping 
and send multicast packets only to the ports that are participating.  If 
one of the machines fails during the multicast, though, you'll have to 
image it one-off style.
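If I remember right, udpcast (udp-sender/udp-receiver) is one of the 
multicast-capable tools in that family, and it pipes the same way netcat 
does.  From memory, so double-check the man pages, the rough shape is:

    # on every machine being imaged (the receivers)
    udp-receiver | tar -xf - -C /path/to/dest

    # on the machine holding the master copy (the sender)
    tar -cf - /path/to/source | udp-sender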

If this is a one-time thing, the other guys are right on; just use an 
eSATA or USB3 drive (not USB1/2, the polling will kill you) and 
sneakernet it.  Then use rsync if things need to be updated periodically.
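For the periodic updates, something along these lines (made-up paths, 
assuming rsync over ssh) is usually all it takes; the trailing slashes 
matter, and drop --delete if you don't want files removed on the target:

    rsync -a --delete /path/to/master/ target-host:/path/to/dest/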

Grazie,
;-Daniel Fussell



