File Compression methods

Dan Egli ddavidegli at
Sat Oct 12 02:21:36 MDT 2013

On October 10, 2013, Lloyd Brown wrote:

> Not sure about the exact implementation, but gzip is fast/responsive
> enough to be put in a pipeline involving networks. It won't get as high
> of a compression ratio as bzip2, but it will still do fairly well, and
> be quite a lot faster than bzip2. A great deal will depend on how
> compressible your data is.

It's fairly compressible. But the extraction speed isn't the issue. It's
the network transfer speed that is the core of the issue. Heck, simple Unix
compress (the old .Z format) would be sufficient speed-wise, but it's going
to produce one of the biggest archives out there. What I'm after is
reducing the amount of data that must be transferred through the network
chips. Once the data is in local memory, we really don't care
how long it takes to decompress, because it's still going to be tons faster
than transferring the entire image uncompressed.
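
To put rough numbers on that trade-off, here is a minimal Python sketch. It compares how small the standard-library codecs make a sample buffer and estimates an idealized wire time. The sample data, the 100 Mbit/s link figure, and the function names are assumptions for illustration only; zlib stands in for gzip (same DEFLATE algorithm, slightly different headers), and real results depend entirely on the actual image.

```python
import bz2
import lzma
import zlib


def compressed_sizes(data):
    """Return the compressed size in bytes for each stdlib codec."""
    return {
        "zlib/gzip (level 9)": len(zlib.compress(data, 9)),
        "bzip2 (level 9)": len(bz2.compress(data, 9)),
        "xz/lzma (preset 9)": len(lzma.compress(data, preset=9)),
    }


def transfer_seconds(num_bytes, link_bytes_per_sec):
    """Idealized wire time: ignores protocol overhead and latency."""
    return num_bytes / link_bytes_per_sec


if __name__ == "__main__":
    # Hypothetical stand-in for the real image; substitute actual data.
    sample = b"/usr/lib/libexample.so.1 some fairly repetitive content\n" * 50000
    link = 100e6 / 8  # assumed 100 Mbit/s link, in bytes per second

    print(f"uncompressed: {len(sample)} bytes, "
          f"{transfer_seconds(len(sample), link):.3f} s on the wire")
    for name, size in compressed_sizes(sample).items():
        print(f"{name}: {size} bytes, {transfer_seconds(size, link):.3f} s on the wire")
```

Running it against a representative slice of the real image, rather than the made-up sample, would show which codec is worth the extra CPU time.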

As to your other items:

> - is it really necessary to send all that data to each device, or would
> you be well served to store it in a central storage location and access
> it over the network as needed.

See my last e-mail to Nicholas, where I mention that very point. Basic
recap: yes, I need to send all the data, as this is the initial
population of files (including the O/S) onto the HDD. Before this process, the
HDD will contain an ext4-formatted partition. Nothing else.

> - If you really do need to send the data out, how much variation is
> there from one destination to another? If they're identical, or even
> similar, and you're imaging them at the same time, maybe something based
> on multicast or peer-to-peer (e.g. bittorrent-like) traffic patterns
> would be helpful.

An interesting idea. Got any sites I could check out? I have heard of
multicast, but that's about it. It certainly has potential for dealing
with multiple machines. Of course, I'd still want to compress the data
first, because I really don't like the idea of spending such a long time
transferring the data, regardless of whether it's going to one machine or
twenty machines at the same time.

As for a BitTorrent-like approach, I just don't think that would work. My
understanding of Bittorrent is that it reads not only from the primary
source, but other sources running a bittorrent client as well. None of the
machines would be running bittorrent clients (or anything even remotely
similar). The initial boot is, as I mentioned, with a very minimalistic
image (just enough to connect to a network via nfs or similar, and then get
a bash prompt). Anything beyond that image has to be stored on the server.
And I don't think he'd go for the idea of peer-to-peer anyway, since I
don't think he plans on having the other machines run the client all the

> - How much chance would there be that the data is already on the
> device, and just might need to be updated?

Unfortunately, zero. Rsync was one of my first thoughts, but that's not
going to save any time here. See above: the drive is basically blank before
this happens.

Thanks for the ideas though.

--- Dan

On Sat, Oct 12, 2013 at 1:50 PM, Dan Egli <ddavidegli at> wrote:

> On October 10, 2013, Daniel Fussel wrote:
> > Furthermore, if you are on a trusted network (and I'm assuming you are)
> > you can skip the ssh pipe and use netcat (nc).
> > Just set one of the machines to listen, and the other to send, and
> > you're set. If you are moving a ton of small files across
> > the network, you'll find a tar over netcat stream is faster than using
> > NFS because the stream is no longer suffering from
> > frequent network round trips to stat/open/close files.
> I never got what was supposed to come before the Furthermore, so I don't
> know what you said before, but as to this:
> That's kind of what I was after in the first place, before dealing with
> the compression. I hadn't heard of netcat, but the approach is nearly
> identical to what I was trying to do. What I wanted (as I said in a
> previous message) was to shrink the amount of physical data sent across the
> network. That's what the compression/archiving was for. The NFS wasn't
> supposed to copy each file across. The idea was to have something
> like a .tar.bz2 file, but with better compression, that gets opened from the
> remote NFS directory and extracted to the local HDD.
> > if this is a one-time thing, the other guys are right on; just use an
> > eSATA or USB3 drive (not USB1/2, the polling will kill
> > you) and sneakernet it. Then use rsync if things need to be updated
> > periodically.
> Sneakernet is exactly what I was trying to AVOID. The goal was to have
> something that I can boot say from a thumb drive, then run a script on the
> drive that does all the work for me. :)
> And while a multicast netcat type thing would work, I still need it to be
> compressed, hence my first question about what would be the best compressor
> that either supports piping or unix permissions and special files.
> Thanks though. I'd like to hear more about this multicasting netcat idea.
> --- Dan
> On Thu, Oct 10, 2013 at 10:50 PM, Levi Pearson <levipearson at> wrote:
>> On Thu, Oct 10, 2013 at 10:53 AM, Rich <rich at> wrote:
>> > On Thu, Oct 10, 2013 at 10:48:09AM -0600, Rich wrote:
>> >>
>> >> On Thu, Oct 10, 2013 at 01:55:03PM +0530, Dan Egli wrote:
>> >>>
>> >>> And a two
>> >>> step process is unfortunately out of the question. The machines will only
>> >>> have either 750GB or 1TB hdds, which obviously won't work for extracting
>> >>> the tar to disk then extracting from the tar on disk. tar's extraction
>> >>> process would run out of space before it finished.
>> >
>> >
>> > Whoops, I misunderstood what you meant, forget what I said about that
>> > (except that it's still true, it just doesn't address your concern).
>> Because tar and gzip can both take input from stdin and write output
>> to stdout, you can compose them in such a way that they become,
>> effectively, a single step. And then you can compose them with
>> rsh/ssh/etc. in order to eliminate the entire intermediate file. So,
>> the composition of an archiving codec, a compression codec, and a
>> remote shell process is *effectively* a single-step image transfer
>> system, at least as long as you choose codecs that are capable of
>> operating in a chunked manner rather than requiring random access or
>> the entire input/output at once.
>> This idea is one of the pillars of the UNIX programming philosophy,
>> and also is given a more general and rigorous treatment in the basis
>> of functional programming.
>> /*
>> PLUG:, #utah on
>> Unsubscribe:
>> Don't fear the penguin.
>> */
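
The single-step composition described in the quoted message above (archiver, compressor, and transport chained so that no intermediate file ever exists) can be sketched in Python with the standard tarfile module in its streaming mode. This is only an illustration under stated assumptions: the function names are made up, xz is picked arbitrarily as the codec, and a local socket pair stands in for the real network link between the server and the machine being imaged.

```python
import os
import socket
import tarfile
import tempfile
import threading


def send_tree(src_dir, sock):
    """Archive src_dir and compress it straight into the socket (no temp file)."""
    with sock.makefile("wb") as out:
        # "w|xz" is tarfile's non-seekable stream mode with LZMA compression.
        with tarfile.open(fileobj=out, mode="w|xz") as tar:
            tar.add(src_dir, arcname=".")
    sock.shutdown(socket.SHUT_WR)  # tell the receiver the stream has ended


def receive_tree(sock, dest_dir):
    """Decompress and unpack the incoming stream directly into dest_dir."""
    with sock.makefile("rb") as inp:
        with tarfile.open(fileobj=inp, mode="r|xz") as tar:
            tar.extractall(dest_dir)


def demo():
    """Round-trip one small file through the stream and return its contents."""
    with tempfile.TemporaryDirectory() as src, tempfile.TemporaryDirectory() as dst:
        with open(os.path.join(src, "hello.txt"), "w") as f:
            f.write("hello from the image server\n")
        a, b = socket.socketpair()  # stand-in for a real TCP connection
        sender = threading.Thread(target=send_tree, args=(src, a))
        sender.start()
        receive_tree(b, dst)
        sender.join()
        a.close()
        b.close()
        with open(os.path.join(dst, "hello.txt")) as f:
            return f.read()


if __name__ == "__main__":
    print(demo(), end="")
```

Because both ends work on the stream in chunks, the receiver never needs room for the archive itself, only for the extracted files, which is the property that matters on the 750GB and 1TB disks mentioned earlier in the thread.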
