Remove x number of lines from beginning of file

Nicholas Leippe nick at leippe.com
Fri Oct 26 10:40:35 MDT 2007


On Friday 26 October 2007, Steve wrote:
> I totally get what you mean about being IO bound; however, as I said
> earlier, there is still CPU time involved. Saving 1% may not sound
> like a lot, but it could add up, and after the initial development cost
> is recouped that 1% becomes free.  However, I believe that by not
> incurring the overhead of loading a general-purpose utility, your
> savings are much more than 1%.  That said, my way using streams was
> certainly not ideal, because you do incur the whole stream overhead
> penalty, but I figured that was counterbalanced by the speed at which
> the development could proceed.

Okay, we need to set this straight. IO bound means the process is hitting a 
physical hardware limit (disk throughput or seek time). Yes, the CPU portion 
of this process can be reduced, but doing so *does not* change the physical 
limits on IO. All it would do is free up CPU for other tasks; it would not 
make this task finish any faster.
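A rough model of why, assuming the CPU work overlaps the IO (for example, the kernel reading ahead while the process handles the previous buffer). The numbers are purely hypothetical, chosen only to illustrate the arithmetic:

```latex
\begin{align*}
T_{\text{wall}} &\approx \max(T_{\text{io}},\, T_{\text{cpu}}) \\
T_{\text{io}} = 100\,\text{s},\ T_{\text{cpu}} = 10\,\text{s}
  &\;\Rightarrow\; T_{\text{wall}} \approx \max(100,\, 10) = 100\,\text{s} \\
T_{\text{cpu}} = 9.9\,\text{s}\ (1\%\ \text{saved})
  &\;\Rightarrow\; T_{\text{wall}} \approx \max(100,\, 9.9) = 100\,\text{s}
\end{align*}
```

Even with no overlap at all ($T_{\text{wall}} \approx T_{\text{io}} + T_{\text{cpu}}$), that 1% CPU saving trims 0.1 s from 110 s, under 0.1% of the wall time.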

>
> Additionally, in what real-world application of something like this
> would you be able to use a file system hack like that, and/or change
> the file system altogether?  If he's going to do that, why not
> upgrade the machine, throw in SATA, go with a striped RAID
> configuration, and so forth?

Because they don't solve the same problem. One walks around the problem; the 
other just throws resources at it to reduce the wall-clock time consumed.

>
> Either way the file system hacking bit seems like a cool idea, but how
> does it quantifiably increase IO throughput?  Seems like adding more
> layers of stuff for the IO operation to pass through would actually
> slow things down a bit.

It depends on your application. If you don't actually need to *change* the 
file, only what applications *see*, then masking off the first x bytes of the 
file with a hack, so that applications see it as if those bytes weren't 
there, removes the problem entirely: you no longer have to do any IO to 
alter the file.  The overhead would be minimal, essentially just adding an 
offset to the file pointer behind the scenes.  Compared to moving GBs of 
data, this costs practically nothing.
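A user-space approximation of that idea, for illustration only: `OffsetView` is a hypothetical Python wrapper (not an existing library class or filesystem feature) that hides the first `offset` bytes of a file by adding the offset to every position before it reaches the real file. A true filesystem-level mask would do the same arithmetic inside the kernel or a FUSE layer, so every application would see it; this sketch only helps programs that read through the wrapper.

```python
import io
import os


class OffsetView(io.RawIOBase):
    """Read-only view of a file that hides its first `offset` bytes.

    Hypothetical helper, not a standard-library class: positions the caller
    sees are translated by adding `offset` before they reach the real file.
    """

    def __init__(self, path, offset):
        super().__init__()
        self._f = open(path, "rb")
        size = os.fstat(self._f.fileno()).st_size
        # Clamp so an offset past EOF yields an empty view, not an error.
        self._offset = min(max(offset, 0), size)
        self._f.seek(self._offset)

    def readable(self):
        return True

    def seekable(self):
        return True

    def readinto(self, b):
        # Reads continue from the real position, which is always >= offset.
        return self._f.readinto(b)

    def tell(self):
        return self._f.tell() - self._offset

    def seek(self, pos, whence=io.SEEK_SET):
        # Work out the target in view coordinates first.
        if whence == io.SEEK_SET:
            target = pos
        elif whence == io.SEEK_CUR:
            target = self.tell() + pos
        elif whence == io.SEEK_END:
            real_size = os.fstat(self._f.fileno()).st_size
            target = real_size - self._offset + pos
        else:
            raise ValueError(f"invalid whence: {whence}")
        # Clamp at 0 so the hidden prefix can never be reached
        # (a plain file would raise on a negative position instead).
        target = max(target, 0)
        self._f.seek(self._offset + target)
        return target

    def close(self):
        self._f.close()
        super().close()
```

Wrapping it as `io.BufferedReader(OffsetView(path, n))` adds the usual `readline()` and line iteration on top, so line-oriented code can read the file as if the first `n` bytes were gone.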

If what you really want is to alter the file, then with current file systems 
there is no way around moving all the data, but you have a choice of methods, 
some slower, some faster.  The exception would be rewriting the filesystem to 
store files as ropes; then slicing off any subset of a file could be a 
trivial operation.  Maybe not a bad idea for some applications 
(databases come to mind).
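For the case where the file really must change, here is a minimal sketch of one such method: sliding the kept data toward the front of the file in place and truncating the leftover tail. `drop_leading_lines` and the 1 MiB `CHUNK` size are hypothetical choices for illustration. It still reads and writes every byte after the cut once, so it stays IO bound, and it is not crash-safe: if interrupted partway, the file is left half-shifted. Writing to a temporary file and renaming it keeps the original intact instead, at the cost of needing room for a full second copy.

```python
CHUNK = 1 << 20  # bytes moved per step; 1 MiB is an arbitrary choice


def drop_leading_lines(path, n):
    """Remove the first `n` lines of `path` in place (hypothetical helper).

    Every byte after the cut is read once and written once, shifted toward
    the start of the file, and the stale tail is truncated away.
    """
    with open(path, "r+b") as f:
        # Walk past the first n newlines to find where the kept data starts.
        for _ in range(n):
            if not f.readline():
                break  # fewer than n lines: everything goes
        src = f.tell()
        if src == 0:
            return  # nothing to remove
        dst = 0
        while True:
            f.seek(src)
            chunk = f.read(CHUNK)
            if not chunk:
                break
            # dst < src, so this write never clobbers bytes not yet read.
            f.seek(dst)
            f.write(chunk)
            src += len(chunk)
            dst += len(chunk)
        f.truncate(dst)
```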




More information about the PLUG mailing list