If you're going to stick with this approach, I'd suggest ditching the stream I/O and using mmap() with memmove()/memcpy() instead. Moving the data in blocks will be faster than reading and writing it one character at a time.
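
To give a rough idea of what that could look like, here is a minimal sketch. I'm assuming a POSIX system and that the job is shifting bytes around inside an existing file; remove_range() and its parameters are names I made up for illustration, not anything from your code. It maps the whole file, closes the gap with a single memmove(), then shrinks the file with ftruncate().

```c
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (not from the original post): delete `len` bytes
 * starting at byte offset `off` from the file at `path`, in place.
 * Returns 0 on success, -1 on error. */
int remove_range(const char *path, size_t off, size_t len)
{
    int fd = open(path, O_RDWR);
    if (fd < 0)
        return -1;

    struct stat st;
    if (fstat(fd, &st) < 0 || st.st_size <= 0) {
        close(fd);              /* mmap() cannot map an empty file */
        return -1;
    }

    size_t size = (size_t)st.st_size;
    if (off > size || len > size - off) {
        close(fd);              /* requested range lies outside the file */
        return -1;
    }

    /* MAP_SHARED so that stores through the mapping reach the file. */
    char *p = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED) {
        close(fd);
        return -1;
    }

    /* Source and destination overlap, so memmove() is required here;
     * memcpy() is only safe for non-overlapping regions. */
    memmove(p + off, p + off + len, size - off - len);

    int rc = 0;
    if (msync(p, size, MS_SYNC) < 0)
        rc = -1;
    if (munmap(p, size) < 0)
        rc = -1;
    /* Drop the now-unused tail of the file. */
    if (ftruncate(fd, (off_t)(size - len)) < 0)
        rc = -1;
    if (close(fd) < 0)
        rc = -1;
    return rc;
}
```

Calling remove_range("data.bin", 2, 3) on a file containing "hello world" should leave "he world". For files too large to map in one go you would need to map and shift a window at a time, which this sketch doesn't attempt.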