Hard Disk IDs in Linux
torriem at gmail.com
Sat Mar 16 09:09:44 MDT 2013
On 03/16/2013 03:04 AM, Dan Egli wrote:
>> For a home server I recommend RAID1 or RAID10 over RAID6.
> Really? I guess between RAID6 and RAID10 it's not much different, but what
> about someone who has say six or eight disks in the server? I'm curious why
> you'd still recommend RAID10? Hypothetically speaking, let's assume I
> wanted to have a server big enough to hold 1 year of downloaded data from
> the net, downloading at approx 5Mbps (with TCP overhead, that comes to
> approx 1 MB every 2 seconds) 24/7/365. That's nearly 16 TB. A RAID6 could
> handle that with 6 drives, 4TB each. A raid10 would need 8 drives. I admit
> each is possible to throw into a full tower case, but why spend the extra
> money on two more drives, making the two raid10s? I am genuinely curious.
First of all, what do you need all that space for? The kind of data you
store really dictates what kind of setup you need.
Disks are cheap and if the data is that important to you, then buying
twice as many disks as your capacity needs is really not that big of a
deal. The extra cost of 8 disks over 6 is negligible compared to the
peace of mind that RAID-10 can bring you.
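To put numbers on the scenario in the question, here is a rough sketch of the arithmetic. It assumes decimal units (1 TB = 1,000,000 MB) and the steady 0.5 MB/s rate mentioned above; the function names are made up for illustration. At that rate a year of data comes to about 15.8 TB, and both six 4TB disks in RAID6 and eight 4TB disks in RAID10 give 16 TB usable.

```python
# Rough capacity arithmetic for the download scenario in the question.
# Assumptions: decimal units (1 TB = 1,000,000 MB) and a steady rate.

SECONDS_PER_YEAR = 365 * 24 * 60 * 60


def yearly_volume_tb(mb_per_second):
    """Data accumulated in one year at a steady rate, in TB."""
    return mb_per_second * SECONDS_PER_YEAR / 1_000_000


def raid6_usable_tb(disks, disk_tb):
    """RAID6 gives up two disks' worth of space to parity."""
    return (disks - 2) * disk_tb


def raid10_usable_tb(disks, disk_tb):
    """RAID10 mirrors everything, so half the raw space is usable."""
    return (disks // 2) * disk_tb


if __name__ == "__main__":
    print("TB per year:", round(yearly_volume_tb(0.5), 2))
    print("RAID6, 6 x 4TB usable:", raid6_usable_tb(6, 4))
    print("RAID10, 8 x 4TB usable:", raid10_usable_tb(8, 4))
```

Note that this leaves essentially no headroom in either layout, and file system overhead is not counted.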
Furthermore, if the data is that important to you, you will need a
backup system, which at minimum is another set of disks that receives
a full copy of the data periodically and is stored off-line.
Secondly you'll need a very large power supply and a SATA expansion card
if you're really planning to stuff that many disks in a box. There's a
reason why people often buy a SAN array box with its own (redundant)
power supplies.
I'm unsure as to whether you are now talking about your own personal
server which will go into a house, or your work project.
For a home system, I think most people are served well by just two disks
in RAID-1 configuration, plus a set of backup disks. Very large sets of
data like photos, movies, and maybe MythTV recordings, don't need RAID
at all. They don't change often, so a really good backup is much more
important than RAID.
> Well, that's not really an issue because I finally realized I could break
> my boss down by using some basic math. I showed him using basic
> multiplication how long it would take to fill the 120TB array he wanted
> (more than eight years to reach 25% capacity) and he FINALLY agreed that we
> could do it much cheaper and easier by building a full tower PC and filling
> it with Hard Disk Drives. So we're going to order the parts soon. Thank
> goodness for that. I'm still not sure which chassis he wanted. I think he
> was thinking of going to a company like Aberdeen or someone. I have
> insufficient experience to state whether or not that was a good idea, but
> thankfully it's a moot point now. I imagine we can fit about 10 disks in a
> large case (I have to do some research on cases to find the one that will
> let us hold as many hard disks as we can), and make a raid out of them.
There are companies that make cases for disks. Poor-man's arrays. The
cases have lots of rails and a big power supply. Most of them then just
have eSATA ports on the back that you can connect to a PC's eSATA
adapter card (which you will need, since PCs usually have four or fewer
SATA ports on the motherboard). Also, if you use an eSATA adapter, then
the drives are hot-swappable (if not in use!).
Here're a couple of ideas:
For inter-box connections eSATA connectors are better than SATA because
the connector has a clip to keep it plugged in, whereas most SATA cables
are just held in with friction.
>> it for years on Solaris without issue). But I'm not sure of the status
>> of the zfs-on-linux project.
> So what would you use? Be aware that he's REALLY keen on using a file
> system that includes journaling and data-deduplication. I don't know how
> easy it's going to be to change his mind. It took near a week of arguments
> before I got him to abandon the rackmount server idea. I'm well aware of
> many of the advantages of file systems like Ext4 and JFS. But try
> convincing my boss on that. He's one of those people who hears about some
> new idea, likes it, and wants it implemented, despite not knowing how it
> works internally or what would be involved in the implementation.
I did enjoy ZFS features a lot and hope that Linux's home-grown BtrFS
will get stable and mature soon, since BtrFS will pretty much match ZFS
for features when it's done some day. Snapshots are the number one
feature of ZFS and BtrFS! For a file server serving thousands of users,
having very cheap snapshots that let users see their own files over the
last 7 days was really slick. I used to keep a snapshot from every night
for the last week, then one from every month for the last year. Because of the COW
nature of ZFS, these snapshots only cost in size the difference between
the snapshot and the current version of the file.
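As a rough illustration of that retention scheme, here is a sketch that decides which snapshots to keep. It only works on dates; actually creating and destroying snapshots is left out. The function name, the parameters, and the choice to keep the oldest snapshot of each month are my assumptions for the example, not a description of the scripts I actually ran.

```python
from datetime import date, timedelta


def snapshots_to_keep(snapshot_dates, today, nightly_days=7, monthly_days=365):
    """Return the snapshot dates to retain, sorted oldest first.

    Keeps every snapshot younger than nightly_days, plus the oldest
    snapshot of each calendar month no more than monthly_days old.
    """
    keep = set()
    oldest_per_month = {}
    for snap in sorted(snapshot_dates):
        age = (today - snap).days
        if age < 0 or age > monthly_days:
            continue  # in the future, or too old to keep at all
        if age < nightly_days:
            keep.add(snap)
        # Dates are visited oldest first, so setdefault records the
        # oldest snapshot seen for each (year, month).
        oldest_per_month.setdefault((snap.year, snap.month), snap)
    keep.update(oldest_per_month.values())
    return sorted(keep)


if __name__ == "__main__":
    today = date(2013, 3, 16)
    nightly = [date(2013, 1, 1) + timedelta(days=i) for i in range(75)]
    for snap in snapshots_to_keep(nightly, today):
        print(snap.isoformat())
```

Anything not in the returned list would be a candidate for destruction.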
Anyway ZFS is not a journaling filesystem (neither is BtrFS). They
simply don't need journals. They are copy-on-write file systems, which
means they are always consistent and after a failure, all you can lose
are uncommitted blocks.
Deduplication is something you can do at a much higher level. For
example, a script could find identical files and replace them with hard
links. There was an experimental project I saw once called "opendedup"
that was a FUSE filesystem that you could run on top of any underlying
filesystem to do block-level deduplication without having to have a
special physical file system.
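A minimal sketch of the hard-link approach might look like the following. The function names are made up for illustration, and there are some assumptions to be aware of: everything under the root must be on one file system (hard links cannot cross file systems), the files are treated as read-only data (after linking, editing one name changes the content seen through all of them), and ownership, permissions and timestamps of the replaced copies are not preserved.

```python
import hashlib
import os


def file_digest(path, chunk_size=1 << 20):
    """SHA-256 of a file's contents, read in chunks to bound memory use."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def hardlink_duplicates(root):
    """Replace identical regular files under root with hard links.

    Returns the number of files that were replaced by a link.
    """
    seen = {}  # (size, digest) -> first path found with that content
    replaced = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            if os.path.islink(path) or not os.path.isfile(path):
                continue  # skip symlinks and anything that is not a plain file
            key = (os.path.getsize(path), file_digest(path))
            original = seen.setdefault(key, path)
            if original == path or os.path.samefile(original, path):
                continue  # first copy, or already linked to it
            # Link under a temporary name, then rename over the duplicate,
            # so the duplicate's name never disappears halfway through.
            tmp = path + ".dedup-tmp"
            os.link(original, tmp)
            os.replace(tmp, path)
            replaced += 1
    return replaced


if __name__ == "__main__":
    import sys
    print(hardlink_duplicates(sys.argv[1]), "files replaced with hard links")
```

Try it on a scratch copy of the data first. This is file-level deduplication only; two large files that differ by one byte still take up the space of both.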
Again, the file system you end up choosing is going to depend entirely
on exactly what he's using it for. For example, in a home server, if my
main storage needs were for MythTV, I would eschew any form of RAID and
keep my disks formatted as individual volumes to Ext4 because MythTV
treats all its storage as a big pool so there's no need to have one big
file system across all the devices.
From what I can see so far, you really have 3 native linux choices:
Ext4, XFS, and BtrFS. Of those, XFS has been used on huge arrays for
many years. BtrFS might be stable enough for your use. Ext4 is very
stable and perfectly capable of being used on multi-terabyte volumes.
None of them have built-in deduplication.