disk burn in testing
Mike Lovell
mike at dev-zero.net
Wed Feb 10 11:31:11 MST 2010
Steven Alligood wrote:
> On 02/09/2010 07:52 PM, Mike Lovell wrote:
>> does anyone have good recommendations as to some tools or utilities to
>> use for exercising or burning in new hard disks? where i work, we buy *a
>> lot* of disks and currently use a utility called thrash [1]. we just
>> have it do a couple million random writes to the disk. but i could use
>> some other tools to test the disks in different ways as well to get a
>> better idea if the disk is going to hold up. i've thought about using
>> bonnie++ or iozone as well. what do any of you use, if anything? thx.
>>
>> mike
>>
>> [1] http://www.csc.liv.ac.uk/~greg/thrash/
>>
>>
>
> Just exactly how many disks do you find that fail with that method,
> and do you end up with fewer disks failing and needing replacement in
> the first few months of production versus more disks failing at one
> year, two years, etc?
>
> I guess I am asking why you bother to waste man hours thrashing the
> poor disks and removing potential life from them rather than just
> making sure they are all in good RAID sets and replacing them as they
> fail (hot spares and man-hours to replace rather than test)?
>
> My company deploys more than 100 new drives per week, and the testing
> alone would be much more time consuming to find the very few bad
> drives in testing versus replacing them as they fail in those first
> few weeks. Add to that the fact that the testing may reduce the life
> sufficiently that you have more failures at the one and two year
> points, and it seems a waste to test like that.
>
> I am always open to better ways of doing things, so please, if you
> find the thrashing helps, I would love to hear the results.
>
> -Steve
for one, these aren't going into RAID sets. they are used individually.
the thought is that we do some burn in testing up front to get rid of
the disks that would die soon after going into use, i.e. disk infant
mortality. it is based on the idea that the failure rates of disks follow
a bathtub curve: a lot fail at the beginning, fewer fail during most of
the life of the disk, and then failure rates increase near the end of
the life span. we want to do the burn in to get to the low point on
that curve. but that does assume disks follow a bathtub curve, and i
don't know if ours actually do.
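to make the bathtub idea concrete, here is a rough sketch in python. it
models the failure rate as the sum of a decreasing weibull hazard (infant
mortality), a constant rate (random failures), and an increasing weibull
hazard (wear out). every parameter value and function name below is made
up for illustration. none of it comes from real disk data, so treat it as
a way to reason about the shape, not as a prediction.

```python
import math

# All parameters are hypothetical, chosen only to produce a bathtub shape.
INFANT = (0.5, 2_000_000.0)   # Weibull (shape, scale in hours); shape < 1 means a falling rate
WEAROUT = (5.0, 50_000.0)     # Weibull (shape, scale in hours); shape > 1 means a rising rate
RANDOM_RATE = 1e-6            # constant failures per hour


def weibull_hazard(t, shape, scale):
    """Weibull hazard rate h(t) = (k / s) * (t / s)**(k - 1), for t > 0."""
    return (shape / scale) * (t / scale) ** (shape - 1)


def weibull_cum_hazard(t, shape, scale):
    """Weibull cumulative hazard H(t) = (t / s)**k."""
    return (t / scale) ** shape


def bathtub_hazard(t):
    """Instantaneous failure rate at age t hours (t > 0)."""
    return weibull_hazard(t, *INFANT) + RANDOM_RATE + weibull_hazard(t, *WEAROUT)


def bathtub_cum_hazard(t):
    """Cumulative hazard from age 0 to age t hours."""
    return (weibull_cum_hazard(t, *INFANT)
            + RANDOM_RATE * t
            + weibull_cum_hazard(t, *WEAROUT))


def failure_probability(burn_in_hours, window_hours):
    """Chance a disk that survived burn in fails within the next window."""
    delta = (bathtub_cum_hazard(burn_in_hours + window_hours)
             - bathtub_cum_hazard(burn_in_hours))
    return 1.0 - math.exp(-delta)


if __name__ == "__main__":
    for hours in (10, 10_000, 60_000):
        print(f"failure rate at {hours:>6} h: {bathtub_hazard(hours):.2e} per hour")
    for burn_in in (0, 48, 168):
        p = failure_probability(burn_in, 720)
        print(f"burn in {burn_in:>3} h -> P(fail in next 720 h) = {p:.4f}")
```

if disks really follow a curve like this, a disk that survives burn in is
less likely to fail in its first month of use than a fresh one. if the
early part of the curve is flat instead, burn in buys nothing, which is
why real failure numbers matter more than the model.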
we also do more than 100 disks per week. i don't have numbers on how
many we replace due to them dying quickly because i have been away from
the actual deployment for a while. recently, the topic has come up of
whether we need to do more burn in testing on the disks. so i thought i
would ask the group.
mike
More information about the PLUG
mailing list