disk burn in testing

Mike Lovell mike at dev-zero.net
Wed Feb 10 11:31:11 MST 2010


Steven Alligood wrote:
> On 02/09/2010 07:52 PM, Mike Lovell wrote:
>> does anyone have good recommendations as to some tools or utilities to
>> use for exercising or burning in new hard disks? where i work, we buy *a
>> lot* of disks and currently use a utility called thrash [1]. we just
>> have it do a couple million random writes to the disk. but i could use
>> some other tools to test the disks in different ways as well to get a
>> better idea if the disk is going to hold up. i've thought about using
>> bonnie++ or iozone as well. what do any of you use, if anything? thx.
>>
>> mike
>>
>> [1] http://www.csc.liv.ac.uk/~greg/thrash/
>>
>>    
>
> Just exactly how many disks do you find that fail with that method, 
> and do you end up with less disks failing and needing replacement in 
> the first few months of production versus more disks failing at one 
> year, two years, etc?
>
> I guess I am asking why you bother to waste man hours thrashing the 
> poor disks and removing potential life from them rather than just 
> making sure they are all in good RAID sets and replacing them as they 
> fail (hot spares and man-hours to replace rather than test)?
>
> My company deploys more than 100 new drives per week, and the testing 
> alone would be much more time consuming to find the very few bad 
> drives in testing versus replacing them as they fail in those first 
> few weeks.  Add to that the fact that the testing may reduce the life 
> sufficiently that you have more failures at the one and two year 
> points, and it seems a waste to test like that.
>
> I am always open to better ways of doing things, so please, if you 
> find the thrashing helps, I would love to hear the results.
>
> -Steve

for one, these aren't going into RAID sets. they are used individually. 
the thought is that we do some burn in testing up front to get rid of 
the disks that would die soon after going into use, i.e. disk infant 
mortality. it is based on the idea that the mortality rates of disks 
follow a bathtub curve: a lot fail at the beginning, fewer fail during 
most of the life of the disk, and then failure rates increase near the 
end of the life span. we want to do the burn in to get to the low point 
on that curve. but that does assume disks follow a bathtub curve, and i 
don't know if ours actually do.
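
to make the bathtub idea concrete, here is a rough python sketch. it is 
not based on any real disk data. it assumes the failure rate is the sum 
of a decreasing weibull term (infant mortality), a constant term, and an 
increasing weibull term (wear out). every parameter value and the 
function names are made up for illustration. it compares the chance of a 
disk dying in its first 90 days of use with and without a 48 hour burn in.

```python
import math

# All parameters below are hypothetical illustration values, not measurements.
# Time is in power-on hours.
INFANT_SHAPE, INFANT_SCALE = 0.5, 20_000_000.0   # shape < 1: decreasing rate
RANDOM_RATE = 1.0e-6                             # constant rate
WEAROUT_SHAPE, WEAROUT_SCALE = 5.0, 60_000.0     # shape > 1: increasing rate


def weibull_hazard(t, shape, scale):
    """Instantaneous failure rate of a Weibull distribution at time t > 0."""
    return (shape / scale) * (t / scale) ** (shape - 1)


def bathtub_hazard(t):
    """Total failure rate at time t: infant + random + wear-out terms."""
    return (weibull_hazard(t, INFANT_SHAPE, INFANT_SCALE)
            + RANDOM_RATE
            + weibull_hazard(t, WEAROUT_SHAPE, WEAROUT_SCALE))


def cumulative_hazard(t):
    """Integral of bathtub_hazard from 0 to t (closed form for each term)."""
    return ((t / INFANT_SCALE) ** INFANT_SHAPE
            + RANDOM_RATE * t
            + (t / WEAROUT_SCALE) ** WEAROUT_SHAPE)


def failure_prob(age, duration):
    """Chance a disk that survived to `age` fails within the next `duration`."""
    delta = cumulative_hazard(age + duration) - cumulative_hazard(age)
    return 1.0 - math.exp(-delta)


if __name__ == "__main__":
    ninety_days = 90 * 24
    print("no burn in :", failure_prob(0, ninety_days))
    print("48h burn in:", failure_prob(48, ninety_days))
```

under these assumptions the burned in disks fail less often in their 
first 90 days of use, because the steepest part of the curve is spent on 
the test bench. if the real curve has no early hump, the two numbers come 
out about the same and the burn in buys nothing, which is the question 
raised above.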

we also do more than 100 disks per week. i don't have numbers on how 
many we replace due to them dying quickly because i have been away from 
the actual deployment for a while. recently, the question has come up 
of whether we need to do more burn in testing on the disks, so i 
thought i would ask the group.

mike


