HW Raid monitoring

Nicholas Leippe nick at leippe.com
Mon Mar 18 08:59:57 MDT 2013

On Mon, Mar 18, 2013 at 12:05 AM, Dan Egli <ddavidegli at gmail.com> wrote:
> All this discussion about RAID levels and whatnot has brought to mind
> a different, if related, question. One of the reasons I like software RAID
> is that it's easy to monitor. For example, I could have a cron script that
> runs every 15 minutes and checks /proc/mdstat to ensure any arrays
> listed there show a healthy status. But how do you do something like that
> for a hardware RAID? How can you tell, for example, if drive #3 in a HW
> RAID10 has failed? This is something I honestly don't know off the top of
> my head. I know many of you folks have had experience with HW RAID and
> device failures in those arrays. How do you know? There's no file you can
> check like mdstat, is there? I'd think this would be especially important
> for remotely hosted/co-located servers.

In my experience it's vendor-specific. Some of the cards I've used had their
own monitoring software. Others had a utility you could use to query the
card, and thus write your own monitoring plugin. Some had nothing: they
would just beep, and then you'd have to use their access tool (a front
end to their BIOS software) and navigate its menus to figure out what had
failed and deal with it. That could *possibly* be automated with an expect
script, but not easily, since it means navigating an ncurses-type interface.
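For cards whose vendor ships a query utility, the "write your own monitoring plugin" route might look like the sketch below: run the tool, map its output onto the Nagios/Icinga plugin exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN), and print a one-line summary. The command name raid-query, its --status argument, its "Component: State" output lines, and the state keywords are all hypothetical placeholders; every vendor's tool has its own syntax and wording, so each must be adapted to the actual utility.

```python
import subprocess

# Nagios/Icinga plugin exit-code convention.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3

# HYPOTHETICAL command: substitute your controller vendor's query utility.
VENDOR_CMD = ["raid-query", "--status"]

# HYPOTHETICAL states, assuming output lines like "Drive 3: Failed".
CRITICAL_STATES = {"failed", "offline", "missing"}
WARNING_STATES = {"degraded", "rebuilding"}


def classify(output: str) -> tuple[int, str]:
    """Map "Component: State" lines to a plugin status and message."""
    worst, problems = OK, []
    for line in output.splitlines():
        if ":" not in line:
            continue
        # Compare only the state after the last colon, so a line such as
        # "Failed drives: 0" is not mistaken for a failure.
        state = line.rsplit(":", 1)[1].strip().lower()
        if state in CRITICAL_STATES:
            worst = CRITICAL
            problems.append(line.strip())
        elif state in WARNING_STATES:
            worst = max(worst, WARNING)
            problems.append(line.strip())
    if worst == OK:
        return OK, "RAID OK"
    label = "CRITICAL" if worst == CRITICAL else "WARNING"
    return worst, f"RAID {label}: " + "; ".join(problems)


def check() -> int:
    """Run the vendor tool and report; a tool that can't run means UNKNOWN."""
    try:
        result = subprocess.run(VENDOR_CMD, capture_output=True, text=True, timeout=60)
    except (OSError, subprocess.TimeoutExpired) as exc:
        print(f"RAID UNKNOWN: could not run {VENDOR_CMD[0]}: {exc}")
        return UNKNOWN
    if result.returncode != 0:
        print(f"RAID UNKNOWN: {VENDOR_CMD[0]} exited with {result.returncode}")
        return UNKNOWN
    code, message = classify(result.stdout)
    print(message)
    return code
```

Treating a tool that can't run as UNKNOWN rather than OK is deliberate: a monitoring check that silently passes when its data source is gone is worse than no check at all.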

mdadm also has its own monitoring mode (mdadm --monitor) that you can run
as a daemon, rather than polling the contents of /proc/mdstat yourself.
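mdadm --monitor (for example mdadm --monitor --scan --daemonise) can mail the address given by MAILADDR in mdadm.conf, and can also run a program for each event via --program or a PROGRAM line. As documented in mdadm(8), that program receives the event name, the md device, and sometimes a related component device. The handler below is a hypothetical sketch of such a program: it logs every event to syslog and raises the priority for the events the man page lists as mail-worthy (check the list against your mdadm version). The name md-event is made up.

```python
import sys
import syslog

# Events that mdadm(8) lists as triggering mail in --monitor mode.
ALERT_EVENTS = {"Fail", "FailSpare", "DegradedArray", "SparesMissing", "TestMessage"}


def describe(args: list[str]) -> tuple[bool, str]:
    """Turn mdadm's (event, md-device[, component]) arguments into a log line."""
    event, md_device = args[0], args[1]
    component = args[2] if len(args) > 2 else None
    message = f"mdadm event {event} on {md_device}"
    if component:
        message += f" (component {component})"
    return event in ALERT_EVENTS, message


def main(argv: list[str]) -> int:
    """Entry point for the PROGRAM hook; returns a process exit status."""
    if len(argv) < 3:
        print("usage: md-event EVENT MD-DEVICE [COMPONENT]", file=sys.stderr)
        return 2
    urgent, message = describe(argv[1:])
    priority = syslog.LOG_CRIT if urgent else syslog.LOG_INFO
    syslog.syslog(priority, message)
    return 0
```

Installed as, say, /usr/local/sbin/md-event (with a final line sys.exit(main(sys.argv))) and named on a PROGRAM line, it can be exercised with mdadm --monitor --scan --oneshot --test, which generates a TestMessage event for each array found.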

More information about the PLUG mailing list