Building mpt-status for CentOS 5.2

At Vexed Digital, we recently migrated a large part of our internal infrastructure from a co-located server to a couple of machines that we lease from our hosting provider. These machines are used for things such as email, our website, FTP etc. and so we have RAID-1 (mirroring) with a hardware controller. Hardware RAID has always worked well for me, but we obviously want to be sure that the underlying physical drives are OK and for that we need to install tools to interrogate the controller for status updates.

Using ‘lspci’ I found that the card in one of the machines is an “LSI Logic / Symbios Logic SAS1064ET PCI-Express Fusion-MPT SAS” controller, but unfortunately I could not find any vendor tools for it. Fortunately though, there is an open source tool, mpt-status, originally developed by Matt Braithwaite and now maintained by Roberto Nibali, which you can find through Freshmeat.

If, like us, you run CentOS on your machines, you will find that mpt-status doesn’t compile straight away. First of all, you will need to install the kernel source, which is easy enough to do if you follow the instructions on the CentOS website. Once you’ve done that, you will find that you still can’t compile it unless you perform one more step.

That extra step is to edit the Makefile and alter the line for CFLAGS to add the ‘include’ directory of your kernel source tree, e.g.:

CFLAGS          := -Iincl -Wall -W -O2 \
                   -I${KERNEL_PATH}/include \
                   -I${KERNEL_PATH}/drivers/message/fusion \
                   -I${KERNEL_PATH_B}/drivers/message/fusion \
                   -I${KERNEL_PATH_S}/drivers/message/fusion

Here I’ve added the second line, “-I${KERNEL_PATH}/include”, so that the compiler can find linux/compiler.h. You are now ready to compile, but make sure that you run ‘make’ with the path to your kernel source tree, e.g.:

make KERNEL_PATH=/home/ben/rpmbuild/BUILD/kernel-2.6.18/linux-2.6.18.i686

You should then (almost immediately, as it’s not large) get a single binary, mpt-status. Run this and it will tell you in a few lines the key information about the controller, the virtual disc and each of the physical discs. When I ran it, it couldn’t find the controller where it expected it, but unlike some tools it was actually quite helpful and told me to use the ‘-p’ option to probe the SCSI bus. The probe told me that the card was on SCSI ID 6, and it suggested that the arguments ‘-i 6’ would give me the information I was looking for:

ioc0 vol_id 6 type IM, 2 phy, 231 GB, state OPTIMAL, flags ENABLED
ioc0 phy 1 scsi_id 7 ATA      GB0250C8045      HPG2, 232 GB, state ONLINE, flags NONE
ioc0 phy 0 scsi_id 8 ATA      GB0250C8045      HPG1, 232 GB, state ONLINE, flags NONE

The documents that come with the package give some information on what this means. In the above case, the first line shows that the virtual disc is in an OPTIMAL (i.e. good) state, is 231 GB in size and has type IM, which means mirrored (RAID-1). The other two lines show the two physical discs, both of which are ONLINE (so OK).
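If you want to pick those states out with a script rather than by eye, something like the following Python sketch should do it. It assumes the output always looks like the three lines above (a ‘vol_id’ line for the virtual disc and a ‘phy’ line per physical disc); the function name parse_status is just my own choice and is not part of mpt-status.

```python
import re

# Sample output copied from the run above.
SAMPLE = """\
ioc0 vol_id 6 type IM, 2 phy, 231 GB, state OPTIMAL, flags ENABLED
ioc0 phy 1 scsi_id 7 ATA      GB0250C8045      HPG2, 232 GB, state ONLINE, flags NONE
ioc0 phy 0 scsi_id 8 ATA      GB0250C8045      HPG1, 232 GB, state ONLINE, flags NONE
"""

STATE_RE = re.compile(r"\bstate (\w+)")


def parse_status(text):
    """Return (volumes, disks), each a list of (identifier, state) tuples.

    Assumption: the second field is "vol_id" for a virtual disc and
    "phy" for a physical disc, as in the sample above.
    """
    volumes, disks = [], []
    for line in text.splitlines():
        match = STATE_RE.search(line)
        if not match:
            continue  # skip anything that does not report a state
        fields = line.split()
        if fields[1] == "vol_id":
            volumes.append((f"{fields[0]} vol_id {fields[2]}", match.group(1)))
        elif fields[1] == "phy":
            disks.append((f"{fields[0]} phy {fields[2]}", match.group(1)))
    return volumes, disks


if __name__ == "__main__":
    vols, phys = parse_status(SAMPLE)
    for name, state in vols + phys:
        print(f"{name}: {state}")
```

Keeping the parsing in its own function means it can be tried against saved output on any machine, not just the one with the controller in it.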

Of course the next stage is getting the pertinent information into our Nagios system so that we get told when a drive fails. Note I didn’t say if: drives always fail eventually…it’s just a question of how soon 🙂
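As a starting point for that, here is a rough Python sketch of what such a check might look like. It is not something we run in production: the names evaluate and main are my own, the ‘-i 6’ argument is simply what worked on this particular machine, and treating every state other than OPTIMAL or ONLINE as CRITICAL is a simplifying assumption. The exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN) are the standard Nagios plugin ones.

```python
import re
import subprocess
import sys

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def evaluate(output):
    """Map mpt-status text output to an (exit_code, message) pair."""
    vol_states = re.findall(r"vol_id \d+ .*?\bstate (\w+)", output)
    disk_states = re.findall(r"phy \d+ scsi_id .*?\bstate (\w+)", output)
    if not vol_states or not disk_states:
        return UNKNOWN, "UNKNOWN: could not parse mpt-status output"
    # Assumption: anything other than OPTIMAL / ONLINE deserves attention.
    bad = [s for s in vol_states if s != "OPTIMAL"]
    bad += [s for s in disk_states if s != "ONLINE"]
    if bad:
        return CRITICAL, "CRITICAL: unexpected state(s): " + ", ".join(bad)
    return OK, f"OK: {len(vol_states)} volume(s), {len(disk_states)} disc(s) healthy"


def main():
    """Run mpt-status and print a one-line Nagios-style result."""
    try:
        # "-i 6" is the SCSI ID the probe suggested on our machine.
        result = subprocess.run(
            ["mpt-status", "-i", "6"], capture_output=True, text=True
        )
    except OSError as exc:
        print(f"UNKNOWN: could not run mpt-status: {exc}")
        return UNKNOWN
    code, message = evaluate(result.stdout)
    print(message)
    return code

# When installed as a plugin, finish the script with: sys.exit(main())
```

Note that mpt-status needs to be run as root to talk to the controller, so a real check would also need some way of granting that to the Nagios user.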