Got a notice last night that a drive failed on a server. I got in this morning to replace it, and we're seeing the following. The controller config report for the array looks fine, apart from the unusual status Ready for Rebuild.
~ # hpacucli controller all show config

Smart Array P400i in Slot 0 (Embedded)    (sn: XXXXXXXX)

   array A (SAS, Unused Space: 0 MB)

      logicaldrive 1 (341.7 GB, RAID 5, Ready for Rebuild)

      physicaldrive 1I:1:1 (port 1I:box 1:bay 1, SAS, 72 GB, OK)
      physicaldrive 1I:1:2 (port 1I:box 1:bay 2, SAS, 72 GB, OK)
      physicaldrive 1I:1:3 (port 1I:box 1:bay 3, SAS, 72 GB, OK)
      physicaldrive 1I:1:4 (port 1I:box 1:bay 4, SAS, 146 GB, OK)
      physicaldrive 2I:1:5 (port 2I:box 1:bay 5, SAS, 72 GB, OK)
      physicaldrive 2I:1:6 (port 2I:box 1:bay 6, SAS, 72 GB, OK)
The logical drive shows a hint, Parity Initialization Status: Initialization Failed:
~ # hpacucli controller slot=0 logicaldrive 1 show

Smart Array P400i in Slot 0 (Embedded)

   array A

      Logical Drive: 1
         Size: 341.7 GB
         Fault Tolerance: RAID 5
         Heads: 255
         Sectors Per Track: 32
         Cylinders: 65535
         Strip Size: 64 KB
         Full Stripe Size: 320 KB
         Status: Ready for Rebuild
         Array Accelerator: Enabled
         Parity Initialization Status: Initialization Failed
         Unique Identifier: XXXXXXX
         Disk Name: /dev/cciss/c0d0
         Mount Points: /boot 191 MB, / 28.6 GB
         OS Status: LOCKED
         Logical Drive Label: XXXXX 6797
Controller configuration, if it helps:
~ # /usr/sbin/hpacucli ctrl slot=0 show

Smart Array P400i in Slot 0 (Embedded)
   Bus Interface: PCI
   Slot: 0
   Serial Number: XXXXXXXX
   Cache Serial Number: XXXXXXXX
   RAID 6 (ADG) Status: Enabled
   Controller Status: OK
   Hardware Revision: B
   Firmware Version: 1.18
   Rebuild Priority: Low
   Expand Priority: Low
   Surface Scan Delay: 15 secs
   Surface Scan Mode: Idle
   Post Prompt Timeout: 0 secs
   Cache Board Present: True
   Cache Status: OK
   Accelerator Ratio: 50% Read / 50% Write
   Drive Write Cache: Disabled
   Total Cache Size: 256 MB
   Total Cache Memory Available: 208 MB
   No-Battery Write Cache: Disabled
   Cache Backup Power Source: Batteries
   Battery/Capacitor Count: 1
   Battery/Capacitor Status: OK
   SATA NCQ Supported: False
How do I go about debugging this?
Edit: All of the individual drives appear fine:

~ # hpacucli controller all show config detail | grep Status
   RAID 6 (ADG) Status: Enabled
   Controller Status: OK
   Cache Status: OK
   Battery/Capacitor Status: OK
   Status: OK
   Status: Ready for Rebuild
   Parity Initialization Status: Initialization Failed
   OS Status: LOCKED
   Status: OK
   Status: OK
   Status: OK
   Status: OK
   Status: OK
   Status: OK
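Grepping the Status lines doesn't surface any media-error counters, though. As a sketch of digging one level deeper (assuming smartmontools with cciss support is installed; the cciss,N drive indices below are guesses for our layout), per-drive detail and raw SMART data can be pulled like this:

~ # hpacucli controller slot=0 physicaldrive all show detail

# smartctl can address drives behind the cciss driver directly;
# N in "cciss,N" selects the physical drive, and the device node
# matches the Disk Name reported above.
~ # smartctl -a -d cciss,0 /dev/cciss/c0d0
~ # smartctl -a -d cciss,1 /dev/cciss/c0d0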
edit2: I'm still debugging some adverse interactions between hpaducli and grsec (also mp-SSH and Ubuntu), but we do have hpacucli diag results available, and buried in the Logical Drive Status Flags is Rebuild Aborted From Read Error. What confuses me here is how a read error during a rebuild can stop the rebuild without getting any of the drives marked predictive failure, or worse.
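For anyone retracing this: that flag came out of the hpacucli diag report. A minimal sketch of regenerating it (the exact diag syntax varies by hpacucli version, and the file paths here are just examples):

~ # hpacucli controller all diag file=/tmp/adu-report.zip

# The archive holds per-drive error logs and the logical drive
# status flags quoted above.
~ # unzip -o /tmp/adu-report.zip -d /tmp/adu-report
~ # grep -ri "rebuild aborted" /tmp/adu-report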
Answer

Ready for Rebuild is a bad status to see on a parity RAID level like RAID 5 or RAID 6. It means that you likely have read errors on another drive in the array, i.e. a second failing drive.

If the system is still online, your best option is to recover the data and rebuild the array from scratch. There's no good fix for this, and there's definitely not much you can do to debug it.
See the following:

Force LUN in a HP Smart Array to rebuild (https://serverfault.com/questions/282282/force-lun-in-a-hp-smart-array-to-rebuild/282286#282286)

HP Proliant ML350 G5 SAS HDD (https://serverfault.com/questions/523868/hp-proliant-ml350-g5-sas-hdd/524001#524001)

HP SmartArray P400: How to repair failed logical drive? (https://serverfault.com/questions/352502/hp-smartarray-p400-how-to-repair-failed-logical-drive/360190#360190)

And of course: RAID-5: Two disks failed simultaneously? (https://serverfault.com/questions/614523/raid-5-two-disks-failed-simultaneously/614534#614534)
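If the data is already safe and you want to try kicking the controller anyway, the first link above describes a forced re-enable. A rough sketch (the reenable command is intended for failed volumes, so it may not change a Ready for Rebuild state; the slot and logical drive numbers match this question's output):

~ # hpacucli controller slot=0 modify rebuildpriority=high
~ # hpacucli controller slot=0 logicaldrive 1 modify reenable forced

Only attempt this after your data is recovered; forcing the volume back online tells the controller to keep using it despite the read errors.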