I have an HP server with a Smart Array P400 controller (incl. 256 MB cache and battery backup). One of its logical drives had a failed physical drive; I have replaced the drive, but the logical drive does not rebuild. This is how it looked when I detected the error:
~# /usr/sbin/hpacucli ctrl slot=0 show config

Smart Array P400 in Slot 0 (Embedded)    (sn: XXXX)

   array A (SATA, Unused Space: 0 MB)

      logicaldrive 1 (698.6 GB, RAID 1, OK)

      physicaldrive 1I:1:1 (port 1I:box 1:bay 1, SATA, 750 GB, OK)
      physicaldrive 1I:1:2 (port 1I:box 1:bay 2, SATA, 750 GB, OK)

   array B (SATA, Unused Space: 0 MB)

      logicaldrive 2 (2.7 TB, RAID 5, Failed)

      physicaldrive 1I:1:3 (port 1I:box 1:bay 3, SATA, 750 GB, OK)
      physicaldrive 1I:1:4 (port 1I:box 1:bay 4, SATA, 750 GB, OK)
      physicaldrive 2I:1:5 (port 2I:box 1:bay 5, SATA, 750 GB, OK)
      physicaldrive 2I:1:6 (port 2I:box 1:bay 6, SATA, 750 GB, Failed)
      physicaldrive 2I:1:7 (port 2I:box 1:bay 7, SATA, 750 GB, OK)

   unassigned

      physicaldrive 2I:1:8 (port 2I:box 1:bay 8, SATA, 750 GB, OK)
~#
I thought I had drive 2I:1:8 configured as a spare for array A and array B, but it seems this was not the case :-(. I noticed the problem because of I/O errors on the host, even though only one physical drive of the RAID 5 has failed.
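For reference, this is roughly how a spare assignment can be verified; as far as I know, the detailed configuration view and the per-array view list any assigned spare drives (just a sketch, output omitted):

~# /usr/sbin/hpacucli ctrl slot=0 show config detail
~# /usr/sbin/hpacucli ctrl slot=0 array B show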
Does anyone know why this could happen? The logical drive should go into "Degraded" mode but still be fully accessible from the host OS!?
I first tried to add the unassigned drive 2I:1:8 as a spare to array B (logical drive 2), but this was not possible:
~# /usr/sbin/hpacucli ctrl slot=0 array B add spares=2I:1:8

Error: This operation is not supported with the current configuration.
       Use the "show" command on devices to show additional details
       about the configuration.
~#
Interestingly, it is possible to add the unassigned drive as a spare to the first array without problems. I thought the controller had perhaps put array B into "Failed" state because of the missing spare and protects failed arrays from modification. So I tried to re-enable the logical drive (in order to add the spare afterwards):
~# /usr/sbin/hpacucli ctrl slot=0 ld 2 modify reenable

Warning: Any previously existing data on the logical drive may not
         be valid or recoverable.

Continue? (y/n) y

Error: This operation is not supported with the current configuration.
       Use the "show" command on devices to show additional details
       about the configuration.
~#
But as you can see, re-enabling the logical drive was not possible either. I then replaced the failed drive by hot-swapping it with the unassigned drive. The status now looks like this:
~# /usr/sbin/hpacucli ctrl slot=0 show config

Smart Array P400 in Slot 0 (Embedded)    (sn: XXXX)

   array A (SATA, Unused Space: 0 MB)

      logicaldrive 1 (698.6 GB, RAID 1, OK)

      physicaldrive 1I:1:1 (port 1I:box 1:bay 1, SATA, 750 GB, OK)
      physicaldrive 1I:1:2 (port 1I:box 1:bay 2, SATA, 750 GB, OK)

   array B (SATA, Unused Space: 0 MB)

      logicaldrive 2 (2.7 TB, RAID 5, Failed)

      physicaldrive 1I:1:3 (port 1I:box 1:bay 3, SATA, 750 GB, OK)
      physicaldrive 1I:1:4 (port 1I:box 1:bay 4, SATA, 750 GB, OK)
      physicaldrive 2I:1:5 (port 2I:box 1:bay 5, SATA, 750 GB, OK)
      physicaldrive 2I:1:6 (port 2I:box 1:bay 6, SATA, 750 GB, OK)
      physicaldrive 2I:1:7 (port 2I:box 1:bay 7, SATA, 750 GB, OK)
~#
The logical drive is still not accessible. Why is it not rebuilding? What can I do?
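For completeness, the per-device status can also be queried directly; as far as I know, these views show whether a rebuild is actually in progress (sketch, output omitted):

~# /usr/sbin/hpacucli ctrl slot=0 ld 2 show
~# /usr/sbin/hpacucli ctrl slot=0 pd all show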
FYI, this is the configuration of my controller:
~# /usr/sbin/hpacucli ctrl slot=0 show

Smart Array P400 in Slot 0 (Embedded)
   Bus Interface: PCI
   Slot: 0
   Serial Number: XXXX
   Cache Serial Number: XXXX
   RAID 6 (ADG) Status: Enabled
   Controller Status: OK
   Chassis Slot:
   Hardware Revision: Rev E
   Firmware Version: 5.22
   Rebuild Priority: Medium
   Expand Priority: Medium
   Surface Scan Delay: 15 secs
   Surface Analysis Inconsistency Notification: Disabled
   Raid1 Write Buffering: Disabled
   Post Prompt Timeout: 0 secs
   Cache Board Present: True
   Cache Status: OK
   Accelerator Ratio: 25% Read / 75% Write
   Drive Write Cache: Disabled
   Total Cache Size: 256 MB
   No-Battery Write Cache: Disabled
   Cache Backup Power Source: Batteries
   Battery/Capacitor Count: 1
   Battery/Capacitor Status: OK
   SATA NCQ Supported: True
~#
Thanks in advance for your help.
Answer
The answer is not pleasant. There is a high probability that your array is in a "waiting for rebuild" state, where another failing disk in the RAID 5 set is preventing the recovery from completing. This is why you should avoid RAID 5 these days (https://serverfault.com/questions/339128/what-are-the-different-widely-used-raid-levels-and-when-should-i-consider-them). It doesn't help that these are SATA drives; the likelihood of problems is even higher. Try powering the system off (letting the drives spin down) and powering it back on. Follow the prompts at the BIOS array screen and choose the F2 option to "re-enable all logical drives". This may kickstart the rebuild process.
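Before the power cycle, it is also worth checking whether a second member drive is already reporting errors; the detailed physical drive view should show error counters and any predictive-failure flags (a sketch only, the exact fields depend on the firmware):

~# /usr/sbin/hpacucli ctrl slot=0 pd all show detail

As far as I know, the closest CLI counterpart to the F2 option is the re-enable command you already tried (the "forced" keyword only skips the confirmation prompt), so if that keeps failing from the OS, the BIOS route is probably the way to go:

~# /usr/sbin/hpacucli ctrl slot=0 ld 2 modify reenable forced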
Otherwise, it's a rebuild/recovery with new disks.
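If a rebuild does start, raising the rebuild priority (currently Medium on your controller) lets it finish sooner at the cost of host I/O performance; something like this should do it:

~# /usr/sbin/hpacucli ctrl slot=0 modify rebuildpriority=high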