About HDD UREs (unrecoverable read errors), I know these points:
- When a hard disk reads a sector and the FEC (forward error correction) data cannot correct the errors in that sector, we encounter a URE.
- The rate at which UREs occur is very low, but it is not zero.
- When rebuilding a RAID 5 array, a URE sometimes occurs and the rebuild stops.
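To put a rough number on "very low, but not zero", here is a small sketch. It assumes each bit read fails independently with a fixed probability, and it uses 1 error per 10^14 bits as a typical datasheet-style figure; that rate and the function name are assumptions for illustration, not values from this thread.

```python
import math

def ure_probability(bytes_read: float, ure_per_bit: float = 1e-14) -> float:
    """Chance of at least one URE while reading `bytes_read` bytes.

    Assumes independent bit errors at a constant rate, which is a
    simplification of real drive behaviour.
    """
    bits = bytes_read * 8
    # 1 - (1 - p)^bits, computed via log1p/expm1 so tiny p does not round away
    return -math.expm1(bits * math.log1p(-ure_per_bit))

if __name__ == "__main__":
    # Example: a rebuild that has to read 12 TB from the surviving disks
    print(f"{ure_probability(12e12):.1%}")
```

Under this model the risk grows with the amount of data that must be read without error, which is why large RAID 5 rebuilds are where UREs tend to get noticed.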
But I still have some questions:
- With a single disk, what happens? Does the hardware/file system report an error and we lose the file? Or do we get the file with wrong data?
- Will rewriting data to the sector with the URE make it normal again? Or must we use a utility provided by the HDD manufacturer to remap it to a reserve sector?
- If it happens while mirroring/re-mirroring a RAID 1/10 array, what will the RAID controller do? Stop the mirroring? Or just copy the incorrect data to the other disk?
Thanks for the answer; questions 1 and 2 are solved.
For the 3rd question, I mean this: if a URE is encountered when converting a single HDD to a RAID 1 array by adding a new disk, or when replacing a failed disk in a RAID 1/10 array, there is no redundancy to correct the error. Will it complete the mirror/re-mirror with bad data, or stop like a RAID 5 rebuild?
- With a single disk, an unrecoverable error is just that: the read can't be done, and the failure is reported to the filesystem and subsequently to the application trying to read the file. Generally, it is preferred to get an explicit error instead of unreliable data.
- Writing to an unreadable sector will either fix the physical sector (for a soft error, e.g. when writing was interrupted by power loss) or the drive will remap the logical sector to a sector from its spare pool. This happens at the discretion of the drive and isn't usually user/driver selectable.
- The RAID controller will most likely repair the sector, either from the mirror or by rebuilding the data from the redundancy set. If a(nother) read error during a mirror or rebuild prevents this repair, the error sticks and the array is bad. Some RAID sets can repair multiple errors (RAID 6 or some nested RAIDs), but once errors pile up you're out of luck.
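As a sketch of the first point, this is roughly what "reported to the application" can look like from user space. It assumes a POSIX system where an unreadable sector surfaces as an `OSError` with `errno.EIO` on a regular file; the function name and chunk size are illustrative.

```python
import errno
import os

def find_unreadable_chunks(path: str, chunk_size: int = 1 << 20) -> list[int]:
    """Return byte offsets of chunks the OS refused to read (EIO)."""
    bad_offsets = []
    fd = os.open(path, os.O_RDONLY)
    try:
        size = os.fstat(fd).st_size
        for offset in range(0, size, chunk_size):
            try:
                os.pread(fd, chunk_size, offset)
            except OSError as exc:
                if exc.errno != errno.EIO:
                    raise  # some other failure; do not hide it
                # Explicit error for this range instead of silently wrong data
                bad_offsets.append(offset)
    finally:
        os.close(fd)
    return bad_offsets
```

Reads that are served from the operating system's cache never reach the disk, so a clean result here is not proof that every sector is readable.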
It's important to
make sure errors don't pile up on rarely used sectors - they can become uncorrectable
errors when sectors are left unread for months or even years. So, make sure you enable
data scrubbing, media patrol, patrol read or whatever it's called on your hardware to
check all data on a regular basis. That way, you ensure a rebuild works when you need
it.
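On hardware controllers this is a setting in the vendor's management tool. As one concrete illustration, assuming Linux software RAID (md) rather than a hardware controller, a scrub is requested by writing `check` to the array's `sync_action` file in sysfs. A minimal sketch; the helper name is made up for this example, and writing to a real array needs root:

```python
from pathlib import Path

def request_scrub(sync_action: Path) -> None:
    """Ask the md driver to read and verify the whole array."""
    # "check" reads every member; inconsistencies are counted in mismatch_cnt
    sync_action.write_text("check\n")
```

For an array named `md0` (a hypothetical name), the call would be `request_scrub(Path("/sys/block/md0/md/sync_action"))`, typically run from a scheduled job.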
Some people report that during a rebuild,
additional drives start to fail because of the stress but I've found this to be a myth.
The drives just stumble on stale, accumulated errors. You can stress even very old
drives for days without any problems.