Recovering an inactive and dirty RAID

Last night my computer crashed, and today my RAID5 array wouldn't start. That was extra painful, as the array was already one drive short.

dmesg looked something like this (the device is /dev/md6):

[ 2577.637615] raid5: device sdb5 operational as raid disk 0
[ 2577.637626] raid5: device hde5 operational as raid disk 3
[ 2577.637633] raid5: device hdg5 operational as raid disk 2
[ 2577.637639] raid5: device hdc5 operational as raid disk 1
[ 2577.638576] raid5: allocated 5264kB for md6
[ 2577.638629] raid5: cannot start dirty degraded array for md6
[ 2577.639471] RAID5 conf printout:
[ 2577.639476] --- rd:5 wd:4
[ 2577.639482] disk 0, o:1, dev:sdb5
[ 2577.639488] disk 1, o:1, dev:hdc5
[ 2577.639494] disk 2, o:1, dev:hdg5
[ 2577.639499] disk 3, o:1, dev:hde5
[ 2577.639504] raid5: failed to run raid set md6
[ 2577.639509] md: pers->run() failed ...
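
If the log is long, the refusal line can be filtered out of it. This is only a minimal sketch that assumes the message wording shown above; the function name dirty_degraded_arrays is made up for this example and is not part of any tool.

```shell
#!/bin/sh
# Print the name of every md array that the kernel refused to start
# because it is both dirty and degraded. Reads kernel log text on stdin.
# Assumption: the message wording matches the dmesg output shown above.
dirty_degraded_arrays() {
    sed -n 's/.*cannot start dirty degraded array for \(md[0-9]*\).*/\1/p' | sort -u
}

# Usage (reading the kernel log may require root):
#   dmesg | dirty_degraded_arrays
```

On the log above this should print md6, the array that the rest of this post deals with.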

I don't have backups of that data, so I was determined to get the array working again. After all, I knew it was at least 99% correct. I would be more than happy to accept a little corruption to get the majority of my data back.

I was able to force my kernel to believe that the disks were okay using the following commands.
WARNING 1: don't try this unless you are desperate and all alternatives have failed.
WARNING 2: I wrote the following down from memory. My brain does not do RAID, and I'm not entirely sure the procedure below is complete and correct. However, as this information is hard to find, I decided to write it down anyway. Look it up _before_ you use it!

(In this example I'll use /dev/md6 as the raid device)

1. Stop the array:
mdadm -S /dev/md6

2. Make the array start read-only:
echo 1 > /sys/module/md_mod/parameters/start_ro

3. Force the kernel to believe the array is clean:
echo "clean" > /sys/block/md6/md/array_state
(ignore the error message)

4. Restart the array:
mdadm -A --force /dev/md6
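
The four steps above can be collected into one function. This is a sketch, not a tested recovery tool: it repeats the commands exactly as listed, so the same warnings apply. The names run_step and force_start_md and the DRY_RUN variable are invented for this example. By default nothing is executed; the commands are only printed so you can read them first.

```shell
#!/bin/sh
# Run one command line, or only print it when DRY_RUN is 1 (the default).
run_step() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '%s\n' "$1"
    else
        sh -c "$1"
    fi
}

# $1 = array name without /dev/, for example md6.
force_start_md() {
    md=$1
    # 1. Stop the array.
    run_step "mdadm -S /dev/$md"
    # 2. Make arrays start read-only.
    run_step "echo 1 > /sys/module/md_mod/parameters/start_ro"
    # 3. Mark the array clean; this step may print an error (ignored, as above).
    run_step "echo clean > /sys/block/$md/md/array_state" || true
    # 4. Reassemble the array with --force.
    run_step "mdadm -A --force /dev/$md"
}

# Usage:
#   force_start_md md6              # dry run, prints the four commands
#   DRY_RUN=0 force_start_md md6    # really run them, as root
```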

You can now take a look at /proc/mdstat to see if the array has been started, and then try to mount it. You'll probably need to do an fsck.
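
Checking /proc/mdstat can also be scripted. The sketch below assumes the usual "md6 : active raid5 ..." line layout of that file; the function name md_state is made up for this example. The fsck line in the usage comment assumes an ext2/3/4 filesystem, where -n checks without changing anything.

```shell
#!/bin/sh
# Print the state of one array from mdstat-style text, for example
# "active", "active (read-only)" or "inactive".
# $1 = array name (md6), $2 = file to read (defaults to /proc/mdstat).
md_state() {
    awk -v md="$1" '$1 == md && $2 == ":" {
        state = $3
        # A parenthesised qualifier such as (read-only) may follow the state.
        if ($4 ~ /^\(/) state = state " " $4
        print state
    }' "${2:-/proc/mdstat}"
}

# Usage:
#   md_state md6
#   fsck -n /dev/md6    # assumption: ext2/3/4; -n answers "no" to every repair
```

If md_state prints nothing, the array is not listed in the file at all.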

(Now would be a good time to consider a backup strategy)
