Recently I had a hard drive fail. It was part of a Linux software RAID 1 (mirrored drives), so we lost no data and only needed to replace the hardware. However, the RAID does require rebuilding. A hardware array would usually rebuild automatically upon drive replacement, but this one needed some help.
When you look at a "normal" array, you see something like this:
# cat /proc/mdstat
Personalities : [raid1]
read_ahead 1024 sectors
md2 : active raid1 hda3[1] hdb3[0]
262016 blocks [2/2] [UU]
md1 : active raid1 hda2[1] hdb2[0]
119684160 blocks [2/2] [UU]
md0 : active raid1 hda1[1] hdb1[0]
102208 blocks [2/2] [UU]
unused devices: <none>
That's the normal state - what you want it to look like. When a drive has failed and been replaced, it looks like this:
Personalities : [raid1]
read_ahead 1024 sectors
md0 : active raid1 hda1[1]
102208 blocks [2/1] [_U]
md2 : active raid1 hda3[1]
262016 blocks [2/1] [_U]
md1 : active raid1 hda2[1]
119684160 blocks [2/1] [_U]
unused devices: <none>
Notice that it doesn't list the failed drive parts, and that an underscore appears beside each U. This shows that only one drive is active in these arrays - we have no mirror.
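The same check can be scripted. What follows is a minimal sketch, assuming the /proc/mdstat layout shown above; the function name degraded_arrays is made up for illustration. It prints the name of every array whose status field contains an underscore.

```shell
#!/bin/sh
# degraded_arrays (hypothetical helper): read /proc/mdstat-style text on
# stdin and print the name of every array with a missing member.
degraded_arrays() {
    awk '
        /^md[0-9]+ :/ { name = $1 }            # array line, e.g. "md0 : active raid1 hda1[1]"
        /\[[U_]*_[U_]*\]/ && name != "" {      # status field such as [_U] or [U_]
            print name
            name = ""                          # report each array only once
        }
    '
}

# Demonstration on a small sample.
degraded_arrays <<'EOF'
md0 : active raid1 hda1[1] hdb1[0]
      102208 blocks [2/2] [UU]
md2 : active raid1 hda3[1]
      262016 blocks [2/1] [_U]
EOF
```

On a live system the input would come from the kernel instead: degraded_arrays < /proc/mdstat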
Another command that will show us the state of the RAID drives is "mdadm":
# mdadm -D /dev/md0
/dev/md0:
Version : 00.90.00
Creation Time : Thu Aug 21 12:22:43 2003
Raid Level : raid1
Array Size : 102208 (99.81 MiB 104.66 MB)
Device Size : 102208 (99.81 MiB 104.66 MB)
Raid Devices : 2
Total Devices : 1
Preferred Minor : 0
Persistence : Superblock is persistent
Update Time : Fri Oct 15 06:25:45 2004
State : dirty, no-errors
Active Devices : 1
Working Devices : 1
Failed Devices : 0
Spare Devices : 0
Number Major Minor RaidDevice State
0 0 0 0 faulty removed
1 3 1 1 active sync /dev/hda1
UUID : f9401842:995dc86c:b4102b57:f2996278
As this shows, we presently only have one drive in the array.
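For a quick yes/no answer, the two device counts in that listing can be compared. This is only a sketch, assuming the "Raid Devices" and "Active Devices" labels shown above; missing_members is a made-up name.

```shell
#!/bin/sh
# missing_members (hypothetical helper): read "mdadm -D" output on stdin and
# print how many members the array lacks (Raid Devices minus Active Devices).
missing_members() {
    awk -F' : ' '
        /Raid Devices/   { raid = $2 }
        /Active Devices/ { active = $2 }
        END { print raid - active }
    '
}

# Demonstration with the two relevant lines from the listing above.
missing_members <<'EOF'
Raid Devices : 2
Active Devices : 1
EOF
```

On a live system this would be run as: mdadm -D /dev/md0 | missing_members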
Although I already knew that /dev/hdb was the other part of the raid array, you can look at /etc/raidtab to see how the raid was defined:
raiddev /dev/md1
raid-level 1
nr-raid-disks 2
chunk-size 64k
persistent-superblock 1
nr-spare-disks 0
device /dev/hda2
raid-disk 0
device /dev/hdb2
raid-disk 1
raiddev /dev/md0
raid-level 1
nr-raid-disks 2
chunk-size 64k
persistent-superblock 1
nr-spare-disks 0
device /dev/hda1
raid-disk 0
device /dev/hdb1
raid-disk 1
raiddev /dev/md2
raid-level 1
nr-raid-disks 2
chunk-size 64k
persistent-superblock 1
nr-spare-disks 0
device /dev/hda3
raid-disk 0
device /dev/hdb3
raid-disk 1
To get the mirrored drives working properly again, we need to run fdisk to see what partitions are on the working drive:
# fdisk /dev/hda
Command (m for help): p
Disk /dev/hda: 255 heads, 63 sectors, 14946 cylinders
Units = cylinders of 16065 * 512 bytes
   Device Boot    Start       End    Blocks   Id  System
/dev/hda1   *         1        13    104391   fd  Linux raid autodetect
/dev/hda2            14     14913 119684250   fd  Linux raid autodetect
/dev/hda3         14914     14946    265072+  fd  Linux raid autodetect
Duplicate that layout on /dev/hdb. Use "n" to create the partitions, and "t" to change their type to "fd" to match. Once this is done, use "raidhotadd":
# raidhotadd /dev/md0 /dev/hdb1
# raidhotadd /dev/md1 /dev/hdb2
# raidhotadd /dev/md2 /dev/hdb3
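The two steps above can also be generated rather than typed. The sketch below is a dry run: it only prints commands and never touches a disk. It rests on several assumptions that are not part of the original procedure: that sfdisk is available to copy an MBR partition table, that the replacement disk is at least as large as the survivor, and that mdadm --add is used in place of raidhotadd (as on systems that lack the raidtools package). The name rebuild_plan is made up.

```shell
#!/bin/sh
# rebuild_plan (hypothetical helper): DRY RUN ONLY. Prints, but never runs,
# the commands to clone the partition table and re-add the new partitions.
#   $1 = surviving disk (e.g. hda)   $2 = replacement disk (e.g. hdb)
# /proc/mdstat-style text is read from stdin.
rebuild_plan() {
    echo "sfdisk -d /dev/$1 | sfdisk /dev/$2"
    awk -v good="$1" -v new="$2" '
        /^md[0-9]+ :/ {
            name = $1
            for (i = 3; i <= NF; i++) {            # scan the words after "mdN :"
                member = $i
                sub(/\[.*$/, "", member)           # "hda3[1]" -> "hda3"
                if (index(member, good) == 1)
                    part[name] = substr(member, length(good) + 1)
                if (index(member, new) == 1)
                    present[name] = 1              # replacement already added
            }
        }
        /\[[U_]*_[U_]*\]/ && (name in part) && !(name in present) {
            printf "mdadm /dev/%s --add /dev/%s%s\n", name, new, part[name]
        }
    '
}

# Demonstration on a sample of the degraded state shown earlier.
rebuild_plan hda hdb <<'EOF'
md0 : active raid1 hda1[1]
      102208 blocks [2/1] [_U]
md1 : active raid1 hda2[1]
      119684160 blocks [2/1] [_U]
EOF
```

Printing the plan first leaves room to check the device names before anything is run as root; swapping the two disk names by mistake would overwrite the good partition table.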
The rebuilding can be seen in /proc/mdstat:
# cat /proc/mdstat
Personalities : [raid1]
read_ahead 1024 sectors
md0 : active raid1 hdb1[0] hda1[1]
102208 blocks [2/2] [UU]
md2 : active raid1 hda3[1]
262016 blocks [2/1] [_U]
md1 : active raid1 hdb2[2] hda2[1]
119684160 blocks [2/1] [_U]
[>....................] recovery = 0.2% (250108/119684160) finish=198.8min speed=10004K/sec
unused devices: <none>
md0, a small array, has already finished rebuilding (UU), while md1 has only just begun and md2 has not started yet. After md1 finishes, mdadm will show:
# mdadm -D /dev/md1
/dev/md1:
Version : 00.90.00
Creation Time : Thu Aug 21 12:21:21 2003
Raid Level : raid1
Array Size : 119684160 (114.13 GiB 122.55 GB)
Device Size : 119684160 (114.13 GiB 122.55 GB)
Raid Devices : 2
Total Devices : 2
Preferred Minor : 1
Persistence : Superblock is persistent
Update Time : Fri Oct 15 13:19:11 2004
State : dirty, no-errors
Active Devices : 2
Working Devices : 2
Failed Devices : 0
Spare Devices : 0
Number Major Minor RaidDevice State
0 3 66 0 active sync /dev/hdb2
1 3 2 1 active sync /dev/hda2
UUID : ede70f08:0fdf752d:b408d85a:ada8922b
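While a rebuild is still running, the recovery line in /proc/mdstat can be reduced to the few numbers that matter. A minimal sketch, assuming the recovery line format shown earlier; rebuild_progress is a made-up name.

```shell
#!/bin/sh
# rebuild_progress (hypothetical helper): read /proc/mdstat-style text on
# stdin and print "array percent-done time-remaining" for each running rebuild.
rebuild_progress() {
    awk '
        /^md[0-9]+ :/ { name = $1 }
        /(recovery|resync) *=/ && /%/ {
            pct = ""; eta = ""
            for (i = 1; i <= NF; i++) {
                if ($i ~ /%$/)       pct = $i                # e.g. "0.2%"
                if ($i ~ /^finish=/) eta = substr($i, 8)     # strip "finish="
            }
            print name, pct, eta
        }
    '
}

# Demonstration on the md1 entry from the rebuild listing.
rebuild_progress <<'EOF'
md1 : active raid1 hdb2[2] hda2[1]
      119684160 blocks [2/1] [_U]
      [>....................] recovery = 0.2% (250108/119684160) finish=198.8min speed=10004K/sec
EOF
```

On a live system this would be run as: rebuild_progress < /proc/mdstat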
I was a little surprised that this process wasn't entirely automatic. There's no reason it couldn't be. This is an older Linux install; I don't know if more modern versions will just automatically rebuild.
---October 24, 2004
The tool already exists.
http://scsirastools.sourceforge.net/
Look at the "sgraidmon" piece. If a mirror fails, it will automatically rebuild your partitions and resync the raid after a new disk is inserted.
---October 24, 2004
That rather specifically mentions SCSI - what about IDE RAID?
--TonyLawrence
---October 24, 2004
I've got a setup with the first partition on each disk as swap. With both set to the same priority, they are effectively striped, while the rest of the partitions are mirrored.
That setup is very efficient, but it has two drawbacks:
1:
If one disk dies, the machine will go down, since the striped swap will suddenly be full of holes (a quick reboot is all it takes to make the swap "whole" on one disk, so I don't consider that a big problem).
2:
I would have to use a manual method to bring in a new disk anyway, since I do not think the makers of the raid-auto-tool (if it exists) would read fstab and raidtab and set up the swap for me. No biggie - I can live with that.
If you want to know more about how I've set it up, balancing between highest security and speed and offering some uptime in the process, you can read all about it at: http://nalle.no/newnalle.php
There you can also find the reason why you should not use 'dd' to copy the drives, and a short description of the troubles with booting from the RAID. The Linux RAID HOWTO is an extremely good help too - well written and easy to understand (and even if you don't understand it, you'll get far just by following the recipe).
---October 25, 2004
I don't know whether this is useful, but I found that resyncing the RAID on my machine was very slow because it used only a fraction of the I/O bandwidth. After modifying the values in /proc/sys/dev/raid/speed_limit_min and /proc/sys/dev/raid/speed_limit_max, the resync was much quicker. Since my machine is a single-user workstation, the I/O hit on other applications was acceptable.
-- Jeremy.Norfolk@thenorfolks.net
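The two tunables mentioned in the comment above can be inspected before changing them. This is a sketch only: show_raid_speed_limits is a made-up name, and its directory argument exists purely so the function can be tried against a scratch directory. As far as I know the values are in kilobytes per second per device, the same unit as the speed= figure in /proc/mdstat.

```shell
#!/bin/sh
# show_raid_speed_limits (hypothetical helper): print the current resync
# speed limits. $1 is the directory holding the two files; it defaults to
# the real location under /proc.
show_raid_speed_limits() {
    dir=${1:-/proc/sys/dev/raid}
    for f in speed_limit_min speed_limit_max; do
        printf '%s=%s\n' "$f" "$(cat "$dir/$f")"
    done
}

# Demonstration against a scratch directory with example values.
scratch=$(mktemp -d)
echo 1000   > "$scratch/speed_limit_min"
echo 200000 > "$scratch/speed_limit_max"
show_raid_speed_limits "$scratch"
rm -rf "$scratch"

# On a live system, as root (50000 is an arbitrary example value):
#   echo 50000 > /proc/sys/dev/raid/speed_limit_min
```

A change made this way lasts only until the next reboot.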
---October 25, 2004
What if I'm not so lucky as to have a linux raid, but have been left behind in the SGI IRIX 3rd party scsi raid land? Anyone have good tips?
-kurt
---November 14, 2004
"raidhotadd" is not on FC3, however MDADM can do it. For adding drives back to the array, use:
mdadm [raid-array] --add [drive-to-add], e.g.,
mdadm /dev/md0 --add /dev/sdb1
-JL
---December 31, 2004
I removed hdb (100GB IDE drive) from my system and installed a 300GB IDE drive.
I followed the above steps to rebuild my IDE RAID1 mirror, but the first partition on hdb is not bootable, like it is on hda. Once I can get the partition to be bootable, I plan to expand the disk to 300GB and then install the other 300GB drive and let the system rebuild it.
How can I get the first partition to be bootable? I am using Mitel SME server built on RH7.3.
Thanks - TC
---December 31, 2004
You probably need to run grub to write boot tracks (I think that version uses grub, right?).
Questions should go to the Forum, not comments.
--TonyLawrence
Fri Jun 24 14:58:48 2005: Subject: TonyLawrence
Actually, hda is slightly larger:
$ expr 255 \* 63 \* 1245
20000925
$ expr 19846 \* 63 \* 16
20004768
$
That's just not going to work unless part of a is not used.
Sun Aug 31 10:23:21 2008: Subject: anonymous
As a relatively new Linux user, I have found that the least-hassle way to recover your RAID 1 is the GUI Webmin tool, www.webmin.com. On my home server I disconnected my primary disk to test that my RAID worked properly. Rebuilding it was as simple as selecting it on the Webmin management page and adding it back in. Monitoring of the rebuild was still done with cat /proc/mdstat.