
How can I boot Linux from a software RAID 1 array?



I'm trying to set up a RAID 1 array on an existing Ubuntu Linux install.



I'm following this tutorial...
http://howtoforge.org/software-raid1-grub-boot-fedora-8




After going through the list of steps a million times, I finally understand what's going on.
You create the raid device on your new blank drive, copy your old / filesystem to it, then set up the grub menu.lst, fstab, mtab, initrd and grub MBR so they all point to the raid device (which I have defined and is working), and then you reboot.
Once you've booted, you are running from the raid device (/dev/md0).
Then you merely add your original drive to the raid array, it syncs, and voilà, you're done.
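
For reference, the sequence described above can be sketched roughly as follows. This is a dry-run outline, not the tutorial's exact commands: the device names are the ones from this post, /mnt/md0 is a made-up mount point, and the fstab, menu.lst, initrd and grub MBR edits are left out. By default it only prints the commands instead of running them.

```shell
#!/bin/sh
# Dry-run sketch of the migration steps described above.
# Assumptions: /dev/sda1 is the old root, /dev/sdc1 is the new RAID member,
# and /mnt/md0 is a hypothetical mount point chosen for illustration.
# With DRY_RUN=1 (the default) commands are printed, never executed.

run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

build_degraded_mirror() {
    # 1. Create a two-disk RAID 1 with one slot left empty ("missing").
    run mdadm --create /dev/md0 --level=1 --raid-devices=2 missing /dev/sdc1
    # 2. Put a filesystem on the array and copy the running root onto it.
    run mkfs.ext3 /dev/md0
    run mkdir -p /mnt/md0
    run mount /dev/md0 /mnt/md0
    run cp -ax / /mnt/md0
}

attach_old_disk() {
    # 3. Only after successfully booting from /dev/md0:
    #    add the old root partition so the mirror can sync.
    run mdadm --add /dev/md0 /dev/sda1
}
```

Calling build_degraded_mirror with DRY_RUN left at 1 just lists the commands. Set DRY_RUN=0 only as root and only after double-checking the device names, since mkfs and mdadm --add overwrite their targets. Creating the array with one slot given as missing is what would produce a degraded [2/1] [_U] state like the one in /proc/mdstat further down.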



So I set up my menu.lst to load the kernel and initrd from the raid device by default, and to fall back to my original (still intact) old disk.
And it always falls back when I reboot.
When I boot the machine and run my new grub entry, it says "error 15 file not found."
There's lots of stuff on the web about it, but none of it seems to help.




The only thing that's weird is that when I go to set up the MBR with grub, you enter "root (hd0,0)" (which I finally understand), and it's supposed to print "Filesystem type is ext2fs, partition type 0xfd" or something like that.
Mine prints nothing.
But when I run setup (hd0) and setup (hd2), it says it's doing the right thing to the right drive.
So I assume it's working.
But it can't load the initrd or the kernel from the md0 device.



The only other thing I'm wondering is how on earth grub knows what a raid device is.
The kernel hasn't loaded, the software raid modules haven't loaded, so how can stupid little grub have any idea at all where to load the initrd from?
So I'm thinking, okay, there's a mapping somewhere from /dev/md0 to /dev/sdc1 (the new raid drive), but I don't see where that could be happening.
And for kicks (I did this SO many times in various combinations), I tried setting the grub menu.lst to load the initrd and kernel with root=/dev/sdc1 (my new drive), and it still says file not found.

So either the grub MBR setup isn't working, or I'm missing something really simple.



Any ideas?




Here's some more info...

root@io:~# cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md0 : active raid1 sdc1[1]
      18771840 blocks [2/1] [_U]



root@io:~# fdisk -l

Disk /dev/sda: 20.8 GB, 20847697920 bytes
255 heads, 63 sectors/track, 2534 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk identifier: 0x9d949d94


   Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *           1        2337    18771921   83  Linux
/dev/sda2            2338        2434      779152+   5  Extended
/dev/sda5            2338        2434      779121   82  Linux swap / Solaris

Disk /dev/sdb: 320.0 GB, 320072933376 bytes
16 heads, 63 sectors/track, 620181 cylinders
Units = cylinders of 1008 * 512 = 516096 bytes
Disk identifier: 0x00000000


   Device Boot      Start         End      Blocks   Id  System
/dev/sdb1   *           1        4064     2048224+  83  Linux
/dev/sdb2            4065      620181   310522968   83  Linux

Disk /dev/sdc: 20.0 GB, 20020396032 bytes
255 heads, 63 sectors/track, 2434 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk identifier: 0x00000080


   Device Boot      Start         End      Blocks   Id  System
/dev/sdc1   *           1        2337    18771921   fd  Linux raid autodetect
/dev/sdc2            2338        2434      779152+   5  Extended
/dev/sdc5            2338        2434      779121   82  Linux swap / Solaris

Disk /dev/md0: 19.2 GB, 19222364160 bytes
2 heads, 4 sectors/track, 4692960 cylinders
Units = cylinders of 8 * 512 = 4096 bytes
Disk identifier: 0x00000000


Disk /dev/md0 doesn't contain a valid partition table



root@io:~# mdadm -E
mdadm: No devices to examine



root@io:~# cat /etc/mdadm.conf

ARRAY /dev/md0 level=raid1 num-devices=2 UUID=5248ed76:cba39cc2:3082255a:649c0d18
root@io:~#



root@io:~# cat /boot/grub/menu.lst

default 0
# 8/14/09 added this
fallback 1


## timeout sec
# Set a timeout, in SEC seconds, before automatically booting the default entry
# (normally the first entry defined).
timeout 3

## hiddenmenu
# Hides the menu by default (press ESC to see the menu)
hiddenmenu


# added this 8/14/09 for raid boot, note this will get blown away on next kernel update
# if it's after the magic marker
# this means we will have to manually update this when there's a kernel upgrade :-(
# in grub land hd0 = /dev/sda and hd1 = /dev/sdb and hd2 = /dev/sdc I hope
# we're putting sdc first for now
title Ubuntu 8.04.3 LTS, kernel 2.6.24-24-generic (raid)
root (hd2,0)
#kernel /boot/vmlinuz-2.6.24-24-generic root=UUID=b11d6b08-fdfe-4b0d-adec-4e263455be23 ro
kernel /boot/vmlinuz-2.6.24-24-generic root=/dev/md0 ro
initrd /boot/initrd.img-2.6.24-24-generic

quiet




title Ubuntu 8.04.3 LTS, kernel 2.6.24-24-generic
root (hd0,0)
kernel /boot/vmlinuz-2.6.24-24-generic root=UUID=d8c402cc-7445-4878-b3aa-c9568b740b51 ro
initrd /boot/initrd.img-2.6.24-24-generic
quiet



title Ubuntu 8.04.3 LTS, kernel 2.6.24-24-generic (recovery mode)
root (hd0,0)
kernel /boot/vmlinuz-2.6.24-24-generic root=UUID=d8c402cc-7445-4878-b3aa-c9568b740b51 ro single
initrd /boot/initrd.img-2.6.24-24-generic



root@io:~# blkid

/dev/sda1: UUID="d8c402cc-7445-4878-b3aa-c9568b740b51" SEC_TYPE="ext2" TYPE="ext3"
/dev/sda5: TYPE="swap" UUID="e0509276-30eb-4dcb-8e17-20f8244f5403"
/dev/sdb1: LABEL="alt" UUID="ea1789eb-9d6f-47a9-a074-18121792b30a" SEC_TYPE="ext2" TYPE="ext3"
/dev/sdb2: LABEL="sp" UUID="3b6d1173-f9fd-4a3e-8e5d-249fc682355b" SEC_TYPE="ext2" TYPE="ext3"
/dev/sdc1: UUID="76ed4852-c29c-a3cb-5a25-8230180d9c64" TYPE="mdraid"
/dev/md0: UUID="b11d6b08-fdfe-4b0d-adec-4e263455be23" SEC_TYPE="ext2" TYPE="ext3"
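
A side note on the UUIDs above: the value in /etc/mdadm.conf and the one blkid prints for /dev/sdc1 look unrelated, but they appear to be the same 128 bits with the bytes of each 32-bit word reversed. The helper below is a small sketch for checking that. The function name is made up, and the byte-swap rule is inferred from these two values rather than taken from any documentation.

```shell
#!/bin/sh
# Convert an mdadm-style UUID (four colon-separated 32-bit hex words)
# into the dash-separated form that blkid printed for the member above.
# Assumption: blkid shows each 32-bit word with its bytes reversed.
md_uuid_to_blkid() {
    printf '%s\n' "$1" \
        | tr -d ':' \
        | sed -E 's/(..)(..)(..)(..)/\4\3\2\1/g' \
        | sed -E 's/^(.{8})(.{4})(.{4})(.{4})(.{12})$/\1-\2-\3-\4-\5/'
}

# UUID taken from the ARRAY line in /etc/mdadm.conf in this post.
md_uuid_to_blkid 5248ed76:cba39cc2:3082255a:649c0d18
```

With the value from this post it prints 76ed4852-c29c-a3cb-5a25-8230180d9c64, which is what blkid reports for /dev/sdc1, so the ARRAY line does seem to refer to that member. Also, mdadm -E needs a device argument (for example mdadm -E /dev/sdc1), which is why the bare call above found no devices to examine.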



Answer




For anybody else who ends up suffering the error 15 grief that I did: it turns out that the device numbering in grub (hd0, hd1, hd2...) was different at boot time than when running grub from the already-booted system.
I spent a week with root (hd2,0) because that's what grub told me the drive I wanted was called.
But when I dropped to the grub shell at bootup, I was surprised to find that what is hd2 when the machine is up is hd1 at boot. So I changed the menu.lst to use root (hd1,0) and it started working.
I hope this saves somebody else a lot of hair pulling.
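
One way to spot the mismatch described above: grub run from the booted system guesses its numbering from the OS's view of the disks (grub-install typically records that guess in /boot/grub/device.map), while grub at boot follows the BIOS disk order, and the two need not agree. The sketch below looks up what a device.map file claims for a given disk. The function name and the default path are assumptions for illustration, and the comments list the GRUB legacy commands for checking the boot-time view.

```shell
#!/bin/sh
# Print the GRUB legacy name (hd0, hd1, ...) that a device.map file
# assigns to a Linux disk. This is only the OS-side guess; the BIOS
# order seen by GRUB at boot may differ, as it did in this post.
grub_name_for() {
    disk=$1                              # e.g. /dev/sdc
    map=${2:-/boot/grub/device.map}      # assumed default location
    awk -v dev="$disk" '$2 == dev { gsub(/[()]/, "", $1); print $1 }' "$map"
}

# Example on the running system:
#   grub_name_for /dev/sdc
#
# To see the boot-time view instead, press "c" at the GRUB menu
# (ESC first if hiddenmenu is set) and type:
#   grub> find /boot/grub/stage1
# It lists every (hdX,Y) that holds the file, which here would include
# both the old disk and the copy on the RAID member. Use the entry for
# the RAID member in the "root" line of menu.lst.
```

If the name printed on the running system differs from what find reports at boot, trust the boot-time one for menu.lst, since that is the numbering in effect when the entry is actually used.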

