====== Overview ======

RAID is short for Redundant Array of Inexpensive (or Independent) Disks - see Wikipedia for an overview of the different RAID levels.

**IMPORTANT**: a RAID is no replacement for backups! So: make sure to back up the data on the RAID regularly.

====== Setup ======

===== Raid Setup =====

The mdadm tool handles Linux software RAIDs. Install it with:

#!highlight bash
sudo apt-get install mdadm

Prepare the disks:

#!highlight bash
fdisk /dev/sd[abcd]

Create a primary partition and set its type to "Linux raid autodetect" (hex code: fd). Do this for every disk you want to combine into the raid.

Create a raid level 1 device node md0 from 2 hard disks:

#!highlight bash
mdadm --create --verbose /dev/md0 --level=1 --run --raid-devices=2 /dev/sda /dev/sdb

(If you created partitions in the previous step, use the partition devices, e.g. /dev/sda1 and /dev/sdb1, instead of the whole disks.)

Format the new device as ext3:

#!highlight bash
mkfs.ext3 /dev/md0

Write the raid configuration to mdadm's config file:

#!highlight bash
mdadm --detail --scan --verbose > /etc/mdadm/mdadm.conf

You should add a mail contact to the config so that it finally looks like

ARRAY /dev/md0 level=raid6 num-devices=4 UUID=595ee5d4:d8fe61ac:e35eacf0:6e4b8477
   devices=/dev/sda,/dev/sdb,/dev/sdc,/dev/sdd
MAILADDR mail@bla.org

Create a mountpoint and edit /etc/fstab so the new raid is mounted automatically:

/dev/md0  /mnt/raid  ext3  defaults  1  2

Make sure the raid is assembled and mounted at boot. Put into /etc/rc.local:

#!highlight bash
mdadm -As
mount /mnt/raid

mdadm -As assembles the arrays using the raid configuration in the /etc/mdadm/mdadm.conf we created before.

====== Troubleshooting ======

===== Device or Resource Busy =====

When trying to create a RAID array on Ubuntu Karmic (9.10) you might get an error saying "Device or resource busy". The culprit might be the dm-raid driver, which has taken control of the RAID devices. Removing it with

#!highlight bash
sudo apt-get remove dmraid libdmraid

generates a new initrd without the dm-raid driver. Just reboot afterwards and try mdadm --create again.

===== Problems when assembling =====

If you get error messages when assembling the raid with //mdadm -As//, check the config in **/etc/mdadm/mdadm.conf**. Try manually assembling the RAID using something like

#!highlight bash
mdadm --assemble /dev/md0 /dev/sda /dev/sdb

If this works, it is most likely that the UUID in mdadm.conf is wrong. To find the correct UUID, manually assemble the raid (see above), then use

#!highlight bash
sudo mdadm --detail /dev/md0

to display the details. Copy the UUID into mdadm.conf.

====== Restoring a RAID array ======

**IMPORTANT:** DO NOT USE mdadm --create on an existing array. Use //--assemble// (see below).

If you have an existing (mdadm) RAID array, you can tell mdadm to automatically find and use it:

#!highlight bash
sudo mdadm --assemble --scan   # scanning tries to guess which partitions are to be assembled

Or you may explicitly choose the partitions to use:

#!highlight bash
sudo mdadm --assemble /dev/md0 /dev/sdb1 /dev/sdc1

====== Usage ======

===== Raid monitoring =====

Installing mdadm activates a monitoring daemon which is started at boot. To see if it's running do

#!highlight bash
ps ax | grep monitor

You should see something like

5785 ?  Ss  0:00 /sbin/mdadm --monitor --pid-file /var/run/mdadm/monitor.pid --daemonise --scan --syslog

If you added a mail address to mdadm.conf, the daemon will send warning mails in case of raid failures.

===== Access via smb =====

Install the Samba server:

#!highlight bash
sudo apt-get install samba

Edit /etc/samba/smb.conf to make the shares accessible:

[DATA]
path = /mnt/raid/bla/
browseable = yes
read only = no
guest ok = no
create mask = 0644
directory mask = 0755
force user = rorschach
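After editing smb.conf it can help to check the configuration for syntax errors and restart the Samba daemon so the new share becomes available. A minimal sketch; the service name and init script differ between releases, so treat the restart command as an assumption about your setup:

#!highlight bash
testparm -s /etc/samba/smb.conf   # parse the config and report errors without prompting
sudo /etc/init.d/samba restart    # on newer releases the service may be called smbd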
Create the users who should be allowed to access the shares and give them passwords:

#!highlight bash
sudo useradd -s /bin/true rorschach   # linux user who may not log in to the system
sudo smbpasswd -L -a rorschach        # add samba user
sudo smbpasswd -L -e rorschach        # enable samba user

====== Failures ======

===== RAID Health =====

#!highlight bash
mdadm --detail /dev/md0

shows for a healthy raid

/dev/md0:
        Version : 00.90.03
  Creation Time : Thu Apr 17 11:21:06 2008
     Raid Level : raid6
     Array Size : 781422592 (745.22 GiB 800.18 GB)
  Used Dev Size : 390711296 (372.61 GiB 400.09 GB)
   Raid Devices : 4
  Total Devices : 4
Preferred Minor : 0
    Persistence : Superblock is persistent

    Update Time : Fri Apr 18 09:46:39 2008
          State : active
 Active Devices : 4
Working Devices : 4
 Failed Devices : 0
  Spare Devices : 0

     Chunk Size : 256K

           UUID : 595ee5d4:d8fe61ac:e35eacf0:6e4b8477
         Events : 0.15

    Number   Major   Minor   RaidDevice   State
       0       8        0        0        active sync   /dev/sda
       1       8       16        1        active sync   /dev/sdb
       2       8       32        2        active sync   /dev/sdc
       3       8       48        3        active sync   /dev/sdd

===== Simulated failure =====

#!highlight bash
mdadm --manage --set-faulty /dev/md0 /dev/sda

marks one disk as faulty. mdadm reports

mdadm: set /dev/sda faulty in /dev/md0

Check the syslog to see what happens:

#!highlight bash
tail -f /var/log/syslog

The event has been detected and a mail has been sent to the admin:

Apr 18 10:17:39 INES kernel: [77650.308834]  --- rd:4 wd:3
Apr 18 10:17:39 INES kernel: [77650.308836]  disk 1, o:1, dev:sdb
Apr 18 10:17:39 INES kernel: [77650.308839]  disk 2, o:1, dev:sdc
Apr 18 10:17:39 INES kernel: [77650.308841]  disk 3, o:1, dev:sdd
*Apr 18 10:17:39 INES mdadm: Fail event detected on md device /dev/md0, component device /dev/sda*
Apr 18 10:17:39 INES postfix/pickup[30816]: 86B902CA824F: uid=0 from=
Apr 18 10:17:39 INES postfix/cleanup[32040]: 86B902CA824F: message-id=<20080418081739.86B902CA824F@INES.arfcd.com>
Apr 18 10:17:39 INES postfix/qmgr[14269]: 86B902CA824F: from=, size=861, nrcpt=1 (queue active)
*Apr 18 10:17:39 INES postfix/smtp[32042]: 86B902CA824F: to=, relay=s0ms2.arc.local[172.24.10.6]:25, delay=0.46, delays=0.22/0.04/0.1/0.1, dsn=2.6.0, status=sent*
Apr 18 10:17:39 INES postfix/qmgr[14269]: 86B902CA824F: removed
Apr 18 10:18:39 INES mdadm: SpareActive event detected on md device /dev/md0, component device /dev/sda

Now the raid details look like this:

#!highlight bash
sudo mdadm --detail /dev/md0

/dev/md0:
        Version : 00.90.03
  Creation Time : Thu Apr 17 11:21:06 2008
     Raid Level : raid6
     Array Size : 781422592 (745.22 GiB 800.18 GB)
  Used Dev Size : 390711296 (372.61 GiB 400.09 GB)
   Raid Devices : 4
  Total Devices : 4
Preferred Minor : 0
    Persistence : Superblock is persistent

    Update Time : Fri Apr 18 10:19:10 2008
          *State : clean, degraded*
 Active Devices : 3
Working Devices : 3
 Failed Devices : 1
  Spare Devices : 0

     Chunk Size : 256K

           UUID : 595ee5d4:d8fe61ac:e35eacf0:6e4b8477
         Events : 0.20

    Number   Major   Minor   RaidDevice   State
       0       0        0        0        removed
       1       8       16        1        active sync   /dev/sdb
       2       8       32        2        active sync   /dev/sdc
       3       8       48        3        active sync   /dev/sdd

      *4       8        0        -        faulty spare   /dev/sda*

===== "Exchange" disks =====

Remove the old disk from the raid:

#!highlight bash
mdadm /dev/md0 -r /dev/sda

Add the new disk to the raid:

#!highlight bash
mdadm /dev/md0 -a /dev/sda

Now you should see a recovery:

#!highlight bash
sudo mdadm --detail /dev/md0

/dev/md0:
        Version : 00.90.03
  Creation Time : Thu Apr 17 11:21:06 2008
     Raid Level : raid6
     Array Size : 781422592 (745.22 GiB 800.18 GB)
  Used Dev Size : 390711296 (372.61 GiB 400.09 GB)
   Raid Devices : 4
  Total Devices : 4
Preferred Minor : 0
    Persistence : Superblock is persistent

    Update Time : Fri Apr 18 10:25:41 2008
          *State : clean, degraded, recovering*
 Active Devices : 3
Working Devices : 4
 Failed Devices : 0
  Spare Devices : 1

     Chunk Size : 256K

 *Rebuild Status : 1% complete*

           UUID : 595ee5d4:d8fe61ac:e35eacf0:6e4b8477
         Events : 0.62

    Number   Major   Minor   RaidDevice   State
      *4       8        0        0        spare rebuilding   /dev/sda*
       1       8       16        1        active sync   /dev/sdb
       2       8       32        2        active sync   /dev/sdc
       3       8       48        3        active sync   /dev/sdd

and

#!highlight bash
cat /proc/mdstat

Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md0 : active raid6 sda[4] sdd[3] sdc[2] sdb[1]
      781422592 blocks level 6, 256k chunk, algorithm 2 [4/3] [_UUU]
      [>....................]  recovery =  3.9% (15251968/390711296) finish=105.1min speed=59486K/sec

unused devices: <none>

===== Real failure =====

To recover data from a RAID1, you can try to mount one of the member disks on its own:

#!highlight bash
sudo mount -t ext3 /dev/   # you NEED to specify the filesystem type manually!

====== Benchmarking ======

#!highlight bash
sudo tiobench --size 66000 --threads 1 --threads 8

tests read and write performance with 1 and 8 threads.
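As a quick sanity check alongside tiobench, hdparm can report the raw sequential read speed of the array device itself. This is only a rough sketch and assumes hdparm is installed; it measures device reads, not filesystem performance:

#!highlight bash
sudo hdparm -tT /dev/md0   # -T: cached reads, -t: buffered reads from the device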