Explainer: What is RAID?

Reading about servers and storage you might have wondered: What is RAID? RAID is a method of combining multiple hard drives into one (often larger) logical drive. The acronym expands to either Redundant Array of Inexpensive Disks or Redundant Array of Independent Disks.

RAID was defined in a research paper: “A Case for Redundant Arrays of Inexpensive Disks (RAID)” (Patterson et al., 1988). Andrew S. Tanenbaum uses the latter expansion in his book (“Structured Computer Organization”, 5th Edition, Tanenbaum, 2006, p. 89). I had access to the 5th edition, which is the one I reference here, but you can get the 6th edition from Amazon.com and amazon.de (affiliate links).

In this explainer I will present an overview of RAID arrays, the different RAID levels and the pros and cons of each. I will also discuss the pros and cons of Hardware and Software RAID.

When Patterson et al. looked into this, they were searching for better storage performance in order to keep up with the exponential increase in computational performance.

An important note about RAID is that RAID is NOT backup. The purpose of a RAID array is to keep your data ready and available – it is about having continuous access to your data rather than the security that a backup offers. You may place your backup on a RAID array separate from your production array, but it is the physical separation of the data from the production files that constitutes a backup.

RAID levels

In their paper Patterson et al. defined RAID levels 1-5. Only two remain in general use today (1 and 5), but others have emerged. Here we will concentrate on the current levels, including some hybrid versions. Later I will cover the obsolete/rare levels.

RAID 0 - Striping

RAID 0 was not in the original Patterson paper and would, technically speaking, not qualify as RAID but merely as AID, as there is no redundancy. It spreads the data across 2 or more disks in stripes – hence the name. When writing e.g. 6 blocks on a 2-disk array, block 0 is written on disk 0, block 1 on disk 1, block 2 on disk 0, block 3 on disk 1, block 4 on disk 0 and finally block 5 on disk 1. If the array has 3 disks, block 0 is written on disk 0, block 1 on disk 1, block 2 on disk 2, block 3 on disk 0, block 4 on disk 1 and finally block 5 on disk 2 (see illustration below).

This means that both read and write speeds increase almost linearly with the number of drives in the array as both reads and writes can be done in parallel.
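The striping pattern described above boils down to block i landing on disk i modulo the number of disks. A minimal sketch in Python, using example block and disk counts:

    # Minimal sketch of RAID 0 striping: block i lands on disk i % n_disks.
    # The block and disk counts are just example values.
    def raid0_layout(n_blocks, n_disks):
        return {block: block % n_disks for block in range(n_blocks)}

    # Six blocks on a two-disk array, as in the example above.
    print(raid0_layout(6, 2))  # {0: 0, 1: 1, 2: 0, 3: 1, 4: 0, 5: 1}
    # Six blocks on a three-disk array.
    print(raid0_layout(6, 3))  # {0: 0, 1: 1, 2: 2, 3: 0, 4: 1, 5: 2}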

It is quite fitting that this level is numbered zero, as it has no redundancy. If a drive fails, the array fails – usually without any chance of recovering the data. Therefore RAID 0 or stripe arrays are often used for scratch drives, where speed is more important than data retention. This could be video editing, where the original data is stored elsewhere and copied over to the scratch drive while it is being worked on.

Please note that the size of the array is not simply the added size of the drives in the array – it will be the size of the smallest drive multiplied by the number of drives.
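A quick sketch of that capacity rule, using made-up drive sizes:

    # RAID 0 capacity: the smallest drive times the number of drives.
    # Drive sizes in GB are arbitrary example values.
    drive_sizes_gb = [500, 1000, 2000]
    raid0_capacity = min(drive_sizes_gb) * len(drive_sizes_gb)
    print(raid0_capacity)  # 1500 GB - the extra space on the larger drives is unused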

A minimum of 2 drives are needed for RAID 0.

RAID 1 - Mirroring

RAID 1 mirrors the data on all the drives in the array. This way the data is replicated to every disk in the array, and thus this level gives the best data retention. If a RAID 1 setup has n drives in the array, it can suffer n-1 disks failing and still continue to serve data, although it will be marked as degraded.

As with RAID 0, read speeds increase almost linearly with the number of drives, as blocks can be read in parallel across the array, but write speed will be that of a single drive, as each block has to be written on every drive in the array.

As with all RAID levels the smallest disk in the array dictates the size of the array.

A minimum of 2 drives are needed for RAID 1. More drives add both stability and read speed. As data is written once per drive in the array, this RAID level is the safest – it can suffer all but one drive failing – but it also has the lowest space utilisation ratio: a maximum of 50 % with two drives, declining further with each drive added. That is the price paid for the stability.
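A small sketch of these RAID 1 properties, with example drive sizes:

    # RAID 1 mirroring: every block is written to every drive, so the smallest
    # drive dictates the usable size. All numbers are example values.
    drive_sizes_gb = [1000, 1000, 2000]           # a three-way mirror
    usable_capacity = min(drive_sizes_gb)         # 1000 GB
    failures_tolerated = len(drive_sizes_gb) - 1  # 2 - all but one drive may fail
    utilisation = 1 / len(drive_sizes_gb)         # ~0.33 space utilisation ratio
    print(usable_capacity, failures_tolerated, round(utilisation, 2))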

RAID 5

RAID 5 tries to balance stability and utilisation of space. It does this by writing the data blocks across all drives but one and keeping an extra parity block on the remaining drive. In order not to create a bottleneck, these parity blocks are spread across the drives in the array. Thus the array can function even if one of the drives is lost, although it will then be in a degraded state, and measures to replace the faulty drive need to be taken before a second drive fails.

An example with 4 drives, writing 6 blocks: Block 0 will be written on disk 0, block 1 will be written on disk 1, block 2 will be written on disk 2, a parity block will be calculated and written on disk 3, block 3 will be written on disk 0, block 4 will be written on disk 1, block 5 will be written on disk 3, and a parity block will be calculated and written on disk 2.
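The same layout can be sketched in a few lines of Python. This is only an illustration of the rotating-parity placement described above; the function name and counts are made up, and real implementations differ in the exact rotation scheme.

    # Sketch of a RAID 5 layout with rotating parity.
    def raid5_layout(n_blocks, n_disks):
        data_per_stripe = n_disks - 1
        layout = []                                # list of (stripe, disk, content)
        n_stripes = (n_blocks + data_per_stripe - 1) // data_per_stripe
        for stripe in range(n_stripes):
            parity_disk = (n_disks - 1 - stripe) % n_disks   # parity rotates each stripe
            data_disks = [d for d in range(n_disks) if d != parity_disk]
            for i, disk in enumerate(data_disks):
                block = stripe * data_per_stripe + i
                if block < n_blocks:
                    layout.append((stripe, disk, f"block {block}"))
            layout.append((stripe, parity_disk, "parity"))
        return layout

    # Matches the 4-drive, 6-block example above.
    for entry in raid5_layout(6, 4):
        print(entry)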

Write speed is faster than RAID 1, as individual blocks can be written in parallel, but the parity block needs to be calculated and written as well. Read speed scales with the number of drives in the array, as data blocks can be read in parallel. If the array is degraded, i.e. a disk has failed, the corresponding parity block is read and the missing block in the stripe is calculated from the parity block and the remaining data blocks.
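The parity is typically a bitwise XOR of the data blocks in the stripe, which is what makes this reconstruction possible. A minimal sketch with made-up byte values:

    # Parity for a stripe is the XOR of its data blocks; XOR-ing the parity with
    # the surviving blocks recovers a lost block. Block contents are examples.
    def xor_blocks(blocks):
        result = bytearray(len(blocks[0]))
        for block in blocks:
            for i, byte in enumerate(block):
                result[i] ^= byte
        return bytes(result)

    block0, block1, block2 = b"\x12\x34", b"\xab\xcd", b"\x0f\xf0"
    parity = xor_blocks([block0, block1, block2])

    # The disk holding block1 fails: rebuild it from the parity and the rest.
    rebuilt = xor_blocks([block0, block2, parity])
    print(rebuilt == block1)  # True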

Read and write speeds, as well as the space utilisation ratio, get closer to those of RAID 0 as more drives are added to the array. However, more drives increase the risk of more than 1 drive failing before it can be replaced.

A minimum of 3 drives are needed for RAID 5.

RAID 6

RAID 6 tries to improve on RAID 5 by adding a second parity block, hence it can suffer 2 drive failures and still keep standing, although in a degraded state. It does this by writing the data blocks across all drives but two and keeping two extra parity blocks on different drives. In order not to create a bottleneck, these parity blocks are spread across the drives in the array. Even though the array can survive 2 drive failures and keep operating, measures to replace the faulty drives need to be taken as soon as possible – if a third drive fails, we have a problem.

An array is extra vulnerable when rebuilding/resilvering itself after a failed disk has been replaced, as it will be running with heightened activity: it needs to read every block in every stripe on the remaining drives in order to calculate the missing data and parity blocks needed to rebuild the lost drive. This goes for all of the RAID levels except RAID 0, as RAID 0 has no redundancy data and thus cannot rebuild/recover. Here the data has to be restored from a backup.

An example with 4 drives, writing 6 blocks: Block 0 will be written on disk 0, block 1 will be written on disk 1, parity block 0 will be calculated and written on disk 2, parity block 1 will be calculated and written on disk 3, block 2 will be written on disk 0, block 3 will be written on disk 3, parity block 2 will be calculated and written on disk 1, parity block 3 will be calculated and written on disk 2, block 4 will be written on disk 2, block 5 will be written on disk 3, parity block 4 will be calculated and written on disk 0, and parity block 5 will be calculated and written on disk 1.
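A sketch of that placement pattern follows. Note this only shows where the blocks land; in real RAID 6 implementations the second parity block is not a plain XOR but is computed with Reed-Solomon style coding, which is omitted here.

    # Sketch of a RAID 6 layout only: where data and the two parity blocks land.
    def raid6_layout(n_blocks, n_disks):
        data_per_stripe = n_disks - 2
        layout = []                                # list of (stripe, disk, content)
        n_stripes = (n_blocks + data_per_stripe - 1) // data_per_stripe
        for stripe in range(n_stripes):
            p_disk = (n_disks - 2 - stripe) % n_disks   # both parities rotate each stripe
            q_disk = (p_disk + 1) % n_disks
            data_disks = [d for d in range(n_disks) if d not in (p_disk, q_disk)]
            for i, disk in enumerate(data_disks):
                block = stripe * data_per_stripe + i
                if block < n_blocks:
                    layout.append((stripe, disk, f"block {block}"))
            layout.append((stripe, p_disk, "parity P"))
            layout.append((stripe, q_disk, "parity Q"))
        return layout

    # Matches the 4-drive, 6-block example above.
    for entry in raid6_layout(6, 4):
        print(entry)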

Write speed is faster than RAID 1, as individual blocks can be written in parallel, but the parity blocks need to be calculated and written as well. Read speed scales with the number of drives in the array, as data blocks can be read in parallel. If the array is degraded, i.e. one or two disks have failed, the corresponding parity block(s) are read and the missing block(s) in the stripe are calculated from the parity blocks and the remaining data blocks.

Read and write speeds, as well as the space utilisation ratio, get closer to those of RAID 0 as more drives are added to the array. However, more drives increase the risk of more than 2 drives failing before they can be replaced.

A minimum of 4 drives are needed for RAID 6.

RAID 1+0 / 10

This is a hybrid RAID level that gives both speed and redundancy. It usually consists of two or more sets of mirrored drives (RAID 1) that are then striped across (RAID 0).

Space utilisation is the same as with RAID 1 – a maximum of 50% with two-way mirroring. In theory any combination is possible: it is possible to mirror across more than 2 drives per mirror set, and it is possible to stripe across more than 2 mirror sets. More drives in each mirror set give better resilience but a lower space utilisation ratio; more mirror sets give a larger logical disk with better performance, but still with a maximum of 50 per cent space utilisation for two-way mirrors.
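A small sketch of the space arithmetic, with made-up mirror and stripe counts:

    # RAID 1+0 space: each mirror set contributes the size of its smallest drive,
    # and utilisation is 1 / (drives per mirror). All numbers are example values.
    drives_per_mirror = 2      # two-way mirrors
    mirror_sets = 3            # striped across three mirror sets (6 drives total)
    drive_size_gb = 1000

    usable_gb = mirror_sets * drive_size_gb    # 3000 GB
    utilisation = 1 / drives_per_mirror        # 0.5 - never better than 50% with two-way mirrors
    print(usable_gb, utilisation)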

Resilience depends on the configuration. In theory you can have as many simultaneous disk failures as you have mirror sets and still have a functional but degraded RAID array – as long as no mirror set loses all of its disks. If you lose all disks in the same mirror set, the array is lost. If speed is not at a premium, RAID 6 might be a better option: there you can lose any two disks and still have a degraded but functional array.
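That failure rule can be expressed as a small sketch (the mirror sets and drive names are made up):

    # A RAID 1+0 array survives as long as every mirror set still has at least
    # one working drive.
    def raid10_survives(mirror_sets, failed_drives):
        return all(any(d not in failed_drives for d in mirror) for mirror in mirror_sets)

    mirrors = [["a1", "a2"], ["b1", "b2"], ["c1", "c2"]]
    print(raid10_survives(mirrors, {"a1", "b2", "c1"}))  # True  - one failure per mirror set
    print(raid10_survives(mirrors, {"a1", "a2"}))        # False - a whole mirror set is gone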

RAID 1+0 and RAID 10 are two names for the same RAID level – the first reflects the hybrid nature better, but technically they are the same.

RAID 5+0 / 50

Another hybrid RAID level that works along the same lines as RAID 1+0/10. Here data is striped (RAID 0) across a number of RAID 5 arrays – with the minimum of three disks in each RAID 5 set, a minimum of 6 drives are needed.

The same caveat as with RAID 1+0 is in play here, though with some extra safety: each RAID 5 set can lose one drive and keep running in a degraded state, but if a second drive in the same RAID 5 set fails, the whole array is lost. So with a minimum array of 6 drives (two RAID 5 sets of three drives each), up to 2 drives may fail and the array stays operational, provided the failures are spread across different RAID 5 sets.
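Expressed as a sketch (set and drive names are made up), the rule is that no RAID 5 set may lose more than one drive:

    # A RAID 5+0 array survives as long as no RAID 5 set has lost more than one drive.
    def raid50_survives(raid5_sets, failed_drives):
        return all(sum(d in failed_drives for d in group) <= 1 for group in raid5_sets)

    sets = [["a1", "a2", "a3"], ["b1", "b2", "b3"]]  # two 3-drive RAID 5 sets
    print(raid50_survives(sets, {"a1", "b3"}))  # True  - one failure in each set
    print(raid50_survives(sets, {"a1", "a2"}))  # False - two failures in the same set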

RAID 6+0 / 60

Just like the two prior hybrid RAID levels this is a nested solution – this time data is striped across two or more RAID 6 arrays. As a RAID 6 set needs a minimum of 4 drives, and the array needs at least two such sets, a minimum of 8 drives is needed. Each RAID 6 set can suffer 2 simultaneous drive failures before the array is threatened; a third failure in the same set brings the whole array down. As with the other RAID levels, a resilvering takes a larger toll on the remaining disks than normal operation does, so it is advisable to replace failed drives as soon as possible.

JBOD

JBOD or Just a Bunch Of Disks is technically not a RAID level, at least not according to Patterson et al. It may or may not have redundancy built in. The pro of a JBOD is that you can put drives of varying sizes together and use all the available space. A RAID 5 array of say 5 disks will hold 4 x the storage space of the smallest drive – one “drive” is lost to parity (remember that both RAID 5 and 6 spread parity blocks across the drives). If the array consists of drives of different capacities, the smallest drive dictates the capacity of the array, and any surplus storage space on the larger drives is lost. This is not the case with JBOD.
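A small comparison of the two capacity rules, using made-up drive sizes:

    # Capacity with mixed drive sizes: JBOD uses all the space, RAID 5 is limited
    # by the smallest drive and loses one drive's worth to parity. Example sizes.
    drive_sizes_gb = [500, 1000, 1000, 2000, 2000]

    jbod_capacity = sum(drive_sizes_gb)                               # 6500 GB
    raid5_capacity = min(drive_sizes_gb) * (len(drive_sizes_gb) - 1)  # 2000 GB
    print(jbod_capacity, raid5_capacity)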

Obsolete and rare RAID levels

To come …

Hardware RAID vs Software RAID

To come …