The year was 1987.
The size of a disk drive was measured in megabytes, not gigs or
terabytes. An 80-MB drive was considered a bit of a luxury.
A team of computer scientists at the University of California at
Berkeley had just released a paper suggesting
it might be a good idea to string together a
number of average-size disk drives to derive larger capacities that
would otherwise require high-end and very expensive equipment.
For instance, since there was no such thing as an 800-MB
disk drive, why not stack up ten 80-MB drives in a rack
and design a special disk controller that would make the whole
thing look as if it were a single device with 10 times the capacity?
That, in fact, wasn't rocket science. The technology was
sufficiently advanced to make this happen without too much trouble.
The real problem was the greater probability of hardware failure.
Indeed, every disk drive has a certain probability of failure.
To use simplistic numbers, let's say that a typical disk drive has
a 1% chance of failing within the first year. Now, if you group 10
of these in an array, you have roughly 10 times the odds that one of
them will fail within a year, or about a 10% probability of failure.
To make matters worse, when the array eventually fails, you will have
lost 10 times the amount of data you would have lost with a single drive.
So, the math didn't look good: a disk array would lead to more
frequent data losses, and have greater consequences.
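To see where these numbers come from, here is a quick Python sketch (using the simplistic 1% annual failure figure from above; real failure rates vary by drive model and age) that computes the exact probability of losing at least one member of the array:

    # Probability that at least one drive in an array fails within a year,
    # assuming each drive fails independently with the same probability.
    def array_failure_probability(per_drive_rate, drive_count):
        return 1 - (1 - per_drive_rate) ** drive_count

    print(array_failure_probability(0.01, 1))   # 0.01 for a single drive
    print(array_failure_probability(0.01, 10))  # ~0.096, close to the 10% quoted

The exact figure for ten drives is about 9.6%, which is why "10 times the odds" is a good rule of thumb for small failure rates.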
To get around the problem, these very smart scientists (named David
Patterson, Garth Gibson and Randy Katz) came up with a clever way
to organize data on an array of multiple disks so that it would
always be possible to reconstruct the missing data if one of
the disk members in the array were to fail unexpectedly.
We will see in this tutorial how this works.
When such a failure occurs, the array as a whole would continue
to service the applications as if nothing had happened, and the
faulty disk would be replaced at a convenient time without any data loss.
With this breakthrough, the theoretical concept of grouping
several inexpensive disks in an array to simulate one very
large disk was finally practical.
These gentlemen went ahead and published their proposed strategies,
and coined the term "Redundant Array of Inexpensive Disks," or RAID.
Over the years, as hardware prices continued to fall as they always
do, the term "Inexpensive" in the original acronym became
ill-suited, so the technology was renamed "Redundant Array
of Independent Disks," a more appropriate term.
RAID technology can be implemented either through a software driver
in the operating system or using a separate hardware device.
In the latter case, a RAID controller is used to control and
manage two or more disk drives.
The RAID controller can be a slide-in
card that is inserted into the computer, or it can be embedded into
a separate drive enclosure that functions as an external disk drive.
Either way, the RAID controller functions independently from the
computer system to which it is attached and is configured
through the controller's panel or configuration software.
As far as the computer is concerned, the controller and its array
of multiple disks look like one ordinary disk drive.
In the case of software-based RAID, a software module (a "driver")
is incorporated into the operating system to create a new device
node which looks to the rest of the system as an
ordinary disk drive but which actually corresponds to an array of
two or more physical disks, usually housed within the computer case.
Under this scenario, the RAID is configured and monitored using
operating system commands and utilities.
There are pros and cons to each option.
The nice thing about an external RAID cabinet is that it is OS-independent.
All configuration can usually be done through a control panel on
the cabinet itself, disk drives can usually be "hot-swapped" in and
out even when the array is in use, and the whole thing looks to the
target computer as if it were a single large external disk drive.
The target machine can be a Windows box, a Linux or Mac machine, or
even a legacy Unix server; it doesn't matter because the computer
doesn't need to know anything about the intricacies of the RAID
configuration — it simply gets the end result, a very large
and very robust disk volume.
The drawbacks to this solution include cost (a dedicated cabinet with
its own controller is considerably more expensive than the bare
drives), the introduction of a new potential point of failure (the
controller itself), and the question of finding a replacement if
that controller should ever fail.
Some hardware solutions do not involve an external cabinet, but
rather use an internal RAID controller within the computer,
such as a slide-in card or even a built-in controller on the
mainboard. Typically, the controller will attach to
standard internal disk drives or to external USB, eSATA or SCSI disk
drives, usually without a cabinet.
These solutions are not as OS-independent as an external cabinet
since they usually require some configuration software to be installed in
the operating system to manage the RAID controller.
In addition, these solutions also present a
challenge in case of hardware fault: if the RAID
controller were to fail, would you be able to replace it quickly
and easily? If it is built into the mainboard, would you be able
to find an equivalent replacement mainboard in short order?
Will that board still be manufactured in a couple of years?
On the other hand, software-based RAID solutions
are less expensive (free in the Linux world), do not introduce a
new point of failure, and do not present hardware replacement
issues. The only hardware components involved are the disk
drives themselves, which are usually commodity items readily
available in stores.
The drawback to software-based RAID solutions
is that since no external cabinet is involved, you are
generally limited to the number of drive bays in the computer case
and to the number of disk drive interfaces (IDE, SATA or SCSI
ports) available in the system. Generally, this means you
are looking at a fairly small RAID configuration such as a pair of
mirrored disks. However, if disk mirroring is all you are
after, then a software-based RAID solution is definitely the way to go.
When Patterson, Gibson and Katz wrote their original RAID
specification, they documented 5 different ways of achieving
fault tolerance through redundant organization of data on the
storage disks. These five levels were dubbed RAID 1 to
RAID 5, in progressive order of complexity.
Over time, a few non-standard RAID levels were
added to the list, such as RAID level Zero (RAID 0), RAID 6 and
RAID 1+0, to name a few.
We will cover these briefly, but we will see in this tutorial that
only 2 or 3 of all the possible configurations are
actually used in real-life commercial products,
simply because they offer the greatest benefits.
RAID 0 is one of the non-standard levels just mentioned.
It does not really qualify as a RAID level because it does not
feature any redundancy, which is the "R" in RAID.
RAID 0 consists of stringing together a set of disk drives of
arbitrary capacity to combine their size into a larger, single
volume. This technique is called striping and does not
involve any data redundancy. If any member of the array fails, the
entire array fails.
Graphically, you can think of striping as spreading the contents of a file
across a number of disk drives, like this:
The contents of a file can get scattered across
any of the disks in the array, potentially with fragments on each
disk, like this:
The main benefit of this configuration is that disk access
from the array can be much faster than with a single disk, since
each disk only has to save or retrieve a small portion of the
file. Access is done in parallel across all the devices at once,
thus increasing disk I/O throughput.
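To make the pattern concrete, here is a minimal Python sketch of round-robin striping. It is purely illustrative (a real controller operates on disk blocks, not Python byte strings), and the 4-byte chunk size is an arbitrary choice for the example:

    # A toy model of RAID 0 striping: split the data into fixed-size
    # chunks and deal them out to the member disks in round-robin order.
    def stripe(data, disk_count, chunk_size=4):
        disks = [bytearray() for _ in range(disk_count)]
        for i in range(0, len(data), chunk_size):
            disks[(i // chunk_size) % disk_count].extend(data[i:i + chunk_size])
        return disks

    for n, d in enumerate(stripe(b"The quick brown fox jumps", 4), start=1):
        print(f"disk {n}: {bytes(d)}")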
The drawback, of course, is that if
any drive in the array should fail for any reason, data integrity
will be lost
for the entire array since big chunks of data will be missing.
For this reason, striping is best suited to large read-only
filesystems of replaceable data, where access speed is more
important than data survivability. Suitable applications for disk
striping might include static websites (frequent access
to read-only information), music libraries, and reference databases
of large files such as maps, videos or other images.
The first official RAID level, RAID 1, documents a technique
called disk mirroring.
Using this technique, two disk drives of
equal capacity are used as a twinned pair, and any transaction saved to
the array is written to both disks, thus maintaining two identical copies of the data.
Similarly, whenever the system requests data from storage, it can
come from one disk or the other, at the controller's choice. If
the data occupies more than one disk block, the controller will
usually fetch different parts of the file from the two drives in
parallel, resulting in much faster throughput.
If one disk crashes, the system keeps
humming along as if nothing had happened since all read and write
operations continue to take place on the remaining healthy disk.
As far as applications are concerned, however, there is only one
disk drive in the machine – a very fast one that never fails.
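As a rough illustration of the idea (a sketch, not any particular controller's implementation), here is how a mirrored pair might behave in Python, with reads alternating between the two members and falling back to the survivor when one has failed:

    # A toy model of RAID 1 mirroring: every write goes to both members,
    # and any read can be served by either one.
    class Mirror:
        def __init__(self):
            self.disks = [{}, {}]   # each "disk" maps block number -> data
            self.next_read = 0      # used to alternate reads across members

        def write(self, block, data):
            for disk in self.disks:
                if disk is not None:    # skip a failed member
                    disk[block] = data

        def read(self, block):
            for _ in self.disks:
                disk = self.disks[self.next_read]
                self.next_read = (self.next_read + 1) % len(self.disks)
                if disk is not None:
                    return disk[block]
            raise IOError("both members have failed")

    m = Mirror()
    m.write(0, b"payroll records")
    m.disks[0] = None       # simulate a crash of the first member
    print(m.read(0))        # still served from the surviving disk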
Since drive failures generally go unnoticed in a mirrored environment,
RAID controllers or drivers usually feature monitoring
systems that will immediately alert the system administrator
whenever a member of the array has failed or is starting to act
erratically. On the more sophisticated systems, a
"hot spare" will be included in the array and will automatically
kick in to replace the faulty disk when needed. The faulty
drive can usually be hot-swapped and replaced without interrupting
the operation of the array.
Because disk mirroring greatly reduces the risk of data loss
but does not provide any extra disk space,
it is best suited for small and medium-size
systems where data loss would directly impact revenues or productivity.
Basically, this covers virtually all home and office computers.
If a system is used to create any type of document throughout the
day, be it email,
correspondence, software, music or images, it should use
some form of redundant storage.
Regular system backups are not enough: if a disk crashes in
mid-day, all the work done since the last backup (usually the night
before) has been lost. However, with disk mirroring, the system
continues to work normally during a single-disk crash and the
faulty drive can be replaced later without any loss of data.
In addition to safeguarding the data, disk mirroring also
saves users from the downtime associated with
having to reinstall the operating system and all application
software on a computer after a disk crash. With disk mirroring,
the faulty disk can be replaced when convenient and the RAID
software simply populates the new disk with a mirror image of the
data on the surviving disk, usually while the user continues to work.
Disk mirroring should be given serious consideration for any
mission-critical system. The cost of a second
disk drive is negligible compared to the cost and disruption of
reinstalling and reconfiguring all software on a system, not to
mention the consequences of losing all data that has been entered
since the last backup.
While disk mirroring (RAID 1) gives us a simple, straightforward
way to implement data redundancy, it does nothing to increase the
size of the array beyond the size of a single member.
RAID levels 2 and up address this issue by stringing together a set
of 3 or more disk drives, much like in a RAID 0 configuration,
while also providing fault tolerance in the event that any single
member of the array should fail.
RAID 2 achieves data redundancy by mimicking
the technology used in error-correcting memory (RAM).
This strategy is actually more complex than necessary since an
array of disk drives offers a more beneficial environment than an array of
memory chips when it comes to implementing redundancy.
The developers of RAID knew this, but introduced this level as a
stepping stone towards RAID level 3, which is a more
efficient implementation. No commercial products use RAID 2
so we will skip ahead to a discussion of RAID 3 which
will feature concepts that are critical to your understanding of
fault-tolerant RAID configurations in general.
Once we have covered these important concepts and they are clear in
your mind, we will quickly take one step back to explain
why RAID 2 was abandoned. It will be easier to explain if we
do things in that order.
The real goal of this chapter is to explain how RAID 5 works,
since that is the configuration you will almost always encounter in
real life. However, to get to that point, it is helpful to start
with the simpler RAID 3 configuration.
The critical concept behind all fault-tolerant configurations (RAID
2 and up) is the concept of parity.
To understand parity, let's keep in mind that data on a
disk drive is really nothing more than a long sequence of bits
— ones and zeros. Graphically, we could represent a 4-disk
array like four silos of bits, something like this:
If you were to sample the very first bit of each disk drive, you
would get a "bit stripe," as shown here:
The same could be done for each "row" of bits across all members of
the array, resulting in as many bit stripes as there are bits on each drive.
Now comes the issue of parity. In this context, the term "parity"
simply means whether the sum of all the bits in a stripe is even or
odd. If the sum of all bits in a stripe is even, we say the parity
is zero; if the sum is odd, the parity is one.
For example, let's look at stripe #1 in the above illustration. If
you add up all the ones and zeros in that stripe,
you end up with decimal 3, an
odd number, so the parity for this stripe is 1.
Similarly, the parity for stripe #2 would be zero since the sum of
all these bits is even (well, it's zero).
The sum of all bits in stripe #3 is decimal 4, an even number,
so the parity for this stripe is zero.
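In code, a stripe's parity is simply the exclusive-OR (equivalently, the sum modulo 2) of its bits. Here is a minimal Python sketch, using illustrative bit values chosen to match the totals just described:

    # Parity of a bit stripe: 1 if the count of 1-bits is odd, 0 if even.
    # XOR-ing all the bits together computes exactly this.
    def parity(stripe):
        result = 0
        for bit in stripe:
            result ^= bit
        return result

    print(parity([1, 0, 1, 1]))  # sum is 3, odd  -> parity 1 (like stripe #1)
    print(parity([0, 0, 0, 0]))  # sum is 0, even -> parity 0 (like stripe #2)
    print(parity([1, 1, 1, 1]))  # sum is 4, even -> parity 0 (like stripe #3)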
Now, here is the clever part of the strategy: If we were to store
the parity information on an
extra disk drive in our array, we would have a way to reconstruct
the contents of ANY member of the array if it were to fail.
To illustrate this, let's add a fifth disk drive
to our array and store the
parity bit for each stripe on that drive. We get something like this:
Now, if any of our 4 data drives should fail, we can deduce the
missing value of each stripe by examining the parity for that stripe.
For instance, let's take out the second drive:
Using the parity information, the RAID controller can figure out
what each missing bit from that drive had to be, simply by
comparing the parity of the remaining bits from each stripe with
the parity that had been saved on the extra disk drive.
This is essentially what a RAID 3 configuration is.
Let's do the math together on the first stripe: without disk #2,
we now have 1, 0 and 1 on the remaining data disks,
totalling decimal 2, an even number.
However, the former parity for that stripe was 1, which tells us the
total of all the bits in the original (full) stripe was an odd number.
The only way this can be possible is if the missing bit was a 1,
so the RAID controller
can fake the presence of the missing drive by returning a 1 to the
computer system when it needs this information from the faulty drive.
Similarly, if we add up the remaining numbers from the second
stripe, we get a total of zero. We note that the old parity was
zero, which tells us the missing bit from the second stripe had to
be a zero to retain an even total.
The RAID controller will continue to do this for every stripe
that needs to be reconstructed whenever the computer is requesting
data from the array.
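In code, the reconstruction is as simple as the parity calculation itself: XOR the surviving bits together with the stored parity, and out comes the missing bit. A short Python sketch, using the same values as the worked example above:

    # Reconstruct the missing bit of a stripe by XOR-ing the surviving
    # bits with the parity that was stored for the complete stripe.
    def reconstruct(surviving_bits, stored_parity):
        missing = stored_parity
        for bit in surviving_bits:
            missing ^= bit
        return missing

    # Stripe #1: the survivors are 1, 0 and 1; the stored parity was 1.
    print(reconstruct([1, 0, 1], 1))  # -> 1, the missing bit
    # Stripe #2: the survivors total zero; the stored parity was 0.
    print(reconstruct([0, 0, 0], 0))  # -> 0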
Note that this strategy will work fine regardless of which
drive is taken out of the array, as long as only one drive is
faulty. If two or more drives become faulty, however, there
is no way to reconstruct the missing
information. For this reason, whenever a fault is detected in any
member of the array, that drive should be replaced as soon as possible.
What if the parity drive itself becomes faulty? Well, that's not
a problem as far as our applications are concerned since all
the data is still intact; none of our data storage drives have failed.
However, without a parity drive, we are no longer enjoying any
redundancy in the array, so that drive should also be replaced as
soon as possible.
When a new drive is in place, the RAID controller will repopulate
the new drive with parity information from all the data drives and,
when this process is completed, we will again enjoy the peace of
mind a RAID configuration is designed to provide.
Most commercial RAID systems (either hardware-based or
software-based) will provide for the
presence of a spare drive, spinning and ready to automatically
replace any failing drive in the array.
When the RAID controller detects that a drive is no longer
responding (or is responding erratically), it formally takes it out
of operation and starts reconstructing its data using the method we
just outlined. The data is written to the spare drive which then
assumes the role of the faulty unit.
This is done while the array continues to service read and write
requests from the computer to which it is attached. The only
perceptible difference as far as users and applications are
concerned is that response time may be a little slower than usual
since the array is performing intensive I/O operations while
rebuilding the data on the "hot spare" drive.
Now that we understand how RAID 3 works, we can take a step
back and examine why RAID 2 was abandoned.
RAID 2 works on the same concept as RAID 3 in that it uses
parity information to reconstruct data from a faulty drive. However,
in our study of RAID 3, we have always assumed that we knew which
drive was at fault. This made it fairly straightforward
to deduce the value of the missing bits using the parity information.
However, what if we didn't know which drive was faulty? What if we
simply knew that one bit was missing in each stripe but didn't know
which one? This is basically what RAID 2 was trying to
address, and it did so by adding extra disk drives to store
additional parity information that would allow it to triangulate on
the right bit.
However, since disk controllers can detect read failures,
all these extra measures that would be needed in a memory chip
are not really required in a disk array; the
controller can easily determine which drive is at fault without any
fancy calculations. For this reason, RAID 2 is considered
obsolete and is never used in commercial products.
A RAID 4 configuration is very similar to a RAID 3 configuration,
except that the controller reads and writes data in multi-byte
blocks instead of in single bytes.
For instance, if you were to save a 12-byte piece of information
such as the phrase "Hello world!" to a 5-disk RAID 3 array, the first
byte would be saved to drive 1, the second byte to drive 2, and so
on until all data disks have been used, and then the next byte
would be saved to disk #1 again, in sequence, like this:
Note that each byte really expands to a set of 8 bits which
are tracked with corresponding parity bits on the last disk,
as seen in the previous section on RAID 3. However, for
the sake of clarity, the parity information has been omitted
from this illustration since it is not directly relevant to the
concept presented here.
Now, on a RAID 4 configuration, the controller will
save data in larger blocks on each disk, rather than one byte per
disk. For instance, if the RAID controller is configured to save
data in 16-byte blocks on each disk, the entire phrase "Hello
world!" would have been saved entirely on disk #1 in our example.
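To make the contrast concrete, this short Python sketch lays the same 12-byte phrase out both ways across 4 data disks. The parity disk is omitted here, as in the illustrations, and the 16-byte block size is the one from the example above:

    # Where "Hello world!" lands on 4 data disks under each scheme.
    data = b"Hello world!"
    DISKS = 4
    BLOCK = 16   # RAID 4 block size used in the example

    # RAID 3: one byte per disk, round-robin.
    raid3 = [bytearray() for _ in range(DISKS)]
    for i, byte in enumerate(data):
        raid3[i % DISKS].append(byte)

    # RAID 4: whole blocks per disk; the entire phrase fits on disk #1.
    raid4 = [bytearray() for _ in range(DISKS)]
    for i in range(0, len(data), BLOCK):
        raid4[(i // BLOCK) % DISKS].extend(data[i:i + BLOCK])

    for n in range(DISKS):
        print(f"disk {n + 1}  RAID 3: {bytes(raid3[n])!r}  RAID 4: {bytes(raid4[n])!r}")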
Again, parity is tracked the same way as explained in the section
on RAID 3. The only difference between RAID 3 and 4 is
in how the RAID controller
reads and writes data to the storage disks.
The size of a data block in RAID 4 is arbitrary and can
be adjusted as desired
to tweak performance for a particular application or environment.
While we used a 16-byte block in our example, the norm in a
real-world situation would be to use the size of a disk sector (512
bytes) or a multiple of that size.
So, why bother with this modification? What's the benefit?
The benefit is that with this technique, simultaneous reads
become possible. For instance, if the application is dealing with
relatively small data items such as database records, it will
not be uncommon for one such record to be located entirely on a single
disk rather than spread across all members of the array.
In this case, when the application needs to retrieve 2 or more
records, it can often do so in parallel if those records are
physically stored on different disks.
Under RAID 3, the application would have to retrieve the first
record in its entirety before it could get to the second one, since
all disks would have to be queried in sequence.
This difference in access method is most important in multi-user
environments where response time will be distributed more fairly
across all users using RAID 4 than RAID 3.
There remains one problem with all the fault-tolerant architectures
we have examined so far (RAID 2, 3 and 4): the parity bits are all
located on the same disk drive.
As a result, whenever the application saves data to any of the data
disks, it must also update the parity drive.
That single parity drive therefore becomes the bottleneck of the
array, effectively eliminating the performance advantage of
parallel writes to the data disks.
For this reason, RAID 4 is not used in commercial products. In
fact, none of the fault-tolerant architectures we have seen so far
(RAID 2 to 4) are used in real life. They were described in that
1987 technical paper mentioned in
the introduction simply as logical steps leading to RAID 5,
which is the configuration you will usually find in real-life commercial products.
RAID 5 is essentially the same as RAID 4, except that the
parity information is distributed across the data disks instead of
stored on a separate disk.
At first, you would probably think that something like this
shouldn't work. After all, if the
parity is stored on a data disk and that disk crashes, you have
just lost the information you needed to rebuild that disk!
You would be right about that.
However, if we strategically locate the parity information
so that it is physically staggered across the
array members, then it is possible to recover from any single-drive failure.
To illustrate, let's break down each disk in our 4-disk array into
segments, each consisting of 25% of the disk.
In the above diagram, we have divided the contents of our array
into 4 segments labeled A to D across all disks. However, unlike
what we did in our previous illustration, the parity information
(shown in yellow) has been staggered
across each disk in a different, non-overlapping area.
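The placement rule itself can be sketched in a few lines of Python. The rotation used here is one common convention, chosen to match the staggered layout just described (segment B's parity lands on disk #3, for instance); real controllers may rotate the parity in a different order:

    # RAID 5 parity placement: the parity block for each segment rotates
    # to a different disk, so no single drive holds all of the parity.
    DISKS = 4
    for stripe, label in enumerate("ABCD"):
        parity_disk = DISKS - stripe   # 1-based number of the parity disk
        layout = ["parity" if disk == parity_disk else f"{label}{disk}"
                  for disk in range(1, DISKS + 1)]
        print(f"segment {label}: {layout}")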
Now, what happens if one disk suddenly fails? Let's assume disk #3 fails.
For segments A, C and D, we will simply use the parity information
which is still intact on the other disks to reconstruct the missing
information in the usual manner. As for segment B, we may have
lost the parity information but all the data blocks (B1, B2 and B4)
are still intact, so we don't really need it.
As you can see, in all cases, we can either reconstruct the missing
data portions or we haven't lost them.
When the faulty drive is replaced (or if a hot spare kicks in), the
RAID controller will reconstruct the missing information from
segments A, C and D, and will rebuild the parity information for
segment B. Once the rebuild completes, the array will be whole again.
In brief, RAID 5 gives us the best of all worlds for most applications.
In this tutorial, we have covered the operating concepts of
Redundant Arrays of Independent Disks, and have determined that
while several configurations are possible, only a few are actually
used on most production systems.
A number of non-standard configurations are also available for
specialized use, such as RAID 6 which is similar to
RAID 5 but can survive the failure of
any two disk drives in the array instead of just 1. This is
achieved at the cost of additional parity blocks and extra
processing overhead. Since
the odds of two disk drives failing at the same time in an
array are very slim, the additional performance hit is not usually
justifiable, so this RAID level is rarely used.
There are also hybrid configurations, such as
RAID 0+1 which is basically two striped arrays mirrored together.
However, in most real-life scenarios, the types of disk arrays
you are most likely to encounter on production
systems are disk mirroring and RAID 5.