How RAID Works

Ever wonder how RAID works? This tutorial will show you. RAID, or "Redundant Array of Independent Disks," is not just for department-size servers anymore. In fact, I believe that every office and home computer should use mirrored disks, which is a RAID configuration.

This tutorial explains the differences between striping, disk mirroring and fault-tolerant configurations, and shows how RAID 5 uses parity bits to reconstruct the contents of a failed array member (for the terminally curious). The pros and cons of the various RAID levels are examined, along with the differences between hardware and software RAID solutions.

The Original Idea

The year was 1987. The size of a disk drive was measured in megabytes, not gigs or terabytes. An 80-MB drive was considered a bit of a luxury.

A team of computer scientists at the University of California at Berkeley had just released a paper suggesting it might be a good idea to string together a number of average-size disk drives to achieve larger capacities that would otherwise require high-end and very expensive equipment.

For instance, since there was no such thing as an 800-MB disk drive, why not stack up ten 80-MB drives in a rack and design a special disk controller that would make the whole thing look as if it were a single device with 10 times the capacity?

That, in fact, wasn't rocket science. The technology was sufficiently advanced to make this happen without too much trouble.

The real problem was the greater probability of hardware failure. Every disk drive has a certain probability of failing. To use simplistic numbers, let's say that a typical disk drive has a 1% chance of failing within the first year. Now, if you group 10 of these in an array, you have roughly 10 times the odds that at least one of them will fail within a year, or about a 10% probability of failure (strictly speaking, 1 - 0.99^10 ≈ 9.6%, but the shorthand is close enough).
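
If you want to sanity-check that arithmetic, here is a quick back-of-the-envelope calculation in Python (the 1% annual failure rate is just the illustrative figure from above, not a real drive specification):

    # Probability that at least one drive in an n-disk array fails within
    # a year, assuming independent failures with per-drive probability p.
    def array_failure_odds(p, n):
        return 1 - (1 - p) ** n

    print(array_failure_odds(0.01, 1))    # 0.01   -> 1% for a single drive
    print(array_failure_odds(0.01, 10))   # ~0.096 -> roughly 10% for ten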

To make matters worse, when the array eventually fails, you will have lost 10 times the amount of data you would have lost with a single drive.

So, the math didn't look good: a disk array would lead to more frequent data losses, and have greater consequences.

To get around the problem, these very smart scientists (named David Patterson, Garth Gibson and Randy Katz) came up with a clever way to organize data on an array of multiple disks so that it would always be possible to reconstruct the missing data if one of the disk members in the array were to fail unexpectedly. We will see in this tutorial how this works.

When such a failure occurs, the array as a whole continues to service the applications as if nothing had happened, and the faulty disk can be replaced at a convenient time without any data loss.

With this breakthrough, the theoretical concept of grouping several inexpensive disks in an array to simulate one very large disk was finally practical. These gentlemen went ahead and published their proposed strategies, and coined the term "Redundant Array of Inexpensive Disks," or RAID for short.

Over the years, as hardware prices continued to fall as they always do, the term "Inexpensive" in the original acronym quickly became ill-suited, so the acronym was reinterpreted as "Redundant Array of Independent Disks," a more fitting name for the technology.


Hardware vs Software RAID

Your Options

RAID technology can be implemented either through a software driver in the operating system or using a separate hardware device.

In the latter case, a RAID controller is used to control and manage two or more disk drives. The RAID controller can be a slide-in card that is inserted into the computer, or it can be embedded into a separate drive enclosure that functions as an external disk drive.

Either way, the RAID controller functions independently from the computer system to which it is attached and is configured through the controller's panel or configuration software. As far as the computer is concerned, the controller and its array of multiple disks look like one ordinary disk drive.

In the case of software-based RAID, a software module (a "driver") is incorporated into the operating system to create a new device node. That node looks to the rest of the system like an ordinary disk drive, but it actually corresponds to an array of two or more physical disks, usually housed within the computer cabinet. Under this scenario, the RAID is configured and monitored using operating system commands and utilities.

Pros and Cons

There are pros and cons to each option. The nice thing about an external RAID cabinet is that it is OS-independent. All configuration can usually be done through a control panel on the cabinet itself, disk drives can usually be "hot-swapped" in and out even when the array is in use, and the whole thing looks to the target computer as if it were a single large external disk drive.

The target machine can be a Windows box, a Linux or Mac machine, or even a legacy Unix server; it doesn't matter because the computer doesn't need to know anything about the intricacies of the RAID configuration — it simply gets the end result, a very large and very robust disk volume.

The drawbacks to this solution include:

  • Purchase Cost: The cabinet, controller and disk drives are usually a non-trivial expenditure.
  • Potential Point of Failure: Every time you add another hardware component to a system, it's yet another piece that will eventually fail. By contrast, software RAID does not involve any additional hardware, so it can be considered less risky from a stability point of view.
  • Replacement Options: If the computer system is mission-critical, you have to be sure you can quickly repair or replace the RAID hardware if it were to fail unexpectedly. If a component in that cabinet were to become faulty without warning (e.g. blown power supply, overheating chip on the controller...), would you have the problem fixed within an hour? Could you pick up replacement parts at any local store or would you have to have them shipped? Would these parts even be available when you need them, possibly years from now? Should you purchase a second RAID cabinet just to keep as a spare?

Some hardware solutions do not involve an external cabinet, but rather use an internal RAID controller within the computer, such as a slide-in card or even a built-in controller on the mainboard. Typically, the controller attaches to standard internal disk drives or to external USB, eSATA or SCSI disk drives.

These solutions are not as OS-independent as an external cabinet since they usually require some configuration software to be installed in the operating system to manage the RAID controller. In addition, these solutions also present a challenge in case of hardware fault: if the RAID controller were to fail, would you be able to replace it quickly and easily? If it is built into the mainboard, would you be able to find an equivalent replacement mainboard in short order? Will that board still be manufactured in a couple of years?

On the other hand, software-based RAID solutions are less expensive (free in the Linux world), do not introduce a new point of failure, and do not present hardware replacement issues. The only hardware components involved are the disk drives themselves, which are usually commodity items readily available in stores.

The drawback to software-based RAID solutions is that since no external cabinet is involved, you are generally limited to the number of drive bays in the computer case and to the number of disk drive interfaces (IDE, SATA or SCSI ports) available in the system. Generally, this means you are looking at a fairly small RAID configuration such as a pair of mirrored disks. However, if disk mirroring is all you are after, then a software-based RAID solution is definitely the way to go.


Array Types

Numbered Levels

When Patterson, Gibson and Katz wrote their original RAID specification, they documented 5 different ways of achieving fault tolerance through redundant organization of data on the storage disks. These five levels were dubbed RAID 1 to RAID 5, in progressive order of complexity.

Over time, a few non-standard RAID levels were added to the list, such as RAID level Zero (RAID 0), RAID 6 and RAID 1+0, to name a few. We will cover these briefly, but we will see in this tutorial that only 2 or 3 of all the possible configurations are actually used in real-life commercial products, simply because they offer the greatest benefits.

RAID 0: Striping

RAID 0 is one of the non-standard levels just mentioned. It does not really qualify as a RAID level because it does not feature any redundancy, which is the "R" in RAID.

Still, because it can be implemented through standard RAID controllers or software, this configuration is commonly dubbed a RAID level nonetheless.

RAID 0 consists of stringing together a set of disk drives of arbitrary capacity to combine their size into a larger, single volume. This technique is called striping and does not involve any data redundancy. If any member of the array fails, the entire array fails.

Graphically, you can think of striping as spreading the contents of a file across a number of disk drives: a single file can end up scattered across any of the disks in the array, potentially with fragments on each one.

The main benefit of this configuration is that disk access from the array can be much faster than with a single disk, since each member only has to save or retrieve a small portion of any given file. Access is done in parallel to all devices at the same time, increasing disk I/O throughput significantly.

The drawback, of course, is that if any drive in the array should fail for any reason, data integrity will be lost for the entire array since big chunks of data will be missing.

For this reason, striping is best suited to large read-only filesystems of replaceable data, where access speed is more important than data survivability. Suitable applications for disk striping might include static websites (frequent access to read-only information), music libraries, and reference databases of large files such as maps, videos or other images.
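
To make the mechanics concrete, here is a toy sketch of striping in Python. Short strings stand in for raw data and the 4-byte chunk size is arbitrary; a real controller deals in disk blocks, not characters:

    # Deal fixed-size chunks of a file out to the member disks, round-robin.
    def stripe(data, n_disks, chunk_size=4):
        disks = [[] for _ in range(n_disks)]
        for i in range(0, len(data), chunk_size):
            disks[(i // chunk_size) % n_disks].append(data[i:i + chunk_size])
        return disks

    for number, contents in enumerate(stripe("The quick brown fox jumps over", 3)):
        print(f"disk {number + 1}: {contents}")

Note how no chunk is stored twice: losing any one disk takes a third of every sizeable file with it, which is exactly the fragility described above.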

RAID 1: Mirroring

The first official RAID level, RAID 1, documents a technique called disk mirroring. Using this technique, two disk drives of equal capacity are used as a twinned pair, and any transaction saved to the array is written to both disks, thus maintaining a pair of mirror images.

Similarly, whenever the system requests data from storage, it can come from one disk or the other, at the controller's choice. If the data occupies more than one disk block, the controller will usually fetch different parts of the file from the two drives in parallel, resulting in much faster throughput.

If one disk crashes, the system keeps humming along as if nothing had happened since all read and write operations continue to take place on the remaining healthy disk.

As far as applications are concerned, however, there is only one disk drive in the machine – a very fast one that never crashes.
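
Here is a toy sketch of the idea in Python. Dictionaries stand in for the two physical drives; a real mirror operates on raw disk blocks rather than Python objects:

    class Mirror:
        def __init__(self):
            self.disks = [dict(), dict()]   # two block-number -> data maps

        def write(self, block, data):
            for disk in self.disks:
                if disk is not None:        # skip a failed member
                    disk[block] = data      # same data lands on both drives

        def read(self, block):
            for disk in self.disks:
                if disk is not None:
                    return disk[block]      # any healthy member will do

        def fail(self, index):
            self.disks[index] = None        # simulate a crashed drive

    array = Mirror()
    array.write(0, "payroll data")
    array.fail(0)                           # one member dies...
    print(array.read(0))                    # ...and reads still succeed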

Since drive failures would otherwise go unnoticed in a mirrored environment, RAID controllers or drivers usually feature monitoring systems that immediately alert the system administrator whenever a member of the array has failed or is starting to act erratically. On more sophisticated systems, a "hot spare" will be included in the array and will automatically kick in to replace the faulty disk when needed. The faulty drive can usually be hot-swapped and replaced without interrupting the operation of the array.

Because disk mirroring greatly reduces the risk of data loss but does not provide any extra disk space, it is best suited for small and medium-size systems where data loss would directly impact revenues or productivity. Basically, this covers virtually all home and office computers.

If a system is used to create any type of document throughout the day, be it email, correspondence, software, music or images, it should use some form of redundant storage. Regular system backups are not enough: if a disk crashes in mid-day, all the work done since the last backup (usually the night before) is lost. However, with disk mirroring, the system continues to work normally through a single-disk crash, and the faulty drive can be replaced later without disrupting workflow.

In addition to safeguarding the data, disk mirroring also saves users from the downtime associated with having to reinstall the operating system and all application software on a computer after a disk crash. With disk mirroring, the faulty disk can be replaced when convenient and the RAID software simply populates the new disk with a mirror image of the data on the surviving disk, usually while the user continues to work normally.

Disk mirroring should be given serious consideration for any mission-critical system. The cost of a second disk drive is negligible compared to the cost and disruption of reinstalling and reconfiguring all software on a system, not to mention the consequences of losing all data that has been entered since the last backup.


RAID 2: An Impractical Solution

While disk mirroring (RAID 1) gives us a simple, straightforward way to implement data redundancy, it does nothing to increase the size of the array beyond the size of a single member.

RAID levels 2 and up address this issue by stringing together a set of 3 or more disk drives, much like in a RAID 0 configuration, while also providing fault tolerance in the event that any single member of the array should fail.

RAID 2 achieves data redundancy by mimicking the technology used in error-correcting memory (RAM). This strategy is actually more complex than necessary since an array of disk drives offers a more beneficial environment than an array of memory chips when it comes to implementing redundancy.

The developers of RAID knew this, but introduced this level as a stepping stone towards RAID level 3, which is a more efficient implementation. No commercial products use RAID 2 at all, so we will skip ahead to a discussion of RAID 3 which will feature concepts that are critical to your understanding of fault-tolerant RAID configurations in general.

Once we have covered these important concepts and they are clear in your mind, we will quickly take one step back to explain why RAID 2 was abandoned. It will be easier to explain if we do things in that order.

RAID 3: Byte-Level Parity

Although RAID 3 is not widely used either, for reasons we will uncover shortly, it is important to understand the concepts used in this configuration in order to understand how subsequent RAID levels work.

The real goal of this chapter is to explain how RAID 5 works, since that is the configuration you will almost always encounter in real life. However, to get to that point, it is helpful to start with the simpler RAID 3 configuration.

The critical concept behind all fault-tolerant configurations (RAID 2 and up) is the concept of parity.

To understand parity, let's keep in mind that data on a disk drive is really nothing more than a long sequence of bits — ones and zeros. Graphically, we could represent a 4-disk array as four tall silos of bits.

If you were to sample the very first bit of each disk drive, you would get a "bit stripe" consisting of one bit from each member of the array.

The same could be done for each "row" of bits across all members of the array, resulting in as many bit stripes as there are bits on each disk.

Now comes the issue of parity. In this context, the term "parity" simply means whether the sum of all the bits in a stripe is even or odd. If the sum of all bits in a stripe is even, we say the parity is zero; if the sum is odd, the parity is one.

For example, suppose stripe #1 holds the bits 1, 1, 0 and 1, reading across disks 1 to 4. If you add up all the ones and zeros in that stripe, you end up with decimal 3, an odd number, so the parity for this stripe is 1. Similarly, if stripe #2 holds all zeros, its parity is zero since the sum of all these bits is even (well, it's zero). And if stripe #3 holds a 1 on every disk, the sum of its bits is decimal 4, an even number, so the parity for this stripe is zero.

Now, here is the clever part of the strategy: If we were to store the parity information on an extra disk drive in our array, we would have a way to reconstruct the contents of ANY member of the array if it were to fail.

To illustrate this, let's add a fifth disk drive to our array and store the parity bit for each stripe on that drive.
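
In place of a diagram, here is a minimal sketch in Python of the resulting layout, using the three example stripes from above:

    # Four data disks, shown stripe by stripe; the fifth (parity) drive
    # stores each stripe's even/odd indicator.
    stripes = [
        [1, 1, 0, 1],   # stripe 1: sum 3, odd  -> parity 1
        [0, 0, 0, 0],   # stripe 2: sum 0, even -> parity 0
        [1, 1, 1, 1],   # stripe 3: sum 4, even -> parity 0
    ]
    parity_drive = [sum(stripe) % 2 for stripe in stripes]
    print(parity_drive)   # [1, 0, 0]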

Now, if any of our 4 data drives should fail, we can deduce the missing value of each stripe by examining the parity for that stripe.

For instance, let's take out the second drive.

Using the parity information, the RAID controller can figure out what each missing bit from that drive had to be, simply by comparing the parity of the remaining bits from each stripe with the parity that had been saved on the extra disk drive.

This is essentially what a RAID 3 configuration is.

Let's do the math together on the first stripe: without disk #2, we now have 1, 0 and 1 on the remaining data disks, totalling decimal 2, an even number. However, the former parity for that stripe was 1, which tells us the total of all the bits in the original (full) stripe was an odd number. The only way this can be possible is if the missing bit was a 1, so the RAID controller can fake the presence of the missing drive by returning a 1 to the computer system when it needs this information from the faulty drive.

Similarly, if we add up the remaining numbers from the second stripe, we get a total of zero. We note that the old parity was zero, which tells us the missing bit from the second stripe had to be a zero to retain an even total.
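
Here is the controller's deduction in code form, a minimal sketch continuing the example above (the modulo-2 sum is exactly this even/odd bookkeeping):

    stripes = [[1, 1, 0, 1], [0, 0, 0, 0], [1, 1, 1, 1]]
    parity_drive = [sum(s) % 2 for s in stripes]

    failed = 1   # disk #2 (zero-based index) has crashed
    for stripe, parity in zip(stripes, parity_drive):
        survivors = [bit for i, bit in enumerate(stripe) if i != failed]
        missing = (sum(survivors) + parity) % 2   # the only value that fits
        assert missing == stripe[failed]
    print("every missing bit reconstructed correctly")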

The RAID controller will continue to do this for every stripe that needs to be reconstructed whenever the computer is requesting data from the array.

Note that this strategy will work fine regardless of which drive is taken out of the array, as long as only one drive is faulty. If two or more drives become faulty, however, there is no way to reconstruct the missing information. For this reason, whenever a fault is detected in any member of the array, that drive should be replaced as soon as possible.
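
To see why a second failure is fatal, consider what the parity can and cannot tell us once two bits are missing from a stripe:

    # With two drives gone, parity only reveals the XOR of the two lost bits:
    # the pairs (0, 0) and (1, 1) leave identical evidence, as do (0, 1) and
    # (1, 0), so the missing data cannot be uniquely reconstructed.
    stripe = [1, 1, 0, 1]
    parity = sum(stripe) % 2
    survivors = stripe[2:]                    # disks 1 and 2 both failed
    xor_of_missing = (sum(survivors) + parity) % 2
    print(xor_of_missing)                     # 0: the lost pair is (0,0) or (1,1)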

What if the parity drive itself becomes faulty? Well, that's not a problem as far as our applications are concerned since all the data is still intact; none of our data storage drives have failed. However, without a parity drive, we are no longer enjoying any redundancy in the array, so that drive should also be replaced as soon as possible.

When a new drive is in place, the RAID controller will repopulate the new drive with parity information from all the data drives and, when this process is completed, we will again enjoy the peace of mind a RAID configuration is designed to provide.

The Hot Spare

Most commercial RAID systems (either hardware-based or software-based) will provide for the presence of a spare drive, spinning and ready to automatically replace any failing drive in the array.

When the RAID controller detects that a drive is no longer responding (or is responding erratically), it formally takes it out of operation and starts reconstructing its data using the method we just outlined. The data is written to the spare drive which then assumes the role of the faulty unit.

This is done while the array continues to service read and write requests from the computer to which it is attached. The only perceptible difference as far as users and applications are concerned is that response time may be a little slower than usual since the array is performing intensive I/O operations while rebuilding the data on the "hot spare" drive.

Now, Back to RAID 2

Now that we understand how RAID 3 works, we can take a step back and examine why RAID 2 was abandoned.

RAID 2 works on the same concept as RAID 3 in that it uses byte-level parity to reconstruct data from a faulty drive. However, in our study of RAID 3, we have always assumed that we knew which drive was at fault. This made it fairly straightforward to deduce the value of the missing bits using the parity information.

However, what if we didn't know which drive was faulty? What if we simply knew that one bit was missing in each stripe but didn't know which one? This is basically the problem RAID 2 was trying to address, and it did so by adding extra disk drives to store additional parity information that would allow it to triangulate on the faulty bit.

However, since disk controllers can detect read failures, all these extra measures that would be needed in a memory chip are not really required in a disk array; the controller can easily determine which drive is at fault without any fancy calculations. For this reason, RAID 2 is considered obsolete and is never used in commercial products.


RAID 4: Block-Level Parity

A RAID 4 configuration is very similar to a RAID 3 configuration, except that the controller reads and writes data in multi-byte blocks instead of in single bytes.

For instance, if you were to save a 12-byte piece of information such as the phrase "Hello world!" to a 5-disk RAID 3 array, the first byte would be saved to drive 1, the second byte to drive 2, and so on until all data disks have been used; the next byte would then be saved to disk #1 again, continuing in sequence.

Note that each byte really expands to a set of 8 bits, which are tracked with corresponding parity bits on the last disk, as seen in the previous section on RAID 3. For the sake of clarity, the parity information is omitted from the examples in this section since it is not directly relevant to the concept presented here.

Now, on a RAID 4 configuration, the controller will save data in larger blocks on each disk, rather than one byte per disk. For instance, if the RAID controller is configured to save data in 16-byte blocks on each disk, the entire phrase "Hello world!" would have been saved entirely on disk #1 in our example.
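
A toy side-by-side sketch of the two layouts in Python, using the same 12-byte phrase (parity omitted, as noted above):

    data = b"Hello world!"
    n_data_disks = 4

    # RAID 3 style: one byte per disk, dealt out round-robin.
    byte_level = [data[d::n_data_disks] for d in range(n_data_disks)]
    print(byte_level)     # [b'Hor', b'e l', b'lwd', b'lo!']

    # RAID 4 style with 16-byte blocks: the whole phrase fits in a single
    # block, so it lands entirely on disk #1.
    blocks = [data[i:i + 16] for i in range(0, len(data), 16)]
    block_level = [blocks[d] if d < len(blocks) else b"" for d in range(n_data_disks)]
    print(block_level)    # [b'Hello world!', b'', b'', b'']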

Again, parity is tracked the same way as explained in the section on RAID 3. The only difference between RAID 3 and 4 is in how the RAID controller reads and writes data to the storage disks.

The size of a data block in RAID 4 is arbitrary and can be adjusted as desired to tweak performance for a particular application or environment. While we used a 16-byte block in our example, the norm in a real-world situation would be to use the size of a disk sector (512 bytes) or a multiple of that size.

So, why bother with this modification? What's the benefit?

The benefit is that with this technique, simultaneous reads become possible. For instance, if the application is dealing with relatively small data items such as database records, it will not be uncommon for one such record to be located entirely on a single disk rather than spread across all members of the array. In this case, when the application needs to retrieve 2 or more records, it can often do so in parallel if those records are physically stored on different disks.

Under RAID 3, the application would have to retrieve the first record in its entirety before it could get to the second one, since all disks would have to be queried in sequence.

This difference in access method matters most in multi-user environments, where response time will be distributed more fairly across all users under RAID 4 than under RAID 3.

The Problem With All This

There remains one problem with all the fault-tolerant architectures we have examined so far (RAID 2, 3 and 4): the parity bits are all located on the same disk drive.

As a result, whenever the application saves data to any of the data disks, it must also update the parity drive. That single drive therefore becomes the bottleneck in the array, effectively eliminating the performance advantage of parallel writes to the data disks.
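
A quick count makes the bottleneck visible. Suppose one block is written to each data disk of a 5-disk RAID 4 array (four data disks plus a dedicated parity disk):

    from collections import Counter

    touches = Counter()
    for data_disk in range(4):       # one small write to each data disk
        touches[data_disk] += 1      # the data block itself...
        touches["parity"] += 1       # ...plus a parity update, every time
    print(touches)                   # the parity disk is hit on all 4 writes

The four data writes could proceed in parallel, but all four must queue up behind the same parity disk.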

For this reason, RAID 4 is not used in commercial products. In fact, none of the fault-tolerant architectures we have seen so far (RAID 2 to 4) are used in real life. They were described in that 1987 technical paper mentioned in the introduction simply as logical steps leading to RAID 5, which is the configuration you will usually find in real-life commercial products.

RAID 5: Distributed Parity

RAID 5 is essentially the same as RAID 4, except that the parity information is distributed across the data disks instead of stored on a separate disk.

At first, you would probably think that something like this shouldn't work. After all, if the parity is stored on a data disk and that disk crashes, you have just lost the information you needed to rebuild that disk!

You would be right about that.

However, if we strategically locate the parity information so that it is physically staggered across the array members, then it is possible to recover from any disk failure.

To illustrate, let's break down each disk in our 4-disk array into segments, each consisting of 25% of the disk.

This divides the contents of the array into 4 segments, labeled A to D, on each disk. Unlike our previous layouts, however, each segment's parity information is staggered onto a different disk, in non-overlapping areas.
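
Here is a sketch of one such placement (the exact rotation order varies between implementations; this one matches the failure scenario described below):

    # Stagger the parity across a 4-disk array: each of the four segments
    # A to D stores its parity block on a different member disk.
    n_disks = 4
    for s in range(n_disks):
        parity_disk = n_disks - 1 - s          # one common rotation pattern
        row = ["parity" if d == parity_disk else f"{chr(65 + s)}{d + 1}"
               for d in range(n_disks)]
        print(f"segment {chr(65 + s)}: {row}")

Running this prints a grid in which segment B's parity sits on disk #3, while its data blocks B1, B2 and B4 sit on the other three members.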

Now, what happens if one disk suddenly fails? Let's assume disk #3 crashes unexpectedly.

For segments A, C and D, we will simply use the parity information, which is still intact on the other disks, to reconstruct the missing information in the usual manner. As for segment B, we may have lost its parity information, but all of its data blocks (B1, B2 and B4) are still intact, so we don't really need it. As you can see, in all cases, the missing pieces can either be reconstructed or were never lost.

When the faulty drive is replaced (or if a hot spare kicks in), the RAID controller will reconstruct the missing information from segments A, C and D, and will rebuild the parity information for segment B. Within minutes, the array will be whole again.

Bottom Line for RAID 5

In brief, RAID 5 gives us the best of all worlds for most applications:

  • Parity information is distributed across all disks in the array rather than stored on a single disk, so there is no bottleneck.
  • The fact that multiple disks can be accessed in parallel gives us better performance than if we were using a single disk drive of the same capacity as the array.
  • Using an array instead of a single disk makes it possible to achieve cumulative capacities not yet technologically available from single devices.
  • And of course, data redundancy means that our users and applications will continue to work uninterrupted even in case of hardware failure on any one of the disks in the array.


Conclusion

In this tutorial, we have covered the operating concepts of Redundant Arrays of Independent Disks, and have determined that while several configurations are possible, only a few are actually used on most production systems. They are:

  • RAID 0, also known as striping, which combines the sizes of multiple disks to form a single, large volume. Since no data redundancy is provided with this configuration, it is best used for read-only data sets where the highest level of performance is desired and where data survivability is not an issue.
  • RAID 1, also known as disk mirroring, is best used on smaller systems where large disk capacities are not required but where I/O throughput and data survivability are most critical.
  • RAID 5, a fault-tolerant striping array, is best suited where large amounts of disk space are required but data integrity is also critical.

A number of non-standard configurations are also available for specialized use, such as RAID 6, which is similar to RAID 5 but can survive the failure of any two disk drives in the array instead of just one. This is achieved at the cost of additional parity blocks and extra processing overhead. Since the odds of two disk drives failing at the same time in an array are very slim, the additional performance hit is not usually justifiable, so this RAID level is rarely used.

There are also hybrid configurations, such as RAID 0+1, which is basically two striped arrays mirrored together.

However, in most real-life scenarios, the types of disk arrays you are most likely to encounter on production systems are disk mirroring and RAID 5.
