Raid Overview

 
 

"A Global Value-Added Distributor of new, used and refurbished computer periphery by a family-owned and operated company since 1979"

ATTO Storage & 9 to 5 Computer distributors of data storage drives,
tape libraries, storage media and archival solutions.
 
 
RAID Overview:
Identifying What RAID Levels
Best Meet Customer Needs
Diamond Series RAID Storage Array

 

I. Introduction
RAID is an acronym for “Redundant Array of Independent (or Inexpensive) Disks”. It refers to a set of methods and algorithms for combining multiple disk drives into a group whose attributes are better than those of the individual disk drives. RAID can be used to improve data integrity (reducing the risk of losing data due to a defective or failing disk drive), cost, or performance. The different RAID implementations available today offer different tradeoffs among these three factors.
 

II. Origins of RAID
The concept of RAID was first defined in 1988, when a group of computer scientists at the University of California, Berkeley (David Patterson, Garth Gibson, and Randy Katz) published a paper entitled “A Case for Redundant Arrays of Inexpensive Disks (RAID).”
The group observed that CPU speed and memory size were growing exponentially, while I/O performance was increasing at a much slower rate. Unless I/O performance could be significantly improved, computer systems would not be able to take full advantage of the rapidly increasing CPU and memory performance.
At the time, hard drive manufacturers addressed this issue by designing and building Single Large Expensive Disks (SLEDs). While the storage capacities of these disk drives were sufficient for the times, I/O performance was still not keeping up, because the mechanical components of a hard drive are inherently much slower than electronic circuitry.
To overcome these limitations, the UC Berkeley scientists proposed that, instead of storing all data on one disk drive (with only one spindle), several small inexpensive disks (with many spindles) be combined and the data striped (split across multiple drives) so that reads and writes could be done in parallel. To simplify I/O management, a dedicated controller would handle the striping and present the multiple drives to the host computer as one large logical drive. They estimated that the performance improvement would be an order of magnitude over SLEDs.
The problem with this approach was that the small inexpensive PC disk drives of the time were less reliable than the SLEDs. A side effect of striping data over multiple drives is that if one drive fails, the data on the other drives is rendered unusable. It would be analogous to deleting every 3rd or 4th sentence from a book and then not knowing what sequence the sentences were written in. To compound the problem, combining several drives dramatically increases the probability that at least one of them fails.
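The effect of drive count on failure probability can be illustrated with a short calculation. This is a minimal sketch, assuming drives fail independently and share the same failure probability over some period; the 3% figure is a hypothetical value chosen for illustration, not a number from the Berkeley paper.

```python
def array_failure_probability(p_single: float, n_drives: int) -> float:
    """Probability that at least one of n independent drives fails.

    p_single is the (assumed) probability that any one drive fails
    during the period of interest.
    """
    return 1.0 - (1.0 - p_single) ** n_drives


if __name__ == "__main__":
    # Hypothetical 3% per-drive failure probability.
    for n in (1, 2, 4, 8):
        print(n, "drives:", round(array_failure_probability(0.03, n), 4))
```

In a striped group without redundancy, the loss of any one drive loses the whole group, so this is also the probability of losing all data under the stated assumptions.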
To overcome this pitfall, the scientists proposed adding extra drives to the RAID group to store redundant information. The idea was that if one drive failed, another drive within the group would contain the information needed to regenerate what was lost. Since all the information was still available, the end user would not experience down time, and the rebuild could be done in the background. If users requested information that had not yet been rebuilt, the data could be reconstructed on the fly, again without the end user noticing.
 

III. Original RAID Levels
The group outlined five RAID architectures (levels), ranging from “Level 1 RAID” to “Level 5 RAID”; “Level 0” (striping without redundancy) was added to the naming later. These levels provided alternative ways of achieving storage fault tolerance, increased I/O performance and true scalability.
They used three main building blocks in their architectures:
 

1. Data Striping. Data from the host computer is broken up into smaller chunks and distributed to multiple drives within a RAID array. Each drive’s storage space is partitioned into stripes. The stripes are interleaved such that the logical storage unit is made up of alternating stripes from each drive. Major benefits are I/O performance and the ability to create large logical volumes. Used in RAID 0.
2. Mirroring. Data from the host computer is duplicated on a block-to-block basis across two disks. If one disk drive fails, the data remains available on the other disk. Used in RAID levels 1 and 1+0.
3. Parity. Data from the host computer is written to multiple drives. One or more drives are assigned to store parity information. In the event of a disk failure, parity information is combined with the remaining data to regenerate the missing information. Used in RAID levels 3, 4 and 5.
RAID 0 plus the five original RAID levels developed by the Berkeley scientists (along with the RAID group’s performance, reliability and cost assumptions) are listed as follows:
RAID Level 0
• Striped Disks.
o Performance: Very Good
o Reliability: Poor. Less than a single disk drive (non-redundant)
o Cost: Low
RAID Level 1
• Mirrored Disks.
o Performance: Slightly better than a single drive
o Reliability: Excellent
o Cost: High (must purchase 2X disks)
RAID Level 2
• Uses bit interleaving and ECC. (ECC is now built in to most modern disk drives.)
o Performance: Same as RAID level 1 for large I/Os. Very poor for small I/Os, because all disks must be read and there is no parallelism.
o Reliability: Good
o Cost: Better than mirroring, with 20% to 40% cost overhead
RAID Level 3
• Uses byte interleaving with parity instead of ECC. Parity data is stored on a dedicated drive.
o Performance: Same as RAID level 0 for reads; writes are slightly slower.
o Reliability: Good
o Cost: 1 additional disk per RAID group
RAID Level 4
• Same as RAID Level 3, but uses sector interleaving instead of byte interleaving.
o Performance: Same as RAID level 0 for reads; writes are slightly slower.
o Reliability: Good
o Cost: 1 additional disk per RAID group
RAID Level 5
• Same as RAID Level 4, but distributes stripe parity across all disks.
o Performance: Writes are slightly faster than RAID 3 and 4, but reads tend to be considerably slower.
o Reliability: Good
o Cost: 1 additional disk per RAID group
 

IV. RAID Today
The most widely used RAID levels today (as of 2002) are:
RAID 0: Striping
RAID 1: Mirroring
RAID 10: Mirroring and Striping
RAID (0+1): Striping and Mirroring
RAID 3: Striping with Dedicated Parity Disk
RAID 5: Striping with Distributed Parity
As noted earlier, the different RAID levels offer a variety of performance, data availability, and data integrity depending on the specific I/O environment. However, it is important to remember:
• RAID levels are not progressive - In other words, increasing the RAID level from 0 to 1 to 2 to 3 etc. does not give progressively better data integrity, performance, or cost. Each RAID level is independent and the numbering is arbitrary (use of the term RAID level creates some confusion).
• Not all RAID levels are redundant - RAID 0 provides no data redundancy. In fact, it is more prone to data loss than individual disk drives, because if any drive fails in a RAID 0 group, all data is lost.
• There are no standards for RAID - Each vendor has its own implementations, and may use different terminology. Some vendors have invented their own RAID terminology (e.g., EMC’s RAID-S and Storage Computer’s RAID 7). Vendors who claim to implement RAID 3 are actually implementing a modified RAID 3. Combinations such as 10, 0+1, and 53 are all vendor defined. Storage users must closely examine RAID implementations.
• RAID can be implemented in various places in the computer system - The storage devices, the Host Bus Adapter (HBA) and the host operating system (e.g., Windows 2000) can all implement RAID. It is possible to use a combination of these, for example, RAID 0 (striping) in the storage array combined with RAID 1 (mirroring) in the operating system. Each location has benefits and shortcomings, which need to be understood by computer system architects.
• Physical vs. Logical drive numbering - Physical numbering refers to the physical components in the storage array. Logical (or virtual) numbering refers to the “disks” or “volumes” that the host operating system “sees” in the storage device. The distinction between the two can be very confusing to new storage users.
• Logical disks do not always map 1-to-1 with physical disks - In RAID, several physical disk drives (or portions of several physical drives) can be grouped into a logical disk or Logical Unit (LUN). Each LUN can be broken into logical blocks of 512 bytes each, numbered 0 through “n” (the Logical Block Number or LBN). For example, a 100GB LUN has approx. 200,000,000 logical blocks.
• Logical volumes are very similar to logical drives - A logical volume is composed of one or several logical drives. The member logical drives can be the same RAID level or different RAID levels. The logical volume can be divided into partitions. During operation, the host sees a non-partitioned logical volume, or a partition of a partitioned logical volume, as one single physical drive.
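The logical block arithmetic mentioned above (512-byte blocks, roughly 200,000,000 blocks in a 100GB LUN) can be checked with a short calculation. This is an illustrative sketch; the function name is hypothetical, and whether “GB” means 10^9 or 2^30 bytes is an assumption that the text does not settle, so both are shown.

```python
BLOCK_SIZE = 512  # bytes per logical block, as stated in the text


def logical_block_count(lun_bytes: int, block_size: int = BLOCK_SIZE) -> int:
    """Number of whole logical blocks in a LUN of the given size."""
    return lun_bytes // block_size


if __name__ == "__main__":
    decimal_100gb = 100 * 10**9   # assuming 1 GB = 10^9 bytes
    binary_100gb = 100 * 2**30    # assuming 1 GB = 2^30 bytes
    print(logical_block_count(decimal_100gb))
    print(logical_block_count(binary_100gb))
```

Under either interpretation the count is close to the 200,000,000 figure quoted, and the blocks are numbered from 0 to the count minus one.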
 

V. RAID Summary
RAID can be a powerful tool in a storage environment. Using a RAID storage subsystem has the following advantages:
• Provides fault-tolerance by mirroring or parity operation.
• Increases disk access speed by breaking data into several blocks when reading/writing to several drives in parallel.
• Simplifies management by weaving multiple drives together to form a large volume or groups of volumes.
Today, the major RAID levels available offer the following characteristics:
A. RAID 0: Striped Disk Array
RAID 0 is not a fault tolerant RAID solution: if one drive fails, all data within the entire array is lost. It is used where raw speed is the only (or major) objective. It provides the highest storage efficiency of all array types.
Pros
• Improved I/O performance
• Most capacity-efficient RAID level
• Ability to create large logical volumes.
Cons
• RAID 0 does not utilize disk space for redundancy.
• If one disk fails, all data within the stripe set is lost.
Configuration
• RAID 0 arrays are made by grouping two or more physical disks together to create a virtual disk and making this virtual disk appear as one physical disk to the host. Each physical drive’s storage space is partitioned into stripes. The stripes are then interleaved so that the virtual disk is made up of alternating stripes from each drive.
• To increase performance, RAID 0 writes block level data across all available stripes in the RAID 0 group, enabling parallel disk I/O, which optimizes I/O performance.
• Ideally, the size of the stripe is large enough to fit one record. The record is then broken into smaller pieces and evenly distributed across all drives in the stripe group.
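The interleaving of stripes described in the configuration notes above can be sketched as a mapping from a logical block number on the virtual disk to a physical drive and a block on that drive. This is a simplified illustration, assuming a plain round-robin layout; the function and parameter names are hypothetical, and real controllers may lay out data differently.

```python
from typing import Tuple


def locate_block(logical_block: int, n_drives: int,
                 blocks_per_stripe: int) -> Tuple[int, int]:
    """Map a logical block to (drive index, block number on that drive).

    Assumes stripes are handed out to drives in round-robin order:
    stripe 0 on drive 0, stripe 1 on drive 1, and so on.
    """
    stripe_index, offset_in_stripe = divmod(logical_block, blocks_per_stripe)
    drive = stripe_index % n_drives            # which drive holds this stripe
    stripe_on_drive = stripe_index // n_drives  # how many stripes precede it there
    return drive, stripe_on_drive * blocks_per_stripe + offset_in_stripe


if __name__ == "__main__":
    # Four drives, eight blocks per stripe (both values are arbitrary).
    for block in (0, 7, 8, 31, 32, 35):
        print(block, "->", locate_block(block, n_drives=4, blocks_per_stripe=8))
```

Consecutive stripes land on different drives, which is what allows a large request to be serviced by several drives in parallel.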
Uses of RAID 0: RAID 0 should be used with applications that require the highest level of performance and use non-critical or temporary data, such as:
• Full motion video editing applications
• Prepress editing applications
• Scratch files for CAD
• Any application where the original content is backed up and can be easily restored. The time saved in doing normal data processing work with RAID 0 more than makes up for the time lost in infrequent disk crash events.
 

B. RAID 1: Mirrored Disk Array
RAID 1 provides complete protection and is used in applications containing mission critical data. It uses paired disks, where one physical disk is partnered with a second physical disk. Each physical disk contains exactly the same data, and together they form a single virtual drive.
Complete data protection is achieved by simultaneously writing an exact, block level copy of the data to each disk in a mirrored pair. There is no striping. Read performance is improved because both disks can service reads at the same time. Write performance is the same as for single disk storage. RAID 1 provides the best performance and the best fault-tolerance in a multi-user system.
With RAID 1, the host will see what it believes to be a single physical disk of a specific size. (The host does not know or care about the mirrored pair.) The RAID controller manages where the data is written and read. This allows one disk to fail without the host ever knowing, providing time for service personnel to replace the failed drive and initiate a rebuild.
Pros:
• Highest level of protection
o Mirroring provides 100% duplication of data.
• Read performance is faster than a single disk (if the array controller is capable of performing simultaneous reads from both devices of a mirrored pair)
• Delivers the best performance of any redundant array type during a rebuild.
o No re-construction of data is needed. If a disk fails, copying on a block by block basis to a new disk is all that is required.
o No performance hit when a disk fails; storage appears to function normally to the outside world.
• The only choice for fault tolerance if only two drives are used.
Cons:
• RAID 1 writes the information twice. Because of this, there is a minor performance penalty when compared to writing to a single disk.
• I/O performance in a mixed read-write environment is essentially no better than the performance of a single disk storage system.
• Requires two disks for 100% redundancy, doubling the cost.
Uses of RAID 1:
RAID 1 provides the most complete protection; however, it also requires duplication of physical disks. In the past, this RAID level was used exclusively in smaller mission critical networks to keep costs down. As the cost of storage arrays declines, many system architects are reconsidering the use of RAID 1 in larger applications. Typically, these applications involve mostly read-only operations or light read-write operations. An example of a typical RAID 1 implementation is a data entry network. It is recommended for applications where:
• Data availability is very important
• Speed of read access is very important
• Read activity is heavy
• Logging or record keeping is needed
C. RAID 10: Mirroring and Striping
RAID 10 consists of multiple sets of mirrored drives. These mirrored drives are then striped together to create the final virtual drive. The result is an extremely scalable mirror array, capable of performing reads and writes significantly faster (since the disk operations are spread over more drive heads).
Pros:
• Very high reliability
o Because there are multiple mirror sets, this configuration can survive multiple disk failures (*with one exception).
• Provides highest performance with data protection
• By striping multiple mirror sets, RAID 10 can create larger virtual drives. The host computer will see what it believes to be a single physical disk of a specific size.
• Can be tuned for either a request-rate intensive or transfer-rate intensive environment
*The exception is the failure of both disks within the same mirror set, which is extremely rare.
Cons:
• Like RAID 1, RAID 10 writes the information twice. Because of this, there is a minor performance penalty when compared to writing to a single disk.
• I/O performance in a mixed read-write environment is essentially no better than the performance of a single disk storage system.
• Requires an additional disk to make up each mirror set.
Uses of RAID 10:
Applications where high performance and reliability are paramount are ideal for RAID 10. Examples include on-line transaction processing and financial transaction processing environments. It is recommended for applications where:
• Data availability is critically important
• Overall performance is very important
D. RAID (0+1): Striping and Mirroring
Not to be confused with RAID 10 (they are very different). RAID 0+1 flips the order of RAID 10. Drives are first striped, then these striped sets are mirrored. Typically, two or more disks are striped to create one segment and an equal number of drives are striped to form an additional segment. These two striped segments are then mirrored to create the final virtual drive.
Pros
• High I/O performance
• Ability to create large logical volumes
Cons
• Reliability is less than RAID 1 and 10. If one disk fails, you essentially have a RAID 0 configuration. Because multiple disks make up each RAID 0 segment, the probability of a disk failure is greater.
• Requires duplicate drives. Usable capacity is half of the total physical capacity.
Uses of RAID 0+1:
Applications that require high performance, but are not overly concerned with achieving maximum reliability.
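The reliability difference between RAID 10 and RAID 0+1 can be illustrated by enumerating every possible pair of failed disks and checking whether the array survives. This is a simplified model with stated assumptions: in RAID 10, disks (0,1), (2,3), ... form mirror pairs; in RAID 0+1, the first half of the disks forms one striped segment and the second half the other, and any failed disk takes its whole segment offline. The function names and disk numbering are hypothetical.

```python
from itertools import combinations


def raid10_survives(failed: set, n_disks: int) -> bool:
    """RAID 10 model: data is lost only if both disks of one mirror pair fail."""
    return not any({d, d + 1} <= failed for d in range(0, n_disks, 2))


def raid01_survives(failed: set, n_disks: int) -> bool:
    """RAID 0+1 model: data survives if at least one striped segment is intact."""
    half = n_disks // 2
    segment_a_intact = all(d >= half for d in failed)
    segment_b_intact = all(d < half for d in failed)
    return segment_a_intact or segment_b_intact


def surviving_fraction(survives, n_disks: int, n_failures: int = 2) -> float:
    """Fraction of all n_failures-disk failure combinations the array survives."""
    cases = list(combinations(range(n_disks), n_failures))
    ok = sum(1 for case in cases if survives(set(case), n_disks))
    return ok / len(cases)


if __name__ == "__main__":
    for n in (4, 6, 8):
        print(n, "disks:",
              "RAID 10", round(surviving_fraction(raid10_survives, n), 3),
              "RAID 0+1", round(surviving_fraction(raid01_survives, n), 3))
```

Under this model, a second failure is fatal to RAID 10 only when it hits the partner of the disk that already failed, while for RAID 0+1 it is fatal whenever it hits any disk in the other segment.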
E. RAID 3: Striping with Dedicated Parity Disk
RAID 3 is a fault tolerant version of RAID 0 (Striping). Fault tolerance is achieved by adding an extra disk to the array and dedicating it to storing parity information. Parity information is generated and written during write operations and checked on reads. It requires a minimum of three drives and provides data protection.
 

In the event of a disk failure, data recovery is accomplished by calculating the exclusive OR (XOR) of the information recorded on the other drives. Since an I/O operation addresses all drives at the same time, RAID 3 cannot overlap I/O. For this reason, RAID 3 is best for single-user systems with long record applications.
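The XOR recovery described above can be demonstrated in a few lines. This is a minimal sketch, assuming equal-sized blocks, one block per data drive, and a single parity block; the helper name and the sample data are hypothetical.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


if __name__ == "__main__":
    # Three data drives, each holding one 4-byte block (sample data).
    data = [b"RAID", b"demo", b"data"]
    parity = xor_blocks(data)  # what the dedicated parity drive would store

    # Simulate losing drive 1: XOR the surviving data blocks with the parity.
    survivors = [data[0], data[2], parity]
    rebuilt = xor_blocks(survivors)
    print(rebuilt == data[1])
```

The same operation serves both purposes: computing parity on write and regenerating a missing block from the survivors after a failure.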
Pros
• Good data protection
• Good write performance
• Good read performance
• The amount of usable space is the number of physical drives in the array minus 1.
Cons
• A single disk failure reduces the array to RAID 0
• Performance is impacted when degraded
• Poor performance with small data transfers.
• Limited to single user environments.
Uses of RAID 3:
This version of RAID is best suited for:
• Single user, single tasking environments with large data transfers.
• Heavy write applications.
• Storage of large volumes of data.
F. RAID 5: Striping and Parity
RAID 5 is similar to RAID 3, but the parity is not stored on one dedicated drive. Instead, parity information is interspersed across the drive array. RAID 5 requires a minimum of 3 drives. One drive can fail without affecting the availability of data. In the event of a failure, the controller regenerates the lost data of the failed drive from the other surviving drives.
By distributing parity across the array’s member disks, RAID Level 5 reduces (but does not eliminate) the write bottleneck. The result is asymmetrical performance, with reads substantially outperforming writes. To reduce or eliminate this intrinsic asymmetry, RAID level 5 is often augmented with techniques such as caching and parallel multiprocessors.
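Both ideas in this section, distributed parity and the write penalty, can be sketched briefly. The rotation scheme below is one possible layout chosen for illustration (actual controllers vary), and the function names are hypothetical. The parity update uses the XOR identity: new parity = old parity XOR old data XOR new data, which is why a small write typically needs extra reads before it can complete.

```python
def parity_drive(stripe_row: int, n_drives: int) -> int:
    """Drive holding parity for a given stripe row.

    Assumed layout: parity starts on the last drive and moves one
    drive to the left on each successive row.
    """
    return (n_drives - 1 - stripe_row) % n_drives


def updated_parity(old_parity: bytes, old_data: bytes, new_data: bytes) -> bytes:
    """Recompute parity for a small write without reading the other data drives."""
    return bytes(p ^ o ^ n for p, o, n in zip(old_parity, old_data, new_data))


if __name__ == "__main__":
    for row in range(5):
        print("row", row, "-> parity on drive", parity_drive(row, n_drives=4))
```

Updating one data block therefore involves reading the old data and old parity, then writing the new data and new parity, which accounts for reads outperforming writes.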
Pros:
• Best suited for heavy read applications.
• The amount of usable space is the number of physical drives in the virtual drive minus 1.
Cons
• A single disk failure reduces the array to RAID 0
• Performance is slower than RAID 1 when rebuilding
• Write performance is slower than read (write penalty)
• Block transfer rate is equal to single disk rate
Uses of RAID 5:
RAID 5 is a general-purpose RAID storage solution. It is recommended for applications where:
• Data availability is important
• Large volumes of data are stored
• Multi tasking applications using I/O transfers of different sizes
• Good read and moderate write performance is important
 

G. Comparing RAID configurations
Each of the described RAID levels offers different characteristics in terms of cost and performance. The following table compares the cost, data availability, and I/O performance of the commonly known RAID levels. I/O performance is shown both in terms of large I/O requests, or relative ability to move data, and random I/O request rate, or relative ability to satisfy I/O requests, since each RAID level has inherently different performance characteristics relative to these two metrics.

1 The data transfer capacity and I/O request rate columns reflect only I/O performance inherent to the RAID model, and do not include the effect of other features, such as caching.
 

H. RAID Set-up Considerations
Setting up a fault tolerant RAID array involves trading off economy against MTDL (Mean Time to Data Loss). MTDL is the probable time to failure for any component that makes data inaccessible.
If your data is backed up and performance and cost are your primary concern, then RAID 0 is the logical choice.
If you determine that you need data protection, then you have two choices to protect your data: Parity or Mirrored arrays.
The “cost” of data protection for a Parity RAID array is the equivalent of one disk per RAID group. At first glance, it would seem to be a no-brainer to select Parity RAID over Mirrored RAID. However, consider the following:

• When a drive fails in a parity RAID array (RAID 3 or 5), the array becomes a RAID 0 stripe group. If a second drive failure occurs before the array completes a rebuild, then you lose all data within the array.
If you are comfortable with this, then Parity RAID is probably the most economical solution for you. It would also seem to be very cost effective to build parity arrays with several disks. However, consider the following when configuring your parity array:
• More disks in a parity RAID array affect write performance adversely.
• More disks in any RAID array increase the probability of a drive failure.
• Modern disk drives can be as large as 160 GB. By creating a parity array with several disks, the capacity of the array skyrockets, dramatically increasing resynchronization time after a disk failure. This has a major impact on array performance and forces the array to run “unprotected” for an extended period.
If you determine that the performance and/or protection limitations of parity RAID are too great, then a Mirrored (RAID 1) array or a dual level array (such as RAID 10) should be your choice.
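The capacity cost of the protection schemes discussed in this section can be tabulated with a small helper. This is an illustrative sketch based only on the rules stated in this primer (no overhead for RAID 0, half the drives for mirrored levels, one drive per group for parity levels); the function name is hypothetical, and it assumes identical drives in a single RAID group.

```python
def usable_drives(level: str, n_drives: int) -> float:
    """Number of drives' worth of usable capacity for a given RAID level."""
    if level == "0":
        return n_drives            # striping only, no redundancy
    if level in ("1", "10", "0+1"):
        return n_drives / 2        # every block is stored twice
    if level in ("3", "5"):
        return n_drives - 1        # one drive's worth of parity per group
    raise ValueError("RAID level not covered in this sketch: " + level)


if __name__ == "__main__":
    n = 6  # arbitrary example size
    for level in ("0", "10", "0+1", "3", "5"):
        print("RAID", level, "with", n, "drives:",
              usable_drives(level, n), "drives usable")
```

The gap between the parity and mirrored figures widens as drives are added, which is the economy that has to be weighed against the rebuild and write-performance concerns listed above.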
 

VI. Close
Selecting the proper RAID level and setup for a disk storage array is the key to properly balancing system costs and performance needs. This primer is a good starting point for your RAID storage selection activities. Should you have any further questions, the professionals at 9 to 5 Computer are ready to assist you in selecting a RAID system and setup to best meet your needs and goals.

 

 




© Copyright 2003.
*Price and availabilities subject to change without any notice. Not responsible for typographical errors.


 
