Hint: the R in RAID does not stand for Re-build or Re-initialize, or even Re-format or Re-arrange.
All joking aside, most people in the IT field understand what a RAID array is and how it can help protect your data if used correctly, but many people do not understand what to do in the event of a system or drive failure.
First of all, let’s start at the beginning for the non-techie folks out there who just bought a Dell XPS system with a 2TB RAID 0 array simply because they heard it was fast and redundant. Many of these people may not even know they have an array; they just bought it because it offered a lot of storage. Anyway, RAID stands for Redundant Array of Independent Disks. It is a hardware- or software-based system that uses two or more hard disk drives working together to achieve one or more of three things:
- Greater performance
- Greater capacity
- Greater redundancy
There are several types of RAID arrays as well. They have numbered levels such as RAID Level 0, 1 or 5, which are the most common. You can also combine levels to make RAID 0+1, RAID 10 or RAID 50, etc…
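To make the trade-offs between the common levels concrete, here is a small Python sketch that computes usable capacity and worst-case fault tolerance for an array of identical disks. The function name `raid_info` and the simplified model (all disks the same size, no space reserved for controller metadata) are assumptions for illustration; real controllers differ in the details.

```python
from dataclasses import dataclass


@dataclass
class ArrayInfo:
    usable_tb: float          # space available for data
    failures_survived: int    # drive failures tolerated in the worst case


def raid_info(level: str, disks: int, disk_tb: float) -> ArrayInfo:
    """Usable capacity and worst-case fault tolerance (simplified model, an assumption)."""
    if level == "0":
        if disks < 2:
            raise ValueError("RAID 0 needs at least 2 disks")
        return ArrayInfo(disks * disk_tb, 0)            # pure striping, no redundancy
    if level == "1":
        if disks < 2:
            raise ValueError("RAID 1 needs at least 2 disks")
        return ArrayInfo(disk_tb, disks - 1)            # every disk holds the same copy
    if level == "5":
        if disks < 3:
            raise ValueError("RAID 5 needs at least 3 disks")
        return ArrayInfo((disks - 1) * disk_tb, 1)      # one disk's worth of parity
    if level == "10":
        if disks < 4 or disks % 2:
            raise ValueError("RAID 10 needs an even number of disks, at least 4")
        # Striped mirror pairs: half the raw space. Any single failure is survivable;
        # more only if the failed disks sit in different mirror pairs.
        return ArrayInfo(disks // 2 * disk_tb, 1)
    raise ValueError(f"level {level!r} is not covered by this sketch")


for level, n in [("0", 2), ("1", 2), ("5", 3), ("10", 4)]:
    info = raid_info(level, n, 1.0)
    print(f"RAID {level} with {n} x 1 TB: {info.usable_tb} TB usable, "
          f"survives {info.failures_survived} failure(s) in the worst case")
```

For example, a 2 TB RAID 0 built from two 1 TB drives gives the full 2 TB but survives zero failures, while the same two drives as RAID 1 give only 1 TB but survive one.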
The first type, RAID Level 0, uses simple striping: data is split into fixed-size chunks (the stripe size, in most cases 64 KB) that are written across 2 or more disks in turn. This increases performance and capacity but, more importantly, gives up all redundancy. Please note: RAID 0 should not really be called RAID, since the R in RAID stands for Redundant, which RAID 0 is not. If one drive in a RAID 0 array fails, the entire array will be inaccessible and unrecoverable unless the failed drive is repaired and joined back up with the other drives in the array. Attempting to get any data off a single drive from a RAID 0 array is pointless, as you will only get file fragments. The typical proper use for RAID 0 is as a scratch disk for video/audio editing, where speed and very high capacity are needed: the editing is done on the RAID, and the final product (the video) is then moved to a more secure location. Dell and other PC manufacturers are installing RAID 0 in many of their high-end gaming systems for the performance increase. However, most users are either unaware that they have a RAID system or unaware that it is non-redundant. LaCie and other hard drive vendors also sell simple RAID 0 external USB drives that do not properly explain that you are running a RAID system. You may argue that a single drive is not redundant either, and while this is true, a RAID 0 array is lost if any one of its drives fails, so the more drives it contains, the more likely it is that one of them will take the whole array down.
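Striping is easiest to see in code. The Python sketch below assumes plain round-robin striping with a fixed stripe size (64 KB by default, matching the figure above); the function names are hypothetical. `locate` maps a byte in the array to the disk and offset where it actually lives, `stripe` shows why a single member drive only holds every n-th chunk, and `array_loss_probability` shows, under the simplifying assumption that drives fail independently, how the risk of losing the whole array grows with the number of drives.

```python
STRIPE_SIZE = 64 * 1024  # 64 KB chunks; the actual stripe size depends on the controller


def locate(logical_offset: int, disks: int, stripe_size: int = STRIPE_SIZE) -> tuple[int, int]:
    """Map a byte offset in a RAID 0 volume to (disk index, byte offset on that disk)."""
    chunk, within = divmod(logical_offset, stripe_size)
    disk = chunk % disks   # chunks are dealt out to the disks round-robin
    row = chunk // disks   # number of complete rows before this chunk
    return disk, row * stripe_size + within


def stripe(data: bytes, disks: int, stripe_size: int = STRIPE_SIZE) -> list[bytearray]:
    """Split data across `disks` in-memory 'drives' the way RAID 0 would."""
    drives = [bytearray() for _ in range(disks)]
    for start in range(0, len(data), stripe_size):
        drives[(start // stripe_size) % disks] += data[start:start + stripe_size]
    return drives


def array_loss_probability(p_drive: float, disks: int) -> float:
    """Chance of losing a RAID 0 array if each drive fails independently with p_drive."""
    return 1 - (1 - p_drive) ** disks


# Toy demo with a 2-byte stripe so the layout is visible.
drives = stripe(b"ABCDEFGH", disks=2, stripe_size=2)
print(drives)                               # each drive holds only every other chunk
print(locate(4, disks=2, stripe_size=2))    # where logical byte 4 ("E") lives

# Illustrative 5% per-drive failure chance (a made-up number, not measured data).
for n in (1, 2, 4):
    print(n, "drive(s):", round(array_loss_probability(0.05, n), 4))
```

Reading either toy drive on its own yields `ABEF` or `CDGH`: fragments, not the original data.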
Another type of RAID array is RAID Level 1, also known as mirroring. Here you use 2 identical disks and mirror them with either specialized software or hardware. However, you lose the capacity of one of the disks because exactly the same information is stored on both (hence the name mirror). This is more redundant, because if one drive physically fails your data is still safe on the other. On the other hand, if you get logical corruption on one drive (e.g., partition table corruption), the other drive will be affected as well, because every write, including a corrupting one, is copied to both disks.
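A toy model makes the difference between physical failure and logical corruption clear. In this Python sketch (the `Mirror` class is hypothetical, with in-memory byte arrays standing in for disks), a dead drive simply stops receiving writes while its twin keeps serving reads, but a bad write to a healthy mirror, such as a zeroed partition table, lands on both members because that is exactly what mirroring is designed to do.

```python
class Mirror:
    """Toy RAID 1: every write goes to all working members (in-memory 'disks')."""

    def __init__(self, size: int, members: int = 2):
        self.disks = [bytearray(size) for _ in range(members)]
        self.failed: set[int] = set()

    def write(self, offset: int, data: bytes) -> None:
        for i, disk in enumerate(self.disks):
            if i not in self.failed:             # a dead drive receives nothing
                disk[offset:offset + len(data)] = data

    def read(self, offset: int, length: int) -> bytes:
        for i, disk in enumerate(self.disks):
            if i not in self.failed:             # any surviving copy will do
                return bytes(disk[offset:offset + length])
        raise OSError("all mirror members have failed")

    def fail(self, index: int) -> None:
        self.failed.add(index)


# Physical failure: one drive dies, the other still has the data.
m = Mirror(size=16)
m.write(0, b"PARTTABL")                  # stand-in for a partition table
m.fail(0)
print(m.read(0, 8))                      # b'PARTTABL', served by the surviving drive

# Logical corruption: a bad write on a healthy mirror is copied to both drives.
healthy = Mirror(size=16)
healthy.write(0, b"PARTTABL")
healthy.write(0, bytes(8))               # e.g. the partition table gets zeroed
print([bytes(d[:8]) for d in healthy.disks])   # both members now hold only zeros
```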
RAID Level 5 is the final array I will discuss in more detail here. RAID 5 uses a minimum of 3 drives and stores parity information alongside the data, so that if any one drive fails the data is still safe. In most cases, all one must do to restore redundancy after a single drive fails in a RAID 5 array is replace the failed drive. The system will then rebuild the missing information onto the new drive from the data and parity on the remaining drives. RAID 5 is the most commonly used array in servers that do not need extremely fast disk speed, such as web servers. The main downside is that calculating parity adds computational overhead, especially on writes, so you lose some speed.
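RAID 5 parity is normally computed with XOR, and that one property is what makes single-drive rebuilds possible. The sketch below (hypothetical helper names, a single stripe, no parity rotation or on-disk layout) computes a parity block from the data blocks and then regenerates a lost block from the survivors. Note that it can only ever recover one missing block per stripe.

```python
from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def make_parity(data_blocks: list[bytes]) -> bytes:
    """Parity block for one stripe: the XOR of all data blocks in that stripe."""
    return xor_blocks(data_blocks)


def rebuild_missing(surviving_blocks: list[bytes]) -> bytes:
    """Recreate the single missing block (data or parity) from all the others.

    The XOR of every block in a stripe, parity included, is zero, so the
    missing block equals the XOR of the survivors.
    """
    return xor_blocks(surviving_blocks)


# One stripe on a 3-drive RAID 5: two data blocks plus one parity block.
d0, d1 = b"HELLO", b"WORLD"
p = make_parity([d0, d1])

# The drive holding d1 fails; the controller rebuilds it onto the replacement.
rebuilt = rebuild_missing([d0, p])
print(rebuilt)   # b'WORLD'
```

If a second block from the same stripe were also lost, there would be two unknowns and only one parity equation, so nothing could be rebuilt.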
The problem with any RAID array usually lies not in the system itself but in how it is implemented. Most users implement RAID arrays because they think the arrays are redundant and therefore they do not need to worry about backups. The problems begin when no system is in place to warn the user of drive failures. In other words, if a drive fails in a RAID 5 system and the system gives no warning, the only noticeable difference in its operation would be a slight slowdown. Then what happens if another drive fails? You got it: total array failure and a much smaller chance of recovery.
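The missing piece in that scenario is an alert. Hardware controllers ship their own management tools and Linux software RAID offers `mdadm --monitor`, so use those where available. Purely as an illustration, the Python sketch below assumes Linux software RAID (md), which reports member status in `/proc/mdstat` as a pattern like `[3/2] [U_U]` (3 members expected, 2 working, `_` marking the missing slot), and flags any array running degraded. The function names are hypothetical, and the sample is hand-written in the mdstat format rather than captured from a real system.

```python
import re
from pathlib import Path

# In /proc/mdstat, an array line ("md0 : active raid5 ...") is followed by a
# status line ending in e.g. "[3/2] [U_U]": members expected / members working,
# then one character per slot, U = up and _ = missing.
ARRAY_LINE = re.compile(r"^(md\S+)\s*:")
STATUS = re.compile(r"\[(\d+)/(\d+)\]\s+\[([U_]+)\]")


def degraded_arrays(mdstat_text: str) -> dict[str, str]:
    """Return {array name: slot pattern} for every md array that is missing a member."""
    degraded: dict[str, str] = {}
    current = None
    for line in mdstat_text.splitlines():
        header = ARRAY_LINE.match(line)
        if header:
            current = header.group(1)
            continue
        status = STATUS.search(line)
        if status and current:
            expected, working = int(status.group(1)), int(status.group(2))
            if working < expected or "_" in status.group(3):
                degraded[current] = status.group(3)
    return degraded


def check(path: str = "/proc/mdstat") -> None:
    """Print a warning per degraded array (swap print for e-mail etc. in practice)."""
    try:
        text = Path(path).read_text()
    except FileNotFoundError:
        print(f"{path} not found; is Linux software RAID in use here?")
        return
    for name, slots in degraded_arrays(text).items():
        print(f"WARNING: {name} is degraded [{slots}]; replace the failed drive now")


# Hand-written sample in the mdstat format: a 3-drive RAID 5 with one member missing.
SAMPLE = """\
Personalities : [raid6] [raid5] [raid4]
md0 : active raid5 sdc1[2] sda1[0]
      1953260544 blocks super 1.2 level 5, 512k chunk, algorithm 2 [3/2] [U_U]

unused devices: <none>
"""

print(degraded_arrays(SAMPLE))   # {'md0': 'U_U'}
```

Run from a scheduler, a check like this turns "a slight slowdown nobody notices" into a warning while the array can still be rebuilt.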
The point of this article is not to get technical but to show that you need to back up your redundant volumes just like you would any other system, and if you don’t, call a professional when a RAID fails. Tech people, by nature, like to solve problems on their own even if they don’t know what they are dealing with. They browse the internet for solutions that could easily damage the system further, or even make it completely unrecoverable. Others will call tech support for the company that built their system. Most of the time, these companies are not concerned about your data; they are only concerned about restoring your system to working order. So, if you are talking to a tech about fixing your RAID 5 server and he tells you to re-initialize, or to pull out a drive and then re-build, etc… MAKE SURE YOU ASK WHAT WILL HAPPEN TO YOUR DATA FIRST. We see drives every day where the customer says the computer manufacturer told them to format the computer and reinstall Windows. Well, great! The computer works again, but OH NO, where did my data go?!
More problems can occur when individuals attempt repairs on their own and make mistakes such as mixing up the drive order. We have even seen servers come in for recovery that had been so badly vandalized you couldn’t even tell what type of controller they had. In cases like this it is necessary to manually destripe the array, using low-level hex editors to determine the drive order, data offsets, stripe size, and parity rotation, all of which depend on which hardware or software was used. No one without deep knowledge of how data is arranged in the various RAID configurations can recover a set of disks that is simply handed over with no indication of drive order or of which controller was used.
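To give a feel for what destriping involves, here is a Python sketch that reassembles a RAID 5 volume from raw member images. It assumes the hard part has already been done: the drive order, chunk size, and parity rotation are known, there are no data offsets (metadata at the start of each disk), and the rotation is left-symmetric, the Linux md default, which is only one of several layouts controllers use. All function names are hypothetical, and `build` exists only to create test images.

```python
def left_symmetric_layout(row: int, disks: int) -> tuple[int, list[int]]:
    """Return (parity disk, data disks in logical order) for one stripe row.

    Left-symmetric rotation: parity starts on the last disk and moves one disk
    to the left on each row; data chunks start just after the parity disk and
    wrap around.
    """
    parity = (disks - 1) - (row % disks)
    data = [(parity + 1 + i) % disks for i in range(disks - 1)]
    return parity, data


def destripe(images: list[bytes], chunk: int) -> bytes:
    """Reassemble the logical volume from member images given in array order."""
    disks = len(images)
    rows = len(images[0]) // chunk
    volume = bytearray()
    for row in range(rows):
        _, data_disks = left_symmetric_layout(row, disks)
        for d in data_disks:          # parity chunks are skipped
            volume += images[d][row * chunk:(row + 1) * chunk]
    return bytes(volume)


def build(volume: bytes, disks: int, chunk: int) -> list[bytes]:
    """Lay out a volume as RAID 5 member images (test helper, inverse of destripe)."""
    per_row = chunk * (disks - 1)
    if len(volume) % per_row:
        raise ValueError("toy helper: volume must fill whole rows")
    images = [bytearray() for _ in range(disks)]
    for row in range(len(volume) // per_row):
        parity_disk, data_disks = left_symmetric_layout(row, disks)
        base = row * per_row
        chunks = [volume[base + i * chunk: base + (i + 1) * chunk] for i in range(disks - 1)]
        parity = bytes(chunk)
        for c in chunks:              # XOR parity over the row's data chunks
            parity = bytes(a ^ b for a, b in zip(parity, c))
        images[parity_disk] += parity
        for d, c in zip(data_disks, chunks):
            images[d] += c
    return [bytes(img) for img in images]


volume = bytes(range(48))                        # 48-byte toy volume
members = build(volume, disks=3, chunk=8)
print(destripe(members, chunk=8) == volume)      # True: correct order and chunk size
swapped = [members[1], members[0], members[2]]
print(destripe(swapped, chunk=8) == volume)      # False: wrong drive order
```

Swapping just two member images is enough to scramble the output, which is why a mixed-up drive order is such a common and costly mistake.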
In other words, if you have a RAID failure and you are unsure of how the system works or what any of the options in the RAID controller’s BIOS will do, follow these steps:
- Do not do anything until you fully understand the system and what you are dealing with (if you think the system is a RAID 5 but you are unsure, this means you!)
- Do not tinker
- Do not re-initialize the RAID in the controller BIOS. This will cause total data loss.
- Do not force problematic drives online.
- If losing this data could put your company out of business or cost you thousands of hours of work, do not hesitate: call a professional RAID data recovery company for help.
RAID technology, while not new, is not something an inexperienced person should tamper with, regardless of how good they are with computers, especially if the data is priceless. Really, the moral of all stories involving data loss, RAID or not, is simply to back up your data. And then back up those backups… rinse… repeat…