Steve Borba

My notes, I hope they help you, feel free to comment/add to them

Manual RAID Rebuild

A while ago (2009), we acquired a company that had an Exchange 5.5 server running on a Dell PowerEdge 2200 (a manual I found for it has a 1997 copyright date). A short while after the acquisition, before we had migrated off the server, the UPS it was on went bad and started power cycling the server repeatedly. By the time we got someone to the location to see what was happening, the server’s RAID card was dead. I looked around for spare parts and tried putting the drives in a PE2850 I had lying around, but none of the RAID cards would import the RAID setup from the metadata, so I had to do some digging and figure it out myself. I had all my notes in a text file, so here is a copy/paste with a little formatting.

——————————————————————————

First, I created a backup of each disk using:

dd if=/dev/<DEVICE> conv=sync,noerror bs=128K of=disk<N>.img

Where <DEVICE> is where Linux put the drive (sdc in this case) and <N> is the slot I pulled the drive from (0, 2, 3 and 4 in this case).

I then made an md5 hash of each image and zipped up a copy (in case I inadvertently modified data in the image):

md5sum disk<N>.img > disk<N>.img.md5
gzip -c disk<N>.img > disk<N>.img.gz
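The point of the hash and the zipped copy is being able to check later that a working image is still untouched. A minimal sketch of that check, assuming GNU coreutils and gzip are installed; verify_image is a name I made up for illustration, and the demo runs on a small throwaway file rather than a real disk image:

```shell
#!/bin/bash
# verify_image IMG: succeed only if IMG still matches the hash saved in
# IMG.md5 and IMG.gz decompresses to exactly the same bytes.
# (verify_image is a hypothetical helper, not part of any tool.)
verify_image() {
  local img=$1
  md5sum --status -c "$img.md5" || return 1   # does the saved hash still match?
  gzip -dc "$img.gz" | cmp -s - "$img"        # is the zipped copy identical?
}

# Demo on a throwaway 1MB file standing in for a disk image.
tmp=$(mktemp -d)                              # remove this directory when done
head -c 1048576 /dev/urandom > "$tmp/disk.img"
md5sum "$tmp/disk.img" > "$tmp/disk.img.md5"
gzip -c "$tmp/disk.img" > "$tmp/disk.img.gz"
verify_image "$tmp/disk.img" && echo "image OK"
```

md5sum -c looks the file up under the name stored in the .md5 file, so if the hash was made with a relative name, run the check from the same directory.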

At this point I figured out that disk 4 was likely not part of the array: the zip of disk 4 was 0.5GB, while the zips of disks 0, 2 and 3 were 24GB.

Next we need the stripe size and the RAID type.

I decided to look for a file, since I needed one to figure out the RAID type/pattern anyhow (I think a disk was moved, since nothing was in slot 1):

#!/bin/bash
# Look through the first 1GB of each drive, 1KB at a time, for text containing IP addresses
# (IIS log files or similar; I could have looked for dates instead).
# This looks through all the disk images, but you only need to look in one.
for i in $(seq 0 1 1048575)
do
  for n in 0 2 3 4
  do
    # -a treats the raw block as text, -q only reports whether there was a match
    dd if=disk$n.img bs=1K count=1 skip=$i 2>/dev/null | grep -qaE '(^|[[:space:]])[[:digit:]]{1,3}(\.[[:digit:]]{1,3}){3}([[:space:]]|$)' && echo $i >> list$n.txt
  done
done

After that was done, look through the list files for long runs of consecutive 1K block numbers; the longer the run, the better.

Next, cut those sections out using the following, where NNN is the block number where the data started:
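Scanning a list file by eye for consecutive block numbers gets old fast. A minimal sketch of letting awk do it, assuming the list holds one ascending block number per line as the search script writes it; find_runs is a made-up helper name:

```shell
#!/bin/bash
# find_runs MIN: read ascending block numbers on stdin and print
# "start length" for every run of at least MIN consecutive numbers.
# (find_runs is a hypothetical helper name.)
find_runs() {
  awk -v min="$1" '
    function emit() { if (len >= min) print start, len }
    NR == 1 || $1 != prev + 1 { if (NR > 1) emit(); start = $1; len = 0 }
    { len++; prev = $1 }
    END { if (NR > 0) emit() }
  '
}

# Demo: two runs of 3 or more (starting at 5 and at 31), one stray block (20).
printf '%s\n' 5 6 7 8 20 31 32 33 | find_runs 3
```

On the real lists this would be something like find_runs 32 < list0.txt, which prints only the stretches at least 32K long.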

dd if=disk0.img bs=1K count=1 skip=NNN >> test.txt

Keep walking through the blocks until you find where the file “skips”; this tells you the stripe size. I found it when a file that started 31232 1K chunks in went for 64K before it disappeared from that disk and showed up on disk3.
So this RAID set puts 64K on each disk at a time (with two data disks per row, that is 128K of data per full stripe).
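If a file really runs for one whole 64K piece, its first 1K block should sit exactly on a 64K boundary. A quick sketch of that arithmetic, assuming the block numbers are the dd skip values from the search script; chunk_of and offset_in are names I made up:

```shell
#!/bin/bash
# Convert a 1K block number (a dd skip value with bs=1K) into a 64K
# chunk number plus the number of 1K blocks into that chunk.
CHUNK_K=64   # KB held on one disk before the data moves on

chunk_of()  { echo $(( $1 / CHUNK_K )); }
offset_in() { echo $(( $1 % CHUNK_K )); }

# 31232 is where the file started; 31295 is 63 blocks later, 31296 is 64.
for blk in 31232 31295 31296; do
  echo "1K block $blk -> chunk $(chunk_of "$blk"), $(offset_in "$blk")K into it"
done
```

Block 31232 lands at the very start of a chunk and block 31296 at the start of the next one, which fits a file that filled one 64K piece and then moved to another disk.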

I found that I needed at least one piece of continuous data on each disk to get a good idea of the way the RAID pattern goes (more is better, though).

I found my breakthrough at 268032 1K chunks in.
I needed to pull each 64K chunk out so I could figure out how to piece them together and what the RAID type and pattern were. I used the following script to extract the chunks:

#!/bin/bash
# Pull 64K chunks 4188 through 4190 out of every disk image.
for i in $(seq 4188 1 4190)
do
  for n in 0 2 3 4
  do
    dd if=disk$n.img bs=64K count=1 skip=$i of=chunk-$i-$n.txt
  done
done

(268032 / 64 = 4188. With bs=64K, skip=4188 passes over chunks 0 through 4187 and reads the chunk that starts 4188*64K in, so the loop grabs 3 chunks: 4188, 4189 and 4190.)

This gave me the chunks from each disk, and I looked through the text files to see where the file started and how it flowed from one disk to another.

Disk 4 did not have any data from these files, so it is confirmed that it isn’t part of the array.

I found that the log file started in chunk 4188 on disk 3, then went to chunk 4188 on disk 2, next chunk was 4189 on disk 0, then chunk 4189 on disk 3, skipping to chunk 4190 on disk 0, and ending in chunk 4190 on disk 2.

That looks like RAID 5: each row has data on only two of the three disks, and the disk that gets skipped (the parity) changes from row to row.

Raid 5 has two major options to figure out:
Parity N or Parity 0 (Zero)
Data Restart or Continuation

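Parity 0 starts with the parity chunk on the first disk, where Parity N starts with it on the last disk; either way the parity moves over one disk with each row.

Data Restart always lays the data out left to right, skipping over the parity chunk, where Continuation starts the data on the disk right after the parity chunk and wraps around.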

N with restart        N with continuation   0 with restart        0 with continuation
D0  D1  D2  D3  D4    D0  D1  D2  D3  D4    D0  D1  D2  D3  D4    D0  D1  D2  D3  D4
 1   2   3   4   P     1   2   3   4   P     P   1   2   3   4     P   1   2   3   4
 5   6   7   P   8     6   7   8   P   5     5   P   6   7   8     8   P   5   6   7
 9  10   P  11  12    11  12   P   9  10     9  10   P  11  12    11  12   P   9  10
13   P  14  15  16    16   P  13  14  15    13  14  15   P  16    14  15  16   P  13
 P  17  18  19  20     P  17  18  19  20    17  18  19  20   P    17  18  19  20   P

For chunk 4188-4190 and a three disk raid 5:

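       N with restart       N with continuation  0 with restart       0 with continuation
Chunk  DiskA DiskB DiskC    DiskA DiskB DiskC    DiskA DiskB DiskC    DiskA DiskB DiskC
4188       P  8375  8376        P  8375  8376     8375  8376     P     8375  8376     P
4189    8377  8378     P     8377  8378     P        P  8377  8378        P  8377  8378
4190    8379     P  8380     8380     P  8379     8379     P  8380     8380     P  8379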

N restart: B C + A B + A C
N Contin: B C + A B + C A
0 restart: A B + B C + A C
0 contin: A B + B C + C A
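The four patterns above can also be generated instead of written out by hand, which helps with more disks or other rows. A rough sketch under my own assumptions: rows are numbered from 1 as in the tables, and the function name and argument order are made up for illustration:

```shell
#!/bin/bash
# data_disks PARITY ORDER NDISKS ROW
#   PARITY: N (parity starts on the last disk) or 0 (starts on the first)
#   ORDER:  restart or contin
#   ROW:    1-based row number, as in the tables above
# Prints the letters of the disks holding data, in the order the data is read.
# (Hypothetical helper; handles up to 8 disks.)
data_disks() {
  local parity=$1 order=$2 n=$3 row=$4 p d k out=""
  local letters=ABCDEFGH
  if [ "$parity" = N ]; then
    p=$(( (n - 1) - (row - 1) % n ))   # parity walks right to left
  else
    p=$(( (row - 1) % n ))             # parity walks left to right
  fi
  for (( k = 0; k < n - 1; k++ )); do
    if [ "$order" = restart ]; then
      d=$(( k < p ? k : k + 1 ))       # left to right, skipping parity
    else
      d=$(( (p + 1 + k) % n ))         # start just after parity and wrap
    fi
    out+="${letters:d:1} "
  done
  echo "${out% }"
}

# Print the data-disk order for rows 4188-4190 of a three-disk set.
for layout in "N restart" "N contin" "0 restart" "0 contin"; do
  read -r par ord <<< "$layout"
  echo "$layout: $(data_disks "$par" "$ord" 3 4188) + $(data_disks "$par" "$ord" 3 4189) + $(data_disks "$par" "$ord" 3 4190)"
done
```

Running it with 5 disks and rows 1 through 5 walks through the same layouts as the five-disk table.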

My Data: 3 2 + 0 3 + 0 2

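We can get rid of both Parity 0 layouts: they need _ B + B _ (the second disk of the first pair is also the first disk of the second pair), but my data has 2 and then 0 in those spots, and 2 != 0.
We can also get rid of Data Continuation: N with continuation needs _ C + _ _ + C _, and my data has 2 and then 0 there, so 2 != 0 (still).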

so, A = 0, B = 3 and C = 2!!!
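The same elimination can be done mechanically: line each pattern up against the observed slots and reject it as soon as one letter would have to be two different slots. A minimal sketch, assuming bash 4 or newer for associative arrays; fit is a name I made up:

```shell
#!/bin/bash
# fit PATTERN OBSERVED: both are space-separated lists of equal length.
# Succeeds and prints the mapping if every letter always lines up with
# the same slot and no two letters share a slot.
# (fit is a hypothetical helper; needs bash 4+ for associative arrays.)
fit() {
  local -a pat=($1) obs=($2)
  local -A slot_of=() letter_of=()
  local i l s
  for i in "${!pat[@]}"; do
    l=${pat[i]} s=${obs[i]}
    [ "${slot_of[$l]:-$s}" = "$s" ] || return 1     # letter already tied to another slot
    [ "${letter_of[$s]:-$l}" = "$l" ] || return 1   # slot already tied to another letter
    slot_of[$l]=$s
    letter_of[$s]=$l
  done
  echo "A=${slot_of[A]} B=${slot_of[B]} C=${slot_of[C]}"
}

observed="3 2 0 3 0 2"   # My Data: 3 2 + 0 3 + 0 2
while read -r name pattern; do
  if result=$(fit "$pattern" "$observed"); then
    echo "$name fits: $result"
  else
    echo "$name ruled out"
  fi
done <<'EOF'
N-restart B C A B A C
N-contin  B C A B C A
0-restart A B B C A C
0-contin  A B B C C A
EOF
```

Only N with restart survives, with the same A, B and C as worked out by hand.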

Hmm… the data I would expect on disk 1 is on disk 3! (Maybe disk 3 was a hot spare.)

Yay! RAID type = 5, order = 0, 3, 2, pattern = N with Data Restart, chunk size = 64K. Should be done, right?

Almost. I guess the PERC 3/I claims the first 64K of each disk for itself, so I have to trim that off, and now I know how to put Humpty back together again!
Here is the script I used to do this:

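#!/bin/bash
# Rebuild the array one 64K row at a time: append the data chunks in disk
# order 0, 3, 2 and leave out whichever disk holds the parity for that row.
# Starting at 1 skips the first 64K of each disk (claimed by the PERC).
: > disk-full.img   # start from an empty file, since the loop appends
for i in $(seq 1 1 555727)
do
  MODULUS=$(( i % 3 ))
  if [ $MODULUS != 0 ]; then   # disk0 holds parity when i % 3 is 0
    dd if=disk0.img bs=64K count=1 skip=$i >> disk-full.img
  fi
  if [ $MODULUS != 2 ]; then   # disk3 holds parity when i % 3 is 2
    dd if=disk3.img bs=64K count=1 skip=$i >> disk-full.img
  fi
  if [ $MODULUS != 1 ]; then   # disk2 holds parity when i % 3 is 1
    dd if=disk2.img bs=64K count=1 skip=$i >> disk-full.img
  fi
done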
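A cheap sanity check before going further: a DOS partition table sits in a sector that ends with the bytes 55 AA at offsets 510 and 511, so if the rebuilt image starts with a partition table, those bytes should be there. This is only a sketch and proves very little on its own; has_mbr_signature is a made-up helper, and the demo uses fake 512-byte files instead of the real image:

```shell
#!/bin/bash
# has_mbr_signature IMG: succeed if bytes 510-511 of IMG are 55 AA,
# the marker that ends a DOS/MBR boot sector.
# (has_mbr_signature is a hypothetical helper name.)
has_mbr_signature() {
  local sig
  sig=$(dd if="$1" bs=1 skip=510 count=2 2>/dev/null | od -An -tx1 | tr -d ' \n')
  [ "$sig" = "55aa" ]
}

# Demo on fake sectors: one with the signature, one without.
tmp=$(mktemp -d)                       # remove this directory when done
head -c 510 /dev/zero > "$tmp/good.img"
printf '\125\252' >> "$tmp/good.img"   # octal for 0x55 0xAA
head -c 512 /dev/zero > "$tmp/bad.img"
has_mbr_signature "$tmp/good.img" && echo "good.img: signature found"
has_mbr_signature "$tmp/bad.img"  || echo "bad.img: no signature"
```

On the real thing this would be has_mbr_signature disk-full.img; a missing signature would suggest the disk order or the trimmed offset is off.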

Now to mount the file system. We need to know where the file system starts. I used mmls and found:

mmls disk-full.img

DOS Partition Table
Offset Sector: 0
Units are in 512-byte sectors

     Slot    Start        End          Length       Description
00:  Meta    0000000000   0000000000   0000000001   Primary Table (#0)
01:  -----   0000000000   0000000062   0000000063   Unallocated
02:  00:00   0000000063   0028676024   0028675962   NTFS (0x07)
03:  Meta    0028676025   0142159184   0113483160   Win95 Extended (0x0F)
04:  Meta    0028676025   0028676025   0000000001   Extended Table (#1)
05:  -----   0028676025   0028676087   0000000063   Unallocated
06:  01:00   0028676088   0142159184   0113483097   NTFS (0x07)
07:  -----   0142159185   0142265855   0000106671   Unallocated

So the first NTFS partition starts at sector 63, which is byte 63*512 = 32256. Mount using:


mount disk-full.img /mnt/c -o loop,offset=32256 -t ntfs

The 2nd partition starts at sector 28676088, which is byte 28676088*512 = 14682157056. Mount using:


mount disk-full.img /mnt/d -o loop,offset=14682157056 -t ntfs
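Both mount offsets are just the Start column from mmls multiplied by 512. A small sketch that pulls them straight out of the mmls output, assuming the column layout shown above and 512-byte sectors; ntfs_offsets is a made-up helper name:

```shell
#!/bin/bash
# ntfs_offsets: read mmls output on stdin and print "slot byte_offset"
# for every NTFS line, assuming 512-byte sectors as mmls reported.
# (ntfs_offsets is a hypothetical helper name.)
ntfs_offsets() {
  # $2 is the Slot column, $3 the Start sector; %.0f avoids integer overflow
  awk '/NTFS/ { printf "%s %.0f\n", $2, $3 * 512 }'
}

# Demo on the two NTFS lines from the mmls output above.
ntfs_offsets <<'EOF'
02:  00:00   0000000063   0028676024   0028675962   NTFS (0x07)
06:  01:00   0028676088   0142159184   0113483097   NTFS (0x07)
EOF
```

In practice that would be mmls disk-full.img | ntfs_offsets, and the second column goes into offset= on the mount command.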

Back up the data to a single file:


tar czf ntfs-data.tar.gz /mnt/c /mnt/d

 
