Storage Problems (UPDATED!!!)

So last week my big storage box started acting up. Random resets, dropping a drive; all in all, not good.

So let me give you a quick rundown of this storage box. I am running FreeNAS. I have a total of 11 drives currently. 9 of these are 2TB drives, configured as 3 RAID 5 (raidz) groups, which are then striped together, giving me a total of 9.63TB of storage with redundancy. There is also a small OS drive and a 128GB SSD just for cache. I store everything here: all my video and photo work, my media collection. My ESXi environment mounts iSCSI off this thing. So it’s pretty critical to my geek life.
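For the curious, here’s the back-of-the-envelope math on that layout. This is a rough sketch that assumes the 9 drives are split evenly, 3 per group, with one drive’s worth of parity in each group; the variable names are made up for illustration. It only estimates raw usable space, so it comes out higher than the 9.63TB I actually see. Presumably the gap is filesystem overhead on top of the TB-vs-TiB difference, but I haven’t verified the exact breakdown.

```shell
#!/bin/sh
# Rough capacity estimate for 3 striped RAID 5 (raidz) groups.
# Assumption: 9 drives split evenly, 3 per group, 1 drive's worth of parity each.
vdevs=3
drives_per_vdev=3
drive_tb=2   # marketing terabytes (10^12 bytes)

# Each group loses one drive's worth of space to parity.
usable_tb=$(( vdevs * (drives_per_vdev - 1) * drive_tb ))
echo "Raw usable: ${usable_tb} TB"

# Convert decimal TB to binary TiB, which is what most tools report.
usable_tib=$(awk -v tb="$usable_tb" 'BEGIN { printf "%.2f", tb * 1000^4 / 1024^4 }')
echo "Raw usable: ${usable_tib} TiB"
```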

I did all sorts of testing. Flashed the OS drive. Replaced the OS drive. No matter what I did: 4 minutes of uptime, kernel panic, reboot.

So I ordered new parts which arrived yesterday.  I take the system out of the rack, put it on the table, open it up…. found the problem…

[Photo: IMG_0398]

Ouch.  A small fire in my server.

 

Update!!!! (12/23/15)
So things went from bad to worse. Shortly after finding and fixing this, I reinstalled the OS and brought everything up for a 24-hour burn-in. This worked. OK good, let’s go back to the SD card for the OS. Fresh install, 24-hour burn-in. Let’s go!!

12 hours in. System reboots. Doesn’t come online… No prob, I’ll fix it when I get home…….. (do you see the foreshadowing here? ’cause I didn’t)

I get home, not booting right… OK, reinstall the OS… nope. OK, maybe the SD card is bad. Back to the SSD. Nope.

Uhhhh WTF!?!?!

Clean OS. No auto-import. Everything is fine… import the ZFS volume… kernel panic. Dead.

Time to research. OK, so according to the interwebs my prognosis is “screwed, data gone.”

Apparently desktop (non-ECC) memory and ZFS are to blame here. It’s not like I wasn’t trying to protect my data: I had 3 raidz vdevs in a ZFS pool.

So after contemplating all my poor, poor data I decided to try to recover it.

Disk scans (SpinRite for 36 hours) = nothing
zdb scan (ran for multiple hours but kept crashing because it ran out of swap) = nothing
OpenIndiana live CD = nothing

Finally I found a post where someone talked about forcing the volume to import as read-only. I figured, “hey, I’ve already spent 4 days trying to recover, why not.”

So I boot up FreeNAS, get on the console, and type:


zpool import -f -o readonly=on -R /mnt vol
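For anyone who ends up here in the same boat, this is roughly what each piece of that command does. The sketch below is a dry run: it only prints commands, it doesn’t touch any pool. The `import_cmd` helper is a made-up name for illustration, and the two follow-up checks at the end are a suggestion, not something from my original console session.

```shell
#!/bin/sh
# Dry-run sketch: prints the commands instead of running them,
# so it is safe to try on a machine without ZFS.
POOL="vol"      # pool name used in the post
ALTROOT="/mnt"  # alternate root the pool gets mounted under

import_cmd() {
    # $1 = pool name, $2 = alternate root
    #   -f              force the import even if the pool looks in use elsewhere
    #   -o readonly=on  import the pool read-only so nothing gets written to it
    #   -R <dir>        mount everything under <dir> instead of the usual mountpoints
    printf 'zpool import -f -o readonly=on -R %s %s\n' "$2" "$1"
}

import_cmd "$POOL" "$ALTROOT"

# Suggested follow-up checks (an assumption, not from the original session):
printf 'zpool status %s\n' "$POOL"   # pool and per-disk health
printf 'zfs list -r %s\n' "$POOL"    # datasets that came up under the altroot
```

The read-only part is the important bit: the pool gets imported without anything being written to it, which is about as gentle as an import can be on a damaged pool.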

It didn’t kernel panic….. wait, what?!

Holy $%^&*    IT MOUNTED!!! I’m jumping through directories, all giddy that my data may still be intact. But read-only isn’t going to do me much good. Need drives!!!!

I don’t have 10TB of external drives…. AJ!!!!

So I go to my buddy’s and steal all his externals. I plug them all in at once and start the very, very, very slow copy. After 5 days of copying to externals I was finally able to rebuild and start putting my data back.

So now the lessons learned:

  1. Regularly check that your offsite backups are working
  2. Build a secondary NAS for snapshot backups (this box will eventually be at AJ’s since we have a VPN between our places)
  3. Identify what is replaceable and what isn’t and dump that somewhere else too.

This was a long process but it’s coming to a close. I will be doing snapshots of critical data to a secondary FreeNAS box. Once the initial snapshot is done, I will take the box to AJ’s and the snapshots will continue to back up there.
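FreeNAS has its own snapshot and replication tasks in the GUI, so treat this as a sketch of the same idea at the command line rather than my exact setup. It is a dry run that only prints the commands. The dataset name, remote host, target dataset, and the `auto-` snapshot prefix are all hypothetical placeholders.

```shell
#!/bin/sh
# Dry-run sketch of snapshot replication: prints the commands, runs nothing.
# All names below are hypothetical placeholders.
DATASET="vol/critical"     # dataset holding the irreplaceable stuff
REMOTE="backup-nas"        # secondary FreeNAS box, reachable over the VPN
TARGET="backup/critical"   # dataset on the secondary box

snap_name() {
    # $1 = dataset, $2 = date stamp -> dataset@auto-YYYY-MM-DD
    printf '%s@auto-%s\n' "$1" "$2"
}

full_send_cmd() {
    # Initial seed: send the whole snapshot (done while the box is still local).
    # $1 = snapshot, $2 = remote host, $3 = target dataset
    printf 'zfs send -R %s | ssh %s zfs receive -F %s\n' "$1" "$2" "$3"
}

incr_send_cmd() {
    # Later runs: only send what changed between two snapshots.
    # $1 = previous snapshot, $2 = new snapshot, $3 = remote host, $4 = target
    printf 'zfs send -R -i %s %s | ssh %s zfs receive -F %s\n' "$1" "$2" "$3" "$4"
}

first=$(snap_name "$DATASET" "2015-12-23")
next=$(snap_name "$DATASET" "2015-12-24")

printf 'zfs snapshot -r %s\n' "$first"
full_send_cmd "$first" "$REMOTE" "$TARGET"
printf 'zfs snapshot -r %s\n' "$next"
incr_send_cmd "$first" "$next" "$REMOTE" "$TARGET"
```

The point of doing the full send first is that only the incremental sends have to cross the VPN once the box lives at AJ’s.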

 

 

3 thoughts on “Storage Problems (UPDATED!!!)”

  1. I’m about to embark on a NAS server build and came across this post. I thought about ZFS, RAID, SnapRAID, mergerfs and a bunch of other stuff. What was the issue with your ZFS pool? Anything to do with use of non-ECC memory? I keep hearing the back and forth on that.

    1. Ed,
      I do believe this had to do with using non-ECC memory. All the errors I ran into and researched indicated memory issues. I don’t know 100% that it was due to non-ECC, but if I asked a magic 8-ball whether it was non-ECC related, all signs would point to yes. My rebuild of the server has been with ECC memory and so far no issues. Let me know if you have any other questions. Would be happy to try to help.
