RAID5 to RAID1 conversion nuked array?

As title

Running inside Proxmox with the drives passed through. After reading about the problems with RAID5 I set a balance task going via the web UI to convert to RAID1. It was running OK, but at some point within the last 24 hours it looks like the VM crashed or panicked (it was sitting at the login prompt but unresponsive on the console and the network).

After restarting, the drives appear to be totally blank: fdisk shows no partitions, and gdisk run on the Proxmox host also shows no partitions.

Any ideas?

Also, any possibility of data recovery?

Thankfully I backed up pretty much everything I really cared about, but there's still a load of other data that would be nice to recover, as it would take a while to find again.

@Dragon2611 Firstly, I would say not to worry about not seeing any partitions on your data disks: Rockstor only uses whole-disk (i.e. non-partitioned) btrfs configurations for its data disks, so btrfs deals with the disk 'raw' and no partitions are expected. Secondly, I would say not to panic, as there are probably a few on the forum who have run into similar problems; however, I haven't seen this one myself, so I'll leave recovery advice to those with more experience of it.

But as a start it would be good to have more info, so could you first paste the output of the following command into this thread to help others help you (executed as the root user):

btrfs fi show

And any notes on how that output differs from what you expect.
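For reference, on a healthy pool the output looks roughly like the sketch below. The label, UUID, device names, and sizes here are entirely made up; what matters is that all your member devices are listed and none show as missing.

```shell
btrfs fi show
# Label: 'main_pool'  uuid: 1a2b3c4d-5e6f-7a8b-9c0d-1e2f3a4b5c6d
#         Total devices 3 FS bytes used 1.20TiB
#         devid 1 size 2.73TiB used 900.00GiB path /dev/sda
#         devid 2 size 2.73TiB used 900.00GiB path /dev/sdb
#         devid 3 size 2.73TiB used 900.00GiB path /dev/sdc
```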

Also what does the Rockstor Web-UI have to say on the state of the balance / pools / drives etc.
And what does a plain mount command return?
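Something along these lines, for example; the device name and mount point below are placeholders, so substitute whichever member device your pool uses:

```shell
# Try a plain read-only mount first; any one member device will do,
# as btrfs finds the rest of the pool from it.
mkdir -p /mnt/rescue
mount -o ro /dev/sdb /mnt/rescue

# If the mount fails or hangs, the kernel log usually says why:
dmesg | tail -n 30
```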


Well done on having backups, by the way, but I'd say it's definitely worth dabbling to see if you can bring this one back. I expect something triggered a kernel panic or memory exhaustion and left you halfway along the raid conversion, but it may very well be possible to recover from this.

Note however that, if I understand it correctly, you may only get one chance to mount read-write in this scenario in order to do a repair, so tread carefully with any manual commands, especially initially; after that your options for repair are reduced. There are still options to mount degraded and read-only, though, which may well allow you to retrieve the data before starting afresh on the RAID1 pool.
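As a sketch of the degraded/read-only fallback (device, mount point, and destination path here are placeholders for your own):

```shell
# If a plain mount fails, a degraded read-only mount may still let you
# copy data off before rebuilding anything. Read-only changes nothing
# on disk, so it is safe to try.
mount -o degraded,ro /dev/sdb /mnt/rescue

# While it's readable, copy off anything you want to keep:
rsync -a /mnt/rescue/ /path/to/other/disk/
```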

Hope that helps.


Trying to mount seems to hang the mount command; it's running btrfs rescue chunk-recover at the moment.

btrfs check (without any destructive options set) returned "couldn't open file system".

The btrfs-find-root command showed some inconsistencies.

Since I've already started the chunk-recover command I'll try and let it finish; if I've hosed the file system then whoopsie, as I said anything important should be backed up.
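For anyone following along, the rough order of the tools mentioned above looks like this; the device name is a placeholder, and chunk-recover can take many hours on a large pool as it scans the whole device:

```shell
# Read-only check - reports problems but changes nothing on disk:
btrfs check /dev/sdb

# Search for older tree roots if the current one is damaged:
btrfs-find-root /dev/sdb

# Rebuild the chunk tree by scanning the device (slow; run in screen/tmux):
btrfs rescue chunk-recover /dev/sdb
```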

The Duplicati backups from my PC/Mac were Syncthing'd off to 2 other nodes AND Duplicati'd again to Hubic for good measure.

Same goes for the VM snapshots/backups

The only thing that isn't liable to be backed up is essentially stuff that can be re-downloaded from source (even if it is a ballache).


I left the chunk-recover running overnight (in screen, as I was SSH'd in) and woke up this morning to find the FS mounted and the balance still running.

Not sure if anything got corrupted (I would guess whatever it was writing when it crashed), but at least it's there now.

Hmm, I'm not really sure what's going on with it: if I reboot etc. then the FS takes a very long time to mount, and btrfs-transaction is hitting a lot of CPU.

I'm now thinking it might just be the balance operation being very slow, possibly due to either fragmentation or the number of snapshots it has to deal with.
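If it is the balance, you can check on it, and pause or resume it, while the filesystem is mounted; the mount point below is a placeholder for wherever your pool is mounted:

```shell
btrfs balance status /mnt/rescue   # shows progress if a balance is running
btrfs balance pause  /mnt/rescue   # pause it to see if the CPU load drops
btrfs balance resume /mnt/rescue   # carry on once you're ready
```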