Hi All,
First of all, I’ll state that this BTRFS issue is from an OMV 32-bit installation, BTRFS version 4.20.1.
I’m very frustrated with the outcome of events, as I think I have lost 6.23 TiB of content.
I’m going to vent / document the process I went through here, as it might help others, or I might get some guidance on how to get access to my data again. My memory of what I started working on a couple of days ago is not that fresh, so I may not have the details quite right.
I had 2x 8TB drives in RAID 1 (mirror). One drive started to “fail” and caused performance issues, but I was still able to access the disks, etc.
Hence I set about converting the BTRFS from mirror to single. I did this by using OMV’s “Wipe” option on the drive that was failing, which was quick, so it might have just deleted the partition information. Then I could run the BTRFS convert to go from RAID1 to single. This completed after a day or so. (I don’t understand why it took so long.)
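(For reference, in case it helps anyone following along: as far as I know the conversion is normally done with a balance and convert filters, something like the below. The mount point is just an example, and -f may be needed if it refuses to reduce metadata redundancy.)
btrfs balance start -dconvert=single -mconvert=single /srv/my-btrfs-mount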
This was fine and continued to work. I had left it like this for a few days.
Eventually the drive that was “failing” started to cause I/O issues for the whole machine and I needed to pull it out. I tried power cycling and drive power-down commands, etc. to fix the I/O issues. Basically it had to be removed.
After this, the only way to mount the now-single BTRFS volume (I think it complained of a missing device) was to mount it in degraded mode.
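(For anyone searching later, by degraded mode I mean something like the below; /dev/sda1 is the device from my error messages further down and the mount point is just an example.)
mount -o degraded /dev/sda1 /srv/my-btrfs-mount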
This worked for a while, but I found it frustrating that it would not mount on boot.
I tried the command to remove the missing device. I think that after this it either said the missing device was the device in use, or it no longer indicated a missing device; I’m not sure now.
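(I believe the usual form of that command is the following, run against the degraded mount; the mount point is just an example.)
btrfs device remove missing /srv/my-btrfs-mount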
It still would not mount normally, only in degraded mode.
Typically a check-disk utility would be the “go-to” for other file systems, hence I started searching around.
At some point I ran:
btrfs inspect-internal dump-super
btrfs rescue super-recover
I tried btrfs rescue zero-log
https://btrfs.wiki.kernel.org/index.php/Manpage/btrfs-rescue
The man page indicates possible loss of data from the past 30 seconds, but since I had not written to the drive for a while and it was not mounted, I thought this should be safe.
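(For completeness, the full invocations would have been roughly as follows; /dev/sda1 is an assumption based on the error messages further down.)
btrfs inspect-internal dump-super /dev/sda1
btrfs rescue super-recover /dev/sda1
btrfs rescue zero-log /dev/sda1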
I ran btrfs check --check-data-csum, but after a long time of running with no error output, I cancelled it.
I ran btrfs check --init-extent-tree, but that ran for days and ended up “Aborted”, so I don’t know if it had done anything or was just reporting what it could do.
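(Again, roughly as follows; the device path is an example. As I understand it, check is read-only unless --repair or a rebuild option like --init-extent-tree is given.)
btrfs check --check-data-csum /dev/sda1
btrfs check --init-extent-tree /dev/sda1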
I found this man page for BTRFS:
https://www.systutorials.com/docs/linux/man/8-btrfs-check/
It indicated under SAFE OR ADVISORY OPTIONS that I could run this command:
btrfs check --clear-space-cache v1|v2
I read online that this space cache holds entries indicating where empty space on the drive exists, and is used to speed up free-space lookups. Something else I had read suggested to me that perhaps this could be the cause.
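(The invocations were along these lines; the device path is an example.)
btrfs check --clear-space-cache v1 /dev/sda1
btrfs check --clear-space-cache v2 /dev/sda1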
I ran v1, but that failed.
I ran v2 and it indicated nothing to do.
I ran v1 again and it indicated something different this time,
so I ran the command a couple more times.
Then I think it indicated an error.
After this, BTRFS would no longer mount the drive.
I really don’t understand what is wrong with BTRFS that a “safe” check tool can make things worse than they were.
This is the error output at the SSH prompt:
mount: /srv/dev-disk-by-id-ata-ST8000AS0002-1NA17Z_Z8409EZA-part1: wrong fs type, bad option, bad superblock on /dev/sda1, missing codepage or helper program, or other error.
The console indicates:
BTRFS error (device sda1) : block=7630187954176 read time tree block corruption detected
BTRFS error (device sda1) : failed to read block groups: -5
BTRFS error (device sda1) : open_ctree failed
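(One thing I’m considering, if anyone can confirm it’s sensible: a read-only mount using a backup tree root, something like the below. As I understand it this does not write to the disk, but I’d appreciate confirmation; the mount point is just an example.)
mount -o ro,usebackuproot /dev/sda1 /mnt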
I ran:
btrfs rescue super-recover
It responds with:
All supers are valid, no need to recover
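(At this point I’m wondering whether btrfs restore is my best bet for pulling files off without mounting; as I understand it, it only reads the source device. Something like the below, with a dry run first; the destination path is just an example.)
btrfs restore -D -v /dev/sda1 /path/to/recovery-destination
btrfs restore -v /dev/sda1 /path/to/recovery-destination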
BTW, regarding the “failed” drive: I removed the screws and cleaned the contacts on the PCB, and it now appears to work okay. It spins up and read checks are okay, but I have not written to the disk, so it might be possible to use it for some recovery if I can undo the “Wipe”.
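(To see what is actually left on that disk without writing to it, I’m thinking of listing the remaining signatures and superblock copies, something like the below; /dev/sdb is just a guess at the device name for the pulled drive.)
wipefs /dev/sdb
btrfs inspect-internal dump-super -a /dev/sdb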
Any suggestions?