We have a server with 6 TB in a RAID array as the base for btrfs
(we did not use btrfs RAID because of some problems with the RAID code in btrfs).
It ran mostly fine for a few months.
But over time btrfs-transacti became slow.
We had pauses on the whole array every day.
Then eventually it wasn’t reachable anymore. I’m new to btrfs so I didn’t know what was really happening, but btrfs-transacti started to eat all the memory and take a huge amount of time.
I discovered that there were more than 9000 snapshots but only 500 in Rockstor. It looks like the snapshot code in Rockstor timed out and didn’t save the snapshots in the database, so those snapshots were not visible in the web GUI.
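For anyone wanting to compare what btrfs itself sees with what the web GUI shows, a minimal sketch is below. It assumes the pool is mounted; `/mnt2/mypool` is a placeholder mount point and `count_snapshots` is an illustrative helper name, not a Rockstor command.

```shell
#!/bin/sh
# Illustrative helper (not part of Rockstor): count the snapshots that
# btrfs itself knows about on a mounted pool.
count_snapshots() {
    # "-s" limits the listing to snapshot subvolumes, one per line
    btrfs subvolume list -s "$1" | wc -l | tr -d ' '
}

# Example call; /mnt2/mypool is an assumed mount point:
#   count_snapshots /mnt2/mypool
```

If this number is much larger than the snapshot count in the GUI, the two have drifted apart as described above.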
It took a whole day to delete all these unknown snapshots with the server offline.
Unfortunately btrfs-transacti was still slow, with metadata used space around 80 GB.
We tried a metadata balance but the metadata used space didn’t shrink.
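A note for readers on why that number may not move: a balance rewrites chunks and can release space that is allocated but unused, yet it does not reduce the amount of metadata actually in use. A hedged sketch of checking the figures and then running a filtered metadata balance is below; the mount point and the 50% default threshold are assumptions, and `metadata_balance` is an illustrative name.

```shell
#!/bin/sh
# Illustrative helper: show allocation, then balance only those metadata
# chunks that are at most PCT percent full.
metadata_balance() {
    mnt="$1"          # mount point of the pool (placeholder)
    pct="${2:-50}"    # assumed default usage threshold
    # Prints "total" (allocated) and "used" for Data, Metadata and System
    btrfs filesystem df "$mnt" || return 1
    # "-musage=N" restricts the balance to metadata chunks <= N% full
    btrfs balance start -musage="$pct" "$mnt"
}

# Example call; /mnt2/mypool is an assumed mount point:
#   metadata_balance /mnt2/mypool 30
```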
Today I have 2 subvolumes with superblock errors on mount. Rockstor fails to mount these subvolumes and the whole array has become read-only.
For now we are using rsync to save the data from the working subvolumes.
I have read a lot of warnings about zeroing the log.
What can I do before this? btrfs rescue super-recover?
@PhilA To help other forum members chip in with advice / help, it would be good to see the output of the following command:
btrfs fi show
Obviously, after you have retrieved all that you need by your chosen method (rsync in this case), or possibly btrfs restore, you can have a go at repair, and I think it is best to start without the --repair switch.
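The order suggested here (get the data off first, then look without repairing) could be sketched roughly as follows. The device `/dev/sdX` and destination `/mnt/rescue` are placeholders, and `inspect_only` is a hypothetical helper name; both commands are meant to be run against an unmounted filesystem.

```shell
#!/bin/sh
# Illustrative helper: list what could be recovered, then check the
# filesystem without writing anything to it.
inspect_only() {
    dev="$1"     # e.g. /dev/sdX (placeholder)
    dest="$2"    # e.g. /mnt/rescue (placeholder, on a different disk)
    # -D is a dry run: files are listed, nothing is copied
    btrfs restore -D -v "$dev" "$dest" || echo "restore dry run reported errors" >&2
    # --readonly is the default mode of btrfs check; stated explicitly here
    btrfs check --readonly "$dev"
}

# Example call with placeholder arguments:
#   inspect_only /dev/sdX /mnt/rescue
```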
9000 snapshots is starting to push practical limits, so this can explain the earlier slowdowns in part. Also, I’m not that clear on the current state of the snapshot count.
When you say:
[quote="PhilA, post:1, topic:2776"]
Rockstor fail to mount this subvolume and the whole array became readonly.
[/quote]
Do you mean that the entire pool is now read only?
Also what btrfs raid level were you using?
Some news: I was able to btrfs restore more than 99% of my data (it took more than 24 hours), but some important files are missing.
I tried btrfs check but it failed with a segfault.
btrfs rescue failed with a segfault.
btrfs rescue zero-log didn’t solve the problem (but did not segfault).
btrfs-find-root was consuming 100% CPU for hours.
Finally I tried --init-csum-tree, which also segfaulted.