Multiple problems

PhilA · February 3, 2017, 1:48pm

Hi all,

We have a server with 6 TB in a RAID array as the base for btrfs
(because of some problems with the RAID code in btrfs).

It ran mostly fine for a few months.
But over time btrfs-transacti became slow.
We had pauses on the whole array every day.
Then eventually it wasn’t reachable anymore. I’m new to btrfs so I didn’t know what was really happening, but btrfs-transacti started to eat all the memory and take a huge amount of time.
I discovered that there were more than 9000 snapshots but only 500 in Rockstor. It looks like the snapshot code in Rockstor timed out and didn’t save the snapshots in the database, so those snapshots were not visible in the web GUI.
It took a whole day to delete all these unknown snapshots with the server offline.
Unfortunately btrfs-transacti was still slow, with metadata used space around 80 GB.
We tried a metadata balance but the metadata used space didn’t shrink.
Today I have 2 subvolumes with superblock errors on mount. Rockstor fails to mount these subvolumes and the whole array became read-only.
For now we are running rsync to save the data from the working subvolumes.
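
For anyone hitting the same mismatch between the snapshots on disk and the ones the web GUI knows about, here is a minimal sketch of how the two lists could be compared. It assumes the usual line layout printed by `btrfs subvolume list -s` (each line ending in ` path <snapshot path>`); the sample lines, the snapshot names and the `known_names` set (standing in for whatever names the Rockstor database holds) are hypothetical, for illustration only.

```python
# Hypothetical sample in the layout printed by `btrfs subvolume list -s`.
SAMPLE_LIST = """\
ID 260 gen 15 cgen 15 top level 5 otime 2017-01-01 00:00:00 path .snapshots/share1/snap_a
ID 261 gen 16 cgen 16 top level 5 otime 2017-01-02 00:00:00 path .snapshots/share1/snap_b
"""


def snapshot_paths(list_output):
    """Return the snapshot paths found in `btrfs subvolume list -s` output."""
    paths = []
    for line in list_output.splitlines():
        # Everything after the first " path " marker is the snapshot path.
        _, sep, path = line.partition(" path ")
        if sep:
            paths.append(path.strip())
    return paths


def untracked(list_output, known_names):
    """Return on-disk snapshot paths whose final component is not in known_names."""
    return [
        path
        for path in snapshot_paths(list_output)
        if path.rsplit("/", 1)[-1] not in known_names
    ]


if __name__ == "__main__":
    # known_names is an assumption: names exported from the GUI or database.
    for path in untracked(SAMPLE_LIST, {"snap_a"}):
        print(path)
```

The sketch only reports candidates; anything it lists should be checked by hand before deleting it.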

I have read a lot of warnings about zeroing the log.
What can I do before that? btrfs rescue super-recover?

Any help is welcome.
Regards

Flyer · February 3, 2017, 3:35pm

Hi @PhilA and welcome to Rockstor (although it’s while having issues)

We’re talking about Btrfs so our first reference is Btrfs documentation:

hopefully this will help https://btrfs.wiki.kernel.org/index.php/Gotchas#Fragmentation

Before going on with defrag, did you try to perform a scrub (instead of a balance)?

Asking @phillxnet and @suman to pitch in too

Mirko

PhilA · February 3, 2017, 3:43pm

Thanks for your reply Flyer.
My main problem for now is the failed mount of my subvolumes and, if I’m right, a scrub or balance won’t help with that.

Flyer · February 3, 2017, 4:02pm

Sorry @PhilA, you’re right

Did you try a
btrfs check --repair your_device ?

phillxnet · February 3, 2017, 6:05pm

@PhilA To help other forum members chip in with advice / help it would be good to see the output of the following command:

btrfs fi show

Obviously, once you have retrieved all that you need by your chosen method (rsync in this case, or possibly btrfs restore), you can have a go at repair; I think it is best to start without the --repair switch.

9000 snapshots is rather starting to push practical limits so this can explain the earlier slowdowns in part. Also I’m not that clear on the current state of the snapshot count.

When you say:[quote=“PhilA, post:1, topic:2776”]
Rockstor fail to mount this subvolume and the whole array became readonly.
[/quote]
Do you mean that the entire pool is now read only?
Also what btrfs raid level were you using?

PhilA · February 3, 2017, 6:38pm

#btrfs fi df /mnt2/DataPool/
Data, single: total=5.55TiB, used=5.45TiB
System, single: total=4.00MiB, used=752.00KiB
Metadata, single: total=87.01GiB, used=85.63GiB
GlobalReserve, single: total=512.00MiB, used=26.42MiB

btrfs fi show /mnt2/DataPool/

Label: ‘DataPool’ uuid: a14b6bba-2d8f-4b66-9fab-2bdbad64eeae
Total devices 1 FS bytes used 5.54TiB
devid 1 size 6.36TiB used 5.64TiB path /dev/sdb

btrfs subvolume list -s /mnt2/DataPool/ | wc -l

427

Yes, the pool is read-only.
We use hardware RAID because of problems with the RAID levels in btrfs.

thanks
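
As an aside for readers, the `btrfs fi df` figures above can be turned into usage ratios to see how full each allocation is. This is a minimal sketch that assumes the exact line layout shown in this post; the function and variable names are made up for illustration.

```python
import re

# Output copied from the `btrfs fi df /mnt2/DataPool/` listing in this post.
SAMPLE = """\
Data, single: total=5.55TiB, used=5.45TiB
System, single: total=4.00MiB, used=752.00KiB
Metadata, single: total=87.01GiB, used=85.63GiB
GlobalReserve, single: total=512.00MiB, used=26.42MiB
"""

UNITS = {"B": 1, "KiB": 1024, "MiB": 1024**2, "GiB": 1024**3, "TiB": 1024**4}
LINE_RE = re.compile(
    r"^(\w+), (\w+): total=([\d.]+)(\w+), used=([\d.]+)(\w+)$"
)


def to_bytes(value, unit):
    """Convert a number and a binary unit suffix (e.g. '5.55', 'TiB') to bytes."""
    return float(value) * UNITS[unit]


def parse_fi_df(text):
    """Map each block group type to its profile, total, used and used/total ratio."""
    usage = {}
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # skip anything that is not a usage line
        kind, profile, total_v, total_u, used_v, used_u = match.groups()
        total = to_bytes(total_v, total_u)
        used = to_bytes(used_v, used_u)
        usage[kind] = {
            "profile": profile,
            "total": total,
            "used": used,
            "ratio": used / total if total else 0.0,
        }
    return usage


if __name__ == "__main__":
    for kind, info in parse_fi_df(SAMPLE).items():
        print(f"{kind}: {info['ratio']:.1%} of allocated space used")
```

With the numbers posted here, both the data and the metadata allocations come out at roughly 98% used.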

PhilA · February 6, 2017, 9:49am

Some news: I was able to btrfs restore more than 99% of my data (it took more than 24 hours), but some important files are missing.

I tried btrfs check but it fails with a segfault.
btrfs rescue fails with a segfault.
btrfs rescue zero-log didn’t solve the problem (but did not segfault).
btrfs-find-root was consuming 100% CPU for hours.
Finally I tried --init-csum-tree -> segfault

Flyer · February 6, 2017, 11:26am

btrfs segfaults -> I suspect our current 4.8.3 version, @phillxnet (remember my tests with >=4.8.4 fixing other segfaults)

@suman can we update btrfs-progs?

Mirko