Scrub only works when manually initiated

IanO · March 3, 2018, 7:13pm

I have a share that generally appears to be in fine shape. I ran btrfs check --repair successfully, and no errors remain. However, every time I run a scrub, something happens several hours in that effectively crashes the server, and I have to do a hard reboot.

However, if I run a scrub on the same file system mounted independently at a mount point of my own creation, instead of the automated /mnt2/… mount, the scrub apparently runs to completion in about 6 hours, with no impact on the server.

Any ideas what might be the issue? I have other file systems of similar and larger size on the same server that run unattended scrubs without issue.

IanO · March 7, 2018, 9:48pm

I have a slight modification to make to the scenario: for one specific disk, running the scrub manually seems to work through to a successful conclusion. When the scrub is scheduled, it does not, for some reason.

Also, I have done three manually initiated scrubs on this same disk today. Each time it runs, it finds a small number of read errors and reports that it corrected them all. The next time it runs, another small number of read errors is found (with a slightly different number of bytes involved). Any idea what that means?

Haioken · March 8, 2018, 12:37am

@IanO

Not really, though crashing the system is obviously not ideal.
I’ve had no such issues on my own system. In the Rockstor UI, can you provide the details of one of the scrub events that has failed, requiring a reboot?
Also, take a look at the Rockstor logs and /var/log/messages around the time of one of the crashes, and see if anything there raises alarms.
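
As a rough aid for that log check, here is a minimal Python sketch that pulls out lines from the half hour before a crash. It assumes /var/log/messages uses the traditional syslog timestamp prefix (for example `Mar  7 21:48:01`, with no year), and the keyword list and the function name `suspicious_lines` are my own guesses, not anything Rockstor provides. Adjust both to your system.

```python
import re
from datetime import datetime, timedelta

# Assumed keywords; extend with whatever your kernel actually logs.
KEYWORDS = re.compile(r"btrfs|i/o error|call trace|out of memory", re.IGNORECASE)

def suspicious_lines(lines, crash_time, window_minutes=30):
    """Return lines logged in the window before crash_time that match KEYWORDS.

    Assumes each line starts with a 15-character syslog timestamp. That
    format has no year, so the year of crash_time is used.
    """
    start = crash_time - timedelta(minutes=window_minutes)
    hits = []
    for line in lines:
        try:
            stamp = datetime.strptime(
                f"{crash_time.year} {line[:15]}", "%Y %b %d %H:%M:%S"
            )
        except ValueError:
            continue  # not a timestamped line, skip it
        if start <= stamp <= crash_time and KEYWORDS.search(line):
            hits.append(line.rstrip("\n"))
    return hits

# Example use (run as root; the crash time here is a placeholder):
# with open("/var/log/messages", errors="replace") as log:
#     for hit in suspicious_lines(log, datetime(2018, 3, 7, 21, 48)):
#         print(hit)
```

If the crash takes the machine down hard, the last entries may never reach disk, so an empty result does not rule anything out.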

Sounds like a dying disk - how many read errors / bytes are you seeing each time?
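
To put numbers on that, one option is to snapshot the per-device error counters from `btrfs device stats` before and after each scrub and compare them. The sketch below is only an illustration: it assumes the output consists of lines such as `[/dev/sdb].read_io_errs 0`, and the helper names and the mount point in the usage comment are hypothetical.

```python
import re
import subprocess

# Assumed line layout of `btrfs device stats`: "[/dev/sdb].read_io_errs   0"
STAT_LINE = re.compile(r"^\[(?P<dev>[^\]]+)\]\.(?P<name>\w+)\s+(?P<value>\d+)$")

def parse_device_stats(text):
    """Parse `btrfs device stats` output into {device: {counter: value}}."""
    stats = {}
    for line in text.splitlines():
        match = STAT_LINE.match(line.strip())
        if match:
            stats.setdefault(match["dev"], {})[match["name"]] = int(match["value"])
    return stats

def read_device_stats(mount_point):
    """Run `btrfs device stats` on a mounted file system (needs root)."""
    result = subprocess.run(
        ["btrfs", "device", "stats", mount_point],
        check=True, capture_output=True, text=True,
    )
    return parse_device_stats(result.stdout)

def changed_counters(before, after):
    """Return {(device, counter): delta} for every counter that changed."""
    changes = {}
    for dev, counters in after.items():
        for name, value in counters.items():
            delta = value - before.get(dev, {}).get(name, 0)
            if delta:
                changes[(dev, name)] = delta
    return changes

# Example use (hypothetical mount point):
# before = read_device_stats("/mnt/scrubtest")
# ... run the scrub ...
# print(changed_counters(before, read_device_stats("/mnt/scrubtest")))
```

These counters are cumulative, so a read error count that keeps climbing between scrubs points at the device itself rather than at one bad block that has already been repaired.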

IanO · March 12, 2018, 2:06am

Thanks for the response. It's funny, the latest scrub had no errors, although I will be replacing that disk anyway, since I have lost my trust in it. This is the only disk (RAID1) that I do not have a backup for, and I am sure there are some files missing. Each time it ran previously, it would report between 26 and 64 bytes of read errors.

I will look in the log files in more detail. Thanks.