I started doing some tests on this setup. The first use case is what happens if I unplug a drive while the system is running, and then plug it back in.
The first test was to “accidentally” unplug one of the drives. Unfortunately, the whole system became unresponsive: no response from the Web UI, the console, or an SSH shell, and the host did not even respond to ping. On the console, it was simply stuck at the shell login prompt.
I then hard-reset the host (with that drive still unplugged). After that, I could see the disk as “detached” and the pool became unusable. Then I plugged the drive back in, and after a while the whole pool and its shares were back, looking as if nothing had happened.
And because the whole pool was not writable while the second drive was unplugged, there was actually no way for the two drives to get out of sync.
The second use case is similar to the first one, but this time I would format the unplugged drive on another server to see what happens.
The interesting thing is that on the second try, when I unplugged the drive, the system did not hang, and the pool was still usable with one of the drives shown as detached. So I wrote some files to it, and even ran a scrub.
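For reference, a scrub can also be started and monitored from the command line. The mount point below is an assumption; substitute your pool's actual path:

```shell
# Start a scrub on the (degraded) pool.
# /mnt2/mypool is a hypothetical mount point -- adjust to yours.
btrfs scrub start /mnt2/mypool

# Check progress and any error counts found so far.
btrfs scrub status /mnt2/mypool
```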
Then I formatted the unplugged drive on another computer and plugged it back in. Unfortunately, I didn’t see any errors in the Web UI; it told me the disk was back in the pool and everything was fine, but there were actually tons of errors shown on the host console.
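Those console errors should also be visible in btrfs's per-device error counters, which is a quicker way to check than watching the console. Again, the mount point is an assumption:

```shell
# Show read/write/flush/corruption/generation error counters
# for every device in the pool (hypothetical mount point).
btrfs device stats /mnt2/mypool
```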
In the end, I used “btrfs replace” to rebuild my array without any data loss.
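A sketch of what that looks like, with hypothetical device names and mount point (here device id 2 is the slot of the formatted drive and /dev/sdc is the replacement):

```shell
# Replace the missing/bad device (by its btrfs device id) with a new one.
# Device id, target device, and mount point are assumptions for illustration.
btrfs replace start 2 /dev/sdc /mnt2/mypool

# Watch the rebuild until it reports "finished".
btrfs replace status /mnt2/mypool
```

Using the device id instead of a source device path is handy here, since the original device is no longer usable as a source.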
After this test, I’d like to make some suggestions to Rockstor:
- If a disk fails, there should be a clear notification or alert shown in the UI, rather than letting it go unnoticed.
- There should be Web UI support for replacing disks in a pool. (I also evaluated OpenNAS, which supports this feature well.) I guess this is already on the roadmap :)