I know on FreeNAS there’s a way to attempt fixing the sectors and, if they don’t fix, rebuild the pool skipping them. Is there a way to do something similar on Rockstor? I am on raid 1+0, so I suppose a disk failure wouldn’t be the worst thing, but I like fixing errors.
@coleberhorst Hello again.
In this case the error is hardware based, and given the number of sectors involved I would say this is a very poorly drive that needs to be replaced tout de suite, especially as those sectors are already identified as uncorrectable. The ‘fix’ method for asserting the correctness of stored data, in both ZFS and btrfs, is the scrub, but here you are well beyond the filesystem level and clearly in very poorly hardware territory. Look instead to the (currently command line only) method of removing or replacing this drive within its pool while, since the drive is known dodgy, treating it as read only. Note that if the pool is currently at the minimum drive count for its btrfs raid level then you will have to perform an in-place ‘replace’, and as the drive looks very poorly it would be advisable to treat it as read only, i.e. the ‘-r’ switch detailed in our following open issue:
This matters because if you drop below the minimum drive count for a given btrfs raid level you only get one chance to mount the pool in degraded mode in which to fix the issue. But it seems your pool is still holding up, so currently your options are broader. The minimum drive count for btrfs raid 10 is 4, so you would need 5 drives if you were to remove this device from the pool by simply using the Rockstor UI resize pool - remove drive option; however, that will exercise the existing poorly drive quite a bit which, given its current report, would be inadvisable. Likewise, adding an additional drive to raise the pool above the minimum count would also exercise the drive quite a bit. Hence the replace -r suggestion.
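To see where the pool currently stands before choosing between remove, add, or replace, the usual btrfs tools can be queried directly. A minimal sketch, assuming the pool is mounted at /mnt2/mypool (Rockstor mounts pools under /mnt2/&lt;poolname&gt;; the pool name here is made up):

```
# List all btrfs filesystems, their member devices and devids
btrfs filesystem show

# Show the Data / Metadata profiles (e.g. RAID10) and per-device usage
btrfs filesystem usage /mnt2/mypool
```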
If the drive ends up failing outright and drops out of existence from btrfs’s point of view, then the following issue details the procedure required at that point:
But you are not at this stage just yet.
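For future reference only, as you are not at that stage: the procedure in that issue looks roughly like the sketch below. The device names, devid placeholder, and mount point are all assumptions here, so follow the linked issue for the real steps:

```
# Mount the pool read-write in degraded mode (the one-shot opportunity
# when below minimum device count)
mount -o degraded /dev/sdb /mnt2/mypool

# Identify the devid flagged as missing
btrfs filesystem show /mnt2/mypool

# Replace the missing device (referenced by its devid) with the new drive
btrfs replace start <missing-devid> /dev/sde /mnt2/mypool
```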
So I would say you need to do a replace (with the read only ‘-r’ switch) of this disk within its existing pool, to get it out with the minimum of exercise.
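A minimal sketch of that replace, purely as an illustration: the device names and mount point below are assumptions (the failing drive as /dev/sdd, its replacement as /dev/sde, the pool mounted at /mnt2/mypool), so substitute your own from the Rockstor UI / `btrfs fi show` output:

```
# Replace the failing drive in-place; -r avoids reading from it unless
# no other good copy exists, keeping wear on the poorly drive to a minimum
btrfs replace start -r /dev/sdd /dev/sde /mnt2/mypool

# Monitor progress until it reports finished
btrfs replace status /mnt2/mypool
```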
Yes, on this note your situation is not a good candidate, but modern drives do have a degree of self-healing capability that is often not triggered until the problematic sector is written to. As a consequence there are texts suggesting that a full disk write will force all faulty / bad sectors to undergo the ‘auto replace with spare sectors’ procedure built into drives. However, although I have done this on quite a few drives, they have pretty much all produced additional bad sectors shortly thereafter, and that was with less than a handful of bad sectors. You have > 65000 and they are already marked as uncorrectable, so yes, kid glove time: step very carefully and be sure to understand the commands required.
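For completeness, the ‘full disk write’ those texts describe usually boils down to something like the following; it is entirely destructive and only ever done on a drive already removed from any pool. /dev/sdd is an assumption and, as above, not something I would recommend for this particular drive:

```
# Destructive write-mode surface test: writes patterns over the whole disk,
# giving the firmware a chance to remap bad sectors to spares
badblocks -wsv /dev/sdd

# Then re-check the reallocated / pending sector counts
smartctl -A /dev/sdd
```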
It is also possible to manually mark / remap individual sectors (rather than via the auto method just described), but you already have > 65000 !!
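For the curious, that manual route is typically a per-sector affair with hdparm, which makes it a non-starter at your counts. A sketch only, with the LBA and device made up (the failing LBA normally shows up in dmesg or in `smartctl -l selftest` output):

```
# Confirm the sector really is unreadable
hdparm --read-sector 123456789 /dev/sdd

# Overwrite it with zeros, prompting the drive to remap it to a spare
hdparm --write-sector 123456789 --yes-i-know-what-i-am-doing /dev/sdd
```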
Thanks as always for your very detailed answers and help.
After closer investigation there is also a SMART error showing the cache is failing and throwing parity errors, so this drive seems close to dead, on top of having roughly 1/8th of its sectors unreadable. I will do as you advise and remove it, probably replacing it within a week or so with a new 6TB one.
Current config is raid 10 with 2TB x 3, 4TB, 6TB; the new one will be 2TB x 2, 4TB, 6TB x 2. As you said, my hardware isn’t the best: I’m using my old desktop and some of the drives are quite old.