Hi,
I have a RAID 1 pool (Pool_1) with two 3 TB WDC disks, one of which has some errors, and I want to replace the damaged disk.
The original discs were:
Name: ata-WDC_WD30EFRX-68EUZN0_WD-WCC4N4RUA75L
Temp Name: sdb
Btrfs DevID: 1
Capacity: 2.73 TB
Allocated (%): 2.15 TB (78.8%)
Write I/O errors: 18446744072015792559
Read I/O errors: 1302366094
Flush I/O errors: 7226
Corruption errors: 87367315
Generation errors: 27351

Name: ata-WDC_WD30EFRX-68EUZN0_WD-WCC4N7LX5ZLP
Temp Name: sda
Btrfs DevID: 2
Capacity: 2.73 TB
Allocated (%): 2.15 TB (78.8%)
Write I/O errors: 0
Read I/O errors: 6
Flush I/O errors: 0
Corruption errors: 494450
Generation errors: 0
I added a new disc of the same size (ata-WDC_WD30EFRX-68EUZN0_WD-WCC4N3VD7U1N), so I now have three discs available for Pool_1.
I then ran a balance so that data would be written to the new disk.
The balance did not complete. I received the following error after the balance reached 30%:
Error running a command. cmd = btrfs balance start -mconvert=raid1 -dconvert=raid1 /mnt2/Pool_1. rc = 1. stdout = ['']. stderr = ["ERROR: error during balancing '/mnt2/Pool_1': Input/output error", 'There may be more info in syslog - try dmesg | tail', '']
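To see which device actually threw the Input/output error during the balance, the hint at the end of that message is worth following; a quick look (assuming the same /mnt2/Pool_1 pool, on a systemd-based install):

    # Kernel messages around the failure usually name the offending device
    dmesg | tail -n 50

    # The same messages can also be searched in the journal
    journalctl -k | grep -i btrfs | tail -n 50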
The question is:
Should I remove or spin down the disc with errors and then rebalance from the one non-damaged disk, or …
Since this event I have also attempted to remove the damaged disc using the Resize/Reraid pool option.
The error message was:
Removing disks ([u'ata-WDC_WD30EFRX-68EUZN0_WD-WCC4N4RUA75L']) may shrink the pool by 2305826816 KB, which is greater than available free space 2736748636 KB. This is not supported.
Traceback (most recent call last):
  File "/opt/rockstor/eggs/gunicorn-19.7.1-py2.7.egg/gunicorn/workers/sync.py", line 68, in run_for_one
    self.accept(listener)
  File "/opt/rockstor/eggs/gunicorn-19.7.1-py2.7.egg/gunicorn/workers/sync.py", line 27, in accept
    client, addr = listener.accept()
  File "/usr/lib64/python2.7/socket.py", line 202, in accept
    sock, addr = self._sock.accept()
error: [Errno 11] Resource temporarily unavailable
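Before retrying a device removal it can help to confirm how much allocated data the remaining devices would have to absorb; a rough check, using the mount point from the error above:

    # Allocated vs. unallocated space, broken down per device
    btrfs filesystem usage /mnt2/Pool_1

    # Data / metadata / system usage for the pool as a whole
    btrfs filesystem df /mnt2/Pool_1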
It may help if we knew which version of Rockstor you are using; to be sure, please paste the output of:
yum info rockstor
A complication here is that both drives have read I/O and corruption errors logged.
Also, do you have the replacement drive available, and do you have the option to attach it simultaneously alongside the existing two drives? This expands your options, and knowing it will help folks on the forum suggest possible scenarios.
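If all three drives can be online at once, one avenue worth knowing about is btrfs's built-in replace, which copies from the failing device and falls back to the good mirror for unreadable blocks. This is only a sketch: the DevID is taken from the table above, the by-id path of the new drive is assumed from the earlier post, and on a Rockstor box the Web-UI is normally the supported route.

    # Replace failing DevID 1 with the new drive (device path assumed)
    btrfs replace start 1 /dev/disk/by-id/ata-WDC_WD30EFRX-68EUZN0_WD-WCC4N3VD7U1N /mnt2/Pool_1

    # Watch the copy progress
    btrfs replace status /mnt2/Pool_1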
@Beachrock Thanks for the update. Always best to copy data / refresh backups from suspect pools, just in case.
OK, that's fairly new, but it could also do with updating (unless you depend on AFP, which was removed in 3.9.2-56); let me know via PM here on the forum if you aren't offered 3.9.2-57 for some reason. There are no disk/pool management improvements between -55 and -57, though, so -55 is fine for your current endeavour. And do keep in mind that if this is what the Web-UI is displaying, you could still be subject to the bug described in:
So the yum info is the one to trust.
But there has been a long-awaited disk/pool management improvement in 3.9.2-60 of the 'Built on openSUSE' testing channel, and we are soon to release a new installer that will mark our next stable channel release. Worth keeping in mind when the next re-install comes around.
Hope that helps and do keep us posted on how it goes.
Copied the files in Pool_1 to another pool as a backup.
To get past the error about not being able to remove the problem disk because of size restrictions, I deleted some non-essential files from Pool_1 to reduce the pool's usage.
Conducted a scrub of Pool_1.
Conducted the removal of the problem disc (the rebalance took forever); a rough CLI equivalent is sketched below.
Shut down and removed the problem disc.
Restarted and all is functional. Data in Pool_1 is intact.
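For anyone finding this later, here is a rough command-line sketch of what those last steps amount to underneath (the Rockstor Web-UI did the actual work here; the device path is the failing disk's by-id name from the table above, so treat this only as an illustration):

    # Scrub: verify checksums and repair from the good mirror where possible
    btrfs scrub start /mnt2/Pool_1
    btrfs scrub status /mnt2/Pool_1

    # Remove the failing device; btrfs relocates its data to the remaining
    # devices first, which is why this step can take a very long time
    btrfs device remove /dev/disk/by-id/ata-WDC_WD30EFRX-68EUZN0_WD-WCC4N4RUA75L /mnt2/Pool_1

    # Confirm the remaining devices and space layout before shutting down
    btrfs filesystem show /mnt2/Pool_1
    btrfs filesystem usage /mnt2/Pool_1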