Challenge when balancing for disk change out

Hi,
I have a RAID 1 pool (pool_1) made up of two 3 TB WDC disks, one of which is logging errors, and I want to replace the damaged disk.

The original disks were:

| Name | Temp Name | Btrfs DevID | Capacity | Allocated (%) | Write I/O errors | Read I/O errors | Flush I/O errors | Corruption errors | Generation errors |
|---|---|---|---|---|---|---|---|---|---|
| ata-WDC_WD30EFRX-68EUZN0_WD-WCC4N4RUA75L | sdb | 1 | 2.73 TB | 2.15 TB (78.8%) | 18446744072015792559 | 1302366094 | 7226 | 87367315 | 27351 |
| ata-WDC_WD30EFRX-68EUZN0_WD-WCC4N7LX5ZLP | sda | 2 | 2.73 TB | 2.15 TB (78.8%) | 0 | 6 | 0 | 494450 | 0 |
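For reference, the same per-device counters can be read directly from btrfs on the command line; a minimal check, assuming the pool is mounted at /mnt2/Pool_1 as in the balance error further down:

    # Per-device error counters (write/read/flush/corruption/generation)
    # for the pool; the mount path is assumed from the error message below.
    btrfs device stats /mnt2/Pool_1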

I added a new disk of the same size, ata-WDC_WD30EFRX-68EUZN0_WD-WCC4N3VD7U1N, so I now have three disks available for pool_1.

I then ran a balance to spread data onto the new disk ata-WDC_WD30EFRX-68EUZN0_WD-WCC4N3VD7U1N.

The balance did not complete. I received the following error after the balance reached about 30%:
    Error running a command. cmd = btrfs balance start -mconvert=raid1 -dconvert=raid1 /mnt2/Pool_1. rc = 1. stdout = ['']. stderr = ["ERROR: error during balancing '/mnt2/Pool_1': Input/output error", 'There may be more info in syslog - try dmesg | tail', '']
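As the error itself suggests, the underlying failure should show up in the kernel log. Something along these lines should surface it (the grep filter is just an assumption to cut down noise, not captured output):

    # Recent kernel messages; btrfs logs which device and block failed.
    dmesg | tail -n 50

    # Or narrow it down to btrfs-related lines if the log is busy.
    dmesg | grep -i btrfs | tail -n 20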

The question is:

  1. Should I remove or spin down the disk with errors and then rebalance from the single undamaged disk, or
  2. have I created a bigger problem for myself?

Any ideas are welcome.

Cheers.

Since this event I have also attempted to remove the damaged disk using the Resize/ReRaid pool option.

The error message was:

Removing disks ([u'ata-WDC_WD30EFRX-68EUZN0_WD-WCC4N4RUA75L']) may shrink the pool by 2305826816 KB, which is greater than available free space 2736748636 KB. This is not supported.

    Traceback (most recent call last):
      File "/opt/rockstor/eggs/gunicorn-19.7.1-py2.7.egg/gunicorn/workers/sync.py", line 68, in run_for_one
        self.accept(listener)
      File "/opt/rockstor/eggs/gunicorn-19.7.1-py2.7.egg/gunicorn/workers/sync.py", line 27, in accept
        client, addr = listener.accept()
      File "/usr/lib64/python2.7/socket.py", line 202, in accept
        sock, addr = self._sock.accept()
    error: [Errno 11] Resource temporarily unavailable

Sort of stuck …
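For what it's worth, the free-space check that tripped here can be sanity-checked from the command line; a sketch only, assuming the same /mnt2/Pool_1 mount point from the earlier error:

    # Overall and per-device space usage; shows how much data would have to
    # be migrated off a device before btrfs could release it.
    btrfs filesystem usage /mnt2/Pool_1

    # Per-device allocation for the same pool.
    btrfs filesystem show /mnt2/Pool_1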

@Beachrock Hello again.

It would help to know which version of Rockstor you are using. To be sure, could you paste the output of:

yum info rockstor

A complication here is that both drives have read I/O and corruption errors logged.

Also, do you have the replacement drive available, and do you have the option to attach it simultaneously alongside the existing 2 drives? This expands your options, and knowing it will help folks on the forum suggest possible scenarios.
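If the new drive can be attached alongside the existing two, btrfs also offers a live replace, which copies data from the failing device onto the new one without a separate remove-and-balance cycle. A sketch only, with placeholder device paths (confirm the right paths with btrfs filesystem show before trying anything like this on a pool with errors on both drives):

    # Replace the failing device with the new one while the pool stays mounted.
    # /dev/sdX stands for the failing device, /dev/sdY for the new disk.
    # -r avoids reading from the failing device where another good mirror exists.
    btrfs replace start -r /dev/sdX /dev/sdY /mnt2/Pool_1

    # Check progress of the running replace.
    btrfs replace status /mnt2/Pool_1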

Hope that helps.


Hi Phil,

The current rockstor version is: 3.9.2-55.

I could also buy another drive and create another pool and then copy pool_1 to the new pool.

After that I can delete the pool_1 contents and remove the damaged disk from pool_1.

That might be quicker.

Phil,
I have a spare pool (apple_pool) with RAID 1 enabled that is 3.64 TB with only 86 GB used.
Pool_1 only uses 2.55 TB.

I am copying Pool_1 to apple_pool and then will sort out the mess that is pool_1.
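In case it is useful to anyone later, a copy like this can be done from the shell roughly as below; the paths and target directory name are assumptions (both pools assumed mounted under /mnt2):

    # Copy everything from Pool_1 into a backup directory on apple_pool,
    # preserving permissions, hard links, ACLs and extended attributes.
    mkdir -p /mnt2/apple_pool/pool_1_backup
    rsync -aHAX --info=progress2 /mnt2/Pool_1/ /mnt2/apple_pool/pool_1_backup/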
David


@Beachrock Thanks for the update. Always best to copy data / refresh backups from suspect pools, just in case.

OK, that's fairly new, but it could also do with updating (unless you depend on AFP, which was removed in 3.9.2-56); let me know via PM here on the forum if you aren't offered 3.9.2-57 for some reason. There are no disk/pool management improvements between -55 and -57, so -55 is fine for your current endeavour. And do keep in mind that if this is what the Web-UI is displaying, you could still be subject to the bug described in:

So the yum info is the one to trust.

But there has been a long-awaited disk/pool management improvement in the 3.9.2-60 'Built on openSUSE' testing channel, and we are soon to release a new installer which will signify our next stable channel release. Worth keeping in mind when the next re-install comes around.

Hope that helps and do keep us posted on how it goes.

This is now closed:

Steps taken:

  1. Copied the files in Pool_1 to another pool as a backup.
  2. To get past the error about not being able to remove the problem disk because of size restrictions, I removed some unnecessary files from Pool_1 to reduce its usage.
  3. Ran a scrub of Pool_1.
  4. Removed the problem disk (the rebalance took a very long time); a rough CLI equivalent is sketched below.
  5. Shut down and physically removed the problem disk.
  6. Restarted; everything is functional and the data in Pool_1 is intact.
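For anyone finding this later, steps 3 and 4 map roughly onto the following btrfs commands; the mount path and by-id device path are assumptions taken from earlier in the thread, and the Web-UI is what actually drove them here:

    # Step 3: scrub the pool and watch its progress.
    btrfs scrub start /mnt2/Pool_1
    btrfs scrub status /mnt2/Pool_1

    # Step 4: remove the problem device; btrfs migrates its data onto the
    # remaining devices before releasing it, which is why it takes so long.
    btrfs device remove /dev/disk/by-id/ata-WDC_WD30EFRX-68EUZN0_WD-WCC4N4RUA75L /mnt2/Pool_1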

Thanks for all your help.
