Rockstor 3.9.2-3 shares unmounted after reboot

phillxnet · December 8, 2017, 6:32pm

@Thailgrott Hello again.
@Hooverdan and @upapi_rockstor A belated welcome to the Rockstor community to you both.

I haven’t gotten to the bottom of this whole issue just yet, but I have managed to reproduce the failure to mount. It is caused by an exception raised, with the obvious repercussions, during the attempt to establish share usage when quotas are disabled. Raising here is clearly ‘over the top’ and not robust, so I have started by submitting the following pr, which is ready for review:

github.com/rockstor/rockstor-core

don't raise exception on quota not enabled. Fixes #1867

rockstor:master ← phillxnet:1867_shares_unmounted_after_reboot_-_3.9.2-3

opened 06:12PM - 08 Dec 17 UTC

phillxnet

+1 -1

We already log the state of quota not enabled via run_command log=True on the line relating to this pr and given a failure / exception here (pre pr) breaks mounts we should simply log and move on. Our parent function already deals with this scenario by returning vol / pvol sizes of 0. This is akin to the treatment we give to command lines involved in the following functions within the same file: is_subvol, subvol_info, add_share, balance_status, device_scan.

Although this pr does address a failure to mount shares due to 'quota not enabled' it does not address the reason for the quota not being enabled in the first place. This however can be addressed in a future pr, along with its various ramifications ie 0 usage against shares and an inability to change share sizes etc.

Fixes #1867

Proof of fix re shares unmounted was accomplished by reproducing the issue (fresh 3.9.2-3 install setup with an example plex config as per: http://rockstor.com/docs/docker-based-rock-ons/plex-media-server.html#plex-server-rock-on) then rebooting. Thereafter every reboot failed with exceptions as noted in the issue text (quota not enabled). Post pr every reboot thus far successfully remounted all shares and started the plex rock-on but with the above noted caveats re shares.

@schakrava Ready for review.

which was created against the issue I raised earlier from all of your most detailed reports:

github.com/rockstor/rockstor-core

shares unmounted after reboot - 3.9.2-3

opened 04:12PM - 08 Dec 17 UTC

closed 12:50PM - 11 Dec 17 UTC

phillxnet

Thanks to forum members upapi_rockstor, Horatio, and Hooverdan for initially reporting this issue.

As per the 3.9.2-3 update it can be the case that some or all shares can fail to mount. This looks to be related to the Bootstrap process as System - Services indicates OFF for Bootstrap and:

```
systemctl status rockstor-bootstrap
```

similarly indicates a failure due to MAX attempts(15) reached.

The 3.9.2-3 update appears to do no more than: "move to docker-ce / dockerd. Fixes #1860" #1865

Based on a local reproduction of this issue: on reboot, mounts still fail due to the following quota issue on the rock-ons-root:

```
[08/Dec/2017 15:58:23] ERROR [storageadmin.middleware:32] Exception occured while processing a request. Path: /api/commands/refresh-share-state method: POST
[08/Dec/2017 15:58:23] ERROR [storageadmin.middleware:33] Error running a command. cmd = /sbin/btrfs qgroup show /mnt2/rock-pool/rock-ons-root. rc = 1. stdout = ['']. stderr = ["ERROR: can't list qgroups: quotas not enabled", '']
Traceback (most recent call last):
  File "/opt/rockstor/eggs/Django-1.8.16-py2.7.egg/django/core/handlers/base.py", line 132, in get_response
    response = wrapped_callback(request, *callback_args, **callback_kwargs)
  File "/opt/rockstor/eggs/Django-1.8.16-py2.7.egg/django/views/decorators/csrf.py", line 58, in wrapped_view
    return view_func(*args, **kwargs)
  File "/opt/rockstor/eggs/Django-1.8.16-py2.7.egg/django/views/generic/base.py", line 71, in view
    return self.dispatch(request, *args, **kwargs)
  File "/opt/rockstor/eggs/djangorestframework-3.1.1-py2.7.egg/rest_framework/views.py", line 452, in dispatch
    response = self.handle_exception(exc)
  File "/opt/rockstor/eggs/djangorestframework-3.1.1-py2.7.egg/rest_framework/views.py", line 449, in dispatch
    response = handler(request, *args, **kwargs)
  File "/opt/rockstor/eggs/Django-1.8.16-py2.7.egg/django/utils/decorators.py", line 145, in inner
    return func(*args, **kwargs)
  File "/opt/rockstor/src/rockstor/storageadmin/views/command.py", line 318, in post
    import_shares(p, request)
  File "/opt/rockstor/src/rockstor/storageadmin/views/share_helpers.py", line 94, in import_shares
    volume_usage(pool, share.qgroup, share.pqgroup)
  File "/opt/rockstor/src/rockstor/fs/btrfs.py", line 835, in volume_usage
    out, err, rc = run_command(cmd, log=True)
  File "/opt/rockstor/src/rockstor/system/osi.py", line 121, in run_command
    raise CommandException(cmd, out, err, rc)
CommandException: Error running a command. cmd = /sbin/btrfs qgroup show /mnt2/rock-pool/rock-ons-root. rc = 1. stdout = ['']. stderr = ["ERROR: can't list qgroups: quotas not enabled", '']
[08/Dec/2017 15:58:24] ERROR [system.osi:119] non-zero code(1) returned by command: ['/sbin/btrfs', 'qgroup', 'show', '/mnt2/rock-pool/rock-ons-root/btrfs/subvolumes/f352dfd367879d5f612bde14ec3da50250bb13b417a47312e94440156f8dc984']. output: [''] error: ["ERROR: can't list qgroups: quotas not enabled", '']
```

Quota issues were also indicated as related by forum user upapi_rockstor and confirmed by Hooverdan in the second of the following forum threads:

Please update the following forum threads with this issue's resolution.

https://forum.rockstor.com/t/after-update-to-3-9-2-3-yum-updates-plex-shares-are-not-working/4141
https://forum.rockstor.com/t/rockstor-3-9-2-3-shares-unmounted-after-reboot/4144

Too right. As above there is a pending pr to address at least this element by way of making Rockstor more robust to the ‘quota not enabled’ state. My initial suspicion is that we have a compound issue here, whereby recent changes have combined to surface this problem during what appears to be an unrelated update, ie that of 3.9.2-3, which simply moved from legacy docker to docker-ce.
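For anyone curious what ‘log and move on’ looks like in practice, here is a rough sketch of the idea. It is not the actual Rockstor code: the `run_command` below is a simplified stand-in I wrote for illustration (the `throw` parameter, `volume_usage_sketch`, and the injectable `runner` argument are all assumptions), it is written for Python 3 rather than the Python 2.7 seen in the traceback, and the output parsing is deliberately simplistic, assuming raw byte counts in the second and third columns.

```python
import logging
import subprocess

logger = logging.getLogger(__name__)

BTRFS = "/sbin/btrfs"


def run_command(cmd, throw=True, log=False):
    """Hypothetical stand-in for a command wrapper; returns (out, err, rc)."""
    proc = subprocess.run(cmd, capture_output=True, text=True)
    out = proc.stdout.split("\n")
    err = proc.stderr.split("\n")
    if proc.returncode != 0:
        if log:
            logger.error(
                "non-zero code(%d) returned by command: %s. output: %s error: %s",
                proc.returncode, cmd, out, err,
            )
        if throw:
            raise RuntimeError("Error running a command. cmd = %s" % " ".join(cmd))
    return out, err, proc.returncode


def volume_usage_sketch(mnt_pt, qgroup, runner=run_command):
    """Return [rfer, excl] for qgroup, or [0, 0] if the command fails.

    A failure such as 'quotas not enabled' is logged by the runner but is
    not raised, so a caller that goes on to mount shares is not interrupted.
    """
    cmd = [BTRFS, "qgroup", "show", mnt_pt]
    # throw=False: log the failure and carry on instead of raising.
    out, err, rc = runner(cmd, throw=False, log=True)
    usage = [0, 0]
    if rc != 0:
        return usage
    for line in out:
        fields = line.split()
        # Assumed layout: "<qgroupid> <rfer> <excl>" with raw byte counts.
        if len(fields) >= 3 and fields[0] == qgroup:
            try:
                usage = [int(fields[1]), int(fields[2])]
            except ValueError:
                pass  # non-numeric sizes: keep the 0 default
    return usage
```

The trade-off is the one noted in the pr text: with quotas disabled the shares mount, but their reported usage is 0 until the underlying quota state is sorted out.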

Obviously we have more to do here. We are in the midst of changing the update strategies and I think this has led to these breaks leaking through to the stable channel. We expect to set up a new dev and testing routine, which should help to guard against such things.

If any of the more ‘brave’ (read happy to edit code and take a risk) participants would like to try out the changes as indicated in the above pr, then that may help with confirming the fix for at least the unmounted at boot element of this issue. Please don’t attempt this unless you are familiar with Python file editing and are happy to try a reboot and take a risk.

Sorry for the inconvenience folks and thanks for helping to support Rockstor’s development.

Hope that helps, and please be advised that our dev release model is in flux but soon to settle again.

And as always @suman’s (project lead) word on the matter supersedes mine as I may have this all wrong thus far.