Pool full - system unstable - pools no longer mount

Hello, my data pool (named “pool1”, NOT the system pool) recently filled up. The system became unresponsive, so I power cycled the PC. After rebooting, the pools don’t mount. The bootstrap service is not running and will not start.

I have read various threads on this symptom, but none of the suggested fixes have worked for me. After a reboot, if I attempt to start the bootstrap service (systemctl start rockstor-bootstrap.service) my pool1 mounts. I have since removed some files and rebooted, but still no luck getting the service to start properly.

[root@ymtrockstor ~]# journalctl -n 50
-- Logs begin at Mon 2020-09-14 18:30:12 EDT, end at Mon 2020-09-14 19:33:01 EDT. --
Sep 14 18:32:03 ymtrockstor bootstrap[2658]: Exception occured while bootstrapping. This could be because rockstor.service is still starting up. will wait 2 seconds and try again. Exception: Exception while setting access_token for url(h
Sep 14 18:32:03 ymtrockstor bootstrap[2658]: Exception occured while bootstrapping. This could be because rockstor.service is still starting up. will wait 2 seconds and try again. Exception: Exception while setting access_token for url(h
Sep 14 18:32:03 ymtrockstor bootstrap[2658]: Exception occured while bootstrapping. This could be because rockstor.service is still starting up. will wait 2 seconds and try again. Exception: Exception while setting access_token for url(h
Sep 14 18:32:03 ymtrockstor bootstrap[2658]: Exception occured while bootstrapping. This could be because rockstor.service is still starting up. will wait 2 seconds and try again. Exception: Exception while setting access_token for url(h
Sep 14 18:32:07 ymtrockstor bootstrap[2658]: Exception occured while bootstrapping. This could be because rockstor.service is still starting up. will wait 2 seconds and try again. Exception: Exception while setting access_token for url(h
Sep 14 18:32:07 ymtrockstor bootstrap[2658]: Max attempts(15) reached. Connection errors persist. Failed to bootstrap. Error: Exception while setting access_token for url(http://127.0.0.1:443): No JSON object could be decoded. content: N
Sep 14 18:32:07 ymtrockstor systemd[1]: rockstor-bootstrap.service: main process exited, code=exited, status=1/FAILURE
Sep 14 18:32:07 ymtrockstor systemd[1]: Failed to start Rockstor bootstrapping tasks.
Sep 14 18:32:07 ymtrockstor systemd[1]: Unit rockstor-bootstrap.service entered failed state.
Sep 14 18:32:07 ymtrockstor systemd[1]: rockstor-bootstrap.service failed.
Sep 14 18:32:07 ymtrockstor systemd[1]: Starting Samba SMB Daemon...
Sep 14 18:32:07 ymtrockstor systemd[1]: smb.service: Supervising process 3412 which is not our child. We'll most likely not notice when it exits.
Sep 14 18:32:07 ymtrockstor smbd[3412]: [2020/09/14 18:32:07.635188,  0] ../lib/util/become_daemon.c:138(daemon_ready)
Sep 14 18:32:07 ymtrockstor smbd[3412]:   daemon_ready: STATUS=daemon 'smbd' finished starting up and ready to serve connections
Sep 14 18:32:07 ymtrockstor systemd[1]: Started Samba SMB Daemon.
Sep 14 18:32:07 ymtrockstor systemd[1]: Reached target Multi-User System.
Sep 14 18:32:07 ymtrockstor systemd[1]: Reached target Graphical Interface.
Sep 14 18:32:07 ymtrockstor systemd[1]: Starting Update UTMP about System Runlevel Changes...
Sep 14 18:32:07 ymtrockstor systemd[1]: Started Stop Read-Ahead Data Collection 10s After Completed Startup.
Sep 14 18:32:07 ymtrockstor systemd[1]: Started Update UTMP about System Runlevel Changes.
Sep 14 18:32:07 ymtrockstor systemd[1]: Startup finished in 1.900s (kernel) + 2.583s (initrd) + 1min 52.555s (userspace) = 1min 57.039s.
Sep 14 18:32:37 ymtrockstor systemd[1]: Starting Stop Read-Ahead Data Collection...
Sep 14 18:32:37 ymtrockstor systemd[1]: Started Stop Read-Ahead Data Collection.
Sep 14 18:32:52 ymtrockstor chronyd[2166]: Selected source 206.108.0.133
Sep 14 18:36:58 ymtrockstor dbus[2161]: [system] Activating via systemd: service name='org.freedesktop.hostname1' unit='dbus-org.freedesktop.hostname1.service'
Sep 14 18:36:58 ymtrockstor systemd[1]: Starting Hostname Service...
Sep 14 18:36:58 ymtrockstor dbus[2161]: [system] Successfully activated service 'org.freedesktop.hostname1'
Sep 14 18:36:58 ymtrockstor systemd[1]: Started Hostname Service.
Sep 14 18:45:10 ymtrockstor systemd[1]: Starting Cleanup of Temporary Directories...
Sep 14 18:45:10 ymtrockstor systemd[1]: Started Cleanup of Temporary Directories.
Sep 14 18:48:39 ymtrockstor kernel: BTRFS info (device sdc): disk space caching is enabled
Sep 14 18:48:39 ymtrockstor kernel: BTRFS info (device sdc): has skinny extents
Sep 14 18:48:39 ymtrockstor kernel: BTRFS info (device sdc): bdev /dev/sdd errs: wr 0, rd 11, flush 0, corrupt 0, gen 0
Sep 14 19:01:01 ymtrockstor systemd[1]: Created slice User Slice of root.
Sep 14 19:01:01 ymtrockstor systemd[1]: Started Session 1 of user root.
Sep 14 19:01:01 ymtrockstor CROND[7232]: (root) CMD (run-parts /etc/cron.hourly)
Sep 14 19:01:01 ymtrockstor run-parts(/etc/cron.hourly)[7235]: starting 0anacron
Sep 14 19:01:01 ymtrockstor run-parts(/etc/cron.hourly)[7241]: finished 0anacron
Sep 14 19:01:01 ymtrockstor run-parts(/etc/cron.hourly)[7243]: starting 0yum-hourly.cron
Sep 14 19:09:23 ymtrockstor run-parts(/etc/cron.hourly)[7298]: finished 0yum-hourly.cron
Sep 14 19:09:23 ymtrockstor systemd[1]: Removed slice User Slice of root.
Sep 14 19:31:39 ymtrockstor dbus[2161]: [system] Activating via systemd: service name='org.freedesktop.hostname1' unit='dbus-org.freedesktop.hostname1.service'
Sep 14 19:31:39 ymtrockstor systemd[1]: Starting Hostname Service...
Sep 14 19:31:39 ymtrockstor dbus[2161]: [system] Successfully activated service 'org.freedesktop.hostname1'
Sep 14 19:31:39 ymtrockstor systemd[1]: Started Hostname Service.
Sep 14 19:33:01 ymtrockstor sshd[9451]: Accepted password for root from 10.211.211.118 port 24125 ssh2
Sep 14 19:33:01 ymtrockstor systemd[1]: Created slice User Slice of root.
Sep 14 19:33:01 ymtrockstor systemd-logind[2175]: New session 2 of user root.
Sep 14 19:33:01 ymtrockstor systemd[1]: Started Session 2 of user root.
Sep 14 19:33:01 ymtrockstor sshd[9451]: pam_unix(sshd:session): session opened for user root by (uid=0)

[root@ymtrockstor ~]# systemctl start rockstor-bootstrap.service
Job for rockstor-bootstrap.service failed because the control process exited with error code. See "systemctl status rockstor-bootstrap.service" and "journalctl -xe" for details.


@erisler Hello again.

So to clarify Re:

and

Is it still the case that if you wait a bit after boot and then try starting the bootstrap service via the given command, it then starts OK? Or does it still fail?

I’m reading that it does but we obviously want

as in on boot.

If there is a delay in the mount, then the bootstrap service can time out, and that is what this looks like.
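To illustrate why a slow start can make the service give up, here is a minimal sketch of a bounded retry loop of the kind the log messages above suggest (15 attempts, 2 seconds between them). This is not Rockstor's actual bootstrap code; `retry_with_wait` and `probe` are hypothetical names used only for illustration.

```python
import time


def retry_with_wait(probe, max_attempts=15, wait_seconds=2, sleep=time.sleep):
    """Call probe() until it succeeds or the attempt budget is used up.

    Returns the number of attempts used on success. Raises RuntimeError
    carrying the last error once max_attempts have all failed.
    """
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            probe()  # hypothetical readiness check, e.g. an API call
            return attempt
        except Exception as exc:  # any failure counts as "not ready yet"
            last_error = exc
            if attempt < max_attempts:
                sleep(wait_seconds)  # wait before the next attempt
    raise RuntimeError(
        "Max attempts(%d) reached. Last error: %s" % (max_attempts, last_error)
    )
```

In this sketch the defaults allow roughly 14 x 2 = 28 seconds of waiting in total. Anything that takes longer than the budget to become ready fails at boot, yet the same call can succeed when run by hand later.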

Can you confirm your Rockstor version:

yum info rockstor

As then we can see if we have a quota disable option or not.

And given this initially looks like a btrfs mount/timing issue following an unclean shutdown, it may be worth trying the years-newer btrfs stack in our Rockstor 4 ‘Built on openSUSE’ offering. It may just be that it is able to import and mount this pool just dandy. There is no download just yet, but we do have a DIY installer builder here:

And we are after testers for this procedure:

Just a thought. If you have a spare drive to use as a system drive, you could see if it helps and always revert to your existing install either way. Do make sure not to have both system drives attached at the same time though, as this breaks things. We aren’t releasing any more updates to either CentOS-based channel now, so it might be a good time to adopt the newer btrfs stack to see if it can import / mount your pool in its current state. We are on release candidate 3 rpm as of yesterday’s release into the ‘Built on openSUSE’ testing channel - 4.0.2-0:

Hope that helps.

Hi Philip - I will look at giving v4 a go. Would you mind clarifying whether the v3 configuration can be restored to v4?

E.g. back up my current config (v3.9 I think); install v4; import pools, shares, snapshots; then restore the backed-up config. Should the system then be running as expected if all goes well?

@erisler Re:

Just make sure you download the backed-up config, as it’s stored on the system drive, which will not be attached/available. So download it to a client machine. It’s also human readable, so you can look inside to reassure yourself of its contents, although you may want to pop a copy into a reformatting editor to make it easier to read.
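If no reformatting editor is to hand, a few lines of Python can do the same job. This is only a sketch under an assumption: that the downloaded backup is JSON, possibly gzip-compressed. The function name `pretty_print_backup` and any file names passed to it are hypothetical.

```python
import gzip
import json


def pretty_print_backup(src_path, dst_path):
    """Write an indented, key-sorted copy of a JSON backup to dst_path.

    Assumes src_path holds JSON, either plain or gzip-compressed.
    Returns the parsed data so it can be inspected further.
    """
    # Gzip files start with the two magic bytes 0x1f 0x8b.
    with open(src_path, "rb") as fh:
        is_gzip = fh.read(2) == b"\x1f\x8b"
    opener = gzip.open if is_gzip else open
    with opener(src_path, "rt", encoding="utf-8") as fh:
        data = json.load(fh)
    with open(dst_path, "w", encoding="utf-8") as out:
        json.dump(data, out, indent=2, sort_keys=True)
        out.write("\n")
    return data
```

The original file is left untouched, so the readable copy can be thrown away after checking and the unmodified download used for the restore.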

This should work, yes, but it’s not perfectly dependable. It’s essentially the same code, but the older your current system, the less likely it is to work. It’s meant to work, but we do have some work to do to improve config save / restore. Definitely worth trying, but take a note of your key configs just in case it lets you down. It’s still a little beta quality, unfortunately.

Pool import is, however, pretty good now. Take a look at the following doc section for what is covered so far, though at a beta level unfortunately:

http://rockstor.com/docs/config_backup/config_backup.html

And if you do use another system device, then you can always reference the old system if absolutely necessary. I’d just make sure you take a note of your key exports in case their restore fails. I have seen this myself, and did capture the relevant logs, but I’ve not had time to look deeper into it. Also note that if you are using any system disk shares, they will of course need to be moved to your data drives before the move, as you can’t have two Rockstor system disks attached at the same time.

Hope that helps.