Pool not mounted on reboot

New Rockstor install - create a pool, then a share and export it with Samba - all great.
Then on reboot, the share is not visible from a Samba client. Samba service is not started - start it manually, but still nothing. The disk, pool, share and export details are all shown correctly in the UI.

Logging in with SSH, neither the share nor the pool is mounted under /mnt2. That probably explains why Samba isn’t starting… Forcing a scrub on the pool works and mounts the pool (and I can see the test files copied to it - BTRFS is working… :smile: ), but it does not mount the share.

I’ve tried deleting everything and starting again, same behavior.

System is a plain HP N54L microserver with three disks intended for the BTRFS array and one separate system disk. Initially I had two USB devices plugged in as well (formatted with FAT32 and NTFS at the moment so not visible), but seeing the reference on here for Shares not mounted… I removed the USB devices just in case they were causing disk ID issues. Same behavior.

I fear I’ve found a problem with the reference between the share mount and the disk configuration, but I don’t know where to go to debug further - very happy to go digging to nail this one…!


/dev/disk/by-id
lrwxrwxrwx 1 root root 9 Nov 29 18:35 wwn-0x50014ee2b719d259 -> ../../sda
lrwxrwxrwx 1 root root 9 Nov 29 18:35 ata-WDC_WD30EFRX-68EUZN0_WD-WCC4N4VC8357 -> ../../sda
lrwxrwxrwx 1 root root 9 Nov 29 18:35 wwn-0x50014ee20c785b2c -> ../../sdb
lrwxrwxrwx 1 root root 9 Nov 29 18:35 ata-WDC_WD30EFRX-68EUZN0_WD-WCC4N7PU87TP -> ../../sdb
lrwxrwxrwx 1 root root 9 Nov 29 18:35 ata-HDS722580VLSA80_VN6BHECBUYJV8C -> ../../sdc
lrwxrwxrwx 1 root root 9 Nov 29 18:35 wwn-0x5000c50064d49231 -> ../../sdd
lrwxrwxrwx 1 root root 9 Nov 29 18:35 ata-VB0250EAVER_Z3TR4KN1 -> ../../sdd
lrwxrwxrwx 1 root root 10 Nov 29 18:35 wwn-0x5000c50064d49231-part3 -> ../../sdd3
lrwxrwxrwx 1 root root 10 Nov 29 18:35 ata-VB0250EAVER_Z3TR4KN1-part3 -> ../../sdd3
lrwxrwxrwx 1 root root 10 Nov 29 18:35 wwn-0x5000c50064d49231-part2 -> ../../sdd2
lrwxrwxrwx 1 root root 10 Nov 29 18:35 ata-VB0250EAVER_Z3TR4KN1-part2 -> ../../sdd2
lrwxrwxrwx 1 root root 10 Nov 29 18:35 wwn-0x5000c50064d49231-part1 -> ../../sdd1
lrwxrwxrwx 1 root root 10 Nov 29 18:35 ata-VB0250EAVER_Z3TR4KN1-part1 -> ../../sdd1


Hi,

Sorry to bump this - has anyone ever seen a similar problem, or are there any suggestions of where I might start debugging?

At the moment all I’ve got is a system which works fine until I reboot it, at which point I repeatably lose access to the data unless I go in and manually remount the paths. I can’t hand it over to any other users in this state…

Happy to help find this, but I can’t really see where to start - I can’t see any errors reported, so it just looks like a total failure of the UI to map the shares, which seems very odd.

Where should I start…?

@kimbl, sorry to not chip in on this one already, not sure if I can help much but this does look like a recently fixed issue. The dev log for the imminent release of 3.8-10 details an improvement of a fix to the rockstor-bootstrap.service reported by @roweryan and addressed by Mr @suman himself. So I’m hoping that in 3.8-9.07 you should be OK; and of course this will be incorporated into the next stable release (soon, fingers crossed). Please see a hopefully related forum thread where 3.8-9.07 seems to have worked for @Yoshi at least.

There have been some fairly major changes of late and this has caused one or two releases of Rockstor to trip up a little more than anyone involved would have liked. This is partly down to me, as I am tasked with implementing some additional testing that I have unfortunately not yet got started on (backlog of promises, I’m afraid). However, I have every intention of getting stuck in on this, so hopefully it should reduce the likelihood of further failures going forward.

Oh and Welcome to the Rockstor community and thanks for offering to help on this one, much appreciated. Please let the forum know if the testing branch works in your situation and if not then hopefully your issue can be resolved in a timely fashion going forward.

Thanks - I was on 3.8-9.02 from the iso image - I will switch to “testing” and try and get it to update to 07 and see if that works…
Cheers
Kim

Sorry, nope - now on 3.8-9.07, seemingly the same behavior - can create a pool, a share and an export. Works fine then after reboot it’s gone.

Samba has not started after the reboot. Manually starting from the GUI makes it appear on the network but the share is empty. Checking from SSH, the mount points are there but not actually mounted to the disks (hence they’re empty…)

I’ve tried wiping back to the disk level then recreating pool, share and export - same.

I guess all I can do is wait for 3.8-9.10? As this is such a fundamental thing I’d really like to get it nailed - if this had real data on it’d be very worrying :smile: I’ve got a fallback to plain Debian and a manual config but I really wanted this to work.

Anything I can go looking at to try and debug why it’s not mounting?
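
For anyone wanting to check this from SSH: an empty directory under /mnt2 and a real mount can look alike at first glance, so comparing against /proc/mounts is one way to tell them apart. Below is a minimal sketch, not part of Rockstor. The function names and the sample text are made up for illustration, and it only assumes the usual /proc/mounts field order (device, mount point, filesystem type).

```python
def mounted_under(mounts_text, prefix="/mnt2"):
    """Return {mount_point: fs_type} for entries at or below `prefix`.

    `mounts_text` is text in /proc/mounts format:
    device, mount point, filesystem type, options, ...
    """
    found = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        mount_point, fs_type = fields[1], fields[2]
        if mount_point == prefix or mount_point.startswith(prefix + "/"):
            found[mount_point] = fs_type
    return found


def live_mounts(path="/proc/mounts", prefix="/mnt2"):
    """Same check against the running system (Linux only)."""
    with open(path) as f:
        return mounted_under(f.read(), prefix)


# Hypothetical sample; the pool name is invented for illustration.
SAMPLE = """\
/dev/sdd3 / btrfs rw,relatime 0 0
/dev/sda /mnt2/example_pool btrfs rw,relatime 0 0
tmpfs /run tmpfs rw,nosuid 0 0
"""

print(mounted_under(SAMPLE))
```

On the box itself, calling live_mounts() right after a reboot and again after forcing the scrub should show whether the pool and share entries actually appear. A directory that exists under /mnt2 but is missing from the result is just an empty mount point.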

@kimbl Thanks, this is great. Well not really but still very good to know. I think I have reproduced at least one element of your experience with 3.8-9.07 and believe it to be associated with drive letter re-ordering, or at least at one level.
However, on my test machine the Samba service is always shown as active, and actually is active, unlike in your experience. But I can reproduce (only very occasionally) the blank shares / empty mount points.

I have started a new issue in the hope of trying to narrow down what’s going on with this one. My failed share mounts only happen very rarely and seem to be affected by the drive re-ordering that happens on power up. Do you experience any differing effect with regard to your mount points over several power cycles?

Make sure to power down and up rather than reboot as I think this increases the chance of drive name re-ordering.

Hi,

I’m not sure it’s actually device names changing - I noted errors about OAuth when the rockstor service is starting up. This is from /var/log/messages after a reboot:

Dec 5 22:17:37 hoskullen bootstrap: Exception occured while bootstrapping. This could be because rockstor.service is still starting up. will wait 2 seconds and try again. Exception: OauthApp matching query does not exist.
Dec 5 22:17:37 hoskullen bootstrap: Max attempts(15) reached. Connection errors persist. Failed to bootstrap. Error: OauthApp matching query does not exist.
Dec 5 22:17:37 hoskullen systemd: rockstor-bootstrap.service: main process exited, code=exited, status=1/FAILURE
Dec 5 22:17:37 hoskullen systemd: Failed to start Rockstor bootstrapping tasks.
Dec 5 22:17:37 hoskullen systemd: Dependency failed for Samba SMB Daemon.
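
The chain in those lines is that the bootstrap script gives up, rockstor-bootstrap.service exits with status 1, and systemd then declines to start Samba because of the dependency. To pull just that chain out of a long messages log, something like the following may help. It is only a sketch: the function name and the choice of patterns are my own, based on the wording of the lines quoted above.

```python
import re

# Patterns taken from the log lines quoted in this thread.
BOOTSTRAP_PATTERN = re.compile(
    r"bootstrap:|rockstor-bootstrap|Rockstor bootstrapping|Dependency failed"
)


def bootstrap_lines(log_text):
    """Return the log lines that relate to the bootstrap failure chain."""
    return [line for line in log_text.splitlines()
            if BOOTSTRAP_PATTERN.search(line)]


SAMPLE_LOG = """\
Dec 5 22:17:37 hoskullen bootstrap: Max attempts(15) reached. Connection errors persist. Failed to bootstrap. Error: OauthApp matching query does not exist.
Dec 5 22:17:37 hoskullen systemd: rockstor-bootstrap.service: main process exited, code=exited, status=1/FAILURE
Dec 5 22:17:37 hoskullen systemd: Failed to start Rockstor bootstrapping tasks.
Dec 5 22:17:37 hoskullen systemd: Dependency failed for Samba SMB Daemon.
Dec 5 22:38:20 hoskullen smbd[3359]: Unable to connect to CUPS server localhost:631 - Transport endpoint is not connected
"""

for line in bootstrap_lines(SAMPLE_LOG):
    print(line)
```

Feeding it the contents of /var/log/messages instead of the sample should list only the bootstrap-related entries and leave out the CUPS noise.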

Later there are errors from Samba, but I can’t see anything critical:

Dec 5 22:37:20 hoskullen smbd[3300]: [2015/12/05 22:37:20.744017, 0] …/source3/smbd/server.c:1269(main)
Dec 5 22:37:20 hoskullen smbd[3300]: standard input is not a socket, assuming -D option
Dec 5 22:37:20 hoskullen systemd: smb.service: Supervising process 3303 which is not our child. We’ll most likely not notice when it exits.
Dec 5 22:37:20 hoskullen smbd[3303]: [2015/12/05 22:37:20.897840, 0] …/lib/util/become_daemon.c:136(daemon_ready)
Dec 5 22:37:20 hoskullen systemd: Started Samba SMB Daemon.
Dec 5 22:37:20 hoskullen systemd: Started Samba NMB Daemon.
Dec 5 22:37:20 hoskullen smbd[3306]: STATUS=daemon ‘smbd’ finished starting up and ready to serve connectionsUnable to connect to CUPS server localhost:631 - Transport endpoint is not connected
Dec 5 22:37:20 hoskullen smbd[3305]: STATUS=daemon ‘smbd’ finished starting up and ready to serve connectionsfailed to retrieve printer list: NT_STATUS_UNSUCCESSFUL
Dec 5 22:38:20 hoskullen smbd[3359]: [2015/12/05 22:38:20.980935, 0] …/source3/printing/print_cups.c:151(cups_connect)
Dec 5 22:38:20 hoskullen smbd[3359]: Unable to connect to CUPS server localhost:631 - Transport endpoint is not connected
Dec 5 22:38:20 hoskullen smbd[3305]: [2015/12/05 22:38:20.981222, 0] …/source3/printing/print_cups.c:528(cups_async_callback)
Dec 5 22:38:20 hoskullen smbd[3305]: failed to retrieve printer list: NT_STATUS_UNSUCCESSFUL

Disk UUIDs aren’t changing - boot 1:

da224edf-61f3-4de1-b60b-7ed4a3c7ccbc -> ../../sdb
997ff808-67f8-4147-aa83-b981d0a19c0f -> ../../sdd3
3deed609-5664-4956-9b5c-bfa3a15efb2d -> ../../sdd2
f627f03f-3040-4486-8bd1-a4b2ec7428b5 -> ../../sdd1

boot 2

da224edf-61f3-4de1-b60b-7ed4a3c7ccbc -> ../../sda
997ff808-67f8-4147-aa83-b981d0a19c0f -> ../../sdd3
3deed609-5664-4956-9b5c-bfa3a15efb2d -> ../../sdd2
f627f03f-3040-4486-8bd1-a4b2ec7428b5 -> ../../sdd1
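
Worth noting when comparing the two listings: the UUIDs are identical across both boots, but the first one points at sdb on boot 1 and sda on boot 2, so the kernel device name behind the same filesystem does move. A quick way to compare two such listings is sketched below. The function names are invented for illustration, and the targets are assumed to be written in the usual `../../sdX` symlink form.

```python
def parse_by_uuid(listing):
    """Map UUID -> kernel device name from 'uuid -> ../../sdX' lines.

    Also copes with full `ls -l` lines, where the UUID is the last
    token before the arrow.
    """
    mapping = {}
    for line in listing.splitlines():
        if "->" not in line:
            continue
        left, target = line.split("->", 1)
        tokens = left.split()
        if not tokens:
            continue
        mapping[tokens[-1]] = target.strip().rsplit("/", 1)[-1]
    return mapping


def moved_devices(before, after):
    """Return {uuid: (old_name, new_name)} where the device name changed."""
    return {uuid: (before[uuid], after[uuid])
            for uuid in before
            if uuid in after and before[uuid] != after[uuid]}


# The two listings from this post.
BOOT1 = """\
da224edf-61f3-4de1-b60b-7ed4a3c7ccbc -> ../../sdb
997ff808-67f8-4147-aa83-b981d0a19c0f -> ../../sdd3
3deed609-5664-4956-9b5c-bfa3a15efb2d -> ../../sdd2
f627f03f-3040-4486-8bd1-a4b2ec7428b5 -> ../../sdd1
"""
BOOT2 = """\
da224edf-61f3-4de1-b60b-7ed4a3c7ccbc -> ../../sda
997ff808-67f8-4147-aa83-b981d0a19c0f -> ../../sdd3
3deed609-5664-4956-9b5c-bfa3a15efb2d -> ../../sdd2
f627f03f-3040-4486-8bd1-a4b2ec7428b5 -> ../../sdd1
"""

print(moved_devices(parse_by_uuid(BOOT1), parse_by_uuid(BOOT2)))
```

For these two boots it reports only the da224edf… entry, moving from sdb to sda. The three partitions on the system disk stay put.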

I have tried a clean install on VirtualBox with three disk images, and there it seems to work as you’d expect.

Tomorrow’s about my last chance with this, I’ll see if I get anywhere - after that I’ll have to give in and do a plain Debian to get it going :smile:

@kimbl Thanks for the additional info. I’m currently chipping away at something that may be related, but I’m afraid I can’t be sure yet if it will impact on your particular problem. I hope it does, but only time will tell. I’ll update this post if it’s looking that way. It is not likely to be released in your time frame though.

In typical VMs the drives don’t get re-arranged from one boot / power cycle to another, which is part of what leads me to think this is related.

Looking at your messages log, it looks like you have an issue with the rockstor-bootstrap.service, which I thought had been fixed recently. I would chase your “Max attempts reached” OauthApp issue, but I’m unfamiliar with that aspect.

Anyone else able to chip in on this one?

@kimbl I’ve had another idea re your samba / rockstor-bootstrap.service problem. It looks very much like the problem recently addressed by @suman via 1026 bootstrap fail, a fix you now have in 3.8-9.07, so I was wondering if on your system it just isn’t re-trying for long enough, as there are reports of it working for some. Of course it could be something else entirely.

Anyway worth a try and given it’s a pretty fresh bit of code it may just need a little tweak.

I know it’s all last minute for you but if you could try changing the number of re-tries in this recent code it might be as simple as that.

ie in the file /opt/rockstor/src/rockstor/scripts/bootstrap.py

There is the following section at around line 52:-

        except Exception, e:
            #Retry on every exception, primarily because of django-oauth related
            #code behaving unpredictably while setting tokens. Retrying is a
            #decent workaround for now(11302015).
            if (num_attempts > 15):
                print('Max attempts(15) reached. Connection errors persist. '
                      'Failed to bootstrap. Error: %s' % e.__str__())
                sys.exit(1)

Try changing the two instances of 15 to, say, 30 or 45, and then reboot to see if that helps. Each unit increase equates to a further 2 seconds to wait and retry (see line 63 in the same file).
You could confirm that the changes took effect by noting the message change (to your chosen number) in your /var/log/messages.
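
To put rough numbers on that: with a 2 second pause between tries, 15 attempts gives the web service about half a minute to come up, 30 about a minute, and 45 about a minute and a half. The sketch below is not the Rockstor code, just a generic bounded retry loop with a fixed delay to illustrate the idea. The names `retry` and `worst_case_wait` are made up, and the exact attempt counting in bootstrap.py may differ by one from what is shown here.

```python
import time


def retry(task, max_attempts=15, delay=2, sleep=time.sleep):
    """Call task() until it succeeds or has failed max_attempts times.

    Sleeps `delay` seconds between failed attempts. `sleep` can be
    swapped out so the loop can be exercised without really waiting.
    """
    attempts = 0
    while True:
        try:
            return task()
        except Exception as e:
            attempts += 1
            if attempts >= max_attempts:
                raise RuntimeError(
                    "Max attempts(%d) reached. Error: %s" % (max_attempts, e))
            sleep(delay)


def worst_case_wait(max_attempts, delay=2):
    """Seconds spent sleeping if every attempt fails (in this sketch)."""
    return (max_attempts - 1) * delay


for n in (15, 30, 45):
    print(n, "attempts ->", worst_case_wait(n), "seconds of waiting")
```

If the underlying service needs longer than that window to become ready on slower hardware, raising the limit buys more time. If it never becomes ready, a larger number only delays the same failure.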

As you can see from @suman’s code comment, this is a workaround for now, so if simply increasing the retries does not work for you there is still hope for the future. I think the “update django-oauth-toolkit” issue exists to cover a future fix.

Hope this is more helpful than my last (rather late night) post.

Hi,

I believe I have the same or a similar problem. Samba does not work after reboot of rockstor, and bootstrap is not loading either. I tried running bootstrap again after 2 hours uptime, but it still didn’t start. I have tried reinstalling rockstor using the ISO on the webpage, and I have also tried to install the testing updates but nothing seems to fix the problem. Help?

Is the exception “Exception: Internal Server Error: No JSON object could be decoded” relevant?

output from running systemctl and bootstrap:

[root@nas_sn17 ~]# systemctl status rockstor-bootstrap -l
● rockstor-bootstrap.service - Rockstor bootstrapping tasks
Loaded: loaded (/etc/systemd/system/rockstor-bootstrap.service; enabled; vendor preset: disabled)
Active: failed (Result: exit-code) since Sat 2016-02-20 10:51:50 CET; 5min ago
Main PID: 31221 (code=exited, status=1/FAILURE)

Feb 20 10:51:49 nas_sn17 bootstrap[31221]: Exception occured while bootstrapping. This could be because rockstor.service is still starting up. will wait 2 seconds and try again. Exception: Internal Server Error: No JSON object could be decoded
Feb 20 10:51:49 nas_sn17 bootstrap[31221]: Exception occured while bootstrapping. This could be because rockstor.service is still starting up. will wait 2 seconds and try again. Exception: Internal Server Error: No JSON object could be decoded
Feb 20 10:51:49 nas_sn17 bootstrap[31221]: Exception occured while bootstrapping. This could be because rockstor.service is still starting up. will wait 2 seconds and try again. Exception: Internal Server Error: No JSON object could be decoded
Feb 20 10:51:49 nas_sn17 bootstrap[31221]: Exception occured while bootstrapping. This could be because rockstor.service is still starting up. will wait 2 seconds and try again. Exception: Internal Server Error: No JSON object could be decoded
Feb 20 10:51:49 nas_sn17 bootstrap[31221]: Exception occured while bootstrapping. This could be because rockstor.service is still starting up. will wait 2 seconds and try again. Exception: Internal Server Error: No JSON object could be decoded
Feb 20 10:51:49 nas_sn17 bootstrap[31221]: Max attempts(15) reached. Connection errors persist. Failed to bootstrap. Error: Internal Server Error: No JSON object could be decoded
Feb 20 10:51:50 nas_sn17 systemd[1]: rockstor-bootstrap.service: main process exited, code=exited, status=1/FAILURE
Feb 20 10:51:50 nas_sn17 systemd[1]: Failed to start Rockstor bootstrapping tasks.
Feb 20 10:51:50 nas_sn17 systemd[1]: Unit rockstor-bootstrap.service entered failed state.
Feb 20 10:51:50 nas_sn17 systemd[1]: rockstor-bootstrap.service failed.

[root@nas_sn17 ~]# /opt/rockstor/bin/bootstrap
BTRFS device scan complete
Exception occured while bootstrapping. This could be because rockstor.service is still starting up. will wait 2 seconds and try again. Exception: Internal Server Error: No JSON object could be decoded
Exception occured while bootstrapping. This could be because rockstor.service is still starting up. will wait 2 seconds and try again. Exception: Internal Server Error: No JSON object could be decoded
Exception occured while bootstrapping. This could be because rockstor.service is still starting up. will wait 2 seconds and try again. Exception: Internal Server Error: No JSON object could be decoded
Exception occured while bootstrapping. This could be because rockstor.service is still starting up. will wait 2 seconds and try again. Exception: Internal Server Error: No JSON object could be decoded
Exception occured while bootstrapping. This could be because rockstor.service is still starting up. will wait 2 seconds and try again. Exception: Internal Server Error: No JSON object could be decoded
Exception occured while bootstrapping. This could be because rockstor.service is still starting up. will wait 2 seconds and try again. Exception: Internal Server Error: No JSON object could be decoded
Exception occured while bootstrapping. This could be because rockstor.service is still starting up. will wait 2 seconds and try again. Exception: Internal Server Error: No JSON object could be decoded
Exception occured while bootstrapping. This could be because rockstor.service is still starting up. will wait 2 seconds and try again. Exception: Internal Server Error: No JSON object could be decoded
Exception occured while bootstrapping. This could be because rockstor.service is still starting up. will wait 2 seconds and try again. Exception: Internal Server Error: No JSON object could be decoded
Exception occured while bootstrapping. This could be because rockstor.service is still starting up. will wait 2 seconds and try again. Exception: Internal Server Error: No JSON object could be decoded
Exception occured while bootstrapping. This could be because rockstor.service is still starting up. will wait 2 seconds and try again. Exception: Internal Server Error: No JSON object could be decoded
Exception occured while bootstrapping. This could be because rockstor.service is still starting up. will wait 2 seconds and try again. Exception: Internal Server Error: No JSON object could be decoded
Exception occured while bootstrapping. This could be because rockstor.service is still starting up. will wait 2 seconds and try again. Exception: Internal Server Error: No JSON object could be decoded
Exception occured while bootstrapping. This could be because rockstor.service is still starting up. will wait 2 seconds and try again. Exception: Internal Server Error: No JSON object could be decoded
Exception occured while bootstrapping. This could be because rockstor.service is still starting up. will wait 2 seconds and try again. Exception: Internal Server Error: No JSON object could be decoded
Exception occured while bootstrapping. This could be because rockstor.service is still starting up. will wait 2 seconds and try again. Exception: Internal Server Error: No JSON object could be decoded
Max attempts(15) reached. Connection errors persist. Failed to bootstrap. Error: Internal Server Error: No JSON object could be decoded

Having the same problem: “No JSON object could be decoded”. I can’t start the smb or rockstor-bootstrap.service services. Did you ever find a solution, conrad? Does anyone know where the JSON it is trying to read resides?

I just updated to 3.8-12 and have this exact same problem. It also causes Samba to refuse to start. My workaround for the latter is to remove the dependency on rockstor-bootstrap.service from /etc/systemd/system/smb.service.

As to the cause of the error I have no idea, unfortunately. The source code is in /opt/rockstor/src/rockstor/scripts/bootstrap.py if you want to trace it down.

Edit: meant to say I have the same problem as @conrad and @grizzly above.

A bit of digging indicates that it’s broken when bootstrap.py tries to access https://localhost/api/commands/bootstrap at this line:

aw.api_call('commands/bootstrap', calltype='post')

The error code is 500 but the error page says:

Page not found

Sorry, an unexpected internal error has occured.

In the nginx error log, I saw that these API calls have succeeded:

/o/token/
/api/network
/api/disks/scan
/api/commands/refresh-pool-state

while these have failed:

/api/commands/bootstrap
/api/commands/refresh-share-state
/api/commands/refresh-snapshot-state

Hope this helps.

Hi,

sorry for the late reply.

I sent in the error logs and was told that the error was fixed in the next release. When I installed 3.8-13, the problem was gone!

Did you try upgrading to 3.8-13?

Yes, I’m on 3.8-13. I was facing freezing/performance issues with 100% CPU utilisation, which I fixed by disabling quotas on my shares, i.e. btrfs quota disable <path_to_share>. However, this prevented rockstor-bootstrap.service and the samba services from starting. I fixed that by re-enabling quotas on /mnt2/rockstor_rockstor with the command:

btrfs quota enable /mnt2/rockstor_rockstor

Now samba and bootstrap services start okay. However I’m now unable to create snapshots or use replication because quotas are a prerequisite for these features in Rockstor. Instead, I’m using btrfs commands for this functionality for the time being. In particular, this shell script:
http://goo.gl/Bx4QTV

has better functionality than Rockstor’s replication because it does rotation, encryption (via ssh) for transit via the Internet, and is faster.

This may not be of much help, but I just upgraded to 3.8-13.12 due to some other bug, and no longer have this problem.

@kimbl The issue opened in part as a result of this forum thread, dev name change breaking mounts, has now had another associated pull request / code change committed, which is available as of Rockstor version 3.8-14.02 in the testing channel updates. I know this is waking an old thread that contains references to more than one issue, but this is just by way of a heads up for anyone who wishes to test and confirm the proposed fix. As mentioned in the linked issue, it is rather difficult to reproduce as it depends on specific hardware config / speed, and so it has not been possible for any current Rockstor developer to confirm it. @kimbl’s report in this thread looked like the closest to what has hopefully now been addressed via the now merged revise internal use and format of device names.

The referenced issue is awaiting confirmation of the fix; once one or two confirmations are received it can be closed as fixed.