Rockstor not starting correctly

Hello,

my Rockstor system worked quite well for a few weeks, but suddenly it no longer starts fully. Unfortunately I do not know exactly how Rockstor starts, so I cannot work out what is going wrong.

I can log in via the local terminal and also via SSH, but nothing else works (no access to the Web frontend, no access to the files).
In the terminal, I see my shares under /mnt2/, but all shares are empty.

Let me first explain my setup:
I am using a Dell T20 server; it contains one SSD (sdc, which holds the Rockstor system) and two 8 TB HDs (sda and sdb). The SSD is attached to ata3, while the two HDs are connected to ata1 and ata2.

In the log file I can see:

  ....
    ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
    ata2: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
    ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
    ata3.00: ATA-8: SATA SSD, S5FAM030, max UDMA/100
    ata3.00: 250069680 sectors, multi 16: LBA48 NCQ (depth 31/32), AA
    ata1.00: ATA-9: ST8000AS0002-1NA17Z, AR17, max UDMA/133
    ata1.00: 15628053168 sectors, multi 16: LBA48 NCQ (depth 31/32), AA
    ata2.00: ATA-9: ST8000AS0002-1NA17Z, AR17, max UDMA/133
    ata2.00: 15628053168 sectors, multi 0: LBA48 NCQ (depth 31/32), AA
    ata3.00: configured for UDMA/100
    ata1.00: configured for UDMA/133
    ata2.00: configured for UDMA/133
    ...

So it detects all my disks.

A few lines later:
> Btrfs loaded, crc32c=crc32c-intel
> BTRFS: device label Server1 devid 1 transid 2204 /dev/sdc2
> BTRFS info (device sdc2): disk space caching is enabled
> BTRFS info (device sdc2): has skinny extents
> BTRFS info (device sdc2): detected SSD devices, enabling SSD mode
But here it only mentions sdc2, not the other disks and partitions.

Also there is:

Device: /dev/sda, type changed from 'scsi' to 'sat'
Device: /dev/sda [SAT], opened
Device: /dev/sda [SAT], ST8000AS0002-1NA17Z, S/N:Z840WD96, WWN:5-000c50-0936b3be2, FW:AR17, 8.00 TB
Device: /dev/sda [SAT], not found in smartd database.
The same appears for sdb (and sdc).

Then there are some error messages:

server dracut: dracut-
server dracut: Executing: /usr/sbin/dracut --hostonly --hostonly-cmdline --hostonly-i18n -o "plymouth dash resume ifcfg" -f /boot/initramfs-4.8.7-1.el7.elrepo.x86_64kdump.img 4.8.7-1.el7.elrepo.x86_64
server initrock: Traceback (most recent call last):
server initrock: File "/opt/rockstor/bin/initrock", line 45, in <module>
server initrock: sys.exit(scripts.initrock.main())
server initrock: File "/opt/rockstor/src/rockstor/scripts/initrock.py", line 411, in main
server initrock: '--database=%s' % db, app])
server initrock: File "/opt/rockstor/src/rockstor/system/osi.py", line 114, in run_command
server initrock: raise CommandException(cmd, out, err, rc)
server initrock: system.exceptions.CommandException: Error running a command. cmd = /opt/rockstor/bin/django migrate --list --database=default storageadmin. rc = 1. stdout = ['']. stderr = ['Traceback (most r$
server initrock: grations/loader.py", line 47, in __init__', '    self.build_graph()', '  File "/opt/rockstor/eggs/Django-1.8.16-py2.7.egg/django/db/migrations/loader.py", line 191, in build_graph', '    self$
server initrock: nections on Unix domain socket "/var/run/postgresql/.s.PGSQL.5432"?', '', '']
server systemd: rockstor-pre.service: main process exited, code=exited, status=1/FAILURE
server systemd: Failed to start Tasks required prior to starting Rockstor.

As said above, I do not know what steps Rockstor SHOULD perform during startup, so I cannot tell what is going wrong.

Does anybody have an idea where I should look?

Thanks,
Joe

@JoeMacFox A belated welcome to the Rockstor community.

From the looks of your log excerpts, everything seems OK bar the Rockstor db migration, which appears to have failed.

So I’d guess that an update has taken place and the associated db migration (initiated from the initrock script, which is in turn called from the systemd service rockstor-pre.service) is now failing.
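To see exactly where that chain fails, the status and recent journal of the failing unit are usually the quickest pointer (unit name taken from your log excerpt):

```shell
# Show why rockstor-pre.service failed, plus its recent log lines
systemctl status rockstor-pre.service
journalctl -u rockstor-pre.service --since today
```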

Could you paste the output from the following command:

/opt/rockstor/bin/django showmigrations

It should look something like this:

admin
 [ ] 0001_initial
auth
 [X] 0001_initial
 [X] 0002_alter_permission_name_max_length
 [X] 0003_alter_user_email_max_length
 [X] 0004_alter_user_username_opts
 [X] 0005_alter_user_last_login_null
 [X] 0006_require_contenttypes_0002
contenttypes
 [X] 0001_initial
 [X] 0002_remove_content_type_name
django_ztask
 (no migrations)
oauth2_provider
 [X] 0001_initial
 [ ] 0002_08_updates
sessions
 [ ] 0001_initial
sites
 [ ] 0001_initial
smart_manager
 [ ] 0001_initial
 [ ] 0002_auto_20170216_1212
storageadmin
 [X] 0001_initial
 [X] 0002_auto_20161125_0051
 [X] 0003_auto_20170114_1332
 [X] 0004_auto_20170523_1140

As Rockstor manages the mounts of your pool and subvolumes (shares), that explains why the mount points are empty. So I don’t think there is, as yet, cause for concern on that front: it currently looks like the problem is the database migration failing for some reason. Relating to that, could you also paste the output of the following commands:

btrfs fi show

and

btrfs fi usage /mnt2/rockstor_rockstor

and

journalctl -xe

which may have some clues (look for the red entries).
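As an aside, should you need read-only access to your data while the Web-UI is down, the pools can be mounted by hand; a sketch only, assuming /dev/sda from your hardware description (adjust device and mount point to suit):

```shell
# Mount the top-level subvolume (id 5) of the data pool read-only
mkdir -p /mnt/recovery
mount -o ro,subvolid=5 /dev/sda /mnt/recovery
ls /mnt/recovery   # shares show up as subvolume directories
```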

Hopefully this extra info can shed some light on what’s happened here.
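And given the initrock traceback ends at the PostgreSQL socket, it may also be worth checking whether the database service itself is up (plain systemd commands, nothing Rockstor specific):

```shell
# Is PostgreSQL running, and does its Unix socket exist?
systemctl status postgresql
ls -l /var/run/postgresql/
```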

Thanks, Philip, for the reply.

There are lots of error messages:
[root]# /opt/rockstor/bin/django showmigrations
> Traceback (most recent call last):
> File "/opt/rockstor/bin/django", line 44, in <module>
> sys.exit(djangorecipe.manage.main('rockstor.settings'))
> File "/opt/rockstor/eggs/djangorecipe-1.9-py2.7.egg/djangorecipe/manage.py", line 9, in main
> management.execute_from_command_line(sys.argv)
> File "/opt/rockstor/eggs/Django-1.8.16-py2.7.egg/django/core/management/__init__.py", line 354, in execute_from_command_line
> utility.execute()
> File "/opt/rockstor/eggs/Django-1.8.16-py2.7.egg/django/core/management/__init__.py", line 346, in execute
> self.fetch_command(subcommand).run_from_argv(self.argv)
> File "/opt/rockstor/eggs/Django-1.8.16-py2.7.egg/django/core/management/base.py", line 394, in run_from_argv
> self.execute(*args, **cmd_options)
> File "/opt/rockstor/eggs/Django-1.8.16-py2.7.egg/django/core/management/base.py", line 445, in execute
> output = self.handle(*args, **options)
> File "/opt/rockstor/eggs/Django-1.8.16-py2.7.egg/django/core/management/commands/showmigrations.py", line 36, in handle
> return self.show_list(connection, options['app_labels'])
> File "/opt/rockstor/eggs/Django-1.8.16-py2.7.egg/django/core/management/commands/showmigrations.py", line 44, in show_list
> loader = MigrationLoader(connection, ignore_no_migrations=True)
> File "/opt/rockstor/eggs/Django-1.8.16-py2.7.egg/django/db/migrations/loader.py", line 47, in __init__
> self.build_graph()
> File "/opt/rockstor/eggs/Django-1.8.16-py2.7.egg/django/db/migrations/loader.py", line 191, in build_graph
> self.applied_migrations = recorder.applied_migrations()
> File "/opt/rockstor/eggs/Django-1.8.16-py2.7.egg/django/db/migrations/recorder.py", line 59, in applied_migrations
> self.ensure_schema()
> File "/opt/rockstor/eggs/Django-1.8.16-py2.7.egg/django/db/migrations/recorder.py", line 49, in ensure_schema
> if self.Migration._meta.db_table in self.connection.introspection.table_names(self.connection.cursor()):
> File "/opt/rockstor/eggs/Django-1.8.16-py2.7.egg/django/db/backends/base/base.py", line 164, in cursor
> cursor = self.make_cursor(self._cursor())
> File "/opt/rockstor/eggs/Django-1.8.16-py2.7.egg/django/db/backends/base/base.py", line 135, in _cursor
> self.ensure_connection()
> File "/opt/rockstor/eggs/Django-1.8.16-py2.7.egg/django/db/backends/base/base.py", line 130, in ensure_connection
> self.connect()
> File "/opt/rockstor/eggs/Django-1.8.16-py2.7.egg/django/db/utils.py", line 98, in __exit__
> six.reraise(dj_exc_type, dj_exc_value, traceback)
> File "/opt/rockstor/eggs/Django-1.8.16-py2.7.egg/django/db/backends/base/base.py", line 130, in ensure_connection
> self.connect()
> File "/opt/rockstor/eggs/Django-1.8.16-py2.7.egg/django/db/backends/base/base.py", line 119, in connect
> self.connection = self.get_new_connection(conn_params)
> File "/opt/rockstor/eggs/Django-1.8.16-py2.7.egg/django/db/backends/postgresql_psycopg2/base.py", line 176, in get_new_connection
> connection = Database.connect(**conn_params)
> File "/opt/rockstor/eggs/psycopg2-2.6-py2.7-linux-x86_64.egg/psycopg2/__init__.py", line 164, in connect
> conn = _connect(dsn, connection_factory=connection_factory, async=async)
> django.db.utils.OperationalError: could not connect to server: No such file or directory
> Is the server running locally and accepting
> connections on Unix domain socket "/var/run/postgresql/.s.PGSQL.5432"?

and:
[root]# btrfs fi show

Label: 'Server1' uuid: ead7052a-fbf7-492a-962c-83b3bfd0fdae
Total devices 1 FS bytes used 2.68GiB
devid 1 size 108.07GiB used 6.02GiB path /dev/sdc2

Label: 'Server2' uuid: 65ef60dc-fe0b-42cb-8a6f-6212e502caea
Total devices 1 FS bytes used 426.71GiB
devid 1 size 7.28TiB used 430.02GiB path /dev/sda

Label: 'Server3' uuid: e561bf47-262f-48da-8038-2884969bb997
Total devices 1 FS bytes used 432.00KiB
devid 1 size 7.28TiB used 2.02GiB path /dev/sdb

[root]# btrfs fi usage /mnt2/rockstor_rockstor

ERROR: cannot access '/mnt2/rockstor_rockstor': No such file or directory

There was never a file or directory called "rockstor_rockstor".

journalctl -xe:

dracut[4555]: *** Creating initramfs image file '/boot/initramfs-4.8.7-1.el7.elrepo.x86_64kdump.img' done ***
kdumpctl[2301]: No memory reserved for crash kernel.
kdumpctl[2301]: Starting kdump: [FAILED]
systemd[1]: kdump.service: main process exited, code=exited, status=1/FAILURE
systemd[1]: Failed to start Crash recovery kernel arming.
-- Subject: Unit kdump.service has failed
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel

-- Unit kdump.service has failed.

-- The result is failed.

Thanks,
Joe

@JoeMacFox OK, so your issue is definitely the db migration being broken, and it’s not just a matter of your root fs being full. I’d suggest you execute the following command:

systemctl disable kdump

as the output from your journalctl -xe is just kdump noise, and disabling that service will save you scrolling back up. It seems we have an inadvertently enabled kdump service in our recent kernels. It fails because no memory is reserved for the crash kernel on the grub command line. Once disabled, it will make for a quicker boot and allow you to see the actual error more easily.

Strange you don’t have the normal rockstor_rockstor label for your system pool. In your case the command would be:

btrfs fi usage /mnt2/Server1

as that is presumably the label of your Rockstor install. Is this a default Rockstor install? Usually the system pool is on the third partition, not the second, e.g.:

btrfs fi show
Label: 'rockstor_rockstor'  uuid: 0a2d4553-d2e4-427e-ac19-75682fb626eb
	Total devices 1 FS bytes used 2.51GiB
	devid    1 size 26.73GiB used 5.04GiB path /dev/sda3
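To confirm how your system disk is actually partitioned, the following (standard util-linux, nothing Rockstor specific) would also help:

```shell
# List block devices with sizes, filesystem labels and mount points
lsblk -o NAME,SIZE,FSTYPE,LABEL,MOUNTPOINT
```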

Anyway, I’m afraid I’m going to have to cut out on this one (I have to be elsewhere), but at least we know it’s the database migration that’s throwing a wobbly here.

Hopefully others can cut in and offer advice on how this might be fixed, bar the rather overkill route of reinstalling Rockstor: shut down, disconnect the data drives, reinstall, complete the first setup screen, shut down, re-attach the data disks, and import the pools again (assuming there is no data on the system disk).

Hope that helps and others can cut in here.

@Flyer you are familiar with the db migration stuff, I believe, if you find yourself at a loose end of course. Or do we just have a PostgreSQL service that is not running for some reason, i.e. "Is the server running locally and accepting connections…"?

That migration error seems PostgreSQL related, not an issue with the migrations themselves:

File "/opt/rockstor/eggs/psycopg2-2.6-py2.7-linux-x86_64.egg/psycopg2/__init__.py", line 164, in connect
conn = _connect(dsn, connection_factory=connection_factory, async=async)
django.db.utils.OperationalError: could not connect to server: No such file or directory
Is the server running locally and accepting
connections on Unix domain socket "/var/run/postgresql/.s.PGSQL.5432"?

My guess: no Rockstor init because of missing rockstor_rockstor (?) -> no PostgreSQL
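A quick way to test that guess (plain systemd/psql commands, nothing Rockstor specific):

```shell
# If PostgreSQL is simply not running, start it and re-check
systemctl start postgresql
systemctl is-active postgresql
# Confirm the socket accepts connections
su - postgres -c "psql -c 'SELECT 1;'"
```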

I agree with @phillxnet’s suggestion about reinstalling (you won’t lose your data).

Mirko
