FATAL: the database system is in recovery mode

[Please complete the below template with details of the problem reported on your Web-UI. Be as detailed as possible. Community members, including developers, shall try and help. Thanks for your time in reporting this issue! We recommend purchasing commercial support for expedited support directly from the developers.]

Brief description of the problem

I was using Windows File History and it notified me that the ‘drive was full’, which wasn’t exactly possible.

Detailed step by step instructions to reproduce the problem

[write here]

Web-UI screenshot

[Drag and drop the image here]

Error Traceback provided on the Web-UI

Traceback (most recent call last):
  File "/opt/rockstor/src/rockstor/rest_framework_custom/generic_view.py", line 41, in _handle_exception
    yield
  File "/opt/rockstor/src/rockstor/storageadmin/views/appliances.py", line 41, in get_queryset
    self._update_hostname()
  File "/opt/rockstor/eggs/Django-1.8.16-py2.7.egg/django/utils/decorators.py", line 145, in inner
    return func(*args, **kwargs)
  File "/opt/rockstor/eggs/Django-1.8.16-py2.7.egg/django/db/transaction.py", line 271, in __exit__
    connection.set_autocommit(True)
  File "/opt/rockstor/eggs/Django-1.8.16-py2.7.egg/django/db/backends/base/base.py", line 294, in set_autocommit
    self.ensure_connection()
  File "/opt/rockstor/eggs/Django-1.8.16-py2.7.egg/django/db/backends/base/base.py", line 130, in ensure_connection
    self.connect()
  File "/opt/rockstor/eggs/Django-1.8.16-py2.7.egg/django/db/utils.py", line 98, in __exit__
    six.reraise(dj_exc_type, dj_exc_value, traceback)
  File "/opt/rockstor/eggs/Django-1.8.16-py2.7.egg/django/db/backends/base/base.py", line 130, in ensure_connection
    self.connect()
  File "/opt/rockstor/eggs/Django-1.8.16-py2.7.egg/django/db/backends/base/base.py", line 119, in connect
    self.connection = self.get_new_connection(conn_params)
  File "/opt/rockstor/eggs/Django-1.8.16-py2.7.egg/django/db/backends/postgresql_psycopg2/base.py", line 176, in get_new_connection
    connection = Database.connect(**conn_params)
  File "/opt/rockstor/eggs/psycopg2-2.6-py2.7-linux-x86_64.egg/psycopg2/__init__.py", line 164, in connect
    conn = _connect(dsn, connection_factory=connection_factory, async=async)
OperationalError: FATAL: the database system is in recovery mode

And now, the web interface gives this:

Page not found

Sorry, an unexpected internal error has occurred.

  • A step by step description of the actions leading up to the error.
  • Download the log.tgz file (containing the tarred and zipped server logs) available at error.tgz, and attach it to the email.

Ok, so I installed Rockstor to test because it looked like the perfect solution that allowed me to use my older hardware for a NAS. All I needed it to do was to store the backup of one single computer and replace an external USB drive. It’s failed.

I’ve now made at least a dozen attempts to install Rockstor. I’ve gotten it working, I think, twice. And now, after getting hours into a backup, it promptly crashes. With my Linux experience, that means it’s going to require yet another wipe and reinstall, because I’ll be damned if I spend hours googling cryptic Linux commands to attempt to fix the current install.

Sorry but I’m glad I didn’t click on your menu spam asking me to pay for this. Had I done so, I’d be asking for a refund about now.

Hi @Rod_Kinnison,

I’m not a Rockstor Developer, I just help out in the forum from time to time.

That error comes from the PostgreSQL database that powers Rockstor being in recovery mode, most likely due to corruption. It’s not a Rockstor error as such.

If you’re able to access the CLI, can you provide the output from:

sudo service postgresql status
sudo journalctl -xe

The output from these commands may provide information on how to troubleshoot this.
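If nothing obvious shows up there, the PostgreSQL server log is the next place to look. As a rough sketch, assuming the stock CentOS 7 PostgreSQL data directory (Rockstor may configure this differently):

sudo sh -c 'tail -n 50 /var/lib/pgsql/data/pg_log/postgresql-*.log'

The lines written just before the server entered recovery mode usually name the cause (an unclean shutdown, a crashed backend, or corrupted WAL).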

First, I would like to mention again that (unless you’ve changed things since your previous attempt) you’re running this on very old hardware. Things cannot be expected to work perfectly on a system that does not provide the basics that the underlying OS (CentOS 7) expects to find (ACPI, EFI, …).

Beyond that, some of the issues you’re experiencing lead me to think you may have another problem: the state of your disks.
Corruption of the main database shortly after install makes me think that things aren’t being written properly.
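A quick look at the kernel log can support or rule that out. As a rough first pass (the patterns below are just common disk-error markers, adjust as needed):

sudo dmesg -T | grep -iE 'ata[0-9]|i/o error|blk_update'

Repeated ATA resets or I/O errors there would point at a drive or cabling problem rather than anything Rockstor itself is doing.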

This is a pretty young NAS project built by a small team of developers; manual intervention can be required at times, as not all possible conditions have been caught and managed by the UI.

I would like to see the results of the latest S.M.A.R.T. tests, assuming any have been performed. Grab these with:

for disk in $(realpath /dev/disk/by-id/* | sort | uniq | grep -vP '\d$'); do sudo smartctl -a "$disk" > "smartlog_$(basename "$disk").log"; done

and then submit the smartlog_*.log files.
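For a quick first read of those logs, the reallocated/pending sector counters are the usual red flags. Something like this (attribute names vary slightly between drive vendors) pulls them out:

grep -E 'Reallocated_Sector|Current_Pending|Offline_Uncorrectable' smartlog_*.log

Any raw value other than 0 on those attributes is a bad sign.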

If SMART tests haven’t been performed, run a quick self-test on each disk first with:

for disk in $(realpath /dev/disk/by-id/* | sort | uniq | grep -vP '\d$'); do sudo smartctl -t short "$disk"; done

then wait about half an hour before running the previous command.
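If you’d rather confirm a test has actually finished than just wait, the self-test log shows its status, e.g. (substitute your own device):

sudo smartctl -l selftest /dev/sda

A completed run appears at the top of the table along with its result and the LBA of the first error, if any.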

Further, please provide the btrfs device stats for each of your btrfs filesystems. The command below will likely take a while, as it scrubs each filesystem first (I assume at this point that you haven’t set up a scheduled scrub):

for fs in $(mount | grep btrfs | grep "subvol=/)" | awk '{print $3}'); do sudo btrfs scrub start -B "$fs"; sudo btrfs dev stats "$fs" > "stats_$(basename "$fs").log"; done

And then submit the stats_*.log files.
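For reference when you look at those files yourself: write_io_errs, read_io_errs and corruption_errs should all be 0 on healthy disks. If you just want a pass/fail answer, the -c flag makes btrfs exit non-zero when any counter is non-zero, e.g. (/mnt2/yourpool is a placeholder, substitute a mount point from the loop above):

sudo btrfs dev stats -c /mnt2/yourpool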

I’d break these down for you for future reference, but apparently you’d be damned.

You have members of the community trying to help you despite a pretty lousy attitude towards the project and the developers. Honestly, it kind of saps the will to help a little bit.

I can understand your frustrations, I had some of my own at the start (and as I’m sure @phillxnet remembers, I began with a similarly entitled attitude). I can tell you from experience the negative comments are not helping you.

I’m willing to continue providing help from my own expertise (Linux systems engineer by trade, Rockstor user for >1 year) where possible, but only if you’re willing to help yourself by providing the info required, and stop with the negative commentary.