Error 24 Too many open files causes scheduled scrub job to fail and prevents login to web UI

Hi

My scheduled scrub job has not completed for the last few runs. It terminates with Error 24 Too many open files. Likewise, I occasionally get the same error message when logging in to the web UI. Rebooting fixes the logon problem, but the problem returns after some (up)time.

I installed and ran lsof as recommended in in another topic, I could not make much sense of the output… I did however notice many lines like these for gunicorn and other packages:
gunicorn 10307 10324 root 45u REG 0,44 617 36999 /tmp/tmpD7CCS4 (deleted)
gunicorn 10307 10324 root 46u REG 0,44 617 37027 /tmp/tmpH1Dc_X (deleted)
gunicorn 10307 10324 root 47u REG 0,44 617 38196 /tmp/tmp075mXa (deleted)
gunicorn 10307 10324 root 48u REG 0,44 617 39250 /tmp/tmpsA75Ls (deleted)
gunicorn 10307 10324 root 49u REG 0,44 617 37275 /tmp/tmp7troCJ (deleted)
gunicorn 10307 10324 root 50u REG 0,44 617 38232 /tmp/tmpicczIw (deleted)
gunicorn 10307 10324 root 51u REG 0,44 617 37320 /tmp/tmp_wENgF (deleted)
gunicorn 10307 10324 root 52u REG 0,44 617 39530 /tmp/tmpS3eHXD (deleted)

The /tmp directory is currently empty

The log för the scheduled scrub job:
2 June 12th 2017, 1:42:00 am June 12th 2017, 1:42:03 am finished
3 June 19th 2017, 1:42:00 am June 19th 2017, 1:42:02 am finished
4 June 26th 2017, 1:42:00 am June 26th 2017, 1:42:02 am finished
5 July 3rd 2017, 1:42:00 am July 3rd 2017, 1:42:03 am finished
6 July 10th 2017, 1:42:00 am July 10th 2017, 1:42:02 am error
7 July 17th 2017, 1:42:00 am July 17th 2017, 1:42:02 am finished
8 July 24th 2017, 1:42:00 am July 24th 2017, 1:42:02 am error
9 July 31st 2017, 1:42:00 am July 31st 2017, 1:42:02 am error
10 August 7th 2017, 1:42:00 am August 7th 2017, 1:42:02 am error
11 August 14th 2017, 1:42:00 am August 14th 2017, 1:42:02 am error

Truncated Rockstor log file below (there are lots of similar errors in the file):

[13/Aug/2017 20:37:52] ERROR [storageadmin.views.command:84] Exception while refreshing state for Pool(pool-1). Moving on: Exception while running command([’/sbin/btrfs’, ‘fi’, ‘show’, ‘/dev/disk/by-id/ata-ST3000DM001-1ER166_W500QT0K’]): [Errno 24] Too many open files
[13/Aug/2017 20:37:52] ERROR [storageadmin.views.command:85] Exception while running command([’/sbin/btrfs’, ‘fi’, ‘show’, ‘/dev/disk/by-id/ata-ST3000DM001-1ER166_W500QT0K’]): [Errno 24] Too many open files
Traceback (most recent call last):
File “/opt/rockstor/src/rockstor/storageadmin/views/command.py”, line 76, in _refresh_pool_state
pool_info = get_pool_info(first_attached_dev.target_name)
File “/opt/rockstor/src/rockstor/fs/btrfs.py”, line 100, in get_pool_info
o, e, rc = run_command(cmd)
File “/opt/rockstor/src/rockstor/system/osi.py”, line 107, in run_command
raise Exception(msg)
Exception: Exception while running command([’/sbin/btrfs’, ‘fi’, ‘show’, ‘/dev/disk/by-id/ata-ST3000DM001-1ER166_W500QT0K’]): [Errno 24] Too many open files
[13/Aug/2017 20:37:52] ERROR [storageadmin.views.command:84] Exception while refreshing state for Pool(rockstor_rockstor). Moving on: Exception while running command([’/sbin/btrfs’, ‘fi’, ‘show’, ‘/dev/disk/by-id/ata-KINGSTON_SV300S37A60G_50026B72320277D8-part3’]): [Errno 24] Too many open files
[13/Aug/2017 20:37:52] ERROR [storageadmin.views.command:85] Exception while running command([’/sbin/btrfs’, ‘fi’, ‘show’, ‘/dev/disk/by-id/ata-KINGSTON_SV300S37A60G_50026B72320277D8-part3’]): [Errno 24] Too many open files
Traceback (most recent call last):
File “/opt/rockstor/src/rockstor/storageadmin/views/command.py”, line 76, in _refresh_pool_state
pool_info = get_pool_info(first_attached_dev.target_name)
File “/opt/rockstor/src/rockstor/fs/btrfs.py”, line 100, in get_pool_info
o, e, rc = run_command(cmd)
File “/opt/rockstor/src/rockstor/system/osi.py”, line 107, in run_command
raise Exception(msg)
Exception: Exception while running command([’/sbin/btrfs’, ‘fi’, ‘show’, ‘/dev/disk/by-id/ata-KINGSTON_SV300S37A60G_50026B72320277D8-part3’]): [Errno 24] Too many open files
[13/Aug/2017 20:37:52] ERROR [storageadmin.middleware:32] Exception occured while processing a request. Path: /api/commands/refresh-snapshot-state method: POST
[13/Aug/2017 20:37:52] ERROR [storageadmin.middleware:33] Exception while running command([’/sbin/btrfs’, ‘subvolume’, ‘list’, ‘-u’, ‘-p’, ‘-q’, ‘/mnt2/pool-1’]): [Errno 24] Too many open files
Traceback (most recent call last):
File “/opt/rockstor/eggs/Django-1.8.16-py2.7.egg/django/core/handlers/base.py”, line 132, in get_response
response = wrapped_callback(request, *callback_args, **callback_kwargs)
File “/opt/rockstor/eggs/Django-1.8.16-py2.7.egg/django/views/decorators/csrf.py”, line 58, in wrapped_view
return view_func(*args, **kwargs)
File “/opt/rockstor/eggs/Django-1.8.16-py2.7.egg/django/views/generic/base.py”, line 71, in view
return self.dispatch(request, *args, **kwargs)
File “/opt/rockstor/eggs/djangorestframework-3.1.1-py2.7.egg/rest_framework/views.py”, line 452, in dispatch
response = self.handle_exception(exc)
File “/opt/rockstor/eggs/djangorestframework-3.1.1-py2.7.egg/rest_framework/views.py”, line 449, in dispatch
response = handler(request, *args, **kwargs)
File “/opt/rockstor/eggs/Django-1.8.16-py2.7.egg/django/utils/decorators.py”, line 145, in inner
return func(*args, **kwargs)
File “/opt/rockstor/src/rockstor/storageadmin/views/command.py”, line 323, in post
import_snapshots(share)
File “/opt/rockstor/src/rockstor/storageadmin/views/share_helpers.py”, line 139, in import_snapshots
share.name)
File “/opt/rockstor/src/rockstor/fs/btrfs.py”, line 485, in snaps_info
mnt_pt])
File “/opt/rockstor/src/rockstor/system/osi.py”, line 107, in run_command
raise Exception(msg)
Exception: Exception while running command([’/sbin/btrfs’, ‘subvolume’, ‘list’, ‘-u’, ‘-p’, ‘-q’, ‘/mnt2/pool-1’]): [Errno 24] Too many open files
[13/Aug/2017 20:38:52] ERROR [storageadmin.views.command:84] Exception while refreshing state for Pool(pool-1). Moving on: Exception while running command([’/sbin/btrfs’, ‘fi’, ‘show’, ‘/dev/disk/by-id/ata-ST3000DM001-1ER166_W500QT0K’]): [Errno 24] Too many open files
[13/Aug/2017 20:38:52] ERROR [storageadmin.views.command:85] Exception while running command([’/sbin/btrfs’, ‘fi’, ‘show’, ‘/dev/disk/by-id/ata-ST3000DM001-1ER166_W500QT0K’]): [Errno 24] Too many open files
Traceback (most recent call last):
File “/opt/rockstor/src/rockstor/storageadmin/views/command.py”, line 76, in _refresh_pool_state
pool_info = get_pool_info(first_attached_dev.target_name)
File “/opt/rockstor/src/rockstor/fs/btrfs.py”, line 100, in get_pool_info
o, e, rc = run_command(cmd)
File “/opt/rockstor/src/rockstor/system/osi.py”, line 107, in run_command
raise Exception(msg)
Exception: Exception while running command([’/sbin/btrfs’, ‘fi’, ‘show’, ‘/dev/disk/by-id/ata-ST3000DM001-1ER166_W500QT0K’]): [Errno 24] Too many open files
[13/Aug/2017 20:38:52] ERROR [storageadmin.views.command:84] Exception while refreshing state for Pool(rockstor_rockstor). Moving on: Exception while running command([’/sbin/btrfs’, ‘fi’, ‘show’, ‘/dev/disk/by-id/ata-KINGSTON_SV300S37A60G_50026B72320277D8-part3’]): [Errno 24] Too many open files
[13/Aug/2017 20:38:52] ERROR [storageadmin.views.command:85] Exception while running command([’/sbin/btrfs’, ‘fi’, ‘show’, ‘/dev/disk/by-id/ata-KINGSTON_SV300S37A60G_50026B72320277D8-part3’]): [Errno 24] Too many open files
Traceback (most recent call last):
File “/opt/rockstor/src/rockstor/storageadmin/views/command.py”, line 76, in _refresh_pool_state
pool_info = get_pool_info(first_attached_dev.target_name)
File “/opt/rockstor/src/rockstor/fs/btrfs.py”, line 100, in get_pool_info
o, e, rc = run_command(cmd)
File “/opt/rockstor/src/rockstor/system/osi.py”, line 107, in run_command
raise Exception(msg)
Exception: Exception while running command([’/sbin/btrfs’, ‘fi’, ‘show’, ‘/dev/disk/by-id/ata-KINGSTON_SV300S37A60G_50026B72320277D8-part3’]): [Errno 24] Too many open files
[13/Aug/2017 20:38:52] ERROR [storageadmin.middleware:32] Exception occured while processing a request. Path: /api/commands/refresh-share-state method: POST
[13/Aug/2017 20:38:52] ERROR [storageadmin.middleware:33] Exception while running command([’/sbin/btrfs’, ‘subvolume’, ‘list’, ‘-s’, ‘/mnt2/pool-1’]): [Errno 24] Too many open files
Traceback (most recent call last):
File “/opt/rockstor/eggs/Django-1.8.16-py2.7.egg/django/core/handlers/base.py”, line 132, in get_response
response = wrapped_callback(request, *callback_args, **callback_kwargs)
File “/opt/rockstor/eggs/Django-1.8.16-py2.7.egg/django/views/decorators/csrf.py”, line 58, in wrapped_view
return view_func(*args, **kwargs)
File “/opt/rockstor/eggs/Django-1.8.16-py2.7.egg/django/views/generic/base.py”, line 71, in view
return self.dispatch(request, *args, **kwargs)
File “/opt/rockstor/eggs/djangorestframework-3.1.1-py2.7.egg/rest_framework/views.py”, line 452, in dispatch
response = self.handle_exception(exc)
File “/opt/rockstor/eggs/djangorestframework-3.1.1-py2.7.egg/rest_framework/views.py”, line 449, in dispatch
response = handler(request, *args, **kwargs)
File “/opt/rockstor/eggs/Django-1.8.16-py2.7.egg/django/utils/decorators.py”, line 145, in inner
return func(*args, **kwargs)
File “/opt/rockstor/src/rockstor/storageadmin/views/command.py”, line 318, in post
import_shares(p, request)
File “/opt/rockstor/src/rockstor/storageadmin/views/share_helpers.py”, line 74, in import_shares
shares_d = shares_info(pool)
File “/opt/rockstor/src/rockstor/fs/btrfs.py”, line 421, in shares_info
o, e, rc = run_command([BTRFS, ‘subvolume’, ‘list’, ‘-s’, mnt_pt])
File “/opt/rockstor/src/rockstor/system/osi.py”, line 107, in run_command
raise Exception(msg)
Exception: Exception while running command([’/sbin/btrfs’, ‘subvolume’, ‘list’, ‘-s’, ‘/mnt2/pool-1’]): [Errno 24] Too many open files
[13/Aug/2017 20:38:52] ERROR [storageadmin.middleware:32] Exception occured while processing a request. Path: /api/commands/refresh-snapshot-state method: POST
[13/Aug/2017 20:38:52] ERROR [storageadmin.middleware:33] Exception while running command([’/sbin/btrfs’, ‘subvolume’, ‘list’, ‘-u’, ‘-p’, ‘-q’, ‘/mnt2/pool-1’]): [Errno 24] Too many open files
Traceback (most recent call last):
File “/opt/rockstor/eggs/Django-1.8.16-py2.7.egg/django/core/handlers/base.py”, line 132, in get_response
response = wrapped_callback(request, *callback_args, **callback_kwargs)
File “/opt/rockstor/eggs/Django-1.8.16-py2.7.egg/django/views/decorators/csrf.py”, line 58, in wrapped_view
return view_func(*args, **kwargs)
File “/opt/rockstor/eggs/Django-1.8.16-py2.7.egg/django/views/generic/base.py”, line 71, in view
return self.dispatch(request, *args, **kwargs)
File “/opt/rockstor/eggs/djangorestframework-3.1.1-py2.7.egg/rest_framework/views.py”, line 452, in dispatch
response = self.handle_exception(exc)
File “/opt/rockstor/eggs/djangorestframework-3.1.1-py2.7.egg/rest_framework/views.py”, line 449, in dispatch
response = handler(request, *args, **kwargs)
File “/opt/rockstor/eggs/Django-1.8.16-py2.7.egg/django/utils/decorators.py”, line 145, in inner
return func(*args, **kwargs)
File “/opt/rockstor/src/rockstor/storageadmin/views/command.py”, line 323, in post
import_snapshots(share)
File “/opt/rockstor/src/rockstor/storageadmin/views/share_helpers.py”, line 139, in import_snapshots
share.name)
File “/opt/rockstor/src/rockstor/fs/btrfs.py”, line 485, in snaps_info
mnt_pt])
File “/opt/rockstor/src/rockstor/system/osi.py”, line 107, in run_command
raise Exception(msg)
Exception: Exception while running command([’/sbin/btrfs’, ‘subvolume’, ‘list’, ‘-u’, ‘-p’, ‘-q’, ‘/mnt2/pool-1’]): [Errno 24] Too many open files

Hi again!

I’m running 3.9.1-0, stable

Rebooted Rockstor server. After that I could again login to web UI. After about one day I could not log on with the same error.

Also received this from Rockstor notifying of a failed scrub job:
Subject: Cron root@nitrogen /opt/rockstor//bin/st-pool-scrub 1 *-----*

Traceback (most recent call last):
File “/usr/lib64/python2.7/logging/handlers.py”, line 76, in emit
if self.shouldRollover(record):
File “/usr/lib64/python2.7/logging/handlers.py”, line 154, in shouldRollover
msg = “%s\n” % self.format(record)
File “/usr/lib64/python2.7/logging/init.py”, line 724, in format
return fmt.format(record)
File “/usr/lib64/python2.7/logging/init.py”, line 464, in format
record.message = record.getMessage()
File “/usr/lib64/python2.7/logging/init.py”, line 324, in getMessage
msg = str(self.msg)
TypeError: str returned non-string (type list)
Logged from file pool_scrub.py, line 78
Traceback (most recent call last):
File “/usr/lib64/python2.7/logging/handlers.py”, line 76, in emit
if self.shouldRollover(record):
File “/usr/lib64/python2.7/logging/handlers.py”, line 154, in shouldRollover
msg = “%s\n” % self.format(record)
File “/usr/lib64/python2.7/logging/init.py”, line 724, in format
return fmt.format(record)
File “/usr/lib64/python2.7/logging/init.py”, line 464, in format
record.message = record.getMessage()
File “/usr/lib64/python2.7/logging/init.py”, line 324, in getMessage
msg = str(self.msg)
TypeError: str returned non-string (type list)
Logged from file pool_scrub.py, line 39

OK;

I checked the total number of open files on my Rockstor system using lsof | wc -l
Turn out that on the newly rebooted system it’s about 6500. At that point no errors. I then left it overnight, with no user load and no scheduled jobs. In the morning 11727 files where open, and the login error described above reoccured.

ulimit -a seems to say that there is a per user limit of 1024 open files.
My notebook running Fedora currently has about 102000 open files

Either some system limit is preventing necessary files to be opened or some process causing to many files to be opened.

Is there a way to reliably allowing more open files?

@peter Hello again.

This looks more like an issue reported by @magicalyak in the following thread:

And I don’t currently think it’s related to the 'Too many open files" thing.

But on the ‘Too many open files’ front, have a look at the following open issue where some progress has been made by the reporter, but currently on alleviating rather than fixing the issue. Interestingly their ‘open files’ look to be different from your own:

https://github.com/rockstor/rockstor-core/issues/1656

gunicorn 13354 13367 root 615u REG 0,40 592 9875127 /tmp/exports (deleted)

Have a look and see if there is anything you can contribute to that issue. Still gunicorn though.

Thanks to @phillxnet and @magicalyak there two workarounds for the two issues that works for me:

Workaround 1: Schedule a nightly reboot of rockstor. This should let you use the web Ui at all time

Workaround 2: Delete any scheduled scrub jobs, and run the manually from the Pools page instead

1 Like