Rockstor 3.8-13 freezes 1-2 times per day during big chunks of file copying

I am on Rockstor 3.8-13 since a few weeks back with 4.4.5-1.el7.elrepo.x86_64.

My issue is a little bit like System freeze during writting access but there are some differences like that I do not get 100% CPU when this happens.

When I do a file copy from a windows share on my local network over to Rockstor I have had 1-2 Rockstor freezes per day 3 days in a row now during these big file copy processes.
I had to reboot the Rockstor to get it up and running again.

I cant see that the Rockstor CPU:s are running high and the file transfer runs with pretty nice high speed around 80-95MByte/second.

In all three cases the freeze has come at the end of the file copy process which is around 10-20K files that has been copied from a Windows 10 share over to a Rockstor share.

  1. Since I am not as good on Linux as Windows I am not good on knowing where to pull out debug logs etc so I need your guidance on where to start to look.

  2. How much info do you need of my hardware installation, is the below data enough?
    System:
    Motherboard ASUS PF/-X with 16GB RAM, Intel Xenon X3440.
    Using the onboard NIC Marvell® 8056 PCI-E GbE LAN.
    Storage is 8x4TB WD Green connected to an IBM ServeRAID M1015 8-CH SAS-SATA PCI-E - 46M0861.
    Boot disk is 2x16GB Transcend TS16GMSA370 mSATA on a LogiLink PC0078 PCI-Express card running in RAID 0 mode as explained in http://rockstor.com/docs/mdraid-mirror/boot_drive_howto.html.
    Everything mounted in a Define R5 chassi with a Corsair 1000W PSU.

Only clue I have is that all 3 complete system freezes/hangs has been during file copy processes of MANY files from a Windows share into a Rockstor Samba share (10-20K files).

Hi @CrazyFin and welcome to Rockstor Community (I say welcome because this seems to be your first message)

If you’re running Rockstor over testing channel you should have “Logs Manager” available for log reading, otherwise you’ll have to check for logs via root shell.

Useful logs are rockstor.log and samba logs ( first one is under /opt/rockstor/var/log, second one is on /var/log/samba/log.smbd , Rockstor and Smbd (Samba) from “Logs Manager” )

Let us know / see your logs

Flyer

1 Like

Aahh thanks. I was looking in /var/log/… and found the samba-logs but no other logs that could be useful.
However the rockstor.log in /opt/rockstor/var/log seems to have some activity and I do see some error messages there.

How do you prefer that one uploads logfiles for investigation and does the rockstor.log-file contain any sensitive info that should be masked out?

Hi @CrazyFin, just copy&paste interesting parts here using code tag (that one next to quotes and before upload)

Flyer

@Flyer

I see Hyperlink, Blockquote, Preformatted text and finally Upload but no “code tag” that I can use for pasting in code parts like I see in other forums. (Looks like CTRL+SHIFT+C, “preformatted text” is the one?)

Here are some error lines that concerns me:

This error has not caused the freezes though since Rockstor is still up and running:
[30/May/2016 22:55:04] ERROR [storageadmin.views.network:85] Unknown ctype: bridge config: {}

And between 16:01:01 I see a long pause until I restarted the system at 21:01:40. I think my freeze happened during this time so the rockstor.log does not show to much of good info.

I blanked out passwords and email-addresses with XXX:s. Was surprised to see the passwords in clear text there!
[30/May/2016 16:00:34] DEBUG [storageadmin.util:48] Current Rockstor version: 3.8-13 [30/May/2016 16:01:01] ERROR [storageadmin.util:46] request path: /api/users method: POST data: {u'username': u'xxxxxxxxxxxxx', u'public_key': xxxxx, u'shell': u'/bin/bash', u'group': u'xxxxxxxxxxx', u'uid': No ne, u'xxxxxxxxxxxxx': False, u'password': u'XXXXXXXXXXXXXX', u'email': u'XXXXXXXXXXXXXXXXXXXXX'} [30/May/2016 16:01:01] ERROR [storageadmin.util:47] exception: User(XXXXXXX) already exists. Please choose a different username Traceback (most recent call last): File "/opt/rockstor/eggs/gunicorn-0.16.1-py2.7.egg/gunicorn/workers/sync.py", line 34, in run client, addr = self.socket.accept() File "/usr/lib64/python2.7/socket.py", line 202, in accept sock, addr = self._sock.accept() error: [Errno 11] Resource temporarily unavailable [30/May/2016 16:01:01] DEBUG [storageadmin.util:48] Current Rockstor version: 3.8-13 [30/May/2016 21:01:40] DEBUG [smart_manager.data_collector:403] Listening on port http://127.0.0.1:8080 and on port 10843 (flash policy server)

Looking in the Samba-log-file between 20:00 and 21:10 I see this:

A lot of “Could not find child”-messages and it looks like Samba stopped around 20:13:11 and then started again at 21:01:45 when I had rebooted the server from the freeze.
Not much help in this log either… :frowning:

[2016/05/30 20:09:11.040345, 2] ../source3/smbd/server.c:443(remove_child_pid) Could not find child 15495 -- ignoring [2016/05/30 20:10:11.044097, 2] ../source3/smbd/server.c:443(remove_child_pid) Could not find child 15617 -- ignoring [2016/05/30 20:11:11.047555, 2] ../source3/smbd/server.c:443(remove_child_pid) Could not find child 15744 -- ignoring [2016/05/30 20:11:54.844735, 2] ../source3/smbd/server.c:443(remove_child_pid) Could not find child 15868 -- ignoring [2016/05/30 20:12:11.052978, 2] ../source3/smbd/server.c:443(remove_child_pid) Could not find child 15869 -- ignoring [2016/05/30 20:13:11.065157, 2] ../source3/smbd/server.c:443(remove_child_pid) Could not find child 15991 -- ignoring [2016/05/30 21:01:45, 0] ../source3/smbd/server.c:1241(main) smbd version 4.2.10 started. Copyright Andrew Tridgell and the Samba Team 1992-2014 [2016/05/30 21:01:45, 2] ../source3/lib/tallocmsg.c:124(register_msg_pool_usage) Registered MSG_REQ_POOL_USAGE [2016/05/30 21:01:45, 2] ../source3/lib/dmallocmsg.c:78(register_dmalloc_msgs) Registered MSG_REQ_DMALLOC_MARK and LOG_CHANGED [2016/05/30 21:01:45.579105, 3] ../source3/param/loadparm.c:3653(lp_load_ex) lp_load_ex: refreshing parameters [2016/05/30 21:01:45.579148, 3] ../source3/param/loadparm.c:544(init_globals) Initialising global parameters [2016/05/30 21:01:45.579215, 3] ../source3/param/loadparm.c:2596(lp_do_section) Processing section "[global]" [2016/05/30 21:01:45.579259, 2] ../source3/param/loadparm.c:2613(lp_do_section) Processing section "[juhas]" [2016/05/30 21:01:45.579316, 2] ../source3/param/loadparm.c:2613(lp_do_section) Processing section "[music]" [2016/05/30 21:01:45.579366, 2] ../source3/param/loadparm.c:2613(lp_do_section) Processing section "[pictures]" [2016/05/30 21:01:45.579415, 2] ../source3/param/loadparm.c:2613(lp_do_section) Processing section "[videos]" [2016/05/30 21:01:45.579462, 2] ../source3/param/loadparm.c:2613(lp_do_section) Processing section "[movies]" [2016/05/30 21:01:45.579535, 3] ../source3/param/loadparm.c:1493(lp_add_ipc) adding IPC service

And after the reboot I see these log lines. Do they indicate some kind of error with my boot disk setup?

[30/May/2016 21:01:40] DEBUG [smart_manager.data_collector:403] Listening on port http://127.0.0.1:8080 and on port 10843 (flash policy server) [30/May/2016 21:01:42] ERROR [storageadmin.views.command:75] Exception while refreshing state for Pool(rockstor_rockstor). Moving on: Error running a command. cmd = ['/sbin/btrfs', 'fi', 'show', '/dev/md125']. rc = 1. stdout = ['']. stderr = [''] [30/May/2016 21:01:42] ERROR [storageadmin.views.command:76] Error running a command. cmd = ['/sbin/btrfs', 'fi', 'show', '/dev/md125']. rc = 1. stdout = ['']. stderr = [''] Traceback (most recent call last): File "/opt/rockstor/src/rockstor/storageadmin/views/command.py", line 67, in _refresh_pool_state pool_info = get_pool_info(fd.name) File "/opt/rockstor/src/rockstor/fs/btrfs.py", line 71, in get_pool_info o, e, rc = run_command(cmd) File "/opt/rockstor/src/rockstor/system/osi.py", line 92, in run_command raise CommandException(cmd, out, err, rc) CommandException: Error running a command. cmd = ['/sbin/btrfs', 'fi', 'show', '/dev/md125']. rc = 1. stdout = ['']. stderr = [''] [30/May/2016 21:01:42] ERROR [storageadmin.views.command:97] Exception while mounting a share(home) during bootstrap: Error running a command. cmd = ['/bin/mount', '-t', 'btrfs', '-o', 'subvol=home', '/dev/md125' , '/mnt2/home']. rc = 32. stdout = ['']. stderr = ['mount: /dev/md125 is already mounted or /mnt2/home busy', ' /dev/md125 is already mounted on /boot', ''] [30/May/2016 21:01:42] ERROR [storageadmin.views.command:98] Error running a command. cmd = ['/bin/mount', '-t', 'btrfs', '-o', 'subvol=home', '/dev/md125', '/mnt2/home']. rc = 32. stdout = ['']. stderr = ['mount : /dev/md125 is already mounted or /mnt2/home busy', ' /dev/md125 is already mounted on /boot', ''] Traceback (most recent call last): File "/opt/rockstor/src/rockstor/storageadmin/views/command.py", line 93, in post mount_share(share, mnt_pt) File "/opt/rockstor/src/rockstor/fs/btrfs.py", line 239, in mount_share return run_command(mnt_cmd) File "/opt/rockstor/src/rockstor/system/osi.py", line 92, in run_command raise CommandException(cmd, out, err, rc) CommandException: Error running a command. cmd = ['/bin/mount', '-t', 'btrfs', '-o', 'subvol=home', '/dev/md125', '/mnt2/home']. rc = 32. stdout = ['']. stderr = ['mount: /dev/md125 is already mounted or /mnt2/ho me busy', ' /dev/md125 is already mounted on /boot', ''] [30/May/2016 21:01:42] ERROR [storageadmin.views.command:97] Exception while mounting a share(root00) during bootstrap: Error running a command. cmd = ['/bin/mount', '-t', 'btrfs', '-o', 'subvol=root00', '/dev/md 125', '/mnt2/root00']. rc = 32. stdout = ['']. stderr = ['mount: /dev/md125 is already mounted or /mnt2/root00 busy', ' /dev/md125 is already mounted on /boot', ''] [30/May/2016 21:01:42] ERROR [storageadmin.views.command:98] Error running a command. cmd = ['/bin/mount', '-t', 'btrfs', '-o', 'subvol=root00', '/dev/md125', '/mnt2/root00']. rc = 32. stdout = ['']. stderr = ['m ount: /dev/md125 is already mounted or /mnt2/root00 busy', ' /dev/md125 is already mounted on /boot', ''] Traceback (most recent call last): File "/opt/rockstor/src/rockstor/storageadmin/views/command.py", line 93, in post mount_share(share, mnt_pt) File "/opt/rockstor/src/rockstor/fs/btrfs.py", line 239, in mount_share return run_command(mnt_cmd) File "/opt/rockstor/src/rockstor/system/osi.py", line 92, in run_command raise CommandException(cmd, out, err, rc) CommandException: Error running a command. cmd = ['/bin/mount', '-t', 'btrfs', '-o', 'subvol=root00', '/dev/md125', '/mnt2/root00']. rc = 32. stdout = ['']. stderr = ['mount: /dev/md125 is already mounted or /mnt

I just realised that this error message shows up in the log when I enable the dashboard plugin “Network”.
Nothing is shown when I select my network interface “Wired 1” (or the two others “enp5s0” or “docker0”. “enp5s0” is not connected so I understand why that one is not showing anything in the graph)

Hi @CrazyFin, I ask for @phillxnet help too cos’ currently I’m not running docker over Rockstor.

Reading your logs it seems there’s something crashing samba (probably something over the Rock-on installed)

@Flyer and @phillxnet

Flyer, thanks for pointing me towards samba and some Rock-On that I have installed!

I realise that I haven’t had a freeze since last night and I have been doing more file copying from my “home Windows server” over to RockStor as well as moving large chunks of files within RockStor shares and no freeze at all.
Only difference is that I have not been running Logitech Softsqueeze player on one of my Windows machines against the Logitech Squeezebox Rock-On during since last night (eventhough the Logitech Rock-On has been enabled all the time).

I´ll keep RockStor running for a few days more without playing music files from SoftSqueeze and then I´ll test again with starting to play music files from my Windows machine with SoftSqueeze connected to the Logitech Rock-On.

I´ll report back in 5-7 days or so. (Or earlier if RockStor still freezes before I start playing music files.)

2 Likes

Happy to hear this @CrazyFin, update us :wink:

Generally speaking about Samba : it’s our “connection” between *nix and win worlds, but when transfering many files - example : the initial data transfer when migrating from one system to another one - isn’t the best solution (you should consider nfs, ftp, others + permissions rewrite once transfered - I’m saying this considering my personal experience based on a 400K files system and different transfers during years)

Flyer/Mirko