Intel RST (motherboard software RAID) plus a few more questions: Fibre Channel, link aggregation, and filesystem distribution

Hi and thank you for your work!

I learned about Rockstor two weeks ago and decided to try it out. It sounded promising, with btrfs and an easy way of sharing files to different platforms.

I downloaded Rockstor 3.8-12 and installed it on a Supermicro 6018U-TR4T+ with two SSDs in a RAID-1 (Intel motherboard controller, sSATA) for the OS installation. The installer saw the RAID disk and I used the standard installation. Both SSDs blinked while the OS was being installed, so the motherboard RAID seemed to work. But when the installation finished and I rebooted into Rockstor, it bypassed the motherboard (fake) RAID and booted from only one of the drives; changes aren’t synced between the disks. After a few reboots the motherboard decides that the other disk takes the lead and becomes primary. Rockstor starts, but then all the changes are lost. So clearly the motherboard RAID isn’t working when running Rockstor. Also, in the OS the disks are presented as sda, sda1, sda2, sda3 and sdb when the first disk is active, and after a reboot, if the second is active, as sda, sdb, sdb1, sdb2, sdb3. No md device is present.

Next I installed a QLogic QLE2672 dual-port 16 Gb/s FC card in the server and hooked it up to an Infortrend DS4024RUCB unit. This hardware RAID unit has 24 x 1.2 TB disks, and I created two 12-disk RAID-6 volumes and mapped the partitions to separate LUNs and FC channels. Nicely, the disks present themselves as sdc and sdd with FC as transport. I created a single (hardware RAID) pool with these two disks. My question is: how does the system distribute the filesystem between the disks when writing files, randomly or with some load-balancing algorithm, or should I do a btrfs raid0?

After that I tried to modify the network settings (on the command line) to create a bonding interface. It worked until I touched the network settings in the GUI; then everything was overwritten. So I reinstalled, trying to create the interfaces with NetworkManager during installation. NetworkManager has great support for different types of interfaces, but it’s a bit touchy when using the GUI from within Rockstor. Configuration changes…

The next step is to buy the license, but then I noticed that a unique identifier is created every time I reinstall the product. So I guess I have to wait until the “last” installation? Or could you generate a license that maps to the host NIC MAC address or something, so it is easier to recover/reinstall?

I’m thinking about using it for production, but there are a few small nice-to-have features whose absence might stop me and make me install a full CentOS 7 instead. As described above: motherboard RAID for the operating system, LACP link aggregation for the network, and also quick access for USB disks connected to the server for temporary backups. Do you have any timeframe/roadmap for this?

Thank you! Looking forward to seeing this product evolve, and to buying it to support you!

@joka-fx Welcome to the Rockstor community. I’m afraid I’m going to have to be a little shorter in my answer than I would like as I have to dash, but I just wanted to chip in that I have seen this failure of the installer to initialize the IRST properly. In my case I noticed that the BIOS boot screen stated that the drive was still initializing, and I saw the absence of the md126 device with its 3 partitions, i.e. rockstor_rockstor on /dev/md126p3. I believe there is a bug in the installer, and unfortunately I haven’t made good enough notes on it just yet. But from a quick look at the notes I did make, it seems that when I installed again I took a slightly different path through the installer, i.e. non-auto partitioning with the btrfs scheme, and then the install ended up as expected.
My notes state I also went back and ticked auto, but once I’d first done non-auto with the btrfs scheme things proceeded as expected. Obviously, as you did, I selected the presented BIOS RAID disk.
Sorry, not very clear, but I have seen this, and it looks like during install one has to choose non-auto partitioning and select the btrfs scheme first; then the BIOS RAID is initialized correctly. Once done, the Disks page should look like the image in the following post:

There was a forum post that in turn links to an older one re “Reinstall activation” that may help with that question.

Thanks for the interest, and let us know how that install quirk goes. We have inherited it, I’m afraid, as we use the installer pretty much as-is from our CentOS base, and it’s a subtle problem that is time-consuming to root out. I will try a re-install on my BIOS RAID system here and report back, but sorry, got to dash now.

I can only really answer the FS questions.

If you have only presented btrfs with one “disk”, then it will do no load balancing: all of the actual data allocation is handled by the hardware RAID, which will put the data on the disks however it sees fit. (I’m guessing you made sdc and sdd into a RAID-0 array.)

If you feed btrfs sdc and sdd directly (making a raid0 again), then you have the same thing: the RAID-6 data allocation is done in hardware, and the final striping across sdc and sdd is handled by btrfs.

Either way gives you the same result: all the redundancy is handled by your hardware RAID, and btrfs is unable to self-heal, although it can still tell you when something is wrong.
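
For illustration, the possible layouts would be created roughly like this (a sketch only; “pool” is a placeholder label, and the device names are the ones from the post):

    mkfs.btrfs -f -L pool -d single -m single /dev/sdc            # one device presented: btrfs does no distribution itself
    mkfs.btrfs -f -L pool -d single -m single /dev/sdc /dev/sdd   # two devices, single profile: new chunks go to the device with the most free space, no striping
    mkfs.btrfs -f -L pool -d raid0 -m raid0 /dev/sdc /dev/sdd     # two devices, raid0 data: btrfs stripes data across both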


Thank you Philip,
By manually partitioning the system drive I got the md driver working. I had to do the installation a few times over, which took most of yesterday evening. The Anaconda installer isn’t the most stable; I remember it being that way a few Fedora releases back.
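
For anyone else hitting this, a quick way to confirm that the BIOS RAID (IMSM) array has actually been assembled is to check the md layer (md126 is the device name on this box, as seen later in the thread; yours may differ):

    cat /proc/mdstat
    mdadm --detail /dev/md126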

I also tried creating a bonded network interface during the install process with Anaconda. That got me a working 802.3ad interface from a shell point of view, but I never got the web UI to initialize correctly, so I deleted the interfaces and ran /opt/rockstor/bin/initrock again after removing the .initrock file. I also changed the hostname in the networking menu of the Anaconda installer; it might have been that which the web GUI init didn’t like? Well, I’ll just have to wait for official support of 802.3ad.
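
For reference, a bond like that can also be built from the shell with NetworkManager, roughly along these lines (a sketch only; the connection and interface names are placeholders, and as noted further down Rockstor will currently overwrite such config):

    nmcli con add type bond con-name bond0 ifname bond0 mode 802.3ad
    nmcli con add type bond-slave ifname enp3s0f0 master bond0
    nmcli con add type bond-slave ifname enp3s0f1 master bond0
    nmcli con up bond0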

So I think I’ve got a working installation. I updated to 3.8-12.04 and it still seems to be working. I’m going to do a few more tests: failover, removing disks, and cutting the connection on one of the fibres to see how the controllers fail over and how btrfs handles it. If that works I’m putting the system on the stable release and letting it run.

I didn’t create a swap partition during install. There is 64 GB of local DDR4 in the machine and Rockstor does not seem to eat much:

[root@pixel rockstor]# cat /proc/meminfo
MemTotal:       65861104 kB
MemFree:        64606388 kB
MemAvailable:   64957480 kB

I don’t think this is going to be a stability issue?

@joka-fx That’s good news on the BIOS RAID install front. I was a little surprised when it happened to me, hence the find. Anaconda is definitely getting better, though yes, there is also an occasional Python error message on otherwise completely normal installs. A little disconcerting, but it does afford us quite a bit of flexibility.

Re disconnecting drives etc., I would caution you that btrfs is still in heavy development on that and a few other fronts (speed of raid5/6 rebalances etc.) and doesn’t yet have a concept of a bad disk, so we obviously can’t yet put such things into Rockstor. But once those land there is every intention of doing what we can to make things easier. As it is, it will be necessary to drop to the command line to effect special mounts etc. in disaster-recovery scenarios, just a heads up.
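
By way of example, a disaster-recovery mount of a pool with a missing member typically looks something like this (the device and mount point here are only placeholders):

    mount -o degraded,ro /dev/sdc /mnt2/test_storage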

Memory/swap wise I would say you are golden there. Some say there can still be issues with no swap at all, but I think this relates to older kernels. Let us know how this goes, as there is another thread/exploration on alternative install arrangements by @Spectre694 - “Install to full disk BTRFS” - that may benefit from no swap to simplify things.

On the bonding interface: no, this is not supported at the moment, so Rockstor just doesn’t know what’s going on if it’s set up in the installer. There is a forum thread “Support bonded networking” and an open issue “Support bonded networking”. Bit of a mismatch, but Rockstor does its own network config so that we can have such things as dedicated management interfaces etc., so we can’t really inherit it from the installer in the same way as a generic CentOS install. Best, as you have found, to stick to as few changes in the installer as possible. Not as neat as we would like, but we gain a great deal from inheriting it. Maybe in the future the installer could be customized beyond kickstart config and branding, but at the moment I don’t think this is a priority.

Thanks again for the reporting and testing, it definitely helps to surface the things of interest and highlight weak points.

I expect others will chip in here with more comments as this is some nice equipment. I’m not sure about the reliance on hardware RAID myself, but given my previous comments maybe this could act as a stop-gap until btrfs gets its hot-spare and dead-drive patches merged.

I don’t know of a time frame, but it’s definitely under consideration and now has its own issue, “Support backup to USB devices”, and a couple of forum threads have brought it up: “Where Rockstor could improve compared to other NAS solutions?” and this one. There has been some progress towards this in the background, however, i.e. drive roles and improved SMART custom options.

Do keep the forum posted on your progress and findings.

Since this is a setup with a hardware RAID dual-controller system connected with redundancy to the HBA, I kept waiting for the controllers to fail over and resume traffic on the other controller when pulling cables etc… That did not happen…!

Well, after a break it just hit me that I was doing it all too simply, frankly to be honest completely wrong. First off, I was only mapping each disk to one controller, which means sdc → controller A and sdd → controller B. That will never be redundant. So back to the hardware RAID unit to configure host LUN mappings and present the two hardware-RAID disks on both controllers.

That presents the next problem. Now the two disks appear as four to the system: sdc, sdd, sde and sdf, where sdc and sde are actually the same disk, and likewise sdd and sdf. And the Rockstor GUI starts to complain: “Warning! Disk serial number is not legitimate or unique.”

OK, this will not work, mapping four devices where two pairs are really the same disk. This is multipathing, and the kernel has multipath modules. After a quick look I noticed that the device-mapper-multipath rpm is not installed. So: quickly install it, copy the default multipath.conf to /etc, load the module with “modprobe dm-multipath”, and enable the service with “systemctl enable multipathd” and “systemctl start multipathd”. After that I ran “multipath -v2”, which gave me a config with two mpathX devices:
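
Collected in one place, the steps were roughly these (the cp source path varies with the package version, so treat it as an example):

    yum install device-mapper-multipath device-mapper-multipath-libs
    cp /usr/share/doc/device-mapper-multipath-*/multipath.conf /etc/multipath.conf
    modprobe dm-multipath
    systemctl enable multipathd
    systemctl start multipathd
    multipath -v2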

[root@pixel multipath]# multipath -v2
create: mpatha (3600d0231000570111684d7087090645f) undef IFT     ,DS 4000 Series
size=11T features='0' hwhandler='0' wp=undef
|-+- policy='service-time 0' prio=1 status=undef
| `- 4:0:0:0 sdc 8:32 undef ready running
`-+- policy='service-time 0' prio=1 status=undef
  `- 5:0:0:0 sde 8:64 undef ready running
create: mpathb (3600d02310005701132963c685c6c3388) undef IFT     ,DS 4000 Series
size=11T features='0' hwhandler='0' wp=undef
|-+- policy='service-time 0' prio=1 status=undef
| `- 4:0:0:1 sdd 8:48 undef ready running
`-+- policy='service-time 0' prio=1 status=undef
  `- 5:0:0:1 sdf 8:80 undef ready running

and running the mpathconf command gives the following output:

[root@pixel dev]# mpathconf
multipath is enabled
find_multipaths is enabled
user_friendly_names is enabled
dm_multipath module is loaded
multipathd is running

and in /dev I’ve got dm-0 and dm-1 devices:

[root@pixel dev]# ls -l /dev/mapper/
total 0
crw------- 1 root root 10, 236 Mar 26 15:46 control
lrwxrwxrwx 1 root root       7 Mar 26 15:46 mpatha -> ../dm-1
lrwxrwxrwx 1 root root       7 Mar 26 15:46 mpathb -> ../dm-0

The mpatha and mpathb disks also appear in the Disks section of the web GUI, as do sdc, sdd, sde and sdf. Only mpatha and mpathb are selectable when trying to create a pool; the rest might be leftover old config. Anyway, those are the only two disks I’d like to select.

And finally, when trying to create the storage pool with mpatha and mpathb, I got an error:

Error running a command. cmd = [‘/sbin/mkfs.btrfs’, ‘-f’, ‘-d’, ‘single’, ‘-m’, ‘single’, ‘-L’, ‘test_storage’, ‘/dev/mpatha’, ‘/dev/mpathb’]. rc = 1. stdout = [‘btrfs-progs v4.4.1’, ‘See http://btrfs.wiki.kernel.org for more information.’, ‘’, ‘’]. stderr = [“Failed to check size for ‘/dev/mpatha’: No such file or directory”, ‘’]

So I got the multipath driver to work, sadly not with Rockstor. First I edited /etc/multipath.conf to add a device entry for my storage system:

/etc/multipath.conf

    device {
            vendor "IFT"
            path_grouping_policy group_by_prio
#       getuid_callout "/lib/udev/scsi_id --whitelisted --device=/dev/%n"
            path_checker readsector0
            path_selector "round-robin 0"
            hardware_handler "0"
            failback 15
            rr_weight uniform
            no_path_retry 12
            prio alua
    }
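
Edits like this can be picked up without a reboot by asking the daemon to re-read its config, e.g.:

    multipathd -k"reconfigure"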

This got rid of the undefined status.

[root@pixel /]# multipath -ll
mpathb (3600d02310005701132963c685c6c3388) dm-0 IFT     ,DS 4000 Series
size=11T features='1 queue_if_no_path' hwhandler='0' wp=rw
|-+- policy='round-robin 0' prio=50 status=active
| `- 5:0:0:1 sdf 8:80 active ready running
`-+- policy='round-robin 0' prio=10 status=enabled
  `- 4:0:0:1 sdd 8:48 active ready running
mpatha (3600d0231000570111684d7087090645f) dm-1 IFT     ,DS 4000 Series
size=11T features='1 queue_if_no_path' hwhandler='0' wp=rw
|-+- policy='round-robin 0' prio=50 status=active
| `- 5:0:0:0 sde 8:64 active ready running
`-+- policy='round-robin 0' prio=10 status=enabled
  `- 4:0:0:0 sdc 8:32 active ready running

Then I decided to delete all my storage partitions and begin with a clean RAID setup for storage. I created an 8+8+7+1 disk layout: three RAID-5 volumes and a spare. I mapped the three new partitions to both FC channels and looked at my config:

[root@pixel ~]# more /etc/multipath/wwids
# Valid WWIDs:
/3600d0231000570111684d7087090645f/	# delete this
/3600d02310005701132963c685c6c3388/	# delete this
/3600d0231000570112fc2c97736201f2a/
/3600d02310005701135bf47155071a3f1/
/3600d0231000570113d70af4c7efb1751/

[root@pixel ~]# more /etc/multipath/bindings
mpatha 3600d0231000570111684d7087090645f	# delete this
mpathb 3600d02310005701132963c685c6c3388	# delete this
mpathc 3600d0231000570112fc2c97736201f2a
mpathd 3600d02310005701135bf47155071a3f1
mpathe 3600d0231000570113d70af4c7efb1751
[root@pixel ~]#

As you can see, I still had the old drives in the config, so I just deleted the two top rows and renamed the remaining three aliases so the new drives become mpatha, mpathb and mpathc:

[root@pixel ~]# multipath -ll
mpathc (3600d0231000570113d70af4c7efb1751) dm-2 IFT     ,DS 4000 Series
size=7.6T features='1 queue_if_no_path' hwhandler='0' wp=rw
|-+- policy='round-robin 0' prio=50 status=active
| `- 5:0:0:0 sdf 8:80  active ready running
`-+- policy='round-robin 0' prio=10 status=enabled
  `- 0:0:0:0 sdc 8:32  active ready running
mpathb (3600d0231000570112fc2c97736201f2a) dm-1 IFT     ,DS 4000 Series
size=6.5T features='1 queue_if_no_path' hwhandler='0' wp=rw
|-+- policy='round-robin 0' prio=50 status=active
| `- 5:0:0:2 sdh 8:112 active ready running
`-+- policy='round-robin 0' prio=10 status=enabled
  `- 0:0:0:2 sde 8:64  active ready running
mpatha (3600d02310005701135bf47155071a3f1) dm-0 IFT     ,DS 4000 Series
size=7.6T features='1 queue_if_no_path' hwhandler='0' wp=rw
|-+- policy='round-robin 0' prio=50 status=active
| `- 0:0:0:1 sdd 8:48  active ready running
`-+- policy='round-robin 0' prio=10 status=enabled
  `- 5:0:0:1 sdg 8:96  active ready running

So now I had got rid of all the old config and tried to create a new pool. Sadly that did not work either, so I looked at the command in the error message, and there is a fault in the device mapping: the system tries to create a pool with /dev/mpatha, /dev/mpathb and /dev/mpathc, which do not exist.

/dev/dm-0, /dev/dm-1 and /dev/dm-2 exist, or rather /dev/mapper/mpatha, /dev/mapper/mpathb and /dev/mapper/mpathc.

So I ran the command manually:

[root@pixel ~]# /sbin/mkfs.btrfs -f -d single -m single -L test_storage /dev/mapper/mpatha /dev/mapper/mpathb /dev/mapper/mpathc
btrfs-progs v4.4.1
See http://btrfs.wiki.kernel.org for more information.

Label:              test_storage
UUID:               0a53e0d6-6b6e-47b2-9934-9661b12fde07
Node size:          16384
Sector size:        4096
Filesystem size:    21.81TiB
Block group profiles:
  Data:             single            8.00MiB
  Metadata:         single            8.00MiB
  System:           single            4.00MiB
SSD detected:       no
Incompat features:  extref, skinny-metadata
Number of devices:  3
Devices:
   ID        SIZE  PATH
    1     7.63TiB  /dev/mapper/mpatha
    2     6.54TiB  /dev/mapper/mpathb
    3     7.63TiB  /dev/mapper/mpathc

[root@pixel ~]# btrfs fi show
Label: 'rockstor_pixel'  uuid: 2977b322-3810-476e-af22-7432e59c24c3
        Total devices 1 FS bytes used 5.37GiB
        devid    1 size 95.09GiB used 8.04GiB path /dev/md126p2

Label: 'test_storage'  uuid: 0a53e0d6-6b6e-47b2-9934-9661b12fde07
        Total devices 3 FS bytes used 112.00KiB
        devid    1 size 7.63TiB used 20.00MiB path /dev/mapper/mpatha
        devid    2 size 6.54TiB used 0.00B path /dev/mapper/mpathb
        devid    3 size 7.63TiB used 0.00B path /dev/mapper/mpathc

As this was created at the command prompt, it will not show up in the GUI.

Next test is to make symbolic links in /dev mapping mpath[a,b,c] to dm-[0,1,2].
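
That is, something along these lines (the pairings are taken from the multipath -ll output above; they may differ on another setup):

    ln -s dm-0 /dev/mpatha
    ln -s dm-1 /dev/mpathb
    ln -s dm-2 /dev/mpathc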

Just one problem… I have to delete the manually created btrfs pool first…

Thank you for the wipefs -a command…
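
In other words, something like this against each member device, which erases the filesystem signatures so the GUI can reuse the disks (double-check the device names first):

    wipefs -a /dev/mapper/mpatha /dev/mapper/mpathb /dev/mapper/mpathc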

After creating the symbolic links I could create the storage pool from within the GUI on the multipath devices. Only one problem… /dev is a devtmpfs, so my symlinks will be lost after the next reboot.

This was actually easier to solve: just rename the aliases of the maps in the /etc/multipath/bindings file to dm-0, dm-1 and dm-2, the same names that were already present in /dev.
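
So the bindings file ends up looking roughly like this (aliases in the first column, WWIDs from the listings above; which WWID you pair with which dm-N should match what the kernel already assigned):

    dm-0 3600d02310005701135bf47155071a3f1
    dm-1 3600d0231000570112fc2c97736201f2a
    dm-2 3600d0231000570113d70af4c7efb1751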

Now I can create a storage pool on the dm-0, dm-1 and dm-2 devices, but that was also a disaster… it didn’t survive a reboot, and even though only dm-[0-2] was selected, after creation the storage pool shows both dm-[0-2] and sd[c-h] as members…

With a really sad face I’ll have to abandon the idea of running Rockstor on this system for now. Maybe I’m doing it all wrong, but my verdict is that there is no love between the Rockstor GUI and the underlying Linux platform. You guys have done tremendously great work, as far as I have seen, to make it easy to use, to give access to the btrfs filesystem, and to get a “home” NAS server up and running in an easy way. But remember that there’s a Ferrari running below at the operating-system level: Linux has evolved to become part of every large datacenter around the world. Do not try to reinvent the wheel, and do not box it into a corner. This might be great for some environments, but the problem is always keeping on par with upstream features, converting and implementing them.

The system I’ve been testing on needs to be delivered into production, fully redundant and operational, by Wednesday, so I have run out of time and need to start from the beginning the old-fashioned way. But one thing I do know is that I’ll be back with another system later on to test these features again! Also, my future home NAS will be running Rockstor!

One quick note from doing the CentOS 7 (1511) install on the same system, regarding the Intel RST that we got working after a few troublesome moments earlier: in CentOS the system maps things through the device mapper (LVM by default, it seems), and I can see dm-0, dm-1 and dm-2 devices from the basic system setup, while /boot sits on the IRST array as /dev/md126p1.

[root@pixel ~]# more /etc/fstab
/dev/mapper/centos-root /                       xfs     defaults        0 0
UUID=1bafe9a8-3a11-46d8-b527-278c870ae662 /boot                   xfs     defaults        0 0
/dev/mapper/centos-home /home                   xfs     defaults        0 0
/dev/mapper/centos-swap swap                    swap    defaults        0 0

[root@pixel ~]# ls -l /dev/mapper/
totalt 0
lrwxrwxrwx. 1 root root       7 27 mar 09.03 centos-home -> ../dm-2
lrwxrwxrwx. 1 root root       7 27 mar 09.03 centos-root -> ../dm-0
lrwxrwxrwx. 1 root root       7 27 mar 09.03 centos-swap -> ../dm-1
crw-------. 1 root root 10, 236 27 mar 09.03 control

[root@pixel dev]# mount
sysfs on /sys type sysfs (rw,nosuid,nodev,noexec,relatime,seclabel)
proc on /proc type proc (rw,nosuid,nodev,noexec,relatime)
devtmpfs on /dev type devtmpfs (rw,nosuid,seclabel,size=32914904k,nr_inodes=8228726,mode=755)
securityfs on /sys/kernel/security type securityfs (rw,nosuid,nodev,noexec,relatime)
tmpfs on /dev/shm type tmpfs (rw,nosuid,nodev,seclabel)
devpts on /dev/pts type devpts (rw,nosuid,noexec,relatime,seclabel,gid=5,mode=620,ptmxmode=000)
tmpfs on /run type tmpfs (rw,nosuid,nodev,seclabel,mode=755)
tmpfs on /sys/fs/cgroup type tmpfs (ro,nosuid,nodev,noexec,seclabel,mode=755)
cgroup on /sys/fs/cgroup/systemd type cgroup (rw,nosuid,nodev,noexec,relatime,xattr,release_agent=/usr/lib/systemd/systemd-cgroups-agent,name=systemd)
pstore on /sys/fs/pstore type pstore (rw,nosuid,nodev,noexec,relatime)
cgroup on /sys/fs/cgroup/cpuset type cgroup (rw,nosuid,nodev,noexec,relatime,cpuset)
cgroup on /sys/fs/cgroup/memory type cgroup (rw,nosuid,nodev,noexec,relatime,memory)
cgroup on /sys/fs/cgroup/cpu,cpuacct type cgroup (rw,nosuid,nodev,noexec,relatime,cpuacct,cpu)
cgroup on /sys/fs/cgroup/perf_event type cgroup (rw,nosuid,nodev,noexec,relatime,perf_event)
cgroup on /sys/fs/cgroup/freezer type cgroup (rw,nosuid,nodev,noexec,relatime,freezer)
cgroup on /sys/fs/cgroup/devices type cgroup (rw,nosuid,nodev,noexec,relatime,devices)
cgroup on /sys/fs/cgroup/net_cls type cgroup (rw,nosuid,nodev,noexec,relatime,net_cls)
cgroup on /sys/fs/cgroup/blkio type cgroup (rw,nosuid,nodev,noexec,relatime,blkio)
cgroup on /sys/fs/cgroup/hugetlb type cgroup (rw,nosuid,nodev,noexec,relatime,hugetlb)
configfs on /sys/kernel/config type configfs (rw,relatime)
/dev/mapper/centos-root on / type xfs (rw,relatime,seclabel,attr2,inode64,noquota)
selinuxfs on /sys/fs/selinux type selinuxfs (rw,relatime)
systemd-1 on /proc/sys/fs/binfmt_misc type autofs (rw,relatime,fd=28,pgrp=1,timeout=300,minproto=5,maxproto=5,direct)
hugetlbfs on /dev/hugepages type hugetlbfs (rw,relatime,seclabel)
debugfs on /sys/kernel/debug type debugfs (rw,relatime)
mqueue on /dev/mqueue type mqueue (rw,relatime,seclabel)
sunrpc on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw,relatime)
nfsd on /proc/fs/nfsd type nfsd (rw,relatime)
/dev/mapper/centos-home on /home type xfs (rw,relatime,seclabel,attr2,inode64,noquota)
/dev/md126p1 on /boot type xfs (rw,relatime,seclabel,attr2,inode64,noquota)
fusectl on /sys/fs/fuse/connections type fusectl (rw,relatime)
tmpfs on /run/user/0 type tmpfs (rw,nosuid,nodev,relatime,seclabel,size=6586212k,mode=700)
tmpfs on /run/user/1000 type tmpfs (rw,nosuid,nodev,relatime,seclabel,size=6586212k,mode=700,uid=1000,gid=1000)
gvfsd-fuse on /run/user/1000/gvfs type fuse.gvfsd-fuse (rw,nosuid,nodev,relatime,user_id=1000,group_id=1000)

So maybe you could include device-mapper-multipath and device-mapper-multipath-libs in the install image?
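
If the installer is kickstart-driven, as mentioned earlier in the thread, that would presumably just be two extra entries in its %packages section, something like:

    %packages
    device-mapper-multipath
    device-mapper-multipath-libs
    %end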