VTd passthru of LSI2008 HBA not detecting disks in 3.8-10 Rockstor release

WORKING util-linux and python versions in CentOS 7.2 (1511), which is detecting the disks. I have not applied ‘yum update -y’ yet, but I am about to pull the trigger.

[root@localhost ~]# yum info util-linux | grep Release
Release     : 26.el7
Release     : 26.el7
[root@localhost ~]# yum info  python | grep Release
Release     : 34.el7
[root@localhost ~]#

yum update -y

Dependencies Resolved

=================================================================================
 Package                 Arch      Version                      Repository  Size
=================================================================================
Installing:
 kernel                  x86_64    3.10.0-327.3.1.el7           updates     33 M
Updating:
 bind-libs-lite          x86_64    32:9.9.4-29.el7_2.1          updates    724 k
 bind-license            noarch    32:9.9.4-29.el7_2.1          updates     81 k
 dracut                  x86_64    033-360.el7_2                updates    311 k
 dracut-config-rescue    x86_64    033-360.el7_2                updates     49 k
 dracut-network          x86_64    033-360.el7_2                updates     90 k
 glibc                   x86_64    2.17-106.el7_2.1             updates    3.6 M
 glibc-common            x86_64    2.17-106.el7_2.1             updates     11 M
 gmp                     x86_64    1:6.0.0-12.el7_1             updates    280 k
 grub2                   x86_64    1:2.02-0.33.el7.centos.1     updates    1.5 M
 grub2-tools             x86_64    1:2.02-0.33.el7.centos.1     updates    3.3 M
 kernel-tools            x86_64    3.10.0-327.3.1.el7           updates    2.4 M
 kernel-tools-libs       x86_64    3.10.0-327.3.1.el7           updates    2.3 M
 libxml2                 x86_64    2.9.1-6.el7_2.2              updates    666 k
 logrotate               x86_64    3.8.6-7.el7_2                updates     66 k
 openssl                 x86_64    1:1.0.1e-51.el7_2.1          updates    711 k
 openssl-libs            x86_64    1:1.0.1e-51.el7_2.1          updates    950 k
 python-perf             x86_64    3.10.0-327.3.1.el7           updates    2.4 M
 rdma                    noarch    7.2_4.1_rc6-2.el7            updates     28 k
 tuned                   noarch    2.5.1-4.el7_2.1              updates    193 k

Transaction Summary
=================================================================================
Install   1 Package
Upgrade  19 Packages
Installed:
  kernel.x86_64 0:3.10.0-327.3.1.el7

Updated:
  bind-libs-lite.x86_64 32:9.9.4-29.el7_2.1
  bind-license.noarch 32:9.9.4-29.el7_2.1
  dracut.x86_64 0:033-360.el7_2
  dracut-config-rescue.x86_64 0:033-360.el7_2
  dracut-network.x86_64 0:033-360.el7_2
  glibc.x86_64 0:2.17-106.el7_2.1
  glibc-common.x86_64 0:2.17-106.el7_2.1
  gmp.x86_64 1:6.0.0-12.el7_1
  grub2.x86_64 1:2.02-0.33.el7.centos.1
  grub2-tools.x86_64 1:2.02-0.33.el7.centos.1
  kernel-tools.x86_64 0:3.10.0-327.3.1.el7
  kernel-tools-libs.x86_64 0:3.10.0-327.3.1.el7
  libxml2.x86_64 0:2.9.1-6.el7_2.2
  logrotate.x86_64 0:3.8.6-7.el7_2
  openssl.x86_64 1:1.0.1e-51.el7_2.1
  openssl-libs.x86_64 1:1.0.1e-51.el7_2.1
  python-perf.x86_64 0:3.10.0-327.3.1.el7
  rdma.noarch 0:7.2_4.1_rc6-2.el7
  tuned.noarch 0:2.5.1-4.el7_2.1

Complete!
[root@localhost ~]#

So after a FULL system update all is still good. util-linux and python did NOT get updated (no updates were needed); lsblk and the btrfs mounts are still happy as piggies in mud.

[root@localhost ~]# lsblk
NAME            MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
sda               8:0    0    30G  0 disk
├─sda1            8:1    0   500M  0 part /boot
└─sda2            8:2    0  29.5G  0 part
  ├─centos-root 253:0    0  27.5G  0 lvm  /
  └─centos-swap 253:1    0     2G  0 lvm  [SWAP]
sdb               8:16   0  93.2G  0 disk /btrfs
sdc               8:32   0 186.3G  0 disk /btrfs-r0
sdd               8:48   0 186.3G  0 disk
sde               8:64   0 186.3G  0 disk
sdf               8:80   0 186.3G  0 disk
sr0              11:0    1  1024M  0 rom
[root@localhost ~]# df -h
Filesystem               Size  Used Avail Use% Mounted on
/dev/mapper/centos-root   28G 1021M   27G   4% /
devtmpfs                 911M     0  911M   0% /dev
tmpfs                    921M     0  921M   0% /dev/shm
tmpfs                    921M  8.5M  912M   1% /run
tmpfs                    921M     0  921M   0% /sys/fs/cgroup
/dev/sda1                497M  183M  314M  37% /boot
tmpfs                    185M     0  185M   0% /run/user/0
/dev/sdb                  94G  9.9G   82G  11% /btrfs
/dev/sdc                 746G  9.9G  734G   2% /btrfs-r0
[root@localhost ~]#

Color me THOROUGHLY perplexed! Got a Rockstor tarball/build I can lay down on top of this working setup, or can I safely assume it is a BIT more complicated/complex than that?

@whitey Well look at you go, this is great. And are you game to try the elrepo kernel?
On a Rockstor 3.8-10.03 we have:-

uname -a
Linux rockstord 4.2.5-1.el7.elrepo.x86_64 #1 SMP Tue Oct 27 12:32:38 EDT 2015 x86_64 x86_64 x86_64 GNU/Linux

I’m afraid the tarball / build question is more one for @suman, but you will need the elrepo kernel anyway really as 3.10 is ages old for btrfs.

Anytime now someone will get fed up with all this and jump in with a nice obvious point that we both missed, here’s hoping.

What, you guys already have a .03 update release out for 3.8-10 or is that internal? :smiley:
I am certainly game/down to try whatever out, long as we think/feel it has ‘legs’. This is all test/boom boom env stuff for me, can spin/up and tear down in a matter of mins. I just don’t wanna bother anyone if the ‘bang is not worth the buck’ as in if we think it is specific to me/my env (I don’t think it is/can be but hey I’ll keep an open mind) then let’s scrap it but if we feel there may be a ‘red headed gremlin’ running rampant that ‘may’ bite anyone else let’s press on.

What’s the easiest way to implement elrepo kernel? Do I need to uninstall/blacklist or just move kernel boot order in grub.conf/menu.lst?

May need a lil hand holding but not much.

One of these days we’ll put this one to bed!

I think it’s certainly worth the effort and prudent to install the elrepo kernel at least, given it’s so key to btrfs and our little investigation.
My understanding re elrepo kernel install is add the repo and install as usual:-
Instructions are available at http://elrepo.org/tiki/tiki-index.php
I know that’s a rtm type answer but other than that I don’t really know.
But you will want their ml (mainline) variant; currently we have:-

yum list installed | grep kernel
kernel-ml.x86_64                4.2.5-1.el7.elrepo             @anaconda/3  

OK just had a look and I see a problem here, they appear to only now have 4.3.2-1 and 4.3.3-1 so we can’t directly compare, which is a shame.

@suman how do we install the exact same kernel as Rockstor carries?

As for 3.8-10.03, it’s a minor fix-up from what you have already tried in the testing channel.
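On the boot-order question, a minimal sketch, assuming a CentOS 7 style /boot/grub2/grub.cfg in which every top-level entry starts with menuentry 'title'. The helper name list_grub_entries is made up for illustration. It only lists the entries with their index, so two kernels can sit side by side without uninstalling or blacklisting anything.

```shell
# List GRUB2 menu entries with their index (sketch only).
# Usage: list_grub_entries [path-to-grub.cfg]
# The default path is an assumption that fits BIOS-booted CentOS 7.
list_grub_entries() {
    cfg="${1:-/boot/grub2/grub.cfg}"
    # Splitting on single quotes puts the entry title in field 2.
    awk -F"'" '/^menuentry / { print n++ ": " $2 }' "$cfg"
}
```

The index it prints could then be handed to grub2-set-default (with GRUB_DEFAULT=saved in /etc/default/grub). Check the listing first, since entry order varies between installs.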

Updated to latest avail kernel-ml from elrepo

[root@localhost ~]# yum --enablerepo=elrepo-kernel install kernel-ml
Loaded plugins: fastestmirror
base                                                      | 3.6 kB  00:00:00
elrepo                                                    | 2.9 kB  00:00:00
elrepo-kernel                                             | 2.9 kB  00:00:00
extras                                                    | 3.4 kB  00:00:00
updates                                                   | 3.4 kB  00:00:00
elrepo-kernel/primary_db                                  | 828 kB  00:00:01
Loading mirror speeds from cached hostfile
 * base: mirror.fusioncloud.co
 * elrepo: dfw.mirror.rackspace.com
 * elrepo-kernel: dfw.mirror.rackspace.com
 * extras: centos.mirror.lstn.net
 * updates: centos.firehosted.com
Resolving Dependencies
--> Running transaction check
---> Package kernel-ml.x86_64 0:4.3.3-1.el7.elrepo will be installed
--> Finished Dependency Resolution

Dependencies Resolved

=================================================================================
 Package         Arch         Version                  Repository           Size
=================================================================================
Installing:
 kernel-ml       x86_64       4.3.3-1.el7.elrepo       elrepo-kernel        37 M

Transaction Summary
=================================================================================
Install  1 Package

Total download size: 37 M
Installed size: 169 M
Is this ok [y/d/N]: y
Downloading packages:
warning: /var/cache/yum/x86_64/7/elrepo-kernel/packages/kernel-ml-4.3.3-1.el7.elrepo.x86_64.rpm: Header V4 DSA/SHA1 Signature, key ID baadae52: NOKEY
Public key for kernel-ml-4.3.3-1.el7.elrepo.x86_64.rpm is not installed
kernel-ml-4.3.3-1.el7.elrepo.x86_64.rpm                   |  37 MB  00:00:03
Retrieving key from file:///etc/pki/rpm-gpg/RPM-GPG-KEY-elrepo.org
Importing GPG key 0xBAADAE52:
 Userid     : "elrepo.org (RPM Signing Key for elrepo.org) <secure@elrepo.org>"
 Fingerprint: 96c0 104f 6315 4731 1e0b b1ae 309b c305 baad ae52
 Package    : elrepo-release-7.0-2.el7.elrepo.noarch (installed)
 From       : /etc/pki/rpm-gpg/RPM-GPG-KEY-elrepo.org
Is this ok [y/N]: y
Running transaction check
Running transaction test
Transaction test succeeded
Running transaction
Warning: RPMDB altered outside of yum.
  Installing : kernel-ml-4.3.3-1.el7.elrepo.x86_64                           1/1
  Verifying  : kernel-ml-4.3.3-1.el7.elrepo.x86_64                           1/1

Installed:
  kernel-ml.x86_64 0:4.3.3-1.el7.elrepo

Complete!

Rebooted into that kernel and all still seems well, which only adds to the mystery.

[root@localhost ~]# uname -a
Linux localhost.localdomain 4.3.3-1.el7.elrepo.x86_64 #1 SMP Tue Dec 15 11:18:19 EST 2015 x86_64 x86_64 x86_64 GNU/Linux
[root@localhost ~]# lsblk
NAME            MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
sda               8:0    0    30G  0 disk
├─sda1            8:1    0   500M  0 part /boot
└─sda2            8:2    0  29.5G  0 part
  ├─centos-root 253:0    0  27.5G  0 lvm  /
  └─centos-swap 253:1    0     2G  0 lvm  [SWAP]
sdb               8:16   0  93.2G  0 disk
sdc               8:32   0 186.3G  0 disk
sdd               8:48   0 186.3G  0 disk
sde               8:64   0 186.3G  0 disk
sdf               8:80   0 186.3G  0 disk
sr0              11:0    1  1024M  0 rom
[root@localhost ~]# btrfs filesystem show
Label: none  uuid: a7ded511-cde9-4246-b0a8-156b6811926f
        Total devices 1 FS bytes used 9.78GiB
        devid    1 size 93.16GiB used 13.03GiB path /dev/sdb

Label: none  uuid: b242eaa0-b5ab-4165-8b07-cabfb3169520
        Total devices 4 FS bytes used 9.78GiB
        devid    1 size 186.31GiB used 4.01GiB path /dev/sdc
        devid    2 size 186.31GiB used 4.00GiB path /dev/sde
        devid    3 size 186.31GiB used 3.01GiB path /dev/sdd
        devid    4 size 186.31GiB used 3.01GiB path /dev/sdf

btrfs-progs v3.19.1
[root@localhost ~]#

Anyone else here find this strange?

Rockstor 3.8-10.02 (w/ elrepo kernel 4.2.5 build included in RS release)

[root@rocket ~]# dmesg | grep mpt2sas
[    1.133956] mpt2sas version 20.100.00.00 loaded
[    1.140070] mpt2sas0: 64 BIT PCI BUS DMA ADDRESSING SUPPORTED, total mem (4047048 kB)
[    1.211747] mpt2sas0: MSI-X vectors supported: 1, no of cores: 2, max_msix_vectors: 8
[    1.212152] mpt2sas0-msix0: PCI-MSI-X enabled: IRQ 59
[    1.212155] mpt2sas0: iomem(0x00000000fd5f0000), mapped(0xffffc90000d40000), size(65536)
[    1.212157] mpt2sas0: ioport(0x0000000000004000), size(256)
[    1.352180] mpt2sas0: Allocated physical memory: size(7445 kB)
[    1.352183] mpt2sas0: Current Controller Queue Depth(3307), Max Controller Queue Depth(3432)
[    1.352184] mpt2sas0: Scatter Gather Elements per IO(128)
[   31.382445] mpt2sas0: _base_event_notification: timeout
[   31.382491] mpt2sas0: sending message unit reset !!
[   31.387923] mpt2sas0: message unit reset: SUCCESS
[   31.607713] mpt2sas0: failure at drivers/scsi/mpt2sas/mpt2sas_scsih.c:8237/_scsih_probe()!
[root@rocket ~]#

CentOS 7.2 (1511 w/ elrepo 4.3.3 kernel)

[root@localhost ~]# dmesg | grep mpt2sas
[    0.991351] mpt2sas version 20.100.00.00 loaded
[    0.998795] mpt2sas0: 32 BIT PCI BUS DMA ADDRESSING SUPPORTED, total mem (2048740 kB)
[    1.075498] mpt2sas0: MSI-X vectors supported: 1, no of cores: 1, max_msix_vectors: 8
[    1.076066] mpt2sas0-msix0: PCI-MSI-X enabled: IRQ 58
[    1.076068] mpt2sas0: iomem(0x00000000fd5f0000), mapped(0xffffc90000400000), size(65536)
[    1.076069] mpt2sas0: ioport(0x0000000000004000), size(256)
[    1.196721] mpt2sas0: Allocated physical memory: size(4964 kB)
[    1.196724] mpt2sas0: Current Controller Queue Depth(3307), Max Controller Queue Depth(3432)
[    1.196725] mpt2sas0: Scatter Gather Elements per IO(128)
[    1.257870] mpt2sas0: LSISAS2008: FWVersion(19.00.00.00), ChipRevision(0x03), BiosVersion(07.37.00.00)
[    1.257874] mpt2sas0: Dell 6Gbps SAS HBA: Vendor(0x1000), Device(0x0072), SSVID(0x1028), SSDID(0x1F1C)
[    1.257875] mpt2sas0: Protocol=(Initiator,Target), Capabilities=(TLR,EEDP,Snapshot Buffer,Diag Trace Buffer,Task Set Full,NCQ)
[    1.258517] mpt2sas0: sending port enable !!
[    1.262039] mpt2sas0: host_add: handle(0x0001), sas_addr(0x5b8ca3a0f298aa00), phys(8)
[    1.292223] mpt2sas0: port enable: SUCCESS
[root@localhost ~]#

This line concerns me and leads me to believe there is something NOT happy w/ the mpt2sas driver/firmware (message unit reset: SUCCESS, then a failure in _scsih_probe(), instead of port enable: SUCCESS).

[   31.607713] mpt2sas0: failure at drivers/scsi/mpt2sas/mpt2sas_scsih.c:8237/_scsih_probe()!

Also see the difference between 64 BIT PCI BUS DMA (not working) and 32 BIT PCI BUS DMA (working).

Notice on the rockstor build it doesn’t even get to this: (that is my LSI 2008 HBA in IT mode firmware v19 which I explicitly flash all my LSI HBA’s to, work DARN well in SW raid setups like that)

[    1.257870] mpt2sas0: LSISAS2008: FWVersion(19.00.00.00), ChipRevision(0x03), BiosVersion(07.37.00.00)
[    1.257874] mpt2sas0: Dell 6Gbps SAS HBA: Vendor(0x1000), Device(0x0072), SSVID(0x1028), SSDID(0x1F1C)
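A quick way to tell the two outcomes apart after each boot, as a minimal sketch. It matches only the two strings seen in the logs above (port enable: SUCCESS and _scsih_probe()); the function name classify_mpt_log and its three output words are made up for illustration.

```shell
# Classify HBA driver initialisation from kernel log text on stdin.
# Prints one of: ok | probe-failed | unknown
classify_mpt_log() {
    log=$(cat)
    case "$log" in
        *'port enable: SUCCESS'*) echo ok ;;            # controller came up
        *'_scsih_probe()'*)       echo probe-failed ;;  # probe aborted after the timeout
        *)                        echo unknown ;;       # no driver lines, or a different state
    esac
}
```

Typical use would be something like: dmesg | grep mpt2sas | classify_mpt_log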

@whitey Yet more good findings I would say. There is the matter that the kernel on Rockstor didn’t change between 3.8-10 (working) and its .02 update (not working for you), but a whole lot of other stuff did via the CentOS updates. Maybe the system was unable to present those disks with the driver in that state after the updates, whereas before them it muddled through and did present them.

I definitely think you are onto something and it’s great to see those drives showing up. So this looks like some kind of regression that was aggravated by the 4.2.5-1 ml elrepo kernel in Rockstor that isn’t suffered by the 4.3.3 elrepo kernel, or the default CentOS kernel for that matter, which is great news. Of course we would need to install the 4.2.5-1 in the vanilla CentOS 7.2 to know but I’m fairly sure that there are plans to upgrade the default kernel in Rockstor anyway soon so this may all fall into place. Given that the btrfs-progs and kernel are best kept in step maybe the next thing to try is installing this newer elrepo kernel in an updated Rockstor 3.8-10.03 and see if it sorts things. But as I say the btrfs-progs are best kept in line with the kernel version however these have recently been updated in Rockstor so probably new enough for this test (non production for now).

yum list installed | grep btrfs
btrfs-progs.x86_64              4.3.1-0.rockstor               @Rockstor-Testing

I suspect we are waiting on elrepo releasing a 4.4 kernel, not sure. I think @suman is currently pretty deep into the rock-on docker stuff at the moment so the kernel upgrade may be taking a back seat.

To my knowledge (which is very limited I might add) Rockstor uses a completely unaltered elrepo kernel; there just isn’t the people power for us to roll a custom one. Although @Dragon2611 in the forums has been compiling his own, and making it available for people to try.

Note that to change the default kernel expected by Rockstor, so that you don’t get the warning message etc., see another post and its answer, again from @Dragon2611:-
http://forum.rockstor.com/t/override-kernel-autoboot/798

The difference re 32 & 64 bit is strange / interplay with motherboard pci drivers maybe. You could look at available module options eg:-

modinfo mpt2sas
filename:       /lib/modules/4.2.5-1.el7.elrepo.x86_64/kernel/drivers/scsi/mpt2sas/mpt2sas.ko
version:        20.100.00.00
license:        GPL
description:    LSI MPT Fusion SAS 2.0 Device Driver
author:         Avago Technologies <MPT-FusionLinux.pdl@avagotech.com>
srcversion:     F74E004728BCB0A8B19A944
alias:          pci:v00001000d0000007Esv*sd*bc*sc*i*
alias:          pci:v00001000d0000006Esv*sd*bc*sc*i*
alias:          pci:v00001000d00000087sv*sd*bc*sc*i*
alias:          pci:v00001000d00000086sv*sd*bc*sc*i*
alias:          pci:v00001000d00000085sv*sd*bc*sc*i*
alias:          pci:v00001000d00000084sv*sd*bc*sc*i*
alias:          pci:v00001000d00000083sv*sd*bc*sc*i*
alias:          pci:v00001000d00000082sv*sd*bc*sc*i*
alias:          pci:v00001000d00000081sv*sd*bc*sc*i*
alias:          pci:v00001000d00000080sv*sd*bc*sc*i*
alias:          pci:v00001000d00000065sv*sd*bc*sc*i*
alias:          pci:v00001000d00000064sv*sd*bc*sc*i*
alias:          pci:v00001000d00000077sv*sd*bc*sc*i*
alias:          pci:v00001000d00000076sv*sd*bc*sc*i*
alias:          pci:v00001000d00000074sv*sd*bc*sc*i*
alias:          pci:v00001000d00000072sv*sd*bc*sc*i*
alias:          pci:v00001000d00000070sv*sd*bc*sc*i*
depends:        scsi_transport_sas,raid_class
intree:         Y
vermagic:       4.2.5-1.el7.elrepo.x86_64 SMP mod_unload modversions 
parm:           logging_level: bits for enabling additional logging info (default=0)
parm:           max_sectors:max sectors, range 64 to 32767  default=32767 (ushort)
parm:           missing_delay: device missing delay , io missing delay (array of int)
parm:           max_lun: max lun, default=16895  (int)
parm:           diag_buffer_enable: post diag buffers (TRACE=1/SNAPSHOT=2/EXTENDED=4/default=0) (int)
parm:           prot_mask: host protection capabilities mask, def=7  (int)
parm:           max_queue_depth: max controller queue depth  (int)
parm:           max_sgl_entries: max sg entries  (int)
parm:           msix_disable: disable msix routed interrupts (default=0) (int)
parm:           max_msix_vectors: max msix vectors  (int)
parm:           mpt2sas_fwfault_debug: enable detection of firmware fault and halt firmware - (default=0)
parm:           disable_discovery: disable discovery  (int)
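To pull just the option names out of that listing, and to see what a loaded module currently has set, here is a minimal sketch. Both function names are made up for illustration, and the second argument of show_module_params exists only so the sysfs root can be overridden. Only parameters the driver exports under /sys/module/NAME/parameters will show up, so an option missing there is not necessarily unset.

```shell
# Print the parameter names found in modinfo output supplied on stdin.
modinfo_params() {
    # parm lines look like "parm:   name: description" or "parm:   name:description"
    sed -n 's/^parm:[[:space:]]*\([A-Za-z0-9_]*\):.*/\1/p'
}

# Print name=value for each readable parameter of a loaded module.
# $1 = module name, $2 = sysfs module root (defaults to /sys/module)
show_module_params() {
    mod="$1"
    dir="${2:-/sys/module}"
    for f in "$dir/$mod/parameters"/*; do
        [ -r "$f" ] || continue
        printf '%s=%s\n' "${f##*/}" "$(cat "$f")"
    done
}
```

For example: modinfo mpt2sas | modinfo_params, then show_module_params mpt2sas.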

Building a kernel on Rockstor is actually pretty easy. Off the top of my head, I believe it goes as follows.

Install the development tools packages:

yum group install "Development Tools"

Install gzip if it’s not already installed (yum install gzip).

  1. cd to /usr/src

  2. grab the kernel sources from kernel.org

  3. extract the file

  4. symlink the extracted dir to /usr/src/linux (I don’t think this is actually required but it’s slightly neater)

  5. enter the directory

  6. make clean

  7. cp /boot/config-4.2.2-1.el7.elrepo.x86_64 .config (Note: replace the 4.2.2 with the current kernel version; tab completion should list the available options)

  8. make menuconfig

8a) press load and load the .config file (should be the default)

8b) make any changes needed to compile any extra drivers you need. If you are careful you can also disable stuff you don’t need; I take the approach that if I don’t know what something does, I leave it on whatever it was set to before.

8c) press save then exit

  9. make rpm (If you are on SSH and prone to dropping then “screen make rpm” might be a better option)

9a) go get a cup of coffee and find something to do for a while; depending on the speed of your Rockstor box this could take anything from 15 minutes to several hours.

  10. if all worked you should be able to find a load of RPMs in /root for the kernel, headers, etc.

  11. install with your favourite package manager.
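The steps above can be sketched as a script. This is a rough outline rather than a tested recipe: the function names, the kernel version passed in, the cdn.kernel.org download URL and the ncurses-devel package (for menuconfig) are all assumptions not taken from the post, and further -devel packages may be needed depending on the config. By default it only prints the commands; set DRY_RUN=0 to actually run them as root.

```shell
# Sketch of the kernel build steps; prints commands unless DRY_RUN=0.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '+ %s\n' "$*"      # show what would be executed
    else
        "$@"
    fi
}

# $1 = kernel version to build, e.g. 4.3.3 (hypothetical choice)
build_kernel_rpm() {
    ver="$1"
    major="${ver%%.*}"
    run yum -y groupinstall "Development Tools" &&
    run yum -y install ncurses-devel &&          # assumption: needed by menuconfig
    run cd /usr/src &&
    run curl -LO "https://cdn.kernel.org/pub/linux/kernel/v${major}.x/linux-${ver}.tar.xz" &&
    run tar xf "linux-${ver}.tar.xz" &&
    run ln -sfn "linux-${ver}" /usr/src/linux &&
    run cd /usr/src/linux &&
    run make clean &&
    run cp "/boot/config-$(uname -r)" .config && # start from the running kernel's config
    run make menuconfig &&
    run make rpm                                 # as in the post; kernel trees also provide an rpm-pkg target
}
```

Running build_kernel_rpm 4.3.3 with the default DRY_RUN=1 just lists the plan, which is a cheap way to sanity-check paths before committing to a long build.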

The only gotcha I did run into is that yum may not auto-remove custom kernels when you upgrade to a newer one, so after a while you will have to delete some older ones or /boot will get full and you won’t be able to install any more.
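For the /boot clean-up, a minimal sketch that picks removal candidates from a list of versions. The function name old_kernels is made up, it relies on GNU sort -V for version ordering, and it only prints candidates without removing anything. Compare the output against uname -r by hand so the running kernel is never removed.

```shell
# Read kernel versions from stdin (one per line) and print all but the
# newest $1 of them (default: keep the newest 2).
old_kernels() {
    keep="${1:-2}"
    sort -V -r | tail -n +"$((keep + 1))"
}
```

One possible input, assuming the custom packages are simply named kernel: rpm -q kernel --qf '%{VERSION}-%{RELEASE}\n' | old_kernels 2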


@Dragon2611 Thanks, this is a great post. I didn’t realise the rpm could be made so easily. Is make oldconfig still an option these days? If so, it may well streamline the process a little, especially if one is just going with the defaults.

I’ve not actually tried.

Also one thing I did change between the ELREPO and my kernel was to compile BTRFS directly into the kernel rather than rely on loading it as a module, I figured given how extensively rockstor uses it that it didn’t seem like a bad idea.

Cute, another reboot and now I see this garbage on the CentOS 7.2 (1511) box I upgraded to kernel 4.3.3:

[root@localhost ~]# lsblk
NAME            MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
sda               8:0    0   30G  0 disk
├─sda1            8:1    0  500M  0 part /boot
└─sda2            8:2    0 29.5G  0 part
  ├─centos-root 253:0    0 27.5G  0 lvm  /
  └─centos-swap 253:1    0    2G  0 lvm  [SWAP]
sr0              11:0    1    4G  0 rom
[root@localhost ~]# dmesg | grep mpt2sas
[    0.775744] mpt2sas version 20.100.00.00 loaded
[    0.781703] mpt2sas0: 32 BIT PCI BUS DMA ADDRESSING SUPPORTED, total mem (2048740 kB)
[    0.858546] mpt2sas0: MSI-X vectors supported: 1, no of cores: 1, max_msix_vectors: 8
[    0.859041] mpt2sas0-msix0: PCI-MSI-X enabled: IRQ 58
[    0.859043] mpt2sas0: iomem(0x00000000fd5f0000), mapped(0xffffc90000460000), size(65536)
[    0.859044] mpt2sas0: ioport(0x0000000000004000), size(256)
[    0.976310] mpt2sas0: Allocated physical memory: size(4964 kB)
[    0.976313] mpt2sas0: Current Controller Queue Depth(3307), Max Controller Queue Depth(3432)
[    0.976313] mpt2sas0: Scatter Gather Elements per IO(128)
[   31.005253] mpt2sas0: _base_event_notification: timeout
[   31.005269] mpt2sas0: sending message unit reset !!
[   31.007239] mpt2sas0: message unit reset: SUCCESS
[   31.055515] mpt2sas0: failure at drivers/scsi/mpt2sas/mpt2sas_scsih.c:8498/_scsih_probe()!
[root@localhost ~]#

If I reboot back into kernel 3.10 on the CentOS 7.2 (1511) tester box then it works.

[root@localhost ~]# lsblk
NAME            MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
sda               8:0    0    30G  0 disk
├─sda1            8:1    0   500M  0 part /boot
└─sda2            8:2    0  29.5G  0 part
  ├─centos-root 253:0    0  27.5G  0 lvm  /
  └─centos-swap 253:1    0     2G  0 lvm  [SWAP]
sdb               8:16   0  93.2G  0 disk
sdc               8:32   0 186.3G  0 disk
sdd               8:48   0 186.3G  0 disk
sde               8:64   0 186.3G  0 disk
sdf               8:80   0 186.3G  0 disk
sr0              11:0    1     4G  0 rom
[root@localhost ~]# uname -a
Linux localhost.localdomain 3.10.0-327.3.1.el7.x86_64 #1 SMP Wed Dec 9 14:09:15 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
[root@localhost ~]#

Crazy thing is I KNOW I rebooted into the 4.3.3 kernel and had it working at some point late last night…SMH

Guess I DO have FW v19 on the HBA, and it looks like CentOS 7.2 is loading driver v20. I know FreeNAS moved to driver 20 and firmware 20, otherwise the GUI complains. I could start there I suppose. I have a couple of spare 9201-8i in IT v19 mode; I will flash one up to v20, attach it to the stg appliance VM in VT-d mode and see what happens.

Even in kernel 3.10 I have same mpt2sas driver so I may be chasing a ghost.

mpt2sas version 20.100.00.00 loaded

mpt2sas0: LSISAS2008: FWVersion(19.00.00.00), ChipRevision(0x03), BiosVersion(07.37.00.00)
[ 1.094891] mpt2sas0: Dell 6Gbps SAS HBA: Vendor(0x1000), Device(0x0072), SSVID(0x1028), SSDID(0x1F1C)

I think this card is a Dell Perc H310, which is essentially a re-branded LSI 9211-8i (all SAS 2008 HBAs), but I reflash all of mine to the official LSI IT mode firmware version 19. Maybe it’s time to move to version 20 of the firmware as time/drivers/distros march on, though v19 has been rock solid for me for at least the last 4-5 years, it seems.

Anyone else running a similar setup here? Essentially an AIO (all-in-one) build w/ a hypervisor and vt-d pass thru stg ctrls to the VM driving the array/nas?

EDIT: Manually set up a raid10 btrfs pool across those 4 hussl SAS SSDs and configured NFS (still booted back into the 3.10 kernel, as the 4.3.3 kernel sh|t the bed apparently, or had that one flukie ‘Hey I see devices’ complex and now nothing). Mounted it to the vSphere/ESXi hosts and it is crushing 350MB/s in/write and 300MB/s out/read to another ZFS NFS-backed pool and my vSAN AFA, floating a few VMs around using sVMotion.

mkfs.btrfs -m raid10 -d raid10 /dev/sdb /dev/sdc /dev/sdd /dev/sde

You guys are gonna force me to become a btrfs cli junkie yet! But I LOVE the dashboard!

KNEW I wasn’t losing my damn mind: now w/ 4.3.3 booted it AGAIN sees the disks…inconsistent to say the least. Just flashed a spare LSI 2008 6Gbps HBA/ctrl up to firmware v20. Gonna see if that helps.

Have NOT given up yet!

Flashing an LSI 6Gbps HBA to v20 didn’t seem to help much; drive identification/sensing is still very inconsistent. My CentOS box DOES currently see the disks w/ the 4.3.3 kernel. I tried to unblacklist the util-linux goodies and apply a yum update -y; it re-upgraded the 4-5 util-linux packages and deps, and I even updated to RS 3.8-10.03. NO GO

Tried to lay down a 4.3.3 kernel on RS and boot into that, akin to what I did on my CentOS 7.2 build w/ 3.10 kernel to take it to 4.3.3. NO GO (move from RS 4.2.5 kernel to ELREPO 4.3.3).

I stink.

@whitey I have exactly the same problem. I’m using a Dell H310 with P20 IT firmware with vt-d passthrough in ESXi to linux. Any time I try something with kernel 4.2 or newer, I have the same timeout after 30 seconds.

[    3.944999] mpt2sas0: 64 BIT PCI BUS DMA ADDRESSING SUPPORTED, total mem (4047108 kB)
[    4.230006] mpt2sas0: MSI-X vectors supported: 1, no of cores: 2, max_msix_vectors: 8
[    4.232353] mpt2sas0-msix0: PCI-MSI-X enabled: IRQ 60
[    4.232355] mpt2sas0: iomem(0x00000000fd3f0000), mapped(0xffffc90000780000), size(65536)
[    4.232356] mpt2sas0: ioport(0x0000000000006000), size(256)
[    4.670263] mpt2sas0: Allocated physical memory: size(7445 kB)
[    4.670266] mpt2sas0: Current Controller Queue Depth(3307), Max Controller Queue Depth(3432)
[    4.670267] mpt2sas0: Scatter Gather Elements per IO(128)
[   34.931893] mpt2sas0: _base_event_notification: timeout
[   34.931917] mpt2sas0: sending message unit reset !!
[   34.939852] mpt2sas0: message unit reset: SUCCESS
[   35.090297] mpt2sas0: failure at /build/linux-AxjFAn/linux-4.2.0/drivers/scsi/mpt2sas/mpt2sas_scsih.c:8237/_scsih_probe()!

The only reliable way I’ve found to make it work in 4.2 is to boot into something older like 4.1 first and then reboot back into 4.2. Every time I do that it works.

4.1.6:

[    1.084177] mpt2sas0: 64 BIT PCI BUS DMA ADDRESSING SUPPORTED, total mem (4047648 kB)
[    1.171214] mpt2sas0: MSI-X vectors supported: 1, no of cores: 2, max_msix_vectors: 8
[    1.172392] mpt2sas0-msix0: PCI-MSI-X enabled: IRQ 60
[    1.172394] mpt2sas0: iomem(0x00000000fd3f0000), mapped(0xffffc90000800000), size(65536)
[    1.172395] mpt2sas0: ioport(0x0000000000006000), size(256)
[    1.289273] mpt2sas0: Allocated physical memory: size(7445 kB)
[    1.289275] mpt2sas0: Current Controller Queue Depth(3307), Max Controller Queue Depth(3432)
[    1.289276] mpt2sas0: Scatter Gather Elements per IO(128)
[    1.347695] mpt2sas0: LSISAS2008: FWVersion(20.00.04.00), ChipRevision(0x03), BiosVersion(00.00.00.00)
[    1.347698] mpt2sas0: Dell 6Gbps SAS HBA: Vendor(0x1000), Device(0x0072), SSVID(0x1028), SSDID(0x1F1C)
[    1.347698] mpt2sas0: Protocol=(Initiator,Target), Capabilities=(TLR,EEDP,Snapshot Buffer,Diag Trace Buffer,Task Set Full,NCQ)
[    1.347976] mpt2sas0: sending port enable !!
[    2.952518] mpt2sas0: host_add: handle(0x0001), sas_addr(0x5b8ca3a0f9f4e000), phys(8)
[    9.087861] mpt2sas0: port enable: SUCCESS

4.2.8 after rebooting from 4.1.6:
[    0.819676] mpt2sas version 20.100.00.00 loaded
[    0.826784] mpt2sas0: 64 BIT PCI BUS DMA ADDRESSING SUPPORTED, total mem (4047696 kB)
[    0.903642] mpt2sas0: MSI-X vectors supported: 1, no of cores: 2, max_msix_vectors: 8
[    0.903946] mpt2sas0-msix0: PCI-MSI-X enabled: IRQ 60
[    0.903948] mpt2sas0: iomem(0x00000000fd3f0000), mapped(0xffffc90000780000), size(65536)
[    0.903949] mpt2sas0: ioport(0x0000000000006000), size(256)
[    1.022265] mpt2sas0: Allocated physical memory: size(7445 kB)
[    1.022268] mpt2sas0: Current Controller Queue Depth(3307), Max Controller Queue Depth(3432)
[    1.022268] mpt2sas0: Scatter Gather Elements per IO(128)
[    1.081273] mpt2sas0: LSISAS2008: FWVersion(20.00.04.00), ChipRevision(0x03), BiosVersion(00.00.00.00)
[    1.081276] mpt2sas0: Dell 6Gbps SAS HBA: Vendor(0x1000), Device(0x0072), SSVID(0x1028), SSDID(0x1F1C)
[    1.081277] mpt2sas0: Protocol=(Initiator,Target), Capabilities=(TLR,EEDP,Snapshot Buffer,Diag Trace Buffer,Task Set Full,NCQ)
[    1.081649] mpt2sas0: sending port enable !!
[    1.084301] mpt2sas0: host_add: handle(0x0001), sas_addr(0x5b8ca3a0f9f4e000), phys(8)
[    1.124829] mpt2sas0: port enable: SUCCESS

YEA, the 3.8-11 build just worked for me w/ my AIO config and my picky hussl4010bss600 SAS SSDs! I even laid down a pool and then deleted the pool, and the disks showed right back up as importable or wipeable. wooohoooo

Pounding some sVMotions into a r10 from vSphere infra. 25K iops consistent!

Probably not the right place for it but it sure would be nice on the Disk activity chart to have another ‘view’ of I/O latency/response time right there in the same dashboard as well as your already provided IOPS across the pool/disks view. I know it can get crowded on a dashboard and you don’t want it to look ‘too busy/scattered’ but maybe something to consider.

I also DID recently do a lab upgrade/refresh from an E3 to E5 platform across my 3-node vSphere cluster but I doubt that had a thing to do w/ it. More resources sure are nice though (double cores/quadruple memory)

The saga continues for me. I stepped away from this for a bit once I realized I had rebooted and the issues popped up again. I have now been able to reproduce exactly what happens. After the initial install/first boot I can see the disks, and the mpt2sas driver messages in dmesg look happy. If I simply reboot, I can then no longer see the disks and mpt2sas looks unhappy again.

See images. SMH


Drum roll…wait for it…and the fix for me was:

mpt2sas.msix_disable=1 (for 4.3 or older kernels)
or
mpt3sas.msix_disable=1 (for 4.4 or newer kernels)

Added to the kernel boot line parameters; adding it here as well for posterity’s sake.

Under Ubuntu or CentOS using grub2, to make it stick edit /etc/default/grub so that it contains the following:

GRUB_CMDLINE_LINUX_DEFAULT="mpt2sas.msix_disable=1"
(mpt3sas.msix_disable=1 for 4.4 or newer kernels)

Save the file, then update the grub2 files:

Ubuntu - update-grub
CentOS - grub2-mkconfig -o /boot/grub2/grub.cfg (BIOS based machines)
CentOS - grub2-mkconfig -o /boot/efi/EFI/centos/grub.cfg (UEFI based machines; the directory is redhat on RHEL)
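A minimal sketch of the fix as two helpers, for anyone scripting it. The function names are made up for illustration. The 4.3/4.4 split between mpt2sas and mpt3sas is taken from the post above. add_grub_param uses GNU sed -i and appends to an existing GRUB_CMDLINE_LINUX_DEFAULT rather than overwriting it, which is a design choice of this sketch. Back up /etc/default/grub before trying it.

```shell
# Print the boot parameter matching a kernel release string
# (defaults to the running kernel): mpt2sas up to 4.3, mpt3sas from 4.4.
msix_param() {
    rel="${1:-$(uname -r)}"
    major="${rel%%.*}"
    minor="${rel#*.}"
    minor="${minor%%.*}"
    minor="${minor%%[!0-9]*}"     # drop suffixes such as -rc1
    if [ "$major" -gt 4 ] || { [ "$major" -eq 4 ] && [ "$minor" -ge 4 ]; }; then
        echo "mpt3sas.msix_disable=1"
    else
        echo "mpt2sas.msix_disable=1"
    fi
}

# Add a parameter to GRUB_CMDLINE_LINUX_DEFAULT unless the file already has it.
# $1 = parameter, $2 = defaults file (defaults to /etc/default/grub)
add_grub_param() {
    param="$1"
    file="${2:-/etc/default/grub}"
    grep -qF -- "$param" "$file" && return 0      # already present, nothing to do
    if grep -q '^GRUB_CMDLINE_LINUX_DEFAULT="' "$file"; then
        sed -i "s|^GRUB_CMDLINE_LINUX_DEFAULT=\"\(.*\)\"|GRUB_CMDLINE_LINUX_DEFAULT=\"\1 ${param}\"|" "$file"
    else
        printf 'GRUB_CMDLINE_LINUX_DEFAULT="%s"\n' "$param" >> "$file"
    fi
}
```

Usage would be along the lines of add_grub_param "$(msix_param)" followed by the update-grub or grub2-mkconfig step above. After the reboot, cat /proc/cmdline should show the parameter if it took effect.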
