Btrfs error and critical target errors with KVM disk passthrough

I have a Rockstor 3.8.16-2 installation running on KVM. I have passed two SATA disks through in SCSI mode with scsihw: virtio-scsi-pci or virtio-scsi-single. Both were picked up correctly, and I've set up a raid1 pool across the two disks, which is apparently working well. However, /var/log/messages contains many messages like the ones below; while maxing out writes to the pool, I see a set of these every minute or so.
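
For context, this is roughly how the passthrough looks in the VM config (/etc/pve/qemu-server/<vmid>.conf); the second disk's ID is a placeholder, not my exact value:

scsihw: virtio-scsi-pci
scsi1: /dev/disk/by-id/scsi-350014ee262e982a2
scsi2: /dev/disk/by-id/scsi-<second-disk-id>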

Dec 23 10:34:19 rockstor kernel: BTRFS error (device sdb): bdev /dev/disk/by-id/scsi-350014ee262e982a2 errs: wr 14608, rd 0, $
Dec 23 10:34:19 rockstor kernel: BTRFS error (device sdb): bdev /dev/disk/by-id/scsi-350014ee262e982a2 errs: wr 14609, rd 0, $
Dec 23 10:34:19 rockstor kernel: BTRFS error (device sdb): bdev /dev/disk/by-id/scsi-350014ee262e982a2 errs: wr 14610, rd 0, $
Dec 23 10:34:19 rockstor kernel: BTRFS error (device sdb): bdev /dev/disk/by-id/scsi-350014ee262e982a2 errs: wr 14611, rd 0, $
Dec 23 10:34:19 rockstor kernel: BTRFS error (device sdb): bdev /dev/disk/by-id/scsi-350014ee262e982a2 errs: wr 14612, rd 0, $
Dec 23 10:34:19 rockstor kernel: BTRFS error (device sdb): bdev /dev/disk/by-id/scsi-350014ee262e982a2 errs: wr 14613, rd 0, $
Dec 23 10:34:19 rockstor kernel: BTRFS error (device sdb): bdev /dev/disk/by-id/scsi-350014ee262e982a2 errs: wr 14614, rd 0, $
Dec 23 10:34:19 rockstor kernel: BTRFS error (device sdb): bdev /dev/disk/by-id/scsi-350014ee262e982a2 errs: wr 14615, rd 0, $
Dec 23 10:34:19 rockstor kernel: BTRFS error (device sdb): bdev /dev/disk/by-id/scsi-350014ee262e982a2 errs: wr 14616, rd 0, $
Dec 23 10:34:19 rockstor kernel: BTRFS error (device sdb): bdev /dev/disk/by-id/scsi-350014ee262e982a2 errs: wr 14617, rd 0, $
Dec 23 10:34:21 rockstor kernel: sd 2:0:0:0: [sda] tag#34 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Dec 23 10:34:21 rockstor kernel: sd 2:0:0:0: [sda] tag#34 Sense Key : Illegal Request [current]
Dec 23 10:34:21 rockstor kernel: sd 2:0:0:0: [sda] tag#34 Add. Sense: Invalid field in cdb
Dec 23 10:34:21 rockstor kernel: sd 2:0:0:0: [sda] tag#34 CDB: Write(16) 8a 00 00 00 00 00 0e 3f 93 00 00 00 09 00 00 00
Dec 23 10:34:21 rockstor kernel: blk_update_request: critical target error, dev sda, sector 239047424

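One thing worth noting: btrfs keeps cumulative per-device error counters, which is what the "errs: wr 14608, rd 0, …" part of those messages is reporting. They can also be read directly; the mount point below is a guess at a typical Rockstor pool path:

# show cumulative per-device error counters (wr/rd/flush/corruption/generation)
btrfs device stats /mnt2/<pool-name>
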
I am not sure how to determine whether this is an inherent problem with my SATA controller or whether virtio-scsi passthrough simply will not work with Rockstor. I would appreciate any feedback on this, since as it stands it does not look like it will yield a stable btrfs pool.
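
What I plan to try, to narrow it down (a sketch, assuming shell access on the Proxmox host; /dev/sdb here is the disk as the host names it, which may differ from the guest's naming):

# on the Proxmox host, not inside the VM:
# check the physical disk's SMART health (needs smartmontools)
smartctl -a /dev/sdb
# look for matching ATA/I/O errors in the host kernel log; if the host
# log is clean while the guest logs errors, the problem is more likely
# in the virtualization layer than in the controller or the disk
dmesg | grep -i -e ata -e 'i/o error'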

I installed bare-metal and do not get these errors, so there is apparently some incompatibility between the virtio-scsi setup and the kernel currently in use.

Let me guess: are you using Proxmox 4.4? If so, fall back to 4.3 … I corrupted a whole CCTV system's storage simply by migrating it to 4.4 :confused:
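
To check which build you're on, and which older kernels are still installed to fall back to (just the usual commands, nothing 4.4-specific):

# full version breakdown of the Proxmox packages in use
pveversion -v
# list installed pve-kernel builds; an older one can be picked from
# the GRUB boot menu without reinstalling
dpkg -l | grep pve-kernel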

Right now I'm using a test box (a 1:1 hardware clone of the CCTV server) to find the root cause, and everything points at Proxmox 4.4 having some broken part in it. I've managed to get the kernel to panic on a volume mounted from a disk image file, without any passthrough disks (you know, the root volume it automatically creates during the "create VM" process).

Interesting, I can't say I've noticed a problem on my 4.4 node yet, and that's using either Intel or Marvell controllers in plain old JBOD (depends on which ports the disks are connected to on my motherboard).
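
(For comparing setups, this is roughly how I check which controller each disk hangs off, run on the host:)

# show each disk with its SCSI address, transport type and model
lsblk -o NAME,HCTL,TRAN,MODEL
# list the SATA/RAID controllers on the PCI bus
lspci | grep -i -e sata -e raid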

Edit: Ah, the difference is I'm not using virtio-SCSI, so I don't think Rockstor is talking directly to the disks (i.e. they're still virtual disks, but raw devices mapped directly to the underlying physical devices).
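
For the record, a minimal sketch of that kind of raw mapping on Proxmox (the VMID 101 and the disk ID are placeholders), using a virtio block device instead of virtio-SCSI:

# map the whole physical disk into the VM as a virtio block device;
# the guest then sees a virtual disk (/dev/vdX) backed by the raw device
qm set 101 -virtio1 /dev/disk/by-id/ata-<disk-id>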

So in your first post you say that you use virtio-scsi-pci / -single, and now you say that you don't use virtio-SCSI? You got me lost here …

There are two of us!

I've just given up: I'm running Rockstor on bare metal and manually maintaining LXC and KVM from the CentOS repos, instead of running Proxmox with Rockstor in KVM.

I'm slowly starting to share your opinion … the Proxmox people pissed all over this problem as if it did not even exist. It's more important to advise on how much RAM you need with ZFS (as if some dipstick can't read the documentation or use the search function), but a problem that is corrupting people's data, and possibly affecting a larger pool of people, is not worth looking at.

Also, I'm starting to come to the conclusion that two hours of my work costs more than just dropping in another server and not bothering with virtualisation.