Devices visible to udevadm but not blkid or btrfs

Hey all

I just upgraded to 3.8.7. Now, one of my pools is showing one less drive than normal.

sde, sdf, sdg and sdh were assigned to a pool called storage1 (RAID10). Now sde is not assigned to any pool, and I cannot access any shares in the storage1 pool.

Please help.

Sorry you ran into this problem; it looks like one of your drives (sde) is missing from the system. We'll need to do some system/BTRFS-level troubleshooting to find out more. Can you provide the output of the following commands? (You need to run them as the root user on your Rockstor system.)

btrfs fi show

udevadm info --name=sde

blkid

I cannot seem to SSH into the box. I keep getting a permission denied error even though I can use the same username/password to log in to the web UI. Any ideas on how to get into the system? (It is headless.)

Can you SSH in as the root user? You need to be root for all of the commands anyway.
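If root SSH is being refused, one possible cause (a guess, since I can't see your system) is that root login is disabled in the SSH daemon config. From a console session you could check, using the standard OpenSSH config path:

```shell
# Check whether root login over SSH is allowed (run as root on the console).
grep -i '^PermitRootLogin' /etc/ssh/sshd_config

# If it is set to "no", change it to "yes" and reload the daemon:
sed -i 's/^PermitRootLogin no/PermitRootLogin yes/' /etc/ssh/sshd_config
systemctl reload sshd
```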

Here are the results from btrfs fi show. (Sorry, I made a mistake about the disks in the previous post; storage2 is the pool missing a disk after the upgrade.)

Label: 'rockstor_rockstor' uuid: 5df2557b-e818-4fac-8b2d-bbaf5c5ef0a1
Total devices 1 FS bytes used 1.71GiB
devid 1 size 26.35GiB used 5.27GiB path /dev/sda3

Label: 'storage2' uuid: c73372a8-2eb3-4be7-973f-221078e0f95b
Total devices 3 FS bytes used 672.00KiB
devid 1 size 232.89GiB used 1.03GiB path /dev/sdf
devid 2 size 232.89GiB used 2.00GiB path /dev/sdg
devid 3 size 232.89GiB used 1.03GiB path /dev/sdh

warning, device 4 is missing
warning devid 4 not found already
Label: 'storage1' uuid: f88c55c0-4cb0-421d-84bc-679771f39623
Total devices 4 FS bytes used 174.88GiB
devid 1 size 931.51GiB used 88.53GiB path /dev/sdb
devid 2 size 931.51GiB used 88.53GiB path /dev/sdc
devid 3 size 931.51GiB used 88.53GiB path /dev/sdd
*** Some devices missing

btrfs-progs v4.1.2

Here is the result of udevadm info --name=sde

P: /devices/pci0000:00/0000:00:05.2/ata7/host6/target6:0:0/6:0:0:0/block/sde
N: sde
S: disk/by-id/ata-Hitachi_HDS721010CLA332_JP2921HQ1G5RSA
S: disk/by-id/wwn-0x5000cca35dd48a39
E: DEVLINKS=/dev/disk/by-id/ata-Hitachi_HDS721010CLA332_JP2921HQ1G5RSA /dev/disk/by-id/wwn-0x5000cca35dd48a39
E: DEVNAME=/dev/sde
E: DEVPATH=/devices/pci0000:00/0000:00:05.2/ata7/host6/target6:0:0/6:0:0:0/block/sde
E: DEVTYPE=disk
E: ID_ATA=1
E: ID_ATA_DOWNLOAD_MICROCODE=1
E: ID_ATA_FEATURE_SET_AAM=1
E: ID_ATA_FEATURE_SET_AAM_CURRENT_VALUE=254
E: ID_ATA_FEATURE_SET_AAM_ENABLED=0
E: ID_ATA_FEATURE_SET_AAM_VENDOR_RECOMMENDED_VALUE=128
E: ID_ATA_FEATURE_SET_APM=1
E: ID_ATA_FEATURE_SET_APM_ENABLED=0
E: ID_ATA_FEATURE_SET_HPA=1
E: ID_ATA_FEATURE_SET_HPA_ENABLED=1
E: ID_ATA_FEATURE_SET_PM=1
E: ID_ATA_FEATURE_SET_PM_ENABLED=1
E: ID_ATA_FEATURE_SET_PUIS=1
E: ID_ATA_FEATURE_SET_PUIS_ENABLED=0
E: ID_ATA_FEATURE_SET_SECURITY=1
E: ID_ATA_FEATURE_SET_SECURITY_ENABLED=0
E: ID_ATA_FEATURE_SET_SECURITY_ERASE_UNIT_MIN=232
E: ID_ATA_FEATURE_SET_SMART=1
E: ID_ATA_FEATURE_SET_SMART_ENABLED=1
E: ID_ATA_ROTATION_RATE_RPM=7200
E: ID_ATA_SATA=1
E: ID_ATA_SATA_SIGNAL_RATE_GEN1=1
E: ID_ATA_SATA_SIGNAL_RATE_GEN2=1
E: ID_ATA_WRITE_CACHE=1
E: ID_ATA_WRITE_CACHE_ENABLED=0
E: ID_BUS=ata
E: ID_MODEL=Hitachi_HDS721010CLA332
E: ID_MODEL_ENC=Hitachi\x20HDS721010CLA332\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20
E: ID_REVISION=JP4OA39C
E: ID_SERIAL=Hitachi_HDS721010CLA332_JP2921HQ1G5RSA
E: ID_SERIAL_SHORT=JP2921HQ1G5RSA
E: ID_TYPE=disk
E: ID_WWN=0x5000cca35dd48a39
E: ID_WWN_WITH_EXTENSION=0x5000cca35dd48a39
E: MAJOR=8
E: MINOR=64
E: SUBSYSTEM=block
E: TAGS=:systemd:
E: USEC_INITIALIZED=269417

Here is the result of blkid

/dev/block/8:3: LABEL="rockstor_rockstor" UUID="5df2557b-e818-4fac-8b2d-bbaf5c5ef0a1" UUID_SUB="8d44e136-4817-4759-984a-52fd82f21d4c" TYPE="btrfs"
/dev/block/8:1: UUID="05bf7468-5aea-4655-b5f1-ed1581b7e69d" TYPE="ext4"
/dev/block/8:2: UUID="e6bf7dc0-fbaa-432b-b191-cefe0c098c6b" TYPE="swap"
/dev/sdb: LABEL="storage1" UUID="f88c55c0-4cb0-421d-84bc-679771f39623" UUID_SUB="5b4f8b4b-1c91-4fa5-a864-bb58578e72ac" TYPE="btrfs"
/dev/sdc: LABEL="storage1" UUID="f88c55c0-4cb0-421d-84bc-679771f39623" UUID_SUB="5398a98c-f80f-4483-a010-83f036d3a815" TYPE="btrfs"
/dev/sdd: LABEL="storage1" UUID="f88c55c0-4cb0-421d-84bc-679771f39623" UUID_SUB="f5c15e0a-cd4a-4e0b-94c3-963571cbc2d9" TYPE="btrfs"
/dev/sdf: LABEL="storage2" UUID="c73372a8-2eb3-4be7-973f-221078e0f95b" UUID_SUB="4c352d18-272f-4ac5-a0de-9971612dd2dd" TYPE="btrfs"
/dev/sdg: LABEL="storage2" UUID="c73372a8-2eb3-4be7-973f-221078e0f95b" UUID_SUB="bcdd176f-b0ec-46ad-b12a-79766d4ef5fb" TYPE="btrfs"
/dev/sdh: LABEL="storage2" UUID="c73372a8-2eb3-4be7-973f-221078e0f95b" UUID_SUB="e1bd7591-8ec8-4aa3-b0f1-4b0dda149723" TYPE="btrfs"

I see the problem. udevadm recognizes /dev/sde but blkid does not. btrfs uses blkid for device scanning, so it doesn't see your sde (or any others with the same problem), and hence you have devices missing. Try the following commands:

  1. btrfs device scan /dev/sde
  2. btrfs fi show

I am wondering if specifying the device explicitly might help the scan. Please provide the output of the commands and any extra information you think is relevant. A screenshot of your pools table would also be useful.

Also, this is not related to the Rockstor update; it is a BTRFS troubleshooting issue, so I've updated the topic headline.

These are the outputs.

[root@nas ~]# btrfs device scan /dev/sde
Scanning for Btrfs filesystems in '/dev/sde'
[root@nas ~]#

[root@nas ~]# btrfs fi show
Label: 'rockstor_rockstor' uuid: 5df2557b-e818-4fac-8b2d-bbaf5c5ef0a1
Total devices 1 FS bytes used 1.71GiB
devid 1 size 26.35GiB used 5.27GiB path /dev/sda3

Label: 'storage2' uuid: c73372a8-2eb3-4be7-973f-221078e0f95b
Total devices 3 FS bytes used 672.00KiB
devid 1 size 232.89GiB used 1.03GiB path /dev/sdf
devid 2 size 232.89GiB used 2.00GiB path /dev/sdg
devid 3 size 232.89GiB used 1.03GiB path /dev/sdh

warning, device 4 is missing
warning devid 4 not found already
Label: 'storage1' uuid: f88c55c0-4cb0-421d-84bc-679771f39623
Total devices 4 FS bytes used 174.88GiB
devid 1 size 931.51GiB used 88.53GiB path /dev/sdb
devid 2 size 931.51GiB used 88.53GiB path /dev/sdc
devid 3 size 931.51GiB used 88.53GiB path /dev/sdd
*** Some devices missing

btrfs-progs v4.1.2

Here are the screenshots

http://imgur.com/AHXrUMc

http://imgur.com/ysWF3Su

http://imgur.com/Ypiy3Ht

Thanks for the info @asapkota, it’s very helpful.

It seems like Rockstor doesn't think sde is part of any Pool, but it does see the device. You can add it from the Web-UI. Take a look at the documentation on how to add a disk via pool resize; there's also a screencast in there that shows you exactly how. Add sde to storage1, and then let's see if the output of btrfs fi show changes.
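For reference, the Web-UI pool resize does roughly the equivalent of the following at the command line (a sketch; /mnt2/storage1 is Rockstor's usual mount-point convention for a pool named storage1):

```shell
# Add the device back into the pool's filesystem (the pool must be mounted).
btrfs device add /dev/sde /mnt2/storage1

# Then rebalance so existing data is redistributed across all four devices.
btrfs balance start /mnt2/storage1
```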

Besides that, it's very mysterious how the device just dropped. If you can, please send me the logs located at /opt/rockstor/var/log. You can just zip the directory and e-mail it to support@rockstor.com.

I added the disk back to the pool. Rebalancing now.

[root@nas ~]# btrfs fi show
Label: 'rockstor_rockstor' uuid: 5df2557b-e818-4fac-8b2d-bbaf5c5ef0a1
Total devices 1 FS bytes used 1.71GiB
devid 1 size 26.35GiB used 5.27GiB path /dev/sda3

Label: 'storage2' uuid: c73372a8-2eb3-4be7-973f-221078e0f95b
Total devices 3 FS bytes used 672.00KiB
devid 1 size 232.89GiB used 1.03GiB path /dev/sdf
devid 2 size 232.89GiB used 2.00GiB path /dev/sdg
devid 3 size 232.89GiB used 1.03GiB path /dev/sdh

Label: 'storage1' uuid: f88c55c0-4cb0-421d-84bc-679771f39623
Total devices 4 FS bytes used 175.05GiB
devid 1 size 931.51GiB used 89.53GiB path /dev/sdb
devid 2 size 931.51GiB used 89.53GiB path /dev/sdc
devid 3 size 931.51GiB used 89.53GiB path /dev/sdd
devid 4 size 931.51GiB used 89.53GiB path /dev/sde

btrfs-progs v4.1.2

Sending log via email. Thanks

The output looks perfect. I think the balance is already done. Please double-check that it's done and, if you are game, reboot the machine to make sure this doesn't get triggered again.

Actually the rebalance is stuck for some reason

[root@nas log]# btrfs balance status /mnt2/storage1
Balance on '/mnt2/storage1' is running
0 out of about 90 chunks balanced (1 considered), 100% left
[root@nas log]#

OK, I spoke too soon. It may seem stuck, but it takes time to rebalance what looks like 360GB of data. Let it run and keep monitoring.
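A balance over that much data can take hours. One simple way to keep an eye on it:

```shell
# Re-run the status command every 30 seconds until the balance finishes.
watch -n 30 btrfs balance status /mnt2/storage1
```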

No luck. Every time I reboot the NAS, drive sde disappears. I add it again and rebalance takes a day.

I tried to change the RAID from 10 to 1 and rebalance so I can mount the drive and get to my data but that also throws an error.

Error running a command. cmd = ['/bin/mount', u'/dev/disk/by-label/storage1', u'/mnt2/storage1', '-o', u',compress=no']. rc = 32. stdout = ['']. stderr = ['mount: wrong fs type, bad option, bad superblock on /dev/sdb,', ' missing codepage or helper program, or other error', '', ' In some cases useful info is found in syslog - try', ' dmesg | tail or so.', '']

Bummer. So every time you reboot, the btrfs fi show command reports a missing device (supposedly sde)? This remains the same even after you explicitly run btrfs device scan? But you are able to add the device via pool resize and rebalance?

I wish I could reproduce this behavior. I brought it up on the #btrfs IRC channel and no one seemed to have seen this before. I suggest you report it to the btrfs mailing list. They will help you troubleshoot further, and I'll help any way I can.

I am able to add the device via the GUI and do a rebalance, which takes a while (170GB of data).

The pool does not get mounted, and neither do the shares. So I restart the box so they get mounted automatically, and then the pool shows 3 drives and not 4 (sde missing).

At that point, I can mount the pool with the 'degraded' option and also mount the shares and access the data.

Not sure why the addition of sde does not stick.
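For anyone hitting the same mount error above: a btrfs pool with a missing device refuses a normal mount, but the degraded mount I use looks like this (mount point per Rockstor's convention):

```shell
# Mount a btrfs pool that is missing a device;
# data on the remaining copies stays accessible.
mount -o degraded /dev/disk/by-label/storage1 /mnt2/storage1
```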

I seem to have fixed the issue somewhat.

I had to resize/rebalance the pool from RAID10 to RAID1.
Then delete the missing drive (sde).
Then reboot, which mounted the pool and shares as RAID1.
Then I had to delete the superblocks on sde.
Then add sde back to the converted RAID1 pool.
Then rebalance.
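Roughly, the steps above as commands (an approximation; I did some of them through the Web-UI, and wipefs is one way to clear the stale superblocks):

```shell
# 1. Convert the pool from RAID10 to RAID1 (pool mounted, degraded if needed).
btrfs balance start -dconvert=raid1 -mconvert=raid1 /mnt2/storage1

# 2. Remove the missing device from the pool.
btrfs device delete missing /mnt2/storage1

# 3. After rebooting, clear stale filesystem signatures on sde.
wipefs -a /dev/sde

# 4. Add sde back to the converted pool and rebalance.
btrfs device add /dev/sde /mnt2/storage1
btrfs balance start /mnt2/storage1
```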

I think I am going to leave it as RAID1 and not mess around anymore.

@suman, thanks for all the help.

So it seems btrfs somehow messed up the whole filesystem; this might be of value for the btrfs devs/mailing lists.

I don't know if it is a similar scenario, but I experienced similar problems when I was testing Rockstor for the first time.

I had made a pool on a single disk, which I had given a name (TestPool).
Seeing that Rockstor installed and worked well, I took 4 other disks, reinstalled, and made a RAID5 pool also named TestPool. This worked well, so I decided to add the first disk, from the first install, to this pool. To my surprise, Rockstor took that drive and added it to the RAID5 without any intervention, although it was not part of it. At first I thought this was by design (which it wasn't), but later I found out it had mixed the single-disk pool and the RAID5 pool, probably because they had the same name.
This eventually messed up both pools completely, and I had to rebuild.
As this was part of a test, I wasn't that worried, and I have just been careful to remove existing partitions from new disks before adding them.

But perhaps what is being seen is the same sort of behaviour?

Still having issues with this.

Now the system gets stuck on boot.

It says "Starting Trigger Flushing of Journal to Persistent Storage" and just gets stuck there.

Any idea what this means or what may be causing it? Thanks.

Does it continue after 1-2 minutes? By default, there should be a 1-2 minute timeout for such jobs.