Checksum errors from BTRFS

Hello,

I am running Rockstor 3.8.16-16 as a guest inside VMware ESXi 6.5.0. I am seeing the following errors in dmesg:

[608715.636355] BTRFS warning (device sdb): csum failed ino 21521 off 176128 csum 2924415985 expected csum 2920221681
[608715.636495] BTRFS warning (device sdb): csum failed ino 21521 off 176128 csum 2924415985 expected csum 2920221681
[612830.374636] BTRFS warning (device sdb): checksum error at logical 150460334080 on dev /dev/sdb, sector 291770688, root 257, inode 17671, offset 170033152, length 4096, links 1 (path: <filename>  )
[612830.374644] BTRFS error (device sdb): bdev /dev/sdb errs: wr 0, rd 0, flush 0, corrupt 1, gen 0
[612830.375537] BTRFS error (device sdb): unable to fixup (regular) error at logical 150460334080 on dev /dev/sdb
[619025.297794] BTRFS warning (device sdb): checksum error at logical 790130016256 on dev /dev/sdb, sector 1541125536, root 741, inode 21521, offset 176128, length 4096, links 1 (path: doop.sparsebundle/bands/15cd)
[619025.297805] BTRFS error (device sdb): bdev /dev/sdb errs: wr 0, rd 0, flush 0, corrupt 2, gen 0
[619025.297851] BTRFS error (device sdb): unable to fixup (regular) error at logical 790130016256 on dev /dev/sdb
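In the “bdev /dev/sdb errs:” lines above, only the corrupt counter moves (from 1 to 2), while wr, rd and flush stay at 0. That points to data that was read back successfully but did not match its stored checksum, rather than to I/O failures. “btrfs device stats <mountpoint>” should report the same per-device counters on demand. If you want to pull these counters out of a saved dmesg log, here is a minimal Python sketch; the regex assumes the exact message wording shown in this thread, and latest_error_counters is a hypothetical helper, not part of Rockstor or btrfs-progs.

```python
import re

# Matches the per-device error counter line btrfs writes to the kernel log, e.g.
# "bdev /dev/sdb errs: wr 0, rd 0, flush 0, corrupt 2, gen 0".
# Assumes the wording seen in this thread; other kernel versions may differ.
BDEV_ERRS = re.compile(
    r"bdev (?P<dev>\S+) errs: wr (?P<wr>\d+), rd (?P<rd>\d+), "
    r"flush (?P<flush>\d+), corrupt (?P<corrupt>\d+), gen (?P<gen>\d+)"
)


def latest_error_counters(log_text: str) -> dict[str, dict[str, int]]:
    """Return the last reported error counters for each device found in the log."""
    counters: dict[str, dict[str, int]] = {}
    for match in BDEV_ERRS.finditer(log_text):
        fields = match.groupdict()
        device = fields.pop("dev")
        # Later lines overwrite earlier ones, so the most recent report wins.
        counters[device] = {name: int(value) for name, value in fields.items()}
    return counters


if __name__ == "__main__":
    sample = (
        "[612830.374644] BTRFS error (device sdb): bdev /dev/sdb errs: wr 0, rd 0, flush 0, corrupt 1, gen 0\n"
        "[619025.297805] BTRFS error (device sdb): bdev /dev/sdb errs: wr 0, rd 0, flush 0, corrupt 2, gen 0\n"
    )
    print(latest_error_counters(sample))
    # {'/dev/sdb': {'wr': 0, 'rd': 0, 'flush': 0, 'corrupt': 2, 'gen': 0}}
```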

I’m not sure how to proceed with debugging this. What other information do you need from me? I looked in the VMware logs but didn’t see anything for the drive this filesystem resides on.

Thanks,

jorgy

The first thing I would do is kick off a scrub on your array.

The second thing I would do is check (and post) all the smartctl values for all drives in the btrfs array.

We need more information about your array to diagnose this. The output of “btrfs fi show” would show us the basics. But definitely kick off a scrub and post the smartctl values for all your drives.
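As a rough sketch of that workflow, assuming you have saved the “btrfs fi show” output as text: the Python below groups member device paths under each filesystem UUID and builds one “smartctl -a” command line per device (it does not run them). The function names are hypothetical, and the parsing assumes the “Label: ... uuid: ...” and “devid ... path /dev/...” layout that btrfs fi show printed at the time; other btrfs-progs versions may differ.

```python
import re
import shlex

# Assumed layout of `btrfs fi show` output:
#   Label: 'UG1'  uuid: add82f87-...
#   	devid    1 size 3.64TiB used 851.05GiB path /dev/sdb
LABEL_LINE = re.compile(r"^Label:.*\buuid:\s+(\S+)")
DEVID_LINE = re.compile(r"^\s*devid\s+\d+\s.*\bpath\s+(\S+)")


def devices_by_uuid(fi_show_output: str) -> dict[str, list[str]]:
    """Group member device paths under the UUID of the filesystem they belong to."""
    groups: dict[str, list[str]] = {}
    current = None
    for line in fi_show_output.splitlines():
        label = LABEL_LINE.match(line)
        if label:
            current = label.group(1)
            groups[current] = []
            continue
        devid = DEVID_LINE.match(line)
        if devid and current is not None:
            groups[current].append(devid.group(1))
    return groups


def smartctl_commands(devices: list[str]) -> list[str]:
    """Build one `smartctl -a <device>` command line per device (not executed here)."""
    return [f"smartctl -a {shlex.quote(device)}" for device in devices]
```

Fed the “btrfs fi show” output posted later in this thread, this groups /dev/sda3 under the rockstor_rockstor UUID and /dev/sdb under the UG1 UUID.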

Thanks for your reply.

I ran a scrub when this started. Is there a log with more detail than the start time, end time, and amount of data scrubbed?

Also, the filesystem on that drive went into read-only mode, so I rebooted and kicked off another scrub. Before the reboot, the errors were similar:

[49669.147804] BTRFS warning (device sdb): csum failed ino 21521 off 176128 csum 2924415985 expected csum 2920221681
[49669.148214] BTRFS warning (device sdb): csum failed ino 21521 off 176128 csum 2924415985 expected csum 2920221681
[49669.148367] BTRFS warning (device sdb): csum failed ino 21521 off 176128 csum 2924415985 expected csum 2920221681
[49934.061215] BTRFS warning (device sdb): csum failed ino 21521 off 176128 csum 2924415985 expected csum 2920221681
[49934.061585] BTRFS warning (device sdb): csum failed ino 21521 off 176128 csum 2924415985 expected csum 2920221681
[49934.061804] BTRFS warning (device sdb): csum failed ino 21521 off 176128 csum 2924415985 expected csum 2920221681

Here is the output from “btrfs fi show”:

Label: 'rockstor_rockstor'  uuid: a831e0c9-6245-47a8-8856-08844e9999da
	Total devices 1 FS bytes used 3.49GiB
	devid    1 size 17.51GiB used 6.04GiB path /dev/sda3

Label: 'UG1'  uuid: add82f87-5f2f-4b02-bb27-75e47b0b9009
	Total devices 1 FS bytes used 842.97GiB
	devid    1 size 3.64TiB used 851.05GiB path /dev/sdb

This system runs on a VMware ESXi 6.5.0 host, and sdb is a dedicated 4 TB SATA drive. SMART isn’t supported inside the guest, so here is the SMART output from the ESXi host:

esxcli storage core device smart get -d vml.0100000000202020202057442d574343374b32415a37335a52574443205744
Parameter                     Value  Threshold  Worst
----------------------------  -----  ---------  -----
Health Status                 OK     N/A        N/A  
Media Wearout Indicator       N/A    N/A        N/A  
Write Error Count             0      0          N/A  
Read Error Count              0      51         N/A  
Power-on Hours                96     0          96   
Power Cycle Count             11     0          N/A  
Reallocated Sector Count      0      140        N/A  
Raw Read Error Rate           0      51         N/A  
Drive Temperature             35     0          N/A  
Driver Rated Max Temperature  N/A    N/A        N/A  
Write Sectors TOT Count       N/A    N/A        N/A  
Read Sectors TOT Count        N/A    N/A        N/A  
Initial Bad Block Count       N/A    N/A        N/A  
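For a quick scan of a table like this, here is a minimal Python sketch that splits each row on runs of two or more spaces (parameter names only contain single spaces) and flags any error-type counter with a nonzero value. The selection in ERROR_COUNTERS is an assumption for illustration, and the function names are hypothetical; the meaning of these values varies by drive vendor, so a nonzero value is a prompt to look closer, not proof of failure.

```python
import re

# Counters treated as "error-type" for this sketch; an assumption, not an ESXi rule.
ERROR_COUNTERS = (
    "Write Error Count",
    "Read Error Count",
    "Reallocated Sector Count",
    "Raw Read Error Rate",
)


def parse_esxcli_smart(table: str) -> dict[str, dict[str, str]]:
    """Parse `esxcli storage core device smart get` rows into {parameter: columns}."""
    rows: dict[str, dict[str, str]] = {}
    for line in table.splitlines():
        stripped = line.strip()
        # Skip blank lines, the header row and the dashed separator.
        if not stripped or stripped.startswith("Parameter") or set(stripped) <= {"-", " "}:
            continue
        # Columns are separated by 2+ spaces; parameter names only contain single spaces.
        parts = re.split(r"\s{2,}", stripped)
        if len(parts) != 4:
            continue  # e.g. the esxcli command line itself
        name, value, threshold, worst = parts
        rows[name] = {"value": value, "threshold": threshold, "worst": worst}
    return rows


def nonzero_error_counters(rows: dict[str, dict[str, str]]) -> dict[str, int]:
    """Return the error-type counters whose value is a nonzero integer."""
    flagged: dict[str, int] = {}
    for name in ERROR_COUNTERS:
        value = rows.get(name, {}).get("value", "N/A")
        if value.isdigit() and int(value) > 0:
            flagged[name] = int(value)
    return flagged
```

On the table above every one of those counters is 0, so nothing is flagged.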

Thanks for any help!
jorgy

@jorgy Welcome to the Rockstor community.

As per this issue:
https://github.com/rockstor/rockstor-core/issues/1243

clicking on the link for the relevant scrub in the Scrubs tab table on the Pool details page should give you more info: it opens a details page for that scrub.

Hope that helps.

Thanks Philip.

Would you recommend updating to 3.9.1 before this BTRFS issue is fixed, or would the update help diagnose it? I don’t want to make things worse.

Thanks,

jorgy

Another interesting tidbit: the scrub finished with 0 errors:

# btrfs scrub status <dev>
scrub status for add82f87-5f2f-4b02-bb27-75e47b0b9009
	scrub resumed at Mon Oct  2 10:10:43 2017 and finished after 02:27:32
	total bytes scrubbed: 845.65GiB with 0 errors
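To compare scrub runs over time (for example before and after the reboot), a small Python sketch can pull the totals out of “btrfs scrub status” output. It assumes the older btrfs-progs layout shown above (“total bytes scrubbed: ... with N errors”), plus an assumed “error details: csum=N”-style line when errors are found; newer btrfs-progs releases print a different summary, and parse_scrub_status is a hypothetical helper.

```python
import re

# Assumed (older btrfs-progs) summary lines, e.g.
#   total bytes scrubbed: 845.65GiB with 0 errors
#   error details: csum=3     <- assumed format, only printed when errors exist
TOTAL_LINE = re.compile(r"total bytes scrubbed:\s+(\S+)\s+with\s+(\d+)\s+errors?")
DETAILS_LINE = re.compile(r"error details:\s*(.*)")


def parse_scrub_status(text: str) -> dict:
    """Extract bytes scrubbed, the error total and any per-type error counts."""
    total = TOTAL_LINE.search(text)
    if total is None:
        raise ValueError("no 'total bytes scrubbed' line (different btrfs-progs version?)")
    summary = {"scrubbed": total.group(1), "errors": int(total.group(2)), "details": {}}
    details = DETAILS_LINE.search(text)
    if details:
        # Each item looks like "kind=count"; anything else is ignored.
        for item in details.group(1).split():
            kind, _, count = item.partition("=")
            if count.isdigit():
                summary["details"][kind] = int(count)
    return summary
```

For the status above this yields a scrubbed size of 845.65GiB with 0 errors and no error details.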

I am seeing more errors today:

[180094.381247] BTRFS warning (device sdb): csum failed ino 13316 off 3193069568 csum 1290931525 expected csum 3509363373
[180094.510955] BTRFS warning (device sdb): csum failed ino 13316 off 3193069568 csum 1290931525 expected csum 3509363373
[180094.524756] BTRFS warning (device sdb): csum failed ino 13316 off 3193069568 csum 1290931525 expected csum 3509363373
[181120.568733] BTRFS warning (device sdb): csum failed ino 283 off 5521408 csum 3800761431 expected csum 2136320959
[181120.606300] BTRFS warning (device sdb): csum failed ino 283 off 5521408 csum 3800761431 expected csum 2136320959
[181120.607702] BTRFS warning (device sdb): csum failed ino 283 off 5521408 csum 3800761431 expected csum 2136320959
[181171.850855] BTRFS warning (device sdb): csum failed ino 7117 off 11268096 csum 4204857313 expected csum 1736419337
[181195.205130] BTRFS warning (device sdb): csum failed ino 262 off 861052928 csum 186724015 expected csum 2533327175
[181195.315584] BTRFS warning (device sdb): csum failed ino 262 off 861052928 csum 186724015 expected csum 2533327175
[181195.315844] BTRFS warning (device sdb): csum failed ino 262 off 861052928 csum 186724015 expected csum 2533327175
[181267.801920] BTRFS warning (device sdb): csum failed ino 13324 off 589971456 csum 1422596940 expected csum 3373642916
[181267.870052] BTRFS warning (device sdb): csum failed ino 13324 off 589971456 csum 1422596940 expected csum 3373642916
[181267.881462] BTRFS warning (device sdb): csum failed ino 13324 off 589971456 csum 1422596940 expected csum 3373642916
[181288.632398] BTRFS warning (device sdb): csum failed ino 13324 off 1449406464 csum 763958242 expected csum 2958539786
[181344.354888] BTRFS warning (device sdb): csum failed ino 13324 off 3462922240 csum 2452309663 expected csum 267774327
[181344.357320] BTRFS warning (device sdb): csum failed ino 13324 off 3462922240 csum 2452309663 expected csum 267774327
[181436.546175] BTRFS warning (device sdb): csum failed ino 12686 off 417792 csum 2347159363 expected csum 372793515
[181436.554732] BTRFS warning (device sdb): csum failed ino 12686 off 417792 csum 2347159363 expected csum 372793515
[181618.226123] BTRFS warning (device sdb): csum failed ino 13324 off 1849221120 csum 454343424 expected csum 2261390568
[181618.365176] BTRFS warning (device sdb): csum failed ino 13324 off 1849221120 csum 454343424 expected csum 2261390568
[181618.376528] BTRFS warning (device sdb): csum failed ino 13324 off 1849221120 csum 454343424 expected csum 2261390568
[181654.563664] BTRFS warning (device sdb): csum failed ino 13324 off 4209876992 csum 1426024824 expected csum 3374433936
[181654.566488] BTRFS warning (device sdb): csum failed ino 13324 off 4209876992 csum 1426024824 expected csum 3374433936
[181727.941391] BTRFS warning (device sdb): csum failed ino 13324 off 821587968 csum 2607268515 expected csum 112815435
[181728.050714] BTRFS warning (device sdb): csum failed ino 13324 off 821587968 csum 2607268515 expected csum 112815435
[181728.051456] BTRFS warning (device sdb): csum failed ino 13324 off 821587968 csum 2607268515 expected csum 112815435
[181742.666917] BTRFS warning (device sdb): csum failed ino 13324 off 1999724544 csum 2998277834 expected csum 795392290
[181742.670006] BTRFS warning (device sdb): csum failed ino 13324 off 1999724544 csum 2998277834 expected csum 795392290
[181742.670285] BTRFS warning (device sdb): csum failed ino 13324 off 1999724544 csum 2998277834 expected csum 795392290
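Many of these lines repeat the same inode and offset, so it can help to collapse them. Here is a minimal Python sketch that counts “csum failed” warnings per (inode, offset) pair and then the number of distinct failing offsets per inode. The regex assumes the message wording in this log (newer kernels insert “root N” before “ino”, which it tolerates), and the function names are hypothetical. Each inode number can then be mapped to a file name with “btrfs inspect-internal inode-resolve <ino> <path>”, run against the mount point of the subvolume it lives in, since inode numbers are only unique within a subvolume.

```python
import re
from collections import Counter

# Assumes the wording in this log ("csum failed ino N off N ..."); newer kernels
# insert "root N " before "ino", which the optional group tolerates.
CSUM_FAILED = re.compile(r"csum failed (?:root \d+ )?ino (\d+) off (\d+)")


def tally_csum_failures(log_text: str) -> Counter:
    """Count 'csum failed' warnings per (inode, offset) pair."""
    return Counter(
        (int(ino), int(off)) for ino, off in CSUM_FAILED.findall(log_text)
    )


def distinct_offsets_per_inode(tally: Counter) -> Counter:
    """Count how many different offsets failed within each inode."""
    return Counter(ino for ino, _offset in tally)
```

Run over the log above, this gives 12 distinct (inode, offset) pairs, 7 of them in inode 13324.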

Is there a fix for this?

Pool name: RAID6_VOLUME
Scrub Status: cancelled
Scrub Statistics

Attribute              Value
---------------------  ------------------------------
ID                     9
Start Time             February 18th 2018, 2:52:52 pm
End Time               February 18th 2018, 4:27:41 pm
Data Scrubbed          3.35 TB
Data Extents Scrubbed  61707837
Tree Extents Scrubbed  759296
Tree Bytes Scrubbed    12440305664
Read Errors            2336
Csum Errors            0
Verify Errors          0
No Csum                158016
Csum Discards          0
Super Errors           0
Malloc Errors          0
Uncorrectable Errors   2336
Unverified Errors      0
Corrected Errors       0
Last Physical          3700956856320
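These attribute names appear to mirror the raw counters printed by “btrfs scrub status -R”. Here Read Errors and Uncorrectable Errors are both 2336 while Csum Errors is 0, which suggests blocks that could not be read from a device at all and could not be repaired, rather than checksum mismatches. Since the scrub was cancelled, the counts only cover the part of the pool scrubbed before it stopped. A minimal Python sketch of that reading of the numbers, where parse_numeric_rows and summarize are hypothetical helpers:

```python
def parse_numeric_rows(text: str) -> dict[str, int]:
    """Map 'Name   123' rows to {'Name': 123}; rows without an integer value are skipped."""
    stats: dict[str, int] = {}
    for line in text.splitlines():
        # The value is the last whitespace-separated token; the rest is the name.
        name, _, value = line.strip().rpartition(" ")
        if name.strip() and value.isdigit():
            stats[name.strip()] = int(value)
    return stats


def summarize(stats: dict[str, int]) -> list[str]:
    """Describe the error counters of one scrub in plain words."""
    findings = []
    read = stats.get("Read Errors", 0)
    csum = stats.get("Csum Errors", 0)
    uncorrectable = stats.get("Uncorrectable Errors", 0)
    corrected = stats.get("Corrected Errors", 0)
    if read:
        findings.append(f"{read} read (I/O) errors")
    if csum:
        findings.append(f"{csum} checksum mismatches")
    if uncorrectable:
        findings.append(f"{uncorrectable} errors could not be repaired")
    if corrected:
        findings.append(f"{corrected} errors were repaired")
    return findings or ["no errors reported"]
```

Rows such as Start Time or Data Scrubbed, whose last token is not an integer, are simply skipped.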