Better to pop in edited snippets from logs here, surrounting them in the tripple inverted backslash to quote them. There is currently little time I’m unfortunately within support email and far more folks here on the forum to help.
Could you given more details of the exact situation here.
Can you give output of the CLI support and for SMART. We use the CLI to assess support level and indicate this on the Web-UI. The log snippets should also give the error messages associated with attempts to assess the SMART support. Or it may be we are blanket ignoring this device type, SMART wise, for some reason. See below:
I’ll note a potentially relevant part of the code here to help others chip in on what may be going wrong here:
So from those code coments it does look like we are universally ignoring nvme- devices purely on the grounds of our prior (older) CentOS base.
You could try removing the “nvme-|” entry in that re.match string to see if all is then well. We are, after-all, on much newer smartmontools now.
If you look after the quoted rty (line 473) you will see what we do to ‘test’ for smart capability. Hopefully it will be fine for nvme devices now but otherwise it may need some attention in their case.
Thanks again for the report, and be sure to reboot, or stop and then start all rockstor services after any python change for them to take effect. Might be a nice feature add pull request actually :).
Hope that helps, and do let us know if removing that nvme disable for smart in-code works for you.
I have changed the file as suggested and restarted rockstor services: systemctl restart rockstor rockstor-bootstrap rockstor-pre.service
however the change was not visible.
As a sanity check, I then went so far to temporarily set all smart_available and smart_enabled vars to True and restarted rockstor again. This time all my disks displayed that they had SMART supported!
When I reverted the disk.py file back to its original state (and restarted rockstor), however, it still says that the NVMe disk has SMART supported, even though it should say Not supported.
I’ve tried killing nginx and using systemctl stop instead of restart, but no luck.
I have disabled cache in my browser and have even used a different browser, but no lock as well.
I don’t want to reboot my server.
I am changing /opt/rockstor/src/rockstor/storageadmin/views/disk.py, is this correct? disk.pyc is being replaced, when stop/start-ing the services.
EDIT: OK, I think I got it.
The reason probably lies here:
When I have temporarily set all smart_available and smart_enabled vars to True, that caused that state to be persisted via do.save() at the end of the method.
However, there is no such persistence in place in the code snippet above.
It simply continues with the next object in the for loop, leaving the previous one untouched.
I have added an extra do.save there, so that I’m back to the original state and can test the fix mentioned by @phillxnet
EDIT 2: I have removed “nvme-”, but the smart.available() returns False, False.
Where can I find reference docs for the smart.* module?
@aremiaskfa Hello again, and thanks for chasing this one down.
Yes, that looks to be correct; at least as far as you have tracked this down before the smart.available() discovery anyway.
It very much looks like you are getting to the root of this one actually.
The only developer docs we have for these related functions are within their docstrings. I think the following source file and location should point you in the right direction:
Which is under (a part/file of) the system module (note the init.py within the system dir).
From the file you have been looking at we have the following import from elsewhere in our own code:
And so we get access to “smart” and all it’s goodies as we have defined them: smart.available is thus usable within this views/disk.py file.
And if you do fancy taking your findings to a pull request stage, then note that we don’t as-yet have any unit tests for this part (smart) of the system module:
It may very well be a good time/exercise to add some coverage to this area if it turns out that what we need here is an augmentation in our parsing to decern smart availability. But it may also be that nvme output is quite different. Let us know how your experimentation goes.
Also note that we have an ability to submit problematic smart output via the following shell script:
So it would be helpful to post the output of that script, for the given NVME device to our support email as we would then have access to this particular smart output. The testing mode you see referenced in smart.py is intended for developers to be able to test our code on systems they don’t have direct access to. I unfortunately don’t have access to a real nvme currently, but we do have some in the ‘team’ : but as always it’s a matter of prioritising time. So do, if you can, keep chipping away on this one as I think you are running down a new area of development for us. As stated we have never supported nvme smart and their output may present some new challenges.
Hope that helps, and thanks again for potentially extending our nvme smart capabilities.
Yes, that is what it looks like. So I think we need to adapt our parsing to catch this kind of output. It may well be that this format is common for NVME devices. I have none here currently to test with but maybe in time.
Thanks, hopefully it can be used to establish some tests so we can extend our smart output parsing. When we initially developed this parsing there were many reports of minor failings. But via the previously mentioned shell script we ended up mostly supporting what was out-there. However you report looks to be the start of a new wave of changes likely associated with these devices.
I’m a bit behind on the support emails currently but I should get to it in time.
It will be interesting to see the response from the upstream project.
Nice find/report by the way. Lets keep an eye on this to see who we might augment our smart ‘sensing’ parsing to acomodate. Ideally we need an nvme that doesnt’ support smart. It may be that this doesn’t exist, or/and it may be that it can similarly not be disabled. If so then we may just assume smart capability and enablement for all nvme devices. That should be easy enough.
Hope that helps and thanks for the additional info.
@aremiaskfa Nice find, re your upstream issue with smartmontools.
As you did the exploratory work on this one, do you fancy opening an issue here:
with your findings referencing your upsteam smartmontools issue. We then have attribution to yourself via the issue creation and upstream within issue text. And link back to this forum thread for some context.
That issue can then be used by a pull request to remove our auto-disable of nvme smart (because in our last os host it wasnt’ in smartmontools at the time) and replace it with an auto enable. We may well have some more work to do here also regarding the subsequent parsing if they have changed/lost other entries.
If you fancy having a go at this pr yourself then indicate this on the issue by assigning yourself. Or noting the same if it wont’ let you.
Thanks again for chasing this one down. Major progress here.
Have your tried jury-rigging the code to auto enable given a device name or the like. You could then test the subsequent formatting for our display of the smart details.
After jury-rigging it, the SMART details dialog/page is mostly non-functional.
Only a few Identity attributes are displayed in the table.
The rest of the tabs will need to be reworked specifically for nvme drives.
Identity tab: nvme drives seem to have compatible Identity information information. In this case, I suggest simply outputting the whole === START OF INFORMATION SECTION === into a nice table like it is now, but without the regex matches. Simply build the table per line, splitting by first occurrence of colon.
Attributes tab: nvme drive don’t have SMART Attributes. Instead they offer SMART / Health Information. The two are not compatible.
Capabilities tab: I think, this is one can be hidden for nvme drives.
Error Logs tab: nvme drives report errors differently. Will need to be adapted.
Self Test Logs tab: smartmontools 7.3 do not support such logs for nvme drives yet. One should use the nvme-cli package instead.
Perform Tests tab: smartmontools 7.3 don’t support performing tests either. Again nvme-cli is a superior package in this regard.
One ponders what to do next… Maybe for the time being we can show the user that his nvme drive has SMART, but other stuff is not yet supported.
Yes, that was my suspicion, the output does look to be a little too different.
Sounds like we need a proof of concept pull request for this one. Fancy having a go? We have not had a SMART parsing failure reported for a number of years now so I’m reluctant to tear the whole thing down currently. But we could special-case if nvme type is found.
OK, so another special case likely here, given all other devices are currently reported ok, at least from the silence in recent years feedback anyway.
Oh dear. I think we would be better skipping this initially in favour of watching the smartmontools space. Expecially given all that we see currently also didn’t exist in our CentOS days. It can be a can of worms to perpetually expand into progressively more niche tools.
Same as my last comment on this one I think.
Agreed. However I think this is all likely going to have to wait until our next testing branch phase as the existing testing phase is now wrapping up to the first stable poetry based build. Plus smartmontools is not standing still, so we will likely have more to chew on there when it’s next visited.
Also we could suggest to folks in our fledgling nvme smart support that there is a cli option such as you have suggested: nvme-cli
Some excellent finds here. Thanks for following this issue up so persistently.