NFS Causes Corruption of Filesystem

This follows on from the thread started in [NFS Advanced Edit Incorrectly Checks Existence of Shares - #8 by Walt]. Due to the severity of the recently discovered bug, it needs its own topic.

As described in the referenced thread, I used a workaround to add two lines to /etc/exports via the NFS Advanced Edit button in Rockstor. Both lines began with “/export/…”. This did not appear to cause any issues. (See the thread referenced above for all the details.) Recently I tried exporting a third subdirectory using that same method and that bug returned, along with an additional corruption of the filesystem.
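To give an idea of the form (the share, folder and client names below are placeholders, not my actual paths or options), the two added lines looked something like:

/export/SomeShare/folder1 192.168.1.50(rw,sync,insecure)
/export/SomeShare/folder2 192.168.1.50(rw,sync,insecure)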

For clarity, this will be explained with the following example.

  • Consider a share named AAA.
  • There are folders in AAA named bbb, ccc and ddd.
  • The NFS exports are:
    • /export/AAA
    • /export/AAA/bbb
    • /export/AAA/ccc
    • /export/AAA/ddd
  • When ddd was not exported, all three exports mounted fine on the client. “ls” of the AAA mountpoint on the client listed directories /bbb, /ccc and /ddd, along with the other contents of AAA. “ls” of the mountpoints of /bbb and /ccc listed the expected contents of those directories. “ls /mnt2/AAA” typed at the console of the exporting server also listed the expected contents. AAA was also exported via Samba, and its contents were available on the Windows PC it was exported to.
  • When the export of /ddd was added, all four NFS exports mounted successfully on the Linux client. “ls” of the mountpoint of AAA returned an empty list. “ls” of the mountpoints of /bbb and /ccc returned empty lists. “ls” of the mountpoint of /ddd showed the expected contents of /ddd. The contents of /ddd were readable, but the transfer rate was slow. On the console of the server, “ls /mnt2/AAA” returned an empty list. From the Windows PC, AAA also appeared to be empty.

This means that the NFS bug is not only affecting NFS; it is also corrupting something in the filesystem of the host. It is not known what would happen if data were written to the filesystem while it was messed up like this, but data loss or corruption of the share on the drive could reasonably be expected.

The visibility and normal function of the share returned when /export/AAA/ddd was removed. From the server console, “ls /mnt2/AAA” returned the expected contents. From the Windows PC, AAA then showed the expected contents.

In conclusion, NFS, as implemented in openSUSE as shipped with Rockstor 4.5.6, has a serious bug that has the potential to corrupt the filesystem of the exporting server. The bug shows up sporadically when exporting subdirectories of a filesystem. So far I’ve observed no instances of this issue when only the top-level of a share is exported.


That is really good to know! I’m using NFS shares as shared docker volume store, so I’m happy I hit upon this post.

Are any errors logged when this happens?

Admittedly, I did not review the logs when I noticed it had gotten into this state.

I think the takeaway here so far is not to abandon NFS (it is SO much better than Samba) but to watch for the symptoms seen here when setting up new exports. I backed off to using the initial two subfolder exports that worked for me before, and it seems to be holding steady.

But the developers should see if they can reproduce the bug, and then kick it upstairs to the openSUSE guys.

@Walt, since I will have some time I am trying to recreate your above scenario.
Before I go down the wrong path, here’s what I understood of the basic setup:

The directory structure on Rockstor (4.5.7-0)

/mnt2/AAA
/mnt2/AAA/bbb
/mnt2/AAA/ccc
/mnt2/AAA/ddd

Each of the sub-directories is populated with 50 unique files (not explicitly called out in your scenario above, but obviously some files are needed to observe the behavior).
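(For the record, a quick loop like the following is enough to populate a directory with unique files; the path is just an example:)

for i in $(seq 1 50); do echo "test file $i" > /mnt2/AAA/bbb/file_$i.txt; done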

Create the first export the “Rockstor” way (i.e. WebUI) resulting in:
/export/AAA *(rw,sync,insecure) in the /etc/exports file.

Manually add some/all (for testing different scenarios) of these to the /etc/exports file:

/export/AAA/bbb *(rw,sync,insecure)
/export/AAA/ccc *(rw,sync,insecure)
/export/AAA/ddd *(rw,sync,insecure)

Stop/restart NFS service
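(On the openSUSE/Rockstor side that should amount to something like:)

exportfs -ra                  # re-read /etc/exports and /etc/exports.d/*.exports
systemctl restart nfs-server  # restart the NFS server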

On the client machine (in this case an Ubuntu instance with nfs-common installed, version 1:2.6.1-2ubuntu4.1), create mount point directories, e.g.:

/mnt/rAAA
/mnt/rbbb
/mnt/rccc
/mnt/rddd
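(created with something along the lines of: sudo mkdir -p /mnt/rAAA /mnt/rbbb /mnt/rccc /mnt/rddd)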

Start mounting the exports, beginning with:
sudo mount <RockstorIP>:/export/AAA /mnt/rAAA
and then, depending on what scenario to test:

sudo mount <RockstorIP>:/export/AAA/bbb /mnt/rbbb
sudo mount <RockstorIP>:/export/AAA/ccc /mnt/rccc
sudo mount <RockstorIP>:/export/AAA/ddd /mnt/rddd

Investigate the display of files across the various mountpoints/directories.
Perform some write/change operations in the various shares, observe outcome for obvious corruption.
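(For the write checks, I’m thinking of something simple like the following, run on the client and then compared on the server; file names are arbitrary:)

echo "written from client" >> /mnt/rbbb/clientwrite.txt
md5sum /mnt/rbbb/clientwrite.txt
# then on the server: md5sum /mnt2/AAA/bbb/clientwrite.txt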

Does that sound about right?


Hi Dan,
Yes, you’ve got the basic setup. I’ve commented on some particular points below.

I didn’t use * but specified one particular client computer for all the exports, in case that matters. Also, I actually used Rockstor’s NFS “Advanced Edit” button to export all of the subdirectories (using the workaround described in the referenced thread). Seems to me that it wouldn’t matter, but wadda I know?

My client is Garuda Linux “Dragonized”. Kernel version: 6.2.1-zen1-1-zen. I don’t see how that could matter though, since the corruption was seen on the server and not only on the client.

I didn’t create mount point directories in /mnt/ but in /home/walt/. Again, I don’t see how that could matter to the server, but just mentioning it for completeness.

Yep. Also watch for shading/hiding of files from the server console (log on as root and do “ls”). The first and main thing observed was the shading/hiding of files that should have been visible in the directory, so that would be the main thing to test for. IMHO, if you can definitively reproduce that, then you’re done. If the USA can definitively prove that NORAD defenses are down, they don’t have to lob a missile at themselves to prove that that’s a bad thing.

I initially noticed the issues when I tried to put the subdirectory exports into a file named

/etc/exports.d/rockstor.exports

as described in detail in the thread I linked at the very top of this topic (I won’t repeat all that detail here). Later I changed to the export method described here, and the issues first showed up when a third subdirectory was exported. I doubt that three is a magic number. You may have to export five or ten, who knows (I had a total of 10 exports when this bug showed up). You might also try giving directories longer names, in case string length has something to do with it. Or try exporting a directory within a directory within a directory, deeper down the tree. This bug behaves a lot like a memory overflow, in which case these sorts of tests should have a good chance of reproducing it. Good luck!


Thanks for the clarifications. I have not adjusted for the specific client settings, or put the mount points, etc. in the places you have used. I’ll try that later.

I’ve done a bit of testing, and so far nothing has broken…

I’ve gone up to 11 exports now, all nested underneath AAA.
The server directory layout currently looks like this (tree -d):

AAA
├── bbb
│   ├── eee
│   ├── fff
│   └── ggg
│       ├── evenlongerdirectorynameshere
│       └── wemakelongdirectorynames
│           └── anotherdirectoryoverhere
│               └── andanotherlongname
├── ccc
└── ddd

All but the andanotherlongname directory contain between 50 and 100 files.

On the server side I can ls /mnt2/<directory on NFS export> and I get a complete file list and subdirectory names back.

/export/AAA *(rw,sync,insecure)
/export/AAA/bbb *(rw,sync,insecure)
/export/AAA/ccc *(rw,sync,insecure)
/export/AAA/ddd *(rw,sync,insecure)
/export/AAA/bbb/eee *(rw,sync,insecure)
/export/AAA/bbb/fff *(rw,sync,insecure)
/export/AAA/bbb/ggg *(rw,sync,insecure)
/export/AAA/bbb/ggg/evenlongerdirectorynameshere *(rw,sync,insecure)
/export/AAA/bbb/ggg/wemakelongdirectorynames *(rw,sync,insecure)
/export/AAA/bbb/ggg/wemakelongdirectorynames/anotherdirectoryoverhere *(rw,sync,insecure)
/export/AAA/bbb/ggg/wemakelongdirectorynames/anotherdirectoryoverhere/andanotherlongname *(rw,sync,insecure)
  • Mapped all these on the Ubuntu client.
  • Also created a SMB export on directory AAA
  • Made changes to individual (and different) files on all three instances (server, Ubuntu client, SMB on W11) - was able to view changes on server/client/SMB
  • Multiple changes to the same files from all three places (one after another) - no issues
  • Copied files from one directory to another using the server, client and SMB - no issues that I could observe (well, since the GID/PGIDs are not quite aligned with how I logged into SMB, I had to chown a few times) - performance seemed fast, too. Of course I am working with 2 VMs on the same host machine, so that doesn’t necessarily mean anything.

I guess I have to become more elaborate in my nesting layout (at some point I assume W11 will barf at the length of the directory path, though I am not sure whether that’s 255 characters or more). The longest path I currently have, including the file name, is 99 characters …


Thanks a lot @Hooverdan for this exemplary investigation here and on the other related thread(s) as well!
Thanks a lot to you too @Walt for providing even more information; overall we seem to have a good picture of what was tried so I’ll try to summarize below where I think we are.

@Walt, forgive me if you’ve already tried the same way as @Hooverdan above, but I think the difference between your observations and @Hooverdan’s might actually reside in how the exports were created: you added all of yours via the Advanced Edit workaround, whereas @Hooverdan created the first export through the Web UI and only added the nested ones to /etc/exports by hand.

Indeed, when “actually” creating the export, Rockstor refreshes the /etc/exports file by first bind mounting the share to be exported at /export:
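Roughly speaking, that step amounts to something like the following (paths illustrative):

mkdir -p /export/AAA
mount --bind /mnt2/AAA /export/AAA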


When using your workaround from the Advanced Edit UI, this may create some sort of conflict between that bind mounting and the nested paths, which could be a reason for the issues you seem to be having.
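If it happens again, one quick way to check whether a stray or nested mount is shadowing the directory would be something like (findmnt is part of util-linux):

findmnt -R /export    # list every mount under /export, including nested ones
findmnt /mnt2/AAA     # check whether anything is mounted on top of the share itself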

Whether or not we should keep using this bind mounting step is probably a different conversation, as I can think of several complications that skipping it may entail; I can also see the benefit of that simplification, however. It seems we need a better idea of the culprit behind the observations you made, as it could also be something else, such as the NFS version used by the client, as raised by @Hooverdan in another thread.

One thing is clear, however: we should revise our share validation procedure in the Advanced Edit UI as our previous conversation brought up (related issue updated accordingly: https://github.com/rockstor/rockstor-core/issues/1078#issuecomment-1407788367).

If you’re game, @Walt, maybe you could try undoing your workarounds and trying with the exports.d/*.exports method but without the parent folder (so that there is no nested bind mount)? This may clarify the situation a bit.


Awesome. Great start.

It’s hardly surprising that the bug didn’t just immediately show up. Something really easy to reproduce would have been found and fixed before now. I’ve been reluctant to do experimentation on my system until I get all my backups caught up (and the runway to doing that turned out to be unexpectedly long). I should be ready to try a few more tests here in a few days.

You’re running the server and client in VMs on one computer? I have no experience with VMs, but let’s stick a pin in that, in case it becomes relevant further down the road.

Do you (or Flox, or anyone) know if Linux has event logging similar to MS Windows? Windows programs can have multiple channels of event logs, e.g. one channel might log events for security and permissions negotiations, another might log file access events, another might log accounting/billing information, etc. Most of these channels are turned off by default, but users/developers can switch on whatever channels they need. Most importantly, these logs can be turned on or off without having to recompile the program or even exit and restart, so that when an elusive bug shows up the appropriate logging can be switched on without disturbing the system state. Anything like this in Linux?

— Flox —
Just saw your msg pop up while writing this, I’ll reply here…

Good observation on the differences in steps used to export shares, between me and Dan. That for sure needs to be tried. Recall that I first encountered a form of this bug earlier, using a different method of exporting subdirectory shares. So it’s all good, we need to try all kinds of approaches to learn more about this bug and when it strikes and when it does not.

Info for Dan:
The dummy shares that I created were in the ROOT pool, which is automatically created. The detailed description of the workaround is in the thread referenced at the top of this thread. Let me know if any of this needs additional clarification.

Okay, this is odd. I just looked at the dummy shares in my system, and one has a usage of 2.04 GB and the other one is at 1.39 GB. It’s weird because I never exported either of these shares or put anything in them. When first created they both showed usages of 0. Does the BTRFS filesystem alone take up that much space? It’s another data point that might be helpful in narrowing down what’s going on behind the scenes.

In a few days I’ll try some further experiments with NFS export methods and post my results here.

Hey guys, brief update. I did some work to try to reproduce the NFS bug here again.

I am not able to set up the exact same shares and exports I did before because my system config has changed. But I set up a test branch in the server’s filesystem and populated it with some folders and files, and set up some test mountpoints on the client system. One of the folders in the test branch was exported using the “Advanced Edit” workaround method, and mounted on the client system. (So there were a total of three subdirectories exported via this method.)

Result: I didn’t see random folders and files disappearing from view this time, but nevertheless an issue occurred. The NFS mount on the client apparently succeeded (no error messages), but the mounted directory appeared to be empty. The exported folder still showed the expected files inside when viewed via a different method. The ownership of the mountpoint changed to root:root. (I didn’t mention it before, but in the previous cases where mounted folder contents disappeared, the folder ownership had changed to root:root every time.) The permissions are: Owner: View & Modify; Group: View; Others: View.

I believe this is still the same bug in action, but the particular symptoms vary depending on what random bits’n’bytes fall into the memory overwrite area. Ownership changing to root:root is easy to understand as a symptom of a memory overwrite. UID 0 is assigned to root, and GID 0 is assigned to group root. Unused memory is commonly initialized to zeroes. If a memory overwrite strikes a pointer, that pointer will then point to the wrong place in memory. If it points to an unused memory region, the object or struct at the pointed location will have a whole bunch of zeroes in its fields. NFS will look in certain data fields for the UID and GID information, see zeroes and interpret that as root:root.

This seemed easy to reproduce (on this end) so give it a try.


Just a general question: if we agree that the Advanced Edit for NFS exports should not have a “share is mounted” validation, and we were to change the writing of these exports to what @Flox suggested and put them into something like /etc/exports.d/rockstor.exports, then the dummy shares created to bypass the validation, and the possible complexities with mounted subvolumes that share a name with a “real” subdirectory on another share, etc., would no longer be necessary, correct?

Without being an expert, but having been on the receiving end of mount points ending up “hiding” directories on other Linux servers (where the admin did something that caused temporary file loss; thank you for backups :slight_smile: ), I am wondering whether we’re introducing one variable too many while exploring this?

While I wait for your opinions, I went ahead and left the main export (i.e. share AAA) in its place in /etc/exports, managed by the WebUI. I then proceeded to add the other exports to a newly generated directory/file combination: /etc/exports.d/rockstor.exports. All of this with /export in the path, like the original NFS export, since Rockstor would likely continue to set it up that way (or get rid of it entirely, but then the paths for those nested folders would have to be consistent, too).
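For reference, that file ended up looking roughly like this (only the first few entries shown; same options as the main export, re-read with exportfs -ra or a service restart):

# /etc/exports.d/rockstor.exports
/export/AAA/bbb *(rw,sync,insecure)
/export/AAA/ccc *(rw,sync,insecure)
/export/AAA/ddd *(rw,sync,insecure)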

Subsequently mounting these on the Ubuntu client, I could still see everything as I have reported before, with the permissions intact (i.e. no root:root folder ownership on the client like you saw in your setup instances, and nothing looking different when viewed from the server/Rockstor side).


Correct. Other than for the purpose of tracking down the bug in NFS.

If the bug in NFS can be reproduced without doing these actions, then great. I’ve seen this bug show up when using /etc/exports.d/rockstor.exports for exporting subdirectories (without that workaround). Whatever reproduces the bug.

Potential Security Vulnerability

Another implication of this bug hit me just now. (Sometimes it takes a while. My brain isn’t overclocked.) If the ownership of an NFS mount can change randomly to root:root, then an attacker who could intentionally place a different value into the memory locations being interpreted as UID and GID could give it any ownership they want.

Suddenly, the nature of this bug changes from “NFS may not export subdirectories properly in some circumstances,” to “NFS has an unlocked back door that an attacker could use to take over an entire Linux filesystem.”

@Walt Steady now. We are getting a little alarmist here, and I’d just like to put this in perspective.

We have no proof this is a buffer overrun or any other type of actual bug in NFS; more likely it is how we/you are doing mounts etc., and it seems to pertain to the use of our Advanced box and the ‘fake’ shares created to get around how we do stuff. NFS is used by CERN etc. and is extremely well tested. We are looking at what we do here and assuming upstream is solid. And of course at how we might accommodate what you would like us to do (within reason). And yes, I have been involved myself in upstream NFS bug confirmation (an Ubuntu kernel bug reported and fixed by the Red Hat team a few years ago), so yes, I am aware of such things. And reproducers were involved :slight_smile: .

Linux is not a toy, nor is it somehow behind Windows for logging!! You speak of:

It has a little bit more than that. Remember, it’s open source and so in a completely different realm altogether than any closed-source OS can ever be. Here, in Linux land that is, we can go down to the code/machine code itself, and yes, I have done a little of that also (starting when I was 13, initially 6502 & Z80).

Let’s stop the super-alarmist talk here, shall we, and get back to how we can better accommodate your scenario.

If you need to report upstream what you think you have found, then reproduce your issue reliably on a Tumbleweed instance. Here we do our Rockstor stuff. There are no NFS or kernel developers in this team, unless you count a contribution of mine to a driver a couple of decades ago :). And the project founder @suman_chakravartula has some ZFS contributions, I believe. And it wouldn’t surprise me if we have some lurkers with way more upstream experience than that :slight_smile: .

Hopefully this has put to rest what I think has gotten way out of context here. NFS ownership (not randomly changing, mind) is not a buffer overrun poking strange things into memory that then ends up ending the world. Please let’s keep this civil and within our mutual expertise. Look up Linux logging, for example, and what Linux actually is (read: very long in the tooth, with Unix foundations, and NFS has been there from the early days).
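As a concrete starting point (standard systemd and nfs-utils tooling, nothing Rockstor-specific):

journalctl -u nfs-server -f    # follow the NFS server unit's log in the journal
rpcdebug -m nfsd -s all        # switch kernel nfsd debug messages on at runtime
rpcdebug -m nfsd -c all        # and switch them off again, no restart required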

NFS is complex (through the ages) and has mutated over the decades/major versions in the preferred way it is to be used. We are likely doing-it-wrong, or sub-optimally - it’s been a while :slight_smile: . Let’s just stick to that and track down how we can accommodate your requirements if they are sane/doable/likely to help others similarly. Mounting stuff within stuff, from wherever, is non-trivial, and we have @Flox’s bind mount mentioned earlier. @Hooverdan seems to have a handle on what you are wanting here, so let’s stop with the alarmist talk from nowhere other than enthusiasm. With the latter being all good if you aren’t throwing around false accusations for whatever reason.

We must keep our focus here. We are not NFS developers. We are Rockstor developers and users with some Linux knowledge gained along the way. And mostly happy to help where constructive criticism and active involvement is forthcoming. Which it has been with your numerous active reports, by the way. Thanks.

References to inform your ideas of buffer overflow and security concerns re NFS:
https://www.kernel.org/doc/html/latest/filesystems/nfs/index.html
https://www.thomas-krenn.com/en/wiki/Linux_Storage_Stack_Diagram
Cheap course (4 days $3250) to acquire some grounding on what you are chasing:

Hope that helps. Again, thanks for your constructive involvement to date. And really, we are likely just doing it wrong, or sub-optimally. Let’s move towards doing it at least better. And drop the trolling / alarmist tone. I’ll refrain from the quote as it’s obvious in your last post!!


Aaah, greetings Phil, welcome to the discussion.

I concede a point to you: no security vulnerability has been proven; it is only suspected. I have amended my previous post to say “potential security vulnerability”. Also, my statement “NFS has an unlocked back door…” is inaccurate. I didn’t mean “NFS” as the protocol and standard, I meant this particular implementation and instance of NFS. I figured that would be understood from the context. Anyway, moving on…

All of us here really don’t have a way to prove or disprove whether a memory overwrite is actually occurring; that’s out of scope for this project. Our responsibility would be to recognize the characteristics and symptoms of a memory overwrite, do our best to reproduce it, and then kick it upstream to the OS developers and say, “Hey guys, we’ve come across a potential issue here, you might want to look into this.”

So does all this rise to the level of saying that there is a potential security vulnerability here? Well, IMHO it still does. But it’s not my party, so I’ll let you guys decide. I’ve documented my observations in detail and explained my rationale for any conclusions in these two threads; it is all there. So I invite you guys to review the particulars, come to your own conclusions and decide what you want to do with this.

I’m going to go over here, keep quiet and just be a regular user for a while.