NIC driver bug (was: Need to roll back to 3.8-12)

3.8-13 is not stable for me. Since installing it, I have had frequent crashes. (Perhaps you should have used Chicago numbering!) How can I safely roll back to 3.8-12? If I simply reinstall from the 3.8-12 ISO, will my storage be preserved? Will anything hinky happen?

I had been running 3.8-12 for a month or two with absolutely no issues of any kind.

As soon as I upgraded to 3.8-13, I began experiencing outages of ~10 minutes’ duration during which the Samba shares were not addressable by clients and the Rockstor server did not respond to pings.

But the server was running, as shown by the console.
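In case it helps anyone else put numbers on outages like these, here is a minimal Python sketch that turns a series of ping results into outage windows. It assumes you have already collected `(timestamp_seconds, reachable)` samples by some means of your own; the function name `outage_windows` and the sample format are made up for illustration.

```python
def outage_windows(samples):
    """Group consecutive failed probes into (start, end) windows.

    samples: iterable of (timestamp_seconds, reachable) in time order.
    'end' is the time of the first successful probe after the failures,
    or None if the host was still unreachable at the last sample.
    """
    windows = []
    start = None
    for timestamp, reachable in samples:
        if not reachable and start is None:
            start = timestamp  # first failed probe of a new outage
        elif reachable and start is not None:
            windows.append((start, timestamp))  # outage just ended
            start = None
    if start is not None:
        windows.append((start, None))  # still down at end of data
    return windows
```

Subtracting start from end gives the approximate outage length, to within one probe interval.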

dmesg showed, at the END of each outage, a single line: “alx 0000:01:00.0 enp1s0: NIC Up: 1 Gbps Full”.
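For anyone wanting to check their own logs for the same pattern, here is a minimal Python sketch that pulls link-up lines like the one above out of `dmesg` text. The regular expression is modeled only on the single line quoted above, so other driver or kernel versions may word the message differently; the function name `find_link_up_events` is made up for illustration.

```python
import re

# Pattern modeled on the one line quoted above; treat it as an assumption.
LINK_UP = re.compile(
    r"alx (?P<pci>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-9a-f]) "
    r"(?P<iface>\S+): NIC Up: (?P<speed>.+?)\s*$"
)

def find_link_up_events(dmesg_text):
    """Return (pci_address, interface, speed) for each alx link-up line."""
    events = []
    for line in dmesg_text.splitlines():
        match = LINK_UP.search(line)
        if match:
            events.append((match["pci"], match["iface"], match["speed"]))
    return events
```

The text can come from wherever you capture `dmesg` output, for example a saved log file. A count of these lines that matches the number of outages you noticed would point at the NIC.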

alx is the kernel driver for Atheros wired Ethernet chips.

A bit of googling showed that there is a great deal of chatter out there about a bug or bugs in that driver, going back to kernel 3.x, the common symptom being that traffic stops in one direction. Most manifestations are accompanied by dmesg entries referring to wild interrupts etc. None has ever been reproducible enough to fix.

Time being money, I went out yesterday and got an Intel NIC. This is NOT a product endorsement; my local shop had only this or no-names. I Googled the chip on this NIC (Intel 82574) and found that the Linux kernel driver for this chip, e1000e, is actively maintained by Intel as well as elrepo.org, whose kernels are used by Rockstor. Since installing the Intel NIC yesterday, I have had no further outages.
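If you are not sure which driver your own NIC is using, a rough sketch like the following can tell you. It assumes the usual Linux sysfs layout, where `/sys/class/net/<interface>/device/driver` is a symlink to the bound driver; the function name `driver_for_interface` and the `sysfs_root` parameter are made up for illustration.

```python
import os

def driver_for_interface(iface, sysfs_root="/sys/class/net"):
    """Return the kernel driver name bound to iface, or None if unknown."""
    link = os.path.join(sysfs_root, iface, "device", "driver")
    try:
        # The 'driver' entry is a symlink; its last component is the driver name.
        return os.path.basename(os.readlink(link))
    except OSError:
        # No such interface, or a virtual device with no driver link.
        return None
```

On a box like mine, `driver_for_interface("enp1s0")` would be expected to return `alx` with the Atheros chip and `e1000e` with the Intel 82574, though the interface name may change when you swap cards.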

You may draw your own conclusions as to whether you ought to be running an Atheros NIC under Linux.

It is not at all clear why the Rockstor upgrade to 3.8-13 should have caused this bug to begin manifesting itself. There should be no coupling with the upper layers of software. But one researcher found that specific legal bits in the header of an IP packet could either (A) halt the Atheros chip from transmitting or (B) prevent it from choking on subsequent occurrences of the poison bits. This kind of thing could, of course, be pre-emptively worked around in a proprietary driver…


The report cited above is here: http://blog.krisk.org/2013/02/packets-of-death.html

Okay I got COMPLETELY CONFUSED. This report does not refer to the Atheros chip, but to the Intel chip. Disregard the last graf of my first reply above.

SORRY
FW

Same here. It’s the kernel that is the problem; the CentOS team is patching soon.