Fixed Versions: Linux SACK Attack – Denial of Service

The recently published CVE-2019-11477 and CVE-2019-11478 vulnerabilities allow an attacker who can reach any TCP port on your server (nearly everyone, including anyone running a web or mail server) to either:

  1. Slow it down severely
  2. Cause a kernel crash

See the NIST National Vulnerability Database entry for more detail:

https://nvd.nist.gov/vuln/detail/CVE-2019-11477

Linux distributions have released fixes, listed by vendor below. CVE-2019-11478 is also a problem, but CVE-2019-11477 has the higher impact, so we list its advisories here. As far as I have seen, both are fixed in the same package version, so you only need to check the CVE-2019-11477 articles.

Mitigation

You can mitigate this attack with iptables. If you use fwtree, our latest release for el6 and el7 (version 1.0.1-70 or newer) includes the mitigation. Updating your kernel is still the real fix, but this provides a quick mitigation without rebooting:

# [ -d /etc/fwtree.d ] && yum install -y fwtree && systemctl reload fwtree && iptables-save | grep MITIGATIONS

You can also do it directly with iptables:

# iptables -I INPUT -p tcp --tcp-flags SYN SYN -m tcpmss --mss 1:500 -j DROP
# ip6tables -I INPUT -p tcp --tcp-flags SYN SYN -m tcpmss --mss 1:500 -j DROP
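
To confirm that the rule is actually loaded, you can scan the output of iptables-save. This is a minimal sketch, assuming iptables-save prints the rule roughly as shown in the comment (it always prints -A and usually adds an implicit "-m tcp"). The helper name has_mss_mitigation is hypothetical, and it only looks for the rule appended directly to INPUT, not for fwtree's MITIGATIONS chain.

```python
import re

# Assumed iptables-save form of the rule above:
#   -A INPUT -p tcp -m tcp --tcp-flags SYN SYN -m tcpmss --mss 1:500 -j DROP
MSS_RULE = re.compile(
    r"^-A INPUT\b.*--tcp-flags SYN SYN\b.*-m tcpmss --mss 1:500\b.*-j DROP\b",
    re.MULTILINE,
)


def has_mss_mitigation(save_output: str) -> bool:
    """Return True if an INPUT rule drops SYN packets advertising an MSS of 1-500."""
    return MSS_RULE.search(save_output) is not None
```

Feed it the output of `subprocess.run(["iptables-save"], capture_output=True, text=True).stdout` (as root), and repeat with ip6tables-save for IPv6.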

You can also disable TCP selective acknowledgments (SACK) with sysctl. Add this line to /etc/sysctl.conf and run `sysctl -p` to apply it without rebooting:

net.ipv4.tcp_sack=0
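
Editing /etc/sysctl.conf alone does not change the running kernel, so it is worth checking the live value. A minimal sketch, reading /proc/sys/net/ipv4/tcp_sack (the procfs file behind net.ipv4.tcp_sack); the helper names parse_sack and sack_enabled are hypothetical.

```python
from pathlib import Path

SACK_SYSCTL = Path("/proc/sys/net/ipv4/tcp_sack")  # live value of net.ipv4.tcp_sack


def parse_sack(value: str) -> bool:
    """Interpret the sysctl text: "0" means SACK is disabled."""
    return value.strip() != "0"


def sack_enabled(path: Path = SACK_SYSCTL) -> bool:
    """Return the running kernel's SACK setting (Linux only)."""
    return parse_sack(path.read_text())
```

After `sysctl -p`, `sack_enabled()` should return False.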

Red Hat / CentOS / Scientific Linux

Vendor security article: https://access.redhat.com/security/cve/cve-2019-11477

Fixed Versions

  • el5: not vulnerable (and EOL, so upgrade already!)
  • el6: kernel-2.6.32-754.15.3.el6
  • el7: kernel-3.10.0-957.21.3.el7

Ubuntu

Vendor security article: https://usn.ubuntu.com/4017-1/

Fixed Versions

  • Ubuntu 19.04
    • 5.0.0.1008.8
  • Ubuntu 18.10
    • 4.18.0.22.23
  • Ubuntu 18.04 LTS
    • 4.15.0-52.56
  • Ubuntu 16.04 LTS
    • 4.15.0-52.56~16.04.1
    • 4.4.0-151.178

Debian

Vendor security article: https://security-tracker.debian.org/tracker/CVE-2019-11477

Fixed Versions

  • jessie
    • 3.16.68-2
    • 4.9.168-1+deb9u3~deb8u1
  • stretch
    • 4.9.168-1+deb9u3
  • sid
    • 4.19.37-4

SuSE

Vendor security article: https://www.suse.com/security/cve/CVE-2019-11477/

Fixed Versions

SuSE has too many minor releases to list them all here. As a rule of thumb, if your kernel is newer than the versions below, you are probably fine, but double-check the vendor security article for your specific release and use case:

  • Pre-SLES 15:
    • 3.12.61-52.154.1
    • 4.4.121-92.114.1
    • 4.4.180-94.97.1
    • 4.12.14-95.19.1
  • SLES 15
    • 4.12.14-150.22.1
  • Leap 15
    • 4.12.14-lp150.12.64.1

Vanilla Upstream Kernel (kernel.org)

Security patch: https://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git/commit/?id=3b4929f65b0d8249f19a50245cd88ed1a2f78cff

Fixed Versions

  • 5.1.11
  • 4.19.52
  • 4.14.127
  • 4.9.182
  • 4.4.182
  • 3.16.69
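
For a vanilla kernel.org kernel, the list above can be turned into a quick version check. This is a sketch under stated assumptions: it only compares version numbers, so it is meaningless for distribution kernels (which backport fixes without changing the upstream version; use the vendor tables for those), and it returns None for any branch not in the table. The names UPSTREAM_FIXED, parse_release and upstream_fixed are hypothetical.

```python
import re
from typing import Optional, Tuple

# First kernel.org stable release on each branch that carries the fix
# (commit 3b4929f65b0d, "tcp: limit payload size of sacked skbs").
# Note: for 4.14.y this uses 4.14.127; 4.14.122 predates the June 2019 fixes.
UPSTREAM_FIXED = {
    (5, 1): (5, 1, 11),
    (4, 19): (4, 19, 52),
    (4, 14): (4, 14, 127),
    (4, 9): (4, 9, 182),
    (4, 4): (4, 4, 182),
    (3, 16): (3, 16, 69),
}


def parse_release(release: str) -> Tuple[int, int, int]:
    """Turn a release string such as '4.19.52-custom' into (4, 19, 52)."""
    m = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if m is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    return int(m.group(1)), int(m.group(2)), int(m.group(3) or 0)


def upstream_fixed(release: str) -> Optional[bool]:
    """True/False for branches in the table above, None for any other branch."""
    version = parse_release(release)
    fixed = UPSTREAM_FIXED.get(version[:2])
    if fixed is None:
        return None
    return version >= fixed
```

On the running system, call `upstream_fixed(platform.release())`.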

librsync error: “RS_DEFAULT_STRONG_LEN” undeclared

We needed to compile an old version of rdiff-backup on CentOS 7 but got the following error:

_librsyncmodule.c: In function ‘_librsync_new_sigmaker’:
_librsyncmodule.c:63:17: error: ‘RS_DEFAULT_STRONG_LEN’ undeclared (first use in this function)
 (size_t)RS_DEFAULT_STRONG_LEN);
 ^
_librsyncmodule.c:63:17: note: each undeclared identifier is reported only once for each function it appears in
_librsyncmodule.c:63:9: error: too few arguments to function ‘rs_sig_begin’
 (size_t)RS_DEFAULT_STRONG_LEN);
 ^
In file included from _librsyncmodule.c:25:0:
/usr/include/librsync.h:370:11: note: declared here
 rs_job_t *rs_sig_begin(size_t new_block_len,
 ^
error: command 'gcc' failed with exit status 1

The librsync library changed the signature of `rs_sig_begin` (it now also takes a strong-sum length and a signature magic number), so if you get an error like that, a patch like this might help:

# diff -uw _librsyncmodule.c.ORIG _librsyncmodule.c
--- _librsyncmodule.c.ORIG 2006-11-11 23:32:01.000000000 -0800
+++ _librsyncmodule.c 2018-02-20 11:22:06.529111816 -0800
@@ -59,8 +59,8 @@
 if (sm == NULL) return NULL;
 sm->x_attr = NULL;

- sm->sig_job = rs_sig_begin((size_t)blocklen,
- (size_t)RS_DEFAULT_STRONG_LEN);
+ sm->sig_job = rs_sig_begin((size_t)blocklen, 8,
+ (size_t)RS_MD4_SIG_MAGIC);
 return (PyObject*)sm;
 }
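
Before patching, you can check which API your installed header provides. This is a heuristic sketch, assuming the old header is the one that defines RS_DEFAULT_STRONG_LEN (the macro the unpatched code passes as the second argument) and that the header lives at /usr/include/librsync.h as in the error above. The helper names are hypothetical.

```python
from pathlib import Path

DEFAULT_HEADER = "/usr/include/librsync.h"


def uses_old_sig_api(header_text: str) -> bool:
    """Heuristic: only the old API defines RS_DEFAULT_STRONG_LEN, which the
    unpatched _librsyncmodule.c passes as the second rs_sig_begin() argument."""
    return "RS_DEFAULT_STRONG_LEN" in header_text


def installed_header_uses_old_api(path: str = DEFAULT_HEADER) -> bool:
    """Read the installed header and apply the heuristic above."""
    return uses_old_sig_api(Path(path).read_text(errors="replace"))
```

If `installed_header_uses_old_api()` returns False, the unpatched rdiff-backup source will fail to build as shown and the patch is needed.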

-Eric

Issues Upgrading CentOS/RHEL/Scientific Linux 7.2 to 7.3

If you’ve been a systems administrator for a while, you know it’s best practice to have security updates install automatically, and you also know that this breaks things from time to time. This happened to us when EL 7.3 came out a few months ago: it caused unexpected issues with systems running KVM, libvirt, and LVM2 with large quantities of snapshots (4,480 and counting!).

The first issue we discovered was a virtual machine lockup during live migration. It is related to an MSR_TSC_AUX change that Red Hat shipped in 7.3, for which the supporting kernel patches had not yet been merged into the Linux 4.1.y stable branch. I have not yet tested 4.1.39, but it appears to have those patches. Most users will not hit this bug if they run the vendor-provided EL7 kernel, but if you use 4.1 for stable bcache support, you might run into it. You can read more details on the patches here: https://patchwork.kernel.org/patch/9538171/

Shortly after we discovered the first issue (but before we had time to fix it), we discovered that LUKS passthrough crashes libvirt unless you use libvirt’s keystore. Since we pass encrypted volumes directly into the virtual machine and let the virtual machine unlock them, libvirtd segfaulted endlessly as systemd restarted it after each failure. After much troubleshooting and inspection with GDB to find where the problem actually was, we discovered that libvirt assumed every LUKS volume had a key in its keystore. This has been fixed in the latest version; more information is available here: https://bugzilla.redhat.com/show_bug.cgi?id=1411394

Not to be outdone, the 7.2 to 7.3 upgrade was also causing segmentation faults in dmeventd. At the time we did not know it was a bug in LVM2, but with a third issue compounding the two above, it was time for more drastic measures: revert the packages! After installing the EL7.2 versions of libvirt, KVM, and LVM2 (and their dependencies), we were back up and running.

Feeling brave, we decided to try the 7.3 upgrade again today since the first two issues were fixed. At the time, we didn’t know the third issue was independent of the others, so this was our first opportunity to investigate it. This issue is still outstanding, and the actual problem is unclear. We have found the first bad commit in LVM2 (9156c5d dmeventd rework locking code) and posted to the lvm-devel list, so hopefully it will be fixed soon. For the moment we are holding back LVM2 updates, which seems to work fine with the rest of the system packages upgraded to 7.3. You can read more about the beginning of this fix here: https://lvm-devel.redhat.narkive.com/xxKNaNG6/bisect-regression-segv-9156c5d-dmeventd-rework-locking-code

So is it time to upgrade from 7.2 to 7.3? Yes! But only if you hold back LVM2. The easiest way to do this is to add the following line to the [base] and [updates] sections of /etc/yum.repos.d/CentOS-Base.repo:

exclude=lvm2* device-mapper*
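
If you manage many hosts, the edit can be scripted. This is a minimal sketch that treats the .repo file as plain text: it extends an existing exclude= line in each target section, or adds one right after the section header, and running it twice changes nothing. The function add_excludes is hypothetical; back up CentOS-Base.repo before writing the result over it.

```python
import re
from typing import Iterable

SECTION = re.compile(r"\s*\[([^\]]+)\]\s*$")
EXCLUDE_KEY = re.compile(r"\s*exclude\s*=")


def add_excludes(repo_text: str,
                 sections: Iterable[str] = ("base", "updates"),
                 patterns: Iterable[str] = ("lvm2*", "device-mapper*")) -> str:
    """Return repo_text with `patterns` excluded in each listed section."""
    sections, patterns = set(sections), list(patterns)
    lines = repo_text.splitlines()

    # Pass 1: find target sections that already have an exclude= line.
    has_exclude, current = set(), None
    for line in lines:
        header = SECTION.match(line)
        if header:
            current = header.group(1)
        elif current in sections and EXCLUDE_KEY.match(line):
            has_exclude.add(current)

    # Pass 2: extend existing exclude= lines or add one under the header.
    out, current = [], None
    for line in lines:
        header = SECTION.match(line)
        if header:
            current = header.group(1)
            out.append(line)
            if current in sections and current not in has_exclude:
                out.append("exclude=" + " ".join(patterns))
        elif current in sections and EXCLUDE_KEY.match(line):
            existing = line.split("=", 1)[1].split()
            merged = existing + [p for p in patterns if p not in existing]
            out.append("exclude=" + " ".join(merged))
        else:
            out.append(line)
    return "\n".join(out) + "\n"
```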

Update: Tue Apr 4 16:25:39 PDT 2017

The LVM problem was related to the reserved_stack value in /etc/lvm/lvm.conf being too high on our system. Somehow this triggered a regression in LVM2, since the same setting certainly worked in EL7.2.

So, if you get an error like this, shrink your reserved_stack and see if it fixes the problem:

kernel: dmeventd[28383]: segfault at 7f9477240ea8 ip 00007f9473f24617 sp 00007f9477240eb0 error 6 in liblvm2cmd.so.2.02[7f9473e83000+191000]
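
To find out whether a host has hit this, you can scan its kernel log for lines like the one above. A minimal sketch, assuming the log format shown (pid in brackets after dmeventd, faulting object after " in "); dmeventd_segfaults is a hypothetical helper.

```python
import re
from typing import List, Tuple

# Kernel log format of the crash above, e.g.
#   kernel: dmeventd[28383]: segfault at 7f94... ip ... error 6 in liblvm2cmd.so.2.02[7f94...]
DMEVENTD_SEGV = re.compile(r"dmeventd\[(\d+)\]: segfault at \S+.* in (\S+?)\[")


def dmeventd_segfaults(log_text: str) -> List[Tuple[int, str]]:
    """Return (pid, faulting object) for every dmeventd segfault line."""
    return [(int(m.group(1)), m.group(2)) for m in DMEVENTD_SEGV.finditer(log_text)]
```

Pass it the contents of /var/log/messages or the output of `journalctl -k`.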

-Eric

Show the virtual machine name in dstat instead of showing qemu

Do you run dstat to watch Linux KVM hypervisors, but wish process names showed virtual machine names?  Me too.

This patch does just that:

--- a/usr/bin/dstat	2009-11-24 01:30:11.000000000 -0800
+++ b/usr/bin/dstat	2014-11-07 10:20:09.719148833 -0800
@@ -1946,6 +1946,12 @@
         return os.path.basename(name)
     return name

+def index_containing_substring(the_list, substring):
+	for i, s in enumerate(the_list):
+		if substring in s:
+			return i
+	return -1
+
 def getnamebypid(pid, name):
     ret = None
     try:
@@ -1956,6 +1962,10 @@
         if ret.startswith('-'):
             ret = basename(cmdline[-2])
             if ret.startswith('-'): raise
+        if any("qemu" in s for s in cmdline):
+            idx = index_containing_substring(cmdline, '-name')
+            if idx >= 0:
+                ret = cmdline[idx+1]
         if not ret: raise
     except:
         ret = basename(name)
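
The patch finds the name with a substring match on "-name", which could also match an earlier argument that merely contains that text. Below is a standalone sketch of a stricter lookup: it requires an exact "-name" argument, and it assumes the binary name in argv[0] contains "qemu" (e.g. qemu-kvm or qemu-system-x86_64). It also strips a "guest=" prefix, on the assumption that newer libvirt versions may pass "-name guest=NAME,debug-threads=on". Both function names are hypothetical and not part of dstat.

```python
import os
from typing import List, Optional


def read_cmdline(pid: int) -> List[str]:
    """Read /proc/PID/cmdline, whose arguments are separated by NUL bytes."""
    with open(f"/proc/{pid}/cmdline", "rb") as f:
        raw = f.read()
    return [arg.decode(errors="replace") for arg in raw.split(b"\0") if arg]


def vm_name_from_cmdline(cmdline: List[str]) -> Optional[str]:
    """Return the guest name given to a QEMU process via -name, else None."""
    if not cmdline or "qemu" not in os.path.basename(cmdline[0]):
        return None
    try:
        value = cmdline[cmdline.index("-name") + 1]  # exact match, not substring
    except (ValueError, IndexError):
        return None
    # "-name NAME" or "-name guest=NAME,debug-threads=on";
    # QEMU's ",," escaping for literal commas is not handled here.
    first = value.split(",", 1)[0]
    return first[len("guest="):] if first.startswith("guest=") else first
```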

Linux Kernel bug from 2002?

Really Old Bugs

Apparently there is a bug, present in kernels as old as 2.5.44, that pops up every so often and costs developers hours to hunt down. Hopefully it can be fixed upstream, or maybe this is a “won’t fix” for some very good reason that I am unaware of: http://osdir.com/ml/linux.enbd.general/2002-10/msg00176.html . In my opinion, an issue like this should produce a meaningful error rather than a deadlock.

The fix

Basically, add_disk() (and therefore register_disk(), where the problem actually resides) must be called *before* set_capacity() in Linux block device drivers. This is the opposite of what I would expect, since I would configure the device parameters before publishing the device to userspace, but in the Linux kernel that order can (will?) cause deadlock.

Upstream

Recently I encountered this issue/bug in a zfs-git (zfsonlinux) build. I’ve resolved the kernel hang and I’m working on a minimal patch for ZFS. For now, follow this ZFS ZVOL issue on GitHub: https://github.com/zfsonlinux/zfs/issues/1488 .

Update: a pull request is pending here: https://github.com/zfsonlinux/zfs/pull/1491 and a patch has been listed on the issues page.

-Eric

Sparse-file Support for rdiff-backup

Massive LVM snapshots use lots of space on your backup destination. Virtual machine volume images are often mostly empty, especially when more disk has been allocated than the VM is currently using. In that case, it makes sense to store only the nonzero blocks of data.

This is a patch to rdiff-backup 1.2.8 to add sparse file support.
More info is available on the rdiff-backup wiki.
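
The core idea of storing only nonzero blocks can be illustrated outside rdiff-backup. This is not the patch itself; it is a minimal sketch of the general technique, assuming the destination filesystem supports sparse files: all-zero blocks are skipped with a seek, so they become holes. The name copy_sparse and the 128 KiB default block size are illustrative choices.

```python
import io
from typing import BinaryIO


def copy_sparse(fin: BinaryIO, fout: BinaryIO, block_size: int = 128 * 1024) -> int:
    """Copy fin to fout, seeking over blocks that are entirely zero.

    Returns the number of bytes actually written.
    """
    written = 0
    ended_in_hole = False
    while True:
        block = fin.read(block_size)
        if not block:
            break
        if block.count(0) == len(block):
            fout.seek(len(block), io.SEEK_CUR)  # leave a hole instead of writing zeros
            ended_in_hole = True
        else:
            fout.write(block)
            written += len(block)
            ended_in_hole = False
    if ended_in_hole:
        # Seeking alone does not extend the file: write the final zero byte so
        # the output has the same length as the input.
        fout.seek(-1, io.SEEK_CUR)
        fout.write(b"\0")
        written += 1
    return written
```

With real files: `with open("vm.img", "rb") as fin, open("vm.img.copy", "wb") as fout: copy_sparse(fin, fout)`.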

Update: Sun Jan 2 19:49:50 PST 2011

This is an updated (more efficient and faster) patch to support sparse files.
I’ve also written a patch that aligns rdiff block sizes for files larger than 1GB on “Globals.blocksize” boundaries (currently 1024*128). This works much better for RAID devices than the “square-root” approach used for smaller files, because reads are aligned on 128k boundaries instead of 16-byte boundaries. See the patch for details.
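
The block-size rule described above can be sketched as a single function. The small-file branch is modeled on rdiff-backup's stock heuristic as I understand it (a 64-byte minimum below 4 KiB, otherwise the square root of the file length rounded down to a multiple of 16); treat that, and reading ">1GB" as 2**30 bytes, as assumptions rather than the patch's exact code.

```python
import math

GLOBALS_BLOCKSIZE = 1024 * 128  # the "Globals.blocksize" value mentioned above
LARGE_FILE = 1024 ** 3          # assumption: ">1GB" taken as 2**30 bytes


def find_blocksize(file_len: int) -> int:
    """Pick an rdiff block size for a file of file_len bytes."""
    if file_len > LARGE_FILE:
        # Large files: fixed 128 KiB blocks, so reads stay aligned on RAID.
        return GLOBALS_BLOCKSIZE
    if file_len < 4096:
        return 64  # minimum block size for small files
    # Square root of the length, rounded down to a multiple of 16.
    return int(math.sqrt(file_len) // 16) * 16
```

For a 1,000,000-byte file the square-root rule gives 992-byte blocks, which do not line up with a RAID stripe, while a 2 GiB image gets 131072-byte blocks.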

-Eric