Meltdown BUG: What about KVM/Xen/Docker/OpenVZ/LXC/PV-Xen/HyperV?

January 4, 2018 by ewheeler

Different Variants: Meltdown and Spectre

This article discusses only Meltdown and its affect on hypervisor environments since it is the easiest to implement. Note that Spectre is capable leaking hypervisor memory from all hypervisors running on affected processors (Intel and possibly AMD, ARM) but it is both more difficult to exploit and to mitigate. Please read on to understand how Meltdown affects your virtualization stack:

How Meltdown Affects Virtualized Environments

Every hosting provider held their breath over the past week wondering if the as-of-yet undisclosed Intel hardware bug now released as “Meltdown” would affect their visualization stack. They all want to know: is this a hypervisor escalation!? Here in this post we use the word “affected” meaning guest-to-hypervisor memory read access.

The Meltdown bug enables reading memory from address space represented by the same pagetable—anyone using virtual page tables is unaffected between virtual tables. That is, Guest-to-Host pagetables are unaffected, only Guest-to-Guest or Host-to-Host, and of course Host-to-Guest since the host can already access the guest pages.

For a hosting provider this means different customer VMs on the same fully-virtualized hypervisor cannot access each others’ data—but—different users on the same guest instance can access each others’ data. This latter part holds true for non-virtualized hardware as well: users under the same OS kernel can access each others’ data. Thus, containers are affected!

Which Technologies Are Affected?

Fully virtualized technologies are not affected in the sense that guests cannot access host (hypervisor) memory. However, an unprivileged guest process can still access privileged (and other unprivileged) guest process memory pages. Container-based technologies are affected by Meltdown across container boundaries.

Affected Virtualization Technologies

Anything container based: neighbor containers can read other neighbor containers process memory.

Docker
LXC
OpenVZ
UML
Paravirtual Xen
Chroot Jails

Unaffected Virtualization Technologies

Any fully virtualized technology is unaffected.

KVM
Xen HVM
HyperV
VirtualBox (if using VT)

Solutions

Update your distribution kernel if your OS distribution has released an update for CVE-2017-5715, CVE-2017-5753, and CVE-2017-5754. See this post for updated distribution kernel versions that address these CVEs: https://www.linuxglobal.com/spectre-meltdown-security-updates-distribution/
If you cannot do #1, your best option is to install Linux 4.15-rc6 or one of the supported vanilla kernel patches in the link above. On all systems. Yes, 4.15-rc6 is a release candidate, but this kernel is receiving wide spread testing because of this bug.
If this is not an option and you mostly trust the code running inside of the container, then you could run your container instances under KVM to isolate them from eachother to protect your guests and privileged container.
If running Xen-PV, switch everything to Xen-HVM and hope for the best. Many operating systems will boot in either environment unless your guest kernel was built specific to Xen PV—but there could be driver issues between the two.
If you do not trust your users on a single host, then your best option is #1 above.

Remember, the only real fix is to install an updated kernel on all servers, physical or virtual. Solutions 3and 4 only mitigate the problem since the guest is still vulnerable to interprocess memory reads.

Help!

We can help! Just give us a call or send an email so we can make a plan and get you running secure, once again!

-Eric

Recovering from a Cloud Server Failure

June 12, 2017 by ewheeler

If you have received an email or a support ticket from your provider like the message below, then chances are your data is no longer available. We may be able to work with your provider to recover data and would be happy to help if this could be done. Cloud Servers fail for a number of reasons, and not all failures cause permanent data loss.

Your cloud server could not be recovered due to the failure of its host. You have the option to rebuild your server from your most recent server image or from a stock image … We apologize for any inconvenience this may have caused you.

If there is permanent data loss, then your only option is to plan for the future and move ahead—and we can help with that too.

Backups are certainly useful for recovering data, but they take a long time to put back into service. As you deploy your new environment, consider redundant designs for your infrastructure and let us know if we can help prevent this in the future!

-Eric

Protect Your Server from Ransomware

May 12, 2017 by ewheeler

Protect Your Business Servers from Ransomware!

The fundamental problem exploited by Ransomware is a lack of backup. At Linux Global, we protect our customers from Ransomware in a number of ways:

Frequent offsite backups
Realtime data replication for databases
Frequent server snapshots with offsite image backup

With these systems in place your data is safe, even in the event of your files being encrypted by Ransomware.

We can even protect your Windows server by using Linux to virtualize your infrastructure and maintain hourly snapshots with frequent offsite backups.

Call today and Ransomware-proof your infrastructure!

CentOS/RHEL/Scientific Linux 5 404 Not found

April 3, 2017 by ewheeler

Problems Updating Packages

You may have landed here because of errors like the following:

 ~]# yum upgrade
Loaded plugins: fastestmirror, security
Determining fastest mirrors
 * base: mirrors.kernel.org
 * extras: mirror.chpc.utah.edu
 * updates: mirror.steadfast.net
http://mirrors.kernel.org/centos/5.11/os/x86_64/repodata/repomd.xml: [Errno 14] HTTP Error 404: Not Found
Trying other mirror.

This is because on March 31st, all of the EL5 derivatives are now end of life. It is time to update to EL6 or EL7.

If you are unable to update immediately, but would like to get your OS as up to date as possible, then CentOS users can update their repositories from mirror.centos.org to vault.centos.org in /etc/yum.repo.d/CentOS-*.conf . For Scientific Linux, you will want to use the following link in your repository files: http://ftp.scientificlinux.org/linux/scientific/obsolete/511/$basearch/SL/.

We have migrated many CentOS/RHEL/Scientific Linux systems and would be happy to assist if you need help!

-Eric

Issues Upgrading CentOS/RHEL/Scientific Linux 7.2 to 7.3

March 31, 2017March 16, 2022 by ewheeler

If you’ve been a systems administrator for awhile, then you know it’s best practice to have security updates to install automatically—and you also know that this breaks things from time to time. This happened to use when EL 7.3 came out a few months ago and caused unexpected issues with systems running KVM, libvirt, and LVM2 with large quantities of snapshots (4,480 and counting!).

The first issue that we discovered was virtual machine lockup during live migration. This is related to an MSR_TSC_AUX update that Redhat pushed into 7.3, but for which the Linux 4.1.y stable branches had not yet merged the kernel update to support this. While I’ve not yet tested 4.1.39, it appears to have those patches. Most users will not experience this particular bug if they are using the vendor provided EL7 kernel—but if you are using 4.1 in order to have stable bcache support, then you might run into this. You can read more details on the patches here: https://patchwork.kernel.org/patch/9538171/

Shortly after we discovered the first issue (but before we had time to fix it), we discovered that LUKS passthrough crashes libvirt unless you are using libvirt’s keystore. Since we pass encrypted volumes directly into the virtual machine and let the virtual machine unlock the volume, this was causing endless segmentation faults of libvirtd as systemd restarted it after failure. After much troubleshooting and inspection with GDB to figure out where the problem actually was, we discovered that libvirt was assuming that all LUKS volumes have a key in their keystore. This has been fixed in the latest version, and more information about this is available here: https://bugzilla.redhat.com/show_bug.cgi?id=1411394

Not to be outdone, the 7.2 to 7.3 upgrade was also causing segmentation faults of dmeventd. At the time, we did not know that it was a bug in LVM2—but having a third issue compounded with the two above, it was time for more drastic measures: Revert the packages! After installing the EL7.2 version of libvirt, KVM, and LVM2 (and their dependencies), we were back up and running.

Feeling brave, we decided to try the 7.3 upgrade again today since the first two issues were fixed. At the time, we didn’t really know the third issue was an issue independent of the others, so this was our first opportunity to investigate. This issue is still outstanding, and the actual problem is unclear. We have found the first bad commit (9156c5d dmeventd rework locking code) in LVM2 and posted to the lvm-devel list, so hopefully this will be fixed soon. For the moment we are holding back LVM2 updates which seems to be working fine with the rest of the system packages upgraded to 7.3. You can read more about the beginning of this fix here: https://lvm-devel.redhat.narkive.com/xxKNaNG6/bisect-regression-segv-9156c5d-dmeventd-rework-locking-code

So is it time to 7.3 from 7.2? Yes! But only if you hold back LVM2. The easiest way to do this is to add the following to your /etc/yum.repo.d/CentOS-Base.repo in the [base] and [updates] sections:

exclude=lvm2* device-mapper*

Update: Tue Apr 4 16:25:39 PDT 2017

The LVM problem was related to the reserved_stack value in /etc/lvm/lvm.conf being too high on our system. Somehow this introduced a regression in LVM2 since it certainly worked before in EL7.2 .

So, if you get an error like this, shrink your reserved_stack and see if it fixes the problem:

kernel: dmeventd[28383]: segfault at 7f9477240ea8 ip 00007f9473f24617 sp 00007f9477240eb0 error 6 in liblvm2cmd.so.2.02[7f9473e83000+191000]

-Eric

Dirty COW Fix: (CVE-2016-5195) Root Escalation

October 26, 2016 by ewheeler

DirtyCOW: ptrace strikes again!

Actually, it’s not ptrace’s fault—but having ptrace disabled would mitigate the current proof of concept attacks, and possibly future attacks as well. Linus Torvalds recently fixed a get_dirty_pages race which could be used to escalate to root privileges on all kernels prior to October 13, 2016. This was not pushed into the stable kernels until October 23rd. Today, October 26th, Redhat and other vendors have started propagating these patches in downstream packages. Notably, as of this writing there is no fix for the RHEL/CentOS/Scientific Linux 5.x release, however there is a mitigation available on Redhat’s Bugzilla tracker here:
https://bugzilla.redhat.com/show_bug.cgi?id=1384344#c13

Easy Fix (mitigate) for DirtyCOW CVE-2016-5195 Root Escalation

While the Bugzilla patch above explains what needs to be done, the process is not necessarily straight-forward. This script automates the process for EL5 and derivatives, and it should work more or less on EL6, but only if your devel packages are available for your current kernel. If not, then `yum install kernel` and reboot before running this script; In the EL6 case, there is no need to run this script after a kernel update as long as you are running the latest version noted below.

As noted on the Bugzilla link, “… note that this mitigation disables ptrace functionality which debuggers and programs that inspect other processes (virus scanners) use and thus these programs won’t be operational.”

You can run the script like so:

bash <(wget -qO - https://www.linuxglobal.com/static/blog/harden-dirtycow.sh)

Of course you probably want to review the script before running it, so click here to view:

https://www.linuxglobal.com/static/blog/harden-dirtycow.sh

What is Your Exposure?

You’re only vulnerable to this if a user can execute binaries that they introduce on your server. They might install this binary by compiling it onsite (but you remove gcc, right?), or by uploading a pre-compiled binary. Fortunately this means that environments who trust their users or who have no local users are safe. The bigger problem is with shared hosting environments such as those provided by WHM/cPanel/Plesk/Webmin and others.

These are the summary versions of kernels that contain the DirtyCOW fix:

RHEL/CentOS/Scientific Linux 6.x: 2.6.32-642.6.2
RHEL/CentOS/Scientific Linux 7.x: 3.10.0-327.36.3
Ubuntu 12.04: 3.2.0-113.155
Ubuntu 14.04: 3.13.0-100.147
Ubuntu 16.04: 4.4.0-45.66
Ubuntu 16.10: 4.8.0-26.28
Debian 7: 3.2.82-1
Debian 8: 3.16.36-1+deb8u2

Vanilla kernel longterm releases (or newer) that contain the DirtyCOW fix:

4.4.26
4.1.35
3.18.44
3.16.38
3.12.66
3.10.104
3.4.113
3.2.83

Future Hardening

This is not the first root escalation that has been leveraged by ptrace functionality, as you can see with a quick search for “ptrace root escalation”. I recommend that you turn off ptrace unless you are certain that you need it. Unfortunately the best way to do this is to re-compile your kernel and exclude ptrace support, but that is not always viable. If you’re running EL7 with SELinux enforcing (Check with `getenforce` which should print “Enforcing”), you can simply run this: `setsebool -P deny_ptrace on`.

You can also compile kernel modules such as the systemtap interface used above for the EL5 patch, but of course that requires a rebuild with gcc installed every time you update your kernel. It would be great if there was something in /sys or /proc that could globally disable the sys_ptrace_enter system call, but for the moment that is not an option.

-Eric

PDFtk works on CentOS 7 and RHEL 8!

August 18, 2016June 23, 2023 by ewheeler

Installing PDFtk on CentOS/RHEL/Scientific Linux 7

Update: This procedure works in RHEL/Rocky/Alma/Oracle Linux 8 and Amazon Linux 2023!

In the transition to CentOS 7, the GNU compiler for the Java programming language libgcj was discontinued. This is partially due to it being dropped by the GCC suite. As it turns out, shared library linking of libgcj.so.10 from CentOS 6 is binary compatible with PDFtk. We use PDFtk in our office for collating documents that have been scanned, and it works great!

After searching online and finding lots of links with various levels of success and server admins having used PDFtk for over a decade, we decided to package it and provide it to the community.

Installation is simple, depending on your architecture:

x86_64

yum localinstall https://www.linuxglobal.com/static/blog/pdftk-2.02-1.el7.x86_64.rpm

i686

yum localinstall https://www.linuxglobal.com/static/blog/pdftk-2.02-1.el7.i686.rpm

After Updating

If you are reading this article, then you probably just upgraded your system. Now would be a great time to consider security for your application and server infrastructure. There are many services that we offer including support, security, maintenance, monitoring, backups, and live SQL backup replication. We even offer a security hardened hosting environment!

Please let us know if you have any issues with these packages or if we may be of service!

Update: 2017-01-26

Some have asked for the .spec that we are using. Really are we are doing is repacking libgcj.so.10* which we pulled out of CentOS 6 libgcj-4.4.7-17.el6. PDFtk was downloaded as an RPM from their site unmodified except that we converted it to a tar and added libgcj. You may need to edit the spec to make it build on your system, but it works in our build environment: https://www.linuxglobal.com/static/blog/pdftk.spec

-Eric

Secure package versions by distribution for GHOST CVE-2015-0235 (Debian/Ubuntu/CentOS)

January 28, 2015 by ewheeler

glibc Ghost

Yep. Its a bad one. If an attacker can get your host to do a forward name lookup of their choosing, they may be able to execute arbitrary code. Since libc is linked to (almost) all services in Linux, all services are affected. I expect the first attacks to be against the mail service Exim which will affect all WHM/cPanel users—but that is certainly only the tip of the iceburg. We will see aftershocks from this exploit in unexpected ways for some time to come.

Update your packages ASAP. Here’s a quick reference for the versions you should see on common distributions. glibc 2.19 is not affected.

Reboot after the update, libc stays resident until services restart.

CentOS 5 (RHEL/Scientific Linux 5)

yum install glibc nscd:
glibc 2.5-123.el5_11.1
nscd  2.5-123.el5_11.1

CentOS 6 (RHEL/Scientific Linux 6)

yum install glibc nscd:
glibc 2.12-1.149.el6_6.5
nscd  2.12-1.149.el6_6.5

CentOS 7 (RHEL/Scientific Linux 7)

yum install glibc nscd:
glibc 2.17-55.el7_0.5
nscd  2.17-55.el7_0.5

Debian 6 (squeeze)

apt-get update; apt-get install libc6
libc6  2.11.3-4+deb6u4

Debian 7 (wheezy)

apt-get update; apt-get install libc6
libc6  2.13-38+deb7u7

Debian Testing (jessie)

apt-get update; apt-get install libc6
libc6  2.19-13

Ubuntu 10.04 LTS

apt-get update; apt-get install libc6
libc6  2.11.1-0ubuntu7.20

Ubuntu 12.04 LTS

apt-get update; apt-get install libc6
libc6  2.15-0ubuntu10.10

Ubuntu 14.04 LTS

apt-get update; apt-get install libc6
libc6  2.19-0ubuntu6

Netboot CentOS 7 on ATAoE

December 16, 2014 by ewheeler

This writeup covers the process of configuring CentOS 7 to boot seamlessly from an ATAoE target.

This is our setup:

TFTP server with a MAC-specific boot menu presenting the kernel and initrd. See http://www.syslinux.org/wiki/index.php/PXELINUX
Server exporting an LVM Logical Volume with a CentOS 7 install on it using vblade.
Computer to boot from the network (This computer has no hard drive and is using the exported LV as it’s disk).

Required components:

A kernel that supports AoE and your network card (We are using a custom build 3.17.4 kernel. If you are building your own kernel, the driver for AoE is in Device Drivers —> Block Devices and is called ATA over Ethernet Support. The driver for your network card is in Device Drivers —> Network device support —> Ethernet driver support. The driver for your NIC will be in there somewhere under the appropriate manufacturer).
A custom dracut module to bring up the network and discover AoE targets at boot time.
Packages:
- vblade (This is only required on the server exporting the LV).
- Dracut (dracut, dracut-network, dracut-tools)

Required Steps:

1. Install CentOS 7 onto an LV on the boot server. Since this is not in the scope of this tutorial, I will leave this part up to you. There is plenty of documentation out there (we installed it under KVM first, and exported the resulting disk as an AoE target).
2. Once you have it installed, you need to enter the environment that you deployed in step 1 to install a dracut AoE module and generate an initrd.
3. Next, make sure that you have a kernel that supports aoe and your network card. It is important that both of those kernel modules get built into your initrd or this will not work.
4. At this point (hopefully) we are in the CentOS 7 environment, whether chrooted, virtual machine, or something else; we can now install packages:

1. yum install dracut-network dracut-tools dracut
2. cd /usr/lib/dracut/modules.d/ && ls

You’ll notice that there are a bunch of folders in here, all starting with two digits. The digits signify the order in which the modules are loaded. To make sure all of the prerequisite modules are loaded, we went with 95aoe for our module directory.

3. mkdir 95aoe && cd 95aoe
4. There are 3 files that we need to create — module-setup.sh, parse-aoe.sh, and aoe-up.sh

1. Let’s start with module-setup.sh, since this is the script that will pull into the initrd all of the pieces we need for AoE to work.

#!/bin/bash
# -*- mode: shell-script; indent-tabs-mode: nil; sh-basic-offset: 4; -*-
# ex: ts=8 sw=4 sts=4 et filetype=sh

check() {
        for i in mknod ip rm bash grep sed awk seq echo mkdir; do
                type -P $i >/dev/null || return 1
        done

        return 0
}

depends() {
        echo network
        return 0
}

installkernel() {
        instmods aoe
}

install() {
        inst_multiple mknod ip rm bash grep sed awk seq echo mkdir

        inst "$moddir/aoe-up.sh" "/sbin/aoe-up"
        inst_hook cmdline 98 "$moddir/parse-aoe.sh"
        dracut_need_initqueue
}

2. Next is parse-aoe.sh. This script will load the modules and queue up our main script to be run by init.

#!/bin/sh

modprobe aoe
udevadm settle --timeout=30
/sbin/initqueue --settled --unique /sbin/aoe-up

3. Finally, we need aoe-up.sh, which is what is actually run by init to bring up the network interfaces and discover the AoE device being exported. It’s worth noting that I am setting the MTU to 9000 on each interface, which won’t necessarily be supported on your system. if you are unsure, remove the line “ip link set dev ${INTERFACES[$i]} mtu 9000”:

#!/bin/bash

PATH=/usr/sbin:/usr/bin:/sbin:/bin

exec >>/run/initramfs/loginit.pipe 2>>/run/initramfs/loginit.pipe

mkdir -p /dev/etherd
rm -f /dev/etherd/discover
mknod /dev/etherd/discover c 152 3

INTERFACES=(`ip link |grep BROADCAST |awk -F ":" '{print $2}' |sed 's/ //g' |sed 's/\n/ /g'`)
TOTAL_INTERFACES=${#INTERFACES[@]}

for i in `seq 0 $(($TOTAL_INTERFACES-1))`; do
    ip link set dev ${INTERFACES[$i]} mtu 9000
    ip link set ${INTERFACES[$i]} up
    wait_for_if_up ${INTERFACES[$i]}
done

ip link show

echo > /dev/etherd/discover

5. Now that we have written our AoE dracut module, we need to rebuild the initrd. Before we do this however, we need to make sure that dracut will pull in the modules we need (aoe and your NIC module). There are a few different ways to do this, but here are two options:

1. modprobe aoe && modprobe NIC_MODULE && dracut –force /boot/initramfs-KERNEL_VER.img KERNEL_VER
2. dracut –force –add-drivers aoe –add-drivers NIC_MODULE /boot/initramfs-KERNEL_VER.img KERNEL_VER

6. Now that we have our new initrd, we need to transfer the kernel and initrd to our tftp server

1. scp /boot/initramfs-KERNEL_VER.img /boot/vmlinuz-KERNEL_VER TFTP_SERVER:/path/to/tftp
2. Make sure that permissions are set correctly (should be chmod 644)

7. Before we shut down our CentOS 7 VM, there is one last bit of configuration we need to adjust. Since we require the network to be active to do anything with our FS, we need to make sure that on shutdown, the network isn’t brought down before the FS is unmounted. In CentOS 6, this was as easy as running `chkconfig –level 0123456 on`. CentOS 7 uses systemd, and so this solution will not work. Fortunately the solution is simple (and probably works in CentOS 6 too): Modify /etc/fstab, adding the option _netdev to each mount point that is being exported with AoE (e.g. change defaults to defaults,_netdev).
8. Now we need to get vblade (http://sourceforge.net/projects/aoetools/files/vblade/) and put it on the system exporting the LV with CentOS 7 on it. If you don’t want to install it onto your system, you can just make it and run it from where you extracted it. To export the disk, just run:

1. vbladed -b 1024 -m MAC 0 1 INTERFACE /path/to/disk (for information on what each command does, check out the vblade man page, which is included with the package)

And that’s it! I recommend at this point to run dracut on your netbooted hardware to make sure everything still loads (at this point you shouldn’t have to specifically install the AoE module or your NIC module, so make sure that this is true). Don’t forget to copy the newly created initrd to the TFTP server, and I recommend not overwriting your working initrd so that you can easily go back to a known working state.

What took me days to get working should now only take you an hour or so! I tried to be as detailed as I could, and I don’t think I left anything out, but if you have any issues, please let me know and I’ll update this guide accordingly.

Show the virtual machine name in dstat instead of showing qemu

November 7, 2014 by ewheeler

Do you run dstat to watch Linux KVM hypervisors, but wish process names showed virtual machine names? Me too.

This patch does just that:

--- a/usr/bin/dstat	2009-11-24 01:30:11.000000000 -0800
+++ b/usr/bin/dstat	2014-11-07 10:20:09.719148833 -0800
@@ -1946,6 +1946,12 @@
         return os.path.basename(name)
     return name

+def index_containing_substring(the_list, substring):
+	for i, s in enumerate(the_list):
+		if substring in s:
+			return i
+	return -1
+
 def getnamebypid(pid, name):
     ret = None
     try:
@@ -1956,6 +1962,10 @@
         if ret.startswith('-'):
             ret = basename(cmdline[-2])
             if ret.startswith('-'): raise
+        if any("qemu" in s for s in cmdline):
+            idx = index_containing_substring(cmdline, '-name')
+            if idx >= 0:
+                ret = cmdline[idx+1]
         if not ret: raise
     except:
         ret = basename(name)