Docker or KVM: Which is Right for You?

Over the years there have been many different technologies for isolating workloads. Isolation is important for security: if one workload is compromised and the workloads are not isolated from each other, then the others can be affected as well. In today’s ecosystem, there are two predominant forms of workload isolation: containers and virtual machines.

Containers

Containers are similar to a chroot jail in that all of the programs running within the container execute as though they have their own root file system. Linux namespaces allow the container to have its own process ID space, so `init` can be process ID 1. With chroot jails, by contrast, the PID namespace was shared with the host, so a process in the jail could never have process ID 1 because the host OS `init` process was already using it.

Containers share the host’s kernel, and they do not have direct access to hardware resources.
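You can see both properties from the command line. The sketch below uses `unshare` from util-linux to enter a new PID namespace; it assumes root (or a kernel that permits unprivileged user namespaces):

```shell
# Run ps inside new PID and mount namespaces: the forked process sees
# itself as PID 1, just like init inside a container.
sudo unshare --pid --fork --mount-proc ps -o pid,comm

# The kernel itself is still shared with the host; uname inside any
# container reports the host's kernel version.
uname -r
```

The second command is the whole point: no matter how the namespaces are sliced, every container is still talking to the same running kernel.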

 

Virtual Machines

Virtual machines are an emulated hardware environment, provided on Linux by KVM. They boot their own kernel, have their own disks, and attach their own network devices. A user with full control over a virtual machine can install any operating system they wish. Because the hardware is virtualized and each VM runs a separate kernel, virtual machines provide greater isolation than containers. That isolation is assisted by hardware virtualization extensions (such as Intel VT-x and AMD-V) implemented in silicon by the CPU manufacturers, which makes it more difficult to escape a virtual machine environment than a container environment. You might ask: but what about branch prediction attacks, like Spectre?

Branch prediction attacks affect containers and virtual machines alike, so we can exclude them as a consideration when choosing between the two.

Root File System

In practice, the operating systems running within both of these isolation technologies operate from their own root file systems. Traditionally this was a complete distribution installation; however, that has changed in a way that hinders security and increases the difficulty of systems administration. There is a trend of “turn-key” operating system deployments, especially in Docker. If you want a particular application, say a web server running WordPress, then you simply run a few short commands and your WordPress server is up and running. This makes installation easy for the novice user, but there is no guarantee that the Docker environment is up to date.

Further compounding container deployment security is the fact that some containers do not ship a complete root file system, and administrators cannot log in at all. Some would say this is good for security, but this type of monolithic container is still subject to the increasing likelihood of new attack vectors against an aging codebase. If a vulnerability does come along, then the monolithic container can become compromised. Since it can be difficult to log into this kind of container, it is harder to inspect what is happening from within the environment. Even if you can log in, the installation is often so minimal that the tools for inspecting the problem are not available, and the deployment may be so old that, even if the container includes a package manager like Yum or APT, the distribution repositories may have been archived and are no longer available without additional effort.

Container intrusions can often be inspected from the outside using a privileged installation with configurable tooling, but the security issues and increased difficulty of maintenance are a contraindication for today’s containerized counterculture.

Our recommendation is always to install a long-term support release of a well-known distribution in a virtual machine instead of a container. With a full virtual machine you get not only increased isolation, vendor updates, and a better security life cycle, but also richer management tooling: live migration, and full block-device disks that can be cloned, mounted on other systems, or snapshotted with easy rollback.

If you must use containers for your environment, then please use a normal OS distribution, and configure security updates, email notifications, and centralized logging. This will go a long way toward keeping the system maintainable in the future and will save you support costs.
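For example, on an EL7-style system, unattended security updates with mail reports can be sketched as follows (package and option names per yum-cron; the email address is a placeholder):

```shell
# Install and enable the nightly update job (sketch for CentOS/RHEL 7):
yum install -y yum-cron
systemctl enable --now yum-cron

# Then, in /etc/yum/yum-cron.conf, set:
#   update_cmd = security            # only apply security errata
#   apply_updates = yes
#   emit_via = email                 # mail the results
#   email_to = admin@example.com     # placeholder address
```

Debian-family systems have an equivalent in the unattended-upgrades package.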

If you are interested in learning more, then call us for a free consultation, so we can help work out what is best for your organization.

-Eric

RedHat/CentOS/RHEL 7 does not copy mdadm.conf into Dracut

Force MD and LUKS Auto-Detection

There is a bug in Red Hat 7 releases that prevents some systems using md from booting: for some reason, mdadm.conf is not copied into the initrd generated by dracut. The fix recommended on the bug page (https://bugzilla.redhat.com/show_bug.cgi?id=1015204) is to add rd.md.uuid=<UUID>, but that can be a lot of work if you have many volumes, and if you cannot paste the UUID then it is hard to type.

To automatically enable md and luks detection, add “rd.auto=1” to the kernel command line. You can see other command line options in the dracut documentation here: https://www.man7.org/linux/man-pages/man7/dracut.cmdline.7.html
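On EL7 you can apply this to every installed kernel with `grubby` rather than editing the GRUB configuration by hand (a sketch; run as root):

```shell
# Append rd.auto=1 to all installed kernel entries:
grubby --update-kernel=ALL --args="rd.auto=1"

# Regenerate the initrd so md/LUKS auto-detection is active at next boot:
dracut -f
```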

Force Docker to Boot Container into One Program

Specifying the Program that Docker Should Launch at Start Time

Sometimes, if you are working with a container that gets stuck in a boot loop, you need to force it to start a specific application for debugging purposes. We had this problem using the Dropbox container built by janeczku.

It works great most of the time, but during one of the Dropbox updates the container would fail at startup, so it would start and stop rapidly. This is a persistent container that comes up each time the server boots, so we need to start it into something like `sleep 30m` so that we can run `docker exec -it dropbox /bin/bash` and inspect what the problem is.

Modifying the Configuration

In /var/lib/docker/containers/<hash>/config.v2.json you will find a JSON file similar to the one below; note that on disk it is not pretty-printed as we have done for you here. Somewhere in the file you will find the “Entrypoint” setting. Our system came with this set as “Entrypoint” : [ “/root/run” ], but /root/run is the script that is exiting and causing the container to boot loop. For our example we have moved the “Entrypoint” line to the top of the config so you can see it, though yours can be anywhere in the file. Note that the array is the command to run followed by its list of arguments; we set it to /bin/sleep 30m, as you can see below, so that we can log into the container and get a bash prompt.

Note that the Docker service needs to be stopped before modifying the config file. In our case there was only one container so stopping the service was not an issue; if you have multiple containers then you may need to find a way to modify the config so that Docker will accept it without a restart—and if you do find out how to do that please post it in the comments below.
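Putting it together, the whole procedure looks roughly like this (the container name `dropbox` matches our setup; your container hash will differ):

```shell
systemctl stop docker

# Change "Entrypoint" to ["/bin/sleep", "30m"] as described:
vi /var/lib/docker/containers/<hash>/config.v2.json

systemctl start docker

# The container now just sleeps, so we can get a shell and look around:
docker exec -it dropbox /bin/bash
```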

I hope this works for you; we searched all over the place and had trouble finding these configuration details.

{
   "Entrypoint" : [
      "/bin/sleep",
      "30m"
   ],
   "NetworkSettings" : {
      "SecondaryIPAddresses" : null,
      "SandboxID" : "cdb37ab22aecc2d6bbde325c515c034f2aa3a2677f39df75c77712ebe2e9c545",
      "SecondaryIPv6Addresses" : null,
      "SandboxKey" : "/var/run/docker/netns/cdb37ab22aec",
      "LinkLocalIPv6Address" : "",
      "Bridge" : "",
      "HasSwarmEndpoint" : false,
      "Service" : null,
      "Ports" : null,
      "Networks" : {
         "bridge" : {
            "IPAMOperational" : false,
            "EndpointID" : "",
            "GlobalIPv6PrefixLen" : 0,
            "IPPrefixLen" : 0,
            "IPAMConfig" : null,
            "IPAddress" : "",
            "IPv6Gateway" : "",
            "Aliases" : null,
            "Links" : null,
            "NetworkID" : "237c33c976ab7863c1a869b2f19492ecdd4d820763b7033cc50397147b26d324",
            "MacAddress" : "",
            "GlobalIPv6Address" : "",
            "Gateway" : ""
         }
      },
      "HairpinMode" : false,
      "LinkLocalIPv6PrefixLen" : 0,
      "IsAnonymousEndpoint" : false
   },
   "SeccompProfile" : "",
   "RestartCount" : 0,
   "HostsPath" : "/var/lib/docker/containers/18ea58191394f07d288383cc66c5ea58d99c0dac8c9c5007158b9a2378d6b66e/hosts",
   "ExposedPorts" : {
      "17500/tcp" : {}
   },
   "HostnamePath" : "/var/lib/docker/containers/18ea58191394f07d288383cc66c5ea58d99c0dac8c9c5007158b9a2378d6b66e/hostname",
   "MountLabel" : "system_u:object_r:svirt_sandbox_file_t:s0:c57,c401",
   "HasBeenStartedBefore" : true,
   "Labels" : {},
   "NoNewPrivileges" : false,
   "OpenStdin" : false,
   "Hostname" : "18ea58191394",
   "Volumes" : {
      "/dbox/Dropbox" : {},
      "/dbox/.dropbox" : {}
   },
   "MountPoints" : {
      "/dbox/Dropbox" : {
         "Spec" : {
            "Target" : "/dbox/Dropbox/",
            "Type" : "bind",
            "Source" : "/data/Dropbox (GPI)/"
         },
         "RW" : true,
         "Destination" : "/dbox/Dropbox",
         "Type" : "bind",
         "Propagation" : "rprivate",
         "Source" : "/data/Dropbox (GPI)",
         "Driver" : "",
         "Name" : ""
      },
      "/dbox/.dropbox" : {
         "Spec" : {
            "Target" : "/dbox/.dropbox",
            "Type" : "volume"
         },
         "Type" : "volume",
         "RW" : true,
         "Driver" : "local",
         "Destination" : "/dbox/.dropbox",
         "Source" : "",
         "Name" : "eb96b3bab31838496f2ca3ce0b6db476aaba9c16d9b6bc0f7da3d05f1f964120"
      }
   },
   "Env" : [
      "DBOX_UID=501",
      "DBOX_GID=1011",
      "PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin",
      "DEBIAN_FRONTEND=noninteractive"
   ],
   "StdinOnce" : false,
   "ArgsEscaped" : true,
   "ResolvConfPath" : "/var/lib/docker/containers/18ea58191394f07d288383cc66c5ea58d99c0dac8c9c5007158b9a2378d6b66e/resolv.conf",
   "HasBeenManuallyStopped" : false,
   "Driver" : "overlay2",
   "AttachStdout" : false,
   "User" : "root",
   "ProcessLabel" : "system_u:system_r:svirt_lxc_net_t:s0:c57,c401",
   "Cmd" : null,
   "AttachStdin" : false,
   "AttachStderr" : false,
   "SecretReferences" : null,
   "AppArmorProfile" : "",
   "ShmPath" : "/var/lib/docker/containers/18ea58191394f07d288383cc66c5ea58d99c0dac8c9c5007158b9a2378d6b66e/shm",
   "Image" : "sha256:a8964074d4f6eac2dfdbf03200c4c73d571b1ea7ad8fcb8d99b918642de2f8d2",
   "Tty" : false,
   "LogPath" : "",
   "Domainname" : "",
   "WorkingDir" : "/dbox/Dropbox",
   "OnBuild" : null,
   "Name" : "/dropbox"
}

-Eric

Apache Performance Tuning

Understanding Apache Tuning

When Apache gets flooded with connections, sometimes the server will stop responding to requests even though the CPU is not maxed out and disk IO is not a problem. You can think of the limiting settings in Apache like the layers of an onion, ordered as follows:

ServerLimit

This caps the number of Apache connections that can be established simultaneously. If you have a large connection load and there are not enough connection slots, then requests queue, and dynamic content can get starved while the slots are tied up serving static content. Make sure the limit is high enough to service all connections so that static content can be served quickly.

MaxRequestWorkers

This is the maximum number of requests that will be served simultaneously; any requests over this value are queued. It must not exceed ServerLimit, and should probably be set to some multiple of the number of CPU cores on your system. Keep raising this value until your CPU is saturated under load.

FcgidMinProcessesPerClass

If you are using FastCGI (mod_fcgid), then there are two tunables that you need to pay attention to. The first is the minimum number of FastCGI processes to be spawned when Apache starts; these sit ready and waiting to handle new requests. You can set this value as low as 0, but then a new CGI process (typically PHP) will need to be spawned for the first new connection. By having processes waiting, requests can be serviced immediately, which minimizes PHP startup time. On newer versions of PHP that support opcode caching (built in, or using something like XCache), the cache is also already hot in the running process.

FcgidMaxProcessesPerClass

This is the maximum number of FastCGI processes per class. You probably want this to be a multiple of your CPU count, but small enough that you do not run out of memory. If you have plenty of memory and your CPU is not saturated, then raise this limit to handle as many PHP processes as possible.
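A quick back-of-the-envelope for that memory ceiling (the numbers are illustrative assumptions; measure your own PHP workers' resident size with `ps` or `top`):

```shell
avail_mb=8192    # RAM you are willing to dedicate to PHP (assumed)
php_rss_mb=64    # average resident size of one PHP process (assumed)

# Upper bound for FcgidMaxProcessesPerClass before you risk swapping:
echo $(( avail_mb / php_rss_mb ))
```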

MaxConnectionsPerChild

This is the maximum number of connections a child process will handle before being reaped. If child processes have memory leaks, this limits their impact on the server.

KeepAlive

This allows a single connection to persist and handle multiple requests instead of opening a new connection per request.

KeepAliveTimeout

If KeepAlive is enabled, this sets how long (in seconds) an idle connection is kept open. If the timeout is too long, then connections sit open but unused, pushing your server connection count toward ServerLimit. A range of 3-60 seconds is probably reasonable, but test for your configuration. If contention is severe and KeepAlive connections are starving the server, then you can disable KeepAlive.

MaxKeepAliveRequests

If KeepAlive is enabled, this sets the maximum number of requests a single connection can handle before closing.

Timeout

This sets the maximum time (in seconds) Apache will wait for certain I/O events before giving up and failing the request.
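Pulled together, a starting point for a prefork + fcgid configuration might look like the sketch below. All values are illustrative, not recommendations; tune them against your own traffic:

```apache
ServerLimit              512
MaxRequestWorkers        512
MaxConnectionsPerChild   10000

KeepAlive                On
KeepAliveTimeout         5
MaxKeepAliveRequests     100
Timeout                  60

<IfModule mod_fcgid.c>
    FcgidMinProcessesPerClass  4
    FcgidMaxProcessesPerClass  64
</IfModule>
```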

We have tuned many Apache servers, so let us know if you need help!

-Eric

Fixed Versions: Linux SACK Attack – Denial of Service

The recently published CVE-2019-11477 and CVE-2019-11478 attacks enable an attacker with access to a TCP port on your server (most everyone, including those with web or mail servers) to either:

  1. Slow it down severely
  2. Cause a kernel crash

See the NIST publication for more detail:

https://nvd.nist.gov/vuln/detail/CVE-2019-11477

Upstream distributions have released fixes for these as follows. The CVE-2019-11478 vulnerability is an issue as well, but the -11477 issue has higher impact, so we focus on it here. As far as I have seen, the fix for both is in the same package version, so you only need to reference the -11477 articles:

Mitigation

You can mitigate this attack with iptables. If you are using fwtree, our latest release for el6 and el7 includes the mitigation (version 1.0.1-70 or newer). Of course it is best to update your kernel, but this provides a quick fix without rebooting:

# [ -d /etc/fwtree.d ] && yum install -y fwtree && systemctl reload fwtree && iptables-save | grep MITIGATIONS

You can also do it directly with iptables:

# iptables -I INPUT -p tcp --tcp-flags SYN SYN -m tcpmss --mss 1:500 -j DROP
# ip6tables -I INPUT -p tcp --tcp-flags SYN SYN -m tcpmss --mss 1:500 -j DROP

You can also disable TCP selective acks in sysctl:

# Add this to /etc/sysctl.conf
net.ipv4.tcp_sack=0
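To apply the setting immediately without a reboot (run as root):

```shell
# Apply now:
sysctl -w net.ipv4.tcp_sack=0

# Or, after adding the line to /etc/sysctl.conf, reload it:
sysctl -p
```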

Red Hat / CentOS / Scientific Linux

Vendor security article: https://access.redhat.com/security/cve/cve-2019-11477

Fixed Versions

  • el5: not vulnerable (and EOL, so upgrade already!)
  • el6: kernel-2.6.32-754.15.3.el6
  • el7: kernel-3.10.0-957.21.3.el7

Ubuntu

Vendor security article: https://usn.ubuntu.com/4017-1/

Fixed Versions

  • Ubuntu 19.04
    • 5.0.0.1008.8
  • Ubuntu 18.10
    • 4.18.0.22.23
  • Ubuntu 18.04 LTS
    • 4.15.0-52.56
  • Ubuntu 16.04 LTS
    • 4.15.0-52.56~16.04.1
    • 4.4.0-151.178

Debian

Vendor security article: https://security-tracker.debian.org/tracker/CVE-2019-11477

Fixed Versions

  • jessie
    • 3.16.68-2
    • 4.9.168-1+deb9u3~deb8u1
  • stretch
    • 4.9.168-1+deb9u3
  • sid
    • 4.19.37-4

SuSE

Vendor security article: https://www.suse.com/security/cve/CVE-2019-11477/

Fixed Versions

For SuSE, there are too many minor version releases to list them all here. To generalize, if you are running a newer kernel than these then you are probably okay, but double-check the vendor security article for your specific release and use case:

  • Pre SLES-15:
    • 3.12.61-52.154.1
    • 4.4.121-92.114.1
    • 4.4.180-94.97.1
    • 4.12.14-95.19.1
  • SLES 15
    • 4.12.14-150.22.1
  • Leap 15
    • 4.12.14-lp150.12.64.1

Vanilla Upstream Kernel (kernel.org)

Security patch: https://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git/commit/?id=3b4929f65b0d8249f19a50245cd88ed1a2f78cff

Fixed Versions

  • 5.1.11
  • 4.19.52
  • 4.14.122
  • 4.9.182
  • 4.4.182
  • 3.16.69

[FIXED] Libvirtd QEMU / KVM monitor unexpectedly closed – failed to create chardev during live migration or virsh start

Fixing Libvirt/QEMU KVM Permission Errors in RHEL 7/CentOS 7

If you get errors like these while trying to live-migrate a virtual machine or run `virsh start`, then there is a simple fix. If this is a live migration, the fix probably needs to be applied on the destination host, but updating both sides is a good idea.

libvirtd: error : qemuMonitorIORead:610 : Unable to read from monitor: Connection reset by peer
libvirtd: error : qemuProcessReportLogError:1912 : internal error: qemu unexpectedly closed the monitor: qemu-kvm: -chardev pty,id=charserial0: Failed to create chardev
libvirtd: error : qemuMonitorIO:719 : internal error: End of file from qemu monitor

A Simple Fix

Just add this line to /etc/fstab:

devpts     /dev/pts devpts     gid=5,mode=620     0 0

then remount:

mount -o remount,rw /dev/pts
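You can verify that the new options are in effect by checking the kernel's mount table:

```shell
# Confirm devpts is mounted with the gid/mode options from fstab:
grep devpts /proc/mounts
```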

-Eric

Server Security Update Best Practices

Securing the Server with Update Patches

In any environment it is important to keep systems up to date with security patches that fix vulnerabilities. In large deployments with many use cases, you may have application requirements that depend on specific package versions being installed, and upgrading those packages could create undesired side effects. There are three basic ways to manage updates in this situation that balance security patching against usability.

Before you start

In all cases it is a good idea to prioritize based on severity. Vendors typically publish how important the vulnerability is and how broad its exposure may be, and by reviewing the security notes for the update you can decide whether or not the vulnerability affects your implementation.

Always test deployment of the update in an environment intended to replicate your application requirements before applying them to production systems. If there are any problems, solving them in the test environment will make it easier to apply to production and minimize downtime. It is a good idea to schedule downtime or a maintenance window to keep you from surprising end-users with server interruption.

Keep a complete backup of your operating system, or use snapshots, so that you can roll back to an earlier version in case something breaks during an update.
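If your servers are on LVM, a snapshot taken just before the update gives you a cheap rollback point. A sketch, with hypothetical volume group and LV names:

```shell
# Take a 5G copy-on-write snapshot of the root LV before updating:
lvcreate -s -n root-preupdate -L 5G vg01/root

# If the update goes badly, merge the snapshot back into the origin;
# the merge completes when the volume is next activated (e.g. at reboot):
lvconvert --merge vg01/root-preupdate
```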

Update daily and follow the latest release

Updating all packages is the easiest approach. Unfortunately, it can cause problems if specific versions of software are needed, so it could break functionality.

Exclude critical packages from being updated and install all updates

This assumes that you know which packages are critical and should not be updated. By excluding those packages from the update process, the rest of the system can remain up to date. However, excluding a package can break dependency resolution; if that happens, it is likely that no packages will install at all, so watch your update logs to make sure runs are successful, or alert on failure.

Review release information and only install packages that require security updates

This requires additional administrative overhead, but allows you to precisely update the packages that you need to maintain security while keeping all other packages at their current version. Sometimes an updated package requires a dependency; dependency chains can be quite long if the updated package is from a newer minor release of the operating system (for example, applying a version 7.6 patch to a 7.4 system). In these cases, it is sometimes possible to download the .src.rpm package and rebuild it on the earlier release platform (7.4) so that the newer rebuilt package will install cleanly in an older environment.
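The rebuild workflow itself is short. A sketch for an EL system (`yumdownloader` comes from yum-utils; the package name is just an example):

```shell
yum install -y yum-utils rpm-build

yumdownloader --source httpd        # fetch the .src.rpm (example package)
yum-builddep -y httpd               # install its build dependencies
rpmbuild --rebuild httpd-*.src.rpm  # rebuilt RPMs land in ~/rpmbuild/RPMS/
```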


Scan Servers For SSH Password Authentication

Make Sure Password Authentication is Off

Sometimes it is useful to find out which hosts on your network allow SSH password authentication so that you can turn it off. Here is a simple script to facilitate that. Create a file called “iplist” containing each IP that you wish to test, and then run the script below. Optionally, you can set ‘password’ to something you expect to work, and the script will tell you whether it authenticated, or whether the host only asked for a password but failed to authenticate:

#!/bin/bash

# Password to try. Set it to something you expect to work if you also want
# to detect hosts where the login actually succeeds.
password=mypassword

while read HOST; do
	# One-shot askpass helper: prints the password, reports the prompt
	# on stderr, then deletes itself.
	echo "echo '$password'; echo >&2 '$HOST asked for password'; rm \$0" > /dev/shm/askpass
	chmod 755 /dev/shm/askpass
	export SSH_ASKPASS=/dev/shm/askpass
	# ssh only consults SSH_ASKPASS when DISPLAY is set and there is no
	# controlling terminal, hence the dummy DISPLAY and setsid.
	export DISPLAY=x
	setsid ssh -o PreferredAuthentications=password,keyboard-interactive \
		-o StrictHostKeyChecking=no -o ConnectTimeout=3 \
		-o GSSAPIAuthentication=no "$HOST" "echo $HOST logged in" < /dev/null
done < iplist

-Eric

Debug spinning PHP script on a WHM/cPanel Server

Getting A Backtrace

Sometimes PHP will spin at 100% CPU, and it is difficult to figure out why. The output of `strace` is too noisy, and without knowing where in the code the problem lies, you cannot insert your own backtrace. Newer versions of WHM support multiple PHP versions, so make sure you run this against whichever PHP version the site is using. In our case, the site uses php-fpm.

First, install xdebug:

/opt/cpanel/ea-php72/root/usr/bin/pecl install xdebug

After that, follow the instructions here: https://stackoverflow.com/questions/14261821/get-a-stack-trace-of-a-hung-php-script#53056294

Basically you just need to run the following:

gdb --batch --readnever --pid=$pid --command=/tmp/dumpstack.gdbscript
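To find the `$pid` to attach to, look for the worker burning CPU (the `php-fpm` process name is an assumption; match whatever your server runs):

```shell
# Highest-CPU processes first; the spinning worker floats to the top:
ps -eo pid,pcpu,comm --sort=-pcpu | head -n 5

# Or grab the oldest matching php-fpm worker directly (empty if none running):
pid=$(pgrep -o php-fpm || true)
```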

And the content of dumpstack.gdbscript is:

set $xstack = ((long**)&xdebug_globals)[2]
if ($xstack != 0) && ($xstack[0] != 0)
    set $pcurrent = (long*)$xstack[0]
    while $pcurrent
        set $xptr = (long*)$pcurrent[0]
        set $xptr_s = (char**)$xptr
        set $xptr_i = (int*)$xptr
        set $filename = $xptr_s[4]
        set $funcname = $xptr_s[1]
        set $linenum = $xptr_i[10]
        if ($funcname != 0)
            printf "%s@%s:%d\n", $funcname, $filename, $linenum
        else
            printf "global@%s:%d\n", $filename, $linenum
        end
        set $pnext = (long*)$pcurrent[2]
        set $pcurrent = $pnext
    end
else
    printf "no stack"
end

Fix LVM Thin Can’t create snapshot, Failed to suspend vg01/pool0 with queued messages

Fix LVM Thin Snapshot Creation Errors

From time to time you might see errors like the following:

~]# lvcreate -s -n foo-snap data/foo
  Can't create snapshot bar-snap as origin bar is not suspended.
  Failed to suspend vg01/pool0 with queued messages.

You will note that foo and bar have nothing to do with each other, yet the error prevents creating additional thin volumes. While the cause is unknown, the fix is easy. Something caused LVM to queue a create operation that it was unable to complete, so it left this in its metadata:

message1 {
    create = "bar-snap"
}

The Fix

  1. deactivate the thinpool
  2. dump the VG metadata
  3. backup the file
  4. remove the message1 section
  5. restore the metadata.

The Procedure

  • lvchange -an vg01/pool0 # deactivate the thin pool first
  • vgcfgbackup -f /tmp/pool0-current vg01
  • cp /tmp/pool0-current /tmp/pool0-current-orig # backup the file before making changes
  • vim /tmp/pool0-current # remove the message1 section in vg01 -> logical_volumes -> pool0
  • vgcfgrestore -f /tmp/pool0-current vg01 --force
  • lvchange -ay vg01/pool0 # reactivate the thin pool

Hopefully this works for you, and hopefully whatever causes this gets fixed upstream.