FIPS: Failed to start Cryptography Setup

Enabling FIPS mode on CentOS 7 changes the way the kernel initramfs loads crypto modules. If you simply enable FIPS mode by adding fips=1 to the kernel command line, the system will fail to boot with an error message like the following:

[FAILED] Failed to start Cryptography Setup for luks-....

[Screenshot: FIPS LUKS error. Setting fips=1 on the kernel command line causes this error.]

After digging a little bit deeper in the logs, you might find the following:

Libgcrypt error: integrity check using `/lib64/.libgcrypt.so.11.hmac' failed: No such file or directory

This happens because Dracut does not package the .hmac file when it builds the initramfs. To fix it, run yum install dracut-fips-aesni and then rebuild the initramfs with dracut --force. Be sure you are running the latest installed kernel first: by default, Dracut builds the initramfs only for the running kernel, so if a newer kernel is installed, it will be booted on the next restart with an initramfs that still lacks the .hmac file.
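To confirm that the rebuilt initramfs actually contains the HMAC file, you can list its contents with lsinitrd, which ships with dracut. The Python sketch below is a hypothetical helper, not part of dracut: it assumes the ls -l style listing that lsinitrd prints on EL7 and simply looks for an entry ending in .libgcrypt.so.11.hmac, the file named in the error above. With no arguments, lsinitrd inspects the image for the running kernel; run it as root, since the images in /boot are usually readable only by root.

```python
import subprocess

# Assumption: the file libgcrypt verifies in FIPS mode, taken from the EL7 error above.
REQUIRED_HMAC = ".libgcrypt.so.11.hmac"


def initramfs_has_hmac(listing, hmac_name=REQUIRED_HMAC):
    """Return True if any entry in an lsinitrd listing ends with hmac_name.

    Assumes lsinitrd's ls -l style output, where the path is the last field.
    """
    for line in listing.splitlines():
        fields = line.split()
        if fields and fields[-1].endswith(hmac_name):
            return True
    return False


def check_running_initramfs():
    """Run lsinitrd (default: the running kernel's image) and check its listing."""
    result = subprocess.run(
        ["lsinitrd"],
        stdout=subprocess.PIPE,
        universal_newlines=True,
        check=True,
    )
    return initramfs_has_hmac(result.stdout)


if __name__ == "__main__":
    try:
        ok = check_running_initramfs()
    except (OSError, subprocess.CalledProcessError) as exc:
        print("could not run lsinitrd:", exc)
    else:
        print("HMAC present" if ok else "HMAC missing: install dracut-fips and run dracut --force")
```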

If your CPU does not have hardware AES support, you can omit -aesni and install dracut-fips instead. Even without hardware support, however, installing the aesni version should still work, just without the performance boost.
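If you are not sure whether your CPU has hardware AES, the aes flag in /proc/cpuinfo is the usual indicator on x86. This small Python sketch (the function name is our own) reads the first flags line and suggests which of the two packages to install:

```python
def cpu_has_aes(cpuinfo):
    """Return True if the first 'flags' line of /proc/cpuinfo text lists 'aes'."""
    for line in cpuinfo.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            # Compare whole words so that e.g. 'vaes' does not count as 'aes'.
            return "aes" in value.split()
    return False


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            has_aes = cpu_has_aes(f.read())
    except OSError as exc:
        print("could not read /proc/cpuinfo:", exc)
    else:
        print("dracut-fips-aesni" if has_aes else "dracut-fips")
```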

After enabling FIPS mode, we discovered on a CentOS 7 install that the drbg kernel module was not loaded, which prevents LUKS volumes formatted with aes-xts-plain64 (and possibly other ciphers) from being activated by cryptsetup. This problem is evident if you see the error “error allocating crypto tfm” at boot time or in the Dracut journal. To fix it, add the following to your kernel command line: rd.driver.post=drbg
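To check whether a DRBG is registered once the system is up, you can look in /proc/crypto, where the kernel lists each registered algorithm with name, driver, and module fields. The sketch below assumes the upstream naming, in which DRBG driver names start with drbg_ (for example drbg_nopr_hmac_sha256), so it catches drbg whether it was loaded as a module or built into the kernel.

```python
def drbg_registered(proc_crypto):
    """Return True if /proc/crypto text lists a driver whose name starts with 'drbg_'.

    Assumes the upstream DRBG driver naming (drbg_pr_* / drbg_nopr_*).
    """
    for line in proc_crypto.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "driver" and value.strip().startswith("drbg_"):
            return True
    return False


if __name__ == "__main__":
    try:
        with open("/proc/crypto") as f:
            registered = drbg_registered(f.read())
    except OSError as exc:
        print("could not read /proc/crypto:", exc)
    else:
        print("drbg registered" if registered
              else "drbg missing: consider rd.driver.post=drbg on the kernel command line")
```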

Finally, it is common for /boot to live on a separate partition. If this is the case, Dracut in FIPS mode will require the .hmac file for vmlinuz and may give an error like the following: /boot/.vmlinuz-3.10.0-693.21.1.el7.x86_64.hmac does not exist. To fix this, specify the boot partition with one of these options so that Dracut mounts it before validation:

boot=<boot device>
Specify the device where /boot is located, e.g.:
boot=/dev/sda1
boot=/dev/disk/by-path/pci-0000:00:1f.1-scsi-0:0:1:0-part1
boot=UUID=<uuid>
boot=LABEL=<label>
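Before adding boot=, you can check whether /boot is actually a separate mount point and whether the kernel’s HMAC file is where FIPS validation looks for it. This Python sketch assumes the /boot/.vmlinuz-<release>.hmac naming shown in the error above, where <release> is the output of uname -r (platform.release() in Python); the function names are hypothetical.

```python
import os
import platform


def vmlinuz_hmac_path(release, boot_dir="/boot"):
    """Expected HMAC path for a kernel release (assumes the naming in the error above)."""
    return os.path.join(boot_dir, ".vmlinuz-%s.hmac" % release)


def boot_is_separate(boot_dir="/boot"):
    """True if /boot is its own mount point, which is when boot= is needed."""
    return os.path.ismount(boot_dir)


if __name__ == "__main__":
    path = vmlinuz_hmac_path(platform.release())
    print("HMAC file:", path, "(present)" if os.path.exists(path) else "(missing)")
    print("/boot is a separate mount:", boot_is_separate())
```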

Red Hat’s Bugzilla has more details about this problem here: https://bugzilla.redhat.com/show_bug.cgi?id=1014527#c7

I hope this helps! When you are done, you will have the following added to your kernel command line:

fips=1 rd.driver.post=drbg boot=/dev/sdaX

You will of course need to specify the correct boot volume for your machine.
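If you manage several machines, a small helper can merge these options into an existing command line without duplicating them. This is a hypothetical Python sketch: it replaces any existing fips= and boot= values, appends rd.driver.post=drbg once, and keeps all other arguments in order. Applying the result is left to your usual tooling; on EL7, for example, grubby --update-kernel=ALL --args="..." adds arguments to the installed kernels.

```python
def add_fips_args(cmdline, boot_device):
    """Return cmdline with fips=1, rd.driver.post=drbg and boot=<boot_device> set.

    Existing fips= and boot= arguments are replaced; other arguments are kept in order.
    """
    replaced = {"fips", "boot"}
    args = [a for a in cmdline.split() if a.split("=", 1)[0] not in replaced]
    for arg in ("fips=1", "rd.driver.post=drbg", "boot=" + boot_device):
        if arg not in args:
            args.append(arg)
    return " ".join(args)


if __name__ == "__main__":
    # /dev/sda1 is only an example; use the device that holds /boot on your machine.
    print(add_fips_args("rhgb quiet", "/dev/sda1"))
    # rhgb quiet fips=1 rd.driver.post=drbg boot=/dev/sda1
```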

-Eric

Choosing Linux Server Hardware

Hardware Options

Let’s assume that you are trying to decide between two different servers:

  • Server 1: 8 cores / 16 threads at 2.1GHz
  • Server 2: 4 cores / 8 threads at 3.8GHz

Considering Your Options

Choosing a server will depend on your workload. If you know you will be running lots of simultaneous connections and that each individual connection does not have a low latency requirement, then the first server would work best because it can handle more parallel processing. On the other hand, if you have fewer concurrent connections and each connection must complete quickly, then the second server is better.

We tend to prefer higher clock speed over higher core count because individual operations complete faster. For example, PHP pages will load almost 2x faster on Server 2 (3.8GHz / 2.1GHz ≈ 1.8) as long as its processors are not saturated. If they do saturate, to somewhere between 150% and 180% of load, then Server 2 will run at about the same speed as Server 1, based on the clock speeds and a rough 10% estimate of context-switch overhead.
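The estimate above can be written out as a back-of-the-envelope model. This Python sketch is our own simplification, not a benchmark: it assumes work scales linearly with clock speed, that Server 1 (with twice as many threads) is not yet saturated at the same load, and that an oversubscribed CPU splits its time evenly among runnable tasks with a flat 10% context-switch penalty.

```python
def relative_speed(fast_ghz, slow_ghz, load, overhead=0.10):
    """Per-request speed of the faster-clocked server relative to the slower one.

    load is the faster server's CPU demand as a fraction of its capacity
    (1.5 means 150%). Assumes the slower, higher-core server is not saturated.
    """
    ratio = fast_ghz / slow_ghz
    if load <= 1.0:
        return ratio  # unsaturated: clock speed alone decides
    # Oversubscribed: each task gets 1/load of a CPU, minus switching overhead.
    return ratio / (load * (1 + overhead))


if __name__ == "__main__":
    for load in (1.0, 1.5, 1.8):
        print("load %3.0f%%: Server 2 is %.2fx Server 1" % (load * 100, relative_speed(3.8, 2.1, load)))
```

Under these assumptions the ratio falls from about 1.81 when unsaturated to about 1.10 at 150% load and 0.91 at 180%, which is the "about the same speed" range described above.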

We always try to build with redundancy in mind for long-term stability, so these are some considerations when thinking about your deployment:

  • You might get two identical servers. We could then configure them to be redundant so that either server can take over if one fails.
  • You could get your own routable network block so you can have a DMZ separate from your provider’s shared public network. This will reduce your clients’ security exposure to man-in-the-middle attacks.
  • If you get a routable network, then firewalls can be configured as separate virtual instances on the server(s) that will automatically recover if one of them has a problem.

Choosing the right hardware is important whether you are moving an existing server or deploying a new one, so please let us know if we can help you with your server planning.

-Eric

Linux Keeps Amazing Uptimes

I’ve seen lots of great uptimes, but I don’t remember seeing one over 1000 days. More than three years and counting. Awesome:

~]# w
 22:53:44 up 1129 days, 6:22, 2 users, load average: 0.52, 0.73, 0.57
USER TTY FROM LOGIN@ IDLE JCPU PCPU WHAT
root tty1 - 21Jan15 1129days 0.27s 0.27s -bash
root pts/0 support.ewheeler 22:53 0.00s 0.01s 0.00s w

 ~]# uname -a
Linux progress-quest.localdomain 2.6.32-431.29.2.el6.x86_64 #1 SMP Tue Sep 9 21:36:05 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux

~]# ifconfig 
eth0 Link encap:Ethernet HWaddr 00:0C:29:00:00:01
 inet addr:10.11.11.11 Bcast:10.11.255.255 Mask:255.255.0.0
 UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
 RX packets:534204074 errors:0 dropped:0 overruns:0 frame:0
 TX packets:589891379 errors:0 dropped:0 overruns:0 carrier:0
 collisions:0 txqueuelen:1000 
 RX bytes:115062111449 (107.1 GiB) TX bytes:111461390980 (103.8 GiB)

lo Link encap:Local Loopback 
 inet addr:127.0.0.1 Mask:255.0.0.0
 inet6 addr: ::1/128 Scope:Host
 UP LOOPBACK RUNNING MTU:16436 Metric:1
 RX packets:18149132 errors:0 dropped:0 overruns:0 frame:0
 TX packets:18149132 errors:0 dropped:0 overruns:0 carrier:0
 collisions:0 txqueuelen:0 
 RX bytes:2011869682 (1.8 GiB) TX bytes:2011869682 (1.8 GiB)
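The uptime that w reports comes from /proc/uptime, whose first field is the number of seconds since boot. If you want to check your own record, here is a short Python sketch (function name our own) that formats that value in the days, hours:minutes form shown above:

```python
def format_uptime(seconds):
    """Format seconds since boot as 'D days, H:MM', as in the w output above."""
    total_minutes = int(seconds) // 60
    days, minutes_today = divmod(total_minutes, 24 * 60)
    hours, minutes = divmod(minutes_today, 60)
    return "%d days, %d:%02d" % (days, hours, minutes)


if __name__ == "__main__":
    try:
        with open("/proc/uptime") as f:
            seconds = float(f.read().split()[0])
    except OSError as exc:
        print("could not read /proc/uptime:", exc)
    else:
        print("up", format_uptime(seconds), "(%.2f years)" % (seconds / 86400 / 365.25))
```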

What are your greatest uptimes?  Please post in the comments!

-Eric

Check Authorize.net TLS 1.2 Support: tlsv1 alert protocol version

TLS v1.0 and v1.1 to be Disabled on February 28th, 2018

As you may be aware, Authorize.net is disabling TLS v1.0 and v1.1 at the end of this month.  More information about the disablement schedule is available here.

You may begin to see errors like the following if you have not already updated your system:

error:1407742E:SSL routines:SSL23_GET_SERVER_HELLO:tlsv1 alert protocol version

We can help you solve this issue as well as provide security hardening or PCI compliance service for your server. Please call or email if we may be of service!

Checking for TLS v1.2 Support

Most modern Linux releases support TLS v1.2; however, it is best to check to avoid a surprise. These tests should work on almost any Linux distribution, including SUSE, Red Hat, CentOS, Debian, Ubuntu, and many others.

PHP

To check your server, you can use this simple PHP script. Make sure you are running this PHP code from the same PHP executable that runs your website. For example, you might have PHP compiled from source and also have it installed as a package. In some cases, one will work and the other will not:

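<?php
// Request the Authorize.net sandbox using this PHP build's cURL/OpenSSL libraries.
$ch = curl_init();

curl_setopt($ch, CURLOPT_URL, 'https://apitest.authorize.net');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);  // return the body instead of printing it

if (($response = curl_exec($ch)) === false) {
    // TLS handshake failures (e.g. "tlsv1 alert protocol version") end up here.
    $error = curl_error($ch);
    print "$error\n";
}
else {
    // Any HTTP status code means the TLS handshake itself succeeded.
    $httpcode = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    print "TLS OK: " . strlen($response) . " bytes received ($httpcode).\n";
}

curl_close($ch);
?>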

Perl

As above, make sure that you are using the same Perl interpreter that your production site is using or you can end up with a false positive/false negative test. If you get output saying “403 – Forbidden: Access is denied” then it is working because TLS connected successfully.

# perl -MLWP::UserAgent -e 'print LWP::UserAgent->new->get("https://apitest.authorize.net")->decoded_content'
Can't connect to apitest.authorize.net:443

LWP::Protocol::https::Socket: SSL connect attempt failed with unknown errorerror:1407742E:SSL routines:SSL23_GET_SERVER_HELLO:tlsv1 alert protocol version at /usr/lib/perl5/vendor_perl/5.10.0/LWP/Protocol/http.pm line 57.

OpenSSL/Generic

To check from the command line without PHP, you can use openssl s_client. The following example shows a failed TLS negotiation:

# openssl s_client -connect apitest.authorize.net:443
CONNECTED(00000003)
30371:error:1407742E:SSL routines:SSL23_GET_SERVER_HELLO:tlsv1 alert protocol version:s23_clnt.c:605

Other Languages

Whatever language you use, we can help verify that your application is set up to work correctly. Just let us know and we can work with you directly. I hope this post helps; please comment below!
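For Python applications, a comparable check can use the standard ssl module. This is a sketch, assuming Python 3.7 or later (for ssl.TLSVersion and SSLContext.minimum_version); as with PHP and Perl, run it with the same interpreter your application uses, since the result depends on the OpenSSL library that interpreter is linked against.

```python
import socket
import ssl


def negotiated_tls_version(host, port=443, timeout=10.0):
    """Connect to host:port requiring TLS 1.2 or newer; return the protocol used.

    Assumes Python 3.7+ for ssl.TLSVersion.
    """
    context = ssl.create_default_context()
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.version()  # e.g. 'TLSv1.2'


if __name__ == "__main__":
    print("Linked against", ssl.OPENSSL_VERSION)
    try:
        print("TLS OK:", negotiated_tls_version("apitest.authorize.net"))
    except OSError as exc:  # ssl.SSLError and socket/DNS errors subclass OSError
        print("TLS check failed:", exc)
```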

-Eric

Spectre / Meltdown: Security Updates by Distribution

Secured Kernel Versions by Distribution and Version

CentOS/RHEL/Scientific Linux

  • Version 7.x: 3.10.0-693.11.6
  • Version 6.x: 2.6.32-696.18.7
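To compare a running CentOS/RHEL/Scientific Linux kernel (uname -r) against these versions, you can compare the numeric components in order. The Python sketch below is a simplification of RPM’s version comparison that only looks at the leading numeric fields; it is adequate for release strings like 3.10.0-693.11.6.el7.x86_64, but it is not a general rpmvercmp replacement.

```python
import re

# Patched kernels from the list above, keyed by the distribution tag in the release.
FIXED_EL = {"el7": "3.10.0-693.11.6", "el6": "2.6.32-696.18.7"}


def numeric_parts(release):
    """Leading numeric fields of a kernel release, e.g. (3, 10, 0, 693, 11, 6)."""
    parts = []
    for piece in re.split(r"[.-]", release):
        if not piece.isdigit():
            break  # stop at the first non-numeric field, such as 'el7'
        parts.append(int(piece))
    return tuple(parts)


def el_kernel_patched(release):
    """True if an EL6/EL7 kernel release is at or above the patched version.

    Simplified comparison: numeric fields only, not full RPM version rules.
    """
    for tag, fixed in FIXED_EL.items():
        if "." + tag in release:
            return numeric_parts(release) >= numeric_parts(fixed)
    raise ValueError("not an EL6/EL7 kernel release: %r" % release)
```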

Debian

  • Sid: 4.14.12-2
  • Stretch: 4.9.65-3+deb9u2
  • Jessie: 3.16.51-3+deb8u1
  • Wheezy: 3.2.96-3

Ubuntu

Testing kernels are available here: ppa:canonical-kernel-team/pti

  • Trusty 14.04: 4.4.0-108.131~14.04.1 (linux-lts-xenial)
  • Xenial 16.04: 4.4.0-108.131
  • Artful 17.10: 4.13.0-24.28

SUSE

Arch Linux

  • 4.14.11-1
  • 4.9.75-1

Vanilla from Kernel.org

These are per stable branch release. For example, 4.9.75 is protected, but that doesn’t mean 4.14.0 is. The last number (‘y’ in 4.x.y) is the one that matters here:

  • >= 4.15-rc6
  • >= 4.14.11
  • >= 4.9.75
  • >= 4.4.110
  • >= 3.2.98
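For vanilla kernels the rule is per stable branch, so a check has to look up the branch first. This Python sketch encodes the list above and returns None (unknown) for branches the list does not cover rather than guessing. As an assumption beyond the list, it treats mainline releases newer than 4.15 as patched, since they include the 4.15 fixes. It does not understand distribution version strings such as Ubuntu’s 4.4.0-108.

```python
import re

# Minimum patched "y" for each x.y stable branch listed above.
FIXED_STABLE = {(4, 14): 11, (4, 9): 75, (4, 4): 110, (3, 2): 98}


def vanilla_patched(release):
    """True/False for branches covered above, None for branches not listed."""
    m = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?(?:-rc(\d+))?", release)
    if m is None:
        raise ValueError("unrecognized kernel release: %r" % release)
    branch = (int(m.group(1)), int(m.group(2)))
    patch = int(m.group(3) or 0)
    rc = m.group(4)
    if branch == (4, 15):
        # 4.15-rc6 and later release candidates, plus final 4.15 releases.
        return rc is None or int(rc) >= 6
    if branch > (4, 15):
        return True  # assumption: later mainline releases include the 4.15 fixes
    if branch in FIXED_STABLE:
        return patch >= FIXED_STABLE[branch]
    return None
```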

Meltdown BUG: What about KVM/Xen/Docker/OpenVZ/LXC/PV-Xen/HyperV?

Different Variants: Meltdown and Spectre

This article discusses only Meltdown and its effect on hypervisor environments, since it is the easier of the two to exploit. Note that Spectre is capable of leaking hypervisor memory from all hypervisors running on affected processors (Intel and possibly AMD and ARM), but it is more difficult both to exploit and to mitigate. Please read on to understand how Meltdown affects your virtualization stack:

How Meltdown Affects Virtualized Environments

Every hosting provider held their breath over the past week, wondering whether the as-yet undisclosed Intel hardware bug, now released as “Meltdown”, would affect their virtualization stack. They all want to know: is this a hypervisor escalation?! In this post, we use the word “affected” to mean guest-to-hypervisor memory read access.

The Meltdown bug enables reading memory from the address space represented by the same page table; anyone using separate virtual page tables is unaffected across those tables. That is, guest-to-host access is unaffected; only guest-to-guest (within the same guest) and host-to-host access are affected, plus of course host-to-guest, since the host can already access the guest pages.

For a hosting provider, this means different customer VMs on the same fully virtualized hypervisor cannot access each other’s data, but different users on the same guest instance can access each other’s data. This latter part holds true for non-virtualized hardware as well: users under the same OS kernel can access each other’s data. Thus, containers are affected!

Which Technologies Are Affected?

Fully virtualized technologies are not affected in the sense that guests cannot access host (hypervisor) memory.  However, an unprivileged guest process can still access privileged (and other unprivileged) guest process memory pages.  Container-based technologies are affected by Meltdown across container boundaries.

Affected Virtualization Technologies

Anything container-based: neighboring containers can read each other’s process memory.

  • Docker
  • LXC
  • OpenVZ
  • UML
  • Paravirtual Xen
  • Chroot Jails

Unaffected Virtualization Technologies

Any fully virtualized technology is unaffected.

  • KVM
  • Xen HVM
  • HyperV
  • VirtualBox (if using VT)

Solutions

  1. Update your distribution kernel if your OS distribution has released an update for CVE-2017-5715, CVE-2017-5753, and CVE-2017-5754. See this post for updated distribution kernel versions that address these CVEs: https://www.linuxglobal.com/spectre-meltdown-security-updates-distribution/
  2. If you cannot do #1, your best option is to install Linux 4.15-rc6 or one of the patched vanilla stable kernels listed in the post linked above, on all systems. Yes, 4.15-rc6 is a release candidate, but this kernel is receiving widespread testing because of this bug.
  3. If this is not an option and you mostly trust the code running inside the containers, you could run your container instances under KVM to isolate them from each other, protecting your guests and privileged containers.
  4. If running Xen PV, switch everything to Xen HVM and hope for the best. Many operating systems will boot in either environment unless your guest kernel was built specifically for Xen PV, but there could be driver issues between the two.
  5. If you do not trust your users on a single host, then your best option is #1 above.

Remember, the only real fix is to install an updated kernel on all servers, physical or virtual. Solutions 3 and 4 only mitigate the problem, since the guest is still vulnerable to interprocess memory reads.

Help!

We can help! Just give us a call or send an email so we can make a plan and get you running securely once again!

-Eric

Recovering from a Cloud Server Failure

If you have received an email or a support ticket from your provider like the message below, then chances are your data is no longer available. Cloud servers fail for a number of reasons, however, and not all failures cause permanent data loss. We may be able to work with your provider to recover your data, and we would be happy to help if recovery is possible.

Your cloud server could not be recovered due to the failure of its host. You have the option to rebuild your server from your most recent server image or from a stock image … We apologize for any inconvenience this may have caused you.

If there is permanent data loss, then your only option is to plan for the future and move ahead—and we can help with that too.

Backups are certainly useful for recovering data, but they take a long time to put back into service. As you deploy your new environment, consider redundant designs for your infrastructure and let us know if we can help prevent this in the future!

-Eric

Protect Your Server from Ransomware

Protect Your Business Servers from Ransomware!

The fundamental problem exploited by Ransomware is a lack of backups. At Linux Global, we protect our customers from Ransomware in a number of ways:

  • Frequent offsite backups
  • Realtime data replication for databases
  • Frequent server snapshots with offsite image backup

With these systems in place your data is safe, even in the event of your files being encrypted by Ransomware.

We can even protect your Windows server by using Linux to virtualize your infrastructure and maintain hourly snapshots with frequent offsite backups.

Call today and Ransomware-proof your infrastructure!

CentOS/RHEL/Scientific Linux 5 404 Not found

Problems Updating Packages

You may have landed here because of errors like the following:

 ~]# yum upgrade
Loaded plugins: fastestmirror, security
Determining fastest mirrors
 * base: mirrors.kernel.org
 * extras: mirror.chpc.utah.edu
 * updates: mirror.steadfast.net
http://mirrors.kernel.org/centos/5.11/os/x86_64/repodata/repomd.xml: [Errno 14] HTTP Error 404: Not Found
Trying other mirror.

This is because, as of March 31st, all of the EL5 derivatives are end of life. It is time to upgrade to EL6 or EL7.

If you are unable to upgrade immediately but would like to get your OS as up to date as possible, CentOS users can switch their repositories from mirror.centos.org to vault.centos.org in /etc/yum.repos.d/CentOS-*.repo. For Scientific Linux, use the following URL in your repository files: http://ftp.scientificlinux.org/linux/scientific/obsolete/511/$basearch/SL/
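Switching CentOS 5 to the vault usually means commenting out the mirrorlist= lines and enabling the commented-out baseurl= lines with vault.centos.org in place of mirror.centos.org. The Python sketch below performs that text substitution; it assumes the stock CentOS 5 repository layout (#baseurl=http://mirror.centos.org/centos/$releasever/...) and that 5.11, the last EL5 point release, is the vault directory you want. It only prints the rewritten files, so review the output before saving it over anything in /etc/yum.repos.d/.

```python
import sys


def point_at_vault(repo_text, release="5.11"):
    """Rewrite CentOS 5 .repo text to use vault.centos.org/<release>.

    Assumes the stock layout: active mirrorlist= lines, commented #baseurl= lines.
    """
    out = []
    for line in repo_text.splitlines():
        if line.startswith("mirrorlist="):
            line = "#" + line  # stop using the mirror list (mirrors now return 404)
        elif line.startswith("#baseurl="):
            line = line[1:]  # enable the fixed base URL instead
        line = line.replace("mirror.centos.org/centos/$releasever",
                            "vault.centos.org/" + release)
        out.append(line)
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    # Dry run: print each rewritten file named on the command line.
    for path in sys.argv[1:]:
        with open(path) as f:
            print(point_at_vault(f.read()), end="")
```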

We have migrated many CentOS/RHEL/Scientific Linux systems and would be happy to assist if you need help!

-Eric

Issues Upgrading CentOS/RHEL/Scientific Linux 7.2 to 7.3

If you’ve been a systems administrator for a while, then you know it’s best practice to have security updates install automatically, and you also know that this breaks things from time to time. This happened to us when EL 7.3 came out a few months ago and caused unexpected issues with systems running KVM, libvirt, and LVM2 with large numbers of snapshots (4,480 and counting!).

The first issue that we discovered was virtual machine lockup during live migration. This is related to an MSR_TSC_AUX update that Red Hat pushed into 7.3; the kernel changes needed to support it had not yet been merged into the Linux 4.1.y stable branch. While I’ve not yet tested 4.1.39, it appears to have those patches. Most users will not experience this particular bug if they are using the vendor-provided EL7 kernel, but if you are using 4.1 in order to have stable bcache support, then you might run into it. You can read more details on the patches here: https://patchwork.kernel.org/patch/9538171/

Shortly after we discovered the first issue (but before we had time to fix it), we discovered that LUKS passthrough crashes libvirt unless you are using libvirt’s keystore. Since we pass encrypted volumes directly into the virtual machine and let the virtual machine unlock the volume, this was causing endless segmentation faults of libvirtd as systemd restarted it after failure. After much troubleshooting and inspection with GDB to figure out where the problem actually was, we discovered that libvirt was assuming that all LUKS volumes have a key in their keystore. This has been fixed in the latest version, and more information about this is available here: https://bugzilla.redhat.com/show_bug.cgi?id=1411394

Not to be outdone, the 7.2 to 7.3 upgrade was also causing segmentation faults of dmeventd. At the time, we did not know that it was a bug in LVM2, but with a third issue compounding the two above, it was time for more drastic measures: revert the packages! After installing the EL7.2 versions of libvirt, KVM, and LVM2 (and their dependencies), we were back up and running.

Feeling brave, we decided to try the 7.3 upgrade again today since the first two issues were fixed. At the time, we didn’t know that the third issue was independent of the others, so this was our first opportunity to investigate it. This issue is still outstanding, and the actual problem is unclear. We have found the first bad commit in LVM2 (9156c5d dmeventd rework locking code) and posted to the lvm-devel list, so hopefully this will be fixed soon. For the moment we are holding back LVM2 updates, which seems to work fine with the rest of the system packages upgraded to 7.3. You can read more about the beginning of this fix here: https://www.redhat.com/archives/lvm-devel/2017-March/msg00354.html

So is it time to upgrade from 7.2 to 7.3? Yes! But only if you hold back LVM2. The easiest way to do this is to add the following line to the [base] and [updates] sections of /etc/yum.repos.d/CentOS-Base.repo:

exclude=lvm2* device-mapper*
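If you need to apply this on many hosts, the edit can be scripted. This hypothetical Python helper inserts the exclude= line directly under the [base] and [updates] headers, leaves other sections alone, and skips any target section that already has an exclude= line, so running it twice does not duplicate the option (merging with an existing exclude list is left to you).

```python
def add_exclude(repo_text, sections=("base", "updates"), packages="lvm2* device-mapper*"):
    """Insert 'exclude=<packages>' under each target section header in .repo text.

    Target sections that already define exclude= are left unchanged.
    """
    lines = repo_text.splitlines()

    # First pass: find target sections that already define an exclude= option.
    has_exclude, current = set(), None
    for line in lines:
        s = line.strip()
        if s.startswith("[") and s.endswith("]"):
            current = s[1:-1]
        elif current in sections and s.startswith("exclude="):
            has_exclude.add(current)

    # Second pass: copy lines, adding exclude= right after qualifying headers.
    out = []
    for line in lines:
        out.append(line)
        s = line.strip()
        if s.startswith("[") and s.endswith("]"):
            name = s[1:-1]
            if name in sections and name not in has_exclude:
                out.append("exclude=" + packages)
    return "\n".join(out) + "\n"
```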

Update: Tue Apr 4 16:25:39 PDT 2017

The LVM problem was related to the reserved_stack value in /etc/lvm/lvm.conf being too high on our system. This appears to be a regression in LVM2, since the same setting worked in EL7.2.

So, if you get an error like this, shrink your reserved_stack and see if it fixes the problem:

kernel: dmeventd[28383]: segfault at 7f9477240ea8 ip 00007f9473f24617 sp 00007f9477240eb0 error 6 in liblvm2cmd.so.2.02[7f9473e83000+191000]
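To see what reserved_stack is currently set to before shrinking it, you can grep lvm.conf or use a small parser like this Python sketch. It assumes the usual reserved_stack = <number> syntax (the value is in KiB), ignores commented-out lines, and does not attempt to parse the full lvm.conf grammar.

```python
import re

# Assumes the simple "reserved_stack = <integer>" form used in lvm.conf.
RESERVED_STACK = re.compile(r"^\s*reserved_stack\s*=\s*(\d+)\s*$")


def reserved_stack_kib(lvm_conf):
    """Return the first active reserved_stack value (KiB) in lvm.conf text, or None."""
    for line in lvm_conf.splitlines():
        code = line.split("#", 1)[0]  # drop comments
        m = RESERVED_STACK.match(code)
        if m:
            return int(m.group(1))
    return None


if __name__ == "__main__":
    try:
        with open("/etc/lvm/lvm.conf") as f:
            print("reserved_stack =", reserved_stack_kib(f.read()))
    except OSError as exc:
        print("could not read /etc/lvm/lvm.conf:", exc)
```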

-Eric