RedHat/CentOS/RHEL 7 does not copy mdadm.conf into Dracut

Force MD and LUKS Auto-Detection

There is a bug in Red Hat 7 releases that prevents some systems from booting when md is used: mdadm.conf is not copied into the initrd generated by dracut. The fix suggested on the bug page (https://bugzilla.redhat.com/show_bug.cgi?id=1015204) is to add rd.md.uuid=<UUID>, but that can be a lot of work if you have many volumes. In addition, if you cannot paste the UUID, it is hard to type.

To automatically enable md and luks detection, add “rd.auto=1” to the kernel command line. You can see other command line options in the dracut documentation here: https://www.man7.org/linux/man-pages/man7/dracut.cmdline.7.html
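To make the option persistent, it has to end up in the boot loader configuration. The following is a minimal sketch, assuming the kernel arguments live in the GRUB_CMDLINE_LINUX line of /etc/default/grub (the usual place on EL7); the function name and the sample contents are made up for illustration.

```python
import re

def add_kernel_arg(grub_defaults, arg="rd.auto=1"):
    """Append arg to GRUB_CMDLINE_LINUX in the text of /etc/default/grub."""
    def patch(match):
        args = match.group(2).split()
        if arg not in args:  # leave the line alone if the argument is already there
            args.append(arg)
        return match.group(1) + '"' + " ".join(args) + '"'
    return re.sub(r'^(GRUB_CMDLINE_LINUX=)"([^"]*)"', patch, grub_defaults, flags=re.M)

# Illustrative file contents, not taken from a real system.
sample = 'GRUB_TIMEOUT=5\nGRUB_CMDLINE_LINUX="crashkernel=auto rhgb quiet"\n'
print(add_kernel_arg(sample), end="")
```

After writing the result back, the GRUB configuration still has to be regenerated (on a BIOS-booted EL7 system that is typically `grub2-mkconfig -o /boot/grub2/grub.cfg`) before the change takes effect.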

Force Docker to Boot Container into One Program

Specifying the Program that Docker Should Launch at Start Time

Sometimes, if you are working with a container that gets stuck in a boot loop, you need to force it to start a specific application for debugging purposes. We had this problem using the Dropbox container that was built by janeczku.

It works great most of the time, but after one of the Dropbox updates the container would start and then immediately stop, over and over. This is a persistent container that comes up each time the server boots, so we needed to start it into something like `sleep 30m` so that we could run `docker exec -it dropbox /bin/bash` and inspect it to see what the problem was.

Modifying the Configuration

In /var/lib/docker/containers/<hash>/config.v2.json you will find a JSON file similar to the one below, although the original is not pretty-printed as we have done for you here. Somewhere in the file you will find the "Entrypoint" setting. Our system came with this set as "Entrypoint" : [ "/root/run" ], but /root/run is the script that was exiting and causing the container to boot loop. For our example, we have moved the "Entrypoint" line to the top of our config so you can see it, though yours can be anywhere in the file. Note that the array is the command to run followed by its list of arguments; we set it to /bin/sleep 30m, as you can see below, so that we can log into the container and get a bash prompt.

Note that the Docker service needs to be stopped before modifying the config file. In our case there was only one container so stopping the service was not an issue; if you have multiple containers then you may need to find a way to modify the config so that Docker will accept it without a restart—and if you do find out how to do that please post it in the comments below.

I hope this works for you; we searched all over the place and had trouble finding these configuration details.

{
   "Entrypoint" : [
      "/bin/sleep",
      "30m"
   ],
   "NetworkSettings" : {
      "SecondaryIPAddresses" : null,
      "SandboxID" : "cdb37ab22aecc2d6bbde325c515c034f2aa3a2677f39df75c77712ebe2e9c545",
      "SecondaryIPv6Addresses" : null,
      "SandboxKey" : "/var/run/docker/netns/cdb37ab22aec",
      "LinkLocalIPv6Address" : "",
      "Bridge" : "",
      "HasSwarmEndpoint" : false,
      "Service" : null,
      "Ports" : null,
      "Networks" : {
         "bridge" : {
            "IPAMOperational" : false,
            "EndpointID" : "",
            "GlobalIPv6PrefixLen" : 0,
            "IPPrefixLen" : 0,
            "IPAMConfig" : null,
            "IPAddress" : "",
            "IPv6Gateway" : "",
            "Aliases" : null,
            "Links" : null,
            "NetworkID" : "237c33c976ab7863c1a869b2f19492ecdd4d820763b7033cc50397147b26d324",
            "MacAddress" : "",
            "GlobalIPv6Address" : "",
            "Gateway" : ""
         }
      },
      "HairpinMode" : false,
      "LinkLocalIPv6PrefixLen" : 0,
      "IsAnonymousEndpoint" : false
   },
   "SeccompProfile" : "",
   "RestartCount" : 0,
   "HostsPath" : "/var/lib/docker/containers/18ea58191394f07d288383cc66c5ea58d99c0dac8c9c5007158b9a2378d6b66e/hosts",
   "ExposedPorts" : {
      "17500/tcp" : {}
   },
   "HostnamePath" : "/var/lib/docker/containers/18ea58191394f07d288383cc66c5ea58d99c0dac8c9c5007158b9a2378d6b66e/hostname",
   "MountLabel" : "system_u:object_r:svirt_sandbox_file_t:s0:c57,c401",
   "HasBeenStartedBefore" : true,
   "Labels" : {},
   "NoNewPrivileges" : false,
   "OpenStdin" : false,
   "Hostname" : "18ea58191394",
   "Volumes" : {
      "/dbox/Dropbox" : {},
      "/dbox/.dropbox" : {}
   },
   "MountPoints" : {
      "/dbox/Dropbox" : {
         "Spec" : {
            "Target" : "/dbox/Dropbox/",
            "Type" : "bind",
            "Source" : "/data/Dropbox (GPI)/"
         },
         "RW" : true,
         "Destination" : "/dbox/Dropbox",
         "Type" : "bind",
         "Propagation" : "rprivate",
         "Source" : "/data/Dropbox (GPI)",
         "Driver" : "",
         "Name" : ""
      },
      "/dbox/.dropbox" : {
         "Spec" : {
            "Target" : "/dbox/.dropbox",
            "Type" : "volume"
         },
         "Type" : "volume",
         "RW" : true,
         "Driver" : "local",
         "Destination" : "/dbox/.dropbox",
         "Source" : "",
         "Name" : "eb96b3bab31838496f2ca3ce0b6db476aaba9c16d9b6bc0f7da3d05f1f964120"
      }
   },
   "Env" : [
      "DBOX_UID=501",
      "DBOX_GID=1011",
      "PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin",
      "DEBIAN_FRONTEND=noninteractive"
   ],
   "StdinOnce" : false,
   "ArgsEscaped" : true,
   "ResolvConfPath" : "/var/lib/docker/containers/18ea58191394f07d288383cc66c5ea58d99c0dac8c9c5007158b9a2378d6b66e/resolv.conf",
   "HasBeenManuallyStopped" : false,
   "Driver" : "overlay2",
   "AttachStdout" : false,
   "User" : "root",
   "ProcessLabel" : "system_u:system_r:svirt_lxc_net_t:s0:c57,c401",
   "Cmd" : null,
   "AttachStdin" : false,
   "AttachStderr" : false,
   "SecretReferences" : null,
   "AppArmorProfile" : "",
   "ShmPath" : "/var/lib/docker/containers/18ea58191394f07d288383cc66c5ea58d99c0dac8c9c5007158b9a2378d6b66e/shm",
   "Image" : "sha256:a8964074d4f6eac2dfdbf03200c4c73d571b1ea7ad8fcb8d99b918642de2f8d2",
   "Tty" : false,
   "LogPath" : "",
   "Domainname" : "",
   "WorkingDir" : "/dbox/Dropbox",
   "OnBuild" : null,
   "Name" : "/dropbox"
}
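If you would rather not hand-edit a single-line JSON file, the same change can be scripted. This is only a sketch: `set_entrypoint` is a hypothetical helper, and because the "Entrypoint" key may be nested at a different depth depending on the Docker version, it searches the whole document instead of assuming a location. The sample config is invented.

```python
import json

def set_entrypoint(node, argv):
    """Replace every "Entrypoint" value in a parsed config with argv; return how many were changed."""
    changed = 0
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "Entrypoint":
                node[key] = list(argv)
                changed += 1
            else:
                changed += set_entrypoint(value, argv)
    elif isinstance(node, list):
        for item in node:
            changed += set_entrypoint(item, argv)
    return changed

# Invented, heavily shortened config used only to exercise the helper.
config = json.loads('{"Name": "/dropbox", "Config": {"Entrypoint": ["/root/run"], "Cmd": null}}')
count = set_entrypoint(config, ["/bin/sleep", "30m"])
print(count, json.dumps(config))
```

On a real system you would load config.v2.json with `json.load`, apply the helper, and write it back with `json.dump`, all while the Docker service is stopped and after keeping a copy of the original file.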

-Eric

Debug spinning PHP script on a WHM/cPanel Server

Getting A Backtrace

Sometimes PHP will spin at 100% CPU and it is difficult to figure out why. The `strace` command is too noisy, and without knowing where in the code there is a problem, you cannot insert your own backtrace. The newer version of WHM has support for multiple PHP versions, so make sure you run this for whatever PHP version the site is using. In our case, this is using php-fpm.

First, install xdebug:

/opt/cpanel/ea-php72/root/usr/bin/pecl install xdebug

After that, follow the instructions here: https://stackoverflow.com/questions/14261821/get-a-stack-trace-of-a-hung-php-script#53056294

Basically you just need to run the following:

gdb --batch --readnever --pid=$pid --command=/tmp/dumpstack.gdbscript

And the content of dumpstack.gdbscript is:

# locate xdebug's call stack inside xdebug_globals
set $xstack = ((long**)&xdebug_globals)[2]
if ($xstack !=0) && ($xstack[0]!=0)
    set $pcurrent = (long*)$xstack[0]
    # walk the linked list of stack frames
    while $pcurrent
        set $xptr = (long*)$pcurrent[0]
        set $xptr_s = (char**)$xptr
        set $xptr_i = (int*)$xptr
        set $filename = $xptr_s[4]
        set $funcname = $xptr_s[1]
        set $linenum = $xptr_i[10]
        if ($funcname!=0)
            printf "%s@%s:%d\n", $funcname, $filename, $linenum
        else
            printf "global@%s:%d\n", $filename, $linenum
        end
        set $pnext = (long*)$pcurrent[2]
        set $pcurrent = $pnext
    end
else
    printf "no stack\n"
end
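If you collect several dumps from a spinning process, it can help to turn the output into structured frames so they can be compared. The sketch below only assumes the `function@file:line` format printed by the script above; the `Frame` type, the function name, and the sample output are all made up for illustration.

```python
from typing import List, NamedTuple

class Frame(NamedTuple):
    function: str
    filename: str
    line: int

def parse_frames(gdb_output: str) -> List[Frame]:
    """Pick out the function@file:line lines and ignore anything else gdb prints."""
    frames = []
    for raw in gdb_output.splitlines():
        func, at, location = raw.strip().partition("@")
        filename, colon, lineno = location.rpartition(":")
        if at and colon and lineno.isdigit():
            frames.append(Frame(func, filename, int(lineno)))
    return frames

# Invented sample output; real paths and function names will differ.
sample = (
    "run@/home/user/public_html/lib/loop.php:42\n"
    "global@/home/user/public_html/index.php:7\n"
    "some unrelated gdb message\n"
)
for frame in parse_frames(sample):
    print(frame)
```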

Fix LVM Thin Can’t create snapshot, Failed to suspend vg01/pool0 with queued messages

Fix LVM Thin Snapshot Creation Errors

From time to time you might see errors like the following:

~]# lvcreate -s -n foo-snap data/foo
Can't create snapshot bar-snap as origin bar is not suspended.
Failed to suspend vg01/pool0 with queued messages.

You will note that foo and bar have nothing to do with each other, but the error prevents creating any additional thin volumes. While the cause is unknown, the fix is easy. Something caused LVM to queue the creation of a volume that it was unable to complete, so it left this in its metadata:

message1 {
create = "bar-snap"
}

The Fix

  1. Deactivate the thin pool.
  2. Dump the VG metadata.
  3. Back up the file.
  4. Remove the message1 section.
  5. Restore the metadata.

The Procedure

  • vgcfgbackup -f /tmp/pool0-current vg01
  • cp /tmp/pool0-current /tmp/pool0-current-orig # backup the file before making changes
  • vim /tmp/pool0-current # remove the message1 section in vg01 -> logical_volumes -> pool0
  • vgcfgrestore -f /tmp/pool0-current vg01 --force
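The manual edit in the vim step can also be done with a small filter. This is a sketch under a few assumptions: the queued message sits in a block whose opening line looks like `message1 {`, braces never appear inside quoted values, and the function name and sample metadata are invented. Keep the backup copy from the procedure above regardless.

```python
import re

def strip_messages(metadata: str) -> str:
    """Drop every `messageN { ... }` block from LVM metadata text."""
    kept = []
    depth = 0  # greater than zero while inside a message block
    for line in metadata.splitlines(keepends=True):
        stripped = line.strip()
        if depth == 0 and re.fullmatch(r"message\d+\s*\{", stripped):
            depth = 1
            continue
        if depth > 0:
            # track nesting so the block's closing brace is dropped as well
            depth += stripped.count("{") - stripped.count("}")
            continue
        kept.append(line)
    return "".join(kept)

# Invented, heavily shortened metadata fragment.
sample = """pool0 {
    segment_count = 1
    message1 {
        create = "bar-snap"
    }
}
"""
print(strip_messages(sample), end="")
```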

Hopefully this works for you, and hopefully whatever causes this gets fixed upstream.

LSI Megaraid Storage Manager Does Nothing

Installing Broadcom MSM for LSI Megaraid Cards

On a minimal CentOS install I found that MSM would refuse to load when I ran “/usr/local/MegaRAID\ Storage\ Manager/startupui.sh”. It would just exit without an error. If you cat the script you will notice that the output of java is redirected to /dev/null, which hides useful errors, so remove the redirect! At least then we can see the error.

Since this was a minimal install, I was missing some of the X libraries that MSM wanted.  This fixed it:

yum install libXrender libXtst

-Eric

Redirect Directory Trailing Slash (/) with Restricted Access

Securing Apache and Maintaining Usability

First, you should avoid .htaccess wherever possible and use it only as a last resort. Still, this example holds whether or not you are using .htaccess.

Let’s say you have a directory you wish to secure so that only the index and one file (test.txt) are available. All other content in the directory should be denied. For example:

These links should load:

  • www.example.com/foo
  • www.example.com/foo/
  • www.example.com/foo/test.txt

In addition, the link without the trailing / should redirect to the link with the trailing / (from /foo to /foo/) for ease of access for your users.

These links should give a 403:

  • www.example.com/foo/bar
  • www.example.com/foo/letmein.txt

To accomplish this, you might write a .htaccess as follows:

Apache 2.2

Order allow,deny
<Files ~ "^$|^index\.html$|^test\.txt$">
     Order deny,allow
</Files>

Apache 2.4

Require all denied
<Files ~ "^$|^index\.html$|^test\.txt$">
     Require all granted
</Files>

However, you will run into a problem: The link without a trailing / will not work (www.example.com/foo) because permissions are evaluated before the mod_dir module’s DirectorySlash functionality evaluates whether or not this is a directory. While not intuitive, we also must add the directory as a file name to be allowed as follows:

Apache 2.2

Order allow,deny
<Files ~ "^foo$|^$|^index\.html$|^test\.txt$">
     Order deny,allow
</Files>

Apache 2.4

Require all denied
<Files ~ "^foo$|^$|^index\.html$|^test\.txt$">
     Require all granted
</Files>
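To sanity-check a pattern like this before reloading Apache, you can test it against the names you expect to be allowed or denied. The sketch below assumes that the <Files> pattern is compared against the last path component only, and it uses Python's `re` module as a stand-in for Apache's regex engine, which behaves the same for a pattern this simple. The dots are escaped here so they match literally, and the helper name is made up.

```python
import re

# The pattern from the final configuration, with the dots escaped.
ALLOWED = re.compile(r"^foo$|^$|^index\.html$|^test\.txt$")

def is_allowed(basename: str) -> bool:
    """True if the last path component would fall inside the <Files> block."""
    return ALLOWED.search(basename) is not None

for name in ["foo", "", "index.html", "test.txt", "bar", "letmein.txt"]:
    print(repr(name), "allowed" if is_allowed(name) else "403")
```

The empty string stands for a request for the directory itself with the trailing slash, and "foo" stands for the same request without it.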

Hopefully this will help anyone else dealing with a similar issue because it took us a lot of troubleshooting to pin this down. Here are some search terms you might try to find this post:

  • Apache 403 does not add trailing /
  • Apache does not add trailing slash
  • .htaccess deny all breaks trailing directory slash
  • .htaccess Require all denied breaks trailing directory slash

-Eric

librsync error: “RS_DEFAULT_STRONG_LEN” undeclared

We needed to compile an old version of rdiff-backup on CentOS 7 but got the following error:

 _librsyncmodule.c: In function ‘_librsync_new_sigmaker’:
_librsyncmodule.c:63:17: error: ‘RS_DEFAULT_STRONG_LEN’ undeclared (first use in this function)
 (size_t)RS_DEFAULT_STRONG_LEN);
 ^
_librsyncmodule.c:63:17: note: each undeclared identifier is reported only once for each function it appears in
_librsyncmodule.c:63:9: error: too few arguments to function ‘rs_sig_begin’
 (size_t)RS_DEFAULT_STRONG_LEN);
 ^
In file included from _librsyncmodule.c:25:0:
/usr/include/librsync.h:370:11: note: declared here
 rs_job_t *rs_sig_begin(size_t new_block_len,
 ^
error: command 'gcc' failed with exit status 1

The librsync library changed the calling convention of `rs_sig_begin` (it now takes an additional argument), so if you get an error like that, then a patch like this might help:

]# diff -uw _librsyncmodule.c.ORIG _librsyncmodule.c
--- _librsyncmodule.c.ORIG 2006-11-11 23:32:01.000000000 -0800
+++ _librsyncmodule.c 2018-02-20 11:22:06.529111816 -0800
@@ -59,8 +59,8 @@
 if (sm == NULL) return NULL;
 sm->x_attr = NULL;

- sm->sig_job = rs_sig_begin((size_t)blocklen,
- (size_t)RS_DEFAULT_STRONG_LEN);
+ sm->sig_job = rs_sig_begin((size_t)blocklen, 8,
+ (size_t)RS_MD4_SIG_MAGIC);
 return (PyObject*)sm;
 }

-Eric

Issues Upgrading CentOS/RHEL/Scientific Linux 7.2 to 7.3

If you’ve been a systems administrator for a while, then you know it’s best practice to have security updates install automatically, and you also know that this breaks things from time to time. This happened to us when EL 7.3 came out a few months ago and caused unexpected issues with systems running KVM, libvirt, and LVM2 with large quantities of snapshots (4,480 and counting!).

The first issue that we discovered was virtual machine lockup during live migration. This is related to an MSR_TSC_AUX update that Red Hat pushed into 7.3, but the Linux 4.1.y stable branch had not yet merged the kernel patches needed to support it. While I’ve not yet tested 4.1.39, it appears to have those patches. Most users will not experience this particular bug if they are using the vendor-provided EL7 kernel, but if you are using 4.1 in order to have stable bcache support, then you might run into this. You can read more details on the patches here: https://patchwork.kernel.org/patch/9538171/

Shortly after we discovered the first issue (but before we had time to fix it), we discovered that LUKS passthrough crashes libvirt unless you are using libvirt’s keystore. Since we pass encrypted volumes directly into the virtual machine and let the virtual machine unlock the volume, this was causing endless segmentation faults of libvirtd as systemd restarted it after failure. After much troubleshooting and inspection with GDB to figure out where the problem actually was, we discovered that libvirt was assuming that all LUKS volumes have a key in their keystore. This has been fixed in the latest version, and more information about this is available here: https://bugzilla.redhat.com/show_bug.cgi?id=1411394

Not to be outdone, the 7.2 to 7.3 upgrade was also causing segmentation faults of dmeventd. At the time, we did not know that it was a bug in LVM2—but having a third issue compounded with the two above, it was time for more drastic measures: Revert the packages! After installing the EL7.2 version of libvirt, KVM, and LVM2 (and their dependencies), we were back up and running.

Feeling brave, we decided to try the 7.3 upgrade again today since the first two issues were fixed. At the time, we didn’t really know the third issue was an issue independent of the others, so this was our first opportunity to investigate. This issue is still outstanding, and the actual problem is unclear. We have found the first bad commit (9156c5d dmeventd rework locking code) in LVM2 and posted to the lvm-devel list, so hopefully this will be fixed soon. For the moment we are holding back LVM2 updates which seems to be working fine with the rest of the system packages upgraded to 7.3. You can read more about the beginning of this fix here: https://www.redhat.com/archives/lvm-devel/2017-March/msg00354.html

So is it time to upgrade from 7.2 to 7.3? Yes! But only if you hold back LVM2. The easiest way to do this is to add the following to your /etc/yum.repos.d/CentOS-Base.repo in the [base] and [updates] sections:

exclude=lvm2* device-mapper*
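If you manage more than a handful of hosts, the edit can be scripted. The following is a rough sketch that works on the text of the repo file; the function name and the sample contents are invented, and it only adds the line to a section that does not already contain an exclude entry.

```python
def add_exclude(repo_text, sections=("base", "updates"), value="lvm2* device-mapper*"):
    """Insert an exclude= line directly under each named section header."""
    wanted = {"[%s]" % name for name in sections}
    lines = repo_text.splitlines()
    out = []
    for i, line in enumerate(lines):
        out.append(line)
        if line.strip() in wanted:
            # collect the body of this section, up to the next header
            body = []
            for following in lines[i + 1:]:
                if following.strip().startswith("["):
                    break
                body.append(following)
            if not any(entry.strip().startswith("exclude") for entry in body):
                out.append("exclude=" + value)
    return "\n".join(out) + "\n"

# Invented, shortened repo file.
sample = "[base]\nname=Base\n\n[updates]\nname=Updates\n\n[extras]\nname=Extras\n"
print(add_exclude(sample), end="")
```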

Update: Tue Apr 4 16:25:39 PDT 2017

The LVM problem was related to the reserved_stack value in /etc/lvm/lvm.conf being too high on our system. Somehow a regression was introduced in LVM2, since the same setting certainly worked in EL7.2.

So, if you get an error like this, shrink your reserved_stack and see if it fixes the problem:

kernel: dmeventd[28383]: segfault at 7f9477240ea8 ip 00007f9473f24617 sp 00007f9477240eb0 error 6 in liblvm2cmd.so.2.02[7f9473e83000+191000]
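To see what a host is currently using, you can pull the value out of lvm.conf. This is a minimal sketch that assumes the setting appears as an uncommented `reserved_stack = N` line; the function name and the sample configuration are made up, and no particular value is implied to be the right one.

```python
import re

def reserved_stack(lvm_conf: str):
    """Return the first uncommented reserved_stack value, or None if it is not set."""
    match = re.search(r"^[ \t]*reserved_stack[ \t]*=[ \t]*(\d+)", lvm_conf, flags=re.M)
    return int(match.group(1)) if match else None

# Invented fragment; the commented-out line must be ignored.
sample = "activation {\n\t# reserved_stack = 64\n\treserved_stack = 256\n}\n"
print(reserved_stack(sample))
```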

-Eric

CentOS6 initrd says “already mounted or /sysroot busy”

If you are booting a CentOS 6 system after having migrated its root filesystem to a new volume, you might get the following errors if the /proc or /sys directory is missing from the new root filesystem:

EXT4-fs (dm-0): mounted filesystem with ordered data mode. Opts: 
mount: /dev/mapper/vg0-root already mounted or /sysroot busy
mount: according to mtab, /dev/mapper/vg0-root is already mounted on /sysroot
dracut: Remounting /dev/mapper/vg0-root with -o relatime,ro
EXT4-fs (dm-0): mounted filesystem with ordered data mode. Opts: 
mount: /dev/mapper/vg0-root already mounted or /sysroot busy
mount: according to mtab, /dev/mapper/vg0-root is already mounted on /sysroot
dracut: Remounting /dev/mapper/vg0-root with -o relatime,ro
EXT4-fs (dm-0): mounted filesystem with ordered data mode. Opts: 
dracut Warning: Can't mount root filesystem

To fix this, all you need to do is mount the root filesystem and “mkdir proc/ sys/”. You can even do this from inside of dracut if you add the “rdshell” argument to the end of your kernel command line:

dracut:/# mount -o remount,rw /sysroot
dracut:/# mkdir /sysroot/proc /sysroot/sys
dracut:/# mount -o remount,ro /sysroot
dracut:/# exit
(You may need to reboot the server)
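You can catch this before rebooting by checking the migrated root filesystem while it is still mounted somewhere on the old system. A small sketch, with a made-up function name and a temporary directory standing in for the mounted new root:

```python
import os
import tempfile

def missing_mountpoints(root, names=("proc", "sys")):
    """List the mount point directories that do not exist under root."""
    return [name for name in names if not os.path.isdir(os.path.join(root, name))]

# A throwaway directory plays the role of the new root filesystem.
with tempfile.TemporaryDirectory() as fake_root:
    os.mkdir(os.path.join(fake_root, "proc"))
    missing = missing_mountpoints(fake_root)
    print(missing)
```

Point `root` at wherever the new volume is mounted and create any directory that is reported before you switch over.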

-Eric