Debug spinning PHP script on a WHM/cPanel Server

Getting A Backtrace

Sometimes PHP will spin at 100% CPU and it is difficult to figure out why. The `strace` command is too noisy, and without knowing where in the code there is a problem, you cannot insert your own backtrace. The newer version of WHM has support for multiple PHP versions, so make sure you run this for whatever PHP version the site is using. In our case, this is using php-fpm.

First, install xdebug:

/opt/cpanel/ea-php72/root/usr/bin/pecl install xdebug

After that, follow the instructions here: https://stackoverflow.com/questions/14261821/get-a-stack-trace-of-a-hung-php-script#53056294

Basically you just need to run the following:

gdb --batch --readnever --pid=$pid --command=/tmp/dumpstack.gdbscript

And the content of dumpstack.gdbscript is:

set $xstack = ((long**)&xdebug_globals)[2]
if ($xstack !=0) && ($xstack[0]!=0)
set $pcurrent = (long*)$xstack[0]
while $pcurrent
set $xptr = (long*)$pcurrent[0]
set $xptr_s = (char**)$xptr
set $xptr_i = (int*)$xptr
set $filename = $xptr_s[4]
set $funcname = $xptr_s[1]
set $linenum = $xptr_i[10]
if ($funcname!=0)
printf "%s@%s:%d\\n", $funcname, $filename, $linenum
else
printf "global@%s:%d\\n", $filename, $linenum
end
set $pnext = (long*)$pcurrent[2]
set $pcurrent = $pnext
end
else
printf "no stack"
end

Fix LVM Thin Can’t create snapshot, Failed to suspend vg01/pool0 with queued messages

Fix LVM Thin Snapshot Creation Errors

From time to time you might see errors like the following:

~]# lvcreate -s -n foo-snap data/foo
Can’t create snapshot bar-snap as origin bar is not suspended.
Failed to suspend vg01/pool0 with queued messages.

You will note that foo and bar have nothing to do with each other, but the error message prevents creating additional thin volumes. While the cause is unknown, the fix is easy. Something caused LVM to try to create an LVM that it was unable to complete, so it generates this in its metadata:

message1 {
create = "bar-snap"
}

The Fix

  1. deactivate the thinpool
  2. dump the VG metadata
  3. backup the file
  4. remove the message1 section
  5. restore the metadata.

The Procedure

  • vgcfgbackup -f /tmp/pool0-current vg01
  • cp /tmp/pool0-current /tmp/pool0-current-orig # backup the file before making changes
  • vim /tmp/pool0-current # remove the message1 section in vg01 -> logical_volumes -> pool0
  • vgcfgrestore -f /tmp/pool0-current vg01 –force

Hopefully this works for you, and hopefully whatever causes this gets fixed upstream.

Dropbox Workaround For Pre-glibc 2.19 Support

Make Dropbox Run On CentOS 7

You may have noticed that Dropbox has dropped support for operating systems with glibc earlier than 2.19, and they are also requiring that everyone use ext4. In reality, they just need xattr support, which many filesystems support. I am guessing they just don’t want to deal with possible bugs on other filesystems so they are only certifying for ext4. At this time, they do not appear to be using glibc 2.19 specific functionality, so we can use a LD_PRELOAD to inform Dropbox that it is running under glibc 2.19 and that it is running on an ext4 filesystem.

We have this working and tested at Stanford University’s Physics department for KIPAC and their terabytes of scientific data.

Important: This is a workaround and is not supported by Dropbox. We have no affiliation with Dropbox and it is possible that this fix will stop working if Dropbox starts requiring functionality from glibc 2.19 that is not available in your glibc release. WE ARE NOT RESPONSIBLE FOR ANY DATA LOSS. This is provided WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

Now that you have agreed to the legal disclaimer, you may try it for yourself: https://www.linuxglobal.com/src/unbox/unbox.c

To build, do this:
gcc -shared -fPIC unbox.c -o unbox.so -ldl

You can then test it by running it under LD_PRELOAD:

]# LD_PRELOAD=`pwd`/unbox.so df -Th
Filesystem Type Size Used Avail Use% Mounted on
devtmpfs ext4 2.0G 0 2.0G 0% /dev
tmpfs ext4 2.0G 8.0K 2.0G 1% /dev/shm
tmpfs ext4 2.0G 209M 1.8G 11% /run
tmpfs ext4 2.0G 0 2.0G 0% /sys/fs/cgroup
/dev/mapper/centos-root ext4 66G 63G 966M 99% /
/dev/vda1 ext4 477M 326M 127M 73% /boot
/dev/mapper/centos-var ext4 2.0G 1.4G 532M 72% /var
/dev/mapper/centos-tmp ext4 969M 4.4M 898M 1% /tmp
tmpfs ext4 396M 0 396M 0% /run/user/0

 

Note that all of the filesystems are listed as ext4, and Dropbox will see it this way, too. To run Dropbox, you will need to put the LD_PRELOAD in front of it like so:

LD_PRELOAD=/path/to/unbox.so exec /usr/local/dropbox-2.10.0/bin/dropbox "$@"

The `dropbox` file being referenced above is a Python script provided by Dropbox which you can get from here: https://www.dropbox.com/install-linux

Note that because Dropbox now requires xattrs, NFS will not work (even with this work-around) until NFS supports xattrs. Conceivably, one could extend Unbox to implement xattrs in userspace using something like sqlite and linking them by inode. If anyone is interested in doing this, contact us and we may be able to help you develop that.

I hope this helps, feel free to forward this link to others that may benefit!

-Eric

LSI Megaraid Storage Manager Does Nothing

Installing Broadcom MSM for LSI Megaraid Cards

On a minimal CentOS install I found that MSM would refuse to load when I ran “/usr/local/MegaRAID\ Storage\ Manager/startupui.sh”.  It would just exit without an error.  If you cat the script you will notice java running into /dev/null, thus hiding useful errors—so remove the redirect!  At least then we can see the error.

Since this was a minimal install, I was missing some of the X libraries that MSM wanted.  This fixed it:

yum install libXrender libXtst

-Eric

 

Redirect Directory Trailing Slash (/) with Restricted Access

Securing Apache and Maintaining Usability

First, you should always avoid .htaccess and use it as a last resort. Still, this example holds whether or not you are using .htaccess.

Let’s say you have a directory you wish to secure so that only the index and some file (test.txt) is available. Other other content in the directory should be denied. For example:

These links should load:

  • www.example.com/foo
  • www.example.com/foo/
  • www.example.com/foo/test.txt

In addition, the link without the trailing / should redirect to the link with the trailing / (from /foo to /foo/) for ease of access for your users.

These links should give a 403:

  • www.example.com/foo/bar
  • www.example.com/foo/letmein.txt

To accomplish this, you might write a .htaccess as follows:

Apache 2.2

Order allow,deny
<Files ~ ^$|^index.html$|^test.txt$>
     Order deny,allow
</Files>

Apache 2.4

Require all denied
<Files ~ ^$|^index.html$|^test.txt$>
     Require all granted
</Files>

However, you will run into a problem: The link without a trailing / will not work (www.example.com/foo) because permissions are evaluated before the mod_dir module’s DirectorySlash functionality evaluates whether or not this is a directory. While not intuitive, we also must add the directory as a file name to be allowed as follows:

Apache 2.2

Order allow,deny
<Files ~ ^foo$|^$|^index.html$|^test.txt$>
     Order deny,allow
</Files>

Apache 2.4

Require all denied
<Files ~ ^foo$|^$|^index.html$|^test.txt$>
     Require all granted
</Files>

Hopefully this will help anyone else dealing with a similar issue because it took us a lot of troubleshooting to pin this down. Here are some search terms you might try to find this post:

  • Apache 403 does not add trailing /
  • Apache does not add trailing slash
  • .htaccess deny all breaks trailing directory slash
  • .htaccess Require all denied breaks trailing directory slash

-Eric

 

FIPS: Failed to start Cryptography Setup

Enabling FIPS mode for CentOS 7 changes the way that the kernel initramfs loads crypto modules. If you simply enable FIPS mode with fips=1 on the kernel command line, then it will fail to boot with an error message like the following:

[FAILED] Failed to start Cryptography Setup for luks-....

FIPS LUKS Error
Settings fips=1 on the kernel commandline causes this error.

After digging a little bit deeper in the logs, you might find the following:

Libgcrypt error: integrity check using `/lib64/.libgcrypt.so.11.hmac' failed: No such file or directory

This is because Dracut is not packaging the .hmac file when it builds the initramfs, so you have to yum install dracut-fips-aesni and then rebuild the initramfs with dracut --force . Be sure you are running the latest kernel version, because by default Dracut will build the initramfs for the kernel that you are running, so if there is a new version available, then it will load if you reboot without the .hmac file.

If you do not have hardware AES support, then you can omit -aesni and install dracut-fips. Even if you do not have hardware support, however, installing the aesni version should still work, but without the performance boost.

Once enabling FIPS mode, we discovered on a CentOS 7 install that the drbg kernel module was not loaded which prevents aes-xts-plain64 formatted LUKS volumes (and possibly others) from being activated by cryptsetup. To fix this, add the following to your kernel commandline: rd.driver.post=drbg .  This problem is evident if you see the error error allocating crypto tfm at boot time or in the Dracut journal.

Finally, it is common to mount the boot partition on a different volume. If this is the case, then Dracut in FIPS mode will require the .hmac for vmlinuz and may give an error like the following: /boot/.vmlinuz-3.10.0-693.21.1-el7.x86_64.hmac does not exist . To fix this, specify the boot partition so that Dracut will mount it before validation with one of these options:

boot=<boot device>
specify the device, where /boot is located. e.g.
boot=/dev/sda1
boot=/dev/disk/by-path/pci-0000:00:1f.1-scsi-0:0:1:0-part1
boot=UUID=<uuid>
boot=LABEL=<label>

Red Hat has documentation about this problem here: https://bugzilla.redhat.com/show_bug.cgi?id=1014527#c7

I hope this helps, when you are done you will have the following added to your kernel commandline:

fips=1 rd.driver.post=drbg boot=/dev/sdaX

You will of course need to specify the correct boot volume for your machine.

-Eric

Choosing Linux Server Hardware

Hardware Options

Let’s assume that you are trying to decide between two different servers:

  • Server 1: 8 cores / 16 threads at 2.1GHz
  • Server 2: 4 cores / 8 threads at 3.8GHz

Considering Your Options

Choosing a server will depend on your workload. If you know you will be running lots of simultaneous connections and that each individual connection does not have a low latency requirement, then the first server would work best because it can handle more parallel processing. On the other hand, if you have fewer concurrent connections and each connection must complete quickly, then the second server is better.

We tend to prefer higher clock speed to higher core count because operations complete faster. For example, PHP pages will load almost 2x faster if the processors are not saturated. If they do saturate to somewhere between 150% to 180% of load, then it will run at about the same speed as server 1 based on the clock speeds and a 10% guess on context switch overhead.

We always try to build with redundancy in mind for long-term stability, so these are some considerations when thinking about your deployment:

  • You might get two identical servers. We could then configure them to be redundant so that either server can take over if one fails.
  • You could get your own routable network block so you can have a DMZ separate from your provider’s shared public network. This will reduce your clients’ security exposure to man-in-the-middle attacks.
  • If you get a routable network, then firewalls can be configured as separate virtual instances on the server(s) that will automatically recover if one of them has a problem.

 

Choosing the right hardware is important whether you are moving an existing server or deploying a new one, so please let us know if we can help you with your server planning.

-Eric

Linux Keeps Amazing Uptimes

I’ve seen lots of great uptimes, but I don’t remember seeing one over 1000 days.  More than 3 years and counting, awesome:

~]# w
 22:53:44 up 1129 days, 6:22, 2 users, load average: 0.52, 0.73, 0.57
USER TTY FROM LOGIN@ IDLE JCPU PCPU WHAT
root tty1 - 21Jan15 1129days 0.27s 0.27s -bash
root pts/0 support.ewheeler 22:53 0.00s 0.01s 0.00s w

 ~]# uname -a
Linux progress-quest.localdomain 2.6.32-431.29.2.el6.x86_64 #1 SMP Tue Sep 9 21:36:05 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux

~]# ifconfig 
eth0 Link encap:Ethernet HWaddr 00:0C:29:00:00:01
 inet addr:10.11.11.11 Bcast:10.11.255.255 Mask:255.255.0.0
 UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
 RX packets:534204074 errors:0 dropped:0 overruns:0 frame:0
 TX packets:589891379 errors:0 dropped:0 overruns:0 carrier:0
 collisions:0 txqueuelen:1000 
 RX bytes:115062111449 (107.1 GiB) TX bytes:111461390980 (103.8 GiB)

lo Link encap:Local Loopback 
 inet addr:127.0.0.1 Mask:255.0.0.0
 inet6 addr: ::1/128 Scope:Host
 UP LOOPBACK RUNNING MTU:16436 Metric:1
 RX packets:18149132 errors:0 dropped:0 overruns:0 frame:0
 TX packets:18149132 errors:0 dropped:0 overruns:0 carrier:0
 collisions:0 txqueuelen:0 
 RX bytes:2011869682 (1.8 GiB) TX bytes:2011869682 (1.8 GiB)

What are your greatest uptimes?  Please post in the comments!

-Eric

librsync error: “RS_DEFAULT_STRONG_LEN” undeclared

We needed to compile an old version of rdiff-backup on CentOS 7 but got the following error:

 _librsyncmodule.c: In function â_librsync_new_sigmakerâ:
_librsyncmodule.c:63:17: error: âRS_DEFAULT_STRONG_LENâ undeclared (first use in this function)
 (size_t)RS_DEFAULT_STRONG_LEN);
 ^
_librsyncmodule.c:63:17: note: each undeclared identifier is reported only once for each function it appears in
_librsyncmodule.c:63:9: error: too few arguments to function ârs_sig_beginâ
 (size_t)RS_DEFAULT_STRONG_LEN);
 ^
In file included from _librsyncmodule.c:25:0:
/usr/include/librsync.h:370:11: note: declared here
 rs_job_t *rs_sig_begin(size_t new_block_len,
 ^
error: command 'gcc' failed with exit status 1

The librsync library changed the colling convention of `rs_sig_begin` so if you get an error like that, then a patch like this might help:

]# diff -uw _librsyncmodule.c.ORIG _librsyncmodule.c
--- _librsyncmodule.c.ORIG 2006-11-11 23:32:01.000000000 -0800
+++ _librsyncmodule.c 2018-02-20 11:22:06.529111816 -0800
@@ -59,8 +59,8 @@
 if (sm == NULL) return NULL;
 sm->x_attr = NULL;

- sm->sig_job = rs_sig_begin((size_t)blocklen,
- (size_t)RS_DEFAULT_STRONG_LEN);
+ sm->sig_job = rs_sig_begin((size_t)blocklen, 8,
+ (size_t)RS_MD4_SIG_MAGIC);
 return (PyObject*)sm;
 }

-Eric