Quickly fill a disk with random bits (without /dev/urandom)

When an encrypted medium is prepared for use, it is best practice to fill the disk end-to-end with random bits.  If the disk is not prepared with random bits, then an attacker could see which blocks have and have not been written, simply by running a block-by-block statistical analysis:  if the average 1/0 ratio is near 50%, its probably encrypted.  It may be simpler than this for new disks, since they tend to default with all-zero’s.

This is a well-known problem, and many will encourage you to use /dev/urandom to fill the disk.  Unfortuntaly, /dev/urandom is much slower than even rotational disks, let-alone GB/sec RAID on SSD’s:

root@geekdesk:~# dd if=/dev/urandom of=/dev/null bs=1M count=100
100+0 records in
100+0 records out
104857600 bytes (105 MB) copied, 7.24238 s, 14.5 MB/s
root@geekdesk:~#

So how can we fill a block device with random bits, quickly?  The answer might be surprising:  we use /dev/zero—but write to the encrypted device.  Once the encrypted device is full, we erase the LUKS header with /dev/urandom.  The second step is of course slower, but we need only overwrite the first 1MB so it takes a fraction of a second.

Note that the password we are using (below) needn’t be remembered—in fact, you shouldn’t be able to remember it.  Use something long and random for a password, and keep it just long enough to erase the volume.  I use base64 from /dev/urandom for password generation:

# 256 random bits
dd if=/dev/urandom bs=1 count=32 | base64

Now format the volume and map it with luksOpen.  Note that we are not using a filesystem—this is all at block-layer:

root@geekdesk:~# cryptsetup luksFormat /dev/loop3
This will overwrite data on /dev/loop3 irrevocably.
Are you sure? (Type uppercase yes): YES
Enter LUKS passphrase: <random one-time-use password>
Verify passphrase:
root@geekdesk:~# cryptsetup luksOpen /dev/loop3 testdev
Enter passphrase for /dev/loop3: <same password as above>

root@geekdesk:~# dd if=/dev/zero of=/dev/mapper/testdev bs=1M
dd: writing `/dev/mapper/testdev': No space left on device
99+0 records in
98+0 records out
103804928 bytes (104 MB) copied, 1.21521 s, 85.4 MB/s

See, more than 6x faster (the disk is most likely the 85MB/s bottleneck)!  This will save hours (or days) when preparing multi-terabyte volumes.  Now remove the device mapping, and urandom the first 1MB of the underlying device:

# This line is the same as "cryptsetup luksClose testdev"
root@geekdesk:~# dmsetup remove /dev/mapper/testdev
root@geekdesk:~# dd if=/dev/urandom of=/dev/loop3 bs=512 count=2056
2056+0 records in
2056+0 records out
1052672 bytes (1.1 MB) copied, 0.0952705 s, 11.0 MB/s

Note that we overwrote the first 2056 blocks from /dev/urandom.  2056 is the default LUKS payload offset, but you can verify that you’ve overwritten the correct number of blocks using luksDump:

root@geekdesk:~# cryptsetup luksDump /dev/loop3
LUKS header information for /dev/loop3

Version:           1
Cipher name:       aes
Cipher mode:       cbc-essiv:sha256
Hash spec:         sha1
Payload offset:    2056   [...snip...]

Now your volume is prepared with random bits, and you may format it with any cryptographic block-device mechanism you prefer, safe knowing that an attacker cannot tell which blocks are empty, and which are in use (assuming the attacker has a single point-in-time copy of the block device).

I like LUKS since it is based on PKCS#11 and includes features such multiple passphrase slots and passphrase changes (it never reveals the actual device key, your passphrase unlocks the real key), but other volume encryption devices exist—or you might export the volume via iSCSI/ATAoE/FCoE and use a proprietary block-layer encryption mechanism.

If someone can explain an attack against this mechanism, I would be glad to hear about it.  In this example we used AES in CBC mode so we are spreading the IV bits across the entire volume.  Conceivably one could write an AES-CTR mode tool with a random key to do the same thing and this may be a stronger mechanism.  (To my knowledge, the dm-crypt toolchain does not have a CTR mode, nor would you want one for general use).

The method above fails when an attacker can tell the difference between the original AES-CBC wipe with random bits (where all plaintext bits are set to zero)—and the new encryption mechanism with a different key that will be used in production atop of the prepared disk volume.  While there may be an attack for AES-CBC with all-zero’s (though I don’t think there is), AES-CTR mode by its definition would make this method more effective since each block is independent of the next.  One might be able to argue that AES-CBC creates an AES-CTR mode implementation where the counter is a permutation of AES itself.  If this can be proven, then both methods are equally secure.

Either way, this is likely better (and definitely faster) than /dev/urandom for filling a disk, since /dev/urandom is a pseudo-random number generator.  Using /dev/urandom for terabytes of data may begin to develop a pattern once its effective entropy pool is spread too thin.  Even with seed-help from /dev/random, /dev/urandom might run out of steam.

In the end, random bits XORed with random bits still look like random bits when placed next to other random bits—but you’re welcome to debate this.  Yay for crypto!

ok, now back to work 🙂

-Eric

Edit: Mon Sep 17 19:43:22 PDT 2012
Come to think of it, you don’t even need a password at the luksFormat stage.  LUKS generates its own strong random bits for the actual block-cipher key.  The passphrase just unlocks that key.  For the purposes of wiping the disk with random bits, you can use “<enter><enter>” as your passphrase… just make sure you wipe the LUKS header in step 2 from /dev/urandom.

Recovering an overflowed LVM volume configured with –virtualsize

/dev/vg/somevolume: read failed after 0 of 4096 at nnnnn: Input/output error

If you’ve ever seen the above error, this usually means you have run out of disk space on the CoW-volume of a snapshot volume.

…but there is another uses for snapshots, and that is thin provisioning for sparse data use.  If you create an LVM volume using the –virtualsize option, you can provide a logical size that is much larger than the actual underlying volume.  If you exceed the space for such a volume, you will get the same error above—and all data on the volume will be invalidated and inaccessible.

LVM silently uses the ‘zero’ devicemapper target as the underlying volume.  Thus, even though the data is invalidated nothing is lost.  By overlaying the lost data over the top of a zero device, we can resurrect the data.

We have prepared our example file with the following:

lvcreate -L 100m --virtualsize 200m -n virtual_test vg
mkfs.ext4 /dev/vg/virtual_test
 [...]
mount /dev/vg/virtual_test /mnt/tmp/

And now we fill the disk:

dd if=/dev/zero of=/mnt/tmp/overflow-file
dd: writing to `/mnt/tmp/overflow-file': Input/output error

Message from syslogd@backup at Aug 27 15:17:27 ...
 kernel:journal commit I/O error
272729+0 records in
272728+0 records out
139636736 bytes (140 MB) copied
[I had to reboot here.  The kernel still thought
 the filesystem was mounted and I could not continue.
 Obviously we are working near the kernel's limits on
 this CentOS 6.2 2.6.32-based kernel]

Now we have a 200MB volume with 100MB allocated to it, which is now full.  LVM has marked the volume as invalid and the data is no longer available.

First, resize the volume so we have room after resizing.  Otherwise, the first byte written to the volume would, again, invalidate the disk:

lvresize -L +100m /dev/vg/virtual_test
 [errors, possibly, just ignore them]
  Extending logical volume virtual_test to 200.00 MiB
  Logical volume virtual_test successfully resized

Now we edit the -cow file directly with a short perl script.  The 5th byte is the ‘valid’ flag (see http://web.archive.org/web/20200808212331/http://www.redhat.com/archives/linux-lvm/2006-September/msg00132.html) so all we need to is set it to ‘1’:

 perl -e 'open(F, ">>", "/dev/mapper/vg-virtual_test-cow"); seek(F, 4, SEEK_SET); syswrite(F,"\x01",1); close(F);'

Now have lvm re-read the CoW metadata and you’re in business:

lvchange -an /dev/backup/virtual_test
  [ignore errors]
lvchange -ay /dev/backup/virtual_test
  [shouldn't have any errors]
lvs
  LV                    VG       Attr     LSize   Pool Origin               Data% 
  virtual_test          vg   swi-a-s- 200.00m      [virtual_test_vorigin]   33.63

At this point you should probably fsck your filesystem, it may be damaged—or at least nead a journal-replay since it stopped abruptly at the end of its allocated space.  And as you can see, the “overflow” file is there up until the point of filling the disk.

[root@backup mapper]# e2fsck /dev/vg/virtual_test
e2fsck 1.41.12 (17-May-2010)
/dev/vg/virtual_test: recovering journal
/dev/vg/virtual_test: clean, 12/51200 files, 66398/204800 blocks
[root@backup mapper]# mount /dev/vg/virtual_test /mnt/tmp/
[root@backup mapper]# ls -lh /mnt/tmp/
total 54M
drwx------. 2 root root 12K Aug 27 15:16 lost+found
-rw-r--r--. 1 root root 54M Aug 27 15:17 overflow-file

-Eric

Linux and Open Source: Internet Security and Vulnerability Disclosure

Internet attacks and vulnerabilities are increasingly held secret and sold to the highest bidder.  Unfortunately, this encourages developers to hide back doors and seel them on the open (black/grey) market.   This compromises the security of the Internet at large, and our personal security as well.

Open-source software provides the ability for many eyes to publicly vet the security of software, particularly when software patch commits are audited by more than one person.  While open-source software may not solve the problem, the open philosophy provides a community for public code review.  Certainly a closed-source backdoor would be more difficult to detect than an open-source backdoor—though I am sure others may debate my argument.

I encourage you to read Bruce Schneier’s most recent Crypto Gram for further discussion on this topic:

The Vulnerabilities Market and the Future of Security

-Eric

Naming “$@” or “$*” as values in Bash

Ever wanted $1 .. $9 to be more meaningful in a clean one-liner?

echo "$*" | ( 
	read cmd  tun_dev tun_mtu link_mtu ifconfig_local_ip ifconfig_remote_ip rest
	echo "the rest of your $cmd program and its arguments $link_mtu"
	echo "go here..."

)

It would be great if one could just

				echo "$@" | read a b c

but since the pipe forks the built-in shell command ‘read’, the variables $a, $b, $c are set in the sub-shell, not the shell you would want. The parenthesis force a subshell for your operation, and while its not pretty, it works quite well!

Also, thanks to Uwe Waldmann for his great Bash/sh/ksh quoting guide.

Cheers,

-Eric

Block Device Replication with rdiff

I’ve written a few articles on rdiff-backup, and if you need an increment history to go back in time, rdiff-backup is your tool. 

But what if you just want to replicate a large block device over the Internet? Well, then we turn to the utility that inspired rdiff-backup: rdiff

For our example, we wil assume you are using LVM to create device snapshots—but really, this could be any snapshot or SAN flash implementation. I’ve just written it for Linux’s LVM.

  • /dev/remote-vg0/source will be the device we are replicating from
  • /dev/local-vg0/dest will be the device we are replicating to
  • remotehost is the system that hosts /dev/remote-vg0/source
  • This script is being executed on the destination system.
# Define our source and destination
# (Note: spaces in these paths could break the script)
SOURCE=/dev/remote-vg0/source
DEST=/dev/local-vg0/dest
SSHUSER=root@remotehost

# Choose a size large enough for the remote write-activity during
# replication
SOURCE_SNAPSHOT_SIZE=4G

# Must be the same size as $DEST, because rdiff writes in sequential
# order (it thinks the destination is an empty file, so it re-writes
# everything.)
#
# See Feb 18, 2011 update notes below.  This can be much smaller now if you 
# use the patch below, since writes are avoided unless necessary.
#DEST_SNAPSHOT_SIZE=50G

# This is probably safe with the librsync patch discussed below
DEST_SNAPSHOT_SIZE=$SOURCE_SNAPSHOT_SIZE

# Enable compression
SSHOPTS='-C'

# 32k I/O buffers, and 16k blocksize.
RDIFF_OPT='-I 32768 -O 32768 -b 16384 -s'

SOURCE_NAME=`basename "$SOURCE"`
SOURCE_SNAP="`dirname $SOURCE`/$SOURCE_NAME-snap"
SOURCE_SNAP_NAME="$SOURCE_NAME-snap"

DEST_NAME=`basename "$DEST"`
DEST_SNAP="`dirname $DEST`/$DEST_NAME-snap"
DEST_SNAP_NAME="$DEST_NAME-snap"

# remove the previous snapshots, if any
ssh $SSHOPTS "$SSHUSER" "lvremove -f '$SOURCE_SNAP'"
lvremove -f "$DEST_SNAP"

# Snapshot the remote host:
ssh $SSHOPTS "$SSHUSER" "lvcreate -s -n '$SOURCE_SNAP_NAME' -L $SOURCE_SNAPSHOT_SIZE '$SOURCE'"

# Snapshot the local destination host:
lvcreate -s -n "$DEST_SNAP_NAME" -L $DEST_SNAPSHOT_SIZE "$DEST"

rdiff $RDIFF_OPT -- signature "$DEST_SNAP" - | \
  ssh $SSHOPTS "$SSHUSER" "rdiff $RDIFF_OPT -- delta - '$SOURCE_SNAP' -" | \
  rdiff $RDIFF_OPT -- patch "$DEST_SNAP" - "$DEST"

# Compare the volumes, if you like
ssh $SSHOPTS "$SSHUSER" "md5sum '$SOURCE_SNAP'"
md5sum $DEST

# cleanup, remove the snapshots.
ssh $SSHOPTS $SSHUSER "lvremove -f '$SOURCE_SNAP'"
lvremove -f "$DEST_SNAP"

This is a convenient single-pipe process for replication, and it uses the librsync rolling-checksum process, using minimal bandwidth on the network that ssh traverses.

Executing this script yields something like this on a 50GB volume; note that the md5sum’s match perfectly.

  Logical volume "source-snap" created
  Logical volume "dest-snap" created
rdiff: signature statistics: signature[3276800 blocks, 16384 bytes per block]
rdiff: loadsig statistics: signature[3276800 blocks, 16384 bytes per block]
rdiff: delta statistics: literal[27842 cmds, 805715968 bytes, 83492 cmdbytes] copy[1462034 cmds, 52881375232 bytes, 372798 false, 10746386 cmdbytes]
rdiff: patch statistics: literal[27842 cmds, 805715968 bytes, 83492 cmdbytes] copy[1462034 cmds, 52881375232 bytes, 0 false, 10746386 cmdbytes]
7fddc578cdbf5f4e30b7f815e72acebd  /dev/local-vg0/dest
7fddc578cdbf5f4e30b7f815e72acebd  /dev/remote-vg0/source-snap
  Logical volume "source-snap" successfully removed
  Logical volume "dest-snap" successfully removed

Since rdiff does not know the destination is a snapshot of the basis-file, it rewrites the whole thing. Indeed, it could simply seek instead of copy from the basis-file, but the stock rdiff tool does not support this. If I write a patch, it will get posted here—and if you write a patch, please let me know! (see update below!)

Until then, keep double the space free in your volume group that you need to run a snapshot and it should work great!

Update: Fri Feb 18 12:39:51 PST 2011 I just wrote patch for rdiff (within the librsync package) that updates in place, by patching the file-stream-sink code in buf.c. Basically, it reads before writing. If the data read in is the same that it would have written, it skips the write and advances the write pointer; otherwise, it writes as normal. Since this avoids writing to the device that had a snapshot except where necessary, much less snapshot-backing-store is required. This code passes all of the ‘make check’ tests that come with librsync, and I believe it to be stable. On my system, rdiff syncs are about 2x faster due to the much reduced write-overhead of the original implementation.

  • The patch is here
  • and the patched code, ready to compile, is here

-Eric

Sparse-file Support for rdiff-backup

Massive LVM snapshots use lots of space on your backup destination. Virtual machine volume images are (often) mostly empty, especially if more disk has been allocated than the VM is currently using. In such a case, it makes sense only to store nonzero blocks of data. 

This is a patch to rdiff-backup 1.2.8 to add sparse file support.
More info is available on the rdiff-backup wiki.

UPDATE [updated Sun Jan 2 19:49:50 PST 2011]
This is an updated (more efficient/faster) patch to support sparse files
I’ve also written a patch that aligns rdiff-blocksizes for files >1GB on “Globals.blocksize” boundaries (currently 1024*128). This works much better for RAID devices than the “square-root” approach for smaller files, as reads are aligned on 128k boundaries instead of 16-byte-aligned boundaries. See the patch for details.

-Eric

BlockFuse to the Rescue: rdiff-backup of LVM Snapshots and Block Devices

Over the years I have used rdiff-backup as an incremental backup solution. It works really well on many platforms, supports files >4GB, ACLs, and much much more.

Unfortunately, rdiff-backup does not support backing up block device content; instead, it will replicate the block device inode’s major/minor numbers on the destination system (doesn’t backup the internal content). If you are backing up all of your root filesystem (/), this is probably what you want. But, what if you’re backing up large virtual machine LVM snapshots?

Not finding a solution on the web, I wrote my own using the Linux FUSE filesystem.

BlockFuse takes two arguments:

	# ./block-fuse
	usage: ./block-fuse /dev/directory /mnt/point

For example:

	# # Take an LVM snapshot:
	# lvm -s /dev/vgBoot/asterisk -n _snap-asterisk -L 1G
	   ...
	# # Mount /dev/mapper as /mnt/block-devices:
	# ./block-fuse /dev/mapper /mnt/block-devices
	# ls -l /mnt/block-devices
	-r-------- 1 root root  10G 2010-12-21 16:07 vgBoot-_snap--asterisk
	-r-------- 1 root root 1.0G 2010-12-21 16:07 vgBoot-_snap--asterisk-cow
	   ...
	# # Perform your backup:
	# rdiff-backup --include '/mnt/block-devices/*_snap*' --exclude '*' \
		/mnt/block-devices \
		/mnt/backup/lvm-snapshots
	# #
	# ls -l /mnt/backup/lvm-snapshots/
	drwx------ 3 root root         4096 2010-12-21 16:19 rdiff-backup-data
	-r-------- 1 root root  10737418240 1969-12-31 16:00 vgBoot-_snap--asterisk
	-r-------- 1 root root   1073741824 1969-12-31 16:00 vgBoot-_snap--asterisk-cow

Thus, rdiff-backup is able to backup block-device content, including LVM snapshots using BlockFuse. BlockFuse is quite simple: it enumerates the content of the mount-source directory, and exports all block devices with non-zero size as a file with 0400 permissions, owned by your fuse user (probably root for this).

Notes:

  • BlockFuse does not support writing, so your data is read-only-safe. In a catastrophic recovery where you cannot restore a snapshot and must recover from rdiff- backup, just use rdiff-backup’s –restore-as-of argument, and ‘dd’ the recovered “file” back onto the original block device.
  • BlockFuse uses the mount-time as the modification time (st_mtime) for the mounted filesystem. This will force rdiff-backup to scan the block devices for changes. Therefore you must unmount and re-mount your BlockFuse filesystem after updating your snapshots. If you do not, rdiff-backup will skip the “files” because their modification timestamp had not changed since the last backup. (It would be easy to write a SIGHUP handler for this, so send me a patch if you do!)

Incidentally, I have this working in production, backing up snapshots as large as 350GB, so this is well tested. Still, this software is TO BE USED AT YOUR OWN RISK! Patches are welcome if you have a novell idea or change to add to BlockFuse.

Wed Dec 22 15:19:06 PST 2010: BlockFuse v0.01 initial release
Tue Dec 21 16:39:41 PST 2010: BlockFuse v0.02 now uses mmap’ed IO!
Tue Jan 14 10:53:54 PST 2014: BlockFuse v0.03 now follows symlinks and supports i386 architectures.

Download BlockFuse v0.03.

2014-01-14: Thank you for your patience waiting for the current version to be uploaded.  If someone would like to maintain BlockFuse and open a public git repo to maintain the package I would greatly appreciate it.

Cheers,

-Eric

Perl script-fu: Downloading UPS Invoices

UPS provides their shipping invoices and tracking history via their website, and you can even download .CSV and .XML files. Unfortunately, they do not have an easy way to automate this process. After scouring the web for something someone else has written, I decided to write my own: 

pull-ups-invoices.pl

Its a terse 100 line program, and your config options are at the top of the script. It depends on WWW::Mechanize, so make sure its installed. Oh, and be forewarned: this script does not validate UPS’s SSL certificate, so I hope you trust your link to UPS 😉

-Eric

A call—and hang-up, from a root-signing CA

Fri Oct 8 13:34:11 PDT 2010

Today I received an interesting call from a well-known Signing CA. One of my SSL certificates was soon to expire, and they called looking for my business. I spoke with them briefly, genuinely interested in their service, and the sales rep hung up on me when I asked a specific question.

You see, certificate authorities (CAs) have “sales spiders” that crawl the web looking for soon-to-expire certificates to renew using a competing CA. There are many out there, and I expressed some of the challenges with Trust on the Internet in my previous post.

When the CA called the first time, I barely missed the call, and immediately called the number back, not knowing who had called. I introduced myself politely and was greeted politely by the the CA sales rep. I could hear the call-center chatter of keyboards and whisper of voices in the background. He explained to me how certificates work, why you would want one from the CA, and I learned something interesting: It is much easier to get a root-signing certificate than I had thought.

There are (depending on who you ask) three “levels” of trust in certificates:

  • Domain Validation (DV)
  • Organization Validation (OV)
  • Extended Validation (EV)

The Padlock: Domain Validation simply verifies that the certificate presented by a server to which you connect. This is the least intensive validation, and also the least expensive. They simply make sure that an email address at that domain (like admin@example.com) can validate that the domain exists, and that the person requesting the SSL certificate actually manages the domain.

The Gold Padlock: Organization Validation is a bit more intense, and you must submit your business entity record (eg, Articles of Incorporation) for verification. The sales rep informed me that some phishing sites have managed to get the “Gold Padlock OV validation” and that one really needs an EV certificate to be trusted on the Internet.

The Green Bar and Padlock: According to the the CA Sales rep, for extended validation, they actually pull a Dunn & Bradstreet report (which costs $45 from D&B) and verify your entities existence, external to the business registration papers that you might submit (depending on the CA) for the OV version of a certificate. This is the certificate the the CA rep wished to sell.

So I asked if the CA issues root-signing certificates, to which he did not understand and began talking about code signing certificates (which are similar, but completely unrelated). When he paused for a moment, I asked again, explaining what I meant: “No”, I said, “I mean—do you offer a root-signing services such that you would offer a signed intermediate ceritificate for large organizations to sign domain names for their organization?”

There was a pause.

He mumbled something to pass time and I heard the clatter of fingers on a keyboard which could only be an uncertain sales rep quickly looking for an answer by posting a question on a sales-rep “IRC” channel for “silently” asking questions of the guys that know the answers when he does not have one himself.

After another moment, he said that intermediate root-signing certificates are available for a $20,000 deposit.

“Great!”, I said, “and what form of validation do you perform to guarantee that the recipient is who they say they are? Is it a more intense validation than the EV validation?” “No, no”, he said: “Intermediate root-signing certificates are only audited at the OV level, perhaps a bit more—but not as intense as the EV”.

So I said: “doesn’t that mean that anyone with $20,000 can get a root-signing certificate and sign what they wish?” He agreed, but said “if they have $20,000 they must be a serious business”. I pushed on: “So, any institution could ‘deposit’ $20,000 for an intermediate signing certificate—that the CA would verify—and that institution could then sign any certificate with any name, effectively enabling a man-in-the-middle attack for any domain?—–
hello?—–hello?—-”

Not even a click. He was gone. I dialed the number back (remember he quickly answered last time) and found some pretty string quartet hold music, but no recording except after 5 minutes, a place to leave your phone number; I did, and I do not expect a call back. Fascinating!

A thought to ponder for the reader: Can EV domains be trusted more than OV domains when an OV-validated intermediate-signed certificate can forge EV certificates? I suspect they can, but I would like to be proven wrong.

-Eric

Circumventing Privacy, the “legal” way.

I strong recommend reading, or at least skimming:

I have known this attack is possible for some time—and have even performed this attack in consentual environments without an intermediate certificate.

These types of attacks are simple to implement in principle, but difficult to execute because root certificates are well controlled by root certificate authorities. This is changing, however, since well-known certificate authorities will provide signed intermediate certificates for a price—or—perhaps a writ from a judge. I will compile CAs providing this service on this blog entry as I find them. For now, this is a start:

With that, I recommend the use of Certificate Patrol to be aware of the problem; Certificate Patrol is also available here.

Update Thu Sep 23 22:53:25 PDT 2010 

I find it interesting that Google is now a CA that my browser trusts:
http://www.gstatic.com/GoogleInternetAuthority/GoogleInternetAuthority.crt

Click here and view the certificate rendered:
https://sandbox.google.com/checkout/view/buy?o=shoppingcart&shoppingcart=

This is not good, bad, or otherwise. It simply shows the state of trust on the Internet. Google can sign any “common name” into existence and have it trusted by all modern browsers.

-Eric