I use RackTables to keep track of our devices and IP space. To prevent duplicate work and differences in naming, I wrote (as all sysadmins do 😉 ) a script to export a rancid config file from RackTables. To be able to enable or disable configuration backup via Rancid, I created a Dictionary ‘chapter’ called Rancid, with a Yes and No option. I added this Dictionary as an Attribute and mapped it to the Firewall, Router and Switch object types.

Now I can enable or disable the Rancid backup from the properties of an object. To generate the Rancid config file I wrote the following script:


#!/usr/bin/perl

use strict;
use warnings;
use DBI;

my $db       = "racktables";
my $host     = "localhost";
my $user     = "XXXX";
my $passwd   = "XXXX";
my $connectinfo = "dbi:mysql:database=$db;host=$host";
my $filename = "racktables-rancid-devices.txt";

my $dbh = DBI->connect($connectinfo, $user, $passwd)
    or die "Cannot connect to $db: $DBI::errstr";

# Select the IP and name of every object whose Rancid attribute (attr_id 10003)
# is set to the 'Yes' dictionary value (uint_value 50030 in my installation).
my $query = "SELECT INET_NTOA(IPBonds.ip), RackObject.name
             FROM RackObject
             JOIN AttributeValue ON RackObject.id = AttributeValue.object_id
             JOIN IPBonds ON AttributeValue.object_id = IPBonds.object_id
             WHERE AttributeValue.attr_id = 10003
               AND AttributeValue.uint_value = 50030
             GROUP BY RackObject.name";

open my $fh, ">", $filename or die "Cannot open $filename: $!";

my $sth = $dbh->prepare($query);
$sth->execute();
$sth->bind_columns(\my $IP, \my $Name);
while ($sth->fetch()) {
    print $fh "# $Name\n$IP:cisco:up\n";
}

$sth->finish();
$dbh->disconnect;
close $fh;
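The attr_id (10003) and uint_value (50030) in the query are specific to my installation: they are the ID of the Rancid attribute and the dictionary key of its ‘Yes’ value. If you need to look up the values for your own database, queries along these lines should work (table and column names as in my RackTables version; verify them against your schema):

mysql> SELECT id, name FROM Attribute WHERE name = 'Rancid';
mysql> SELECT dict_key, dict_value FROM Dictionary WHERE dict_value = 'Yes';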

This script creates the file in the rancid ‘router.db’ configuration format. I created a keypair and used ssh-agent so that the copy script (shown after the example below) can transfer the file to our rancid server without a password prompt.
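For reference, the generated file contains a comment line with the object name followed by a router.db entry for each device, something like this (hypothetical name and address):

# core-sw-01
192.0.2.10:cisco:up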


#!/bin/sh
# Export the device list from RackTables and push it to the rancid server.
/home/rancid/export-rancid.pl
scp racktables-rancid-devices.txt rancidserver:.
# Start from the manually maintained entries, append the exported devices,
# then move the combined file into place as rancid's router.db.
ssh rancidserver 'cp router-manual.db router.db'
ssh rancidserver 'cat racktables-rancid-devices.txt >> router.db'
ssh rancidserver 'mv router.db /usr/local/rancid/var/networking/router.db'
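The keypair and ssh-agent setup mentioned above was nothing special; roughly the following, run as the user that owns the export job (a sketch, adjust key type and hostname as needed):

ssh-keygen -t rsa
ssh-copy-id rancidserver
eval `ssh-agent -s`
ssh-add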

I encountered an error while experimenting with the OpenNebula (ONE) EC2 interface. I tried to upload an image file to an OpenNebula host running CentOS 5.3 with ONE 1.3.8. After a couple of seconds the command exited with the following error:

[rogierm@cloudtest3 ~]$ econe-upload /home/rogierm/centos5.img
image /home/rogierm/centos5.img
/usr/local/one/lib/ruby/econe/EC2QueryClient.rb:164:in `http_post': server returned nothing (no headers, no data) (Curl::Err::GotNothingError)
from /usr/local/one/lib/ruby/econe/EC2QueryClient.rb:164:in `upload_image'
from /usr/local/one/bin/econe-upload:116

I reported this issue to the ONE developers on their mailing list and Sebastien Goasguen pointed me to the correct solution. There seems to be a problem with the curl implementation on CentOS. I installed the multipart-post gem and ran econe-upload with the (as yet undocumented) switch ‘-M’. This fixed the problem.

Install gem:

[root@cloudtest3 ~]# gem install multipart-post

Run the working econe-upload command:

[rogierm@cloudtest3 ~]$ econe-upload -M /home/rogierm/centos5.img

ZX-Spectrum, my first computer

My colleagues at work got me my first computer for my birthday! A fully functional Sinclair ZX-Spectrum with tape drive, printer, manuals and lots of games! Not to forget a couple of old issues of MCN Magazine. This certainly brings back some good memories 🙂

While experimenting with OpenNebula and trying to build a public cloud with the EC2 interface to OpenNebula I encountered the following problem in the code:

[rogierm@cloudtest3 one]$ econe-upload /home/rogierm/test.img
/usr/lib/ruby/1.8/rdoc/ri/ri_options.rb:53: uninitialized constant RI::Paths (NameError)
from /usr/lib/ruby/site_ruby/1.8/rubygems/custom_require.rb:31:in `gem_original_require'
from /usr/lib/ruby/site_ruby/1.8/rubygems/custom_require.rb:31:in `require'
from /usr/lib/ruby/1.8/rdoc/usage.rb:72
from /usr/lib/ruby/site_ruby/1.8/rubygems/custom_require.rb:31:in `gem_original_require'
from /usr/lib/ruby/site_ruby/1.8/rubygems/custom_require.rb:31:in `require'
from /usr/local/one/bin/econe-upload:61

I fixed this problem by adding the following line (above the other require statements) to econe-upload, or to any other command giving the same error:

require 'rdoc/ri/ri_paths'

OpenQRM uses dropbear for the communication and exchange of messages between the server and the appliances. When something goes wrong in this communication, OpenQRM can’t function correctly: it can’t reach the appliances for status updates and commands. These communication problems are often caused by a misconfiguration of dropbear, most commonly a mismatch between the public and private dropbear keys.

The keys should be synchronized between the server and the appliance. On the server, print the public key with the following command:

[root@localhost log]# /usr/lib/openqrm/bin/dropbearkey -t rsa -f /usr/lib/openqrm/etc/dropbear/dropbear_rsa_host_key -y
Public key portion is:
ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAAAgwCBvwSO7vBBL2avDMds...pVn root@localhost.localdomain
Fingerprint: md5 65:ca:5b:3b:05:c3:61:6d:fb:75:2f:c0:d2:7e:02:cf

Copy the ssh-rsa public key line into /root/.ssh/authorized_keys on the appliance.
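One way to get the key over, assuming regular ssh to the appliance still works (replace ‘appliance’ with the appliance’s address; otherwise paste the line in by hand on the console):

/usr/lib/openqrm/bin/dropbearkey -t rsa -f /usr/lib/openqrm/etc/dropbear/dropbear_rsa_host_key -y | grep '^ssh-rsa' | ssh root@appliance 'cat >> /root/.ssh/authorized_keys'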

Now communication should be established.

OpenQRM event log with example of error message caused by communication problem:

openqrm-cmd-queue ERROR executing command with token 64d478dcac6670e5fb000e7c4954863b : /usr/lib/openqrm/bin/dbclient


Aug 26 23:19:45 localhost httpd: openQRM resource-monitor: (update_info) Processing statistics from resource 2
Aug 26 23:19:48 localhost logger: openQRM-cmd-queu: Running Command with token 64d478dcac6670e5fb000e7c4954863b 1. retry : /usr/lib/openqrm/bin/dbclient -I 0 -K 10 -y -i /usr/lib/openqrm/etc/dropbear/dropbear_rsa_host_key -p 1667 root@192.168.42.243 "/usr/lib/openqrm/bin/openqrm-cmd /usr/lib/openqrm/plugins/xen/bin/openqrm-xen post_vm_list -u openqrm -p openqrm"
Aug 26 23:19:52 localhost logger: openQRM-cmd-queu: ERROR executing command with token 64d478dcac6670e5fb000e7c4954863b 2. retry : /usr/lib/openqrm/bin/dbclient -I 0 -K 10 -y -i /usr/lib/openqrm/etc/dropbear/dropbear_rsa_host_key -p 1667 root@192.168.42.243 "/usr/lib/openqrm/bin/openqrm-cmd /usr/lib/openqrm/plugins/xen/bin/openqrm-xen post_vm_list -u openqrm -p openqrm" -----
Aug 26 23:19:52 localhost logger: Host '192.168.42.243' key accepted unconditionally.
Aug 26 23:19:52 localhost logger: (fingerprint md5 64:d5:c7:8e:7a:11:08:3f:43:bc:3c:2b:bf:4a:c8:ce)
Aug 26 23:19:52 localhost logger: root@192.168.42.243's password: root@192.168.42.243's password: root@192.168.42.243's password: root@192.168.42.243's password: root@192.168.42.243's password: root@192.168.42.243's password: root@192.168.42.243's password: root@192.168.42.243's password: root@192.168.42.243's password: root@192.168.42.243's password: /usr/lib/openqrm/bin/dbclient: connection to root@192.168.42.243:1667 exited: remote closed the connection

OpenQRM uses dropbear for communication between the OpenQRM server and the appliances. Dropbear is basically a lightweight SSH implementation, so it uses host keys, which are cached in /root/.ssh/known_hosts. Dropbear uses a different host key than sshd, but ssh and dropbear share the known_hosts file, and ports are not recorded in it.

When you ssh into the appliance once from the OpenQRM server, the sshd host key is cached in the known_hosts file. When OpenQRM then wants to connect to the appliance, dropbear checks the known_hosts file for the cached host key. The file contains the sshd host key instead of the dropbear host key, so dropbear aborts the connection because the host keys don’t match, which could indicate a security compromise.

To solve the problem, remove the host key entry for the appliance from /root/.ssh/known_hosts on the OpenQRM server.
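With OpenSSH this can be done with ssh-keygen; for the appliance in the log below that would be:

ssh-keygen -R 192.168.42.235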


Aug 24 23:24:26 localhost logger: openQRM-cmd-queu: Running command with token 34b3e7ddd93ffa548d34ccea1e4aa7e5 : /usr/lib/openqrm/bin/dbclient -I 0 -K 10 -y -i /usr/lib/openqrm/etc/dropbear/dropbear_rsa_host_key -p 1667 root@192.168.42.235 "/usr/lib/openqrm/bin/openqrm-cmd openqrm_server_set_boot local 1 00:00:5A:11:21:B7 0.0.0.0"
Aug 24 23:24:26 localhost logger: openQRM-cmd-queu: ERROR while running command with token bc7c6de1b59370dd8019bcae2d7bfa45 : /usr/lib/openqrm/bin/dbclient -I 0 -K 10 -y -i /usr/lib/openqrm/etc/dropbear/dropbear_rsa_host_key -p 1667 root@192.168.42.235 "/usr/lib/openqrm/bin/openqrm-cmd openqrm_server_set_boot local 1 00:00:5A:11:21:B7 0.0.0.0" ----- /usr/lib/openqrm/bin/dbclient: connection to root@192.168.42.235:1667 exited:
Aug 24 23:24:26 localhost logger:
Aug 24 23:24:26 localhost logger: Host key mismatch for 192.168.42.235 !
Aug 24 23:24:26 localhost logger: Fingerprint is md5 65:ca:5b:3b:05:c3:61:6d:fb:75:2f:c0:d2:7e:02:cf
Aug 24 23:24:26 localhost logger: Expected md5 a8:e5:d4:62:36:d2:98:b2:c3:74:a9:0c:d5:d1:56:f9
Aug 24 23:24:26 localhost logger: If you know that the host key is correct you can
Aug 24 23:24:26 localhost logger: remove the bad entry from ~/.ssh/known_hosts

On a new Xen server I encountered the following error while starting a fully virtualized guest:

[root@resource1 xen]# xm create test-vps.cfg
Using config file "./test-vps.cfg".
VNC= 1
Error: Unable to connect to xend: Name or service not known. Is xend running?

This was caused by a name resolution problem. I solved it by adding the hostname and IP address of the server to /etc/hosts.
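The entry looks something like this (hypothetical address; use the server’s own IP and fully qualified hostname):

192.168.42.10   resource1.example.com   resource1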
After this change the guest booted without problems.

After a yum upgrade of one of our CentOS 5 Xen servers, xend would not start properly. The logs contained the error messages shown below.
xend-debug.log:

Xend started at Wed Aug 26 18:15:57 2009.
sysctl operation failed -- need to rebuild the user-space tool set?
Exception starting xend: (13, 'Permission denied')

xend.log:

[2009-08-26 18:15:57 3310] ERROR (SrvDaemon:347) Exception starting xend ((13, 'Permission denied'))
Traceback (most recent call last):
  File "/usr/lib/python2.4/site-packages/xen/xend/server/SrvDaemon.py", line 339, in run
    servers = SrvServer.create()
  File "/usr/lib/python2.4/site-packages/xen/xend/server/SrvServer.py", line 251, in create
    root.putChild('xend', SrvRoot())
  File "/usr/lib/python2.4/site-packages/xen/xend/server/SrvRoot.py", line 40, in __init__
    self.get(name)
  File "/usr/lib/python2.4/site-packages/xen/web/SrvDir.py", line 82, in get
    val = val.getobj()
  File "/usr/lib/python2.4/site-packages/xen/web/SrvDir.py", line 52, in getobj
    self.obj = klassobj()
  File "/usr/lib/python2.4/site-packages/xen/xend/server/SrvNode.py", line 30, in __init__
    self.xn = XendNode.instance()
  File "/usr/lib/python2.4/site-packages/xen/xend/XendNode.py", line 752, in instance
    inst = XendNode()
  File "/usr/lib/python2.4/site-packages/xen/xend/XendNode.py", line 87, in __init__
    self.other_config["xen_pagesize"] = self.xeninfo_dict()["xen_pagesize"]
  File "/usr/lib/python2.4/site-packages/xen/xend/XendNode.py", line 741, in xeninfo_dict
    return dict(self.xeninfo())
  File "/usr/lib/python2.4/site-packages/xen/xend/XendNode.py", line 685, in xeninfo
    info['xen_scheduler'] = self.xenschedinfo()
  File "/usr/lib/python2.4/site-packages/xen/xend/XendNode.py", line 675, in xenschedinfo
    sched_id = self.xc.sched_id_get()
Error: (13, 'Permission denied')

After some investigation this was quite easy to solve. The yum upgrade installed a new kernel and modified grub.conf, so after the reboot a different Xen hypervisor was booted. However, that hypervisor did not match the Xen tools installed on the system. This is easily fixed by changing grub.conf to boot the matching Xen hypervisor. See the examples below for the exact change.

The grub.conf after the yum update that caused the problem:

title CentOS (2.6.18-128.7.1.el5xen)
root (hd0,0)
kernel /xen.gz-2.6.18-128.7.1.el5
module /vmlinuz-2.6.18-128.7.1.el5xen ro root=/dev/VolGroup00/LogVol00
module /initrd-2.6.18-128.7.1.el5xen.img

The changed grub.conf that fixed the problem:

title CentOS (2.6.18-128.7.1.el5xen)
root (hd0,0)
kernel /xen.gz-3.3.1
module /vmlinuz-2.6.18-128.7.1.el5xen ro root=/dev/VolGroup00/LogVol00
module /initrd-2.6.18-128.7.1.el5xen.img

There are a couple of differences between IPv6 and IPv4 address allocation.

  • The prefix length for an IPv6 subnet will always be /64; no more, no less. This allows you to place as many IPv6 devices on a subnet as the underlying network medium allows. The 128-bit IPv6 address can be created automatically from the /64 prefix extended with a 64-bit interface identifier (EUI-64) derived from the NIC’s 48-bit MAC address (see the worked example after this list).

With IPv4, the prefix length varies from subnet to subnet, which makes renumbering painful (for example, imagine renumbering an IPv4 subnet from /28 to /29 or vice versa).

  • An ordinary leaf site will always get a /48 of address space. This is sufficient for most small to medium sized networks.

With IPv4, the allocation varies with the size of the site, which makes migrating from one ISP to another very painful.
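As an illustration of the EUI-64 mechanism mentioned in the first point (hypothetical MAC address and documentation prefix):

MAC address:                  00:11:22:33:44:55
Insert ff:fe in the middle:   0011:22ff:fe33:4455
Flip the universal/local bit: 0211:22ff:fe33:4455
Prefix 2001:db8:1:2::/64  =>  2001:db8:1:2:211:22ff:fe33:4455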

A SAN is often implemented as a dedicated network that is considered to be secure. However, the nature of a SAN is that it is a shared network, and this involves some serious security risks that should be evaluated when using an iSCSI based SAN. Some vendors consider an iSCSI network safe when it is implemented as a dedicated, switched network (Dell EqualLogic. Securing storage area networks with iSCSI. EqualLogic Inc., 2008.). They consider it virtually impossible to snoop or inject packets in a switched network. We all know this is not the case; if it were, why would we use firewalls, IDSes and tons of other security measures? Even if iSCSI runs on an isolated network, and only the management interfaces of the storage devices are connected to a shared, general-purpose network, security is only as good as the hosts that are connected to the dedicated network. A single compromised host connected to the dedicated iSCSI network can attack the storage devices to get access to LUNs belonging to other hosts.

When implementing an iSCSI network you should be aware of the security risks that this imposes on the environment. To estimate the risk, awareness of the methods that can be used to secure iSCSI is paramount. The iSCSI protocol allows for the following security measures to prevent unintended or unauthorized access to storage resources:

  • Authorization
  • Authentication
  • Encryption

Because iSCSI setups are generally shared environments, access to the storage elements (LUNs) by unauthorized initiators should be blocked. Authorization is implemented by means of the iQN, the initiator node name (iSCSI Qualified Name), which can be compared to a MAC address. During an audit, storage systems must demonstrate controls to ensure that a server under one regime cannot access the storage assets of a server under another.
Typically, iSCSI storage arrays explicitly map initiators to specific target LUNs; an initiator authenticates not to the storage array, but to the specific storage asset it intends to use.

As an added security method, the iSCSI protocol allows initiators and targets to use CHAP to authenticate each other. This prevents simple access by spoofing the iQN. And last, because iSCSI runs on IP, IPSec can be used to secure and encrypt the data flowing between the client (initiator) and the storage server (target).
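For example, with the open-iscsi initiator on Linux, mutual CHAP can be enabled in /etc/iscsi/iscsid.conf along these lines (hypothetical usernames and secrets; the target must be configured with matching credentials):

# authenticate the session with CHAP
node.session.auth.authmethod = CHAP
# credentials the initiator presents to the target
node.session.auth.username = initiatoruser
node.session.auth.password = initiatorsecret
# credentials the target must present back (mutual CHAP)
node.session.auth.username_in = targetuser
node.session.auth.password_in = targetsecret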

Now that we know there are multiple ways to secure access to the storage resources, you might conclude that iSCSI must be safe and secure to use. Unfortunately, this is not the case. There are several flaws in the iSCSI security design:

  • iQN’s are trusted, but easy to spoof, sniff or guess
  • iSCSI authorization is the only required security method, and this uses only the iQN
  • Authentication is disabled by default
  • Authentication is (mostly) only implemented as CHAP
  • IPSec is difficult to implement

Because iQN’s are manually configured in the iSCSI driver on the client, they are easy to change. To get access to a LUN that is protected only by an iQN restriction, you can sniff the communication to learn the iQN, or simply guess it, since it is often a default string (e.g. iqn.1991-05.com.microsoft:hostname), then configure your own iSCSI driver to use that name and access the LUN.
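To illustrate how weak this is: with the open-iscsi initiator the iQN is just a line in a local configuration file, so impersonating another initiator is a one-line change (hypothetical name):

# /etc/iscsi/initiatorname.iscsi -- the iQN is a client-controlled setting
InitiatorName=iqn.1991-05.com.microsoft:somehost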

CHAP is basically the only authentication mechanism supported by iSCSI vendors, even though the protocol allows for other mechanisms such as Kerberos. CHAP is not known for strong security on shared networks: it is vulnerable to dictionary attacks, spoofing and reflection attacks. Because these issues are well known, the RFC even describes ways to deal with the limitations of CHAP (http://tools.ietf.org/html/rfc3720#section-8.2.1).

While IPSec could stop or reduce most of the security issues outlined above, it is hard to implement and manage, so not many administrators will feel the need to use it. It should not only be possible to build a secure network, it should also be easy.

To reduce the risk, and make your iSCSI network as safe as possible, you should do the following:

  • Enable mutual (incoming/outgoing) authentication
  • Follow the advice in the RFC for securing CHAP
  • Enable CRC checksums (see the digest settings below)
  • Do not rely on the iQN alone for authorization
  • Enable IPSec (if performance allows it)
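For the CRC checksums in the list above, open-iscsi exposes them as header and data digest settings (parameter names as used by open-iscsi; check your initiator’s documentation):

# /etc/iscsi/iscsid.conf -- enable CRC32C digests on iSCSI PDUs
node.conn[0].iscsi.HeaderDigest = CRC32C
node.conn[0].iscsi.DataDigest = CRC32C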

Vendors and distributors should also enable authentication by default, and add other authentication mechanisms to their iSCSI target and initiator software.

References:
http://www.blackhat.com/presentations/bh-usa-05/bh-us-05-Dwivedi-update.pdf
http://en.wikipedia.org/wiki/ISCSI#Authentication
http://weird-hobbes.nl/reports/iSCSI%20security/