Several blogs and manuals with examples on kvm or xen setups use NFS as storage backend. Mostly they state that for production use iSCSI is recommended. However there are examples where NFS is part of the architecture, eg. OpenNebula. I tried to find specific statistics on the performance differences between NFS, iSCSI and local storage. During this search I encountered some pointers that NFS and Xen is not a good combination, but never a straight comparison.

I decided to invest some time and setup a small test environment and run some bonnie++ statistics. This is not a scientific designed experiment, but a test to show the differences between the platforms. Two test platforms are setup, 1 with a Xen server (DL360G6) (xen1) and a 12 disk SATA storage server (storage1), and another with a KVM server (DL360G5) (kvm1) and a 2 disk SATA storage server (storage2) . Both servers are connected with a gigabit network. I’ve also run a test with a 100mb/s network between the kvm1 and storage2 server. For reference I’ve also done tests with the images on localdisk.

I realize that LVM and iSCSI storage is most efficient, but storage with image files is very convenient and in case of cloud setups sometimes the only option.

Seq output Seq input Random
Per Chr Block Rewrite Per Chr Block Seeks
Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP /sec %CP
Xen-guest-via-nfs-tapaio 1G 3570 5 2436 0 1366 0 26474 41 24831 0 6719.0 1
xen-guest-via-iscsi 1G 25242 40 12071 1 15175 0 32071 42 47742 0 7331.3 1
kvm-guest-nfs-1gb-net 1G 8140 16 17308 3 11864 2 40861 81 71711 3 2126.6 54
kvm-guest-nfs-qcow-100mb 1G 1922 3 9874 1 3994 0 10720 22 10441 0 595.4 33
kvm-guest-nfs-qcow-100mb-2nd 1G 9735 21 2039 0 3197 0 10729 22 10463 0 685.3 38
kvm-guest-nfs-qcow-100mb-3rd 1G 5327 10 7378 1 4421 0 10655 18 10512 0 706.3 39
xenserver-nfsmount 1G 41507 60 60921 7 29687 1 33427 48 64147 0 4674.4 11
kvmserver-nfs-1G 20G 31158 52 32044 17 10749 2 19152 28 18987 1 90.3 1
localdisk-on-nfs-server-cloudtest3 4G 41926 65 43805 7 18928 3 52943 72 56616 3 222.6 0

The  conclusion of the tests is that local storage is fastest. NFS storage with Xen is not a good combination. Xen runs best with iSCSI backed storage. KVM with NFS runs significantly better. It is safe to say that if you want to use NFS use it with KVM, not with Xen. In any case iSCSI is always the best option for Xen. I have not yet tested KVM with iSCSI but I expect this to perform better than NFS.

Libvirt is a toolkit to interact with several virtualization platform from a single interface. Considering you can stop and start virtual machines through this API, security is quite important. Libvirt offers several options to give authenticated access from remote machines. By default most distributions disable remote network access for libvirtd. However, I would like to access libvirtd on some of my KVM servers from a single management host to gather some information. The documentation on how to set this up is not too good, so I decided to write up a short how-to.

Step 1: Enable network access for libvirtd
First enable network access for libvirtd on the KVM server(s). On CentOS/RHEL this is done by uncommenting or adding the following line in /etc/sysconfig/libvirtd:

LIBVIRTD_ARGS="--listen"

Step 2: Install a CA on the management server
Install the Perl certificate tools:

yum install openssl-perl

Create Certificate authority:

cd /etc/pki/tls/misc/
./CA.pl -newca

Example output:

./CA.pl -newca
CA certificate filename (or enter to create)

Making CA certificate ...
Generating a 1024 bit RSA private key
..........++++++
.............++++++
writing new private key to '../../CA/private/cakey.pem'
Enter PEM pass phrase:
Verifying - Enter PEM pass phrase:
-----
You are about to be asked to enter information that will be incorporated
into your certificate request.
What you are about to enter is what is called a Distinguished Name or a DN.
There are quite a few fields but you can leave some blank
For some fields there will be a default value,
If you enter '.', the field will be left blank.
-----
Country Name (2 letter code) [GB]:XX
State or Province Name (full name) [Berkshire]:XX
Locality Name (eg, city) [Newbury]:XXXXX
Organization Name (eg, company) [My Company Ltd]:XXXXX
Organizational Unit Name (eg, section) []:XXXX
Common Name (eg, your name or your server's hostname) []:CA XXX XXX
Email Address []:XXX

Please enter the following 'extra' attributes
to be sent with your certificate request
A challenge password []:
An optional company name []:
Using configuration from /etc/pki/tls/openssl.cnf
Enter pass phrase for ../../CA/private/cakey.pem:
Check that the request matches the signature
Signature ok
Certificate Details:
Serial Number:
d8:95:24:xx:xx:xx:13:9b
Validity
Not Before: Feb 25 23:14:08 2010 GMT
Not After : Feb 24 23:14:08 2013 GMT
Subject:
countryName = XX
stateOrProvinceName = XX
organizationName = XXXX
organizationalUnitName = XXXX
commonName = CA XXX XXX
emailAddress = XXXXX
X509v3 extensions:
X509v3 Subject Key Identifier:
XXX
X509v3 Authority Key Identifier:
keyid:XXXX
DirName:/C=XX/ST=XX/O=XXX/OU=XXXX/CN=CA XXX XXX/emailAddress=XXX
serial:XXX

X509v3 Basic Constraints:
CA:TRUE
Certificate is to be certified until Feb 24 23:14:08 2013 GMT (1095 days)

Write out database with 1 new entries
Data Base Updated

Step 3: Create CSR’s

openssl genrsa -des3 -out kvm-server1.tmp
openssl rsa -in kvm-server1.tmp -out kvm-server1.key
openssl genrsa -des3 -out mgmt-host.tmp
openssl rsa -in mgmt-host.tmp -out mgmt-host.key
openssl req -new -key kvm-server1.key -out kvm-server1.csr
openssl req -new -key mgmt-host.key -out mgmt-host.csr

Step 4: Sign the certificates

openssl ca -config /etc/pki/tls/openssl.cnf -policy policy_anything -out /root/mgmt-host.crt -infiles /root/mgmt-host.csr
openssl ca -config /etc/pki/tls/openssl.cnf -policy policy_anything -out /root/kvm-server1.crt -infiles /root/kvm-server1.csr

Example output:

Using configuration from /etc/pki/tls/openssl.cnf
Enter pass phrase for /etc/pki/CA/private/cakey.pem:
Check that the request matches the signature
Signature ok
Certificate Details:
Serial Number:
d8:95:24:4b:4e:b1:13:9c
Validity
Not Before: Feb 25 23:31:40 2010 GMT
Not After : Feb 25 23:31:40 2011 GMT
Subject:
countryName = XX
stateOrProvinceName = XX
localityName = XX
organizationName = XX
organizationalUnitName = XX
commonName = mgmt-host.xxx.nl
emailAddress = xxxxx
X509v3 extensions:
X509v3 Basic Constraints:
CA:FALSE
Netscape Comment:
OpenSSL Generated Certificate
X509v3 Subject Key Identifier:
6C:EA:8B:C1:D6:XX:B6:6B:5B:18:02
X509v3 Authority Key Identifier:
keyid:C9:36:4A:XXXX:6F:FD:2E:86

Certificate is to be certified until Feb 25 23:31:40 2011 GMT (365 days)
Sign the certificate? [y/n]:y

1 out of 1 certificate requests certified, commit? [y/n]y
Write out database with 1 new entries
Data Base Updated

Step 5: Copy over the certificates to the correct location
On the management host (mgmt-host):

mkdir /etc/pki/libvirt
mkdir /etc/pki/libvirt/private
mkdir /etc/pki/libvirt-vnc

cp /root/mgmt-host.key /etc/pki/libvirt/private/clientkey.pem
cp /root/mgmt-host.key /etc/pki/libvirt-vnc/clientkey.pem
cp /root/mgmt-host.crt /etc/pki/libvirt/clientcert.pem
cp /root/mgmt-host.crt /etc/pki/libvirt-vnc/clientcert.pem

Transfer the key and certificate files to the KVM server (kvm-server1). Ideally, you create the key and CSR on the host itself, so you only have to transfer the certificate. Then, copy the certificates and CA to the correct location on the KVM (libvirtd) server:


mkdir /etc/pki/libvirt
mkdir /etc/pki/libvirt/private
mkdir /etc/pki/libvirt-vnc

cp kvm-server1.key /etc/pki/libvirt/private/serverkey.pem
cp kvm-server1.key /etc/pki/libvirt-vnc/server-key.pem

cp kvm-server1.crt /etc/pki//libvirt/servercert.pem
cp kvm-server1.crt /etc/pki/libvirt-vnc/server-cert.pem

Make sure the CA generated on the management server is available on the KVM server in the following file:
/etc/pki/CA/cacert.pem

Step 6: Reload libvirtd

/etc/init.d/libvirtd reload

Step 7: Test
With these certificates setup, you should be able to access libvirtd on kvm-server1 from mgmt-host. Use the following command to test:

virsh -c qemu://kvm-server1.xxxx.nl/system
Welcome to virsh, the virtualization interactive terminal.

Type: 'help' for help with commands
'quit' to quit

virsh #

Use the list command to see a list of running guests on the server. This only works if these guests have also been created via libvirtd. Manually started KVM guests will not show up in this list.

I’ve made some quick changes to ONEMC to show the VNC port in the interface. I’ve updated the template that onemc creates with a GRAPHICS section. This enables vnc on the quest.

As a workaround until ONE can use the VMID in the graphics section, I use a virsh command to get the vncport. To get this working the webserver user should be allowed to execute the virsh command via sudo. Add the following to sudoers:

apache ALL=(ALL) NOPASSWD: /usr/bin/virsh *

Also I encountered some problems with the model section in the KVM template so I commented that out as well.

Below the patch and a screenshot listing the vnc ports in ONEMC
ONEMC screenshot
onemc_funcs.patch

I encountered the an error while experimenting with the OpenNebula (ONE) EC2 interface. I tried to upload an image file, to a OpenNebula host running CentOS 5.3 with ONE 1.3.8. After a couple of seconds the command exited with the following error:

[rogierm@cloudtest3 ~]$ econe-upload /home/rogierm/centos5.img
image /home/rogierm/centos5.img
/usr/local/one/lib/ruby/econe/EC2QueryClient.rb:164:in `http_post': server returned nothing (no headers, no data) (Curl::Err::GotNothingError)
from /usr/local/one/lib/ruby/econe/EC2QueryClient.rb:164:in `upload_image'
from /usr/local/one/bin/econe-upload:116

I informed the ONE developers of this issue on their mailing list and Sebastien Goasguen pointed me to the correct solution. There seems to be an error in the curl implementation on CentOS. I installed the multipart-post gem and executed the econe-upload with the (yet undocumented) switch ‘-M’. This fixed the problem.

Install gem:

[root@cloudtest3 ~]# gem install multipart-post

Run the working econe-upload command:

[rogierm@cloudtest3 ~]$ econe-upload -M /home/rogierm/centos5.img

OpenQRM uses dropbear for communication between the OpenQRM server and the appliances. Dropbear is basically a simple version of SSH, so it uses host keys which are cached in /root/.ssh/known_hosts. Dropbear uses a different key than sshd, ssh and dropbear share the known_hosts file and ports are not included in this file.

When you ssh once into the appliance from the OpenQRM server the ssh hostkey is cached in the known_hosts file. Now if OpenQRM wants to connect to the appliance, dropbear checks the know_hosts file for the cached hostkey. This contains the ssh hostkey instead of the dropbear hostkey, so dropbear stops the connection because the hostkeys don’t matc which could be caused by a security compromise.

To solve the problem remove the hostkey entry for the host from /root/.ssh/known_hosts.


Aug 24 23:24:26 localhost logger: openQRM-cmd-queu: Running command with token 34b3e7ddd93ffa548d34ccea1e4aa7e5 : /usr/lib/openqrm/bin/dbclient -I 0 -K 10 -y -i /usr/lib/openqrm/etc/dropbear/dropbear_rsa_host_key -p 1667 root@192.168.42.235 "/usr/lib/openqrm/bin/openqrm-cmd openqrm_server_set_boot local 1 00:00:5A:11:21:B7 0.0.0.0"
Aug 24 23:24:26 localhost logger: openQRM-cmd-queu: ERROR while running command with token bc7c6de1b59370dd8019bcae2d7bfa45 : /usr/lib/openqrm/bin/dbclient -I 0 -K 10 -y -i /usr/lib/openqrm/etc/dropbear/dropbear_rsa_host_key -p 1667 root@192.168.42.235 "/usr/lib/openqrm/bin/openqrm-cmd openqrm_server_set_boot local 1 00:00:5A:11:21:B7 0.0.0.0" ----- /usr/lib/openqrm/bin/dbclient: connection to root@192.168.42.235:1667 exited:
Aug 24 23:24:26 localhost logger:
Aug 24 23:24:26 localhost logger: Host key mismatch for 192.168.42.235 !
Aug 24 23:24:26 localhost logger: Fingerprint is md5 65:ca:5b:3b:05:c3:61:6d:fb:75:2f:c0:d2:7e:02:cf
Aug 24 23:24:26 localhost logger: Expected md5 a8:e5:d4:62:36:d2:98:b2:c3:74:a9:0c:d5:d1:56:f9
Aug 24 23:24:26 localhost logger: If you know that the host key is correct you can
Aug 24 23:24:26 localhost logger: remove the bad entry from ~/.ssh/known_hosts

On a new Xen server I encounterd the following error while starting a fully virtualized guest:

[root@resource1 xen]# xm create test-vps.cfg
Using config file "./test-vps.cfg".
VNC= 1
Error: Unable to connect to xend: Name or service not known. Is xend running?

This problem was caused by a problem in the name resolving. I solved this by adding the hostname and ip address of the server in /etc/hosts
After this change the guest booted without problems.

After a yum upgrade of one of our CentOS 5 Xen server, xend would not start properly. The logs contained the following error messages below.
xend-debug.log:

Xend started at Wed Aug 26 18:15:57 2009.
sysctl operation failed -- need to rebuild the user-space tool set?
Exception starting xend: (13, 'Permission denied')

xend.log

[2009-08-26 18:15:57 3310] ERROR (SrvDaemon:347) Exception starting xend ((13, 'Permission denied'))Traceback (most recent call last): File "/usr/lib/python2.4/site-packages/xen/xend/server/SrvDaemon.py", line 339, in run servers = SrvServer.create() File "/usr/lib/python2.4/site-packages/xen/xend/server/SrvServer.py", line 251, in create root.putChild('xend', SrvRoot()) File "/usr/lib/python2.4/site-packages/xen/xend/server/SrvRoot.py", line 40, in __init__ self.get(name) File "/usr/lib/python2.4/site-packages/xen/web/SrvDir.py", line 82, in get val = val.getobj() File "/usr/lib/python2.4/site-packages/xen/web/SrvDir.py", line 52, in getobj self.obj = klassobj() File "/usr/lib/python2.4/site-packages/xen/xend/server/SrvNode.py", line 30, in __init__ self.xn = XendNode.instance()
File "/usr/lib/python2.4/site-packages/xen/xend/XendNode.py", line 752, in instance
inst = XendNode()
File "/usr/lib/python2.4/site-packages/xen/xend/XendNode.py", line 87, in __init__
self.other_config["xen_pagesize"] = self.xeninfo_dict()["xen_pagesize"]
File "/usr/lib/python2.4/site-packages/xen/xend/XendNode.py", line 741, in xeninfo_dict
return dict(self.xeninfo())
File "/usr/lib/python2.4/site-packages/xen/xend/XendNode.py", line 685, in xeninfo
info['xen_scheduler'] = self.xenschedinfo()
File "/usr/lib/python2.4/site-packages/xen/xend/XendNode.py", line 675, in xenschedinfo
sched_id = self.xc.sched_id_get()
Error: (13, 'Permission denied')

After some investigation this was quite easy to solve. The yum upgrade updated the kernel and modified the grub.conf. So after the reboot, the new xen kernel booted. However, this kernel did not match the xen tools installed. This is easily fixed by changing the grub.conf to boot the correct xen kernel. See the examples below for the exact change.

The grub.conf after the yum update that caused the problem:

title CentOS (2.6.18-128.7.1.el5xen)
root (hd0,0)
kernel /xen.gz-2.6.18-128.7.1.el5
module /vmlinuz-2.6.18-128.7.1.el5xen ro root=/dev/VolGroup00/LogVol00
module /initrd-2.6.18-128.7.1.el5xen.img

The changed grub.conf after the yum updated that fixed the problem:

title CentOS (2.6.18-128.7.1.el5xen)
root (hd0,0)
kernel /xen.gz-3.3.1
module /vmlinuz-2.6.18-128.7.1.el5xen ro root=/dev/VolGroup00/LogVol00
module /initrd-2.6.18-128.7.1.el5xen.img

In general it is a good idea to configure password aging as part of your password/security policy. In some cases however, this might cause unexpected problems. I’ve seen cases where an expired password prevented a machine from booting. In this specific case this was caused by a service that ran as the user with the expired password. In general you should not run services as a normal user account, but sometimes you just have to deal with things you can’t change. Generally the documentation states that to disable password aging you have to edit the /etc/shadow file, and remove the part where the password age is stored. This is quite error prone. If you do it this way, be sure to use vipw to prevent errors in this critical file. To disable password aging I recommend just using the command to enable it as well:

# chage -m 0 -M 99999 -E -1 username

Check the before and after:

# chage -l username
Minimum: 7
Maximum: 90
Warning: 7
Inactive: -1
Last Change: Jun 26, 2009
Password Expires: Sep 24, 2009
Password Inactive: Never
Account Expires: Never

After disabling password aging:

# chage -l username
Minimum: 0
Maximum: 99999
Warning: 7
Inactive: -1
Last Change: Jun 26, 2009
Password Expires: Never
Password Inactive: Never
Account Expires: Never

As a note, please only disable password aging when there is no other way to fix the problem.