Some time ago I wrote a post on how to get a Cisco IPsec VPN working with client certificates on OS X Lion. Now that I have upgraded to Mavericks, this of course broke my VPN connection again. Fortunately a friend had already run into the same problem on Mountain Lion, and his fix also works on Mavericks. To get my connections working on Mavericks I followed the instructions from my previous post. After that I had to “Allow all applications to access this item” on the certificate in Keychain.

Also see the last comment in https://discussions.apple.com/thread/4158642?start=15&tstart=0, which says to allow all access to the certificate in Keychain.

I had already set the certificate to be always “Trusted”, but you have to expand the certificate to get to the private key and always “Allow” access to that. It is a different setting.

See the screenshot below.
[Screenshot: VPN cert private key access settings in Keychain]

One of our customers moved to a new spam filter platform. For this we needed an overview of the IP addresses of the mail servers that their domains currently use. Instead of doing this manually, I created a quick script. It reads the domains from a file (domains.txt) in the current directory and outputs each domain and its IP addresses as a tab-separated line, which makes it easy to import into Excel.


#!/bin/bash
# For each domain in domains.txt, print "domain <tab> IPs of its lowest-preference MX hosts"
while read line
do
    myline=`echo -n $line | tr -d "\n"`
    # look up the MX records, keep the lowest-preference host(s), resolve them to IPs, de-duplicate
    output=`dig +short $myline mx | sort -n | awk -v pref=65536 '($1<=pref) {pref=$1; print $2}' | dig +short -f - | uniq`
    myoutput=`echo -n $output | tr -d "\n"`
    echo -e "$myline \t $myoutput"
done < domains.txt

Example:
$ ./findmx.sh > domains-mx-ip.csv

Last week I tried to connect to a Windows 2012 server with the Microsoft Remote Desktop Client (2.1.1). This failed with the error: “You were disconnected from the Windows-based computer because of problems during the licensing protocol.”

[Screenshot: licensing protocol error message]

I’ve searched online and some people suggest installing a beta version of the Microsoft RDP Client (version 2.1.2 or 2.12). This has not yet been released by Microsoft but is available from several sites. Before you try this, make sure the md5sum is consistent with known good versions.
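
For example, on OS X you can compute the checksum of the downloaded installer with the built-in md5 tool and compare it against a known good value (the file name below is only an illustration):

$ md5 ~/Downloads/RDC_212_Update.dmg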

I did try this version, but it did not make any difference. I also tried switching to CoRD, as that was working for some people; for me, again, it was not. This might be due to the version of Windows I was connecting to: Windows 2012 Server with the licensing server enabled to allow multiple simultaneous logins.

It seems that Windows 2012 is configured by default to require NLA (Network Level Authentication). The only way I could connect from my Mac to this server was by disabling NLA via group policy on the 2012 server:

Disable the “Require user authentication for remote connections by using Network Level Authentication” Group Policy setting.

This Group Policy setting is located in Computer Configuration\Administrative Templates\Windows Components\Remote Desktop Services\Remote Desktop Session Host\Security and can be configured by using either the Local Group Policy Editor or the Group Policy Management Console (GPMC). Note that the Group Policy setting will take precedence over the setting configured in Remote Desktop Session Host Configuration or on the Remote tab.

from: http://technet.microsoft.com/en-us/library/cc732713.aspx

other reference: http://frankdenneman.nl/2013/02/13/using-remote-desktop-connection-on-a-mac-switch-to-cord/

I’ve been using Idera (previously R1Soft) CDP backup for some time now and am very happy with it. It works fine and sends out a daily email with the backup status. While this is fine for some setups, we use nagios to monitor most components of our infrastructure, and there was no nagios check for CDP backups yet. The CDP backup server includes an API that lets you query the status of the backup policies, and Idera even supplies some examples of how to use the API.

With a little work I turned one of these examples into a nagios check. The check returns one of four statuses:

  • Unknown: if the check cannot get the status
  • Warning: if one or more policies are in warning
  • Error: if one or more policies are in error
  • OK: if all policies finished successfully

The check also returns the list of policies with their status, so when you view the check details you can easily see which policy is in error.

To run the check you need php-cli with php-soap on your nagios server.
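
Once those are installed you can run the check by hand to verify that it works before adding it to nagios; the libexec path and host name below are just examples:

$ php -m | grep -i soap
soap
$ php /usr/local/nagios/libexec/check_r1soft_cdp.php -H backup.server.nl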

To enable the check for a backup server, follow these steps:

Add the following command to nagios:

define command{
command_name check_r1soft_cdp
command_line php $USER1$/check_r1soft_cdp.php -H $HOSTADDRESS$
}

Add the following service to nagios:

define service {
use generic-service
host_name backup.server.nl
service_description Idera_CDP_Backup
is_volatile 0
check_period 24x7
max_check_attempts 3
normal_check_interval 5
retry_check_interval 1
contact_groups critical-admins
notification_interval 240
notification_period workhours
notification_options w,u,c,r
check_command check_r1soft_cdp
}

Make sure to update the check with the correct username and password:

#set CDP user
$USER="admin";
#set CDP user password
$PASS="password";

Please find the check script attached:
check_r1soft_cdp. Rename this file from check_r1soft_cdp.txt to check_r1soft_cdp.php.

Last week we had a failure on a server where two disks in an md RAID set reported bad sectors at the same time. This caused the server to lock up and hang. Normally a quick reboot would have solved the problem, but in this case the reboot got stuck while mounting the partitions, with the message “Recovering journal”. The server hung there for a long time and nothing happened. This was caused by a corrupt journal: the journal on the ext3 filesystem was probably affected by the bad blocks on the disks.

It proved to be quite difficult to recover from this error. The following steps were needed to get the server to boot normally again (a combined sketch follows the list):

  1. Clear the needs_recovery flag if it is set on the partition, otherwise the journal cannot be removed: debugfs -w -R "feature ^needs_recovery" /dev/VolGroupXX/LogVolXX
  2. Remove the journal from the partition: tune2fs -f -O ^has_journal /dev/VolGroupXX/LogVolXX
  3. Check the filesystem: fsck -y /dev/VolGroupXX/LogVolXX
  4. Enable journalling again: tune2fs -j /dev/VolGroupXX/LogVolXX
  5. Reboot

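A minimal sketch that strings these steps together, assuming /dev/VolGroupXX/LogVolXX is the affected logical volume (adjust the device path for your system) and that you are running from a rescue environment:

#!/bin/bash
# recover an ext3 filesystem with a corrupt journal
DEV=/dev/VolGroupXX/LogVolXX
# clear the needs_recovery flag so the journal can be removed
debugfs -w -R "feature ^needs_recovery" $DEV
# drop the corrupt journal, then check the filesystem
tune2fs -f -O ^has_journal $DEV
fsck -y $DEV
# recreate the journal and reboot
tune2fs -j $DEV
reboot
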
Needless to say, when this happens the disks are ready for the bin. Get rid of them as soon as possible! 🙂

As I wrote in my last blog post, I’ve been enabling selinux on some webservers. Last week I updated the Idera CDP agent on one server to support backup and restore of MySQL via the CDP agent. The backup ran without any issues. Since this integrated MySQL backup was new functionality, I also wanted to test the restore. The restore did not work with selinux enabled: there were a ton of error messages in audit.log, too many to paste in this blog post. I’ve attached the file with the error messages: cdp-mysql.

To fix the problem I had to build the selinux policy in several iterations: after each of the first four attempts, new deny messages appeared in audit.log. With the fifth version of the policy the restore finished without any errors in the log, and the database that I had dropped and restored was available and accessible to the sites again.

To create the working policy I did the following. I copied the messages that I attached into a separate file (cdp-mysql.se) and used the following command to create a selinux policy:


audit2allow -i cdp-mysql.se -M cdp-mysql

This creates a couple of files (cdp-mysql.pp and cdp-mysql.te) in the current working directory. The cdp-mysql.te contains the plain text policy. The cdp-mysql.pp file can be used to import the selinux policy:


semodule -i cdp-mysql.pp

This loads the cdp-mysql selinux policy, which contains the configuration listed below. After this module is activated, the CDP agent is allowed to perform the MySQL restore.

module cdp-mysql 1.0;

require {
type bin_t;
type fixed_disk_device_t;
type mysqld_t;
type port_t;
type var_lib_t;
class sock_file { create unlink getattr };
class tcp_socket name_bind;
class chr_file { read write };
class file { write getattr read lock open append };
}

#============= mysqld_t ==============
allow mysqld_t bin_t:file append;
allow mysqld_t fixed_disk_device_t:chr_file { read write };
#!!!! This avc can be allowed using the boolean 'allow_ypbind'

allow mysqld_t port_t:tcp_socket name_bind;
#!!!! The source type 'mysqld_t' can write to a 'file' of the following types:
# mysqld_db_t, hugetlbfs_t, mysqld_tmp_t, mysqld_log_t, mysqld_var_run_t, root_t

allow mysqld_t var_lib_t:file { read write getattr open lock };
allow mysqld_t var_lib_t:sock_file { create unlink getattr };

For those who want to use it, I’ve attached the cdp-mysql.pp module. Make sure to verify the md5 checksum (44ec3ec35db17e0adab38ad0ba1fac10 cdp-mysql.pp). You can also recreate the module from the file containing the errors from audit.log.
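
To verify the download, and to confirm that the module is actually loaded afterwards, you can run (expected checksum as listed above):

$ md5sum cdp-mysql.pp
44ec3ec35db17e0adab38ad0ba1fac10  cdp-mysql.pp
$ semodule -l | grep cdp-mysql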

In the last couple of weeks I’ve started enabling selinux in enforcing mode on my webservers. Besides some expected problems, I ran into an issue with cronolog when I enabled selinux: there was no way cronolog would create the needed directories. We usually run cronolog to create separate “Year” and “Month” directories, and in every “Month” directory a separate directory and file for each day.

When I started apache with the cronolog log configuration, the following error appeared in the global apache error_log:

piped log program '/usr/sbin/cronolog /var/log/httpd/yyy.xxx.nl/%Y/%m/%d/access_log' failed unexpectedly
/var/log/httpd/wpdev.redbee.nl/2013: Permission denied
piped log program '/usr/sbin/cronolog /var/log/httpd/yyy.xxx.nl/%Y/%m/%d/access_log' failed unexpectedly

There was no year “2013” directory created in the top level log directory /var/log/httpd/yyy.xxx.nl/.
As cronolog never gives any problems in that area and the file permissions were configured correctly, I suspected selinux. And indeed that was the problem.
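
A quick way to confirm which SELinux context the log directory carries (path as in the example above) is ls with the -Z flag; it should show the httpd_log_t type that also appears as the tcontext in the denials below:

$ ls -dZ /var/log/httpd/yyy.xxx.nl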

In /var/log/audit/audit.log the following errors were logged:

type=AVC msg=audit(1361309216.029:25603): avc: denied { create } for pid=10047 comm="cronolog" name="2013" scontext=unconfined_u:system_r:httpd_t:s0 tcontext=unconfined_u:object_r:httpd_log_t:s0 tclass=dir
type=SYSCALL msg=audit(1361309216.029:25603): arch=c000003e syscall=83 success=no exit=-13 a0=7fff5ebbf050 a1=1fd a2=ffffffffffffffa8 a3=322f6c6e2e656562 items=0 ppid=9699 pid=10047 auid=500 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=(none) ses=175 comm="cronolog" exe="/usr/sbin/cronolog" subj=unconfined_u:system_r:httpd_t:s0 key=(null)
type=AVC msg=audit(1361309235.635:25604): avc: denied { create } for pid=10048 comm="cronolog" name="2013" scontext=unconfined_u:system_r:httpd_t:s0 tcontext=unconfined_u:object_r:httpd_log_t:s0 tclass=dir
type=SYSCALL msg=audit(1361309235.635:25604): arch=c000003e syscall=83 success=no exit=-13 a0=7fff4cc1efa0 a1=1fd a2=ffffffffffffffa8 a3=322f6c6e2e656562 items=0 ppid=9699 pid=10048 auid=500 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=(none) ses=175 comm="cronolog" exe="/usr/sbin/cronolog" subj=unconfined_u:system_r:httpd_t:s0 key=(null)

I copied these messages into a separate file (crono.se) and used the following command to create a selinux policy:


audit2allow -i /home/rogierm/crono.se -M crono

This creates a couple of files (crono.pp and crono.te) in the current working directory. The crono.te contains the plain text policy. The crono.pp file can be used to import the selinux policy:


semodule -i crono.pp

This activates the cronolog selinux policy that contains the configuration listed below. After this module is activated cronolog is allowed to create directories under the log directory.


module crono 1.0;

require {
type httpd_log_t;
type httpd_t;
class dir create;
}

#============= httpd_t ==============
allow httpd_t httpd_log_t:dir create;

Over the years I’ve replaced the hardware on quite a number of broken servers. Sometimes swapping the disks just works; on other occasions it fails and the disks are not detected. This is caused by missing SATA drivers in the initrd, and it is easily fixed by booting from a rescue CD and creating a new initrd with the right drivers.

When you boot from a rescue CD you can check the SATA driver that is loaded by doing the following:

root@server1 [~]# lsmod|grep sata
sata_nv 22217 2
libata 105757 1 sata_nv

In this case sata_nv is used. To check whether this driver is available in the initrd on the original disks, you have to unpack the initrd that is used for booting. First chroot into the system image from the rescue environment.

chroot /mnt/sysimage
mkdir /root/temp-initrd
cp /boot/initrd-xxx.img /root/temp-initrd
cd /root/temp-initrd
# the initrd is a gzipped cpio archive; unpack it in the working directory
gunzip < initrd-xxx.img | cpio -i --make-directories

In the lib directory that is just unpacked you can see the modules that are included:

root@server1 [~/temp-initrd/lib]# ls
./ ../ dm-mod.ko ext3.ko jbd.ko scsi_mod.ko sd_mod.ko

This means the sata_nv driver is not included, which is causing the boot problems. To fix this we need to rebuild the initrd for the correct kernel with the right drivers:

mkinitrd --with=sata_nv --with=raid1 /boot/initrd-2.6.x-y.z.1.el5.img 2.6.x-y.z.1.el5

Make sure to specify the right kernel version, because when you boot from a rescue CD you are probably running a different kernel than the one actually installed on the system you are replacing the disks for.
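
If you are not sure which kernel version is installed on the system, you can check from inside the chroot; the directories under /lib/modules correspond to the installed kernels, and on RPM-based systems such as CentOS/RHEL you can also query the package database:

# inside the chroot
ls /lib/modules/
rpm -q kernel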

We’ve been running a local Spamexperts cloud for spam filtering for some time now, with great pleasure. They monitor the nodes and actually notice when one is down. However, we would also like to know this ourselves and alert our on-call staff via our internal monitoring system. We’ve set up general monitoring of the cluster nodes to see if they are available and accepting SMTP connections. Because of the way Spamexperts offer their service, it is not supported to change anything on the servers themselves, which restricts us in monitoring parts of the physical servers and the OS. For example, we would like to know, and get alerted, when the load on the servers hits a specific limit or when the mail queues exceed a certain number of mails. Spamexperts have created an API call (api_server_status) to get some generic information on the nodes in the cluster. To integrate this into our monitoring setup I’ve created a nagios plugin that reads the output of the API call and checks it against some configurable thresholds. I’ve attached the script check_spamexperts.php to this post.

It just requires php on the nagios server, and it needs an API user with access to the api_server_status call. The script checks the load on the cluster nodes and the incoming and outgoing queues.

Create the API user:
Go to your Spamexperts panel and select the “Software API user” option:

Create a user or check if the existing user has the api_server_status available:

To start using the plugin, save the check_spamexperts.php file to your nagios libexec directory, eg: /usr/local/nagios/libexec.

To enable it, add the following to the commands file. Adapt the flags to your environment: the hostname you use to access the API (-H), the username and password (-u, -p), and the specific load thresholds and maximum queue lengths.


define command{
command_name check_spamexperts
command_line php $USER1$/check_spamexperts.php -n $HOSTNAME$ -H api.domain.ext -u apiuser -p apipassword -w load5warninglevel -c load5criticallevel -i max_incoming_queue -o max_outgoing_queue
}
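
Before wiring the check into nagios you can run the plugin by hand with the same flags; the host names, credentials and thresholds below are purely illustrative:

$ php /usr/local/nagios/libexec/check_spamexperts.php -n node1.domain.ext -H api.domain.ext -u apiuser -p apipassword -w 4 -c 8 -i 500 -o 500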

Define the individual hosts that are running in your local Spamexperts cluster in the nagios hosts.cfg:

define host{
use generic-host ; Name of host template
host_name node1.domain.ext
alias Spamexperts spam cluster
address 1.2.3.4
check_command check-host-alive
contact_groups critical-admins
max_check_attempts 20
notification_interval 60
notification_period 24x7
notification_options d,u,r
}
define host{
use generic-host ; Name of host template
host_name node2.domain.ext
alias Spamexperts spam cluster
address 1.2.3.5
check_command check-host-alive
contact_groups critical-admins
max_check_attempts 20
notification_interval 60
notification_period 24x7
notification_options d,u,r
}

Then create the service check that checks the health of each cluster node.

define service {
use generic-service
host_name node1.domain.ext,node2.domain.ext
service_description spamexperts
is_volatile 0
check_period 24x7
max_check_attempts 3
normal_check_interval 5
retry_check_interval 1
contact_groups critical-admins
notification_interval 240
notification_period workhours
notification_options w,u,c,r
check_command check_spamexperts
}

Update 2013-02-28: I’ve updated the driver to include the datastore id in the path, so it is now possible to use the driver with multiple datastores. The driver also now correctly downloads images that are imported from URLs, e.g. via the marketplace.

Based on this blog article I created an updated datastore driver that allows you to use a ZFS backend with OpenNebula. This datastore driver implements snapshot functionality to clone images. I created this driver to be able to start a VM with a persistent disk without having to wait until the full image file is copied into the datastore.

The driver implements updated versions of the cp, clone and rm commands. I don’t use the mkfs command, so I have not implemented it in the ZFS datastore driver yet.

Due to the nature of the OpenNebula datastore layout and the ZFS snapshot capabilities, I had to make a workaround. For the sake of simplicity I decided to use the filesystem driver as a basis, which means that the images are files in a datastore directory. ZFS snapshots work on the dataset (directory) level, so to make snapshotting possible I created a link in the datastore location to the image file in a ZFS-backed NFS directory.

Please find a short outline below, based on the blog article and my own additions. The ZFS datastore driver is available as a tar download: zfs-datastore-v1.1

Setup the ZFS part
Install OpenIndiana, create a ZFS pool, create all the necessary ZFS volumes and share this as an NFS share with the frontend server.

# zfs create tank/export/home/cloud
# zfs set mountpoint=/srv/cloud tank/export/home/cloud
# zfs create tank/export/home/cloud/images
# chown -R oneadmin:cloud /srv/cloud
# zfs set sharenfs='rw=@93.188.251.125/32,root=@93.188.251.125/32' tank/export/home/cloud
# zfs allow oneadmin destroy,clone,create,mount,share,sharenfs tank/export/home/cloud

Setup ZFS volume per datastore
For each datastore that you want to host on the ZFS server you have to create a volume and allow oneadmin to manage it. Replace datastoreid with the id of the datastore you will create (eg. 100). Make sure to chown the directory to allow oneadmin to access it.

# zfs create tank/export/home/cloud/datastoreid
# zfs allow oneadmin destroy,clone,create,mount,share,sharenfs tank/export/home/cloud/datastoreid
# chown oneadmin:other /srv/cloud/datastoreid

Install ZFS datastore driver

Unpack the tar file with the driver in the datastore remote directory (/var/lib/one/remotes/datastore/)

$ tar xvf zfs-datastore.tar

Configure the ZFS datastore driver with the correct parameters and make sure passwordless connectivity is possible between the ZFS host and the frontend.

zfs.conf:

ZFS_HOST=10.10.10.3
ZFS_POOL=tank
ZFS_BASE_PATH=/export/home/cloud ## this is the path that maps to /srv/cloud
ZFS_LOCAL_PATH=/srv/cloud ## the local (frontend) mount point corresponding to ZFS_BASE_PATH
ZFS_CMD=/usr/sbin/zfs
ZFS_SNAPSHOT_NAME=golden

Configure OpenNebula to use ZFS datastore driver

Update the datastore configuration in oned.conf to include the zfs driver. See the example below:

DATASTORE_MAD = [
executable = "one_datastore",
arguments = "-t 15 -d fs,vmware,iscsi,zfs"
]
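
After changing oned.conf the OpenNebula daemon has to be restarted so the new driver list is picked up. As the oneadmin user this can be done with the commands below; depending on how OpenNebula was installed you may use your distribution's init scripts instead:

$ one stop
$ one start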

Create new ZFS datastore

Once the ZFS side is done, make sure the NFS share is mounted on the frontend; in my example it is mounted on /srv/cloud. A minimal mount example follows, after which you can create a datastore with the new driver.
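
Assuming the ZFS host from zfs.conf (10.10.10.3) exports the dataset at its mountpoint /srv/cloud, the mount on the frontend could look like this (adjust host and paths to your setup):

# mount -t nfs 10.10.10.3:/srv/cloud /srv/cloud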

zfstest.conf:

NAME = zfstest
DS_MAD = zfs
TM_MAD = ssh


$ onedatastore create zfstest.conf

Create new image in ZFS datastore

Create conf file to create new image:

# cat /tmp/centos63-5gb.conf
NAME = "Centos63-5GB-zfs"
TYPE = OS
PATH = /home/user/centos63-5gb.img
DESCRIPTION = "CentOS 6.3 5GB image contextualized"

Run oneimage command on the right datastore:

[oneadmin@cloudcontroller1 ~]$ oneimage create -d 100 /tmp/centos63-5gb.conf
ID: 111

The image path to be used to create a snapshot can be found by checking the image details of the newly created image (id 111 in our example):

$ oneimage show 111
[…]
SOURCE : /var/lib/one/datastores/100/4ce405866cf95a4d77b3a9dd9c54fa73
[…]

To use this image as a golden image, create a snapshot of it on the ZFS server, so that the snapshot can be the basis for future clones. Instant cloning relies on the ZFS ability to create a new dataset (clone) from an existing snapshot, which means the snapshot has to exist first. So after you upload the golden image, manually create a snapshot of it. This only needs to be done once, as the snapshot can be reused as many times as needed. This command needs to be run on the ZFS server, not on the frontend!


# zfs snapshot tank/export/home/cloud/100/4ce405866cf95a4d77b3a9dd9c54fa73@golden
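
You can verify on the ZFS server that the snapshot exists before cloning from it:

# zfs list -t snapshot | grep golden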

Now this image can be used for instant cloning.

Example ZFS datastore content on the frontend

[oneadmin@cloudcontroller1 ~]$ cd /var/lib/one/datastores/100/
[oneadmin@cloudcontroller1 100]$ ls -al
total 28
drwxr-xr-x 2 oneadmin oneadmin 4096 Nov 7 02:43 .
drwxr-xr-x 6 oneadmin oneadmin 4096 Oct 15 17:04 ..
lrwxrwxrwx 1 oneadmin oneadmin 76 Oct 16 03:21 25473f081ba733822f3e9ba1df347753 -> /srv/cloud/25473f081ba733822f3e9ba1df347753/25473f081ba733822f3e9ba1df347753
lrwxrwxrwx 1 oneadmin oneadmin 76 Oct 16 02:53 2bf829fedb6e1728f204be8a19ff8f8c -> /srv/cloud/2bf829fedb6e1728f204be8a19ff8f8c/2bf829fedb6e1728f204be8a19ff8f8c
lrwxrwxrwx 1 oneadmin oneadmin 76 Oct 19 17:41 4ce405866cf95a4d77b3a9dd9c54fa73 -> /srv/cloud/4ce405866cf95a4d77b3a9dd9c54fa73/4ce405866cf95a4d77b3a9dd9c54fa73
lrwxrwxrwx 1 oneadmin oneadmin 76 Oct 16 03:24 a00d08dd9b7447818e110115cbc33056 -> /srv/cloud/a00d08dd9b7447818e110115cbc33056/25473f081ba733822f3e9ba1df347753
lrwxrwxrwx 1 oneadmin oneadmin 76 Oct 16 03:18 cab665db977255c4c76c7aa3d687a6d6 -> /srv/cloud/cab665db977255c4c76c7aa3d687a6d6/2bf829fedb6e1728f204be8a19ff8f8c

Example output of ZFS volumes

root@openindiana:/home/rogierm# zfs list
NAME USED AVAIL REFER MOUNTPOINT
rpool 5.68G 361G 45.5K /rpool
rpool/ROOT 1.56G 361G 31K legacy
rpool/ROOT/openindiana 1.56G 361G 1.55G /
rpool/dump 2.00G 361G 2.00G -
rpool/export 133K 361G 32K /export
rpool/export/home 101K 361G 33K /export/home
rpool/export/home/oneadmin 34K 361G 34K /export/home/oneadmin
rpool/export/home/rogierm 34K 361G 34K /export/home/rogierm
rpool/swap 2.12G 362G 133M -
tank 35.3G 1.04T 32K /tank
tank/export 35.3G 1.04T 32K /tank/export
tank/export/home 35.3G 1.04T 31K /tank/export/home
tank/export/home/cloud 35.3G 1.04T 15.3G /srv/cloud
tank/export/home/cloud/1872dba973eb2f13ef745fc8619d7c30 1K 1.04T 5.00G /srv/cloud/1872dba973eb2f13ef745fc8619d7c30
tank/export/home/cloud/25473f081ba733822f3e9ba1df347753 5.00G 1.04T 5.00G /srv/cloud/25473f081ba733822f3e9ba1df347753
tank/export/home/cloud/28e04ccbc4e55779964330a2131db466 1K 1.04T 5.00G /srv/cloud/28e04ccbc4e55779964330a2131db466
tank/export/home/cloud/2bf829fedb6e1728f204be8a19ff8f8c 40.0M 1.04T 40.0M /srv/cloud/2bf829fedb6e1728f204be8a19ff8f8c
tank/export/home/cloud/4ce405866cf95a4d77b3a9dd9c54fa73 5.00G 1.04T 5.00G /srv/cloud/4ce405866cf95a4d77b3a9dd9c54fa73
tank/export/home/cloud/8b86712ae314bc80eef1dfc303740a87 1K 1.04T 5.00G /srv/cloud/8b86712ae314bc80eef1dfc303740a87
tank/export/home/cloud/931492cd32cb96aad3b8dce4869412f3 1K 1.04T 5.00G /srv/cloud/931492cd32cb96aad3b8dce4869412f3
tank/export/home/cloud/a00d08dd9b7447818e110115cbc33056 5.00G 1.04T 5.00G /srv/cloud/a00d08dd9b7447818e110115cbc33056
tank/export/home/cloud/cab665db977255c4c76c7aa3d687a6d6 1K 1.04T 40.0M /srv/cloud/cab665db977255c4c76c7aa3d687a6d6
tank/export/home/cloud/images 5.00G 1.04T 32K /srv/cloud/images
tank/export/home/cloud/images/centos6 5.00G 1.04T 5.00G /srv/cloud/images/centos6
tank/export/home/cloud/one 63K 1.04T 32K /srv/cloud/one
tank/export/home/cloud/one/var 31K 1.04T 31K /srv/cloud/one/var