I’ve been using Idera (previously R1soft) CDP backup for some time now and am very happy with it. It works well and sends out a daily email with the backup status. That is enough for some setups, but we use Nagios to monitor most components of our infrastructure, and there was no Nagios check for CDP backups yet. The CDP backup server includes an API that lets you retrieve the status of the backup policies, and Idera even supplies some examples of how to use it.

With a little work I turned one of these examples into a Nagios check. The check returns one of four statuses:

  • Unknown: the check cannot retrieve the status
  • Warning: one or more policies are in warning
  • Critical: one or more policies are in error
  • OK: all policies finished successfully

The check also returns the list of policies with their status, so the check details show at a glance which policy is in error.
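The statuses above follow the usual Nagios plugin convention: the exit code (0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN) sets the service state, and the printed text becomes the check details. The following is a minimal Python sketch of that aggregation, not the attached PHP script. It assumes the policy statuses have already been fetched from the CDP API and are the hypothetical strings "OK", "WARNING" and "ERROR"; the function name evaluate_policies is also hypothetical.

```python
# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def evaluate_policies(policies):
    """Map {policy_name: status} to a Nagios (exit_code, output) pair.

    Assumption: statuses are the hypothetical strings "OK", "WARNING"
    and "ERROR"; the real CDP API may report them differently.
    Pass None when the status could not be retrieved at all.
    """
    if policies is None:
        return UNKNOWN, "UNKNOWN - could not get the policy status"
    statuses = set(policies.values())
    if "ERROR" in statuses:
        code = CRITICAL
    elif "WARNING" in statuses:
        code = WARNING
    elif statuses <= {"OK"}:
        code = OK
    else:
        # An unrecognised status string: do not claim everything is fine.
        code = UNKNOWN
    # First line is the summary; each policy follows on its own line,
    # which Nagios shows as long plugin output in the check details.
    lines = [f"{LABELS[code]} - {len(policies)} policies checked"]
    lines += [f"{name}: {status}" for name, status in sorted(policies.items())]
    return code, "\n".join(lines)
```

A complete plugin would print the returned message and then call `sys.exit(code)` so Nagios picks up the state.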

To run the check, you need php-cli with the php-soap extension on your Nagios server.

To enable the check for a backup server, follow these steps:

Add the following command to Nagios:

define command{
command_name check_r1soft_cdp
command_line php $USER1$/check_r1soft_cdp.php -H $HOSTADDRESS$
}

Add the following service to Nagios:

define service {
use generic-service
host_name backup.server.nl
service_description Idera_CDP_Backup
is_volatile 0
check_period 24x7
max_check_attempts 3
normal_check_interval 5
retry_check_interval 1
contact_groups critical-admins
notification_interval 240
notification_period workhours
notification_options w,u,c,r
check_command check_r1soft_cdp
}

Make sure to set the correct CDP username and password in the check script:

#set CDP user
$USER="admin";
#set CDP user password
$PASS="password";

The check script is attached as check_r1soft_cdp. After downloading it, rename the file from check_r1soft_cdp.txt to check_r1soft_cdp.php and place it in the directory that $USER1$ points to.

We’ve been running a local Spamexperts cloud for spam filtering for some time now, and we are very pleased with it. Spamexperts monitor the nodes themselves and do notice when a node goes down. However, we also want to know this ourselves and alert our on-call staff through our internal monitoring system. We’ve set up general monitoring of the cluster nodes to see whether they are available and accepting SMTP connections.

Because of the way Spamexperts offer their service, changing anything on the servers themselves is not supported. This limits how far we can monitor the physical servers and the operating system. For example, we would like to know, and be alerted, when the load on the servers reaches a specific limit or when the mail queues exceed a certain number of messages.

Spamexperts have created an API call (api_server_status) that returns some general information on the nodes in the cluster. To integrate this into our monitoring setup, I’ve created a Nagios plugin that reads the output of the API call and checks it against configurable thresholds. The script, check_spamexperts.php, is attached to this post.

The plugin only requires PHP on the Nagios server and an API user with access to the api_server_status call. It checks the load on the cluster nodes and the length of the incoming and outgoing queues.
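The threshold checks described above can be sketched as follows. This is a minimal Python illustration, not the attached PHP plugin. It assumes the 5-minute load and the incoming and outgoing queue lengths have already been read from the api_server_status response (whose format is not shown here), and the function name check_node is hypothetical. It also assumes a threshold is breached only when a value is strictly greater than it, and that an over-long queue is reported as CRITICAL; the real script may decide differently.

```python
# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL = 0, 1, 2


def check_node(load5, incoming, outgoing, load_warn, load_crit, max_in, max_out):
    """Compare one node's metrics with the thresholds; return (code, problems).

    Assumption: load5, incoming and outgoing were already parsed from the
    api_server_status response; that parsing is not shown here.
    """
    code = OK
    problems = []
    # Load: critical takes precedence over warning (assumed strict "greater than").
    if load5 > load_crit:
        code = CRITICAL
        problems.append(f"load5 {load5} > {load_crit}")
    elif load5 > load_warn:
        code = WARNING
        problems.append(f"load5 {load5} > {load_warn}")
    # Queues: assumed to be CRITICAL as soon as the maximum is exceeded.
    if incoming > max_in:
        code = CRITICAL
        problems.append(f"incoming queue {incoming} > {max_in}")
    if outgoing > max_out:
        code = CRITICAL
        problems.append(f"outgoing queue {outgoing} > {max_out}")
    return code, problems
```

Running the check once per node keeps each node's state separate in Nagios, so a single overloaded node does not hide behind a healthy cluster average.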

Create the API user: go to your Spamexperts panel and select the “Software API user” option.

Create a new user, or check that the existing user has access to the api_server_status call.

To start using the plugin, save the check_spamexperts.php file to your Nagios libexec directory, e.g. /usr/local/nagios/libexec.

To enable it, add the following to the commands file. Adapt the flags to your environment: the hostname used to access the API (-H), the API username and password (-u and -p), the warning and critical levels for the 5-minute load (-w and -c), and the maximum length of the incoming and outgoing queues (-i and -o).


define command{
command_name check_spamexperts
command_line php $USER1$/check_spamexperts.php -n $HOSTNAME$ -H api.domain.ext -u apiuser -p apipassword -w load5warninglevel -c load5criticallevel -i max_incoming_queue -o max_outgoing_queue
}
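For reference, the flags in this command definition map onto plugin arguments roughly as in the following Python argparse sketch. It mirrors only the flags shown in the command line; the destination names (node, api_host, load_warn and so on), the function name parse_args, and the check that -w is not above -c are assumptions of this sketch, not confirmed behaviour of the PHP plugin.

```python
import argparse


def parse_args(argv=None):
    """Parse the check_spamexperts flags (sketch; the PHP plugin may differ)."""
    parser = argparse.ArgumentParser(description="Spamexperts node check (sketch)")
    parser.add_argument("-n", dest="node", required=True, help="node name, passed as $HOSTNAME$")
    parser.add_argument("-H", dest="api_host", required=True, help="API hostname")
    parser.add_argument("-u", dest="user", required=True, help="API username")
    parser.add_argument("-p", dest="password", required=True, help="API password")
    parser.add_argument("-w", dest="load_warn", type=float, required=True, help="load5 warning level")
    parser.add_argument("-c", dest="load_crit", type=float, required=True, help="load5 critical level")
    parser.add_argument("-i", dest="max_in", type=int, required=True, help="max incoming queue")
    parser.add_argument("-o", dest="max_out", type=int, required=True, help="max outgoing queue")
    args = parser.parse_args(argv)
    # Assumed sanity check: a warning level above the critical level makes no sense.
    if args.load_warn > args.load_crit:
        parser.error("-w must not be greater than -c")
    return args
```

When Nagios runs the command, it replaces $HOSTNAME$ with the host_name of the host being checked, so -n presumably selects the cluster node while -H always points at the API host.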

Define each node of your local Spamexperts cluster as a host in the Nagios hosts.cfg:

define host{
use generic-host ; Name of host template
host_name node1.domain.ext
alias Spamexperts spam cluster
address 1.2.3.4
check_command check-host-alive
contact_groups critical-admins
max_check_attempts 20
notification_interval 60
notification_period 24x7
notification_options d,u,r
}
define host{
use generic-host ; Name of host template
host_name node2.domain.ext
alias Spamexperts spam cluster
address 1.2.3.5
check_command check-host-alive
contact_groups critical-admins
max_check_attempts 20
notification_interval 60
notification_period 24x7
notification_options d,u,r
}

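Then create the service that checks the health of each cluster node. Nagios creates a separate service for every host listed in host_name and expands $HOSTNAME$ in the command for each one, so every node is checked individually.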

define service {
use generic-service
host_name node1.domain.ext,node2.domain.ext
service_description spamexperts
is_volatile 0
check_period 24x7
max_check_attempts 3
normal_check_interval 5
retry_check_interval 1
contact_groups critical-admins
notification_interval 240
notification_period workhours
notification_options w,u,c,r
check_command check_spamexperts
}