We’ve been running a local Spamexperts cloud for spam filtering for some time now, with great pleasure. Spamexperts monitor the nodes and do notice when they are down. However, we would also like to know about this ourselves and alert our on-call staff via our internal monitoring system. We’ve set up general monitoring of the cluster nodes to see whether they are available and accepting SMTP connections.

Because of the way Spamexperts offer their service, changing anything on the servers themselves is not supported. This restricts us in monitoring the physical servers and the OS. For example, we would like to be alerted when the load on a server hits a specific limit or when a mail queue exceeds a certain number of messages.

Spamexperts have created an API call (api_server_status) that returns some generic information on the nodes in the cluster. To integrate this into our monitoring setup I’ve created a Nagios plugin that reads the output of the API call and checks it against configurable thresholds. I’ve attached the script check_spamexperts.php to this post.

It only requires PHP on the Nagios server and an API user with access to the api_server_status call. The script checks the load on the cluster nodes and the length of the incoming and outgoing queues.
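The threshold logic described above can be sketched roughly as follows. This is a minimal Python illustration, not the attached PHP script. The function name `evaluate_node` and the field names `load5`, `incoming_queue` and `outgoing_queue` are hypothetical, because the structure of the api_server_status response is not shown in this post. Treating an over-long queue as CRITICAL is also an assumption.

```python
# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
STATE_NAMES = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def evaluate_node(status, load_warn, load_crit, max_in, max_out):
    """Return (exit_code, message) for a single cluster node.

    `status` is a hypothetical dict such as
    {"load5": 0.8, "incoming_queue": 12, "outgoing_queue": 3};
    the real api_server_status output may be shaped differently.
    """
    try:
        load5 = float(status["load5"])
        q_in = int(status["incoming_queue"])
        q_out = int(status["outgoing_queue"])
    except (KeyError, TypeError, ValueError):
        # Missing or malformed data: report UNKNOWN rather than guessing.
        return UNKNOWN, "UNKNOWN - could not read node status"

    code = OK
    if load5 >= load_crit:
        code = CRITICAL
    elif load5 >= load_warn:
        code = WARNING

    # Assumption: a queue above its maximum is treated as CRITICAL.
    if q_in > max_in or q_out > max_out:
        code = CRITICAL

    summary = f"load5={load5:.2f} incoming={q_in} outgoing={q_out}"
    return code, f"{STATE_NAMES[code]} - {summary}"
```

A plugin built this way would print the message on one line and exit with the returned code, which is how Nagios derives the service state.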

Create the API user:
Go to your Spamexperts panel and select the “Software API user” option:

Create a user, or check that the existing user has the api_server_status call available:

To start using the plugin, save the check_spamexperts.php file to your Nagios libexec directory, e.g. /usr/local/nagios/libexec.

To enable it, add the following to the commands file. Adapt the flags to your environment: the hostname used to access the API (-H), the API username and password (-u, -p), the load warning and critical thresholds (-w, -c) and the maximum incoming and outgoing queue lengths (-i, -o).


define command{
command_name check_spamexperts
command_line php $USER1$/check_spamexperts.php -n $HOSTNAME$ -H api.domain.ext -u apiuser -p apipassword -w load5warninglevel -c load5criticallevel -i max_incoming_queue -o max_outgoing_queue
}
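To make the meaning of the flags in the command line concrete, here is a small Python sketch of how such a command line could be parsed. It is only an illustration of the flag layout shown above, not the parsing code of check_spamexperts.php. The function name `parse_args`, the destination names, the numeric types and the choice to make every flag required are assumptions.

```python
import argparse


def parse_args(argv=None):
    """Parse the flags used in the check_spamexperts command definition."""
    parser = argparse.ArgumentParser(
        description="Sketch of the check_spamexperts command-line flags"
    )
    # Node to report on; Nagios fills this in via $HOSTNAME$.
    parser.add_argument("-n", dest="node", required=True)
    # Hostname used to reach the API.
    parser.add_argument("-H", dest="api_host", required=True)
    # API credentials.
    parser.add_argument("-u", dest="user", required=True)
    parser.add_argument("-p", dest="password", required=True)
    # Load thresholds (assumed to be numeric load averages).
    parser.add_argument("-w", dest="load_warn", type=float, required=True)
    parser.add_argument("-c", dest="load_crit", type=float, required=True)
    # Maximum queue lengths (assumed to be whole message counts).
    parser.add_argument("-i", dest="max_incoming", type=int, required=True)
    parser.add_argument("-o", dest="max_outgoing", type=int, required=True)
    return parser.parse_args(argv)
```

For example, `parse_args("-n node1.domain.ext -H api.domain.ext -u apiuser -p apipassword -w 4 -c 8 -i 500 -o 200".split())` yields an object whose `load_warn` is `4.0` and whose `max_incoming` is `500`. The threshold values here are placeholders, not recommendations.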

Define the individual hosts that are running in your local Spamexperts cluster in the Nagios hosts.cfg:

define host{
use generic-host ; Name of host template
host_name node1.domain.ext
alias Spamexperts spam cluster
address 1.2.3.4
check_command check-host-alive
contact_groups critical-admins
max_check_attempts 20
notification_interval 60
notification_period 24x7
notification_options d,u,r
}
define host{
use generic-host ; Name of host template
host_name node2.domain.ext
alias Spamexperts spam cluster
address 1.2.3.5
check_command check-host-alive
contact_groups critical-admins
max_check_attempts 20
notification_interval 60
notification_period 24x7
notification_options d,u,r
}

Then create the service check that monitors the health of each cluster node:

define service {
use generic-service
host_name node1.domain.ext,node2.domain.ext
service_description spamexperts
is_volatile 0
check_period 24x7
max_check_attempts 3
normal_check_interval 5
retry_check_interval 1
contact_groups critical-admins
notification_interval 240
notification_period workhours
notification_options w,u,c,r
check_command check_spamexperts
}
