News
Design Patterns for Email Monitoring

Discussion topic for December:  how to design monitoring best practices for anything related to email.

Read and reply to the blog post.

Once complete, all of the submitted content will be added to the MonitoringForge wiki, so please participate or contribute.


Posted by: taraspalding
Posted on: 12/07/2009 04:42

Comments
ke4qqq           (01/26/2010 01:16)   
I think this is the first contribution for this particular effort, and as such I guess that means I get to stick the flag in the ground with regard to licensing. I wish I could claim credit for originating all of these thoughts, but I have had the good fortune to benefit from the knowledge of others. In the hope that someone can use something they learn from me, and knowing that there are others who can improve upon what I write, this work is copyright 2009 David Nalley <david@gnsa.us> and licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported license. Please feel free to use it, modify it, redistribute it, and improve on it.

SMTP is the foundation of virtually all modern e-mail systems. As such it is ubiquitous. That makes SMTP monitoring something that virtually everyone is interested in.
In this document I am assuming that basic host monitoring is already occurring, and that it will be documented elsewhere.

Monitoring is generally divided into two categories - status monitoring and performance monitoring.

Now on to what you want to monitor.

Status monitoring:
  • Is port 25 (or $non-standard-smtp-port) open?
    This will tell you whether communication is even possible.
  • Does the daemon respond?
    MTAs generally greet a new connection. For instance, my LUG's MTA sends this:
    220 uclug.org ESMTP Sendmail 8.14.3/8.14.2; Wed, 16 Dec 2009 19:35:46 -0500
    You should check that the daemon responds to connection attempts as you expect.
  • Does mail actually get accepted?
    There are a number of conditions under which an MTA will appear to behave normally and yet reject mail at the conclusion of a session. You should actually send email and ensure that it gets queued as part of your testing.
  • Does mail actually get delivered?
    End-to-end tests are a bit more complex and are far more likely to produce false positives (at least from the SMTP monitoring perspective). That said, there is no substitute for checking the entire mail path. Doing it right means testing both external origination resulting in internal delivery and internal origination resulting in external delivery. You'll therefore need at least one box outside the network that your mail server is on, and something to receive mail externally as well (Gmail or another $freemailservice will work fine in most cases). This is even more important if you have email gateways, multiple SMTP servers acting as your mail server, or spam appliances. In those cases you also need to monitor each of the possible paths mail can take, which means checking for mail at both delivery points. To do this right, you need a way of verifying that you actually receive every email that is sent.
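
The first three checks above can be sketched in a few lines of Python. This is only a minimal illustration using the standard library, not a replacement for a real monitoring plugin. The function names, the default timeout, and the probe subject line are all assumptions made for the example; the host and addresses you pass in are up to you.

```python
import smtplib
import socket
from email.message import EmailMessage


def parse_greeting(line):
    """Split a reply line such as '220 host ESMTP ...' into (code, text)."""
    line = line.strip()
    code, text = line[:3], line[4:]
    if len(code) != 3 or not code.isdigit():
        raise ValueError(f"not an SMTP reply line: {line!r}")
    return int(code), text


def check_greeting(host, port=25, timeout=10.0):
    """Return (ok, text); ok is True when the first reply line is a 220.

    A refused or timed-out connection raises OSError, which answers the
    "is the port open?" question on its own.
    """
    with socket.create_connection((host, port), timeout=timeout) as conn:
        data = b""
        # Read until the end of a line, with a cap so a silent or chatty
        # peer cannot keep us here forever.
        while not data.endswith(b"\n") and len(data) < 4096:
            chunk = conn.recv(1024)
            if not chunk:
                break
            data += chunk
    lines = data.decode("ascii", errors="replace").splitlines()
    try:
        code, text = parse_greeting(lines[0] if lines else "")
    except ValueError:
        return False, ""
    return code == 220, text


def probe_accepted(host, sender, recipient, port=25, timeout=10.0):
    """Send one small message and report whether the MTA accepted it."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = "monitoring probe"  # placeholder subject
    msg.set_content("SMTP acceptance probe")
    try:
        with smtplib.SMTP(host, port, timeout=timeout) as smtp:
            refused = smtp.send_message(msg)
    except (smtplib.SMTPException, OSError):
        return False
    return not refused  # an empty dict means every recipient was accepted
```

Note that acceptance here only means the server took the message at the end of the SMTP session. Whether it was delivered is the separate end-to-end question described in the last bullet.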

These are all network-based tests, but there are also process and OS tests that are beneficial.

  • Is the daemon up?
    This is the most basic check: a simple heartbeat-type test. Additionally, if something else is in your SMTP flow, you'll want to check that it is up as well. For instance, I use Postfix and MailScanner on one of my mail hosts. If either is dead, mail isn't going to flow, so make sure that you check all of the processes in the mail flow.
  • Syslog
    You should also be sending your maillog to a centralized syslog server. The level of detail is generally implementation-specific, and may even change for specific instances, but at a minimum you should be sending errors and above.
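
A heartbeat check across every process in the mail flow might look roughly like the sketch below. It assumes a Linux host where process names can be read from /proc/<pid>/comm (the kernel truncates these names to 15 characters). The function names are made up for the example, and the process names you pass in are placeholders for whatever your daemons are actually called.

```python
import os


def running_process_names(proc_root="/proc"):
    """Collect the short command names of all running processes (Linux)."""
    names = set()
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # only numeric directories are processes
        try:
            with open(os.path.join(proc_root, entry, "comm")) as fh:
                names.add(fh.read().strip())
        except OSError:
            continue  # the process exited while we were scanning
    return names


def missing_processes(required, running):
    """Return the required process names that are not currently running."""
    return sorted(set(required) - set(running))
```

Calling missing_processes(["some-mta", "some-scanner"], running_process_names()) returns an empty list when everything in the flow is up, and otherwise the names to alert on.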

Performance monitoring:
Performance-based monitoring is often regarded as a bit too much, and often as superficial. However, performance monitoring can often be predictive and tell you that you are rapidly running out of capacity, or provide a baseline for future planning. In addition, problems that aren't catastrophic can often be seen in performance data.

With performance monitoring you are largely monitoring the same things as in status monitoring; you are just monitoring them differently. So in addition to monitoring whether a process exists, you would look at its memory and CPU consumption as well as the number of active processes.

  • Process count
  • CPU consumption of the process
  • Memory consumption of the process
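
One way to collect all three numbers at once is to aggregate the output of ps by command name. The sketch below assumes a ps that accepts the pcpu, rss and comm output fields (rss is common but not universal) and reports rss in kilobytes; the function names and the shape of the result are choices made for this example.

```python
import subprocess
from collections import defaultdict


def summarize_ps(ps_output):
    """Aggregate lines of '<pcpu> <rss> <command>' by command name."""
    summary = defaultdict(lambda: {"count": 0, "cpu_percent": 0.0, "rss_kib": 0})
    for line in ps_output.splitlines():
        parts = line.split(None, 2)
        if len(parts) != 3:
            continue  # blank or malformed line
        pcpu, rss, comm = parts
        try:
            cpu_value, rss_value = float(pcpu), int(rss)
        except ValueError:
            continue  # header line or unexpected number format
        entry = summary[comm.strip()]
        entry["count"] += 1
        entry["cpu_percent"] += cpu_value
        entry["rss_kib"] += rss_value
    return dict(summary)


def sample_processes():
    """Run ps once and summarize every process on the host."""
    # Separate -o options with empty headers suppress the header line.
    result = subprocess.run(
        ["ps", "-e", "-o", "pcpu=", "-o", "rss=", "-o", "comm="],
        capture_output=True, text=True, check=True,
    )
    return summarize_ps(result.stdout)
```

Looking up your MTA's process name in the returned dictionary gives the process count, total CPU percentage and total resident memory for one sample, which you can then record over time.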

With mail you should also be monitoring mail queues and statistics. Specifically you want to monitor:

  • Rate of sending email
  • Rate of receiving email
  • Rate of 4xx errors
    Since 4xx errors tend to be non-fatal (temporary failures that the sender can retry), it's generally acceptable to group all 4xx errors together.
  • Rate of 5xx errors
    5xx errors are fatal (permanent failures), and should be monitored separately.
  • Number of messages in the inbound queue
  • Number of messages in the outbound queue
    In addition you will want to monitor any other queues you have (spam, quarantine, etc).
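
As a rough illustration of the error-rate items in this list, the sketch below turns a batch of SMTP reply codes observed during one time window into per-minute rates, keeping 4xx and 5xx apart. How you extract reply codes from your MTA's logs is implementation-specific and not shown; the function names are assumptions for the example.

```python
from collections import Counter


def classify_reply(code):
    """Map a numeric SMTP reply code to '4xx', '5xx' or 'other'."""
    if 400 <= code <= 499:
        return "4xx"
    if 500 <= code <= 599:
        return "5xx"
    return "other"


def error_rates(reply_codes, window_seconds):
    """Return 4xx and 5xx errors per minute over one observation window."""
    if window_seconds <= 0:
        raise ValueError("window_seconds must be positive")
    counts = Counter(classify_reply(code) for code in reply_codes)
    per_minute = 60.0 / window_seconds
    return {
        "4xx": counts["4xx"] * per_minute,
        "5xx": counts["5xx"] * per_minute,
    }
```

For example, two 4xx replies and one 5xx reply seen in a two-minute window give rates of 1.0 and 0.5 per minute respectively.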

You need to monitor all of the above in advance of problems so as to get a decent baseline for what is normal in your environment.
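
To make the idea of a baseline concrete, here is one simple way to use collected history: summarize it as a mean and standard deviation, then flag new values that fall far outside that range. The three-standard-deviation tolerance is purely an assumption for illustration, as are the function names; what counts as "normal" is something you have to decide for your own environment.

```python
import statistics


def build_baseline(samples):
    """Summarize historical samples as (mean, standard deviation)."""
    if len(samples) < 2:
        raise ValueError("need at least two samples to build a baseline")
    return statistics.fmean(samples), statistics.stdev(samples)


def is_unusual(value, baseline, tolerance=3.0):
    """Report whether value is more than `tolerance` deviations from the mean."""
    mean, spread = baseline
    return abs(value - mean) > tolerance * spread
```

The same pair of functions works for any of the metrics above, for instance queue depth or messages received per minute.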

You also want to track the length of time it takes for a message to make the round trip. Since you are already sending messages the entire length of your mail flow, it's relatively easy to tack on measuring the time from sending to receiving. Do keep in mind, though, that false positives may show up, since delays and failures in networks external to you will influence these statistics. Particularly if you use a third party, such as an external mail service, there could be failures that are completely opaque to you.
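
A small bookkeeping class is enough to measure round-trip time and to notice probes that never arrive. The sketch below is hypothetical: it assumes you put the returned token somewhere in each probe message (the subject line, say) and call mark_received when your mailbox check finds it. Sending and retrieving the mail are not shown. It also assumes the sender and the mailbox check run in the same process, since time.monotonic values are not comparable across processes; use wall-clock time instead if they do not.

```python
import time
import uuid


class ProbeTracker:
    """Track outstanding probe messages and their round-trip times."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._pending = {}  # token -> time the probe was sent

    def new_probe(self):
        """Register a probe and return the unique token to embed in it."""
        token = uuid.uuid4().hex
        self._pending[token] = self._clock()
        return token

    def mark_received(self, token):
        """Return the round-trip time in seconds, or None if unknown."""
        sent_at = self._pending.pop(token, None)
        if sent_at is None:
            return None  # never sent by us, or already counted
        return self._clock() - sent_at

    def overdue(self, max_age_seconds):
        """Return tokens of probes still missing after max_age_seconds."""
        now = self._clock()
        return sorted(
            token for token, sent_at in self._pending.items()
            if now - sent_at > max_age_seconds
        )
```

Because every probe has its own token, this also covers the earlier requirement of verifying that each message sent is actually received: anything returned by overdue is a message that went missing or is badly delayed.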

 
 
 