Performance Evaluation

This evaluation was done in 2005, on the architectures available at the time, so it is outdated.

With recent computers and software the figures should be much better, but they would have to be measured again.

Benchmarking is difficult, mainly because it is impossible to set up a configuration and an environment that match real-world conditions. As with any other benchmark, the results here may not match the real world; they are intended to give an idea of the performance level this filter can achieve.

The goal of this benchmark is to estimate the upper bound of the email throughput of the sendmail/ze-filter pair running on a Sun T2000 and on a Dell 1850, with little tuning, or only realistic tuning.

Methodology

The test conditions used here differ considerably from real-world conditions, as the computers involved are as “near” to each other as possible. This is intended to reduce all kinds of external latency (DNS query delays, …).

This approach must be considered with care, as external network latencies can't be neglected in the real world. In fact, the number of idle, sleeping threads and processes on the server is directly related to these latencies, and a high number of idle entities has a direct impact on operating system performance (scheduling and so on).

The computer running the SMTP source program should run well below its limits, so that problems on the SMTP source have little impact on the benchmark.

Configuration and tuning are limited to those things usually found in real world mail servers.

Each experiment is done at least twice (possibly more) to confirm that the qualitative results are consistent. If they are, the best run is kept. If not, the experiment is repeated until a repeatable pattern emerges.

Content filtering experiments

Only two features were evaluated: detection of XFILES (messages with attached executable files) and URL filtering, as their behaviour is reasonably well controlled. Pattern matching isn't tested, as it depends heavily on how it's configured and has to be avoided on big servers. This corresponds to four filter configurations (ordered by increasing load):

NOOP - no filtering features enabled. The filter isn't transparent, though: it keeps some statistics about its activity and creates a temporary file with the message contents (if the transaction wasn't rejected before the SMTP DATA command).
XFILES - the filter looks for executable attachments in the messages.
URLBL + XFILES - both features enabled.
URLBL - the filter checks URLs found in the message against a blacklist of URLs.

As no behaviour filtering feature was enabled, all connections and messages are handled in full. In other words, as no connection is rejected before the DATA command, a temporary file is always created and no disk access is saved: results are worse than with behavioural filtering (connection rate limit, greylisting, …).

Various levels of input message rate

The goal here is to evaluate the filter behaviour at different input message rates, and to find an upper bound on the input message rate.

To evaluate this, messages were sent to the computer running the filter at a constant rate, for long enough to reach a stationary state. This experiment was repeated with increasing message rates, to find the point where some qualitative or quantitative abnormal behaviour is observed. Some examples of abnormal behaviour are:

output message rate is lower than input message rate,
CPU load too high,
SMTP server begins rejecting connections from the source,
message handling time becomes too long,
increasing number of open connections, …

In queueing theory, stability is usually expressed by the condition:
occupation rate = (arrival rate × service time) / number of servers < 1
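
As a minimal sketch of this stability condition, the Python snippet below computes the occupation rate for a given load. The function names and the numeric values are illustrative assumptions, not measurements from these experiments.

```python
def occupation_rate(arrival_rate, service_time, servers):
    """Occupation rate = arrival rate x service time / number of servers."""
    if servers <= 0:
        raise ValueError("servers must be positive")
    return arrival_rate * service_time / servers


def is_stable(arrival_rate, service_time, servers):
    """The queueing system is stable only when the occupation rate is below 1."""
    return occupation_rate(arrival_rate, service_time, servers) < 1.0


# Hypothetical figures: 80 messages/s, 50 ms of service time, 8 servers.
rho = occupation_rate(80, 0.05, 8)
print(f"occupation rate = {rho:.2f}, stable = {is_stable(80, 0.05, 8)}")
```

With these made-up figures the occupation rate is about 0.5, so the system would be stable; raising the arrival rate to 200 messages/s would push it above 1.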

For this experiment, we want to evaluate how some parameters vary with the input message rate. These parameters are:

Message handling time - this is measured by the filter itself which keeps track of the time spent by each callback for each connection.
Disk activity - this is the result of iostat sampling at intervals of 5 seconds.
CPU load - this is the result of iostat sampling at intervals of 5 seconds.
An estimate of the number of “entities” in the system under test: sendmail processes and filter threads, as explained in Appendix A.

High and sustained input message rate

The goal here is to evaluate the filter stability when a high input message rate is applied for long enough. The input message rate is selected around the maximum of the previous series, but the system should remain in a stable steady state and no message or connection may be rejected by the mail server under test for reasons such as high load or timeouts. The duration of each experiment is 20 minutes.

Some parameters, such as the effectively handled message rate, are evaluated as the mean over a 10-minute time window centred on the middle of the experiment. This is considered the “steady state” part of the experiment.
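
The steady-state averaging can be sketched as follows. This is only an illustration: the function name, the sample layout (time in seconds since the start, value) and the sample data are assumptions, not the scripts actually used for these tests.

```python
from statistics import mean


def steady_state_mean(samples, duration=1200.0, window=600.0):
    """Mean of the values sampled inside a window centred on the experiment.

    samples  -- iterable of (seconds_since_start, value) pairs
    duration -- total length of the experiment, in seconds (20 minutes here)
    window   -- length of the steady-state window, in seconds (10 minutes here)
    """
    start = (duration - window) / 2.0
    end = start + window
    inside = [value for t, value in samples if start <= t <= end]
    if not inside:
        raise ValueError("no sample falls inside the steady-state window")
    return mean(inside)


# Hypothetical samples every 5 s: 60 msg/s on the central plateau,
# 30 msg/s during the first and last 5 minutes.
samples = [(t, 60.0 if 300 <= t <= 900 else 30.0) for t in range(0, 1201, 5)]
print(steady_state_mean(samples))
```

Only the samples between minute 5 and minute 15 contribute, so the start-up and wind-down phases don't bias the mean.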

Behaviour filtering experiments

The behaviour feature evaluated is greylisting. Behaviour filtering is much more efficient, as it doesn't handle the message body: the connection may be rejected before the SMTP DATA command, so no temporary spool file is created.

Evaluating this feature is interesting, as it may access the disk and/or the network for its database queries. It is slower than connection rate control, which depends only on data stored in memory.

Two configurations were used:

STANDALONE greylisting - in this mode, the filter maintains all data in its own database.
CLIENT/SERVER greylisting - in this mode, the filter maintains its own greylisting database and, when the wanted information isn't in its database, it queries a “grey server”. This is the configuration used in a cluster or when synchronising MXs is a requirement.
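
The client/server lookup order described above can be sketched as follows. This is a hypothetical illustration of the idea only: the class, the key format and the entry values are assumptions, and it doesn't reflect the actual ze-filter database or its protocol with the grey server.

```python
class GreyClient:
    """Local greylisting table that falls back to a remote "grey server"."""

    def __init__(self, query_server):
        self.local = {}                   # key -> entry known to this filter
        self.query_server = query_server  # hypothetical callable: key -> entry or None
        self.remote_queries = 0           # number of lookups sent to the server

    def lookup(self, key):
        entry = self.local.get(key)
        if entry is None:
            # Not known locally: ask the grey server, then cache the answer.
            self.remote_queries += 1
            entry = self.query_server(key)
            if entry is not None:
                self.local[key] = entry
        return entry


# A plain dict stands in for the grey server in this sketch.
server_db = {("192.0.2.1", "a@example.org", "b@example.net"): "known"}
client = GreyClient(server_db.get)
key = ("192.0.2.1", "a@example.org", "b@example.net")
client.lookup(key)  # answered by the server, then cached locally
client.lookup(key)  # answered from the local table
print(client.remote_queries)
```

In this sketch, the STANDALONE mode corresponds to a `query_server` that always returns `None`, so that only the local table is used.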

As before, a series of experiments was done with increasing message rates. The filter behaviour at each message rate was observed and the experiment stopped when some abnormal behaviour indicating the limit on the handling capacity was observed.

SMTP source

Although some SMTP traffic generators exist (such as Mozilla mstone), it was simpler and faster to build our own, based on an open-source message submission program (mini_sendmail). However, our home-made tool can't generate really heavy traffic (more than about 100 messages per second). Some time in the future we intend to redo all these tests with a more capable tool.

The message source consists of a pool of 180 real messages (normal messages, spams and viruses), whose size varies from 1 KByte to 50 KBytes.

The SMTP source works as follows: for a given message rate (say R), “R” messages are chosen at random from the pool each second, and “R” SMTP clients are launched to send them. Each connection corresponds to a single message and a single recipient.
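
The selection step of the SMTP source can be sketched as follows. This isn't the msg-source code itself: the function name and the file names are assumptions, messages are drawn with replacement (the text doesn't say which way the real tool does it), and the sketch only builds the per-second batches without launching any SMTP client.

```python
import random


def pick_messages(pool, rate, seconds, seed=None):
    """Return, for each second, `rate` messages chosen at random from the pool.

    Each chosen message stands for one SMTP client to launch: one connection,
    one message, one recipient.
    """
    rng = random.Random(seed)
    return [[rng.choice(pool) for _ in range(rate)] for _ in range(seconds)]


# Hypothetical pool: 180 file names standing in for the real messages.
pool = [f"msg-{i:03d}.eml" for i in range(180)]
schedule = pick_messages(pool, rate=20, seconds=3, seed=1)
for second, batch in enumerate(schedule):
    print(second, len(batch), batch[:2])
```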

REMARK - as all tests here are done within ranges where all messages or connections are handled normally, and each connection corresponds to a single message and a single recipient, the connection rate and the message rate have the same value.

Target

T.B.D.

Tools

sendmail 8.13.5
libmilter - this is a modified version of libmilter, using a pool of workers instead of one thread per SMTP connection.
ze-filter 1.9.0 Snapshot 060303
msg-source - a source of messages with a “controlled” message rate, using a modified version of mini_sendmail

Configuration files

sendmail.mc - configuration file for sendmail
ze-filter.cf - configuration file for ze-filter

Results

T2000 - Sun T2000 - Solaris 10

D1850 - Dell 1850 - FreeBSD 6.0

Conclusions

Although these results are very interesting, they should be completed some time in the future, mainly by:

Using a more efficient SMTP source with controlled output rate.
Completing the Sun T2000 experiments with greylisting tests.
I've seen some benchmarks for other filters using other test conditions. To be able to compare ze-filter performance with these other filters, it would be interesting to do all tests again with these other conditions:
- Instead of sending a single message per connection, connecting once and sending all messages over a single connection. This removes all the overhead caused by connection establishment.
- Putting the MTA and the filter on different machines. Having the MTA and the filter on the same machine is a big penalty, as sendmail forks for each connection and does many disk accesses which are “fsynced” to avoid message loss.

We observed that the message/connection handling time grows linearly with the message rate instead of being constant, as one might expect. This difference comes from OS activity (scheduler and disk activity). A direct consequence is that the expected number of entities in the system (processes and threads) grows with the square of the message rate.
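
The step from a linear handling time to quadratic growth follows from Little's law (see Appendix A). As a sketch, assume the mean handling time T is an affine function of the message rate λ; the constants T_0 and k are hypothetical symbols introduced here and were not fitted to the measurements:

```latex
\begin{align*}
T(\lambda) &= T_0 + k\,\lambda \\
N(\lambda) &= \lambda\,T(\lambda) = T_0\,\lambda + k\,\lambda^{2}
\end{align*}
```

At high message rates the kλ² term dominates, so the mean number of entities N grows roughly with the square of the message rate.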

One could expect a higher performance ratio between the T2000 and the D1850. The probable reason is the disk activity and the MTA (sendmail) running on the same machine as the filter: sendmail 8 isn't multithreaded. T2000 performance should be much better when the MTA is sendmail X ¹⁾, which is a multithreaded MTA.

You shouldn't directly compare the results here with results from other experiments. Most of the time, in those experiments, the filter runs alone on a server or the test conditions aren't the same. See, e.g., Tuning Symantec Brightmail AntiSpam on the Sun Fire T2000 Server. In our experiments, both the filter and the MTA (sendmail) were running at the same time on the same machine.

To be completed

Appendix A - Estimation of the number of entities in the system

The number of entities (filter threads or sendmail processes) in the system can be estimated using Little's law: in a queueing system, the mean number of entities in the system is the product of the mean time spent in the system and the arrival rate.

The filter keeps track of the mean time spent by the filter to handle connections and the mean duration of connections (as seen by the filter). So, with the data collected by the filter, we evaluated two parameters :

NE-Handling - this is the product of the applied message rate and the mean handling time (by the filter).
NE-Connect - this is the product of the applied message rate and the mean duration of each connection.
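
As a minimal sketch, both estimates are direct applications of Little's law. The function name and the numeric values below are hypothetical and only illustrate the arithmetic; they aren't taken from the measurements.

```python
def entities(message_rate, mean_time):
    """Little's law: mean number in the system = arrival rate x mean stay time."""
    return message_rate * mean_time


# Hypothetical values: 50 messages/s, 40 ms of handling time per connection,
# 250 ms of total connection duration as seen by the filter.
rate = 50.0
mean_handling_time = 0.040
mean_connection_time = 0.250

ne_handling = entities(rate, mean_handling_time)
ne_connect = entities(rate, mean_connection_time)
print(f"NE-Handling = {ne_handling:.1f}, NE-Connect = {ne_connect:.1f}")
```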

We have three situations:

The number of sendmail processes: NE-Connect is an estimate of the number of sendmail processes in the system. The duration of each connection, as seen by the filter, doesn't include the sendmail fork delay, which can't be neglected when the system is under heavy load. So this is a good estimate only when the message rate applied to the filter is far from its upper bound. Under heavy load, it is a lower bound on the mean number of sendmail processes in the system.
The number of milter threads when using the original libmilter (one thread per active SMTP connection): NE-Connect is an estimate of the number of milter threads. On the other hand, NE-Handling is an estimate of the number of threads effectively running in the filter.
The number of milter threads when using the modified libmilter (with a pool of workers): tasks are distributed to idle workers, and new workers are launched only when no worker is idle. In this case, NE-Handling is an estimate of the number of threads in the system.

A remark: these experiments were done in an environment where most latencies come from the mail server under test. In the real world this isn't true, and SMTP connections last much longer than those measured here. This means the ratio NE-Connect/NE-Handling is much higher on real mail servers.

Thanks - 8-)

I'd like to thank Claus Assmann for the hints and pointers on MTA benchmarks, and Francois Philippe (cefi.fr), who lent me the T2000 for three weeks.

¹⁾ Called MeTA1 now
