Performance Evaluation

This evaluation was done in 2005, on the architectures available at that time, so it is outdated.

With recent computers and software, the values should be much better, but they need to be measured again.

Benchmarking is difficult, mainly because it is impossible to set up a configuration and an environment that match real-world conditions. As with any other benchmark, the results here may not match the real world, but they are intended to give an idea of the performance level this filter can achieve.

The goal of this benchmark is to estimate an upper bound on the email throughput of the sendmail/ze-filter pair running on a Sun T2000 and on a Dell 1850, with little tuning, or only realistic tuning.

Methodology

The test conditions used here are very different from real-world conditions, as the computers involved are as “near” as possible to each other. This is intended to reduce all kinds of external latencies (DNS query delays, …).

This approach must be considered with care, as external network latencies can't be neglected in the real world. In fact, the number of idle, sleeping threads and processes on the server is directly related to latencies, and a high number of idle entities has a direct impact on operating system performance (scheduling and so on).

The computer running the SMTP source program should run far below its limits, so that problems on the SMTP source have little impact on the benchmark.

Configuration and tuning are limited to those things usually found in real world mail servers.

Each experiment is done at least twice (possibly more), to confirm that the qualitative results are consistent. If they are, the best one is taken. If not, the experiment is repeated until a repeatable pattern appears.

Content filtering experiments

Only two features were evaluated: detection of XFILES (messages with attached executable files) and URL filtering, as their behaviour is controlled. Pattern matching isn't tested, as it depends highly on how it's configured and should be avoided on big servers. This corresponds to four filter configurations (ordered by increasing load):

As no behaviour filtering feature was enabled, all connections and messages are handled entirely. In other words, as no connection is rejected before the DATA command, a temporary file is always created and no disk access is saved: results are worse than with behavioural filtering (connection rate limit, greylisting, …).

Various levels of input message rate

The goal here is to evaluate the filter behaviour at different input message rates, and to estimate an upper bound on the input message rate.

To evaluate this, messages were sent to the computer running the filter at a constant rate for some time, long enough to reach a stationary situation. This experiment was repeated with increasing message rates, to find a point where some qualitatively or quantitatively abnormal behaviour is observed. Some examples of abnormal behaviour are:

In queueing theory, the stability of a system is usually expressed by the condition:
occupation rate = (arrival rate × service time) / number of servers < 1
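
The stability condition can be checked numerically. The following is a minimal sketch, not part of the benchmark tooling; the function names and the figures in the example are hypothetical and are not measurements from these experiments.

```python
def occupation_rate(arrival_rate, service_time, servers):
    """Occupation rate = arrival rate x service time / number of servers."""
    if servers <= 0:
        raise ValueError("servers must be positive")
    return arrival_rate * service_time / servers


def is_stable(arrival_rate, service_time, servers):
    """True when the occupation rate is strictly below 1."""
    return occupation_rate(arrival_rate, service_time, servers) < 1.0


if __name__ == "__main__":
    # Hypothetical figures: 50 messages/s, 0.2 s per message, 16 servers.
    rho = occupation_rate(50, 0.2, 16)
    print(f"occupation rate = {rho:.3f}, stable = {is_stable(50, 0.2, 16)}")
```

With the same hypothetical service time and number of servers, doubling the arrival rate to 100 messages per second pushes the occupation rate above 1, which is the kind of limit the experiments look for.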

For this experiment, we want to evaluate how some parameters vary with the input message rate. These parameters are:

  1. Message handling time - this is measured by the filter itself, which keeps track of the time spent by each callback for each connection.
  2. Disk activity - this is the result of iostat sampling at intervals of 5 seconds.
  3. CPU load - this is the result of iostat sampling at intervals of 5 seconds.
  4. An estimate of the number of “entities” in the system under test: sendmail processes and filter threads, as explained in Appendix A.

High and sustained input message rate

The goal here is to evaluate the filter stability when a high input message rate is applied for long enough. The input message rate is selected around the maximum of the previous series, but the system should remain in a stable steady state, and no message or connection may be rejected by the mail server under test for reasons such as high load or timeouts. The duration of each experiment is 20 minutes.

Some parameters, such as the effectively handled message rate, are evaluated as the mean over a 10-minute time window centered on the middle of the experiment. This is considered the “steady state” part of the experiment.
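
As an illustration, the steady-state mean could be computed as in the sketch below. It assumes samples are available as (seconds since start, value) pairs; this data layout and the function name are assumptions made for the example, not the format used by the filter.

```python
def steady_state_mean(samples, duration=1200.0, window=600.0):
    """Mean of the values sampled inside a window centred on the experiment.

    samples  -- iterable of (seconds_since_start, value) pairs (assumed layout)
    duration -- total length of the experiment, in seconds (20 minutes)
    window   -- length of the steady-state window, in seconds (10 minutes)
    """
    start = (duration - window) / 2.0   # 300 s with the default values
    end = start + window                # 900 s with the default values
    kept = [value for t, value in samples if start <= t < end]
    if not kept:
        raise ValueError("no sample falls inside the steady-state window")
    return sum(kept) / len(kept)
```

With the default values, the first and last 5 minutes of the run are discarded, so ramp-up and drain effects do not bias the mean.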

Behaviour filtering experiments

The behaviour feature evaluated is greylisting. Behaviour filtering is much more efficient, as it doesn't handle the message body: the connection may be rejected before the SMTP DATA command, so no temporary spool file is created.

Evaluating this feature is interesting, as it may access the disk and/or the network for its database queries. It's slower than connection rate control, which depends only on data stored in memory.
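
For readers unfamiliar with greylisting, here is a minimal sketch of the general technique. It is not ze-filter's implementation: the class name, the 300-second delay and the in-memory dictionary are assumptions made for illustration, whereas the real feature queries a database.

```python
import time


class Greylist:
    """Greylist keyed on the (client IP, sender, recipient) triplet."""

    def __init__(self, min_delay=300.0, clock=time.time):
        self.min_delay = min_delay   # seconds a new triplet must wait (assumed value)
        self.clock = clock           # injectable clock, convenient for testing
        self.first_seen = {}         # triplet -> time of the first attempt

    def check(self, client_ip, sender, recipient):
        """Return "accept" or "tempfail" for one delivery attempt."""
        triplet = (client_ip, sender, recipient)
        now = self.clock()
        # Record the first attempt, or fetch the time it was recorded.
        first = self.first_seen.setdefault(triplet, now)
        if now - first >= self.min_delay:
            return "accept"
        return "tempfail"
```

The decision needs only the envelope information, which is why the connection can be refused before the DATA command and no spool file is written.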

Two configurations were used :

As before, a series of experiments was done with increasing message rates. The filter behaviour at each message rate was observed and the experiment stopped when some abnormal behaviour indicating the limit on the handling capacity was observed.

SMTP source

Although some SMTP traffic generators exist (like Mozilla mstone), it was simpler and faster to build our own, based on an open source message submission program (mini_sendmail). But our home-made tool can't generate really heavy traffic (something above roughly 100 messages per second). Some time in the future we intend to redo all these tests with a more capable tool.

We used a pool of 180 messages (spam, viruses and normal messages), whose sizes vary from 1 KByte to 50 KBytes.

The SMTP source works as follows: the message source is the pool of 180 real messages described above. For a given message rate (say R), each second “R” messages are chosen at random from the pool and “R” SMTP clients are launched to send them. Each connection corresponds to a single message and a single recipient.
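
The loop described above can be sketched as follows. This is not the tool used for the benchmark, which was built on mini_sendmail: here threads and Python's smtplib stand in for the launched clients, the addresses and host are placeholders, and drawing messages with replacement is an assumption.

```python
import random
import smtplib
import threading
import time


def send_one(message, host="localhost", port=25,
             sender="source@example.org", recipient="target@example.org"):
    """Send one message to one recipient over its own SMTP connection."""
    with smtplib.SMTP(host, port) as smtp:
        smtp.sendmail(sender, [recipient], message)


def run_source(pool, rate, seconds, send=send_one, sleep=time.sleep):
    """Each second, pick `rate` random messages and launch one client per message."""
    clients = []
    for _ in range(seconds):
        tick = time.monotonic()
        # Messages are drawn with replacement (an assumption of this sketch).
        for message in random.choices(pool, k=rate):
            client = threading.Thread(target=send, args=(message,))
            client.start()
            clients.append(client)
        # Wait for the remainder of the current one-second slot.
        sleep(max(0.0, 1.0 - (time.monotonic() - tick)))
    for client in clients:
        client.join()
    return len(clients)
```

The `send` and `sleep` parameters are injectable so the loop can be exercised without a mail server, for example `run_source(pool, rate=10, seconds=60)` against a test host, or with a stub sender in a dry run.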

REMARK - as all tests here are done within ranges where all messages or connections are handled normally, and each connection corresponds to a single message and a single recipient, the connection rate and the message rate have the same value.

Target

T.B.D.

Tools

Configuration files

Results

Conclusions

Although these results are very interesting, they should be complemented some time in the future, mainly:

We observed that the message/connection handling time grows linearly with the message rate instead of being constant, as one could expect. This difference comes from OS activity (scheduler and disk activity). A direct consequence is that the expected number of entities in the system (processes and threads) grows with the square of the message rate: by Little's law (Appendix A), this number is the product of the arrival rate and the handling time, and here the handling time itself grows with the arrival rate.

A higher performance ratio between the T2000 and the D1850 could have been expected. The probable reasons are the disk activity and the MTA (sendmail) running on the same machine as the filter. The main reason seems to be sendmail 8, which isn't multithreaded. T2000 performance should be much better when the MTA is sendmail X 1), which is a multithreaded MTA.

You shouldn't directly compare the results here with results from other experiments. In other experiments, the filter often runs alone on a server, or the test conditions aren't the same. See, e.g., Tuning Symantec Brightmail AntiSpam on the Sun Fire T2000 Server. In our experiments, both the filter and the MTA sendmail were running at the same time on the same machine.

To be completed

Appendix A - Estimation of the number of entities in the system

The number of entities (filter threads or sendmail processes) in the system can be estimated using Little's law: in a queueing system, the mean number of entities in the system is the product of the mean stay time and the arrival rate.
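
A minimal numerical sketch of Little's law follows. The figures are hypothetical, not measurements from these experiments. The names ne_handling and ne_connect echo the NE-Handling and NE-Connect quantities named in the closing remark of this appendix; mapping them to the mean handling time and the mean connection duration is an assumption.

```python
def mean_entities(arrival_rate, mean_stay_time):
    """Little's law: mean number in the system = arrival rate x mean stay time."""
    return arrival_rate * mean_stay_time


if __name__ == "__main__":
    rate = 40.0               # hypothetical arrival rate, messages per second
    handling_time = 0.05      # hypothetical mean handling time, seconds
    connection_time = 0.25    # hypothetical mean connection duration, seconds

    ne_handling = mean_entities(rate, handling_time)   # assumed meaning of NE-Handling
    ne_connect = mean_entities(rate, connection_time)  # assumed meaning of NE-Connect
    print(f"NE-Handling = {ne_handling:.1f}, NE-Connect = {ne_connect:.1f}")
```

Because the arrival rate is the same in both products, the ratio of the two estimates equals the ratio of the connection duration to the handling time.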

The filter keeps track of the mean time it spends handling connections and of the mean duration of connections (as seen by the filter). So, with the data collected by the filter, we evaluated two parameters:

We have three situations :

A remark: these experiments were done in an environment where most latencies come from the mail server under test. In the real world, this isn't true, and SMTP connections last much longer than those measured here. This means the ratio NE-Connect/NE-Handling is much higher in real mail servers.

Thanks - 8-)

I'd like to thank Claus Assmann for the hints and pointers on MTA benchmarks and Francois Philippe (cefi.fr), who lent me the T2000 for three weeks.

1)
Called MeTA1 now