Postfix Stress Test

Cool tips from Ralf Hildebrandt to tune your Postfix http://www.arschkrebs.de/postfix/

11. Stress Testing
How much mail will my box be able to handle?

To find out how much traffic your installation can handle, you need to perform some kind of stress testing. To put an adequate load on the server, you need a fast mail generator. Postfix comes with a pair of testing programs named smtp-source and smtp-sink for just this purpose. Here’s how they work:

smtp-source

This program connects to a host on a TCP port (port 25 by default) and sends one or more messages, either sequentially or in parallel. The program speaks either SMTP (the default) or LMTP and is meant to aid in measuring server performance.

smtp-sink

This test server listens on the named host (or address) and port. It receives messages from the network and throws them away. You can measure client and network performance with this program.

Let’s start with smtp-source to stress-test your Postfix installation. The following example injects 100 total messages of size 5k each in 20 parallel sessions to a Postfix server running on localhost port 25. Because you’re also interested in how much time this takes, use the time command:

$ time ./smtp-source -s 20 (1) -l 5120 (2) -m 100 (3) -c (4) \
-f sender@example.com (5) -t recipient@example.com (6) localhost:25 (7)
100
real    0m4.294s
user    0m0.060s
sys     0m0.030s

(1)     20 parallel sessions
(2)     5k message size
(3)     100 total messages
(4)     display a counter
(5)     envelope sender
(6)     envelope recipient
(7)     target SMTP server
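
The callout markers (1) through (7) are annotations, not part of the command, so the example cannot be pasted as printed. As a minimal sketch, the helper below prints the same command line without the callouts. The function name smtp_source_cmd and its positional arguments are made up for illustration; the flags are the ones shown in the example.

```shell
# Print the smtp-source command line from the example, without callouts.
# Arguments (all optional): sessions, message size, total messages, target.
smtp_source_cmd() {
    sessions=${1:-20}           # (1) parallel sessions
    size=${2:-5120}             # (2) message size in bytes
    total=${3:-100}             # (3) total messages
    target=${4:-localhost:25}   # (7) target SMTP server
    printf './smtp-source -s %s -l %s -m %s -c -f %s -t %s %s\n' \
        "$sessions" "$size" "$total" \
        sender@example.com recipient@example.com "$target"
}

smtp_source_cmd
```

Paste the printed line after the time command to repeat the measurement, or pass different numbers (for example, smtp_source_cmd 50 5120 1000) to vary the load.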

In the example above, injection took 4.294s. However, you also want to know how long actual delivery takes. Check your logs for this, and also verify that every last message for <recipient@example.com> was received.
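
One way to do the log check is to count the delivery records for the test recipient. The sketch below assumes that your Postfix logs successful deliveries on lines containing to=<address> and status=sent, which is the usual format; the function name count_delivered is made up for illustration, and the log file location depends on your syslog setup.

```shell
# Count log lines that record a successful delivery to one recipient.
# Reads Postfix log lines on standard input; $1 is the envelope recipient.
count_delivered() {
    grep -F "to=<$1>" | grep -c 'status=sent'
}
```

For example, count_delivered recipient@example.com < /var/log/mail.log (the path is an assumption) should print 100 after the test above, provided the log holds no earlier deliveries to that address.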

Now let’s turn our attention to smtp-sink to find out how many messages per second your server can handle from your horrible mass mailing software. Postfix has to process each outgoing message even if the server on the other side throws it away (therefore, you can’t use this to test the raw performance of your mass mailer unless you connect your mailer directly to smtp-sink).

The following example sets up an SMTP listener on port 25 of localhost:

$ ./smtp-sink -c localhost:25 1000

Now you can run your client tests.

If you want to get an idea of how much overhead the network imposes, and also run a control experiment to see what the theoretical maximum throughput for a mail server is, you can make smtp-source and smtp-sink talk to each other. Open two windows. In the first, start up the dummy server like this:

# ./smtp-sink -c localhost:25 1000
100

With this in place, start throwing messages at this server with smtp-source in the other window:

$ time ./smtp-source -s 20 -l 5120 -m 100 -c \
-f sender@example.com -t recipient@example.com localhost:25
100

real    0m0.239s
user    0m0.000s
sys     0m0.040s

This output shows that smtp-sink is much faster at accepting messages than Postfix. It took only 0.239 seconds to accept the messages, which is 18 times faster than the Postfix injection process. Now, wouldn’t it be nice if you could throw away all incoming email like this?
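
The comparison above is simple arithmetic on the two "real" times. As a small sketch, the helpers below (the names rate and speedup are made up for illustration) turn a message count and an elapsed time into messages per second, and two elapsed times into a ratio:

```shell
# Messages per second: rate COUNT SECONDS
rate() {
    awk -v n="$1" -v t="$2" 'BEGIN { printf "%.1f\n", n / t }'
}

# How many times faster the second run was: speedup SLOW_SECONDS FAST_SECONDS
speedup() {
    awk -v slow="$1" -v fast="$2" 'BEGIN { printf "%.1f\n", slow / fast }'
}

rate 100 4.294        # injection into Postfix
rate 100 0.239        # injection into smtp-sink
speedup 4.294 0.239   # smtp-sink compared with Postfix
```

With the timings from the two runs this prints 23.3, 418.4, and 18.0: roughly 23 messages per second into Postfix against roughly 418 into smtp-sink.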

11.1. Disk I/O
Why do I see huge load, when no process is actually using the processor during stress testing?

When you run your stress testing, you might encounter huge load averages on your machine that seem out of place. Assuming that you don’t have any content filtering in place, Postfix is I/O bound, so your I/O subsystem could be saturated.

Suppose the output of top shows a high load such as 10.7, but none of your processes are actually using the CPU. In this case, the load is probably coming from the kernel, which is spending most of its time on I/O and not letting processes run. The kernel is doing so much I/O because many processes have requested I/O operations and are now waiting for them.

Linux 2.6 kernels support iowait status in the top command. To see whether I/O wait is the cause on 2.4.x kernels (which don’t have a separate means of displaying the iowait status), you can add a kernel module. Oliver Wellnitz wrote such a kernel module that you can download at ftp://ftp.ibr.cs.tu-bs.de/os/linux/people/wellnitz/programming/. This module calculates the load differently and gives you an interface in the /proc filesystem that you can read like this:

# cat /proc/loadavg-io
rq 0.30 0.23 0.14
io 0.08 0.31 0.27

In this example, rq is the number of processes in the TASK_RUNNING state, while io is the number of processes in the TASK_UNINTERRUPTIBLE state (waiting for I/O). The sum of the two is what the kernel usually reports as load.
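
To check the "sum of the two" relationship, you can add the rq and io rows column by column and compare the result with the usual load averages. This is a minimal sketch that assumes the two-row layout shown above; the function name total_load is made up for illustration, and /proc/loadavg-io exists only when the module is loaded, so the example feeds in the sample values instead.

```shell
# Add the rq and io rows column by column (1-, 5-, and 15-minute averages).
# Reads text in the /proc/loadavg-io layout on standard input.
total_load() {
    awk '{ for (i = 2; i <= 4; i++) sum[i] += $i }
         END { printf "%.2f %.2f %.2f\n", sum[2], sum[3], sum[4] }'
}

total_load <<'EOF'
rq 0.30 0.23 0.14
io 0.08 0.31 0.27
EOF
```

For the sample values this prints 0.38 0.54 0.41. On a machine with the module loaded, total_load < /proc/loadavg-io should come out close to the first three fields of /proc/loadavg.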

If you’re having problems like this, you need faster disks, or even a solution such as an SSD (a solid-state disk, basically a battery-backed RAM disk) or a mirrored/striped RAID for the queue directory. See Section XX.XX[XREF, in the performance chapter] for more information. Another option that may or may not work is to remove the synchronous updates for the queue directory. If you’re using an ext2 or ext3 filesystem, try this command:

# chattr -R -S /var/spool/postfix/

This setting is actually the default with recent Postfix installations.
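
Before changing anything, you can check whether the synchronous-update attribute is set at all. The sketch below assumes the usual lsattr output on ext2/ext3, where each entry is printed as an attribute string followed by a path and an uppercase S in that string marks synchronous updates; the function name count_sync_entries is made up for illustration.

```shell
# Count entries whose attribute string contains S (synchronous updates).
# Reads `lsattr -R` output on standard input; directory header lines and
# blank lines are skipped because they lack an attribute field.
count_sync_entries() {
    awk 'NF >= 2 && $1 ~ /^[-A-Za-z]+$/ && $1 ~ /S/ { n++ }
         END { print n + 0 }'
}
```

Run it as lsattr -R /var/spool/postfix 2>/dev/null | count_sync_entries. A result of 0 suggests the attribute is already cleared and the chattr command would change nothing.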

Hamilton Vera
