Cool tips from Ralf Hildebrandt to tune your Postfix http://www.arschkrebs.de/postfix/
11. Stress Testing
How much mail will my box be able to handle?
To find out how much traffic your installation can handle, you need to perform some kind of stress testing. To put an adequate load on the server, you need a fast mail generator. Postfix comes with a pair of testing programs named smtp-source and smtp-sink for just this purpose. Here’s how they work:
smtp-source: This program connects to a host on a TCP port (port 25 by default) and sends one or more messages, either sequentially or in parallel. It speaks either SMTP (the default) or LMTP and is meant to aid in measuring server performance.
smtp-sink: This test server listens on the named host (or address) and port. It receives messages from the network and throws them away. You can measure client and network performance with this program.
Let’s start with smtp-source to stress-test your Postfix installation. The following example injects 100 total messages of size 5k each in 20 parallel sessions to a Postfix server running on localhost port 25. Because you’re also interested in how much time this takes, use the time command:
$ time ./smtp-source -s 20 (1) -l 5120 (2) -m 100 (3) -c (4) \
-f firstname.lastname@example.org (5) -t email@example.com (6) localhost:25 (7)
(1) 20 parallel sessions
(2) 5k message size
(3) 100 total messages
(4) display a counter
(5) envelope sender
(6) envelope recipient
(7) target SMTP server
In the example above, injection took 4.294s. However, you also want to know how long actual delivery takes. Check your logs for this, and also to verify that the envelope recipient received every last message.
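The log check can be scripted. The following is a minimal sketch, assuming your Postfix delivery agents log one line per delivery containing to=<address> and status=sent; the helper name count_delivered and the log path in the usage comment are illustrative assumptions, not part of Postfix.

```shell
#!/bin/sh
# Count deliveries logged as "status=sent" for one envelope recipient.
# Usage: count_delivered LOGFILE RECIPIENT
count_delivered() {
    log=$1
    rcpt=$2
    # -F matches the address literally, so its dots are not regex wildcards.
    grep -F "to=<$rcpt>" "$log" | grep -c "status=sent"
}

# Example call; the log path is an assumption, adjust it for your system:
# count_delivered /var/log/mail.log email@example.com
```

After the test run above, the count should match the 100 messages you injected, provided the log contains no earlier deliveries to the same address.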
Now let’s turn our attention to smtp-sink to find out how many messages per second your server can handle from your horrible mass mailing software. Postfix has to process each outgoing message even if the server on the other side throws it away (therefore, you can’t use this to test the raw performance of your mass mailer unless you connect your mailer directly to smtp-sink).
The following example sets up an SMTP listener on port 25 of localhost:
# ./smtp-sink -c localhost:25 1000
Now you can run your client tests.
If you want to get an idea of how much overhead the network imposes, and also run a control experiment to see what the theoretical maximum throughput of a mail server would be, you can make smtp-source and smtp-sink talk to each other. Open two windows. In the first, start up the dummy server like this:
# ./smtp-sink -c localhost:25 1000
With this in place, start throwing messages at this server with smtp-source in the other window:
$ time ./smtp-source -s 20 -l 5120 -m 100 -c \
-f email@example.com -t firstname.lastname@example.org localhost:25
The timing shows that smtp-sink is much faster at accepting messages than Postfix. It took only 0.239 seconds to accept the messages, which is about 18 times faster than the Postfix injection process (4.294 seconds). Now, wouldn’t it be nice if you could throw away all incoming email like this?
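To turn the two timings into throughput figures, divide the message count by the elapsed time. The sketch below does the arithmetic with awk; the helper names msgs_per_sec and speedup are made up for this illustration, and the numbers are the ones quoted in the text (100 messages, 4.294 and 0.239 seconds).

```shell
#!/bin/sh
# Messages per second for a run: msgs_per_sec COUNT SECONDS
msgs_per_sec() {
    LC_ALL=C awk -v n="$1" -v t="$2" 'BEGIN { printf "%.1f\n", n / t }'
}

# How many times faster the second run was: speedup SLOW_SECONDS FAST_SECONDS
speedup() {
    LC_ALL=C awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1f\n", a / b }'
}

msgs_per_sec 100 4.294   # injection into Postfix
msgs_per_sec 100 0.239   # injection into smtp-sink
speedup 4.294 0.239      # ratio between the two runs
```

For these runs that works out to roughly 23 messages per second into Postfix and about 418 per second into smtp-sink.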
11.1. Disk I/O
Why do I see a huge load when no process is actually using the processor during stress testing?
When you run your stress testing, you might encounter huge load averages on your machine that seem out of place. Assuming that you don’t have any content filtering in place, Postfix is I/O bound, so your I/O subsystem could be saturated.
Suppose the output of top shows a high load such as 10.7, but none of your processes is actually using the CPU. In this case, your load is probably coming from the kernel using most of the CPU for I/O and not letting processes run. Furthermore, the reason that the kernel is doing so much I/O is that many more processes have requested I/O operations (and are now waiting for them).
Linux 2.6 kernels show the iowait status in the top command. To check for I/O wait on 2.4.x kernels (which don’t have a separate means of displaying the iowait status), you can add a kernel module. Oliver Wellnitz wrote such a module, which you can download at ftp://ftp.ibr.cs.tu-bs.de/os/linux/people/wellnitz/programming/. This module calculates the load differently and gives you an interface in the /proc filesystem that you can read like this:
# cat /proc/loadavg-io
rq 0.30 0.23 0.14
io 0.08 0.31 0.27
In this example, rq is the number of processes in the state TASK_RUNNING, while io is the number of processes in the state TASK_UNINTERRUPTIBLE (waiting for I/O). The sum of those two is what the kernel usually calls load.
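To compare these figures with the ordinary load average, add the two lines column by column. This is a minimal sketch that assumes the file has exactly the two-line rq/io layout shown above; the helper name total_load is made up for this illustration.

```shell
#!/bin/sh
# Sum the rq and io lines column by column to get the usual load figures.
# Usage: total_load [FILE]   (FILE defaults to /proc/loadavg-io)
total_load() {
    LC_ALL=C awk '
        $1 == "rq" || $1 == "io" {
            # Fields 2 to 4 hold the three averages on each line.
            for (i = 2; i <= 4; i++) sum[i] += $i
        }
        END { printf "%.2f %.2f %.2f\n", sum[2], sum[3], sum[4] }
    ' "${1:-/proc/loadavg-io}"
}

# Example call on a machine with the module loaded:
# total_load
```

For the sample values above, the sums are 0.38, 0.54, and 0.41. If the io share dominates the total, the load is coming from processes waiting on the disk rather than from CPU work.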
If you’re having problems like this, you need faster disks, or even a solution such as an SSD (a solid state disk, basically a battery-backed RAM disk) or a mirrored/striped RAID for the queue directory. See the performance chapter for more information. One other solution that may or may not work is to remove the synchronous updates for the queue directory. If you’re using an ext2 or ext3 filesystem, try this command:
# chattr -R -S /var/spool/postfix/
This setting is actually the default with recent Postfix installations.