Mail fails consistently with timeout or lost connection

Every now and then, mail fails with “timed out while sending end of data — message may be sent more than once”, or with: “lost connection after DATA”. Network outages happen, systems crash. There isn’t much you can do about it. Usually the problem goes away by itself.

However, when you see mail deliveries fail consistently, you may have a different problem: broken path MTU discovery. Or it could be a broken PIX firewall.
Cisco PIX “fixup protocol smtp” bug
The Cisco PIX firewall has a bug when running software older than version 5.2(4) or 6.0(1).

The bug ID is CSCds90792. The “fixup protocol smtp” feature does not correctly handle the case where the “.” and the “CRLF” at the end of mail are sent in separate packets.

How does one recognize a mailer behind a Cisco PIX with “fixup protocol smtp” enabled? As of version 5.1 and later, the fixup protocol smtp command changes the characters in the SMTP banner to asterisks except for the “2”, “0” and “0 SPACE” characters.

When you connect to a mailer behind such a filter you see something like:

220 **************************************0******0*********20 ****200**0*********0*00

IP path MTU discovery
A little background is in order. With the SMTP protocol, the HELO, MAIL FROM and RCPT TO commands and responses are relatively short. When you’re talking to old versions of sendmail, every command and every response is sent as a separate packet, because sendmail didn’t implement ESMTP command pipelining until recently.

The message content, however, is sent as a few datagrams, each datagram typically a kbyte large or even bigger, depending on your local network MTU.

When mail fails consistently due to a timeout, I suspect that the sending machine runs a modern UNIX which implements path MTU discovery. That causes the machine to send packets as large as it would send over the LAN, with the IP DON’T FRAGMENT bit set, preventing intermediate routers from fragmenting the packets that are too big for their networks.

Depending on what network path a message follows, some router on the way responds with an ICMP MUST FRAGMENT message saying the packet is too big. Normally, the sending machine will re-send the data after chopping it up into smaller pieces.

However, things break when some router closer to the sending system is dropping such ICMP feedback messages, in a mistaken attempt to protect systems against certain attacks. In that case, the ICMP feedback message never reaches the sending machine, and the connection times out.

This is the same configuration problem that causes trouble with web servers behind a misconfigured packet filter: small images/files are sent intact, large images/files time out because the server does not see the MUST FRAGMENT ICMP feedback messages.

Workaround: at the sending machine, disable path MTU discovery. Mail will get out, but of course everyone else will still suffer. How to disable path MTU discovery? It depends. Solaris has an ndd command; other systems use different means such as sysctl to control kernel parameters on a running system.

Workaround: at the receiving machine, set a smaller MTU. For example, people using PPPoE (PPP over Ethernet) often have to choose an MTU lightly smaller than the default 1500 for ethernet.

Fix: find the router that drops the ICMP MUST FRAGMENT messages, and convince the person responsible for it to fix the configuration.


  1. Leave a comment

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: