Return Path Rewriting
NOTICE – This page is provided for historical reasons only. These days, SRS should be preferred to home-brew SPF-compliant rewriting solutions.
Return Path Rewriting (RPR) is a class of methods to replace verbatim mail forwarding in the spam age.
If you’ve come here because of strange return path / envelope sender addresses
and X-RPR-* headers in messages passing through one of my mailservers, this
document should provide all the information about why I rewrite return path
addresses, and how I do it.
This document assumes you understand the difference between the SMTP envelope
sender, also called the (envelope) return path, and the message header From:
line. You should have a fair understanding of how email gets transported from
sender to recipient.
All concepts and code presented herein should be considered EXPERIMENTAL.
Motivation
Many people are beginning to employ anti-forgery mechanisms to counter spam with forged envelope sender addresses. They accept mail for a given SMTP envelope sender domain only from the real mail servers for that domain. GMX, a large German freemail provider, has long had an anti-spam module called “Spamserver-Blocker”. That module accepts mail from large free email providers like Hotmail and Yahoo (and even GMX itself) only if it comes directly from the respective mailservers. Other large mail providers are said to have their own mail origin filters along the same lines.
There’s also a scheme called Sender Policy Framework (SPF), with which domain owners voluntarily publish information on which IP addresses may legitimately send mail for a given domain. Based on the information in those SPF records, mail can be blocked when it comes from addresses other than the authorized ones. There are a number of such MTA Authorization Records in DNS (MARID) type schemes around, some of which authorize based on return paths too.
The main issue with this kind of origin filtering is that it breaks traditional
.forward or /etc/aliases style verbatim mail forwarding. My previous
solution was to educate GMX users to change their spam protection setup
(German instructions).
These days I believe the right thing to do is to rewrite the return path of all
forwarded mail. Thus I have implemented a form of what I call Return Path
Rewriting (RPR), to avoid confusion of the envelope sender and the Sender:
header, which are two distinct things.
The SPF folks call this Sender Rewriting, and have proposed a scheme called SRS along with a Perl reference implementation. I did not particularly like some details of their original design and implementation, so I rolled my own.
Please note that the fact that I use my own schemes does not necessarily mean that my way of doing things is any better than the various SRS variants – it is just different in some aspects, but quite similar in others. So far it has been working fine for me, but YMMV.
The Problem
With traditional .forward or /etc/aliases style email forwarding, the
return path is preserved when the message is passed on to the next MTA.
Consider Hugo, who sends mail to Susi, who forwards her mail from aliases.at
to her real account on sorglos.de:
smtp.sanders.ch
|
| MAIL FROM:<hugo@sanders.ch>
| RCPT TO:<susi@aliases.at>
V
mx01.aliases.at
|
| MAIL FROM:<hugo@sanders.ch>
| RCPT TO:<susi@sorglos.de>
V
mail.sorglos.de
Let’s assume that mail.sorglos.de does some kind of origin verification; that
is, they only accept a message if the sending host really is a mail server for
the domain in the envelope sender.
Because the return path is preserved when forwarding, mail.sorglos.de is
receiving mail with envelope sender hugo@sanders.ch not from the real MTA for
sanders.ch, but from mx01.aliases.at, a host totally unrelated to the
domain sanders.ch.
The receiving MTA will now assume that the message is spam with forged envelope sender address, and refuse to accept it.
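The check that mail.sorglos.de performs can be illustrated with a small Python sketch. The AUTHORIZED_HOSTS table and the origin_check function are hypothetical names invented for this illustration; a real receiver would derive the list of legitimate hosts from DNS (for example from SPF records) rather than from a hard-coded table.

```python
# Hypothetical table of which hosts may send mail for which domain.
# A real receiver would look this up in DNS instead of hard-coding it.
AUTHORIZED_HOSTS = {
    "sanders.ch": {"smtp.sanders.ch"},
    "aliases.at": {"mx01.aliases.at"},
}


def origin_check(sending_host: str, envelope_sender: str) -> bool:
    """Accept only if sending_host is a known mail server for the
    domain of the envelope sender."""
    domain = envelope_sender.rsplit("@", 1)[-1].lower()
    return sending_host.lower() in AUTHORIZED_HOSTS.get(domain, set())


# Direct delivery from Hugo's own mail server passes the check.
print(origin_check("smtp.sanders.ch", "hugo@sanders.ch"))  # True
# The verbatim forward arrives from mx01.aliases.at and is refused.
print(origin_check("mx01.aliases.at", "hugo@sanders.ch"))  # False
```

The second call is exactly the forwarding case from the diagram: the envelope sender still names sanders.ch, but the connecting host belongs to aliases.at.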
A Solution – Return Path Rewriting
Because in the long term, trying to educate users and admins to disable their
origin filters or add your server to their whitelist is infeasible, there’s
currently only one working solution: rewriting the return path on forwarded
messages. Consider the example from above, but this time, aliases.at does
return path rewriting on the forwarded message:
smtp.sanders.ch
|
| MAIL FROM:<hugo@sanders.ch>
| RCPT TO:<susi@aliases.at>
V
mx01.aliases.at
|
| MAIL FROM:<bounce-hugo#sanders.ch-susi@aliases.at>
| RCPT TO:<susi@sorglos.de>
V
mail.sorglos.de
Because of the return path rewriting, the origin verification on
mail.sorglos.de will succeed, and the message will be accepted.
Now assume that Susi’s mailbox on mail.sorglos.de is full, and Hugo’s message
bounces. mail.sorglos.de will send the DSN message back to the rewritten
envelope sender address as follows:
mail.sorglos.de
|
| MAIL FROM:<>
| RCPT TO:<bounce-hugo#sanders.ch-susi@aliases.at>
V
mx01.aliases.at
|
| MAIL FROM:<>
| RCPT TO:<hugo@sanders.ch>
V
smtp.sanders.ch
For the bounce to get back to Hugo, mx01.aliases.at has to extract the
original envelope sender address hugo@sanders.ch from the rewritten local
part bounce-hugo#sanders.ch-susi, and forward the bounce there.
Of course, this naïve rewriting scheme is inherently insecure, as it opens
up mx01.aliases.at for relaying to arbitrary addresses. In practice we need
to protect against replay and forgery, i.e. prevent spammers from guessing
valid rewritten addresses, and prevent them from reusing old envelope sender
addresses for spamming too. Bounce addresses should be impossible to forge, and
only be valid for a given time.
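To make the weakness concrete, here is a minimal Python sketch of the naïve scheme from the example above. The function names naive_rewrite and naive_unrewrite are made up for this illustration, and the parsing assumes that the forwarding user's local part contains no hyphen.

```python
def naive_rewrite(sender: str, recipient: str) -> str:
    """Build bounce-<sender local>#<sender domain>-<recipient local>@<recipient domain>."""
    s_local, s_domain = sender.rsplit("@", 1)
    r_local, r_domain = recipient.rsplit("@", 1)
    return f"bounce-{s_local}#{s_domain}-{r_local}@{r_domain}"


def naive_unrewrite(rewritten: str) -> str:
    """Recover the original envelope sender from a rewritten address."""
    local, _domain = rewritten.rsplit("@", 1)
    if not local.startswith("bounce-"):
        raise ValueError("not a rewritten address")
    body = local[len("bounce-"):]
    # Assumption: the forwarding user's local part contains no "-".
    sender_part, _forwarder = body.rsplit("-", 1)
    s_local, s_domain = sender_part.rsplit("#", 1)
    return f"{s_local}@{s_domain}"


rewritten = naive_rewrite("hugo@sanders.ch", "susi@aliases.at")
print(rewritten)                   # bounce-hugo#sanders.ch-susi@aliases.at
print(naive_unrewrite(rewritten))  # hugo@sanders.ch

# Nothing ties the address to a message that was really forwarded, so
# anybody can construct one that relays to an arbitrary victim:
print(naive_unrewrite("bounce-victim#example.org-susi@aliases.at"))
# victim@example.org
```

The last call shows the relaying hole: the forwarder cannot tell a genuine bounce address from one that a spammer simply typed in.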
My Experimental Return Path Rewriting Scheme v3
The design goals of this scheme were:
- Ability to expire
- Reasonable protection against forgery
- Stateless (no database involved)
- Human readability
- Case independence
- Compatibility with greylisting peers
I tried to keep the rewritten envelope return path as short as reasonably possible whilst still adhering to these design goals.
I generally do not think that optimizing return paths rewritten by other systems by “cutting the middleman” should be done. It involves assigning semantics to local parts of foreign domains, and in conjunction with catch-all domains or clueless admins leads to problems caused by limited relaying loopholes. I currently prefer babushka style rewriting over multiple hops, even though it suffers from severe limitations in envelope sender length. You can usually just change your forwarding to use more direct routes if you run into problems with long addresses.
Now consider the above case of forwarding a message from hugo@sanders.ch via
susi@aliases.at to susi@sorglos.de.
smtp.sanders.ch
|
| MAIL FROM:<hugo@sanders.ch>
| RCPT TO:<susi@aliases.at>
V
mx01.aliases.at
|
| MAIL FROM:<_3_200402097faacbe9d81chugo_sanders.ch@aliases.at>
| RCPT TO:<susi@sorglos.de>
V
mail.sorglos.de
Now let’s take that longish rewritten return path apart. It consists of the following parts:
_3_  20040209  7faacbe9d81c  hugo_sanders.ch  @  aliases.at
(1)    (2)         (3)             (4)              (5)
(1) Prefix (constant)
(2) Date (8 digits)
(3) Cookie (12 hex digits)
(4) Original Sender (variable length)
(5) Local Domain
I’ve chosen a short prefix that encodes a scheme version identifier, is unlikely to clash with existing local parts, and still uses only unproblematic characters.
The date is human readable. I’ve chosen to keep it this way, even though a clever scheme could encode enough information to allow expiry in as few as two bytes (e.g. a 12-bit day counter in two radix 64 digits).
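For comparison, the compact alternative mentioned above could look roughly like the following Python sketch. It is not what the scheme uses. The epoch, the 64-character alphabet, and the function names are all assumptions chosen for illustration. Note that this particular alphabet mixes upper and lower case, so it would not survive lowercasing of the local part.

```python
from datetime import date

# Hypothetical 64-character alphabet and epoch, chosen only for this sketch.
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+-"
EPOCH = date(2004, 1, 1)


def encode_day(d: date) -> str:
    """Encode the day number since EPOCH (mod 4096) as two radix-64 digits."""
    n = (d - EPOCH).days % 4096      # 12 bits wrap after roughly 11 years
    return ALPHABET[n >> 6] + ALPHABET[n & 63]


def decode_day(s: str) -> int:
    """Return the 12-bit day counter encoded in a two-character string."""
    return (ALPHABET.index(s[0]) << 6) | ALPHABET.index(s[1])


stamp = encode_day(date(2004, 2, 9))
print(stamp)              # An
print(decode_day(stamp))  # 39
```

This saves six characters per address compared with the 8-digit date, at the cost of human readability.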
The cookie consists of the first 12 hex digits of an MD5 hash. I use MD5
because it is available within the Exim 3 string expansion language, even
though it is not the best choice of hash function. I concatenate a local secret
(here t0ps3cret), the (lowercased) original sender address, and the full
timestamp, and calculate an MD5 hash from the resulting string
t0ps3crethugo@sanders.ch20040209. The first 12 hex digits of the resulting
hash 7faacbe9d81c02ec1e7e4c7648424948 end up in the cookie. The 12 hex digits
carry 48 bits, which leaves approximately 24 bits of security against forged
cookies, fully considering birthday attacks even though they should not be an
issue given the assumed attack scenario. 24 bits may seem low. But as Mallory
has no way of verifying arbitrary cookies for validity offline, this is good
enough for me. Or in other words, there seems to be no good reason to throw
16 million messages at a mailserver for a 50% chance of getting one single
lousy message through.
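The cookie calculation described above can be sketched in a few lines of Python. This is an illustration of the described steps, not the Exim 3 configuration itself; the function name rpr_cookie and the UTF-8 encoding of the input string are assumptions.

```python
import hashlib


def rpr_cookie(secret: str, sender: str, datestamp: str) -> str:
    """First 12 hex digits of MD5(secret + lowercased sender + datestamp)."""
    material = secret + sender.lower() + datestamp
    # Assumption: the concatenated string is hashed as UTF-8 bytes.
    return hashlib.md5(material.encode("utf-8")).hexdigest()[:12]


# The sender is lowercased before hashing, so case changes in transit
# do not invalidate the cookie.
print(rpr_cookie("t0ps3cret", "Hugo@Sanders.CH", "20040209"))
print(rpr_cookie("t0ps3cret", "hugo@sanders.ch", "20040209"))
```

Both calls print the same 12 hex digits, since the inputs differ only in the case of the sender address.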
Cryptographically speaking, this scheme is not perfect. Although length
extension attacks are prevented by the order of the tokens to be hashed,
ideally an HMAC construction should be used, together with a more secure hash
algorithm like SHA-256 or SHA-1.
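A variant along those lines might look like the following Python sketch, which uses HMAC with SHA-256 from the standard library. It is not part of the scheme as implemented; the function names and the choice to keep the 12 hex digit truncation are assumptions made for this illustration.

```python
import hashlib
import hmac


def rpr_cookie_hmac(secret: bytes, sender: str, datestamp: str) -> str:
    """First 12 hex digits of HMAC-SHA256 over lowercased sender + datestamp."""
    message = (sender.lower() + datestamp).encode("utf-8")
    return hmac.new(secret, message, hashlib.sha256).hexdigest()[:12]


def cookie_matches(expected: str, presented: str) -> bool:
    """Compare cookies in constant time, ignoring the case of the hex digits."""
    return hmac.compare_digest(
        expected.encode("utf-8"), presented.lower().encode("utf-8")
    )


cookie = rpr_cookie_hmac(b"t0ps3cret", "hugo@sanders.ch", "20040209")
print(cookie_matches(cookie, cookie.upper()))   # True
```

With HMAC the secret is a proper key rather than a prefix of the hashed string, so the order of the remaining tokens no longer matters for resisting length extension.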
The original sender is not specially encoded or escaped, except for the
substitution of the last @ by an underscore (_); that is, any @ characters in
the local part are left untouched. The reason for this is that I do not want
to introduce the potentially problematic @ into the return path if it was not
there already.
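The substitution can be sketched in Python as follows. The encoding follows the description above; the decoding direction is an assumption, since it relies on the sender's domain not containing an underscore, so that the last underscore can be taken as the former @. The function names are invented for this example.

```python
def encode_sender(sender: str) -> str:
    """Replace only the last "@" of the address with "_"."""
    local, sep, domain = sender.rpartition("@")
    if not sep:
        raise ValueError("address has no @")
    return f"{local}_{domain}"


def decode_sender(encoded: str) -> str:
    """Turn the last "_" back into "@".

    Assumption: the domain part contains no underscore.
    """
    local, sep, domain = encoded.rpartition("_")
    if not sep:
        raise ValueError("no underscore found")
    return f"{local}@{domain}"


print(encode_sender("hugo@sanders.ch"))          # hugo_sanders.ch
print(encode_sender("a@b@example.org"))          # a@b_example.org
print(decode_sender("first_last_example.org"))   # first_last@example.org
```

Underscores that were already in the local part survive the round trip, because only the last one is converted back.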
If available, I use the virtual domain for which the original message was received as local domain, in all other cases the primary hostname. This is what makes this return path “SPF compliant”, as the SPF folks call it. Or in other words, we only ever send messages with local return paths.
When receiving a message for such an address with _3_ prefix, steps along the
following lines would be taken:
- Process only if it’s a DSN (empty SMTP envelope sender)
- Verify the rewritten address syntax (deliver normally if invalid)
- Verify the cookie (deliver to postmaster or discard if mismatch)
- Verify the timestamp (deliver to postmaster or discard or fail with 5xx if older than say N days)
- Rewrite the destination address to the original envelope sender
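The steps above can be sketched in Python as follows. This is an illustration under stated assumptions, not the Exim 3 implementation: MAX_AGE_DAYS stands in for "N days", the action names are invented, non-DSN messages are assumed to be delivered normally, and the sender's domain is assumed to contain no underscore.

```python
import hashlib
import re
from datetime import date, datetime

SECRET = "t0ps3cret"   # example secret from the text; never reuse it
MAX_AGE_DAYS = 14      # hypothetical value for "N days"

# _3_ + 8-digit date + 12 hex digits + sender with its last "@" as "_".
# Assumption: the sender's domain contains no underscore, so the last
# underscore in the local part marks the former "@".
RPR_SYNTAX = re.compile(
    r"^_3_([0-9]{8})([0-9a-f]{12})(.+)_([^_@]+)$", re.IGNORECASE
)


def make_cookie(sender: str, datestamp: str) -> str:
    material = SECRET + sender.lower() + datestamp
    return hashlib.md5(material.encode("utf-8")).hexdigest()[:12]


def process_bounce(envelope_sender: str, localpart: str, today: date):
    """Return (action, address); address is set only for "forward"."""
    # 1. Process only DSNs. Assumption: anything else is delivered normally.
    if envelope_sender != "":
        return ("deliver", None)
    # 2. Verify the syntax, including that the date is a real calendar date.
    match = RPR_SYNTAX.match(localpart)
    if match is None:
        return ("deliver", None)
    datestamp, presented, s_local, s_domain = match.groups()
    try:
        stamp = datetime.strptime(datestamp, "%Y%m%d").date()
    except ValueError:
        return ("deliver", None)
    sender = f"{s_local}@{s_domain}"
    # 3. Verify the cookie (hex digits compared case-insensitively).
    if make_cookie(sender, datestamp) != presented.lower():
        return ("reject", None)
    # 4. Verify the timestamp.
    if (today - stamp).days > MAX_AGE_DAYS:
        return ("expired", None)
    # 5. Rewrite the destination to the original envelope sender.
    return ("forward", sender)


localpart = "_3_20040209" + make_cookie("hugo@sanders.ch", "20040209") + "hugo_sanders.ch"
print(process_bounce("", localpart, date(2004, 2, 12)))
# ('forward', 'hugo@sanders.ch')
```

What "reject" and "expired" translate to (postmaster delivery, discard, or a 5xx failure) is left to the caller, matching the choices listed above.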
I also add some informational X-RPR-*: headers in an attempt to help the
clueless understand what’s going on with the mail they receive from me. Also, I
add the X-Primary-Address: header if not already present, for potential
compatibility with challenge/response systems like TMDA. For this to work,
TMDA users need to either set PRIMARY_ADDRESS_MATCH to 6, which is bad, or
somebody needs to implement some kind of intelligent and/or fuzzy substring
matching in TMDA (e.g. limited to some known rewriting forwarder domains). In
any case, including this header on rewritten messages will be necessary. For
the record, please note that some people consider C/R systems harmful, and I
generally do not think that they should be used.
Feel free to improve on this scheme, and please do give me feedback if you have questions, thoughts, ideas or concerns.
Implementations
I originally implemented my own return path rewriting on Exim 3. Martin Treusch von Buttlar first derived an Exim 4 implementation of a similar scheme, and others followed. It should not be too hard to reimplement something along these lines in your favourite MTA.
Please do not blindly use any of these snippets of code without careful consideration of the implications. Also, you should test your setup well before using any kind of return path rewriting in production.
Version 3 of my RPR scheme for Exim 3
This is the scheme presented above. It has shorter return paths than v1 and v2, and a granularity of 1 day for greylisting compatibility. Due to Exim 3 limitations, it requires a tiny snippet of embedded Perl for bounce processing.
Version 2 of my RPR scheme for Exim 3 (obsolete)
This scheme has significantly shorter return paths than v1, and it has a granularity of 1 minute, which makes it incompatible with some greylisting systems. Due to Exim 3 limitations, it requires a tiny snippet of embedded Perl for bounce processing.
Version 1 of my RPR scheme for Exim 3 (obsolete)
This scheme has the disadvantage of very long resulting return paths, but includes more information in the rewritten address. Due to Exim 3 limitations, it requires a tiny snippet of embedded Perl for bounce processing.
Other Implementations
Here are links to other implementations of some form of return path rewriting that other people have come up with – let me know if you know of more. If you plan on implementing some form of RPR, you should find it beneficial to also take a look at these. Take all the proposed solutions (including mine) with a grain of salt though; none of them seems to be perfect.
Oh, and GMX are using some form of return path rewriting too.