Explosion in a Punctuation Factory
The rewriting rules of Sendmail help your system check and correct an electronic mail address before sending it to its final destination

By Bryan Costales

The Sendmail program is the mail-transfer software for many Unix systems, but Sendmail's configuration file has a long and glorious history of being difficult to understand, much less modify. Are Sendmail's rewriting rules confusing to you? If they are, you're not alone.

The rewriting rules--used to rewrite mail headers, to check for errors, and to select mail programs--don't have to be all that mysterious. They are compact, yes, but relatively simple once you begin to understand them.

The rewriting rules have been variously described as resembling modem noise, Mr. Dithers swearing in the comic strip ``Blondie,'' and an explosion in a punctuation factory. While these allusions are sadly apt, they are also misleading. What appears confusing and complex is, in reality, just succinct.

The Sendmail program parses (reads and processes) each rule every time it reads its configuration file,

Why Rules?

The rules are used to modify mail addresses, to detect errors in addressing, and to select an appropriate means of mail delivery. Addresses need to be modified because they can be specified in many ways yet are required to be in specific forms for particular means of delivery. To illustrate, consider the address
Another role for the rules is to detect (and reject) errors
locally. This filtering prevents errors from propagating over
the network. Mail to an address without a user name, such as
Sequences of rules are grouped together into rule sets. Each set is similar to a subroutine. A rule set is declared with the ``S'' key letter, which must begin a line in the Sendmail configuration file. For example, ``S0'' begins the declaration of the rules that form rule set number 0. Rule sets are numbered starting from 0, where sets 0 through 5 are internally defined by Sendmail to have very specific purposes:

0    Resolve delivery agent
1    Process sender address
2    Process recipient address
3    Preprocess all addresses
4    Postprocess all addresses
5    Rewrite unaliased

Rule set definitions may appear in any order in the configuration file. For example, rule set S5 may be defined first, followed by S2 and then S7. The rule sets are gathered when the configuration file is read, and they are sorted internally by Sendmail.

If a rule set is undefined, the result is the same as if it were defined but had no rules associated with it. It is like a subroutine that contains nothing but a ``return'' statement. It does nothing and produces no errors.

To observe the effect of rules that do nothing, create a
three-line configuration file named, say,

The ``rewrite:'' designation that begins each line of address-testing-mode output is simply there to highlight rewriting lines when they are mixed with other kinds of debugging output. The ``input'' designation means that Sendmail placed the address into the workspace (more about this later). The ``returns'' designation shows the result after the rule set has rewritten that address based on its rules.

The address that was fed to Sendmail (bob@here) was first split
into parts (tokens) based on the separating characters defined by
the ``Do'' macro shown in Listing 1A, and 10 others defined
internally by Sendmail, namely:
For clarity, each token in Listing 1B was printed within full quotation marks; however, some versions of Sendmail omit these marks. The ``input:'' line shows the three tokens passed to rule set 0. The ``returns:'' line shows that, because there is no rule set 0, the undefined (empty) rule set returns the tokens that make up the address unmatched and unchanged.

The example illustrates version 8 Sendmail. If you are running an old version of Sendmail, two things will be different. First, the initial output will not include the message ``(ruleset 3 NOT automatically invoked)'', but will include two extra rewrite lines. Second, old versions of Sendmail always assume you want to see the effect of rule set S3, whether you do or not.

Rule Sets

Each rule set may contain any number of individual rules or none at all. Rules begin with the ``R'' key letter and take the following general form:

S0
Rlhs	rhs
Rlhs	rhs	comment

The first line--the S0--declares the start of rule set 0. All the lines after the S line that begin with R belong to that rule set. A new rule set begins when another S line with a different number appears.

Each R line is an individual rule in a series of rules that form a rule set. If you examine the Sendmail configuration file for almost any major mail-handling site, you'll see that any given rule set can have a huge number of rules. But our hypothetical rule set 0 has only two rules and therefore only two lines that begin with an R.

Each rule has two distinct parts, each divided from the other by one or more tab characters. You can use space characters inside each part, but you must use tabs to separate the parts. The left-hand part of the rule is called the lhs, for left-hand side. Likewise, the right-hand part is denoted rhs. These two form the rule. A comment may optionally follow the right-hand side, and, if present, must be separated from it by one or more tab characters.

The left-hand and right-hand sides form a ``do while'' pair.
As long as the left-hand side evaluates to true, the right-hand side is processed. If the left-hand side evaluates to false, Sendmail skips to the next rule for that rule set.

The Workspace

Whether the left-hand side is true or false is determined by making comparisons. When an address is processed for rewriting by a rule set, Sendmail first separates the parts into tokens and stores those tokens internally in a buffer called the ``workspace.''

When the left-hand side of a rule is evaluated, it is divided
into tokens and those are compared to the tokens in the
workspace. If both the workspace and the left-hand side contain
exactly the same tokens, a match is found, and the result of the
left-hand side comparison is true. To illustrate, in Listing 2A we've added two lines to the
end of our minimal configuration file,

Now run Sendmail in rule-testing mode, as shown in Listing 2B. As we did in Listing 1B, enter rule set 0 and a typical e-mail address at the prompt. Notice that nothing was rewritten, even though there is a rule set 0 and a rule in our sample configuration file. Remember that the workspace is rewritten only if the workspace and the left-hand side exactly match. For the demo rule, they do not match (see Figure 1).

Enter the exact text that appears in the left-hand side of the demo rule at the prompt (see Listing 2C). An amazing thing happens. The rule has actually rewritten an address. The address ``left.side'' was given to rule set 0 and was rewritten by the rule in that rule set to become the address ``new.stuff''. This transformation was possible because the workspace and the left-hand side exactly matched each other, so the result of the left-hand side comparison was true.

Before leaving this demo rule set, perform one final experiment. Enter the text ``left.side'' again, but this time change the case of the letters to upper case. Notice that the workspace and the left-hand side still match, even though they now differ by case. This example illustrates that all comparisons between the workspace and the left-hand side of rules are done in a case-insensitive manner. This property enables rules that solve complex problems to be written without the need to distinguish between upper- and lower-case letters.

The Flow of Addresses Through Rules

When rule sets contain many rules, the ``flow'' is from the first through the last rule (top to bottom), in the order they are declared in the configuration file. To illustrate, modify the two demo lines you added to the sample configuration file, replacing them with the three new demo rules shown in Listing 3. There are only two parts to each rule (the comment is omitted).

Before you test these new rules, consider what they do. The first rule rewrites any ``x'' in the workspace into a ``y''.
The second rule rewrites any ``y'' in the workspace into a ``z''. And the last rule rewrites any ``z'' that it finds in the workspace into an ``a''.

Now run Sendmail in rule-testing mode once again, and, one at a time, enter rule set 0 and one of the letters ``x'', ``y'', and ``z''. No matter which of ``x'', ``y'', or ``z'' you enter, each is rewritten into ``a'', illustrating the ``flow'' of addresses (the workspace) through rules.

Let's look in detail at what is going on by examining the input. Follow along with Figure 2. When you enter rule set 0 with ``x'' in the workspace, the first rule of that rule set tries to match its left-hand side to the workspace; the left-hand side exactly matches the workspace, so the right-hand side rewrites the workspace so that ``x'' is replaced by ``y''.

Now the next rule tries to match its left-hand side to the workspace. But what is contained in the workspace has been rewritten by the first rule. The key point here is that each rule compares its left-hand side to the current contents of the workspace, even though they may have been rewritten by earlier rules. It should now be clear why all three letters are rewritten to ``a'' (see Figure 3).

Now feed one more letter into Sendmail in rule-testing mode. This time enter anything other than an ``x'', ``y'', or ``z'', say the letter ``b''. Notice that the workspace remains unchanged because ``b'' did not match the left-hand side of any of the three rules. If the left-hand side of a rule fails to match the workspace, that rule is skipped, and the workspace remains unchanged.

Operators Versus the Workspace

Rules would be pretty useless if they always had to match the
workspace exactly. Fortunately, that is not the case; in
addition to literal text, you can also use operators. Operators
are like wild cards in that they allow the left-hand side of
rules to match arbitrary text in the workspace. To illustrate,
look at Figure 4. The left-hand
side begins with the first character following the ``R'' key
letter. The left-hand side in Figure 4 is the operator,
The address being evaluated is separated into tokens, placed
into the workspace (see Figure 5),
and then the workspace is compared to that pattern. When
matching the workspace to a left-hand side pattern, Sendmail
scans the workspace from left to right. Each token in the
workspace is compared to the operator ( The A rule using The But a bad address in the workspace will not match. For
example, consider an address that lacks a user name (as shown in
Figure 8). The first
When any part of a pattern fails to match the workspace, the
entire left-hand side fails. One small bit of confusion may yet
remain. When an operator like

More Play With Left-Hand Side Matching

Take a moment to revise the sample Sendmail configuration file
as shown (Listing 4). I've given each
temporary right-hand side a number to see whether it is selected.
Next, enter an address that contains just a host and domain part, but not a user part, something like ``@host.domain''. The first thing to notice is what was not printed! The workspace does not match the pattern of the first rule. But instead of returning an error, the workspace is carried down as is to the next rule, where it does match.

Now enter an address that fails to match the first two rules but successfully matches the third, something like ``user@host.domain''. The flow for this address is shown in Figure 9.

The fourth rule contains the original lone

Other Operators

A single operator, the

$@    Exactly none
$*    Zero or more
$+    One or more
$-    Exactly one

But the story doesn't end here. In this article you've been given a glimpse of how Sendmail's rules work. In all the listings, I've shown only ordinary, literal text in the right-hand side. The power of Sendmail lies in its use of operators in the right-hand side to rewrite addresses in complex and sophisticated ways. The right-hand side operators are:

$:        Rewrite once (prefix)
$@        Return (prefix)
$digit    Copy by position
$(        Database lookup
$[        Name canonicalization

Clearly, there is not enough room in this tutorial to go over all the possible Sendmail rewriting rules. And the rewriting rules are only a part of Sendmail. The Sendmail program is a very flexible tool, and its configuration file reflects this flexibility by its complexity. Still, I hope this tutorial has shown that you can understand Sendmail's configuration file, and has encouraged you to continue exploring.
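
For readers who would rather experiment outside of Sendmail, the matching and flow behavior described in this article can be modeled in a few lines of Python. This is only a rough sketch under stated assumptions, not how Sendmail itself is implemented. The names ``tokenize'', ``matches'', ``apply_rules'', and ``SEPARATORS'' are hypothetical, the separator set is a stand-in for whatever your configuration defines, right-hand sides are literal text only, and the x/y/z rules are written from the description in the text rather than copied from any listing.

```python
import re

# Assumption for this sketch only: two separator characters. Sendmail's real
# set comes from the configuration file plus its internal defaults.
SEPARATORS = "@."

# Left-hand side operators from the table above, as (minimum, maximum) token
# counts. None means "no upper limit".
OPERATORS = {"$*": (0, None), "$+": (1, None), "$-": (1, 1), "$@": (0, 0)}


def tokenize(text):
    """Split text into tokens, keeping each separator as its own token."""
    pattern = "([" + re.escape(SEPARATORS) + "])"
    return [tok for tok in re.split(pattern, text) if tok]


def matches(lhs, workspace):
    """Return True if the list of lhs tokens matches the whole workspace."""
    if not lhs:
        return not workspace
    head, rest = lhs[0], lhs[1:]
    if head in OPERATORS:
        low, high = OPERATORS[head]
        if high is None:
            high = len(workspace)
        # Try every token count the operator is allowed to consume.
        for count in range(low, min(high, len(workspace)) + 1):
            if matches(rest, workspace[count:]):
                return True
        return False
    # Literal text is compared without regard to case.
    return (bool(workspace)
            and workspace[0].lower() == head.lower()
            and matches(rest, workspace[1:]))


def apply_rules(rules, workspace, max_passes=100):
    """Pass the workspace through each (lhs, rhs) rule, top to bottom."""
    for lhs, rhs in rules:
        passes = 0
        # The "do while" pair: rewrite for as long as the lhs still matches.
        while matches(lhs, workspace):
            workspace = list(rhs)
            passes += 1
            if passes >= max_passes:
                # Safeguard of this sketch, not a Sendmail feature.
                raise RuntimeError("rule keeps matching its own output")
    return workspace


# Three demo rules in the spirit of the x, y, z example.
demo_rules = [(["x"], ["y"]), (["y"], ["z"]), (["z"], ["a"])]

if __name__ == "__main__":
    for letter in ("x", "y", "z", "b"):
        print(letter, "->", apply_rules(demo_rules, tokenize(letter)))
    pattern = ["$+", "@", "$+"]
    for address in ("user@host.domain", "@host.domain"):
        print(address, "matches:", matches(pattern, tokenize(address)))
```

Rules are given as lists of tokens so that the sketch does not have to guess how Sendmail tokenizes operators. Note that a literal right-hand side paired with a wild-card left-hand side such as ``$+'' would match its own output forever in this model, which is why the loop carries a pass limit.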