Editor's note: This is an excerpt from "Routing TCP/IP, Volume II: CCIE Professional Development, Second Edition," by Jeff Doyle and published by Cisco Press.
Just as the concepts of stub and transit (core) autonomous systems were introduced in EGP, that pioneering protocol also introduced the concepts of interior and exterior neighbors. That is, if an EGP process peers with a neighbor in the same AS, the neighbor is interior; if the neighbor is in a different AS, the neighbor is exterior.
BGP uses the same concept: If a BGP session is established between two neighbors in different autonomous systems, the session is external BGP (EBGP), and if the session is established between two neighbors in the same AS, the session is internal BGP (IBGP). Figure 1-6 illustrates this concept.
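The distinction comes down to a single comparison of AS numbers. The following is a minimal sketch of that test; the function name `session_type` is an illustrative assumption and not part of any BGP implementation.

```python
def session_type(local_as: int, neighbor_as: int) -> str:
    """Classify a BGP session by comparing the two neighbors' AS numbers."""
    # Same AS on both ends means internal BGP; different AS means external BGP.
    return "IBGP" if local_as == neighbor_as else "EBGP"


print(session_type(2, 2))  # two routers inside AS2
print(session_type(1, 2))  # a router in AS1 peering with a router in AS2
```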
Multiple routers usually exist within an AS, so IBGP is necessary whenever BGP advertised information must be passed within a given AS. In Figure 1-6, for instance, the combination of EBGP and IBGP sessions makes it possible for the router in AS1 to advertise a route to the router in AS3. Traditionally, IBGP is associated with transit autonomous systems such as AS2 in Figure 1-6. A stub AS usually runs EBGP at one or more edge routers only, and routes packets to and from the edge routers via an IGP. However, with multiprotocol BGP being used more and more frequently for services, such as MPLS-based VPNs and IP multicast, IBGP is beginning to appear even in stub autonomous systems.
Recall that BGP routers use the AS_PATH not only as an AS hop count metric but also as a loop avoidance device: If a router sees its own AS number on the AS_PATH list, it drops the route. This presents some interesting problems for IBGP.
Consider, for example, a route being communicated from AS1 to AS3, through AS2, in Figure 1-7. The physical path across AS2 is through three routers, RTR1, RTR2, and RTR3. If each of these three routers adds its AS number to the AS_PATH list as it passes the route along, two problems arise:
■ The AS_PATH list no longer is a true representation of the length of the inter-AS path. AS2 is a single AS hop and should be represented by one entry on the AS_PATH list. If each router makes an entry for AS2, the AS number would appear three times (Figure 1-8).
■ The loop avoidance function of the AS_PATH stipulates that if a router sees its own AS number on the AS_PATH list, it assumes a loop has occurred and drops the route. So if RTR1 added AS number 2 to the list, RTR2, seeing that AS number and knowing it is in AS2, would drop the route (Figure 1-9).
The solution to these problems is a special rule for IBGP: A router adds its AS number to a route’s AS_PATH only when the route is sent to an EBGP neighbor. The AS number is not added to routes sent to an IBGP neighbor.
Figure 1-10 shows the effects of this rule: Routers within AS2 do not drop the route because they do not see their own AS number on the AS_PATH list, and the router in AS3 correctly determines the AS hop distance to prefix A.
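The rule and its effect can be sketched in a few lines of code. This is an illustration only, under the assumption that each hop of the route advertisement is modeled as a hypothetical `(sender_as, receiver_as)` pair; the function name `propagate` and the `always_prepend` flag are invented for the example. The flag reproduces the broken behavior of Figure 1-9, in which every router adds its AS number regardless of session type.

```python
from typing import List, Optional, Tuple

# Hypothetical hop list for the path in Figure 1-7:
# AS1 router -> RTR1 -> RTR2 -> RTR3 -> AS3 router.
HOPS: List[Tuple[int, int]] = [(1, 2), (2, 2), (2, 2), (2, 3)]


def propagate(hops: List[Tuple[int, int]],
              always_prepend: bool = False) -> Optional[List[int]]:
    """Carry a route across the hops and return the final AS_PATH.

    Returns None if a receiver finds its own AS number on the AS_PATH
    and drops the route.
    """
    as_path: List[int] = []
    for sender_as, receiver_as in hops:
        # IBGP rule: prepend the sender's AS only on an EBGP session.
        if always_prepend or sender_as != receiver_as:
            as_path = [sender_as] + as_path
        # Loop avoidance: a receiver drops any route listing its own AS.
        if receiver_as in as_path:
            return None
    return as_path


print(propagate(HOPS))                       # [2, 1]
print(propagate(HOPS, always_prepend=True))  # None
```

With the rule in place, the router in AS3 receives an AS_PATH of two entries, one per AS crossed. Without it, RTR2 rejects the route as soon as RTR1 adds AS number 2.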
This rule solves the problems represented in Figures 1-8 and 1-9 but introduces another problem. Detecting one’s AS number on the AS_PATH list is BGP’s method of detecting and avoiding routing loops; however, the AS_PATH is meaningless within the scope of a single AS. What if a routing loop does exist within the AS? How can you avoid it?
To solve this problem, you must look again to EGP. That protocol had no loop avoidance mechanism, so the solution was to ensure a loop-free topology. This is also the rationale behind hierarchical area topologies in OSPF and IS-IS, as discussed in Volume I; SPF trees (the means by which link-state protocols “see” loops) do not span area boundaries, so a loop-free inter-area topology is imposed.
This, then, is also the solution to the IBGP routing loop vulnerability: Ensure that the IBGP peering sessions cannot loop by requiring a loop-free topology. One of the keys to this solution is that BGP sessions run over TCP, which is unicast point-to-point (avoiding at least some looping risks) and has no requirement that the two points of the session physically connect. So in the example network, even though the path through AS2 transits three routers, the IBGP session can be established directly between the edge routers, as shown in Figure 1-11. The IBGP session is following the physical path through RTR2 but logically exists only between RTR1 and RTR3.
Our IBGP problems are still not completely solved, however. To understand the next problem, suppose the route to prefix A has been sent across the EBGP and IBGP sessions in Figure 1-11. The resulting route entry at RTR3 is shown in Figure 1-12. The entry is created from RTR1’s advertisement, so RTR1 is indicated as the next hop toward the destination. And the next hop to reach RTR1 within AS2 is RTR2, so that entry is also shown.
The problem becomes evident in Figure 1-13, in which a packet with a destination address belonging to prefix A is forwarded from AS3 to AS2. The sequence of events shown in Figure 1-13 follows:
1. When RTR3 receives the packet, it does a lookup of the destination address and sees that the next hop is RTR1.
2. Because RTR1 is not directly connected, RTR3 must do a second lookup to find out how to forward the packet toward that next-hop address.
3. The next hop of RTR1 is RTR2, so the packet is forwarded to that router.
4. RTR2 has no route entry to prefix A because the IBGP session communicating the route was directly from RTR1 to RTR3; therefore, RTR2 drops the packet.
So even though a loop-free route exchange is accomplished with the logical topology depicted in Figure 1-11, not enough information is shared across the actual path packets take for the packets to be forwarded successfully. This problem highlights that two levels of information need to be considered when setting up IBGP:
1. Next-hop information about the advertised prefix
2. Next-hop information about the prefix next hop addresses
Figure 1-13 shows how a router performs a recursive lookup: It first looks up the route to the packet destination address; if the next-hop address is not directly connected, it must perform a second lookup to find the route to the next-hop address. IGPs normally route hop-by-hop, so recursive lookups are not a problem for them. They are typical for IBGP, though, because IBGP neighbors are often not directly connected.
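The recursive lookup and the resulting packet drop can be traced with a small simulation. The sketch below is not router code: the table contents are assumptions modeled on Figure 1-12, and the names `TABLES`, `NEIGHBORS`, `resolve`, `forward`, the label `prefixA`, and the label `AS1` for the external router are all invented for illustration.

```python
from typing import Dict, List, Optional, Set

# Assumed routing tables: destination -> next hop.
TABLES: Dict[str, Dict[str, str]] = {
    "RTR3": {"prefixA": "RTR1", "RTR1": "RTR2"},  # prefix A learned via IBGP
    "RTR2": {"RTR1": "RTR1", "RTR3": "RTR3"},     # IGP routes only
    "RTR1": {"prefixA": "AS1", "RTR3": "RTR2"},   # prefix A learned via EBGP
}
# Assumed directly connected neighbors of each router.
NEIGHBORS: Dict[str, Set[str]] = {
    "RTR3": {"RTR2"},
    "RTR2": {"RTR1", "RTR3"},
    "RTR1": {"RTR2", "AS1"},
}


def resolve(router: str, dest: str,
            tables: Dict[str, Dict[str, str]]) -> Optional[str]:
    """Recursive lookup: follow next hops until one is directly connected."""
    table = tables[router]
    target = dest
    visited: Set[str] = set()
    while target in table and target not in visited:
        visited.add(target)
        target = table[target]
        if target in NEIGHBORS[router]:
            return target
    return None  # no usable route


def forward(start: str, dest: str,
            tables: Dict[str, Dict[str, str]]) -> List[str]:
    """Trace a packet router by router until it leaves the AS or is dropped."""
    trail = [start]
    router = start
    # The length check bounds the walk in case the tables contain a loop.
    while router in tables and len(trail) <= len(tables):
        next_hop = resolve(router, dest, tables)
        if next_hop is None:
            trail.append("dropped")
            break
        trail.append(next_hop)
        router = next_hop
    return trail


print(forward("RTR3", "prefixA", TABLES))  # ['RTR3', 'RTR2', 'dropped']
```

RTR3 resolves prefix A to RTR1 and then RTR1 to RTR2, but RTR2 has no entry for prefix A at all, so the trace ends there. Adding a `prefixA` entry to RTR2's table in this model lets the trace continue through RTR1 to the external router.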
To prevent the problem depicted in Figure 1-13, every router along the path over which a packet will be forwarded must have enough information in its routing table to know what to do with the packet. One solution is to redistribute all the routes learned from EBGP neighbors into the IGP. In Figure 1-11, when RTR1 receives the route information about prefix A from AS1, it could, in addition to advertising the information to RTR3, redistribute the information into the local IGP. Now, when a packet is forwarded to RTR2, as shown in Figure 1-13, RTR2 has a route table entry for prefix A learned via the IGP from RTR1 and showing RTR1 as the next hop for the prefix.
Although redistribution of BGP routes into the local IGP works just fine on paper, Chapter 3 explains in some detail why this is almost always a bad idea in practice. In short, there are two issues with redistributing BGP routes into the IGP:
■ One of the assumptions of BGP is that external peers are outside of your realm of trust, and therefore information received from those peers is subject to acceptance under more cautious rules than information received from an IGP or IBGP peer. Promiscuously tossing external routes into your IGP database exposes you to security and stability threats.
■ The information received from an external BGP peer is usually either the full Internet routing table, a substantial subset of that table, or some other large set of routes. IGP performance degrades as the size of its routing information databases grows. A large set of routes (the specific thresholds depend on the individual router’s memory capacity, CPU speed, and efficiency of IGP coding) can cause the IGP to consume most or all of the router’s processing capacity, bringing the router’s availability quickly down to 0 and in many cases causing a complete platform failure. Chapter 3 shows that it can get much worse than just a single router failure.
The best practice in the great majority of cases is to keep BGP-learned routes within BGP. If these routes must be distributed to routers within the AS to eliminate the problem, as shown in Figure 1-13, distribute them using IBGP, as shown in Figure 1-14. The practice for efficient routing across an IBGP infrastructure is that a full mesh of IBGP sessions should exist between all BGP routers within a single AS. Chapter 5 shows that this practice is subject to a few modifications in the interest of scaling, but until you get to that chapter, full IBGP meshes can confidently be written into your book of best practices.
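A full mesh means one IBGP session for every pair of BGP routers in the AS, so n routers require n(n-1)/2 sessions. The short sketch below enumerates the sessions for a three-router AS and shows how quickly the count grows; the function names and the larger router counts are assumptions chosen only for illustration.

```python
from itertools import combinations
from typing import List, Tuple


def full_mesh(routers: List[str]) -> List[Tuple[str, str]]:
    """Return one IBGP session for every unordered pair of routers."""
    return list(combinations(routers, 2))


def session_count(n: int) -> int:
    """Number of sessions in a full mesh of n routers: n(n-1)/2."""
    return n * (n - 1) // 2


print(full_mesh(["RTR1", "RTR2", "RTR3"]))
for n in (3, 10, 50):
    print(n, "routers need", session_count(n), "IBGP sessions")
```

Three routers need only 3 sessions, but 10 need 45 and 50 need 1225, which gives a sense of why scaling becomes a concern in larger autonomous systems.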
The logical topology of Figure 1-14 brings you back to the problem of loop avoidance. The physical topology you have been using up until now is easy to understand, but the reality is that the interior architecture of most autonomous systems is more complex. Figure 1-15, for instance, shows that the logical BGP topology is quite different from the autonomous system’s physical topology. Although the EBGP sessions (represented by the arrows crossing the AS boundary) correspond with the external physical links, the fully meshed IBGP sessions are significantly more complex. It is essential to remember, however, that every IBGP session must travel over some physical link. The direct IBGP session between RTR5 and RTR6 in Figure 1-15, for instance, actually passes through RTR2 and RTR3.
How, then, are BGP routing loops avoided in a complex topology? By adding another special IBGP rule: Routes learned from an internal neighbor are never sent to another internal neighbor.
The objective of the full IBGP mesh is to ensure that all routers within the AS have the information they need to forward packets to the correct next hop. Suppose RTR6 in Figure 1-15 receives a packet on its AS-external link, and a route lookup shows RTR7 as the next hop. The path the packet must follow to get to that next hop is through RTR2, RTR3, and RTR5. The IBGP sessions ensure that whenever a router learns a route from an external neighbor, it passes the information directly to every router within the AS, without the need for any one of the routers to forward the information to any other router within the AS. And if no information learned from an internal neighbor is passed along to another internal neighbor, no routing loops can occur.
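The interaction between this rule and the full mesh can be checked with a small model. In the sketch below, which is an illustration under simplifying assumptions and not a BGP implementation, a route learned from an external neighbor is sent by the learning router to each of its IBGP neighbors, and no receiver passes it on. The function name `ibgp_learners` and the session lists are invented for the example.

```python
from typing import List, Set, Tuple

Session = Tuple[str, str]


def ibgp_learners(sessions: List[Session], origin: str) -> Set[str]:
    """Return the routers that hold a route `origin` learned via EBGP.

    Only `origin` advertises the route over IBGP. Routers that learn it
    from an internal neighbor never send it to another internal neighbor.
    """
    learners = {origin}
    for a, b in sessions:
        if a == origin:
            learners.add(b)
        elif b == origin:
            learners.add(a)
    return learners


# Assumed session lists for a three-router AS.
chain = [("RTR1", "RTR2"), ("RTR2", "RTR3")]
mesh = chain + [("RTR1", "RTR3")]

print(sorted(ibgp_learners(chain, "RTR1")))  # ['RTR1', 'RTR2']
print(sorted(ibgp_learners(mesh, "RTR1")))   # ['RTR1', 'RTR2', 'RTR3']
```

With only a chain of sessions, RTR3 never hears about the route because RTR2 is not allowed to relay it. With the full mesh, every router learns the route in a single step, and because nothing is ever relayed, the advertisement cannot circulate.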