Carl Malamud: Internet Talk Radio, flame of the Internet.
Geek of the Week. We’re talking to Dr. Radia Perlman. She’s the author of Interconnections: Bridges and Routers. It’s the definitive book on bridges and routers, as you may guess, and she’s at Novell, Inc. Welcome to Geek of the Week.
Perlman: Hello.
Malamud: Well why don’t you start by telling us the difference between a bridge and a router. Aren’t these device—haven’t they merged? Doesn’t Cisco do everything for you now?
Perlman: Ah, yes. What’s the difference between a bridge and a router? Well, if you look at the ISO model… ISO has them defined very well, which is that a bridge is a data link layer relay, and a router is a network layer relay. And that’s fine, until you start wondering what’s a data link layer and what’s a network layer. And if it were up to me, what a data link layer would be is something that gets messages one hop to a directly-connected neighbor. And what a network layer is is something that pieces together a whole path.
So under that definition, there should be no such thing as a bridge. But it turns out that the real definition between a network layer versus a data link layer is that a data link layer protocol is anything that’s designed by a committee chartered to do a data link layer protocol. And that’s how bridges get to be called bridges—
Malamud: Let me guess. And a routing protocol’s anything charted by a committee chartered to do that, right.
Perlman: Absolutely, yes.
Malamud: Okay. Well, um…okay, standards designed by committee. As I understand bridges and routers, many of the functions have merged into a single box these days; many devices can do both. And for a while there was a real religion. There were some networks that were bridge-only networks. A lot of big DEC installations were like that. And others were router-only installations and you know, a lot of the kind of classic TCP/IP land was that way. Do both play a role in any network? I mean, should we be looking at both as part of our toolkit?
Perlman: Yes, because sometimes you need bridges because you have non-routable protocols. Like LAT. But um…
Malamud: Should there be non-routable protocols? Is that a design error?
Perlman: No, there should not be non-routable protocols, and as a matter of fact I was not too happy about LAT originally. Because you could do something like telnet, which does the same purpose but it’s built on top of a network layer. But the designers basically said, “Why would anybody want the terminal server on a different LAN than the host?” And so they were absolutely convinced that you would never want to do that and so therefore it was perfectly safe to not bother with a network layer because it would always be confined to a LAN. But of course that’s one of the most popular reasons for bridges, is in order to get things multiple hops.
Malamud: Now, LAT has some nice properties, of course. It lets you multiplex data from several users on the same device going to the same host. Should we have changed telnet to accommodate these needs, or is this kind of an overdesign of the LAT protocol?
Perlman: I’m not that familiar with that kind of thing, but absolutely there’s no reason why you couldn’t have designed all the functionality of LAT on top of a network layer. And any sort of functionality that’s in LAT and not in telnet you— I mean, any protocol can become any other protocol by sort of modifying it, by taking away all the old packet formats and adding new ones. You certainly could have a routable thing that has all that functionality.
Malamud: So it sounds like a bridge is simply a stopgap measure when you have non-routable protocols, and you would prefer to see a router in your network instead of a bridge if you could.
Perlman: Yeah. A bridge is a kludge. It’s a world-class kludge. It’s a wonderful kludge. But it’s definitely a kludge to get around the fact that people built things without a network layer. Now, that’s partially the prob— The problem came about because the people that invented Ethernet did a real good thing. Ethernet is good technology. But they did a really bad thing because they called it a net. And they shouldn’t have called it Ethernet, they should’ve called it “Etherlink.”
Malamud: Or “Etherwire,” or…yeah.
Perlman: Right. And so all these people got fooled. They said, “Wow, this is our entire network. It’s this one wire.” And then they didn’t need a network layer, because all the routing was done by this one wire, because everything in the universe was attached to this one wire. But that was of course very silly, in retrospect, and people shouldn’t have built devices that only could talk—and protocols that could only talk on a single wire.
When the world got entrenched that way and then decided they wanted a way for packets to leap from one Ethernet to another, I sort of felt like well, you know, they deserve what they got. Because they should’ve had a network layer there in the first place. But, there certainly was a market for a magic device that would somehow splice these things together. And that’s where bridges came from. Now, that’s where transparent bridges came from. The notion was that you couldn’t change the installed base, and that you had to be able to talk from one wire to another wire magically, without changing the installed base.
Then along came source routing bridges, which are really a mystery as to how they can possibly be called a bridge, because it was a whole network layer, a very complex network layer. Not only did the end nodes have to be aware that there were store-and-forward devices there, but the end nodes had to figure out the routes. So you had to change the header, and you had to change what the end nodes did. But that did get to be called a bridge rather than a router just because it was done by the 802 committee. And as a matter of fact it’s called a MAC layer bridge, which if you consider the data link layer it’s sort of sub-divided by IEEE 802 into the physical layer, which…well, I guess that’s the same as the physical layer. But there’s the MAC sub-layer and the LLC sub-layer. There’s really no reason for this layering, other than the fact that some people thought that protocols ought to be connection-oriented even on LANs. So that’s why you needed this upper layer. But because the committee that sponsored the source-routing bridges happened to be the Token Ring committee, which is a specific MAC layer device which is sort of lower than the data link layer, bridges got to be called MAC layer devices, which again the only reason that’s meaningful at all is because that was the committee that standardized it.
Malamud: And by splitting up the data link layer into two sub-layers you have room for two committees.
Perlman: Right. Oh, yes. And more committees means more travel, more fine lunches and dinners…
Malamud: Absolutely. That’s our goal in this whole thing anyway.
Perlman: Right.
Malamud: Let’s turn our attention to routers. Most routers these days are multi-protocol routers. And most of them will forward several different protocols, IPX and IP, and typically will speak several different routing update protocol languages. Is this a good thing or are routers getting too complicated? Should we be trying to pare it down to a single network layer and a single routing update protocol?
Perlman: A single network layer would be nice. That’s not gonna happen in any near timeframe. There’s absolutely no excuse for multiple routing protocols, though. It just happens that everyone who designed a network layer happened to also design a routing protocol because everyone assumes that their protocol suite is the only one in the universe. And every protocol suite needs a routing protocol, and so they have done this. But there’s absolutely no reason why one routing protocol couldn’t route everything.
Malamud: But aren’t some easier to wor— For example, RIP is kind of— Out of the box it’s braindead, but it’s real easy to put together. BGP4 on the other hand is a more sophisticated type of operation. Wouldn’t it be appropriate to have different routing protocols for different environments, just like we have different data link layers for different environments? Or can one do everything [inaudible]?
Perlman: I think one can do everything. There’s some question about whether you need an inter-domain versus an intra-domain, but there’s nothing so exotic about these various network layers like IPX, IP, CLNP, DECnet, AppleTalk. There’s nothing so exotic about any of them that one routing protocol couldn’t route everything. And that’s known as integrated routing; have one routing protocol.
And it’s much easier to deal with because you only have to read one spec. You only have to implement one thing. You only have to configure one thing. And in terms of implementation it’s actually harder, more than twice as hard to implement two routing protocols as one because you have to worry about the interrelationships between them. But then also it just is so horribly wasteful. If you consider link state protocols, one of the things they do is they have routers— Oh, I use “rooters” and routers interchangeably that way. I don’t—
Malamud: Oh, that’s okay. I use tomatoes and tomahtoes.
Perlman: Interchangeably?
Malamud: Well in salads as well.
Perlman: Right, okay.
Perlman: You have to find out who your neighbor routers are, and so all of these protocols have some kind of thing where you periodically send “hello.” You say, “Hello, I’m Radia,” to your neighbors, and they send something to you.
Why on Earth you have to send two different messages to do the same thing because for instance you have to send an OSPF hello, and you have to send let’s say an IS-IS hello, they’re exactly the same kind of thing, just a slightly different packet. And so that’s twice as much traffic for no good reason whatsoever.
Malamud: Those are both link-state protocols, IS-IS and OSPF. Are there differences between them? Are there things that one protocol does and the other doesn’t?
Perlman: Not…terribly…much. I mean, in terms of functionality they really are doing the same thing. Nobody really wants to see the network layer. What they want to do is they want to deposit their packets and have them crop up someplace else where they tell it to. All the routing protocols do that. RIP does that, too. It basically builds a forwarding table, and however a routing protocol does that is fine; once you have the forwarding table you just forward packets. And that’s really where the performance of a router matters most, is forwarding.
There are subtle differences between IS-IS and OSPF. The most significant one is if you’re in a multi-protocol environment IS-IS will route everything, and so you only have to do one protocol, whereas if you insist that if you have IP you have to have OSPF then it means that if you have CLNP you have to also have IS-IS. If you have IPX you also have to have NLSP and so forth. And then you wind up with all these routing protocols basically doing the same thing.
Malamud: What would it take to modify OSPF to handle multiple protocols? I would think in a multi-protocol internet that that would be a desirable goal.
Perlman: Well, the thing about IS-IS which makes it somewhat unpalatable to some people is that everything… There’s a base protocol which has the fields in there. And then everything else is options. So there’s a way of adding options to packets. So if you need a new field you just invent a new option code. And especially if it’s something like here are IPX addresses I can reach, you can just add that as an option and even if there’s routers in your net that never heard of IPX, they just pass the routing information through and the fact that there’s this extra field in there they don’t understand, that’s okay. The rule is if you don’t understand an option you don’t bother looking at it but you certainly keep it in memory and you forward it to the rest of the network.
So the ability to put options in there makes it very easy to extend it and have multiple protocols. OSPF has very much fixed-length fields. So it would be a different protocol if you were to do it, whereas with IS-IS adding new options to the protocol doesn’t really change the protocol.
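To illustrate the option mechanism Perlman describes, here is a minimal Python sketch. It assumes each option is encoded as a one-byte type, a one-byte length, and then the value; the type codes (1 and 200) and the function names are made up for illustration and are not taken from any specification. The point it shows is the rule she states: a router acts only on the options it understands, but it keeps and forwards all of them.

```python
def parse_options(data: bytes):
    """Split a byte string into (type, value) pairs.

    Assumed layout per option: 1-byte type, 1-byte length, then the value.
    """
    options = []
    i = 0
    while i < len(data):
        if i + 2 > len(data):
            raise ValueError("truncated option header")
        opt_type, length = data[i], data[i + 1]
        value = data[i + 2 : i + 2 + length]
        if len(value) != length:
            raise ValueError("truncated option value")
        options.append((opt_type, value))
        i += 2 + length
    return options


def encode_options(options):
    """Inverse of parse_options: re-encode (type, value) pairs."""
    return b"".join(bytes([t, len(v)]) + v for t, v in options)


def process_and_forward(data: bytes, known_types):
    """Return (options this router understands, bytes to forward).

    Unknown options are not interpreted, but they are kept and
    forwarded unchanged to the rest of the network.
    """
    options = parse_options(data)
    understood = [(t, v) for t, v in options if t in known_types]
    return understood, encode_options(options)


# Hypothetical codes: 1 = something this router knows, 200 = "IPX addresses I can reach".
packet = encode_options([(1, b"\x49"), (200, b"ipx-net-42")])
understood, forwarded = process_and_forward(packet, known_types={1})
```

A router that has never heard of option 200 still passes it along intact, which is why adding a new option does not require changing every router.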
Malamud: Now, you were quite active in the IETF and other committees during the development of a lot of the IS-IS work and OSPF. Surely you made this argument to the IETF about why OSPF should route multiple protocols, and the use of options instead of fixed-length… Why wasn’t that adopted?
Perlman: Because…
Malamud: Were other protocols not taken seriously? Was it a feeling that IP’s the only thing that matters?
Perlman: Yes, definitely there was “it doesn’t matter” about other protocols. And so therefore the multi-protocol ability was irrelevant to some people. But mostly, to tell you the truth, the arguments were not technical. If they had been technical it would have been a lot more clear-cut. But a lot of the people with really strong opinions didn’t understand the details of either protocol. And I would say virtually nobody understood both protocols. And I would think that anybody that really wanted to have a strong opinion would owe it to the world, before having that strong opinion, to make sure that they deeply understood the technical aspects of both.
Malamud: Well, most people’s strong opinions have been busy garnering the opinions and—
Perlman: Right.
Malamud: Reading the documentation is always one of those big steps to actually take.
Perlman: Right. Now, there are some things to be said against IS-IS, which is that it was written in ISO-ese, which is a dialect of gibberish. I mean, it was really very hard to read. It turns out to be a very simple protocol, but somehow it is possible to write anything simple up in a way that makes it seem very complicated.
Malamud: Well that’s the international process.
Malamud: OSPF has been extended to support multicasting, as a part of the routing update process. Does IS-IS also support concepts of multicasting in there?
Perlman: Certainly anything that you could do there in OSPF you could easily do with IS-IS as well.
Malamud: Would you explain [?] what multicast extensions mean in the context of a routing protocol?
Perlman: Well in terms of OSPF, what was done is that… Yeah, right. This I’m not that familiar with, so I’m like treading a little bit on thin ice. But, what happens is that anyone who’s listening to a particular multicast address announces that fact to the router. And the router puts into the link-state packet who’s listening to what multicast addresses. And then when there’s a packet transmitted with a particular destination multicast address, there is an optimal tree built from that source to every possible listener of that.
My feeling is that that’s not the right way to do multicast. That instead you should be… Yeah, you shouldn’t have something where everybody in the universe has to know who’s listening to every multicast address. That just doesn’t scale. What instead you do is that all the routers on the path of a particular multicast have to keep state that there’s a multicast going on. And that is more done in the style of what’s going on today in terms of multicast research, which is the… What does the acronym stand for, Protocol Independent Multicast. I think it was roughly based on the core-based tree stuff, and has been evolving, is still evolving somewhat, but I like that approach much better.
Malamud: You talk about building state into the routers, and there’s been a variety of efforts by Dave Clark and others to build some form of state into the network layer so we can begin doing things like resource reservation, for example. Do you have any idea how that research is going to turn out? Are we beginning to see a glimpse of how we’re going to do different classes of service, for example?
Perlman: Yeah, I find… I don’t firmly believe that resource reservation is a good idea. The idea of a datagram service is where if it gets overcrowded, then packets get lost at random and your service goes down but nobody’s completely locked out. With resource reservation, the lucky few that get in get great service. But when you reserve resources… Yeah. If you actually believe, like I do, that almost all applications are bursty, then resource reservation, either you have to overbook, in which case you’re back to the old world of datagrams because you will have to throw things away, or else you’re horribly underallocated.
Now, if you’re willing to live with an incredibly underallocated network then you don’t need to do anything fancy at all. If you’re only using 30% of your capacity, then any scheme whatsoever will work fine; all users will get good service.
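The trade-off Perlman describes can be made concrete with a little arithmetic. The Python sketch below is only an illustration under a strong assumption that is not in the interview: every flow is an independent on/off source that sends at its peak rate for a fixed fraction of the time (`duty_cycle`). The function names and parameters are hypothetical.

```python
from math import comb


def peak_reserved_utilization(capacity, peak_rate, duty_cycle):
    """Reserve each flow's peak rate: how many flows fit, and how busy is the link?

    Returns (number of admitted flows, average link utilization).
    """
    flows = int(capacity // peak_rate)
    utilization = flows * peak_rate * duty_cycle / capacity
    return flows, utilization


def overflow_probability(flows, duty_cycle, slots):
    """Overbooking: probability that more than `slots` flows burst at once.

    Assumes flows are independent, so the number bursting is binomial.
    When this event happens, packets have to be thrown away.
    """
    return sum(
        comb(flows, k) * duty_cycle**k * (1 - duty_cycle) ** (flows - k)
        for k in range(slots + 1, flows + 1)
    )


# A link that can carry 10 simultaneous bursts; each flow bursts 10% of the time.
admitted, utilization = peak_reserved_utilization(capacity=100, peak_rate=10, duty_cycle=0.1)
risk_if_overbooked = overflow_probability(flows=60, duty_cycle=0.1, slots=10)
```

With strict peak reservation the link admits only as many flows as it has burst slots and sits mostly idle; admitting more flows raises utilization but makes the overflow probability nonzero, which is the datagram behavior again.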
Malamud: Can you combine those together and say that for this class of asynchronous data we really are going to kind of overbook just a little bit and make sure we’re okay, and for this other class of general purpose if a few datagrams fall on the ground it’s okay?
Perlman: You certainly could do that. It’s reasonable to reserve some bandwidth for the applications that really are not bursty and really do require that kind of service, and then for others…not bother. Yeah.
Malamud: How is that going to fall out, though? I mean, is ATM going to solve all these problems for us or are we going to have to do something at the network layer to support some of these concepts?
Perlman: I’m not sure about that. ATM is sort of a funny case. It declared victory, and I then decided that I really needed to learn what it was. And so I would ask things like, “Well, what do the addresses look like?” And they said, “Well we…haven’t [crosstalk] done that yet.”
Malamud: For further study. To be determined.
Perlman: Right, yeah. And “Well, how do you set up calls?”
“Well, we haven’t decided that.”
You know, “How do you route things?”
“Well, we haven’t decided that.”
And turns out the only thing they’d actually decided at the time that I first got into looking at it was that the cell size would be 48 bytes. And based on this one decision that made nobody happy, ATM was going to take over the world.
Malamud: 48 bytes is a strange number. You know the origin of that number?
Perlman: Yeah, the phone companies wanted it to be no bigger than 32 bytes because otherwise they’d have to do echo suppression. And the data communications people wanted it to be at least 64 bytes. And so they picked 48.
Malamud: So the voice circuits will in fact have to do echo cancellation.
Perlman: I believe so, yes.
Malamud: And can we live with 64-byte cells? I mean, does this make any sense at all in a datagram world, especially running at let’s say gigabit speeds, to be breaking that up into little pieces?
Perlman: Well, I thought for a while that that seemed incredibly stupid, to have such tiny packets. But they’re not really packets if you think of them as big bits rather than tiny packets. And you don’t really see the fact that your packet gets chopped up into individual bits as it gets sent to cross an Ethernet. Here you won’t really see the fact that your packet gets chopped up into 48-byte chunks.
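The "big bits" idea can be sketched in a few lines of Python: a packet is chopped into fixed 48-byte chunks on the way in and glued back together on the way out, so the sender and receiver only ever see whole packets. This is a simplified illustration, not how any real adaptation layer is specified: the zero padding and passing the original length separately are assumptions made to keep the example short, and `segment` and `reassemble` are hypothetical names.

```python
CELL_PAYLOAD = 48  # bytes of data carried per cell, as discussed above


def segment(packet: bytes, size: int = CELL_PAYLOAD):
    """Chop a packet into fixed-size chunks, zero-padding the last one."""
    cells = []
    for offset in range(0, len(packet), size):
        chunk = packet[offset : offset + size]
        cells.append(chunk.ljust(size, b"\x00"))
    return cells


def reassemble(cells, original_length: int) -> bytes:
    """Concatenate the chunks and strip the padding.

    The original length is passed in directly here; a real system would
    have to carry it somewhere in the data itself.
    """
    return b"".join(cells)[:original_length]


packet = bytes(range(100))
cells = segment(packet)
restored = reassemble(cells, len(packet))
```

The endpoints hand over `packet` and get back `restored`; the chunks in between are invisible to them, in the same way individual bits on an Ethernet are.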
Malamud: In the routing world one of the things that many people have long wanted to have is policy routing. Saying, “Gee, let’s see, this packet ought to go this direction because it’s commercial, and this packet ought to go that direction because it’s a Tuesday.” How are we going to build policy routing, or will we ever do that? Is this an application question or can we expect our network to be a traffic cop?
Perlman: Yeah, that’s tough also. If you look at all the possible things that anybody might ever want to solve, there’s really no way to do a network layer that accomplishes all that, except by perhaps giving you all the information required and letting you calculate your own path and set it up.
Malamud: So source routing, is that how we solve this problem?
Perlman: If that’s the absolute most general problem needs to be solved, then that’s really the only way I can imagine doing it. But if you’re willing to solve some cases, then that’ll be fine. BGP does that. It solves something. It’s not clear exactly what things it solves, what things it doesn’t; there are certainly some clearly useful policy-based routing things that it doesn’t solve, like the ability to pick a different route depending on where the packet came from. So you might have two ways to get to the destination, one which only military users are allowed to use, and the other one which only commercial users are allowed to use. If you receive packets from both military types and commercial types, BGP only has you select one path to the destination. So if you pick the one that military types are allowed to use the commercial ones aren’t happy, and vice versa. So that is something BGP doesn’t do, and for a while I guess I was arguing about that. But the designer of BGP had a killer argument whenever I said anything like, “BGP doesn’t do this.” It was, “Well, it’s better than EGP.” Which nobody can argue with.
Malamud: Can’t argue with that.
Perlman: And we needed to do something, so. It does a lot of policy kinds of things, not everything—
Malamud: Like what? What are some examples of what it can do?
Perlman: Well, it allows you to not pass certain routes around. It allows you not to tell some people that you can really reach some other place. Now, that doesn’t prevent them from actually using you to get the packets through. If they’re smart enough to know that if they send you the packet you’ll get it there, it doesn’t stop you. But it won’t have mindless routing, won’t just automatically find you as a path.
Malamud: So this is a secret trap door; if you know to look behind the brick you can go through this door, and otherwise you don’t.
Perlman: Right.
Malamud: So this is kind of a policy routing through obscurity type of mechanism?
Perlman: Yeah, I guess so. And then the other kind of thing that it does is it allows you to select which route you’re going to use to the destination on something other than a metric. A metric just…you’re told well, it costs this this way and it costs that that way. Here you’re actually told, “Here’s the path. And now examine it and decide which one you want to use.”
Now, the problem is that’s awfully complicated for people to configure. It involves things like well, I don’t want to go through this particular domain in order to get there. That’s a fairly simple one. Or, if I am going through this domain I don’t also want to go through that domain. You can do arbitrarily complex things. But—
Malamud: Because that domain is expensive and you don’t— Or that domain is… Why would you want to do that?
Perlman: You could say I don’t want to go through that domain unless there’s no other way to get there, in the case of one domain being expensive.
Malamud: Wouldn’t metrics handle that?
Perlman: Yes, unless you wanted to say, “I don’t want to go through that domain at all.”
Malamud: Mm hm.
Perlman: Well, I guess metrics would do that, too.
Malamud: Well, maybe you could have multiple metrics? One’s a boolean that says “don’t go there” and another is “here’s the cost” and another one is you know, “here’s the throughput metric…,” wouldn’t that work?
Perlman: You might be able to do that with just metrics. You wouldn’t be able to do something like saying it’s okay to go through Domain A or through Domain C, but I never want to go through both of them on a particular path. I don’t know why anyone would need this particular kind of policy, but I’m sure there are things that you can’t just do with metrics.
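The "A or C, but never both" policy is easy to state once you are handed the whole path, which is the capability Perlman attributes to BGP above. The Python sketch below assumes a path is simply a list of domain names and that, among acceptable paths, the shortest is preferred; both are assumptions for illustration, and `allowed` and `choose_path` are hypothetical names.

```python
def allowed(path, exclusive=("A", "C")):
    """Accept a path unless it crosses every domain in `exclusive`."""
    return not all(domain in path for domain in exclusive)


def choose_path(paths, exclusive=("A", "C")):
    """Pick the shortest path that satisfies the policy, or None."""
    candidates = [p for p in paths if allowed(p, exclusive)]
    return min(candidates, key=len) if candidates else None


# Three candidate paths to a destination in domain D.
paths = [
    ["A", "C", "D"],       # crosses both A and C: rejected
    ["A", "B", "E", "D"],  # crosses A only: acceptable
    ["C", "B", "E", "F", "D"],  # crosses C only: acceptable
]
best = choose_path(paths)
```

A per-domain cost cannot express this rule, because whether A is acceptable depends on whether C is also on the path. The filter needs to see the path as a whole.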
Perlman: …if you can piggyback off of existing numbers that already exist out in the world, that’s very convenient. And one kind of thing is telephone numbers, which are 8 bytes long. So if you could just take your telephone number (everyone owns at least one telephone number) and turn that into an address somehow.
Malamud: Now, the telephone number is a structured space. It’s a country code, an area code, a local exchange… And so you’re saying we should be configuring our addresses based on geographic or other topology considerations? My address should be…let’s see, in my case Alternet’s my service provider, I ought to be a us.alternet.thissideoftherouter.radio. And then 48 bits for the machine.
Perlman: Right, that’s one way of doing it. And there maybe should be lots of ways of doing it. Telephone numbers is one way of doing it. Or you could go to some central authority and get it. Or you could get it based on the provider. So there’s nothing wasteful about having lots of ways of doing it because as far as a router is concerned it’s just a pile of bits, and you route to the longest matching address prefix; the longest matching initial part of that address is where you route to. So it doesn’t hurt if there’s lots of ways of getting these things. You want to make it as convenient as possible for people to find…
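Routing on the longest matching prefix can be sketched briefly in Python. To echo the telephone-number example, addresses here are written as digit strings rather than raw bits; that choice, the table contents, and the next-hop names are all invented for illustration. A linear scan is used for clarity; real routers use faster lookup structures.

```python
def longest_prefix_match(table, address):
    """Return the next hop whose prefix is the longest one matching `address`.

    `table` maps prefix strings to next hops. An empty-string prefix
    matches everything and so acts as a default route.
    """
    best_prefix = None
    for prefix in table:
        if address.startswith(prefix):
            if best_prefix is None or len(prefix) > len(best_prefix):
                best_prefix = prefix
    return table[best_prefix] if best_prefix is not None else None


# Hypothetical forwarding table keyed by leading digits.
table = {
    "": "default-link",
    "1": "north-america-link",
    "1415": "area-415-link",
    "1415555": "local-exchange-link",
}

hop_a = longest_prefix_match(table, "14155550123")  # matches all four prefixes
hop_b = longest_prefix_match(table, "14151230000")  # matches up to "1415"
hop_c = longest_prefix_match(table, "442071234567")  # only the default matches
```

The router never needs to know how an address was assigned. Whatever scheme produced it, the lookup is the same comparison of leading digits or bits.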
Malamud: And could we build an efficient router with 20-byte addresses? Could we build efficient protocols with 20-byte addresses?
Perlman: Yes. It’s not really any harder. It might be…5% more difficult. But 5% is meaningless. If it were three orders of magnitude more difficult, then you might say one scheme works and another scheme doesn’t. But given that you’re only talking about a couple of percent, then you can’t possibly have a solution where it works but if you were to make it 1% slower it wouldn’t work, especially since technology keeps improving.
Malamud: So in a world where I send 30-megabyte audio files out, you’re saying a few more bytes on an address just isn’t going to hurt our routers, it’s not going to hurt our machines, it doesn’t matter.
Perlman: Right. Right. And if it turns out that routers can’t keep up with the bandwidth, that people can’t build routers fast enough to route as fast as the links are, the fact that the addresses are a little bit smaller won’t help you at all. What you really need to do is something like ATM, where you first set up a path and then send the packets. And that’s a perfectly reasonable thing to do. And in which case the initial address, the size of the address that you use in the initial packet for setting up routes, is irrelevant for performance.
Malamud: Well, this has been Geek of the Week, and thank you very much Radia.
Malamud: This is Internet Talk Radio, flame of the Internet. You’ve been listening to Geek of the Week. You may copy this program to any medium, and change the encoding, but may not alter the data or sell the contents. To purchase an audiocassette of this program, send mail to radio@ora.com. Support for Geek of the Week comes from Sun Microsystems. Sun, the network is the computer.
Support for Geek of the Week also comes from O’Reilly & Associates, publishers of the Global Network Navigator, your online hypertext magazine. For more information, send mail to info@gnn.com.
Network connectivity for the Internet Multicasting Service is provided by MFS Datanet and by UUNET Technologies.
Executive Producer for Geek of the Week is Martin Lucas. Production Manager is James Roland. Rick Dunbar and Curtis Generous are the sysadmins. This is Carl Malamud for the Internet Multicasting Service, town crier to the global village.