Carl Malamud: Internet Talk Radio, flame of the Internet.
Malamud: This is Geek of the Week. We’re talking to Scott Bradner. He’s a consultant on the staff of Harvard University, he’s Area Director in the Internet Engineering Task Force for Operational Requirements, and has also been named to the special interim area looking at the question of the next-generation Internet Protocol. Welcome to Geek of the Week.
Scott O. Bradner: Well, I think I’m glad to be here.
Malamud: Well good. Everyone thinks they’re glad to be here at first. We hope they retain that happy impression of this memorable experience.
You’re probably best known for the router evaluation work you do at Harvard. Can you describe what you do in this laboratory?
Bradner: Basically what I do is to create an artificial network environment in which I abuse vendors’ products and see whether they can suffer the abuse gladly, and come up with some kind of a consistent measurement of performance in the areas of throughput and packet loss rate and latency, so that somebody might be able to compare different products using something other than the marketers’ own numbers, which…generally tend to be a little bit on the suspect side.
Malamud: And so do you do this…do you break it down by protocol? Do you exercise— Are you doing conformance testing? Are you seeing whether the BGP4 operation is being done correctly?
Bradner: Conformance is a wonderful thing. Conformance is my reading of the specs versus your reading of the specs. And seeing as you’re the one with the lawyers and the money, I’m always going to lose. So in other words I don’t do anything in the way of conformance. I only do performance. And the performance is a wide variety of tests, it’s about 190 tests now, on any particular product. They range from a single stream of TCP/IP or AppleTalk or IPX or VINES IP or whatever, to mixed protocols…to multiple streams, up to twenty-four or thirty-six streams of parallel data. FDDI. Gonna do some ATM testing next week. So it’s a wide variety of things. It comes up with a tremendous amount of data, which I suspect somebody might be able to figure out how to analyze. I haven’t yet.
Malamud: So you don’t come up with a metric? “This router is a 9, that router’s a 7.”
Bradner: That is something that many people have asked for, because it’s a lot easier on the market droids to be able to use just a number saying “Our router is better than theirs because it got a 3 and theirs got a 2.” And there was a great deal of discussion in a group that I worked with, the Benchmarking Methodology Working Group of the IETF, on doing exactly that. We even came up with a name for the metric—we were gonna call ’em “millstones.” [sp?] But we never got a way to actually do anything with it and to define it in a realistic way.
The problem is that what you want is something that’ll say how this router will work in your network. Well your network is different than the guy next door’s network. You may have SNA and they don’t. You may have AppleTalk and they don’t. You may have a different percentage of this protocol versus that protocol than they do. So there’s no consistent way. There were suggestions we could take a snapshot of some average network and try that out. But the averages don’t work. There’s just too much variety. So we have to come up with these discrete pieces of data, and you have to figure out from profiling your own network whether these pieces of data are useful.
Malamud: So you’re looking at throughput of different protocols and different protocol mixes. Are you looking at things like the different routing update protocols and how they perform?
Bradner: Well let me go back to your first point. I actually take three different measurements. The first I call packet loss rate, and that’s the input offered load against the forwarded traffic. So that if you send in 100 thousand packets a second, how many packets a second come out. And this is useful because you want to find out, if indeed you overload a device, is it going to deteriorate. And there were some devices a few years ago…matter of fact there was one even last year—mostly on the token ring side for some reason or other—where if you overload them they deteriorate catastrophically. One particular [indistinct] sent it half a million packets at a rate of 14,000 packets a second and it forwarded thirty packets. This isn’t thirty packets a second, it’s thirty packets.
And it turns out what was happening is that it would forward packets until it ran out of buffer space, that was about thirty packets, then it would lose a packet, and it would recognize it lost a packet so it’d go off to account for that fact. And by the time it got back from accounting for that fact, it lost another thousand packets. And it would go off and account for that. And when it [recording garbled] from that, it’d lost another 10,000 packets.
Malamud: Seems like a sub-optimal strategy.
Bradner: It did not yield the correct behavior I would expect, but it actually might have been something which would be beneficial in a network if you have an overload condition where you’re getting a catastrophic overload. This would have protected the rest of your network because it would’ve just died in its tracks. So maybe you could—maybe a marketer or two could turn this into a benefit. I wouldn’t call it that.
That’s the first measurement, this packet loss measurement. And this is done for a variety of packet sizes. It’s done for a variety of protocols, and protocol mixes.
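A rough sketch of the packet-loss-rate calculation described here. The frame counts below are invented for illustration; a real test rig would read them from the traffic generator’s and receiver’s counters.

    def packet_loss_rate(offered: int, forwarded: int) -> float:
        """Percentage of offered frames the device failed to forward."""
        return 100.0 * (offered - forwarded) / offered

    # The catastrophic token ring case above: 500,000 frames offered,
    # thirty forwarded.
    print(packet_loss_rate(500_000, 30))      # 99.994
    # A healthier device: 100,000 frames offered, 97,200 forwarded.
    print(packet_loss_rate(100_000, 97_200))  # 2.8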
The second measurement is something called throughput. This is defined in an RFC—all of these are defined in RFC 1242. The throughput is the maximum rate at which all of the traffic offered to the device is forwarded. And this gives you an idea of the zero-loss or perfection rate of a router. This can be significantly different from the rate at which the vendor might tell you they can forward traffic. If you push packets at a router or bridge as fast as you can—and most of them don’t do the catastrophic deterioration—they will forward out packets at some rate. But they’ll be losing, let’s say, 30% of the traffic you give them.
If you put that into your network and the rate at which they start to lose traffic is below the rate at which you’re going to be offering traffic, then your backoff algorithms and your protocols start taking over and actual traffic flow deteriorates significantly.
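One common way to find that zero-loss throughput is a binary search over the offered load—a minimal sketch, where run_trial is a hypothetical hook into the test equipment that returns the number of frames forwarded:

    def find_throughput(run_trial, max_rate_pps: int, frames: int = 100_000,
                        tolerance_pps: int = 100) -> int:
        lo, hi = 0, max_rate_pps          # lo always holds a known zero-loss rate
        while hi - lo > tolerance_pps:
            rate = (lo + hi) // 2
            forwarded = run_trial(rate_pps=rate, frames=frames)
            if forwarded == frames:       # nothing lost: try a higher load
                lo = rate
            else:                         # loss seen: back off
                hi = rate
        return lo                         # highest zero-loss rate found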
Malamud: Are there examples of machines in which the zero-loss rate is in fact the maximum rate? Is there great variance in the difference between those two?
Bradner: There are some where the zero-loss rate is very close to the maximum forwarding rate. I don’t know of any where it actually is, but it’s within a percent in some devices. Many devices fold over and then go pretty much flat…um, no matter how much more you offer, this is the rate that they forward out, which is within a few percent of the throughput rate. Some throughput rates are significantly below the forwarding rate. And that is because there are a number of routers where internal software gets into overload conditions or gets into software updates of some kind—whether it’s syncing the disk in a Unix-type router or something like that, or updating the clock on your management console. It loses a few packets. And some people might say that’s not very important. You lose 100 packets out of 100,000. If it happens to be your data stream that loses those hundred packets, it can make a significant impact.
The example of that is during the startup phase of the T3 NSFNET, there were times when the network was losing about half a percent of its traffic. Which might seem to be quite small, but this was causing user-level perceivable problems, such that users in NEARnet—I’m the head of the technical committee for NEARnet—were complaining that their user-level programs were much slower. And this was only with a half-percent loss.
Malamud: Why is that? Is that because all the packet losses were within your network and the other networks were doing fine?
Bradner: No, this was across the backbone, and any traffic going across the backbone. We’re certainly not going to claim credit for losing packets in NEARnet; that’s not our job. Traffic going across the backbone was being lost, and therefore whenever you lose a packet it has to go back through retransmission and restart. And if you lost two packets in the right sequence, it pushed the transfer rate even lower because of the restart algorithms.
Malamud: So even though we have TCP and retransmission, a half a percent is enough that the user sees a difference.
Bradner: We were surprised. But we looked into it because of user complaints. So empirical evidence indicates that there’s a problem.
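A later, well-known back-of-the-envelope model (Mathis et al.) makes the surprise plausible: sustained TCP throughput is roughly bounded by MSS / (RTT · sqrt(p)). The numbers below are illustrative assumptions, not measurements from the NSFNET incident:

    from math import sqrt

    def tcp_rate_bps(mss_bytes: int, rtt_s: float, loss: float) -> float:
        # rough bound: rate ~ MSS / (RTT * sqrt(p)), up to a constant
        return 8 * mss_bytes / (rtt_s * sqrt(loss))

    # Cross-country RTT ~70 ms, 1460-byte segments, 0.5% loss:
    print(f"{tcp_rate_bps(1460, 0.070, 0.005) / 1e6:.1f} Mbit/s")  # ~2.4

Half a percent loss caps a single transfer at a few megabits per second, far below what a T3 backbone can carry.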
Malamud: So you invented the zero-loss rate test.
Bradner: Well actu— I didn’t invent it; that actually came from the work of the Benchmarking Methodology Working Group. A number of people worked on that. I’m not sure who specifically came up with that particular measurement, but this was something that was felt to be very important.
The third measurement I take is latency. I measure latency not because I think it’s a particularly valid measurement. Because I don’t. I measure it because everybody in the world seems to want to know it. And—
Malamud: Latency being the amount of time a packet spends in a router.
Bradner: The length of time it takes for a router to process it, yeah, the time it spends lying around in the buffers. In the general environment of today’s internetworking, routers do processing in the range of a couple hundred microseconds for a packet. So it’s not very much time. And in fact, it is much smaller than the time it takes to store and forward a packet if you did no processing whatsoever, because it takes a while for the packet to travel over the wire. And in that context the actual latency induced by the processing of a router tends to be a small percentage of the latency going across a network.
Malamud: So the fact that we’re going through sixteen hops to go from one coast to the other isn’t really introducing a significant…amount of latency.
Bradner: It is producing a significant amount of latency, but it’s not because of router processing time, it’s because at each one of those hops, you have to receive the whole packet before you can start sending it. So every time you have to do a full packet store-and-forward.
Malamud: So how long does the packet stay in a router? Just give me an order of magnitude here.
Bradner: It’s in the range of below sixty-four microseconds to about 400 microseconds—modern routers are in that range. Now, that’s quite small.
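Some rough arithmetic behind the store-and-forward point, with illustrative link speeds and packet size: serialization time per hop dwarfs the router’s processing time.

    PACKET_BITS = 1500 * 8        # a full Ethernet-size packet
    ROUTER_LATENCY_S = 400e-6     # high end of the range quoted above

    for name, bps in [("T1 (1.544 Mbit/s)", 1.544e6),
                      ("Ethernet (10 Mbit/s)", 10e6)]:
        serialize_s = PACKET_BITS / bps
        print(f"{name}: {serialize_s * 1e6:.0f} us on the wire per hop, "
              f"vs {ROUTER_LATENCY_S * 1e6:.0f} us of processing")

    # Over sixteen store-and-forward hops on T1 links, serialization alone
    # is 16 * ~7.8 ms = ~124 ms; router processing adds only ~6 ms.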
And another factor on latency is that for most protocols, particularly protocols with windowing like TCP/IP, it has very little effect on user-perceivable behavior. Because a windowed protocol has more than one packet outstanding on the network at any one time, it doesn’t care if the network looks a little longer because there’s more latency someplace in the middle.
On protocols like old IPX where you had to receive an acknowledgement for every piece of data sent, the latency could be very important. And getting devices with very low latencies, or even cut-through devices where the packet starts to be forwarded before it is fully received, could make a significant impact on the performance. Lotus 1-2-3 loads a lot faster with a cut-through device.
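A sketch of why latency matters so much more to a one-packet-at-a-time protocol than to a windowed one; all numbers are illustrative assumptions:

    def stop_and_wait_bps(pkt_bytes: int, rtt_s: float) -> float:
        return 8 * pkt_bytes / rtt_s          # one packet per round trip

    def windowed_bps(window_bytes: int, rtt_s: float, link_bps: float) -> float:
        return min(8 * window_bytes / rtt_s, link_bps)  # capped by the wire

    rtt = 0.005                                # 5 ms round trip, campus net
    print(stop_and_wait_bps(512, rtt))         # ~0.8 Mbit/s for 512-byte IPX
    print(windowed_bps(8 * 1460, rtt, 10e6))   # TCP window fills the 10 Mbit/s wire

    # Adding 400 us of router latency to the RTT cuts the stop-and-wait rate
    # by ~7%; the windowed rate stays pinned at the link speed.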
Malamud: Scott Bradner, we’ve been looking at routers and the performance testing work that you do at Harvard University. Many routers have a few routes in their routing tables. They know about a few internal networks and they send everything out to their service provider. But there’s maybe thirty, fifty, a hundred routers in the world that need to know about…the whole world. And those routers, for them the size of the routing table begins to make a difference. Does the size of the routing table affect the performance of a router?
Bradner: That I’m not sure of, mostly because it’s very difficult to simulate the environment those routers reside in. It’s a lotta typing to put in all of those routes into a simulation setup. I did run a test on a vendor’s router just recently, but it was for the ability to support a large routing table rather than performance when that routing table was in place. It is something that I’ve thought about doing and in the past have not done, because of the difficulty of maintaining the routing table—you have to send in updates on a periodic basis—and just processing those updates can have a potentially significant impact on the performance, even if you’re not making any changes in the routing table, just repeating it every X period of time. Though with some of the modern protocols like BGP you don’t have to continually update it. But it’s something I do hope to do in the future.
I am looking at ways to find out the effect of producing routing updates on performance. If indeed you’re forwarding traffic at some rate and then suddenly get a RIP update or an OSPF update and that causes some permutation in the routing table, what effect does that have on the forwarding rate.
Malamud: That’s actually a significant effect we’ve noticed on the multicast backbone on occasion, in which your audio and video was going along just fine, and every ninety seconds an update occurs and you find a little bit of your data goes away. How does one test for these types of environments, and how do you fix them? Is it a tuning problem? Is it a problem in the initial configuration of the router, the design of the router?
Bradner: It’s not generally the design of the router, because the router is just transistors, or now integrated circuits and plugs. It’s…the routing protocols have an effect there. There’s always been a…certainly an interesting question about the effect of a “minor” routing update change to a large OSPF network which causes the entire fabric of the network to realign—the time it takes to run the algorithms on that, the Dijkstra algorithms, on the routing table, and what that impact would be on forwarding. This is something that a lot of people would like to look at, and so far it’s been something that I’ve mostly wished to do rather than done. Because it is difficult to set up the environment. Something I do plan to do in the future, though.
Malamud: One of the big issues in router design is the size of the routing tables. And we’re currently looking at routers at the leaves that have maybe 16 megabytes of memory, and in the core of the network maybe even 64 megabytes of memory. Are we going to be able to stop the growth of the routing table, or does that even matter? Are we just gonna get more and more memory like we do on our computers?
Bradner: Well certainly some people think that memory is cheap and you can just keep growing that way but it has been pointed out that the size of memory doubles every three years and the routing table has been doubling every two years, so as a long-term strategy that probably doesn’t work.
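The divergence is easy to make concrete—relative growth only, with both starting at the same size:

    # Memory doubling every three years vs. the routing table doubling
    # every two, per the growth rates quoted above.
    for years in range(0, 13, 3):
        memory = 2 ** (years / 3)   # relative memory capacity
        table = 2 ** (years / 2)    # relative routing-table size
        print(f"year {years:2d}: memory x{memory:5.1f}, table x{table:5.1f}")

    # By year 12 memory has grown 16x but the table 64x: buying RAM loses.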
The answer is as CIDR is deployed in the backbone, which is the route aggregation process, this growth should change significantly. There’s been very recent work in the CIDR deployment that looks extremely promising. The last major pieces have fallen into place and route aggregation is going to start. And there is no reason to expect that we’re going to be faced with the kind of crisis which is going to require major renovation of the backbone routers, as long as we can start making real progress in getting the existing route table aggregated. That reduces both the absolute size of the existing table and significantly reduces the growth of the routing table space.
Malamud: How does CIDR do that? Maybe you can give us a brief explanation of CIDR and why it’s going to change our routing tables.
Bradner: The simplest way to put it is that in pre-CIDR days, if you had let’s say 512 Class C networks and you wanted people to know where you were, you had to advertise 512 Class C networks, which would cost 512 entries in the routing table. With CIDR this is advertised in a way which allows a single entry to be put in the routing table rather than the 512, thereby reducing the size of the advertisement considerably. And in addition to that, since you would have received—obtained—your addressing from your network provider, and they would have provided it out of a block of addresses that they had obtained from the IANA, then all, or at least all of the routes from this provider that were CIDR-capable, could theoretically be collapsed into a single routing table entry.
Malamud: So instead of handing out addresses randomly, we’re doling ’em out by the structure of the network, essentially.
Bradner: Well we’re doling them out two ways. One is by the structure of the network, so that providers get large clumps of addresses to hand out. And second of all, those are being handed out in a logical fashion which allows the aggregation. Basically they’re being handed out in powers of two—powers of two of Class C networks. Because that way you can aggregate them and just make them look like one entry. CIDR stands for classless inter-domain routing, and the whole point is that you’re no longer treated as Class A, Class B, Class C; you’re just treated as effectively a bit mask over the address space.
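Python’s ipaddress module can illustrate the aggregation; the 512 contiguous Class C (/24) networks below stand in for a hypothetical provider block:

    import ipaddress

    # 512 contiguous /24s: 198.24.0.0/24 through 198.25.255.0/24.
    class_cs = [ipaddress.ip_network(f"198.{24 + i // 256}.{i % 256}.0/24")
                for i in range(512)]

    # CIDR lets all 512 advertisements collapse into one supernet entry.
    aggregate = list(ipaddress.collapse_addresses(class_cs))
    print(len(class_cs), "->", aggregate)  # 512 -> [IPv4Network('198.24.0.0/15')]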
Malamud: So there’s two issues. One is how we hand out addresses. The second is taking advantage of that in the routing announcements.
Bradner: That’s correct.
Malamud: And where’s that implemented? Is CIDR a protocol, or do we see that come in someplace else?
Bradner: The first of those two is how we hand out addresses. And the process of handing out addresses by the Internet providers has been using CIDR logic for the last year or two. So we have a lot of already-assigned addresses, a lot of the growth—recent growth—in the Internet has been along the lines that are CIDR-capable.
CIDR itself is implemented in the exterior gateway protocol, the protocol that’s used to exchange routing information between regional networks and the backbone, or between autonomous systems—in particular BGP4, Border Gateway Protocol version 4. This is used in the backbone, by the NSFNET, to talk to other regional networks. Or…NSFNET’s not a regional network… Well actually, maybe it is in the global sense. But in any case, it’s the interchange of summarized routing information between providers. Whether they be NSFNET or AlterNet, or NEARnet, or the European networks, they exchange information with BGP4, and that allows them to make use of the CIDR aggregation possibilities.
Malamud: Scott Bradner, we’ve been talking a lot about the engineering of a global Internet, and routing protocols, and how we make IP traffic flow efficiently from one place to another. Yet most of the nodes in the world don’t run IP; they run IPX from Novell, or they run DECnet, or they run a variety of protocols. How are we going to support all these nodes out there? Will the Novell people have to shift over to IP? Will we shift over to IPX? Or will they both somehow coexist?
Bradner: Well it depends what you mean by support. If you mean by support that I can sit at my desk with a Macintosh running Unix running sendmail, and send email to the person two cubicles down that happens to be on a PC running on Novell…we support that now. We do that through gateways. We can do that through application-specific gateways. Email is certainly the most common one; there’s potential for other types of application-specific gateways. This is how you get to BITNET, IBM mainframe-specific kinds of networks, or AppleTalk-based networks, or Novell.
So if you mean support by the ability to communicate, particularly in a non-real-time fashion, we do that through functional gateways. And we’ll continue to do that through functional gateways. Not only because the underlying architecture is different, the protocols are different, but because some people believe this is a good way to introduce security, or additional security…some security, into the Internet structure. Because by putting in an application gateway of this sort, you only pass the kind of function that you wish to allow, and keep out the kind of function you don’t, i.e. the people that are trying to peer around and find your family jewels. So, we’re gonna support that kind of thing that way.
There is an additional question, though, of if you want to support the kind of thing which doesn’t go through an application gateway terribly well, or whose function is not supported on the underlying protocol—running Mosaic over IPX, for example. You could do it by building a version of Mosaic that ran over IPX that went through a gateway. You could do it by figuring out a way to encapsulate TCP/IP in IPX, put that through a gateway, and strip off the IPX. And that’s what’s done in the AppleTalk gateways to Ethernet, for example. When you run TCP/IP on a Mac, at least one form of that encapsulates the IP in AppleTalk packets, and then they get stripped out and turned into regular IP in the gateway. We could do it that way. Or we could migrate the nodes to a common infrastructure, to a common protocol infrastructure. And the IPng effort in the IETF is trying to—
Malamud: IPng is IP Next Generation.
Bradner: Yes, IP Next Generation. This was named at a time when certain new science fiction shows were showing up on TV, and the title was consciously chosen in this way.
We’re trying to consciously take into account the various ways that you could grow, and what the requirements are for a more global data networking interconnect. Instead of looking at the Internet simply as this collection of TCP/IP LANs, we’re looking at the Internet as the future data networking needs of the globe, not limited to any one protocol. But that doesn’t mean that the IPng area or its Area Directors are egotistical enough to believe that we’re going to figure out a way to convert every IBM mainframe and every PC in the world to a single protocol, because that’s not going to happen real soon now. But we want to take into account those requirements so that in cases and in places where it is feasible to migrate the end systems to a common underlayment in order to provide a common set of services, that can be done. The addressing will be sufficient to do it, the routing stability will be sufficient, the scalability will be sufficient, and the security will be sufficient.
Those are all things that we’re trying to take into consideration. We put out a call for whitepapers using RFC 1550, and we received a number of them from a wide range of organizations and individuals around the world telling us what they believe the requirements are in this area in order to be able to support this sort of thing.
Malamud: Now, you’re assuming a single global Internet. And for a while there the trade press got on a little kick which said that Novell is gonna invent its own Internet, and our Internet will go away or will have to somehow compete with their Internet. Is Internet a network, or is it something more fundamental?
Bradner: Well that’s why I very carefully phrased it as that we’re looking at the Internet as the data networking needs of the globe, rather than tying it specifically to any particular protocol. Whether this particular IPng Area Director believes that a large IPX Internet will grow up and be a viable commercial enterprise or not—and there’s certainly some pressure for some things like that—and run in parallel with an IP Internet or an IPng Internet, that does not change the picture: you still would need to cooperate between them.
Malamud: You still have to interconnect.
Bradner: You still have to interconnect. You’re not going to get either side—if sides be the right term in this kind of conversation. You’re just not going to get either side to admit that the other has won to the extent that they’re going to convert all their boxes, not necessarily because they don’t think that the other side has won, but some of those boxes will never be converted simply because no one knows how to run them anymore. And they’ve just been running because some graduate student set it up three years ago and went off into the outer dark and whoever has it doesn’t know them anymore. So there’s some environments that simply will never change.
Malamud: So we’re never gonna have a single internetwork protocol.
Bradner: We do not have a single Internet protocol now—
Malamud: And we’re never gonna converge on one.
Bradner: And we can’t converge on one, by definition, actually. The IPng effort is defining yet another Internet protocol. It certainly is hoped that we define one that everything could use. But there will be IP version 4, the existing generation of IP, for many, many years, in real ways and for the foreseeable future. We have a great deal of inertia in the knowledge base of the market as to what they can operate and what they can do. We have another problem, though: let’s say you were a vendor and you were going to come up with some software that ran on a server. And you’ve got the option of implementing this server in IPng, where there are…a hundred thousand hosts—let’s say this is a couple years down the road—or you could implement it with IPv4, where there are…20 million hosts.
Malamud: Gee. Bigger market, smaller market. Which should we pick? [both chuckle]
Bradner: So, there’s not only an inertia in terms of the installed base that’s there, but in development. There’s an inertia in development that will tend to go along the same lines. Now, some of the IPng proposals have ways to mitigate this by using methodologies by which you could build a server that could deal with both types of clients—IPv4 clients and IPng clients. That won’t necessarily make the IPv4 clients go away. But it would mitigate the problem of providers providing only IPv4 services.
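A minimal sketch of a “both kinds of clients” server in today’s terms: one listening socket that accepts IPv6 clients natively and IPv4 clients as v4-mapped addresses. (IPng ultimately became IPv6; this is a modern illustration with an arbitrary port, not what was available at the time.)

    import socket

    srv = socket.socket(socket.AF_INET6, socket.SOCK_STREAM)
    # Turning IPV6_V6ONLY off makes the socket dual-stack on most OSes.
    srv.setsockopt(socket.IPPROTO_IPV6, socket.IPV6_V6ONLY, 0)
    srv.bind(("::", 8080))       # "::" accepts both v4 and v6 connections
    srv.listen()

    conn, addr = srv.accept()    # addr like ('::ffff:192.0.2.7', ...) for v4
    conn.sendall(b"hello from a dual-stack server\n")
    conn.close()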
Malamud: So we should be prepared for a messy world.
Bradner: We’re in a messy world now.
Malamud: Well there you have it. This has been Geek of the Week. We’ve been talking to Scott Bradner. Thanks a lot.
Malamud: This is Internet Talk Radio, flame of the Internet. You’ve been listening to Geek of the Week. You may copy this program to any medium, and change the encoding, but may not alter the data or sell the contents. To purchase an audio cassette of this program, send mail to radio@ora.com.
Support for Geek of the Week comes from Sun Microsystems. Sun, The Network is the Computer. Support for Geek of the Week also comes from O’Reilly & Associates, publishers of the Global Network Navigator, your online hypertext magazine. For more information, send mail to info@gnn.com. Network connectivity for the Internet Multicasting Service is provided by MFS DataNet and by UUNET Technologies.
Executive Producer for Geek of the Week is Martin Lucas. Production Manager is James Roland. Rick Dunbar and Curtis Generous are the sysadmins. This is Carl Malamud for the Internet Multicasting Service, town crier to the global village.