Carl Malamud: Internet Talk Radio, flame of the Internet.


Malamud: This is Geek of the Week. We’re talking to Ron Frederick, he’s a research scientist at Xerox’s Palo Alto Research Center, and welcome to Geek of the Week, Ron.

Frederick: Thank you very much.

Malamud: You are the author of nv. Why don’t you tell us what nv is and…why would I want to use it.

Frederick: Okay. NV is short for “network video.” And it’s one of the tools on the Internet right now which allows you to do multimedia over the MBone, the multicast backbone that’s been set up. And the design is arranged such that if you have a workstation, you can pretty much receive the video just using your regular workstation screen, no special hardware required.

Malamud: So this is video as in…TV, right, not just pictures and satellite images and things like that.

Frederick: That’s right. It’s video at a slightly slower rate and slightly smaller size than your television set would give you in your house. The frame rates and such are limited by processing power and by what the network can deliver.

Malamud: Well how good a video is it? You say it’s reduced resolution and reduced frame rate. How reduced compared to a television image would video on the Internet be?

Frederick: The standard video that we’re transmitting around right now is about five frames a second. And it’s sort of—the size is what you might call a quarter size of television resolution. It’s close to what you’d get off of a VHS VCR in terms of picture quality.

Malamud: And what kind of bandwidth are we talking about here to do this?

Frederick: And that particular stream typically takes about 128 kilobits worth of bandwidth. So it’s about double what an uncompressed audio stream would take up on the network.

Malamud: And what’s it gonna take to turn it into…or is it ever going to be full-motion video in full color and HDTV and all these things? Is this the first step?

Frederick: This is a progression. As the machines continue to get faster, the same technology that’s allowing this video to be out on the network will allow better quality. Today if you have a DEC Alpha workstation and a video card you can actually push it up to about twenty frames a second at that size. And as you get another step, maybe another factor of two in performance you’ll be up at the full rate. And eventually up at the larger sizes as well. Of course you’ll need more bandwidth on the network to do that. But we’re installing more and more bandwidth every day on our backbones and on our regional network. So eventually I do think that this technology will push all the way to television quality.

Malamud: Now you said to receive the video you don’t need any special hardware.

Frederick: That’s right.

Malamud: So how does this work? I thought you usually do video with special codecs and devices like that. You’re doing this all in software, then.

Frederick: Yes. The encoding and the decoding of the video’s completely in software. And the only thing you need at the encode side therefore is something that will take raw television images out of a camera or a VCR or something like that and turn them into digital bits. So you need a frame grabber or some other very simple video card. And the software will take care of the rest.

Malamud: And what kind of encoding do you use for the data?

Frederick: The primary nv encoding is something I designed myself. It’s a combination of doing differencing from one frame to the next, so that it doesn’t always have to transmit the whole picture every time, and then a compression scheme which is sort of a simplified form of some of the stuff JPEG does. It’s a wavelet-based scheme.
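
[Editor's sketch: the frame-differencing idea described above can be illustrated roughly as follows. This is not nv's actual algorithm. The function name `changed_blocks`, the 8-pixel block size, and the threshold of 10 are all assumptions made for illustration; frames are plain lists of grayscale rows. Only blocks flagged as changed would then be handed to the compression stage.]

```python
def changed_blocks(prev, curr, block=8, threshold=10):
    """Return (top, left) origins of blocks whose mean absolute
    pixel difference between two frames exceeds the threshold.

    Hypothetical helper: block size and threshold are illustrative.
    """
    changed = []
    height, width = len(curr), len(curr[0])
    for top in range(0, height, block):
        for left in range(0, width, block):
            total = 0
            count = 0
            for y in range(top, min(top + block, height)):
                for x in range(left, min(left + block, width)):
                    total += abs(curr[y][x] - prev[y][x])
                    count += 1
            if total / count > threshold:
                changed.append((top, left))
    return changed


# A 16x16 black frame, then the same frame with the bottom-right block lit.
prev = [[0] * 16 for _ in range(16)]
curr = [row[:] for row in prev]
for y in range(8, 16):
    for x in range(8, 16):
        curr[y][x] = 200

print(changed_blocks(prev, curr))  # [(8, 8)]
```

[A still scene produces an empty list, so nothing needs to be sent for that frame.]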

Malamud: And does the compression scheme… Is this lossless compression or do we lose quality? Are there artifacts based on the way you do your encoding?

Frederick: There are some artifacts. The most noticeable artifacts are caused by the frame differencing. The fact that even though a block has changed a little bit it might not send it right away. But the encoding is designed such that if you end up ever going back into a still image, it will eventually produce a perfectly clean full high-resolution picture. And so it’s only a matter of whether or not it’s moving too quickly to ever get to that state. But the background and some of the other things that aren’t changing the image will get to full resolution eventually.

Malamud: Now, you mentioned the standards such as MPEG. How does that relate to the encoding scheme? Why aren’t you using MPEG, for example?

Frederick: Well in fact for the latest generation of nv I looked very seriously at JPEG, or something like it. JPEG is the still version of the MPEG standard. And the answer there was that currently JPEG is too expensive to really do in software on the machines that we have today.

Malamud: Why are you—

Frederick: You can do it, but MPEG decoding for example is much cheaper than MPEG encoding. And so you can’t do a real-time encode, even though you can take a file that someone created for you and play it back at a fairly nice rate.

Malamud: Now you said you looked at JPEG, not at MPEG as a standard. Why would you be looking at a still photo standard for doing full-motion video?

Frederick: Um, what I was intending to do with JPEG was actually take some of my frame differencing, but then just use the JPEG standard for sending the data which had changed. And the reason for that as opposed to just using MPEG was that MPEG was so expensive to encode. So I sort of started with the most expensive standard I knew of that was really designed for this, but designed to require hardware, and tried to step back one step at a time to see what was possible in software. And JPEG was easier than MPEG but not easy enough. When I replaced certain pieces of JPEG, it finally got easy enough that I could actually do a reasonable encoding in real time in software.

Malamud: Now, there’s another standard, CellB. How does that relate to what you do?

Frederick: That’s right. And in fact the latest version of the nv tool will actually encode and decode CellB as well as my own format.

Malamud: What’s the difference between CellB and what you invented?

Frederick: CellB is something that can’t—that was developed over at Sun, and it was actually designed with many of the same goals in mind as to what nv does. It has a very similar idea about only transmitting frame differences. And it has a slightly different technique for how it actually encodes the portions of the image that have changed. The main difference between the two of them is that the CellB standard doesn’t have a high-resolution mode. So it’s much better than nv in terms of compression ratio and frame rate, at the same bandwidth. But, if you just sort of end up with a still image at some point, the CellB image doesn’t crisp up to the full resolution of what the camera’s grabbing in the way the nv encoding does.

Malamud: There’s another set of video tools on the net, known as CU-SeeMe that are made for Macintoshes. They don’t use native multicasting. The way CU-SeeMe works is you know, they send a video stream to a reflector which then sends it back out to the other participants in the conference. Now, you’re using multicasting. Can you describe those two approaches and whether in fact they’re going to merge?

Frederick: Uh, okay. Sure. I’ll say a couple things about CU-SeeMe first, and that’s that there was actually an interest at one point in allowing the CU-SeeMe reflector to talk multicast as well. And so nv supports CU-SeeMe decoding and more recently encoding as well. But the reason it needed the reflector was primarily that the Macintosh doesn’t support native-mode multicast yet at all.

Malamud: Is the idea of having multicasting on a Mac at all feasible?

Frederick: Oh, very much so. And Apple has promised that in some fairly soon future release of MacTCP it will be there, and other third-party vendors for IP on the Mac have said the same thing.

The reflector has a lot of nice properties. I mean, it gives you a little bit more control than just sort of raw multicast would give you. On the other hand, it’s much more expensive on the network. What ends up happening is all of these streams end up hitting the same Ethernet and go into this reflector, and then if you have five participants in the conference you have five times as much bandwidth leaving the reflector to go back to those participants. And so you run out of network bandwidth much sooner than you would if you actually had native-mode multicast.
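
[Editor's sketch: the bandwidth arithmetic behind the reflector comparison, under stated assumptions. The function names are hypothetical, and the 128 kilobit stream rate is borrowed from the nv figure quoted earlier in the interview purely as an example; it is not a measured CU-SeeMe rate.]

```python
def reflector_egress_kbps(stream_kbps, receivers):
    """Bandwidth leaving a unicast reflector for one stream:
    a separate copy is sent to every receiver."""
    return stream_kbps * receivers


def multicast_link_kbps(stream_kbps):
    """With native multicast, any single link carries the stream once,
    however many receivers sit downstream of it."""
    return stream_kbps


print(reflector_egress_kbps(128, 5))  # 640
print(multicast_link_kbps(128))       # 128
```

[With five participants the reflector's outbound link carries five copies of the same stream, which is why it runs out of bandwidth sooner.]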

Malamud: Now, you said that nv has CU-SeeMe decoding in it. Does that mean that a person on a Sun or other environment that has X Windows support on it can participate in a CU-SeeMe session? Is that what that means?

Frederick: In the current implementation that’s out there now, nv did CU-SeeMe decode only. And so someone who was on a Sun could watch a stream that had been transmitted from a Macintosh by having the reflector send out a copy to a multicast group at the same time it was sending it out to the other participants. In the more recent release I added an encode step as well. And there’s a little bit of work that needs to be done in the reflector to convert from multicast back into unicast for the Macintosh participants. But yes, in the very short term we’re hoping that a user on a Mac and a user on a Sun could actually talk back and forth to each other.

Malamud: Your tool nv and CU-SeeMe both are video-only solutions. Why aren’t you doing audio as an integral part of this tool?

Frederick: The intent of writing nv was to add to the existing research efforts on the MBone. And there was already someone, in this particular case Van Jacobson and Steve McCanne at LBL, doing a very nice audio tool. So rather than trying to duplicate their effort, I wanted to write something that would be a companion program. And in typical use, you actually run a vat and an nv side by side. And you don’t get the synchronization between the audio and the video. At least you don’t get explicit synchronization. But, it’s sort of good enough for now, and the protocols we’re designing have the right information in them so that eventually you could do lip sync and all of the other things that you want it to do.

As far as a professional, polished application long-term, I definitely see a case where you have a single user interface that provides both video and audio. It’s just more expedient for the research to keep them separate.

Malamud: Now, you said that the protocols you’re designing will eventually let these things sync together. Is there an underlying protocol that both nv and vat use?

Frederick: Yeah. There’s a protocol called RTP, which stands for Real-time Transport Protocol. And that’s something that’s being standardized in the IETF community right now. It supports both video and audio, and could be used for other things as well. For example, a shared whiteboard application might imagine using RTP. And—

Malamud: Now, is RTP a replacement for IP? Is it a replacement for TCP? What level of the network stack does this live?

Frederick: It can run on top of several different transport protocols. The applications that we’re using right now, almost all of them run RTP on top of UDP, on top of IP. But it’s designed so that if you have a fairly reasonable end-to-end transport protocol you can layer RTP on top. In that sense, calling it a transport protocol is somewhat strange. It’s almost more of a presentation layer protocol, if you believe in the OSI stack.

Malamud: And what does it do? I mean, obviously…we’re familiar with what something like TCP does. What does RTP add to the UDP datagrams?

Frederick: What it’s intended to do is provide synchronization information, timestamps, sequence numbers, um…some common way of expressing the format that the packet might be in, what kind of video or what kind of audio it is. And enough information about the participants that you can form a lightweight conference of some sort where everyone knows about everyone else and knows how to decode the streams as they receive them.
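
[Editor's sketch: one way the sequence number, timestamp, and format fields just described can sit in front of a UDP payload. An important assumption: the layout below is the 12-byte fixed header of RTP version 2 as it was later standardized, which may differ from the draft under discussion at the time of this interview. The function names are hypothetical, and payload type 28 is used only as an example value (it is the number later assigned to nv in the RTP audio/video profile).]

```python
import struct


def pack_rtp_header(payload_type, sequence, timestamp, ssrc, marker=False):
    """Build a 12-byte RTP version 2 fixed header (no padding,
    no extension, no contributing sources)."""
    byte0 = 2 << 6  # version field set to 2, all other bits zero
    byte1 = (int(marker) << 7) | (payload_type & 0x7F)
    return struct.pack("!BBHII", byte0, byte1, sequence & 0xFFFF,
                       timestamp & 0xFFFFFFFF, ssrc & 0xFFFFFFFF)


def unpack_rtp_header(data):
    """Parse the fixed header fields back out of the first 12 bytes."""
    byte0, byte1, sequence, timestamp, ssrc = struct.unpack("!BBHII", data[:12])
    return {
        "version": byte0 >> 6,
        "marker": bool(byte1 >> 7),
        "payload_type": byte1 & 0x7F,   # which audio or video format follows
        "sequence": sequence,           # lets a receiver detect loss and reordering
        "timestamp": timestamp,         # media clock, the basis for synchronization
        "ssrc": ssrc,                   # identifies the sending source
    }


header = pack_rtp_header(payload_type=28, sequence=1, timestamp=90000, ssrc=0x1234)
print(len(header))                                # 12
print(unpack_rtp_header(header)["payload_type"])  # 28
```

[The header would be followed by the encoded media data and sent as an ordinary UDP datagram.]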


Malamud: Ron Frederick, we’ve been talking about video and audio on the Internet. The current structure for the MBone allows anyone to bring in nv and, if you have the right video card, allows anyone to become a TV station.

Frederick: That’s true.

Malamud: And there’ve been times when there were two or three video streams on the net at the same time, and the MBone starts to collapse a little bit. Is this just a temporary growing pain or are we gonna have to solve the problem? Are you gonna have to have a license in order to be able to transmit nv?

Frederick: [laughs] That’s a very interesting question, and you’re right, we have had problems already about some kind of congested overloads when we have several different things running at once.

I don’t think you could possibly do something on the order of an FCC license in a world like the Internet. It’s just really not feasible to do that sort of thing. But there are definitely efforts going on in the community to try to deal with resource reservation issues. And there are also efforts going on on a slightly different tack of, if you notice from the reports you’re getting back from your receivers that you’re getting a large amount of loss, you can adapt the amount of bandwidth you use, so that everyone—if they all play nicely—can actually produce something that will get reasonable results.

Malamud: Is this automatic, or is it built in…is this something that the person has to do who’s doing the transmitting?

Frederick: In that second case that’s something automatic in the tool that they’re using to transmit. So, if there are five people in a conference and a sixth one joins, all six of them might start using slightly less bandwidth than the five were before. So the total amount of bandwidth remains relatively constant. And could even vary over time based on other loads that’re on the network.
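
[Editor's sketch: a toy version of the sender-side adaptation described above. The policy shown (halve the rate on heavy loss, otherwise creep back up) is an illustrative choice, not the behavior of nv or any other MBone tool. All names, the 5% loss threshold, the 16 to 128 kilobit range, and the 640 kilobit session budget are assumptions.]

```python
def fair_share_kbps(total_budget_kbps, senders):
    """Split a fixed session budget evenly, so the total stays
    constant as participants join or leave."""
    return total_budget_kbps / senders


def adapt_rate(current_kbps, loss_fraction,
               high_loss=0.05, floor_kbps=16, ceiling_kbps=128):
    """Adjust a sender's rate from the loss its receivers report.

    Hypothetical policy: halve on heavy loss, otherwise add 8 kbps,
    always staying within [floor_kbps, ceiling_kbps].
    """
    if loss_fraction > high_loss:
        return max(floor_kbps, current_kbps * 0.5)
    return min(ceiling_kbps, current_kbps + 8)


print(fair_share_kbps(640, 5))  # 128.0
print(adapt_rate(128, 0.10))    # 64.0
print(adapt_rate(64, 0.0))      # 72
```

[When a sixth sender joins the same 640 kilobit session, each share drops to a little under 107 kilobits, which matches the idea that everyone backs off slightly.]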

So, both of those solutions are being looked at. And the resource reservation solution is much nicer in terms of giving you guarantees about what kind of quality you can get, but has a lot of additional problems about well, what stops me from just asking for a reservation for 100 megabits, or a gigabit, right? There need to be other factors that cause me not to reserve more than I actually need, or could afford, or something like that. So those are much harder problems.

Malamud: And how are we going to tackle those? Do we have some indications of…I know people like David Clark have been examining these issues. What are some strategies that you think look promising?

Frederick: It’s a really tough issue. And I think long term the only right solution is going to be an economic one. That you’re going to have to start looking at this as you pay for what you use. If not, you pay for what you ask for. And I don’t really see that the Internet could grow uncontrollably without eventually involving something like that. But I do think there are good technical solutions that will get us by in the short term that don’t require all of that mechanism. And it’s important that we continue to research this as well.

Malamud: Some people say that doing TV on the Internet is silly because we only do bad TV. And we should just stop doing that and do electronic mail or something else. Are you… Do you see a day when television streams run over a general-purpose Internet infrastructure?

Frederick: I guess—

Malamud: Is CNN gonna use this mechanism that you’re working on now?

Frederick: I guess I don’t see why the television broadcast industry is any different than any of the other things that we’re talking about doing eventually. Putting telephone on the Internet is something else that has come up in the past. And putting other types of regular data services but at much higher bandwidths on the Internet has come up. And it really seems to me that having different infrastructures for each of these things doesn’t make any sense. That if eventually all of our cameras are digital…then it’s all bits. And why should those bits be carried in a different way just because these are television bits instead of high-quality x-ray image bits, or god knows what else. So, I do see, long-term, an integrated network of some sort making a lot of sense.

Malamud: I read an article in one of our new trade press publications about the Internet. And the headline was “MBone Offers Low-cost Tool for Business Communication.” [Frederick chuckles] The implication being you would place your phone calls over the Internet. Now, you and I know that it’s much more expensive right now to use the Internet to place a phone call. Certainly as a system cost if not the cost to the user.

Frederick: Yes.

Malamud: Is it going to be cheaper to run a voice call over the Internet than it would be in the underlying infrastructure? Just pick up the phone and dial?

Frederick: I think that if you had an integrated network, there would be very nice economies of scale. That the fact that you had all these very very high-bandwidth links between all the different sites, and you had a common high-bandwidth link that could be shared for lots of different amounts of data, would allow something as low-bandwidth as a phone call to go through incredibly cheaply. You could almost fit it into the holes that exist in high-bandwidth streams. You know, oh, the file transfer I started will take ten extra seconds because I decided to talk on the phone at the same time. But we used the same link and actually there weren’t any additional resources beyond that ten extra second delay.

Malamud: Right now we live in a world in which our demands for bandwidth are truly infinite. We will take anything they’ll give us, right now. You got a T1? I want a T3. Is that gonna peak? Is there some break-even level there where people stop asking for more bandwidth and we’re able to steady-state this thing?

Frederick: Well, for any given application, there are human perception limits at which more bandwidth doesn’t help. So, it’s certainly true that a video stream is going to get to a point where you’ve got so much bandwidth that there’s no point in using more for that one video stream. But, I really don’t see a limit from the long-term perspective of what we want our networks to do.

Malamud: Well an HDTV stream is 22 million bits per second, I believe? Somewhere in that range?

Frederick: In the compressed case that’s about right, yeah.

Malamud: Okay. And so…an individual home, what’s gonna be the…a hundred million bits per second, is that enough into the home, a billion bits per second?

Frederick: Well, if you really want that file which is across the country to become something that’s on your local machine and you want it in one second, and you can use an arbitrarily large amount of bandwidth to get it there, right, the question is how long are you willing to wait for the services that you’re asking for.

Malamud: And how much you—

Frederick: As well as how many services you want to be able to run at the same time. And so, if you could give me two gigabits to my house I’d be pretty happy. I won’t say that I would be happy forever. There may be some application some day which will want more than that.


Malamud: Ron Frederick, we’ve been talking about audio and video and the potential for doing video conferences on the net and audio conferences. Right now, that’s a fairly limited audience. It’s a very technical group that uses the MBone as a general rule. There’s occasional times when things like the Global Schoolhouse Project would show up there.

Frederick: Right.

Malamud: Um… Is there a time when this is going to be a tool for shared work? You’ve talked about some projects at Xerox PARC like Jupiter. Maybe you can describe some of the visions of where these tools might be going.

Frederick: One of the problems with the tools as they exist today is that the tools themselves were written by very technical people with other very technical people in mind to use them. They’re not all that comfortable if you just sort of want to walk up to the machine and say “I wanna talk to him over there.” You have to know all these strange things about the way the MBone works, and make sure you pick up the tools via FTP and install them at your local site and so on and so forth. And if you try to present that to a slightly less technically-inclined audience it’s easy to turn them off.

So, one of the research projects at Xerox is to try to take that technology and hide some of its rough edges a little bit more. Provide a simpler user interface, a simpler model for how all this works. So that it feels a lot more like how you interact with people normally in your everyday work environment. And Jupiter is a project to take some of the things that we’ve learned about the virtual…”social virtual reality” is sort of the generic keyword that we often hear in that case. Some of the things that we’ve learned about the text-based conferencing systems that provide you with a virtual world that you can wander around in and meet people and talk to them in, and add the advantages of multimedia to that.

Malamud: Is it the MOOs, and the MUDs?

Frederick: The MUDs and the MOOs and all of that sort of thing, yes, exactly. So, what we have on Jupiter is a MOO as the very base environment—

Malamud: MOO, which stands for what?

Frederick: MOO is “MUD, Object-Oriented.” And MUD is “Multi-User Dimension,” or “Multi-User Dungeon” if you believe its gaming origins.

Malamud: So in the text case, you say things…you get a description in front of you saying “You are standing in a hallway,” and you type in “Open the door,” and it says “You’re in a room with a guy named Joe,” and you say “Hello, Joe.” That’s the MUD, right?

Frederick: Yes, that’s a pretty good description of it. If you’ve ever played the old Zork adventure games or things like that, it’s a lot like that except that instead of only having you in the world with some computer-generated characters you have more than one real person. And so, yes, as you wander around, you go north and it’ll give you another description. It’ll say “Somebody is here” and you can then speak to that person. And whatever you say, your text will appear on their screen and vice versa.

Malamud: So Jupiter is a MOO on steroids?

Frederick: To a certain extent, yeah. What we’ve done with it is we’ve taken that environment as the base, and we’ve added to it audio, and video, and a graphical user interface where instead of just being able to run an X application on my local machine which only I can operate, a window will appear on my local machine whether I’m running X or a Macintosh environment or a PC Windows environment, whatever, that this centralized system that controls all the different users in the world is controlling. So, I might open a whiteboard which is an object in my virtual office in the system. But if someone else is also on the system and also in my virtual office, they can open that whiteboard, too. And if I make a mark on it they see that mark.

Or there might be an away board which says who’s on vacation and gets updated dynamically in this world. And if I tell the system to open the away board a little window pops up that shows me everyone who’s away.

Malamud: So it’s a shared workspace moved on to your computer screen.

Frederick: That’s right. The hope is that eventually this will become simple enough and comfortable enough that I could actually sit at home and it wouldn’t be any different than me sitting in my office, for the people at least that are at the other end of the building that would have a hard time visiting me in person anyway. In fact we’ve had conversations where two people are at home, the third person might be in the office, and the fourth person is somewhere else off on the Internet. And we’re all talking to each other just by wandering around in this virtual world. We get shared audio channels just because we’re in the same room. And there isn’t any special buttons we have to push or special programs we have to run to get it.

Malamud: So your screen has a map of the office, and you click on a room and you go in there, and you see people? I mean, do you actually see their images?

Frederick: In the case where we’re running video, yes. You see some low frame rate version of wherever they currently are if they’re transmitting video. At the very least you see their name, and if they start speaking you see some highlight around that name that indicates they’re speaking.

Malamud: And this is something real. I mean, you actually use that at PARC.

Frederick: That’s right. It’s implemented and running today.

Malamud: And can you use it from your home? Is this too much bandwidth to be able to do this from home?

Frederick: We are able to do it from home over ISDN. What we’ve got set up for several people at PARC is roughly a hundred kilobits’ worth of bandwidth between work and home. And that’s enough for one low-bandwidth audio stream, one low-bandwidth video stream, and all the rest of the stuff that needs to happen to make the windows and the text work.

Malamud: Now, you’re talking about this world at Xerox PARC. In the Internet Engineering Task Force people go to meetings three times a year, and there is some effort to use the MBone to actually you know, send some of the signals out. But the people on the remote ends are really viewers, they’re not participants.

Frederick: That’s true.

Malamud: Is the technology you’re talking about in the Jupiter project something that we can scale onto a global Internet? Or is it only going to be within let’s say a small corporation or you know, kind of an isolated island?

Frederick: We hope that the bandwidths required, especially since we’re working over such low-bandwidth things as ISDN, are things that the global Internet could handle. But it does have the same scaling problems that the current MBone tools do in that if you have a hundred of these streams or a thousand of these streams all running at the same time, you need a lot of backbone bandwidth to make it all work.

And there are a lot of other technical problems we have yet to solve in terms of echo cancellation, and…audio is hard, as you probably know from running the show. And so it’s difficult to make it really as comfortable to work from home or you know, instead of attending the meeting in person to sit in your office and try to attend, today. It’s difficult to get the same benefits as you would being there in person.

Malamud: But it’s still usable. It’s like the early modems, the 300 baud modems that were…you know, if that’s all you had you kinda get used to ’em I guess.

Frederick: That’s right. And the intent of the research is to make it more and more comfortable, to make it closer to actually being there.

Malamud: Now one of the things I noticed with 300 baud modems is they were fine as long as I didn’t have anything else. And the minute I got up to 9600 bits per second, 300 was too painful to even bother using. I mean, I would occasionally if I really really had to. You work at Xerox PARC and you’ve got all these wonderful tools. What do you do when you go on the road? Can you no longer get any work done, or are you able to fall back and—

Frederick: That’s an interesting question. And I have to say that so far, the MBone has not become quite as much of my everyday life as something like email has. I mean, when I go on the road if I’m away from email for two weeks I go nuts. That really has become something that I absolutely need to get my work done. And right now, Jupiter and the other MBone tools are things that are very very convenient. I mean I can have very productive conversations with people on the other coast or down in Los Angeles or whatever that probably in the long term have saved me a few plane trips. But, we can still do the work via email if we have to, it’s just sometimes not as convenient.

Longer term, that may change. I mean, it may get to the point where you sort of assume that IETF has…since it’s only meeting a few times a year and that’s not often enough, you have to have something like the MBone to really continue to do work as effectively.


Malamud: I’ve heard arguments that the Internet is very nice but we’re spawning a group of elite people. That most people don’t even have email, how can we possibly be talking about these other things. Are you finding you can still relate with an engineer who doesn’t use email, for example? Can you work with a computer scientist that doesn’t have electronic mail?

Frederick: I can, though I have to admit it’s somewhat painful. I’ve dealt with some of the support people at various computer vendors, that the only way to reach them is by calling this 800 number and waiting on hold for twenty minutes, and only finding out after all that time that they’re actually not in their office after all, and you have to leave them a voicemail message and they’ll call you back two days later. And comparing that to just spending a minute sending them some email which gives them the same information…there’s no contest. I mean, clearly email is the win.

Malamud: Now what about the email-to-MBone problem? Are we creating a core of elite on the Internet that have MBone access and forgetting about wide-spread computing?

Frederick: I would hope not. I think the efforts toward things like the National Information Infrastructure, and the deployment of networking technologies like ISDN and so forth are raising the lowest common denominator. You know, it’s true today that if you’re at all involved in computer networking or remote access with computers, you don’t buy a 300 baud modem. You buy a 14.4 kilobit modem. And the economies of scale have gotten to the point where buying anything less than that just doesn’t make sense from a cost perspective. And hopefully it won’t be too long before buying ISDN or something better than ISDN is so cheap that you wouldn’t even think about buying a modem. And that really does seem like a technology we can deploy to every home. We already have at least 20 million users in the world who have email. And that’s probably a low estimate.

Malamud: 20 million people can’t be wrong, right?

Frederick: [laughing] No. Hopefully the same will be true with these other technologies.

Malamud: Well there you have it. We’ve been talking to Ron Frederick, and this is Geek of the Week.


Malamud: You’ve been listening to Geek of the Week, a production of the Internet Multicasting Service. To purchase an audio cassette of this program, send mail to audio@ora.com. You may copy this file and change the encoding format, but may not resell the content or make a derivative work.

Support for Geek of the Week comes from Sun Microsystems. Sun, makers of open systems solutions for open minds. Support for Geek of the Week also comes from O’Reilly & Associates. O’Reilly & Associates, publishers of the Global Network Navigator. Send mail to info@gnn.com for more information. Additional support is provided by HarperCollins and Pearsall. Network connectivity for the Internet Multicasting Service is provided by UUNET Technologies, and MFS DataNet.

Geek of the Week is produced by Martin Lucas, and features [?], our house band. This is Carl Malamud for the Internet Multicasting Service, flame of the Internet.