Carl Malamud: Internet Talk Radio, flame of the Internet.
Malamud: This is Geek of the Week. We’re talking to Keith Moore from the University of Tennessee. Welcome to Geek of the Week.
Moore: Thank you, Carl.
Malamud: You’re one of the principal contributors to MIME, the multimedia Internet um, extensions for mail on the Internet. I’ve…totally mangled reverse engineering that acronym. Why don’t you tell us what MIME is?
Moore: Let’s see. I think it’s “Multipurpose Internet Mail Extensions.” Maybe message extensions, but I don’t remember the acronym either. Basically MIME is a way to put anything besides text in Internet mail. Internet mail for you know, ten plus years has supported only text, and MIME was a way to take existing Internet mail infrastructure and encode various other kinds of objects besides text into text-only mail. And so it has an encoding scheme for non-text objects, and it has a typing scheme that allows you to say what kind of object each of these things is.
Malamud: Let’s break these pieces down. As I understand RFC 822 mail, the original… Actually there’s two concepts. There’s RFC 822 mail, that’s the message. And then there’s SMTP, the message transfer protocol. And as I understand RFC 822, it’s a series of headers, and a message body.
Moore: That’s right.
Malamud: Now, how did MIME change that kinda flat structure?
Moore: Well, let’s see. The first thing is all MIME messages are RFC 822 messages. As you said, RFC [822] has a series of headers at the beginning of a message, and then a blank line, and then the rest of the message is text. The difference between a MIME message and an RFC 822 message is that in the MIME case, there’s a couple of extra headers at the top of the message that say “By the way, the rest of this message is in a special format. And even though it looks like text to you, if you run it through a parser, then it can be split up into several body parts and types etc., and you can read them.”
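As a rough illustration of the "couple of extra headers" described above, the following Python sketch parses a small message with the standard `email` package. The addresses and the subject line are made up for the example; the point is only that a MIME message is still an ordinary RFC 822 message with headers, a blank line, and a body.

```python
from email import message_from_string

# A hypothetical message: plain RFC 822 headers plus two MIME headers.
raw = (
    "From: carl@example.org\n"
    "To: keith@example.org\n"
    "Subject: Hello\n"
    "MIME-Version: 1.0\n"
    "Content-Type: text/plain; charset=us-ascii\n"
    "\n"
    "This body is still readable as plain text.\n"
)

msg = message_from_string(raw)

mime_version = msg["MIME-Version"]      # the header announcing the special format
content_type = msg.get_content_type()   # the type label for the body
body = msg.get_payload()                # everything after the blank line

print(mime_version, content_type)
print(body, end="")
```

A reader without MIME support would simply show the two extra header lines and the same text body.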
Malamud: So the body of this MIME message is all just plain old ASCII text? I can read it if— Even if I don’t have a MIME reader, I can read this message.
Moore: If the components of the message were text only, yes, you can read them. And you would see a couple of things that you might not understand, but basically you’d still be able to read the message. We tried to make it as backward-compatible and as friendly to an install base as we could.
Malamud: What kind of body parts can we have inside of a MIME message now?
Moore: Well there are several top-level classes. There’s text, of course, so that you can still do text within a MIME message. There are body parts for images. There are body parts defined for audio—actually there’s only one defined right now; others will be defined. Let’s see, what else. There is a general type of body part called “application” which is probably some sort of application-specific data, a word processor or spreadsheet, something like that. And there’s a special type called “multipart.” I’m sure you’ll want to ask me about that. Then there is…besides the sort of top-level things there are some things that are specific to the message system, and there’s a body part type for that. And—
Malamud: So we have kinda basic components—text, audio, image, application. And then multipart is a way of structuring those individual components inside of a message?
Moore: That’s correct.
Malamud: So I read one piece after another, or… How does this work if I have a multipart text and then an audio and then an image? Do I first read the text and then hear the audio?
Moore: I think it’s generally expected that things will be presented in order. But there is a construct called “multipart/parallel” which says “Present everything within this enclosing construct all at the same time” so that you can do some sort of crude multimedia things. We didn’t try to solve the real multimedia problem, but we tried to give a way to carry several different kinds of objects, and someone can later come along and define maybe a scripting language that ties them all together so you can do real good presentation.
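To make the multipart idea concrete, here is a minimal Python sketch using the standard `email.mime` classes. The text and the sixteen placeholder bytes standing in for audio are invented for the example; passing `"parallel"` instead of `"mixed"` would label the container as multipart/parallel.

```python
from email.mime.audio import MIMEAudio
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

# The enclosing construct; parts are listed in presentation order.
outer = MIMEMultipart("mixed")
outer["Subject"] = "A clip for you"

# First part: ordinary text.
outer.attach(MIMEText("Here is the clip.\n", "plain", "us-ascii"))

# Second part: placeholder bytes standing in for real audio data.
fake_audio = bytes(range(16))
outer.attach(MIMEAudio(fake_audio, "basic"))

# walk() visits the container first, then each part in order.
part_types = [part.get_content_type() for part in outer.walk()]
print(part_types)

# The same parts under a container asking for simultaneous presentation.
parallel = MIMEMultipart("parallel")
print(parallel.get_content_type())
```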
Malamud: And what about the external pointer body part? Why don’t you explain what that is?
Moore: Oh that’s right. So, let’s see. There’s a body part called “message/external-body” and what happens is if you see one of those things—if you’re a mail reader and you see an external-body part, what it has is instructions on where to find the actual body part. The body part will not be actually included in the message. When you read it, this thing will say “By the way, this thing is over here for anonymous FTP at this host. Go get it and present it to me.” The user agent will usually ask you first “Do you want to go get this thing?” but as a side-effect of reading the message will actually go get it for you.
Malamud: So I could send you a message saying there’s a new radio program, and your reader would say “Do you want me to go get the audio?” So rather than sending you a 30-megabyte mail message, there can be a pointer to an anonymous FTP area?
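A sketch of what such a pointer might look like to a mail reader, again using Python's `email` package. The host name, directory, and file name are hypothetical; the parameter names follow the external-body convention of describing where the real data lives instead of carrying it.

```python
from email import message_from_string

# A hypothetical external-body part: it carries a pointer, not the audio.
raw = (
    "MIME-Version: 1.0\n"
    'Content-Type: message/external-body; access-type="anon-ftp"; '
    'site="ftp.example.org"; directory="pub/radio"; name="program.au"\n'
    "\n"
    "Content-Type: audio/basic\n"
    "\n"
)

part = message_from_string(raw)

# Collect the instructions a user agent would need to fetch the object.
pointer = {
    key: part.get_param(key)
    for key in ("access-type", "site", "directory", "name")
}
print(part.get_content_type())
print(pointer)
```

A user agent would typically show this pointer to the user and ask before fetching anything.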
Moore: That’s right. And they’re doing this now for new RFCs that come out and new draft documents that come out for Internet standards. And—
Malamud: So if you’re on the announcements list it says “Here’s an abstract. Do you want me to go get it for you?”
Moore: Right.
Malamud: You talk about audio, and images, and things like that. Yet RFC 822 is simply a text-based messaging service. How do you put audio into text?
Moore: Oh. Well people’ve been doing it for years with things called UUencode, and btoa, and various things for the Mac whose names escape me at the moment. Basically what you do is you say well, we’ll encode six bits per character, and we’ll use sixty-four different characters, and we split it up so that three octets get encoded in four characters. The scheme we picked for doing it has certain properties. We did a survey of various kinds of mail systems in the world and which characters they would allow through the mail system there. We weren’t trying to solve the problem just for the IP-connected Internet. There are people all over the world who are using 822 mail who aren’t actually using TCP/IP. Or SMTP, which is what imposes some of the restrictions.
And so we came up with a set of sixty-four characters that actually is the same set that was used by PEM for the same reasons. And we encode things as those characters and send them as plain ASCII characters. Basically upper-case letters, lower-case letters, digits, and three or four others.
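The three-octets-into-four-characters scheme described above can be sketched by hand in Python and checked against the standard `base64` module. The function name `encode_three_octets` is invented for the example, and padding of short final groups is left out to keep it brief.

```python
import base64

# Upper-case letters, lower-case letters, digits, and two others.
ALPHABET = (
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789+/"
)

def encode_three_octets(chunk: bytes) -> str:
    """Encode exactly three octets as four characters, six bits each."""
    if len(chunk) != 3:
        raise ValueError("this sketch handles exactly three octets")
    # Pack 3 x 8 bits into one 24-bit number...
    n = (chunk[0] << 16) | (chunk[1] << 8) | chunk[2]
    # ...then cut it into 4 x 6 bits, most significant group first.
    return "".join(ALPHABET[(n >> shift) & 0x3F] for shift in (18, 12, 6, 0))

sample = b"Man"
encoded = encode_three_octets(sample)
print(encoded)

# Octets with the high bit set come out as plain ASCII characters too.
high = bytes([0xFF, 0xFE, 0xFD])
encoded_high = encode_three_octets(high)
print(encoded_high)
```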
Malamud: So I get this audio file. Let’s say I do put it in my electronic mail message. And I’ve got to have some reader go through and my audio reader has to first decode the message back into binary data—
Moore: That’s correct.
Malamud: —and then play it for me. Why not just send the binary data?
Moore: SMTP, specifically, and almost every other mail transport are not binary-transparent. SMTP requires, back from days of old, that you not send messages with the most significant bit of your octet set. And it basically assumes that the message will be ASCII text.
Furthermore, various SMTP implementations that would handle the mail, that get a message and route it and forward it onto somewhere else, have been known to change the way ends of lines are represented, or strip out certain characters. Various other mail transports have other limitations. They may translate ASCII to EBCDIC and back again on the other end. So there are various things that basically prevent you from sending arbitrary sequences of octets through [crosstalk] in the mail.
Malamud: But don’t a lot of mailing systems—cc:Mail for example lets you attach a binary file. Why not allow those people to continue doing that work and just say well you know, if you reach a place that doesn’t do 8-bit…you lose. You just can’t get the message. Are we forcing ourselves into a lowest common denominator, and is that a bad thing?
Moore: No, I think we actually hope that one of these days you’ll have ubiquitous binary transport. There is a way in MIME to say “This is a binary object. Let it go through.” But no one really expects to use it any time soon. Maybe in a very local environment. In your message transport, if you have a binary message and say okay “We’ll convert it later, we’ll encode it in something” and maybe someone else later on decides “Well, in fact we can do binary from here on out. We’ll take this encoded thing and convert it back to binary,” what you get is a lot of various transport agents doing things like munging the message in various ways. And if it gets corrupted, you don’t know who did it. It’s very hard to track down. So, I think the general wisdom is don’t try to use this yet. We’ll hope that SMTP and other transports can be upgraded, and maybe someday there’ll be enough of this infrastructure in place where you can use it.
Malamud: Looking at text messages, if we can get back to that area. US ASCII is really the character set that was defined as the way you structure a message. Yet in other countries obviously we have things like accents, let alone other character sets. How did you tackle that problem within the MIME body parts?
Moore: Well basically what we did, for the “text/plain” body part—you call it plain text, in MIME it’s called “text-slash-plain”—there is a parameter that goes with it that says what character set is used. And if it’s ASCII it’s treated just like it always has been. Basically we took all the sort of ISO standard character sets that were completely specified and defined parameters for those also. And several new ones have been added since we actually did the MIME standard. There’s been stuff for Japanese, and Korean, and Russian I believe, and Hebrew, based on conventions that were used in the net environments that dealt with those languages.
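As a small sketch of the character set parameter, the Python below builds a text/plain part labelled ISO 8859-1 and then reads it back the way a receiving reader might: look at the `charset` parameter, undo the transfer encoding, and decode the bytes. The sample word is arbitrary.

```python
from email.mime.text import MIMEText

# Build a text/plain part and declare which character set it uses.
part = MIMEText("José", "plain", "iso-8859-1")

# The receiving side reads the label rather than guessing.
charset = part.get_content_charset()

# decode=True undoes the transfer encoding and returns raw bytes,
# which are then interpreted using the declared character set.
recovered = part.get_payload(decode=True).decode(charset)

print(part["Content-Type"])
print(recovered)
```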
Malamud: Now, a lot of these other character sets don’t fall within US ASCII by definition, right, or we wouldn’t be using ’em.
Moore: That’s correct.
Malamud: Do we have to take that and encode this the same way we would a binary file so that I have to decode the message before I can read it?
Moore: No. There’s actually two different encodings. And one is sort of optimized for things that contain a lot of ASCII characters, and in that encoding which we call “quoted-printable,” if it doesn’t fall within a certain range of characters that are in the ASCII set, you encode it with equals [=] and two hex characters. And hopefully that doesn’t happen very often. That will work for some languages that…for which they mostly use ASCII characters. Spanish is probably a good example. I don’t want to speak for them, but I suspect that it’s not too obnoxious in Spanish. And if you—
Malamud: So “José” is J O S…quote equals something something…
Moore: Yeah. Equals and two hex characters. And hopefully the accents don’t occur very often. And as far as I can tell, this has been sort of a mixed reaction thing. But we all know it’s something for the time being. We’re hoping that SMTP 8-bit transport can become more ubiquitous also.
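The "equals and two hex characters" rule can be tried out with Python's standard `quopri` module. This sketch assumes the text is in ISO 8859-1, where the accented e is the single octet E9; other character sets would give different hex digits.

```python
import quopri

text = "José"

# In ISO 8859-1 the accented letter is one octet with the high bit set.
latin1 = text.encode("iso-8859-1")

# ASCII letters pass through untouched; the other octet becomes =XX.
encoded = quopri.encodestring(latin1)
print(encoded)

# Decoding reverses the substitution and restores the original octets.
decoded = quopri.decodestring(encoded).decode("iso-8859-1")
print(decoded)
```

Someone reading the raw message without a MIME-aware reader would still recognise most of the word, which is the point of this encoding for mostly-ASCII text.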
Malamud: So I get a message and my reader is smart enough and it knows the character set. If I’m in Spain I expect to get a lot of these, my reader’s configured, and it just shows it to me in the right font. If I carbon copied somebody in some other country who isn’t prepared to deal with other character sets—let’s say the US, right, we would see something different.
Moore: I think that’s the assumption, that the people who speak a language will have support for the character sets that are normally used for that language.
Malamud: Now, one of the things I like in my mail reader, I like to look at my headers. I don’t just go “read, read, read, read,” I say “delete, delete, delete.” In order to do that I need to know who it’s from. You’ve talked about how we take care of character sets inside of the message. What about the headers? Is there a way of putting an accent in my name in the “from” field, for example?
Moore: Yeah there is, and it’s a separate specification. It was really a separate problem from the one of having to encode things within the document. Because headers always always always have to be ASCII, everyone has to parse them. If you can’t deal with a certain thing within a body part in the message, you can just ignore it or tell the user you can ignore it or whatever. But, the headers themselves, if you break those things then you break the whole mail system.
So, there was a different way of encoding things within headers, and there are complicated restrictions on when you can use it and exactly where in a header it can appear and things like that.
Malamud: So can my domain name be josé.radio.com now?
Moore: Absolutely not. Domain names have to be ubiquitous. Everyone has to be able to type it in. And unfortunately, for some people that limits them to ASCII characters, and a certain subset of those. This is somewhat of a problem for people who are using LAN-based mail systems where they’re used to being able to have accented characters and such. And communicating this problem is something that we face. It’s like why if you… I think we understand that like Telex addresses have a limited alphabet and people are used to that, but they’re not used to it for their own environments which are being gatewayed into the Internet.
Malamud: So where in the header can I use these new character sets?
Moore: Basic—
Malamud: The “subject” field would seem like an easy one.
Moore: Right. So basically anywhere that there is plain text in the header. But you might have to read the specification— I had to read RFC 822 to understand where it’s defined to be that. Subject is an obvious place, and yes you can put whatever character sets and such in there. Your personal name that appears before your address in the message header is another place you can do it. And generally within comments that appear in the header. But for instance, you’re specifically disallowed from putting it in the trace information that might be used to diagnose problems.
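A sketch of the separate header encoding, using Python's `email.header` module. The name is the one from the conversation; the character set is an assumption for the example. The encoded form stays pure ASCII, so software that only parses headers is not disturbed.

```python
from email.header import Header, decode_header

# Wrap a non-ASCII personal name or subject in an encoded word.
header = Header("José", "iso-8859-1")
encoded = header.encode()
print(encoded)

# The result is plain ASCII, so any header parser can pass it along.
is_ascii = encoded.isascii()

# A capable reader unpacks it back into octets plus a character set label.
pieces = decode_header(encoded)
raw_bytes, charset = pieces[0]
print(raw_bytes.decode(charset))
```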
Malamud: We talked about how RFC 822 works over a variety of different transport mechanisms, from SMTP to UUCP to…you know, you can send it into the MCI Mail world and it works. In the core Internet that uses SMTP, what kind of modifications have you made to support this kind of emerging richer messaging structure?
Moore: Well, we’ve anticipated that if you were sending text-only messages, you know, several-hundred K is an upper bound. You don’t usually have huge huge huge text documents going over email.
On the other hand as soon as you can start sending things like images around, then messages on the order of several megabytes are to be expected. And we don’t expect however that everyone’s SMTP can deal with these things. You may not have enough disk space, you know. You were expecting text-only mail and all of a sudden you get this huge thing dumped on you.
So, one of the extensions for instance is a way to say “Here’s a message that’s so big, can you deal with it?” And the SMTP on the other end can say “Sorry, I can’t deal with this so don’t even bother trying to send it to me.”
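The size check described above might be sketched like this in Python. Both function names are invented for the example, and the treatment of an advertised limit of 0 as "no fixed limit" is an assumption drawn from the SMTP size declaration extension, not from this conversation.

```python
def server_accepts(message_size: int, advertised_limit: int) -> bool:
    """Decide whether it is worth sending at all.

    advertised_limit is the number the server gave next to SIZE;
    0 is taken to mean the server declares no fixed maximum.
    """
    return advertised_limit == 0 or message_size <= advertised_limit

def mail_from_command(sender: str, message_size: int) -> str:
    """Build the envelope command that declares the size up front."""
    return f"MAIL FROM:<{sender}> SIZE={message_size}"

limit = 1_000_000                 # hypothetical limit from the server
big_image_mail = 5_000_000        # several megabytes
short_text_mail = 20_000

print(server_accepts(big_image_mail, limit))
print(server_accepts(short_text_mail, limit))
print(mail_from_command("carl@example.org", short_text_mail))
```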
Malamud: What if I’m sending to four people within an SMTP session?
Moore: Uh—
Malamud: I say “Mail to this guy and to this guy and to that guy.”
Moore: If they all have the same SMTP server, presumably it could…it might want to refuse it for all of them and say “I don’t have enough disk space.” The specification actually allows you to say “I’m sorry I can’t send it to this guy. You may have to find another way to get it here but I’ll take it for this other one.” And…that gets kinda complex in terms of error recovery, but it was felt that it was necessary.
The way mail works, when you send it out it goes to one SMTP server and it may decide to route to another one for some of the recipients and yet a third one for others. So, trying to find a pathway for each recipient that can handle the volume of data required could be kind of difficult.
Malamud: Now, these SMTP extensions… Do you just assume that the other side has it? How do you know whether these people can speak the new SMTP, or whether they’re speaking the old SMTP?
Moore: Oh, okay. So, there is a negotiation mechanism. When you first talk to an SMTP traditionally you said “HELO” and you give your domain name. And it comes back and says “Sure, okay. And in fact my domain is this.” You’re just introducing yourself. There’s a new word that says “EHLO” [pron. “e-hello”] which is “extended helo,” which says “Don’t just tell me what your domain is, also tell me what all the capabilities you have are, which extensions you support.” And you get back a list. And you’re only allowed to use those extensions if the server end says “Yes, I support these.”
Malamud: Okay. So basically if the person talking to you says “HELO” they’re a 7-bit. If they expect you to talk— Uh, not 7-bit, excuse me. They’re a normal SMTP world. If the person says E H L O instead, then you know that they speak extended SMTP and if you also do it, you respond that way.
Moore: Right. Actually, I think we try to be careful about that in that the server really isn’t supposed to make any assumptions about client’s capability even if the client says “EHLO.” But the client can make assumptions about the server’s capability if the server responds to EHLO.
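The negotiation can be sketched from the client's side as parsing the list that comes back after EHLO. The reply lines and the function name `parse_ehlo_reply` below are made up for the example; a real client would read these lines from the network connection.

```python
def parse_ehlo_reply(lines):
    """Turn a multi-line 250 reply into {KEYWORD: parameters}.

    The first line is the server introducing itself; every later
    line names one extension the server says it supports.
    """
    extensions = {}
    for line in lines[1:]:
        code, rest = line[:3], line[4:]
        if code != "250":
            continue
        keyword, _, params = rest.strip().partition(" ")
        extensions[keyword.upper()] = params
    return extensions

# A hypothetical reply to "EHLO client.example.org".
reply = [
    "250-mail.example.org Hello client.example.org",
    "250-8BITMIME",
    "250 SIZE 10485760",
]

extensions = parse_ehlo_reply(reply)
print(extensions)

# The client may only use an extension the server has listed.
may_use_size = "SIZE" in extensions
may_use_binary = "BINARYMIME" in extensions
```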
Malamud: Okay. In addition to things like size limits, are there other things in the extended SMTP?
Moore: The most important one right now is the 8 bit extension that basically says we can break the old rule for SMTP that said we’re not allowed to send…what we call 8-bit characters. They’re all 8-bits, but characters for which the most significant bit in the octet is 1. And the European character sets and…all of the non-ASCII character sets, practically, require this. And so, you can now query the remote SMTP, and if it supports this extension there’s no need to have the text be encoded in 7-bit form, it can be in 8-bit form. You still have to encode non-text objects, practically speaking. But this—
Malamud: Well why if you’re allowing 8-bit text to come through over this transport aren’t you allowing 8-bit audio, or 8-bit image, or something else? If both sides are willing to do it.
Moore: Uh, there are people working on proposals for that. The reason the current extension doesn’t have it I believe is there’s a lot of mail software out there that it is easy to upgrade…in fact, mail software has done this in violation of standards for years—to go ahead and pass 8-bit text. But when you start saying arbitrary sequences of octets and we want to be able to pass these cleanly, that requires major changes to existing software. And so the current 8-bit extension is something that we thought that people could graft in without a lot of trouble.
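One way to picture the decision a sending client faces for text is the Python sketch below. The function name and the fallback choice of quoted-printable are assumptions for illustration; the only rule taken from the discussion is that 8-bit text may go through unencoded only when the server has advertised the 8-bit extension.

```python
def choose_text_encoding(extensions, body: bytes) -> str:
    """Pick a transfer encoding for a text body.

    extensions is the set of keywords the server listed after EHLO.
    """
    # Does any octet have its most significant bit set?
    has_high_bit = any(octet & 0x80 for octet in body)
    if not has_high_bit:
        return "7bit"              # plain ASCII needs no special handling
    if "8BITMIME" in extensions:
        return "8bit"              # server agreed to pass 8-bit text
    return "quoted-printable"      # otherwise fall back to a 7-bit encoding

ascii_body = b"Hello\r\n"
latin1_body = "José\r\n".encode("iso-8859-1")

print(choose_text_encoding({"8BITMIME"}, ascii_body))
print(choose_text_encoding({"8BITMIME"}, latin1_body))
print(choose_text_encoding(set(), latin1_body))
```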
Malamud: How old is MIME now? How many years has it been around?
Moore: Uh, I have to think about it. MIME has been at Draft Standard state for…two thirds of a year, three quarters of a year, something like that. Before that it was a Proposed Standard state for…about the same amount of time. So I guess now it’s…something like two…
Malamud: It’s a couple years old.
Moore: A couple years old. Yeah.
Malamud: Is anyone using it?
Moore: Uh, yeah. People are using it all over.
Malamud: Do vendors ship it?
Moore: There are some vendors that ship it and more coming. And I’ve seen more and more of this every day. It’s becoming a fairly commonplace thing.
Malamud: Now, with the X.400 world, they started in ’84 and then ’88 and then the ’92. It’s been a twelve-year development/deployment. Is MIME— Is it gonna take twelve years to get MIME out as a generally-accepted set of mail extensions?
Moore: No, I don’t think so. I think— I would…be…very careful about making estimations of install base. I’m sure other people have done that and have real statistics. But basically you already have a MIME-capable mail transport. You don’t need, if you have sendmail or any other SMTP or anything that’s already used to dealing with 822, you don’t have to change the transport. And furthermore, if you have any Internet mail 822-capable user agent, you don’t have to change the user agent, you can receive MIME documents. If they’re text, you can deal with them.
Then there are various patches to existing mail user agents for the 822 world that maybe don’t present an ideal user interface but allow you to read and send MIME mail messages. And then there are sort of external programs if you get a MIME mail message from someone and you didn’t quite know what to do with it, there are programs that deal with just the problem of shipping files around in MIME. So there are various things that you can do to be MIME-capable and MIME-aware until you get really really nice user agents along. And as I said, there’s no need to upgrade the transport at all. So in that sense, we already had a large infrastructure of mail transport that was already in place that we could use. And I think that gives us a leg up on trying to compete with X.400—if that’s—you know, some people view that’s what we’re doing, and some people say we’re just trying to make the world safe for multimedia email.
Malamud: Keith Moore, we’ve been talking about multiple body parts, and audio, and image, and character sets. And these are things that were in the 1984 X.400 specifications. Why did you reinvent the wheel? Why not just take X.400 and…you know, maybe just fix it?
Moore: Huh… Well, that’s a very good question. I think for us to have taken… We started with the assumption that we have to be friendly to our install base. That any proposal which would say “We’re gonna reinvent a new mail system” was a non-starter. It’s very hard to deal with multiple incompatible mail systems. And if we had said that’s what we really need to do, I think we would have said okay, well what can we do as far as profiling X.400 or tweaking X.400 or whatever to make it work. But we didn’t believe we needed to do that. We really— We saw that it was possible. We had existence proofs of multimedia mail systems that ran on top of 822 that worked very well, that had you know, minimal impact on the install base, at least for the communities they’d been tested on. And we saw that we could take our existing Internet mail system and not throw it away, not have to deal with multiple mail systems and have to worry about who’s on which mail system when you’re sending someone a piece of mail. And—
Malamud: But you do have to now worry about gateways to the X.400 install base. Is that an issue?
Moore: We would’ve had that in any case because X.400 wasn’t going to go away, either. If you work with email for very long you quickly come to realize that gateways are very evil things. You have to have them and they do allow you to get connected, but they’re a royal pain. So we knew we’d have to do X.400 gatewaying. At the same time, giving the Internet world capabilities equivalent to the X.400 world really makes your gatewaying problem easier. There’s now been defined a set of mappings between MIME body parts and X.400 body parts, and basically you can now map all the common things. If you want a fax body part you can map that. If you want IA5 text that maps to ASCII text, it always has. And you can map a multiple body part message in the MIME world to a multiple body part message in the X.400 world. And it turns out that you can define in MIME a way to encapsulate X.400 body parts, and in X.400 you can define a way to encapsulate MIME body parts. And really now it becomes feasible to establish a mapping for all these different services. And so I think at least some of the X.400 community’s pretty happy with this.
Malamud: In the X.400 world one often hears X.400 and X.500 talk together, and in fact when I talk to people about the Internet, one of their first questions is “How do I find somebody’s electronic mail address?” Have you given any thought to, do we need a directory? Is it X.500? Is it the DNS? Will MIME work without a way of finding addresses?
Moore: Um, it’s certainly a problem. We certainly need some sort of general way to find email addresses in the growing Internet. That does imply a directory server, and one that’s distributed, and one that scales well. And it may be that X.500 can be made to work. I’m not so up on the scaling problems, but I kind of hear sort of groans whenever I mention X.500 to people, so I think it at this point is not a problem that solves well for the sort of world-wide Internet or things on that scale.
Malamud: But we can do email without a directory. I mean, we are…
Moore: Right. We have other means of discovering people’s addresses. And that works for now. I think it would’ve been a bad idea to tie MIME to any kind of directory service and say “We must have this directory service in place before we can use MIME.” And there have been at various times suggestions that we do something like that, that to say you know, you shouldn’t send someone a message that contains GIF files or JPEG files or audio files without knowing that they can deal with it. And you know, maybe that’s true except that if we really made our email system dependent on the ability to discover these things, then we wouldn’t have an email system.
Malamud: I found an easy way to handle the problem of can they deal with JPEG files, I send ’em one and if they can’t they call me up. [mimics someone roaring]
Moore: Right. Or you send a message saying, “Can you deal with a JPEG file. I don’t want to send this to you.” And sure, it’s informal and it requires manual intervention, but.
Malamud: Speaking of dealing with something, one of the things that I’ve looked at in the MIME standards is this “application” body part in which I’m sending you a Perl script, or I’m sending you PostScript. Is this secure? I mean, what are the security implications of sending arbitrary applications over messaging?
Moore: Um. I like to think of the Internet worm and there was a finger server that people exploited. It turns out if you sent a long enough command to a finger on a particular host, then you could actually gain privileged access to the system. And the reason that this was the case is that you basically had a network protocol server that did nothing but take this string that it got from the client in and feed it to an ordinary user program that was not designed to be a network protocol server. You would just feed it to the local finger program and return whatever it gave back.
And if you start taking text that comes from you don’t know where, and feeding it to a PostScript interpreter or a spreadsheet or a word processor, in many many cases there is some way to exploit a hole that can do something unkind to your system, maybe expose you to security risks, or maybe you know, expose… It might be able to trash your system, it might be able to let someone else get into your system, it might be able to expose data that’s on your system to somewhere else. There are lots of ways this can happen. And yeah, it’s a general problem.
Malamud: Is this something that if someone sends you an application you shouldn’t execute it?
Moore: Well…
Malamud: If a stranger sends you an application?
Moore: If a stranger sends you an application and you know, you don’t know that the application is safe against this kind of attack then…yeah, you want to think very carefully about doing it or do it in some sort of safe environment. The other thing is you don’t really know whether someone sending you something is a stranger or not. We don’t have authentication mechanisms in place, at least in the general case.
Malamud: While you were developing MIME there was another set of messaging protocols, known as the Privacy-Enhanced Mail, moving forward that does solve the authentication, and message integrity, and confidentiality of the messages. How does the PEM work relate to MIME? Do you have to run PEM or MIME? Can you run both?
Moore: There are proposals for integrating the two, basically calling a PEM object just another type of MIME object. The nice thing about that is if you have a MIME mail reader, then you should be able to just plug in the PEM module, basically. And when you get one of these things your mail reader will call the PEM module to decrypt the message or certify that yes it really did come from the person who it says it came from, or something like that. That hasn’t quite gone out the door yet but it’s being worked on.
Malamud: So there is a body part of type “message” and the sub-type is…type “PEM.”
Moore: Right.
Malamud: It’s that simple.
Moore: It’s a little more complicated than that, as security always is, but that’s the basic idea.
Malamud: Well how long is it gonna be before we have secure electronic messaging on the Internet?
Moore: That depends on a lot of things that have nothing to do with protocols. Security is this real touchy issue that involves licensing restrictions, and export rules, and distribution of trust and key exchange, and all those things are hard problems. And you know, it’s easy to encrypt messages. It’s hard to get the keys across. And uh—
Malamud: It’s even harder to get export controls approved.
Moore: That’s right. So there’s so many barriers to getting a real secure mail system in place. And—
Malamud: Do we need one now?
Moore: Oh, absolutely. Any time you’re dealing with something on the order of tens of millions of users, you’re going to need some sort of security thing. There’s going to be someone out there that wants to mess with you. It doesn’t take many people on an Internet, it’s very easy to reach out and touch someone. So, yeah, you need ’em.
Malamud: There you have it. We’ve been talking to Keith Moore, and this has been Geek of the Week.
Malamud: This is Internet Talk Radio, flame of the Internet. You’ve been listening to Geek of the Week. You may copy this program to any medium, and change the encoding, but may not alter the data or sell the contents. To purchase an audio cassette of this program, send mail to radio@ora.com.
Support for Geek of the Week comes from Sun Microsystems. Sun, the network is the computer. Support for Geek of the Week also comes from O’Reilly & Associates, publishers of the Global Network Navigator, your online hypertext magazine. For more information, send mail to info@gnn.com. Network connectivity for the Internet Multicasting Service is provided by MFS DataNet and by UUNET Technologies.
Executive Producer for Geek of the Week is Martin Lucas. Production Manager is James Roland. Rick Dunbar and Curtis Generous are the sysadmins. This is Carl Malamud for the Internet Multicasting Service, town crier to the global village.