Announcer: Internet Talk Radio, flame of the Internet.
Carl Malamud: This is Geek of the Week, and we're talking to Brewster Kahle, who is founder and President of WAIS Inc. Before that, he was one of the chief developers of WAIS, the Wide Area Information System or Wide Area Information Server, depending on which documents you read. Welcome to Geek of the Week, Brewster.
Brewster Kahle: Thanks, Carl.
Malamud: What is WAIS, actually? What's the proper reverse engineering of that acronym?
Kahle: It's Wide Area Information Servers. It's an acronym. I can't stand acronyms, but we couldn't come up with anything better. If you can come up with a better name we'd love to change the name. Acronyms aren't usually the right way to go.
Malamud: Maybe we'll make that a contest for our listeners.
Malamud: What is WAIS? Can you give me an overview of what that service is?
Kahle: Yeah. It's an electronic publishing system. Basically it's trying to help people find and retrieve information over wires. But I think where the excitement of WAIS actually lies is more in helping people create content and share it. Anybody with a personal computer and a telephone should be a publisher. That's the goal of WAIS.
Malamud: Now, how can you be a publisher with WAIS? What is it that it lets you do?
Kahle: All you need is basically some sort of computer, some communication line, something to say, and a little bit of software. And you can then make it available for everyone to see it.
Malamud: What is WAIS, though? Is it a protocol, is it software, is it some indexing techniques? Is it the fact that somebody actually has information out there?
Kahle: Unfortunately, WAIS is all those. Many people think of WAIS as a protocol. It's a mechanism for people to go out and try to find what it is they're looking for. But you want to make sure that there's stuff out there to find. And that's where we're spending most of our time: trying to upgrade and multiply the number of sources of information that are available.
But the key piece about WAIS is the protocol. It's making it so that the clients… Anybody can have a PC, a laptop, a little handheld device, all of those things can go out and probe information resources no matter where they are.
Malamud: But what's the protocol? Is it Z39.50 or is it a modification…?
Kahle: There's a bunch of confusion about exactly what the WAIS protocol is. In fact, the WAIS protocol is about five or six different standards, all packaged into one environment. Z39.50 is important for the information retrieval aspect. It came out of the librarian world, but it's really built to help you search card catalogs. Well, most people don't really care about card catalog entries. Yes, they're important. But a lot of people want to get at images, video, radio, all sorts of different types of information. So those standards for document formats come from different groups.
We also need a document identifier so you can point a hypertext link, if you will, at documents that might be in Japan, or in somebody's laptop when they're traveling. You need a mechanism for pointing to documents. That's another standard. The query formats, those are other standards. So the WAIS protocol is actually probably four or five different protocols and standards all being used together.
Malamud: When you say standards, do you mean actual, real standards? Are we looking at ISO standards, or things that you folks developed to suit your needs?
Kahle: Some things are ISO standards, some things are ANSI standards, some things are just starting their way through the standards process. Some are actually proprietary formats. For instance, Microsoft Word. There are a lot of Microsoft Word documents being shared with WAIS. But Microsoft Word format is not an ISO standard. It's a proprietary standard. We just tried to make sure that whenever there were proprietary standards for different parts of the WAIS protocol, you had choices. So no single piece of the WAIS protocol is terribly locked into either a particular standards committee or a particular vendor. That's what we see as the flexibility, and why WAIS is going to win: it's really riding on top of a set of standards, to help make sure that the consumers, the people trying to find information, are getting the right stuff from the zillions of sources. And as that evolves, WAIS tracks with it.
Malamud: You developed WAIS when you were first at Thinking Machines. And I've always wondered why the manufacturer of a massively parallel processor would develop an information retrieval protocol.
Kahle: Yes, that's a good question. Thinking Machines is best known for Connection Machines, which are massively parallel machines that have hundreds or thousands of processors in them. So why would they do WAIS? Well, roll it back a little bit: the name of Thinking Machines is "Thinking Machines" for a reason. The idea is to try to make a machine that thinks. Well, that's pretty hard. But that's the real goal of the company. And at least a machine that's going to think has got to know a lot. It doesn't follow that you are able to think if you do know a lot, but at least it's a precursor. So that's a lot of the interest within Thinking Machines in trying to do this sort of thing.
Basically, WAIS was a mechanism for lots and lots of people to use massively parallel machines. Thinking Machines makes a high-end machine. Utilities. The big-boy computers. And the only way that Thinking Machines is going to make lots of money in selling those machines is to have millions and millions of people use them day in, day out. That was the reason why Thinking Machines did WAIS.
Malamud: So in a fully-deployed WAIS world, because of all this massive indexing and searching and retrieval, you were hoping at least that you could sell a lot more Thinking Machines.
Kahle: Yes. And in fact Thinking Machines sold a Connection Machine to Columbia Law Library. That's an interesting example there where Columbia Law Library had a problem: they're in Manhattan. They can't afford more space. So they tried to evaluate whether it's cheaper to buy a computer or buy another building to store more books. And they basically found a computer was the way to go. So they're scanning huge amounts of information. The Rosenberg trial, the Nuremberg trials, lots of United Nations data. Storing that on computers, running it through optical character recognition, searching based on the optical character recognition, which has got lots of faults in it. So you find the right document, but you retrieve the pictures of the pages. This mechanism allows us to basically do retrospective conversion of paper at a very inexpensive rate. And what WAIS is allowing people to do is once you've done that, share that resource worldwide.
Malamud: What kind of storage are we looking at? If you're scanning in an image at what, 300 dots an inch, 600 dots an inch?
Malamud: And you're also running it through OCR. You've got your text. And then you index that text, and that takes some order of magnitude increase in space over the text itself. Can you give me any idea of how much disk space we're looking at to put a library online like that?
Kahle: It turns out that by today's standards, not very much. But to give some hard numbers: what the Adobe people say is that if you scan a page and use their new Acrobat product, it's down to 30 to 40 kilobytes per page to store enough to reconstruct that page so that it looks exactly like what you had before. What we see with other compression technology is more like 80k to 100k per page.
That's still very small. We're at a thousand dollars per gigabyte, which is what current disk drives cost. The twenty terabytes of ASCII that people estimate is in the Library of Congress would be just twenty million dollars. So that's not very much money in terms of being able to store and retrieve [crosstalk] the Library of Congress.
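The arithmetic behind these figures is easy to check. The short Python sketch below reproduces it under stated assumptions: decimal units (1 TB = 1,000 GB, 1 GB = 1,000,000 KB), plus the per-gigabyte price and page sizes quoted above. The function names are hypothetical helpers, not part of any WAIS software.

```python
# Back-of-envelope storage math using the figures quoted in the interview.
# Assumption: decimal units (1 TB = 1,000 GB; 1 GB = 1,000,000 KB).

COST_PER_GB = 1_000   # dollars per gigabyte, the disk price quoted above
LOC_TEXT_TB = 20      # estimated ASCII text in the Library of Congress, in TB


def storage_cost_dollars(terabytes: float, cost_per_gb: float = COST_PER_GB) -> float:
    """Disk cost of storing `terabytes` of data at `cost_per_gb` dollars per GB."""
    return terabytes * 1_000 * cost_per_gb


def pages_per_gb(kb_per_page: float) -> int:
    """Number of compressed page images that fit in one gigabyte."""
    return int(1_000_000 // kb_per_page)


if __name__ == "__main__":
    print(f"Library of Congress text: ${storage_cost_dollars(LOC_TEXT_TB):,.0f}")
    for kb in (30, 40, 80, 100):
        print(f"{kb:>3} KB/page -> {pages_per_gb(kb):,} pages per GB")
```

Run as a script, this prints $20,000,000 for the Library of Congress estimate and between 10,000 and 33,333 page images per gigabyte across the quoted 30 KB to 100 KB range.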
Malamud: It's not a lot in terms of disk space, but one of the things I've noticed with WAIS, it's very easy to ask a question like "is there any information out there?" And WAIS is a great searching technique, and it comes back and says, "Yes, I have a million documents that have information in them." Are we going to be flooding our networks? Do we have the network infrastructure that allows us to be truly a wide area information service?
Kahle: Well, you asked two good questions there. The first is how you do the right filtering. You can only read so much per day. So what's the right mechanism to help you filter? It's your machine, your client program, that's going to be spending twenty-four hours a day trying to find you information and filtering it. And if it's not doing the best job, you're going to go and get somebody else's client that's going to filter and find the best information for you.
Malamud: Is the filtering done at the client or at the server?
Kahle: Both. And now increasingly at intermediary sites. So the crude things that we're doing today in terms of doing content-based retrieval, finding documents with certain words in them, those are starting to be augmented by human editors that are saying, "These are the important documents." So when you have a flood of information, you really want to have sometimes human help to be able to find the right document. That human help can be embodied in servers by people saying, "Those are the good documents. You might want to think about those documents." And that is another technique that the WAIS system is supporting to help people find what they want out of the gigabytes that are out there.
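A toy illustration of the two filtering layers described here, word matching plus human editorial picks, might look like the Python sketch below. The scoring rule (counting query-word occurrences) and the `EDITOR_BOOST` weight are assumptions chosen for clarity; this is not how WAIS servers actually rank documents.

```python
# Toy two-layer filter: crude word matching, then a boost for documents a
# human editor has recommended. The scoring rule and EDITOR_BOOST weight are
# illustrative assumptions, not the ranking used by any WAIS server.
import re
from collections import Counter

EDITOR_BOOST = 2.0  # hypothetical multiplier for editor-recommended documents


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def score(query: str, doc: str) -> float:
    """Count occurrences of the (distinct) query words in the document."""
    counts = Counter(tokenize(doc))
    return float(sum(counts[word] for word in set(tokenize(query))))


def rank(query: str, docs: dict[str, str], recommended: set[str]) -> list[tuple[str, float]]:
    """Return (doc_id, score) pairs, best first, dropping documents with no match."""
    results = []
    for doc_id, text in docs.items():
        s = score(query, text)
        if s == 0:
            continue  # content-based filter: no query words, no result
        if doc_id in recommended:
            s *= EDITOR_BOOST  # human-editor layer
        results.append((doc_id, s))
    return sorted(results, key=lambda pair: pair[1], reverse=True)
```

With `docs = {"a": "weather map for Boston weather", "b": "weather report", "c": "DNA sequences"}`, calling `rank("weather map", docs, {"b"})` returns `[("a", 3.0), ("b", 2.0)]`: document "a" has three query-word occurrences, "b" has one but is doubled for being editor-recommended, and "c" is filtered out.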
The other question you asked is whether the network infrastructure is good enough. And the answer is absolutely. What we're finding is 56 kilobits is plenty for doing text and business graphics. The biggest problem we have is getting reliable networks to people's workstations. We run into all sorts of problems: Novell networks that aren't compatible, people that have antiquated routers… 56 kilobits is plenty. And in fact, I have a 9,600 baud modem on my laptop. I use AppleTalk Remote Access from it. And I use WAIS all the time over a packetized protocol, not a dialup-type interface but a real graphical user interface. It's great.
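To put these link speeds in perspective, here is a rough Python calculation of idealized transfer times. It assumes 9,600 baud equals 9,600 bit/s, 1 KB = 1,000 bytes, and no protocol overhead or latency; the 2,000-byte text document is a hypothetical size, while the 40 KB page image is the upper Acrobat figure quoted earlier.

```python
# Idealized transfer times for the link speeds mentioned above.
# Assumptions: 9,600 baud is treated as 9,600 bit/s, 1 KB = 1,000 bytes,
# and protocol overhead, latency, and compression are ignored.

LINKS_BPS = {"56 kbit/s line": 56_000, "9,600 baud modem": 9_600}


def transfer_seconds(size_bytes: int, bits_per_second: int) -> float:
    """Seconds needed to move `size_bytes` over a link of the given speed."""
    return size_bytes * 8 / bits_per_second


if __name__ == "__main__":
    page_image = 40_000   # upper end of the 30-40 KB Acrobat figure quoted earlier
    text_doc = 2_000      # hypothetical size of a short plain-text document
    for name, bps in LINKS_BPS.items():
        print(f"{name}: page image {transfer_seconds(page_image, bps):.1f} s, "
              f"text {transfer_seconds(text_doc, bps):.2f} s")
```

Under these assumptions a 40 KB page image takes about 5.7 seconds at 56 kbit/s and about 33 seconds at 9,600 bit/s, while a short text document arrives in well under two seconds on either link.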
Announcer: You're listening to Geek of the Week. Support for this program is provided by O'Reilly & Associates, recognized worldwide for definitive books on the Internet, Unix, the X Window System, and other technical topics. Additional support for Geek of the Week comes from Sun Microsystems. Sun, the network is the computer.
Don't touch that mouse. Internet talk radio will be right back.
Carl Malamud: This is The Incidental Tourist, non-technical reports from out-of-the-way places. So you're working in Asia and you've got the night off. You're sick to death of the hotel coffee shop and don't know where to eat. Here's an easy hack.
Most department stores in big Asian cities have either a basement or a penthouse devoted to food. Occasionally, the food area—and we're talking massive square footage here—is full of restaurants which are usually not too bad. Certainly not as good as that charming little place back behind the temple but hey, you have no idea how to find that place, let alone read the menu.
The restaurant courts are all right. But if you're in luck you get a real market. The kind of place local yuppies come to do their shopping after a hectic day of answering calls on those cellular phones. These places are truly incredible food bazaars, citified versions of the traditional outdoor markets. There are a few butchers, fishmongers, and the like. But the bulk of the space is usually devoted to prepared renditions of every delicacy the country knows.
In Japan for example, you can go to Shinjuku Railroad Station, thread your way through the other three million people a day using that station, and go to the head of the Odakyu Line. There you'll find the Odakyu Department Store, an establishment the size of a good-sized suburb. Wander around aimlessly until you stumble across the escalator heading to the second sub-basement.
Now, you may think you're used to crowds but hold on. The Odakyu basement is the gastronomic equivalent of Times Square on New Year's Eve. Grit your teeth and dive in. Walk up to a stand, hold out your five hundred yen, and you'll get a box of potstickers or a scrumptious piece of eel wrapped around sugarcane and grilled, or an assortment of pickled fish.
For a real treat, look for the tofu counter. They hand you a basket and you can pick out all manner of little baked and fried tofu miracles. One has a piece of shrimp in the center, another has a little flat round burst of black sesame seeds, some are rolled around a fat little squid.
This same strategy works all over Asia. In Bangkok for example, try the Central Department Store on Phloen Chit Road, an adventure in dining and most things cost about 80 cents for a healthy portion. For the price of a bowl of onion soup from room service, I got enough food to feed a small army. You can get all the classic street food here like satay sticks. You can get popiah sot, the fresh local spring rolls, right next to a stand of Chinese-style turnip and fishcakes.
Particularly good was the sakoo, steamed tapioca dumplings stuffed with pork, peanuts, sugar, and garlic. If you want an authentic local meal, skip the tourist joint with the charming buffet and the semi-authentic local dancers, and head to a department store food bazaar. This is Carl Malamud, the accidental epicure, for Internet Talk Radio.
Announcer: Internet Talk Radio. Asynchronous times demand asynchronous radio.
Malamud: If a few people need to get information, obviously a 56k link into their desktop is fine. What happens when there are millions of WAIS users? Does our overall network infrastructure support it? Can we be running massive WAIS servers in Finland and have massive classrooms in California retrieving those documents?
Kahle: So far, so good. When people start to search and retrieve things like Internet Talk Radio, that's going to put some real limitations on what our network can do. But text, business graphics, even weather maps, those sorts of things can be supported pretty easily currently. A lot of smart people are working on getting bigger and bigger pipes going around, and they seem to be running ahead of what we need in terms of finding and retrieving information. Video is putting a great deal of strain on the network, now that people are starting to do video WAIS. So you might go and ask, "What news programs are talking about what's going on in Bosnia?"
Malamud: Does someone have to catalog that?
Kahle: We don't have anything that really understands video in an automatic way. What people are using is the audio tracks, which are often transcribed for the handicapped. So index that. Often it's around; there are transcripts of all news programs around. Use that as the guide to help you find the right video that you want to be looking for. If people start to understand how to draw something on a piece of paper and find other documents that are like it, if that were useful, we can support that type of thing. It's just not there yet.
Some of the interesting search and retrieval going on with WAIS is actually not based on text. The people at the US Geological Survey are making map databases where you can search based on latitude and longitude and retrieve maps. It's not using text at all. It understands only a few types of queries. But it's getting really good stuff for you.
The weather map server? It only knows how to answer a couple of questions. But basically what it gives you is the current, up-to-the-hour weather map that's available. The DNA sequence people have users submit DNA sequences, and they match those against huge volumes of DNA sequences to find relevant documents. It's not using text at all.
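Indexing transcripts as a stand-in for video content amounts to building an ordinary inverted index over the transcript text and returning program identifiers rather than documents. A minimal Python sketch, assuming transcripts are available as plain strings keyed by hypothetical program IDs:

```python
# Minimal inverted index over news transcripts, used to locate the programs
# (and hence the video) that mention every query word. Program IDs and
# transcript text are hypothetical examples.
import re
from collections import defaultdict


def words(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(transcripts: dict[str, str]) -> dict[str, set[str]]:
    """Map each word to the set of program IDs whose transcript contains it."""
    index: dict[str, set[str]] = defaultdict(set)
    for program_id, text in transcripts.items():
        for word in words(text):
            index[word].add(program_id)
    return index


def find_programs(index: dict[str, set[str]], query: str) -> set[str]:
    """Return the programs whose transcripts contain every word in the query."""
    query_words = words(query)
    if not query_words:
        return set()
    result = set(index.get(query_words[0], set()))
    for word in query_words[1:]:
        result &= index.get(word, set())
    return result
```

For example, with transcripts `{"news-0412": "Fighting continues in Bosnia today", "sports-0412": "Scores from today"}`, the query "Bosnia today" matches only `news-0412`, and the matching program's video would then be retrieved by its ID.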
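A latitude/longitude search of the kind described for the map databases can be sketched as a bounding-box test over map-sheet records. The `MapSheet` record, its fields, and the sample coordinates are hypothetical, and the sketch ignores sheets that cross the 180-degree meridian.

```python
# Bounding-box lookup over map sheets: a non-text search keyed on
# latitude/longitude. Records and coordinates are hypothetical, and sheets
# crossing the 180-degree meridian are not handled.
from dataclasses import dataclass


@dataclass(frozen=True)
class MapSheet:
    name: str
    south: float  # degrees latitude
    west: float   # degrees longitude
    north: float
    east: float

    def contains(self, lat: float, lon: float) -> bool:
        """True if the point lies inside (or on the edge of) this sheet."""
        return self.south <= lat <= self.north and self.west <= lon <= self.east


def find_maps(sheets: list[MapSheet], lat: float, lon: float) -> list[str]:
    """Names of all sheets covering the given point."""
    return [sheet.name for sheet in sheets if sheet.contains(lat, lon)]


SHEETS = [
    MapSheet("Sheet A", south=37.0, west=-123.0, north=38.0, east=-122.0),
    MapSheet("Sheet B", south=38.0, west=-123.0, north=39.0, east=-122.0),
]
```

Here `find_maps(SHEETS, 37.77, -122.42)` returns `["Sheet A"]`; a point exactly on a shared edge is reported by both adjacent sheets. Like the servers described above, this lookup understands only one type of query and never touches text.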
Malamud: So that's a new kind of a server. Would my very old WAIS client software be able to interact with this new server, or do I have to upgrade my client every time there's a new kind of service?
Kahle: Oh. Basically your old WAIS clients can get at all these new services. The key piece is the protocol. Making it so that new information services can come up, and the tens of thousands and soon hundreds of thousands of people that are using WAIS can get at that new service. That's what the information providers want. And what the information consumers want is, all they want to do is learn one damn interface rather than one new interface for everything that comes along. They want to have lots and lots of value from having to just learn one interface, or just a couple of interfaces.
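The point that one client can keep working as new kinds of servers appear can be illustrated with a small Python sketch in which every source answers the same `search(query)` call with a list of headlines. This is a simplification under stated assumptions: the `Source` interface and both example sources are hypothetical, and the actual WAIS protocol is built on Z39.50 and the other standards described earlier, not on Python method calls.

```python
# One generic client talking to different kinds of sources through a single
# search(query) -> headlines interface. The interface and both sources are
# hypothetical stand-ins, not the WAIS wire protocol.
from typing import Protocol


class Source(Protocol):
    def search(self, query: str) -> list[str]: ...


class TextSource:
    """Full-text source: returns titles of documents containing the query."""

    def __init__(self, docs: dict[str, str]) -> None:
        self.docs = docs

    def search(self, query: str) -> list[str]:
        q = query.lower()
        return [title for title, body in self.docs.items() if q in body.lower()]


class WeatherSource:
    """Newer, non-text source that only understands one kind of question."""

    def search(self, query: str) -> list[str]:
        return ["Current weather map (hourly)"] if "weather" in query.lower() else []


def client_search(sources: list[Source], query: str) -> list[str]:
    """Unchanged client code: same call, whatever kind of source answers."""
    headlines: list[str] = []
    for source in sources:
        headlines.extend(source.search(query))
    return headlines
```

Adding a new kind of source means writing a new class with a `search` method; `client_search` itself does not change, which is the property being described for old WAIS clients.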
Malamud: It seems like users are going to have to learn multiple interfaces, because if you look at the area of resource discovery in which WAIS is one example, there's other things out there. There's the World Wide Web, there's Archie. How do these things all fit together?
Kahle: Ah. They're beautiful pieces of work. Gopher and World Wide Web are two of my favorite interfaces to WAIS. And that may sound a little bit strange. But Gopher has a really nice browsing mechanism to help you get going, to tell you a little bit of what's out there, to help direct where you might want to go. And WAIS is just one of the things you can get to. We think of applications as becoming more and more WAIS-enabled. So instead of having a dedicated WAIS interface, you're going to have your own interfaces that are doing whatever else you want to do. Your email package should be WAIS-enabled. Your software packages, when you ask for help, should go out to bulletin boards that are indexed with WAIS.
Malamud: So are we back to emacs, then, where emacs is the interface to the world?
Kahle: Ah, emacs of course has a WAIS interface to it. But what we think is there are going to be hundreds of interfaces to WAIS. And they're going to be built into all sorts of things. Your CD-ROM players should have WAIS things so that if you want to get up-to-date information it can call out, get that new information, and bring it back to your CD-ROM-based interface.
So WAIS is more of a piece of plumbing. It's more the signpost, it's the lines on the road. And most people don't think of that as terribly interesting. That's fine. All we want to be is useful.
Malamud: You talk a lot about individual use of WAIS, and you were talking about a 56k line is enough to get a user on, and you know, occasionally 9.6 will do the job. What's it going to take to get 56k to individual users? Do you have any ideas? Because your service depends on that underlying infrastructure.
Kahle: Basically, people making money. The major driving factor of a lot of this, there's ISDN, which is about… You know, been about to happen for decades, and they just… The phone folks just don't think there's money in it. If they can start making money at it, this stuff can happen extremely fast. All the wires are already laid. It's just a matter of using it in this new way. That's to the home, say.
Businesses, often they have even more than that running around their places. And what we're seeing is some of the proprietary protocols washing away. We're seeing the DECnets and the SNAs being replaced by things like TCP/IP. And there's the proprietary protocol period that seems to set us computer scientists back for ten years or so. And what we're trying with WAIS is to go out there in front with a good standard, and an open standard, and say, "It's time to bypass the proprietary protocol period and get WAIS in place."
We are completely dependent on network infrastructure. And Al Gore being our new Vice President, with his clarion call to make sure that we have a national digital infrastructure, is helping a great deal. I've been in Washington now for a week or so. There is more of a buzz around here about how to get with it, how to get our databases up: the United States government databases, and how to offer those. Often for free access, and sometimes for a fee.
Malamud: Are we going to be in a world where good citizens go out and put databases together and let the rest of the world access them? Is that what you're trying to find, kind of a worldwide free public library?
Kahle: Some things won't be free. Some things will cost money. Other things will have other types of restricted access, because it's your own private information and there's privacy concerns that go all the way through that sort of thing. But yes, most of us would be just perfectly happy to have anybody listen to us, right? That's why I'm sitting here on this radio program not charging anything. I'd love to have people know about what it is we're doing. So, lots and lots of people will publish and make their information available for free.
Announcer: This is Geek of the Week, featuring interviews with prominent members of the technical community. Geek of the Week is brought to you by O'Reilly & Associates, and by Sun Microsystems.
This is Internet Talk Radio. You may copy these files and change the encoding format, but may not alter the content or resell the programs. You can send us mail to firstname.lastname@example.org.
Internet Talk Radio. Same-day service in a nanosecond world.
Malamud: We had another experiment in information for free, and it's called the Usenet. And if you look at Usenet, there's all these newsgroups out there. And increasingly, at least my personal feeling is it's very hard to find information. Are we going to end up with the same situation in the WAIS world, where there's a lot of databases but maybe the quality of the information isn't there?
Kahle: Most of the information even available on WAIS right now is not very good. So yes, we're going to have just a glut. The Internet is opening up lots of sluices to just get at lots of information. And trying to think that you're going to be able to browse it all or get an idea of what's out there is about as… That's not gonna happen. Just try to read Books in Print sometime. You can't use it in that way. You need mechanisms to help you find the right thing. And the key piece of WAIS is to not have the producer necessarily say who should be reading it, which is how Usenet is built. WAIS is trying to help the reader go out and find the things that he wants, or she wants, out of all of those sources. So it offers more sophisticated filtering mechanisms than the user-supplied filtering mechanism that's in Usenet, or email lists, for that matter.
Malamud: So rather than increase the quality of the databases, you think we should increase the power of the tools that search through those databases.
Malamud: Well, how do we increase the quality of databases?
Kahle: At WAIS Incorporated, a lot of what we're trying to do is help encourage and work with publishers to make their information available. There's advertising-supported information that'll be out there and people will pay to make it look good. Like, Sun is making a lot of this information available through WAIS.
But a lot of other publishers, traditional publishers, are going to need payment models. And so we're working with them to try to come up with those payment models that are reflective of their costs, which are often greatly diminished if they can distribute over the Internet or other networks like it. So I think that we're going to see more and more publishers jumping in. We'll see more professional databases. Like the US government. A lot of what they do is publish. And they'd love to have a mechanism for publishing cheaply, and they'll pay for it out of other budgets and then give away access. Those are the right sorts of databases, and we're working with those folks now—the EPA, Library of Congress, all sorts of people, to help them get their information out in formats that lots of people can use.
Malamud: WAIS started as a wonderful gift to the world from Thinking Machines, and now you've founded a company, WAIS Inc. Is WAIS no longer in the public domain? Have you taken it away and you charge for it now?
Kahle: Ah. No, the public domain environment, and the freeware world, is one of the most amazing worlds I've ever been involved in. I remember back at MIT, where there were lots and lots of freeware and sharing of code. It was a vibrant environment. The Internet has helped multiply that by thousands, to help people create and deploy information tools.
From the beginning, WAIS was a mixed commercial and freeware environment. The original participants were Thinking Machines, Apple Computer, Peat Marwick, and Dow Jones. The freeware world was part, a very important part, but only a part. The commercial world was a part, only a part, but an important part.
We're trying to straddle three worlds. The .edu world, the .gov world, and the .com world. And it makes for great conversations when you get those three groups together, because often they don't trust each other. They don't really know what motivates each other. So trying to keep those worlds together is important. What WAIS is about is a protocol to help people find and retrieve information and make their information available. And we're trying to make one protocol, an open protocol, good enough, that all three of those worlds will want to participate.
So what happened was we used the Internet as a mechanism for proving out the technology, doing a lot of R&D, and getting lots and lots of good people working in the system. And now that we're… When I started to go more commercial with the services, and I started planning out WAIS Incorporated, I really needed the freeware to be done well. And I worked with NSF to make a set of money available to start a WAIS center. And there is now one in North Carolina, and there are starting to be other WAIS centers that are really servicing other domains than what we're doing.
It's all got to go hand in hand, and we all have to work together. And it's the exciting part. And if we lose, what I fear most is either we're going to make something so bad that people won't want to use it, or we'll self-destruct, or we'll get cocky. And what will happen then is we'll see proprietary standards come in.
Malamud: Are you competing with North Carolina? Are they your competition for the WAIS database provider trying to figure out how to do things?
Kahle: No, they're our brothers. They're extremely important to the success of WAIS. This world is growing by leaps and bounds. Millions of dollars a year are going into WAIS from all sorts of areas. And there are lots of niches that need different types of services. A lot of the universities really need things for free. So they can play with it, and understand, and build it into systems and do research based on it.
But a lot of people, when they've got mission critical databases, they can't depend, frankly, on freeware. They need to have a phone number, somebody they can call. They have to know that when the new version of the operating system comes out, the new version of the software is going to come out. And those are the environments that people are trying to work with.
US Geological Survey is doing probably the best work in the government domain in helping the geographic information people use WAIS up a storm. So I think the North Carolina people are really helping move the university environment in the freeware world. Though we're working with Rice University, where they've gone and put up Current Contents. This is a for-pay proprietary source that they bought rights to run at the Rice campus. So theirs is an example of a commercial WAIS server. But it's restricted access. And that is an example where Rice needed better than what was available in the freeware.
Malamud: But what if North Carolina does a wonderful job putting together freeware? How are you going to make money? Why would they pay you money when they can get it for free from North Carolina?
Kahle: Oh, we'd love more and more pieces to come out of the freeware domain, whether it's North Carolina, whether it's out of the Gopher people, whatever. Our goal at WAIS Inc. is to try to keep the world together. To try to make it so that consumers know where to find the right information, and to help people that think of themselves as just consumers to start publishing.
Malamud: But how do you make money at that?
Kahle: We're doing it by selling software tools. So servers, enhanced clients. But we find what most people need is help in understanding what this stuff is. So we do consulting and contract work to help modify the existing set of servers out there, and do things for them. We also help publishers put things up and help them run those services. Sometimes we get a percentage of their revenues off of those systems.
So, we're flexible. And as much as people start to step forward and say, "We're going to do this piece well," then we're often perfectly up for stepping back. The only way we're going to win is by leveraging lots and lots of institutions to do what they want to do. So, systems integrators are starting to step forward and say, "We want to do the system integration." We have people that are trying to bundle the WAIS server code into lots of products. People are starting to bundle the clients into products, where they're making their things WAIS-enabled. And we're helping those that need to be dealing with a commercial entity, or they can't take it seriously.
Malamud: WAIS is making the Internet look like a single library, a single database. Is this the beginning of a new kind of library, a distributed library, a global library?
Kahle: I think a library's maybe not the right analogy. I would think of it as a huge bookstore, or a set of information services that are available. You know, is a weather map updated every hour a book? Is it a library? No. It's kind of like your television going and downloading Geek of the Week, and finding that that's what you want to be listening to. Is that a bookstore or a library? No, that's more like the radio.
The Internet is a new communication structure. And it's a new way for humans to communicate. And what WAIS is trying to do is help you navigate that. What is really exciting with a new communications structure, say printing or the telephone, is all sorts of things happen. Industries come and go. There's realignment of how companies work, how whole institutions work. And what's fun about this is it's an open environment for all of us to take part in shaping it. And then those people that shaped the technology have an inordinate control over what it looks like. Why do telephones look the way they do? And why weren't they used, fifty years ago, for all the things they're used for now? Because the people that were involved early had one vision. So the invitation is, this is an open world, let's shape it into what we want it to be.
Malamud: There you have it. We've been talking to Brewster Kahle, and this has been Geek of the Week.
Announcer: This has been a Geek of the Week, brought to you by Sun Microsystems. And by O'Reilly & Associates. To purchase an audio cassette or audio CD of this program, send electronic mail to email@example.com.
Internet Talk Radio. The medium is the message.
Geek of the Week: Brewster Kahle at the Internet Archive