Carl Malamud: Internet Talk Radio, flame of the Internet. This is Geek of the Week and we’re talking to Dr. Clifford Lynch, who’s Director of Library Automation at the University of California. Welcome to Geek of the Week, Cliff.

Clifford Lynch: Glad to be here.

Malamud: You got your doc­tor­ate in data­bas­es from Dr. Michael Stonebraker no less, the guru of data­bas­es. And yet you’re work­ing in the library com­mu­ni­ty. Do com­put­ers and libraries come togeth­er? Is there a coa­les­cence there?

Lynch: Well, let me answer that a lot of dif­fer­ent ways. Certainly this I think is going to be the decade when infor­ma­tion tech­nol­o­gy real­ly moves into the pub­lic side of libraries on a very large-scale basis, far more than we’ve seen so far. So from a library per­spec­tive yes, infor­ma­tion tech­nol­o­gy is invad­ing, and we can say a lot more about that in a few minutes.

From the com­put­er sci­ence side, though, I think it’s inter­est­ing to note that com­put­er sci­ence has paid rel­a­tive­ly lit­tle atten­tion I think to some of the prob­lems that come up with very-large scale library automa­tion and pub­lic access to infor­ma­tion. I think that these are hard prob­lems and also fruit­ful prob­lems from a com­put­er sci­ence point of view. So much of my work in com­put­er sci­ence and inter­est in data­bas­es has sort of been moti­vat­ed from the appli­ca­tion back to what we need in the tech­nol­o­gy to cre­ate or facil­i­tate those applications.

Malamud: And what are some of those tech­ni­cal issues, some of the tech­ni­cal require­ments that we need out of net­works in order to sup­port large library archives?

Lynch: Well, it goes the whole gamut, of course, from data­bas­es to net­works. The data­base side has a lot of very spe­cif­ic prob­lems with large tex­tu­al data­bas­es. From the net­work side, you’ve real­ly got a full gamut of ques­tions from tech­ni­cal to things that real­ly go beyond tech­ni­cal to almost intel­lec­tu­al issues. Technically many of the key issues real­ly revolve around estab­lish­ing stan­dards and stan­dards that work. Interchange stan­dards for var­i­ous forms of information—text, mul­ti­me­dia, things like that. Um—

Malamud: But don’t we have those already—MIME mes­sag­ing for exam­ple, isn’t that an inter­change stan­dard for mul­ti­me­dia messages?

Lynch: To an extent it is. Although it’s impor­tant to note that that’s quite new, too. One of the things peo­ple I think often over­look when you think about library-type prob­lems is the sim­ple issue of scale. It’s easy enough to intro­duce mul­ti­me­dia in the sense that you upgrade some­body’s mail­er to do mul­ti­me­dia. And next thing you know, mail is com­ing out with some pasted-in bitmapped images or a bit of voiceover. When you start think­ing about well, I have a data­base of you know, umpteen mil­lion vol­umes of stuff that I need to con­vert, for­mat changes and things like that hap­pen quite slow­ly. And right now, I think a lot of the not just library com­mu­ni­ty but infor­ma­tion com­mu­ni­ty more broad­ly is sit­ting on a huge mass of con­tent and is sort of on the verge of mov­ing this to dig­i­tal forms in a big way. 

Malamud: Does that mean scanning in the books, or does it mean retyping, or waiting for new books to be produced?

Lynch: Well, I’m think­ing here of the sort of exist­ing base of information—and don’t just think books, think also sound record­ings, and movies, and all of these types of mate­r­i­al archives as well. We have mas­sive mas­sive archival col­lec­tions of man­u­scripts and things that have been rel­a­tive­ly inac­ces­si­ble. The schol­ar had to actu­al­ly phys­i­cal­ly go some­place and you know, spend years in a dusty room min­ing this stuff. Once we start mov­ing this into dig­i­tal form, sud­den­ly these archives will be acces­si­ble nation­al­ly and inter­na­tion­al­ly, and I think this will make a huge dif­fer­ence to schol­ar­ship. But one of the issues right now is that you’re only going to want to do those con­ver­sions once. They’re fero­cious­ly expen­sive. So, peo­ple are very ner­vous about mak­ing sure that there are sen­si­ble stan­dards in place before they invest in those conversions. 

There’s anoth­er inter­est­ing phe­nom­e­non, too, which is the more you cap­ture intel­lec­tu­al con­tent as opposed to sur­face form, the more expen­sive it gets. We talk about con­vert­ing most old print mate­ri­als by basi­cal­ly scan­ning and cre­at­ing bitmapped images. Even OCR at its present lev­el of qual­i­ty is large­ly con­sid­ered out of the ques­tion because of the error rate except as a sup­ple­ment to sup­port search­ing on the mate­r­i­al in some cas­es. There have been peo­ple who have tak­en cer­tain col­lec­tions of mate­r­i­al and put them in SGML markup, for exam­ple. There’s a com­pa­ny called Chadwyck-Healey that has got SGML markup of some of the key works in late ancient and ear­ly Medieval Latin. I’m told they’ve spent sev­er­al mil­lion dol­lars cre­at­ing that database.

Malamud: Well putting ear­ly Latin works into SGML is some­how appropriate.

Lynch: Yes. It is.

Malamud: Is SGML going to be the lan­guage of the future? Are we gonna expect at least future books to be cod­ed in SGML and post­ed on the net in that language?

Lynch: I think you will see some use of SGML. Now whether they’re post­ed on the net in that form is a very inter­est­ing ques­tion. If you talk to for instance many of the large sci­en­tif­ic and tech­ni­cal pub­lish­ers, com­pa­nies like Elsevier or Springer-Verlag, their edi­to­r­i­al process­es now are being con­vert­ed and upgrad­ed in many cas­es to cre­ate SGML mate­r­i­al as part of the pro­duc­tion of the jour­nals. However, it’s not at all clear that they are going to mar­ket mate­r­i­al in SGML form. In some ways they’re think­ing of this as an inter­nal data­base out of which they can spin mul­ti­ple products. 

One of the things that per­haps I should­n’t be sur­prised at but I’ve found a bit sur­pris­ing in talk­ing to some pub­lish­ers about the tran­si­tion to the age of elec­tron­ic infor­ma­tion is that they are very con­cerned about pre­sen­ta­tion integri­ty of their mate­r­i­al. These are print pub­lish­ers. And they are quite hor­ri­fied at the thought of dis­trib­ut­ing SGML and hav­ing con­sumers or repack­agers of that infor­ma­tion do say type­set­ting on the fly or oth­er types of refor­mat­ting to adapt it to dif­fer­ent dis­play envi­ron­ments. They feel that’s a loss of con­trol of their infor­ma­tion that could threat­en the per­cep­tion of the qual­i­ty of their pub­li­ca­tions, and they’re very ner­vous about it. In that sense, one finds them a lot more san­guine about dis­trib­ut­ing bitmapped images.

Malamud: You’re lis­ten­ing to Geek of the Week. Support for this pro­gram is pro­vid­ed by O’Reilly & Associates, rec­og­nized world­wide for defin­i­tive books on the Internet, Unix, the X Window System, and oth­er tech­ni­cal topics. 

Additional sup­port for Geek of the Week comes from Sun Microsystems. Sun, the net­work is the computer.

We’re talk­ing to Cliff Lynch, Director of Library Automation at the University of California. Cliff, we’ve been look­ing at the ques­tion of bitmap images ver­sus SGML as a way of mov­ing data out onto the net­work. Do you think pub­lish­ers will ever put their data out on the net­work in revis­able form?

Lynch: Well, it’s clear of course that they’re con­vert­ing to revis­able forms and SGML seems to be a pop­u­lar one inside their own process­es. I know that some of the large sci­en­tif­ic and tech­ni­cal pub­lish­ers, peo­ple like Elsevier and Springer-Verlag, are invest­ing heav­i­ly in such con­ver­sions at this point. However it’s not clear they’re real­ly going to mar­ket this out­side of their com­pa­nies. They may use this as a data­base to spin off a series of prod­ucts. One of the things that I’ve found in talk­ing to pub­lish­ers, par­tic­u­lar­ly pub­lish­ers of cost­ly schol­ar­ly jour­nals, is that they have a great con­cern about the pre­sen­ta­tion integri­ty of their mate­r­i­al. They come from a print world where they invest a lot of mon­ey in nice type­set­ting, attrac­tive art­work, qual­i­ty paper and print­ing, and they’re very con­cerned about mov­ing into an elec­tron­ic world where peo­ple are retype­set­ting their mate­r­i­al on the fly, leav­ing out pic­tures, or oth­er­wise pre­sent­ing a poor image of that mate­r­i­al. So, they feel I think to some extent more com­fort­able with bitmaps as a way of con­trol­ling the integri­ty of the pre­sen­ta­tion of their material.

Malamud: I know as a programmer, when I’m implementing programs and I’m looking at network standards, which are a library of documents, I want the revisable form. I want to be able to plop out the definition of an object and stick it into my code. Is there a way to balance the presentation integrity with the need for a revisable form on the network?

Lynch: Well, first let me say I absolute­ly agree with you on the need for revis­able form. And it’s quite inter­est­ing to see what’s hap­pen­ing inside the schol­ar­ly com­mu­ni­ty. For exam­ple there’s an activ­i­ty called the Text Encoding Initiative, which is pri­mar­i­ly schol­ars in com­put­ing and the human­i­ties from…about twenty-two I think pro­fes­sion­al asso­ci­a­tions are involved in it. And what they’re doing is defin­ing a set of SGML tags to essen­tial­ly sup­port [depth?] markup that would be suit­able for computer-driven lin­guis­tic analy­sis, things like deep text analy­sis of vari­ant edi­tions of clas­sic texts, things such—

Malamud: What does that mean? What is deep text analysis?

Lynch: Well, the thing would be for example think of Shakespeare. Now, many of his plays exist in multiple versions and scholars are very concerned with the variation between versions. They’re concerned with the way words are used throughout the versions. They’re concerned with allusions or stories that follow on the words. Shakespeare’s plays derive to some extent from other plays, like say the Spanish revenge tragedies. So there’s a thought that they will be able to develop corpora of things like a Shakespeare play in all its versions, and you’d have a very intelligent viewer which would allow you to say things like, “I’d like to see all the versions intercut here,” or, “I’d like to see this as it was in the First Folio,” and then, “No, I’d like to see it as it was in the Second Folio.” They’re starting to do real interesting things like that. Now this is not an area where I’m expert, particularly, but I do find it interesting that there is so much energy being devoted to coming up with schemes to really capture this kind of intellectual content and make it widely available for the scholarly community.

Certainly this sort of thing underscores the need for material in revisable form. Now, I think we can hope that as publishers become more comfortable with the networked environment that we will see them becoming more comfortable with distributing revisable form material. Because it’s not just the presentation integrity issue, they’re also worried that in some sense the revisable form is easier to steal than a bitmapped image and is more valuable in some sense than a bitmapped image. And certainly publishers have many many concerns about their ability to control their intellectual property on a networked environment.

Malamud: Is it pos­si­ble that the cur­rent gen­er­a­tion of pub­lish­ers, the firms like Prentice Hall and Addison-Wesley, just won’t sur­vive the tran­si­tion and it’s gonna take a new kind of pub­lish­er, a new type of firm, to be able to han­dle pub­lish­ing in the next twen­ty or thir­ty years?

Lynch: Um, I think that one use­ful way to think about this is a set of analo­gies that I first heard from Peter Lyman and Paul Peters, which was a way of think­ing about the intro­duc­tion of new tech­nolo­gies is going through a stage of mod­ern­iza­tion, where you essen­tial­ly take what you’re doing already and do it more effi­cient­ly by apply­ing tech­nol­o­gy. And then pro­ceed­ing from there to inno­va­tion which is where you use the tech­nol­o­gy to do fun­da­men­tal­ly new things that you could­n’t do pri­or to that tech­nol­o­gy. And then ulti­mate­ly up to a trans­for­ma­tion­al stage where you’ve kind of absorbed the tech­nol­o­gy into your process­es and the things you do, and it starts fun­da­men­tal­ly chang­ing those. 

Now, I think we can view a lot of what’s hap­pen­ing right now with the rela­tion­ship between the tra­di­tion­al print pub­lish­ers and the net­worked infor­ma­tion world as mod­ern­iza­tion. For exam­ple they’re think­ing basi­cal­ly about things where the user inter­face of choice is still paper. And we may talk about putting it on the net­work, stor­ing it, and trans­port­ing it through the net­work. But the pre­sump­tion at least right now is for any­thing oth­er than pret­ty casu­al brows­ing much of this will be print­ed back very close to the end user onto paper. That in my view is real­ly just sort of a mod­ern­iza­tion activity. 

We’re start­ing to see inno­v­a­tive things that begin to explore the sort of indige­nous­ly new capa­bil­i­ties of the elec­tron­ic media, and those range from the sort of thing you’re doing with Internet Talk Radio through some of the mul­ti­me­dia things. Much of the mul­ti­me­dia stuff that’s most inter­est­ing I think up till now has been on stand­alone work­sta­tions, often using CDs or var­i­ous kinds of video disks. I think we’re going to see that change pret­ty quick­ly now that the net­work is get­ting faster or the stan­dards are com­ing along, and mul­ti­me­dia will become much more of a net­worked com­mon­place. As that hap­pens, I think peo­ple will start explor­ing more of those possibilities. 

Even in the sort of text-constrained net­work that we’ve been accus­tomed to, we have seen a num­ber of cre­ative peo­ple do inter­est­ing things to look at what you can do with the tra­di­tion­al jour­nal as a point of depar­ture mov­ing into the net­worked envi­ron­ment, things like very heavily-linked cita­tions back­wards and for­wards from arti­cle to arti­cle. The abil­i­ty to gath­er up read­ers’ com­ments and reac­tions and attach them to a pri­ma­ry arti­cle, those sorts of things. Those are only the begin­ning of the inno­va­tion we’ll see.

Now, going back to the question about publishers, it’s clear just given the body of rights that these publishers control, their brand name recognition if you will, and certainly in scholarly areas there is very much of a brand name recognition on certain journals as very prestigious journals to publish in, these will clearly move forward and modernize into the networked environment. And I think they’ll be with us for a long time. How many of those publishers are prepared to take leadership positions in exploring really innovative uses of the network I think is an open question. And my guess is that we’ll see a whole new set of publishing and information creation industries coming up alongside the old ones on the net. A few of those will be traditional publishers or information producers more generally sort of reinventing themselves, others will be new upstart firms.

And I would­n’t focus just on the pub­lish­ers. There’s been a lot of buy­ing and sell­ing of things like vaults from large movie hous­es and those sorts of things, which could be quite inter­est­ing in a mul­ti­me­dia world.

Malamud: Do you think an individual can be a publisher? A.J. Liebling has said that freedom of the press belongs to those who own one. Do you think we’re entering an era where the individual can be a publisher, or are there professional skills that you have to learn before you can be one?

Lynch: Well, I think we are already at the point where an indi­vid­ual on the net­work can very casu­al­ly become a pub­lish­er and many do. You don’t have to be a rock­et sci­en­tist or to invest very much to set up a mod­est FTP archive. Or in the most min­i­mal case, you can think of peo­ple sim­ply set­ting up mail­ing reflec­tors and send­ing mail into them as being pub­lish­ers in a cer­tain sense. So, clear­ly the net­work has moved for­ward the democ­ra­ti­za­tion of publishing. 

Now, we should point out it’s not that hard to be a pub­lish­er in print any­more, either, giv­en that there’s a copy place on every cor­ner and they’re not real­ly that expensive.

Malamud: It’s easy to be a bad publisher.

Lynch: Um, it’s espe­cial­ly easy to be a bad pub­lish­er. And we see a lot of badly-published things in print. I think it’s very chal­leng­ing in the net­work envi­ron­ment because we don’t know exact­ly what it means to be a good pub­lish­er, nor do we have as many exem­plars as we do in the print world. Certainly some of the integri­ty issues about being a good pub­lish­er clear­ly extend across all media, but there are some issues I think that are per­haps more spe­cif­ic to the net­worked envi­ron­ment which we’re still understanding. 

Malamud: You’re lis­ten­ing to Geek of the Week. Support for this pro­gram is pro­vid­ed by Sun Microsystems. Sun Microsystems, open sys­tems for open minds.

Additional sup­port for Geek of the Week comes from O’Reilly & Associates, pub­lish­ers of books that help peo­ple get more out of computers.

Cliff Lynch, you’ve been active in the Coalition for Networked Information, a body that brings together librarians, and academic computer center managers, and architects, and a wide variety of groups. What’s the purpose of the Coalition?

Lynch: Well, the stat­ed pur­pose of the Coalition in its char­ter, in the short form, is basi­cal­ly to advance schol­ar­ship and intel­lec­tu­al pro­duc­tiv­i­ty through the use of infor­ma­tion tech­nol­o­gy and specif­i­cal­ly by exploit­ing the promise of networks.

Malamud: Could you trans­late that into action items? What do they do?

Lynch: Well, this is tricky to trans­late into action items because some of the action items I think tend to be short-range, some tend to be long-range, but I think it’s impor­tant to have that gen­er­al context. 

Now, to understand a little about the Coalition, it was formed back in 1989, towards the end of 89 by CAUSE, Educom, and the Association of Research Libraries, which is a group of about 110 of the biggest libraries in North America. Now, if you think back to that time, we were in one of the cycles with the Gore bill at that time, and we were…many of us in the higher education community were very hopeful that it was going to make it through Congress. Now I guess it ultimately made it through Congress on the next round, not on that round, but certainly by 89 I think that was the second or third incarnation of the Gore bill and the higher education community had been on board for a year or two. I believe the idea of an NREN really started about 87 and it was originally to be mostly national research networks. The higher education community got into it in 87, 88 and underscored the role of education. The library community began to wake up to it about 88, 89 and started asking questions about what’s the appropriate role for libraries in here.

Backing off from that one step, there was this sort of empty feeling in certain people’s stomachs as it occurred to them that they might actually get this NREN concept moved ahead, create this research and education network, they’d get all the scientists and scholars on it who would have a wonderful week sending electronic mail to each other, and then ask, “Well, where’s the world’s literature? Where are the information resources? What can I really do with this thing besides send electronic mail to my colleagues and maybe use a supercomputer which I might or might not be interested in?” So there was a lot of emphasis on getting a focused group together which included library people, information technologists, and also people like publishers, to start talking about what do we need to do to really increase the amount of content accessible through the network, and to provide tools to allow people to locate content, navigate from resource to resource, and to really use these electronic resources.

So that was a lot of the sort of themes that were float­ing around at the time the Coalition was formed. And of course it was­n’t just schol­ar­ly infor­ma­tion, the Coalition is very inter­est­ed in improv­ing access to gov­ern­ment infor­ma­tion at all lev­els as well, just to take one more example. 

Now, the Coalition is pur­su­ing a wide range of activ­i­ties. These range from sort of policy-related activ­i­ties look­ing at some of the ini­tia­tives involved in things like the GPO Window Bill, and in some cas­es pro­vid­ing tes­ti­mo­ny to Congress on these sorts of things, help­ing the par­ent orga­ni­za­tions and the insti­tu­tions to for­mu­late pol­i­cy posi­tions on these. 

At the oth­er end of the spec­trum, there are some sub­stan­tial­ly more tech­ni­cal things that the coali­tion’s been involved in. Things like there’s a work­ing group on direc­to­ries which has been doing some­thing called the Top Node Project, which is an attempt to start under­stand­ing what sort of data ele­ments you want to describe net­worked infor­ma­tion resources. I lead a group that does archi­tec­tures and stan­dards, and one of the main things that we’ve been con­cerned with is inter­op­er­abil­i­ty issues. Much as I think the IETF has always had a strong theme of inter­op­er­abil­i­ty, libraries and infor­ma­tion providers have start­ed to real­ize that in order to do infor­ma­tion access in a dis­trib­uted envi­ron­ment, inter­op­er­abil­i­ty is going to be absolute­ly crit­i­cal if this is gonna work and mar­kets are going to be cre­at­ed and peo­ple are going to be able to have access to these resources. 

So, my group has been look­ing a lot at inter­op­er­abil­i­ty issues in a pro­to­col called Z39.50 that you may have bumped into at some point.

Malamud: You’re lis­ten­ing to Geek of the Week. Support for this pro­gram is pro­vid­ed by Sun Microsystems. Sun Microsystems, open sys­tems for open minds.

Additional sup­port for Geek of the Week comes from O’Reilly & Associates, pub­lish­ers of books that help peo­ple get more out of computers.

Z39.50 is a library automation protocol. It’s often referred to that way and that’s one of those nebulous sets of phrases strung together that describe absolutely nothing. Can you give us a better description of what Z39.50 does?

Lynch: Yeah. I mean it’s actu­al­ly a real tragedy and also a real irony that it has been char­ac­ter­ized to the extent it has as a library automa­tion pro­to­col. There’s a lot of fun­ny his­to­ry with Z39.50. I guess before we go into that I should just explain a lit­tle bit about what it is. Z39.50 is an appli­ca­tion pro­to­col which deals with infor­ma­tion access and retrieval. Now, I want to dif­fer­en­ti­ate that fair­ly care­ful­ly from things like dis­trib­uted data­bas­es in the sense that a Z39.50 client-server inter­ac­tion is talk­ing about real­ly infor­ma­tion in terms of seman­tic mean­ing not data lay­out. Whereas in a data­base appli­ca­tion you might ask for Column X of a rela­tion­al table, in Z39.50 you speak about things for exam­ple in a bib­li­o­graph­ic con­text like key­words in a title or authors with this last name; things that are in terms of the intel­lec­tu­al con­tent of the infor­ma­tion rather than the specifics of how a giv­en site chose to store it and lay it out in a database.

Malamud: So it’s a way of saying “I have some keywords I’m interested in, give me back all the bibliographic records that you maintain that match those keywords.”

Lynch: Yes. It allows you to say things like that although of course with much greater pre­ci­sion because you can, and typ­i­cal­ly do in large data­bas­es, restrict those key­words to cer­tain fields. It’s very impor­tant to have this degree of abstrac­tion because when you look at how com­plex large tex­tu­al or bib­li­o­graph­ic infor­ma­tion bases can be, it’s real­ly imprac­ti­cal on an inter­op­er­abil­i­ty basis to start doing dis­trib­uted data­base access. As a client you have to know far far too much about a very com­pli­cat­ed struc­ture on the serv­er. This moves us up one lev­el of abstrac­tion and keeps us out of a lot of those poten­tial rat holes of details of servers. 
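
[Illustration, not part of the broadcast.] The abstraction described here can be sketched in a few lines of Python. This is a toy model under stated assumptions, not the Z39.50 wire protocol: the names `Query`, `Record`, `DictCatalog`, `TupleCatalog`, and `search_all`, and the two access points, are all hypothetical. The point it tries to show is that the client names an access point such as "author last name" and never learns how either site lays out its data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Query:
    """An abstract search: an access point plus a term (hypothetical names)."""
    access_point: str  # "title_keyword" or "author_last_name"
    term: str


@dataclass(frozen=True)
class Record:
    """The common structured record every server hands back."""
    author: str
    title: str


class DictCatalog:
    """Hypothetical site A: stores dicts, with the author as 'Last, First'."""

    def __init__(self, rows):
        self._rows = rows

    def search(self, query):
        term = query.term.lower()
        hits = []
        for row in self._rows:
            if query.access_point == "title_keyword":
                matched = term in row["title"].lower().split()
            elif query.access_point == "author_last_name":
                matched = row["author"].split(",")[0].strip().lower() == term
            else:
                raise ValueError(f"unsupported access point: {query.access_point}")
            if matched:
                hits.append(Record(author=row["author"], title=row["title"]))
        return hits


class TupleCatalog:
    """Hypothetical site B: stores (last_name, first_name, title) tuples."""

    def __init__(self, rows):
        self._rows = rows

    def search(self, query):
        term = query.term.lower()
        hits = []
        for last, first, title in self._rows:
            if query.access_point == "title_keyword":
                matched = term in title.lower().split()
            elif query.access_point == "author_last_name":
                matched = last.lower() == term
            else:
                raise ValueError(f"unsupported access point: {query.access_point}")
            if matched:
                hits.append(Record(author=f"{last}, {first}", title=title))
        return hits


def search_all(servers, query):
    """Send one abstract query to every server and pool the structured results."""
    results = []
    for server in servers:
        results.extend(server.search(query))
    return results
```

Each site translates the same abstract query into its own storage terms, which is the design choice being argued for: the client depends only on the shared vocabulary of access points and on the shape of the returned record.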

The sort of vision that we have with Z39.50 is that a Z39.50 client should be able to access a wide variety of information resources through a consistent user interface that might run on a workstation, might run on a time-shared host. But it would give the user a common view of multiple information resources around the network. That’s sort of step one. Step two is since it gives you a common protocol interface to information resources, I believe it’s going to enable the development of all sorts of intelligent client technology, since you’ve got suddenly a clean interface to retrieve information from multiple sources and you get structured records back rather than trying to interpret a screen in a terminal emulation as a program, which as we know doesn’t work well. You can start thinking about programs that correlate information from multiple sources on behalf of the user, do periodic searching and build personal databases on one’s workstation, all sorts of things. One of the most irritating things right now is there are a lot of overlapping information sources around, and as human beings trying to search comprehensively we do a lot of deduping intellectually, which should be turned over where feasible to computer programs.
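
[Illustration, not part of the broadcast.] As a rough sketch of the kind of deduping a client program could take over once it receives structured records, the Python below merges result sets from overlapping sources. The record layout (dicts with "author" and "title"), the function names `match_key` and `merge_results`, and the matching rule (case- and punctuation-insensitive comparison) are assumptions made for the example; real bibliographic matching is considerably harder.

```python
import re


def match_key(record):
    """Normalize author and title so trivially different copies compare equal."""
    def norm(text):
        # Lowercase, turn punctuation into spaces, then collapse whitespace.
        return " ".join(re.sub(r"[^a-z0-9 ]", " ", text.lower()).split())

    return (norm(record["author"]), norm(record["title"]))


def merge_results(*result_sets):
    """Pool records from several sources, keeping the first copy of each work."""
    seen = {}
    for results in result_sets:
        for record in results:
            key = match_key(record)
            if key not in seen:
                seen[key] = record
    # Dicts preserve insertion order, so results come back in first-seen order.
    return list(seen.values())
```

A client might call `merge_results(hits_from_catalog, hits_from_index)` after querying two servers, so the user sees each work once rather than once per source.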

So, that’s a little bit of the picture we have in our minds for Z39.50. Now, Z39.50 is as I said an applications layer protocol that was developed under the auspices of NISO, the National Information Standards Organization. That’s an ANSI standards-writing body that serves the publishing, library, and information services community. Z39.50 has some unfortunate heritage. It’s written as an OSI application layer protocol. It has incidentally a parallel international protocol ISO 10162 and 10163, which is sort of a subset of Z39.50 as done in the US, which is also of course in the OSI framework.

Very few implementers, not surprisingly, are using it in the OSI framework. The Library of Congress is doing something with OSI, and the Florida State Center for Library Automation is doing something with OSI. Several of the vendors, the library automation vendors, have indicated they’re going to do OSI as well as TCP-based stacks because they believe that’s going to be necessary to market in Europe. But within the US, certainly the main action is on the Internet running this over TCP/IP.

Malamud: So Z39.50 over OSI is one of the state­ments of direc­tion of polit­i­cal cor­rect­ness? Or are there actu­al­ly imple­men­ta­tions out there that do that?

Lynch: Well, there certainly is a political correctness issue here. And some of it too is a market response where I think you have to be a little careful about how to interpret it. Particularly in Europe, many libraries, particularly national libraries and things like that, still are writing RFPs that say “We have to have the OSI.” So, many of the vendors trying to position themselves to be responsive to those are saying they will do or intend to do or are working on OSI. There’re not too many I think delivered and running because there’re not too many OSI things delivered and running generally.

Malamud: OSI, tomor­row is the future.

Lynch: Right. Now, I think that in some ways the OSI heritage of Z39.50 has been unfortunate because the implementor community spent a lot of time back in the late 80s and early 90s struggling with what to do with this OSI baggage and how it fit into the networked environment that at least many of us viewed as reality, which was the Internet. There were people who believed that the right thing to do was take all the OSI things from transport up and run those over TCP, as has been done in systems like ISODE. There were others of us who really felt that that was kind of an ugly solution, particularly since Z39.50 has the interesting attribute that it tries to really make use of the presentation layer and in order to work requires features of the presentation layer that do not appear to be implemented in any known OSI implementation. Certainly we know they’re not in ISODE, they’re not in IBM’s OSI/CS product, things like presentation context alteration on the fly. You can see where you need that when you come into a server that’s got 400 databases you really don’t want to list a transfer syntax for records in each of those databases up front when you open the connection.

Malamud: Or reestab­lish the con­nec­tion each time you switch the type of media.

Lynch: Precisely.

Malamud: Or then say, “Well, I’d like to look at microfiches now,” and they’ll say, “Well, call us back.”

Lynch: Mm hm. Yeah. This sort of thing is clear­ly a non-starter. So we dis­si­pat­ed a lot of time wor­ry­ing about what to do about this and there were many camps. What we ulti­mate­ly chose to do was to throw out all the lower-level OSI stuff, run Z39.50 direct­ly on top of TCP, and get on with it. And in the last year we’ve seen at least sev­en or eight inter­op­er­a­ble imple­men­ta­tions that were inde­pen­dent­ly devel­oped up and run­ning on the net and talk­ing to each oth­er. It’s real­ly quite grat­i­fy­ing after all these years of talk.

Malamud: This is Geek of the Week, fea­tur­ing inter­views with promi­nent mem­bers of the tech­ni­cal com­mu­ni­ty. Geek of the Week is brought to you by O’Reilly & Associates and by Sun Microsystems. 

This is Internet Talk Radio. You may copy these files and change the encoding format, but may not alter the data or sell the programs. You can send us mail at mail@radio.com.

Internet Talk Radio, same-day ser­vice in a nanosec­ond world.

Lynch: I would like to believe that some of the standards developers are getting more realistic. Some of the formal standards developers, the NISOs and the OSIs of the world. While they’re not ready to say, “Well, OSI maybe isn’t going to quite work,” at least some of the applications protocols are starting I believe to think a little bit in terms of running over multiple protocol stacks. Certainly as we’ve done the work on drafting the new version of Z39.50, which we hope to take to ballot within the next eighteen months or so, we’ve put a few things into the protocol that I think will make it a lot easier to run in a dual-stack environment with minimal changes to the application’s code if people really do need to do that.

Malamud: WAIS, the Wide Area Information Server that was devel­oped by Brewster Kahle and Thinking Machines and is now fair­ly wide­ly avail­able in the pub­lic domain, also comes out of a Z39.50 her­itage. Are those imple­men­ta­tions inter­op­er­a­ble with the Z39.50 that you’re talk­ing about for libraries?

Lynch: As of right now they are not. When Brewster developed WAIS he used something called Z39.50-1988, or Z39.50 version 1. That was the first version and the standard came out in 88, and really was not implemented much by the library community. Furthermore, Brewster had to do quite a bit of…what shall we say, carpentry, on that version of the standard in order to get a working application out of it. So, he sort of took 88 and extended it as he needed to build WAIS. Meanwhile the library community was working on what ultimately became version 2 of the standard, or Z39.50-92, and implementers had been working off drafts of those standards. Unfortunately Brewster didn’t quite get in cont— Brewster and his folks really didn’t get together with the library community till about 91, at which point Brewster already was pretty far down the path.

Now, while technically these things won’t interoperate, the amount of work necessary to upgrade WAIS is not great. It’s my understanding that Brewster’s new company WAIS Incorporated will do a Z39.50-92-compliant version of WAIS. In addition there is work going on at the Center for Networked Information Resource Discovery and something—it’s CNIDR—down at North Carolina—George Brett, Jim Fulton, and those folks—on taking the existing WAIS system, doing a lot of upgrading to it, and putting it on Z39.50-92. That work is fairly well down the pike as I understand it, so I expect in 1993 to see probably several interoperable WAIS implementations. This is going to produce some kind of weird things because you should be able to take your standard WAIS client and point it at your library catalog if you want, if that speaks Z39.50, or to log on to one of the online catalogs on the Internet and run that interface against WAIS databases. There’s going to be a lot more mixing and matching of interfaces as this gets down the pike.

Malamud: You stand at the cusp between the library and the net­work­ing com­mu­ni­ties. Currently there’s a nation­al debate about a National Research and Education Network ask­ing whether that net­work is a few very fast pipes for leading-edge researchers, or whether the National Research and Education Network means get­ting net­work­ing con­nec­tiv­i­ty out to every­one. Are those two goals fun­da­men­tal­ly opposed to each oth­er? Can we have one NREN that solves both the researchers and the kinder­garten kids?

Lynch: Well, um, I think tech­ni­cal­ly you can have one nation­al or inter­na­tion­al net­work that serves that range of con­stituen­cies. Um, cer—

Malamud: Can we afford it?

Lynch: Certainly just before we get off the technical thing, the key to getting away with that is the whole concept of internetworking. Logically it may look like one network, you may be able to do applications across all components of it, but it may be multiple constituent networks serving different constituencies.

Now, can we afford it? What are our public policy priorities here? That’s a real good question. If you look at the NREN legislation—what is it, Public Law 102-194, I guess. That’s pretty clear about the NREN being a place that’s hospitable to libraries, to K through 12, to state and local government. That calls out lots and lots of groups that are welcome on the NREN. It’s a lot vaguer about whether it’s going to fund any of them to get on the NREN. I mean it’s kind of a curious thing because it says, “Well, if you can find your way on we’re glad to see you here. And we’ll send out the welcome wagon for you.” But it doesn’t say that it’s gonna fund it.

Now, I think there’s some things that need to be under­scored there. If you look at many of those communities—libraries, K through 12 par­tic­u­lar­ly, which are both big com­mu­ni­ties, rel­a­tive­ly lit­tle of their fund­ing comes from fed­er­al sources. These have tra­di­tion­al­ly been fund­ed more at the state and local lev­el than at the fed­er­al lev­el. It’s not clear that it’s nec­es­sary or that it’s going to be polit­i­cal­ly accept­able, par­tic­u­lar­ly in the cur­rent bud­get cli­mate, for the fed­er­al gov­ern­ment to take on the respon­si­bil­i­ty of net­work­ing these com­mu­ni­ties. And cer­tain­ly in some states I would say the states are step­ping up to the chal­lenge pret­ty aggressively—in Texas, for exam­ple, in the K through 12 world. 

So I think that you may see a lot of these peo­ple get­ting on through state and local ini­tia­tives. If we fol­low the mon­ey cer­tain­ly most of the NREN mon­ey so far, with the excep­tion of some con­nec­tiv­i­ty grants out of NSF, has been aligned along the high-performance com­put­ing and com­mu­ni­ca­tions axis and real­ly has been about let’s con­nect a few high-end, fast, advanced applications. 

Malamud: But you think that local gov­ern­ments and state gov­ern­ments should begin tak­ing respon­si­bil­i­ty for net­work­ing their com­mu­ni­ties and their states?

Lynch: I think they have to.

Malamud: This has been Clifford Lynch on Geek of the Week. Thanks.

Lynch: My pleasure.

Malamud: This has been Geek of the Week. Brought to you by Sun Microsystems and by O’Reilly & Associates. To purchase an audio cassette or audio CD of this program, send electronic mail to radio@ora.com.

Internet Talk Radio. The medi­um is the message.