Hi, everyone. Thank you for having me here. My name is An Xiao Mina. You can call me An, and I’m a product designer and an independent researcher and writer based in the Bay, but originally from LA. So I’m psyched to be back here.

Today I’m talking about different sorts of divides, specifically around language divides, and some biases around language that exist in our technologies and our technological spaces. I wanted to take a moment to imagine this “next billion” group of people who are coming online and the sheer diversity of languages that they’re speaking. It’s hundreds and thousands of different languages. And one language might be Khmer, a Cambodian language. A colleague of mine, researcher Ben Valentine (he’s based in Cambodia), pointed out that when he’s looking at Khmer web sites with certain browsers, the very language, which has its own custom script, appears like this:

It looks like box­es. So lit­er­al­ly the lan­guage of Khmer is invis­i­ble to many tech­nolo­gies. It’s just one exam­ple of how the lan­guage that you speak shapes the Internet that you have access to, both as a read­er and as a speak­er.

When we think about network graphs and we talk about the network effects that make up an important part of how social movements form and how information is distributed online, there’s this assumption in those visualizations that every node in that network is equal. But very often, and you can slice data in many different ways, the languages that we speak actually limit the networks that we have access to and that we’re interacting with.

This is a visualization from 2010 by Mike McCandless, who’s a researcher who scraped the Twitter data for the languages that people are speaking based on the location that they’re tweeting from, and Eric Fischer then visualized this. And you can see how the languages that people are speaking (each color represents a different language) fall along geopolitical lines. And this is not people just speaking Italian because they’re in Italy, and we’re not visualizing what people are speaking based on this map. It’s actually the language itself that recreates the map of Europe. And you can expand this into other countries and other regions as well.

This can have an effect. Here’s one specific example. Often people talk about the importance of Wikipedia and the importance of open knowledge and open access to knowledge and the ability to contribute to a collective database of knowledge. Wikipedia has built-in translation features; it allows people to contribute language and translation. But again, if you’re speaking a minority language, your access to that knowledge can be severely limited. These are the numbers of articles available for different languages.

If you’re speak­ing major­i­ty lan­guages, or lan­guages for peo­ple who’ve made a con­cert­ed effort to trans­late that con­tent, you have access to mil­lions of arti­cles and it’s a great data­base. But if you’re speaking—especially minor­i­ty Asian and African lan­guages, that num­ber starts to drop sig­nif­i­cant­ly. Ten thou­sand for Afrikaans, Tagalog, Kiswahili, and down to a hun­dred for even small­er minor­i­ty lan­guages. We can expect sim­i­lar pat­terns, I think, with oth­er web sites and oth­er sorts of con­tent, Wikipedia being just one exam­ple.

And then in addition to reading, it’s also the access to voice. I think a lot of us are familiar with the role of the Internet in building social movements and the ability to amplify one’s voice. Certainly the Umbrella Movement in Hong Kong and Black Lives Matter here in the US rely on the ability to broadcast a message, to use hashtags, to amplify a voice and create a pipeline from social media to mainstream media, and then hopefully to other audiences.

And cer­tain­ly we can think about major hash­tags and major move­ments that’ve been in English or a major­i­ty lan­guage. #TweetLikeAForeignJournalist in Kenya was a cri­tique of media cov­er­age of East Africa. And then #JeSuisCharlie, a sim­ple enough French phrase for peo­ple to remem­ber and to under­stand.

But there are a number of other movements in other languages that are more difficult to understand, and get significantly less attention. #sassoufit in Congo. There’s a gau wu (#鳩嗚) movement that’s part of the Hong Kong Umbrella Movement, but is a sort of separate group with sort of different aims and strategies. #lumaddinako, that’s in the Philippines. And then [#مصر_بتفرح] means “Egypt delights,” a parody hashtag which I’ll talk about a little later. These sorts of movements and conversations are often limited to the language sphere that they’re in, because they’re often working with minority languages.

Just to illustrate this even further, I just love this quote from Sarah Kendzior, who’s a writer on social justice in Middle America and Central Asia. She’s speaking about the kind of quandaries that an Uzbek activist might have to go through to raise awareness for their cause. I just want to read through the whole description, because it really shows you some of the challenges with amplifying voice when your language is not very well represented in technological platforms, and there’s no pipeline for translating those languages into mainstream and majority media.

If she knows Russian, she has to decide whether writ­ing in Russian—and poten­tial­ly reach­ing an inter­na­tion­al audi­ence as well as the 41 per­cent of Uzbeks who can read Russian—outweighs not being able to reach non-Russian speak­ing Uzbeks or seem­ing to val­ue a for­eign lan­guage over one’s native tongue.
Sarah Kendzior, “Can Minor Languages Make Revolution?”

So even the deci­sion to speak Russian over Uzbek, even though there are ben­e­fits to that ampli­fi­ca­tion, there are polit­i­cal con­se­quences to not speak­ing in Uzbek. And here’s where the avail­abil­i­ty of fonts, typog­ra­phy, and input sys­tems of the Uzbek lan­guage have con­se­quences for polit­i­cal action. 

If she writes in Uzbek, she has to choose which alphabet—Cyrillic, to reach old­er gen­er­a­tions and Uzbeks in neigh­bor­ing for­mer Soviet republics who only know the Cyrillic ver­sion? Or Latin, to reach the younger read­ers who com­prise the bulk of Uzbekistan’s Internet users?
Sarah Kendzior, “Can Minor Languages Make Revolution?”

So the­se sorts of dilem­mas are much more com­mon when you’re speak­ing a minor­i­ty lan­guage, espe­cial­ly if that lan­guage has non-Latin script.

As a designer as well as a product thinker, I’m also thinking about what potential solutions are. And for provocation and for conversation, I wanted to throw out some potential ideas for how we can think about improving language inclusion [and] language access across the world and also here in the United States for people who are speaking many different languages.

One of the possibilities here is crowdsourcing. Crowdsourcing certainly has a lot of problematics. But when you think about the possibilities of translation, machine translation can scale very quickly but it’s often inaccurate. Anyone who’s done translations, even between English and Spanish…it leads to much hilarity. At the same time, the translation model as it currently exists just simply cannot scale for the sort of content and conversations that need to be translated.

And again, crowd­sourcing can have its prob­lems. This is not a crowd­sourced sub­ti­tle. This was actu­al­ly a famous meme, All Your Base Are Belong to Us. But it’s the sort of risk that hap­pens when fan­sub­bing com­mu­ni­ties trans­late pop­u­lar media. Fansubbing is fan sub­ti­tling. So an exam­ple of trans­lat­ing ani­me movies into English, or trans­lat­ing American English movies into Chinese can be done by com­mu­ni­ties, but you have to have a great deal of faith and trust that those trans­la­tions will be accu­rate.

At the same time, the fansubbing communities can be very successful, and there are more formalized ways of doing crowdsourced translation that also seem to be having some uptake. TED has the Open Translation Project, where hundreds of volunteers translate these videos into hundreds of languages. And we can see similar examples with sites like viki.com, where people can translate content.

And I think part of the risk of crowd­sourcing of course is the risk of free labor, and I think we need to talk about what fair com­pen­sa­tion looks like. But at the same time a broad­er mod­el for trans­la­tion can help ensure that con­tent reach­es oth­er lan­guages. Yeeyan in China is anoth­er crowd­sourced site where peo­ple were trans­lat­ing arti­cles from English into Chinese as an impor­tant way of increas­ing access for sites like The Economist. It was shut down, and it’s kind of at a neu­tral space right now, but it’s an exam­ple of the poten­tial for this.

And then my own experience is building a light platform for translating the tweets of the Chinese artist Ai Weiwei from Chinese into English, which back in 2009/2010 were very rarely understood by English-speaking media, despite the fact that he had a major media presence. This model kind of shows that maybe just five translators can have an impact with 31,000 followers. So it doesn’t take a lot, but it does take motivation; it does take interest. And we’re trying to productize that at Meedan with a product called Bridge that allows for crowdsourced translation around social media.

Screenshot of Amharic alpha­syl­labary via Wikipedia

Secondly, we need to change the struc­ture. Language inequity is a full stack prob­lem. I think you can trans­late all the things, but if your lan­guage is not sup­port­ed, if your font— This is the Amharic alpha­bet, with over 300 let­ters or alpha­syl­la­bles, and we have to design bet­ter ways to input the­se lan­guages. We need to design bet­ter ways to read them, to access them. And we just need a bet­ter struc­ture for sup­port­ing lan­guages, ensur­ing that they can be read and input, espe­cial­ly on mobile devices.

One possibility (this is a picture of Lionel Messi interacting with an app called WeChat) is we also need to think about audio interactions and audio input. A researcher friend of mine, Christina [?], pointed out that QR codes are very popular in China as a form of input because the very act of typing in a Chinese URL can be burdensome. So it’s much easier to take a screenshot of a QR code. So we need to think about different interactions, especially when we get to languages that may not have a formal written form or any written form. Audio interaction and oral engagement through technology I think will be very critical and important.

And just to close, I’ll give one example of a bit of color that can be exposed through translation and why this can be both very exciting and interesting, this process of building our global imaginations. Our ability to empathize with, interact with, and value people from other cultures, in different parts of the world that can often be invisible to the West, comes through bringing out that citizen content, bringing out content that can be interesting and valuable.

This is just one example. The 2011 Wenzhou train crash in China was a major train crash where hundreds of people were killed and injured. This was the sort of event that would’ve been censored in Chinese media because it was potentially an example of government mismanagement. But the role of social media in bringing this out was so compelling, and actually telling the specific stories of this is a way of highlighting how and why these online engagements can be quite powerful. And it really takes translation, though, to highlight these.

This was one image where someone Photoshopped this into this monster movie. And you can see it’s a kind of parody conversation. “I’d rather believe this than the official explanation for the train crash.” There’s different sorts of memes. “You cannot escape the blame of profaning the dead.”

“Time to disembark. We’re home.” Just kind of poetic messages. This is a friend of mine, who had Photoshopped the train ticket: “Starting point: Hell, destination point: Hell.” And that specificity of translating this content and bringing it out becomes an important act of journalism, I would say.

Just to close, as we think about the role of lan­guage on the Internet, it real­ly bias­es our expe­ri­ence, and there are a lot of risks and chal­lenges there, espe­cial­ly as peo­ple from the Global South are com­ing online. The abil­i­ty for them to access con­tent and for them to con­tribute to impor­tant con­ver­sa­tions online will be severe­ly lim­it­ed. It’ll look more like this, and I think some of the most impor­tant work we can do in tech is to bring it out into lan­guages that they can under­stand.

Thank you so much.
