Hi, everyone. Thank you for having me here. My name is An Xiao Mina. You can call me An. I’m a product designer and an independent researcher and writer based in the Bay Area, but originally from LA, so I’m psyched to be back here.

Today I’m talking about different sorts of divides, specifically language divides, and some of the biases around language that exist in our technologies and technological spaces. I want to take a moment to imagine the “next billion” people who are coming online and the sheer diversity of languages they speak: hundreds, even thousands, of different languages. One of those languages is Khmer, the language of Cambodia. A colleague of mine, researcher Ben Valentine (he’s based in Cambodia), pointed out that when he looks at Khmer websites in certain browsers, the language, which has its own script, appears like this:

It shows up as boxes. So Khmer is literally invisible to many technologies. It’s just one example of how the language you speak shapes the Internet you have access to, both as a reader and as a speaker.

When we look at network graphs and talk about how network effects shape social movements and the way information is distributed online, there’s an assumption in those visualizations that every node in the network is equal. But very often (and you can slice the data in many different ways), the languages we speak actually limit the networks we have access to and interact with.

This is a visualization from 2010. Mike McCandless, a researcher, scraped Twitter data for the languages people were tweeting in, based on the locations they were tweeting from, and Eric Fischer then visualized it. Each color represents a different language, and you can see how those languages fall along geopolitical lines. This isn’t a map of Europe with languages assigned by country; we’re not overlaying what people speak onto an existing map. The languages themselves recreate the map of Europe. And you can extend this to other countries and regions as well.

This can have real effects. Here’s one specific example. People often talk about the importance of Wikipedia, of open knowledge and open access to knowledge, and of the ability to contribute to a collective database of knowledge. Wikipedia has built-in translation features that let people contribute content and translations. But again, if you speak a minority language, your access to that knowledge can be severely limited. These are the numbers of articles available in different languages.

If you speak a majority language, or a language whose speakers have made a concerted effort to translate content, you have access to millions of articles, and it’s a great database. But if you speak a minority language, especially one of the minority Asian and African languages, that number drops significantly: ten thousand for Afrikaans, Tagalog, and Kiswahili, and down to a hundred for even smaller minority languages. I think we can expect similar patterns with other websites and other sorts of content; Wikipedia is just one example.

And beyond reading, there’s also access to voice. I think a lot of us are familiar with the Internet’s role in building social movements and its ability to amplify one’s voice. The Umbrella Movement in Hong Kong and Black Lives Matter here in the US both rely on the ability to broadcast a message, use hashtags, amplify a voice, and create a pipeline from social media to mainstream media, and then hopefully on to other audiences.

And certainly we can think of major hashtags and movements that have been in English or another majority language. #TweetLikeAForeignJournalist in Kenya was a critique of media coverage of East Africa. And #JeSuisCharlie was a simple enough French phrase for people to remember and understand.

But there are a number of other movements in other languages that are more difficult to understand and get significantly less attention. There’s #sassoufit in Congo. There’s the gau wu (#鳩嗚) movement, which is part of the Hong Kong Umbrella Movement but is a separate group with different aims and strategies. There’s #lumaddinako in the Philippines. And [#مصر_بتفرح] means “Egypt delights,” a parody hashtag that I’ll talk about a little later. These movements and conversations often stay within their own language spheres, because they’re working in minority languages.

To illustrate this further, I love this quote from Sarah Kendzior, a writer on social justice in Middle America and Central Asia. She describes the kinds of quandaries an Uzbek activist might face in raising awareness for their cause. I want to read through the whole description, because it really shows some of the challenges of amplifying your voice when your language isn’t well represented on technological platforms and there’s no pipeline for translating it into mainstream and majority-language media.

If she knows Russian, she has to decide whether writing in Russian—and potentially reaching an international audience as well as the 41 percent of Uzbeks who can read Russian—outweighs not being able to reach non-Russian speaking Uzbeks or seeming to value a foreign language over one’s native tongue.
Sarah Kendzior, “Can Minor Languages Make Revolution?”

So even though writing in Russian brings the benefit of amplification, choosing it over Uzbek has political consequences. And here’s where the availability of fonts, typography, and input systems for the Uzbek language has consequences for political action.

If she writes in Uzbek, she has to choose which alphabet—Cyrillic, to reach older generations and Uzbeks in neighboring former Soviet republics who only know the Cyrillic version? Or Latin, to reach the younger readers who comprise the bulk of Uzbekistan’s Internet users?
Sarah Kendzior, “Can Minor Languages Make Revolution?”

These sorts of dilemmas are much more common when you speak a minority language, especially one written in a non-Latin script.

As a designer and product thinker, I’m also thinking about what potential solutions might look like. To provoke some conversation, I want to throw out a few ideas for how we can improve language inclusion [and] language access around the world, and also here in the United States, for people who speak many different languages.

One possibility is crowdsourcing. Crowdsourcing certainly has a lot of problems. But consider the options for translation: machine translation can scale very quickly, but it’s often inaccurate. Anyone who’s tried machine translation, even between English and Spanish, knows it leads to much hilarity. At the same time, the translation model as it currently exists simply cannot scale to the volume of content and conversations that need to be translated.

And again, crowdsourcing can have its problems. This isn’t actually a crowdsourced subtitle; it’s the famous meme “All Your Base Are Belong to Us.” But it illustrates the sort of risk that arises when fansubbing communities translate popular media. Fansubbing is fan subtitling: communities translating anime into English, for example, or American movies into Chinese. But you have to have a great deal of faith and trust that those translations will be accurate.

At the same time, fansubbing communities can be very successful, and more formalized approaches to crowdsourced translation also seem to be gaining some uptake. TED has its Open Translation Project, where hundreds of volunteers translate TED videos into hundreds of languages. We see similar examples on sites like viki.com, where people can translate content.

Part of the risk of crowdsourcing, of course, is the risk of free labor, and I think we need to talk about what fair compensation looks like. But at the same time, a broader model for translation can help ensure that content reaches other languages. Yeeyan in China is another crowdsourced site, where people translated articles from English into Chinese, an important way of increasing access to publications like The Economist. It was shut down and is in a kind of limbo right now, but it’s an example of the potential here.

My own experience comes from building a lightweight platform for translating the tweets of the Chinese artist Ai Weiwei from Chinese into English. Back in 2009 and 2010, his tweets were very rarely understood by English-speaking media, despite his major media presence. That model suggests that as few as five translators can have an impact, in this case reaching 31,000 followers. So it doesn’t take a lot, but it does take motivation and it does take interest. We’re trying to productize that at Meedan with a product called Bridge, which enables crowdsourced translation of social media.

Screenshot of Amharic alphasyllabary via Wikipedia

Second, we need to change the structure. Language inequity is a full-stack problem. You can translate all the things, but that doesn’t help if your language isn’t supported, if there’s no font for it. This is the Amharic script, an alphasyllabary with over 300 characters, and we have to design better ways to input languages like this. We need to design better ways to read them and access them. We simply need better infrastructure for supporting languages, ensuring that they can be read and input, especially on mobile devices.

Another possibility is audio. (This is a picture of Lionel Messi interacting with an app called WeChat.) We need to think about audio interactions and audio input. A researcher friend of mine, Christina [?], pointed out that QR codes are very popular in China as a form of input because the very act of typing in a URL can be burdensome in Chinese, so it’s much easier to scan a QR code. So we need to think about different interactions, especially for languages that have no formal written form, or no written form at all. I think audio interaction and oral engagement through technology will be critical.

Before I close, I’ll give one example of the color that translation can expose, and why this process of building our global imaginations can be so exciting and interesting. Our ability to empathize with, interact with, and value people from other cultures, in parts of the world that are often invisible to the West, comes from bringing out that citizen content: content that can be interesting and valuable.

Here’s one example. The 2011 Wenzhou train crash in China was a major disaster in which hundreds of people were killed or injured. It was the sort of event that would have been censored in Chinese media, because it was potentially an example of government mismanagement. But the role of social media in bringing it to light was so compelling, and telling the specific stories of this event is a way of showing how and why these online engagements can be so powerful. It really takes translation, though, to highlight them.

In one image, someone Photoshopped the crash into a monster movie. You can see it’s a kind of parody conversation: “I’d rather believe this than the official explanation for the train crash.” There were different sorts of memes: “You cannot escape the blame of profaning the dead.”

“Time to disembark. We’re home.” Some were simply poetic messages. This image is from a friend of mine, who Photoshopped the train ticket: “Starting point: Hell. Destination point: Hell.” The specificity of translating this content and bringing it out becomes an important act of journalism, I would say.

To close: as we think about the role of language on the Internet, it really biases our experience, and there are a lot of risks and challenges there, especially as people from the Global South come online. Their ability to access content and to contribute to important conversations online will be severely limited. For them, the Internet will look more like this, and I think some of the most important work we can do in tech is to bring content out into languages they can understand.

Thank you so much.

