An Xiao Mina: Hi, everyone. Thank you for having me here. My name is An Xiao Mina. You can call me An. I’m a product designer and an independent researcher and writer based in the Bay Area, but originally from LA, so I’m psyched to be back here.

Today I’m talking about different sorts of divides, specifically language divides, and some of the biases around language that exist in our technologies and technological spaces. I wanted to take a moment to imagine this “next billion” group of people who are coming online and the sheer diversity of languages they speak: hundreds, even thousands, of different languages. One of them is Khmer, the language of Cambodia. A colleague of mine, the researcher Ben Valentine (he’s based in Cambodia), pointed out that when he views Khmer websites in certain browsers, the language, which has its own script, appears like this:

It looks like boxes. So the Khmer language is literally invisible to many technologies. It’s just one example of how the language you speak shapes the Internet you have access to, both as a reader and as a speaker.

When we think about network graphs and talk about the network effects that shape how social movements form and how information is distributed online, there’s an assumption in those visualizations that every node in the network is equal. But very often (and you can slice the data in many different ways), the languages we speak actually limit the networks we have access to and interact with.

This is a visualization from 2010. Mike McCandless, a researcher, scraped Twitter data for the languages people were tweeting in, based on where they were tweeting from, and Eric Fischer then visualized it. You can see that the languages people speak (each color represents a different language) fall along geopolitical lines. This isn’t a matter of coloring in Italy because people there speak Italian; we’re not drawing a political map and labeling it by language. The language data itself recreates the map of Europe. And you can extend this to other countries and other regions as well.

This has real effects. Here’s one specific example. People often talk about the importance of Wikipedia: open knowledge, open access to knowledge, and the ability to contribute to a collective database of knowledge. Wikipedia has built-in translation features that let people contribute content in their languages and translate it. But again, if you speak a minority language, your access to that knowledge can be severely limited. These are the numbers of articles available in different languages.

If you speak a majority language, or a language whose speakers have made a concerted effort to translate that content, you have access to millions of articles, and it’s a great database. But if you speak a minority language, especially a minority Asian or African language, that number drops significantly: ten thousand for Afrikaans, Tagalog, and Kiswahili, and down to a hundred for even smaller minority languages. I think we can expect similar patterns with other websites and other sorts of content; Wikipedia is just one example.

In addition to reading, there’s also access to voice. I think a lot of us are familiar with the Internet’s role in building social movements and in amplifying one’s voice. The Umbrella Movement in Hong Kong and Black Lives Matter here in the US certainly rely on the ability to broadcast a message, to use hashtags, to amplify a voice, and to create a pipeline from social media to mainstream media and then, hopefully, to other audiences.

We can certainly think of major hashtags and major movements that have been in English or another majority language. #TweetLikeAForeignJournalist in Kenya was a critique of media coverage of East Africa. And #JeSuisCharlie is a French phrase simple enough for people to remember and understand.

But there are a number of other movements in other languages that are more difficult to understand and get significantly less attention. #sassoufit in Congo. The gau wu (#鳩嗚) movement, which is part of the Hong Kong Umbrella Movement but is a separate group with somewhat different aims and strategies. #lumaddinako in the Philippines. And [#مصر_بتفرح], which means “Egypt delights,” a parody hashtag I’ll talk about a little later. These sorts of movements and conversations often stay within their own language sphere, because they tend to use minority languages.

To illustrate this further, I love this quote from Sarah Kendzior, a writer on social justice in Middle America and Central Asia. She describes the quandaries an Uzbek activist might have to work through to raise awareness for their cause. I want to read the whole description, because it really shows some of the challenges of amplifying your voice when your language is poorly represented on technological platforms and there’s no pipeline for translating it into mainstream, majority-language media.

If she knows Russian, she has to decide whether writ­ing in Russian—and poten­tial­ly reach­ing an inter­na­tion­al audi­ence as well as the 41 per­cent of Uzbeks who can read Russian—outweighs not being able to reach non-Russian speak­ing Uzbeks or seem­ing to val­ue a for­eign lan­guage over one’s native tongue.
Sarah Kendzior, “Can Minor Languages Make Revolution?”

So even in the decision to write in Russian rather than Uzbek, there are political consequences to not using Uzbek, despite the benefits of that amplification. And this is where the availability of fonts, typography, and input systems for the Uzbek language has consequences for political action.

If she writes in Uzbek, she has to choose which alphabet—Cyrillic, to reach old­er gen­er­a­tions and Uzbeks in neigh­bor­ing for­mer Soviet republics who only know the Cyrillic ver­sion? Or Latin, to reach the younger read­ers who com­prise the bulk of Uzbekistan’s Internet users?
Sarah Kendzior, “Can Minor Languages Make Revolution?”

These sorts of dilemmas are much more common when you speak a minority language, especially one with a non-Latin script.

As a designer and product thinker, I’m also thinking about potential solutions. As a provocation and a starting point for conversation, I wanted to throw out some ideas for how we can improve language inclusion and language access across the world, and also here in the United States, for people who speak many different languages.

One possibility is crowdsourcing. Crowdsourcing certainly has its problems. But when you think about the possibilities for translation, machine translation can scale very quickly, but it’s often inaccurate. Anyone who has tried machine translation, even between English and Spanish, knows it leads to much hilarity. At the same time, the translation model as it currently exists simply cannot scale to the volume of content and conversations that need to be translated.

And again, crowdsourcing can have its problems. This isn’t actually a crowdsourced subtitle; it’s a famous meme, “All Your Base Are Belong to Us.” But it’s the sort of thing that can happen when fansubbing communities translate popular media. (Fansubbing is fan subtitling.) Communities can translate anime into English, for example, or American movies into Chinese, but you have to place a great deal of faith and trust in the accuracy of those translations.

At the same time, fansubbing communities can be very successful, and more formalized approaches to crowdsourced translation also seem to be gaining some uptake. TED has the Open Translation Project, where hundreds of volunteers translate its videos into hundreds of languages. And we can see similar examples on sites like viki.com, where people can translate content.

Part of the risk of crowdsourcing, of course, is free labor, and I think we need to talk about what fair compensation looks like. But a broader model for translation can help ensure that content reaches other languages. Yeeyan in China is another crowdsourced site, where people translated articles from English into Chinese, an important way of increasing access to publications like The Economist. It was shut down, and it’s in a kind of neutral state right now, but it’s an example of the potential here.

My own experience was building a lightweight platform for translating the Chinese artist Ai Weiwei’s tweets from Chinese into English. Back in 2009 and 2010, his tweets were very rarely understood by English-language media, despite his major media presence. This model shows that maybe just five translators can have an impact, reaching 31,000 followers. So it doesn’t take a lot, but it does take motivation and interest. We’re trying to productize that at Meedan with a product called Bridge, which enables crowdsourced translation of social media.

Screenshot of Amharic alphasyllabary via Wikipedia

Second, we need to change the structure. Language inequity is a full-stack problem. You can translate all the things, but that won’t help if your language isn’t supported, if there’s no font for it. This is the Amharic alphabet, with over 300 letters, or alphasyllables, and we have to design better ways to input languages like this. We need to design better ways to read them and access them. We simply need better infrastructure for supporting languages, ensuring that they can be read and input, especially on mobile devices.

Another possibility (this is a picture of Lionel Messi interacting with an app called WeChat) is audio interaction and audio input. A researcher friend of mine, Christina [?], pointed out that QR codes are very popular in China as a form of input because the very act of typing in a Chinese URL can be burdensome, so it’s much easier to take a screenshot of a QR code. We need to think about different interactions, especially for languages that may not have a formal written form, or any written form at all. I think audio interaction and oral engagement through technology will be critical.

Before I close, I’ll give one example of the color that translation can expose, and of why this process of building our global imaginations can be so exciting and interesting. Our ability to empathize with, interact with, and value people from other cultures, in parts of the world that are often invisible to the West, comes from bringing out that citizen content: content that is interesting and valuable.

The 2011 Wenzhou train crash in China was a major crash in which hundreds of people were killed or injured. This was the sort of event that would have been censored in Chinese media, because it was potentially an example of government mismanagement. But social media played a compelling role in bringing it out, and telling its specific stories is a way of highlighting how and why these online engagements can be so powerful. It really takes translation, though, to bring these stories to wider audiences.

In one image, someone Photoshopped the crash into a monster movie. You can see it’s a kind of parody: “I’d rather believe this than the official explanation for the train crash.” There are different sorts of memes: “You cannot escape the blame of profaning the dead.”

“Time to disembark. We’re home.” Some are just poetic messages. This one is by a friend of mine, who Photoshopped a train ticket: “Starting point: Hell. Destination point: Hell.” The specificity of translating this content and bringing it out becomes an important act of journalism, I would say.

To close, as we think about the role of language on the Internet: it really biases our experience, and there are a lot of risks and challenges there, especially as people from the Global South come online. Their ability to access content and to contribute to important conversations online will be severely limited. It’ll look more like this, and I think some of the most important work we can do in tech is to bring that content out into languages they can understand.

Thank you so much.