Monica Manney 0:02
Welcome back to UVA Data Points. I'm your host, Monica Manney. In today's episode we're featuring a conversation between Raf Alvarado and Allison Bigelow. Raf and Allison are both professors at UVA, researchers, intellectuals and all-around awesome people. Their conversation focuses on the Multepal project, an ongoing research initiative that analyzes the K'iche' Mayan book of creation. It's a fascinating and wide-ranging conversation that explores the intersections of data science, digital humanities, history, language and culture. Like any well-rounded data science project, the Multepal project intersects with all four areas of the model of data science, but it has a particular focus on the area of design. For a good description of this area, we spoke with Professor Pete Alonzi.
Pete Alonzi 0:45
Hello there. My name is Peter Alonzi, and I am a data scientist and particle physicist. I serve as an assistant professor here in the School of Data Science at the University of Virginia. When I think of design, the first thing I think of is how many conversations I've had trying to explain what it means, and how hard it is to do that. It's tricky. Design is where some of the critical decisions are made that influence how your work comes to pass. Essentially, the task that you have to do is draw a connection between the infinite universe, with its infinite complexities, and the human mind. The human mind is magnificent, but it cannot contain the infinite universe. We have to leave things in, we have to leave things out, we have to ask ourselves, what is the important piece that we need to keep so that the human can understand what is going on in the universe? And I think usually the best way to go forward is with an example, and the one I always use is time. Humans walking around all over the place know what time is, and you probably have a device that helps you understand it. It might be strapped to your arm, it might sit in your pocket, but they're everywhere. And there are a lot of decisions. So what I want you to do is close your eyes, think of a wall clock, and picture it. When you do that, you probably thought of either an analog or a digital clock. Those are the dominant paradigms, is what we would say to be as nerdy as possible. And you know, they have different strengths and weaknesses, or pros and cons. When you have a digital clock, it's very easy to be precise. We have great technology: the piezoelectric effect lets us get these little quartz actions inside, so we know exactly what time it is. And we can just add more digits. So you can put seconds on it, you can put milliseconds on it, or, for me in physics land, you can put nanoseconds on the clock, and anybody can read it.
And we're all going to agree exactly on what time it is, down to the nanosecond. But there's an enduring popularity to that other kind of clock, the analog clock, the clock that isn't as precise. You can put a second hand on the clock, and you can get down to the seconds, and that's great. But you might not even agree on the minute. Unless you build an enormous clock, so big you can get real close and see right where that hand is, you don't have the precision. But there's something else that you do have: it gives you a physical representation of the passage of time, as opposed to the completely abstract representation that you get with a digital clock. That is a design choice. When you are making a clock, when you are thinking about how people need to use time, you make a decision to use an analog or a digital clock. Those are design decisions. That's what we mean when we say design in data science. Think about your digital clock today and your nanoseconds. Think about your analog clock and how the hand sweeps out a physical representation. You can advance your technology, you can miniaturize, you can get that precision so it can fit in your pocket. But when you do it, and you do your data science, and you think about design, you have to ask yourself the question: what do I lose when I make that design choice?
Monica Manney 3:48
And with that, let's get into the conversation with Allison and Raf.
Raf Alvarado 3:51
My name is Raf Alvarado. I'm the Program Director of the residential master's degree in the School of Data Science. I'm also an associate professor here, and I concentrate on teaching courses on text analytics and cultural analytics. My background is in digital humanities, which I've been involved with for some time, since the 90s, basically since completing my dissertation in cultural anthropology. Before that I had a degree in philosophy. But I should point out that I studied engineering for two years before I jumped ship as an undergraduate, which sort of gave me the tools to develop into a career as a data scientist.
Allison Bigelow 4:28
I feel like my path is also circuitous. So I'm Allison Bigelow. I have a really long title right now. I'm the Tom Scully Discovery Chair Associate Professor in the Department of Spanish, Italian and Portuguese. And I'm also affiliate faculty in Latin American Studies, in Women, Gender and Sexuality, and in the Equity Center for the redress of historic inequities.
Raf Alvarado 4:47
Yeah, you know, I think I should point this out, because Allison, to me, is a very interesting figure intellectually: you have a degree in English.
Allison Bigelow 4:56
Oh, yeah. So my undergraduate degrees are in English and Spanish. My graduate degrees: master's in English and comparative literature, PhD in English and comparative literature, postdoc at the Omohundro Institute of Early American History and Culture, and then I got hired in a Spanish department as a colonial Latin Americanist.
Raf Alvarado 5:15
And I think of your work, and this ties into Multepal, as very much historical, right? So she just wrote a book called Mining Language that won the American Historical Association's, I don't know what you call it, book of the year.
Allison Bigelow 5:30
It was the Rawley award in Atlantic world history.
Raf Alvarado 5:34
Okay, but it's a historians' award, an award that's usually reserved for historians.
Allison Bigelow 5:39
As far as we know, it has never gone to someone, well, certainly it's never gone to someone in a Spanish department in the 20-something years of the award, and it doesn't seem to have ever gone to someone outside of history either.
Raf Alvarado 5:50
See, that's what I think is really interesting. And I think that actually hits on a point about our project, the Multepal project, because it's about language, but it's also about history. Right? It's history through language.
Allison Bigelow 6:03
So should we say what Multepal is? Sure, give it a shot. Well, Raf coined the term, so I feel like you should go. Yeah.
Raf Alvarado 6:11
All right. So Multepal is two things. We have a project called the Multepal project, which I think, broadly speaking, would be characterized as a digital humanities project that is focused on indigenous Mesoamerican literature and other sorts of artifacts of writing and language. It could be more broadly conceived than that, but that's its main focus. And of course, we're focusing on the one major narrative that survived the Spanish conquest, which is called the Popol Wuj. I don't know if you all know, but the Spanish pretty much destroyed all the other texts that the Mayas had. There was something called the auto de fe: Diego de Landa destroyed all the books. I forget the exact date, but it doesn't matter. Anyway, there are other things there that we want to digitize, but part of it is trying to decolonize digital humanities and focus on other traditions. The word Multepal actually comes from Maya. It's a Yucatec Maya word that means common. I translate it as Commonwealth because if you look at the roots, mul and tepal, it's roughly the same as Commonwealth. But what it really means is, historically, the Maya had a really interesting form of rulership, which is that they rotated rulership among four polities. And I believe they did it in 20-year intervals using the calendar system, which has this sort of political period called the k'atun, which is a 20-year period. Every 20 years it would rotate, which is an interesting form of rulership. It would be interesting if Europe were organized that way. I mean, one year England's in control, the next year, or the next decade, France is, or something like that.
And so it's an image of shared rulership and shared power, which is how we envision how this project works, because we want to collaborate with people, not just in the United States, but of course with the people who actually, as it were, own these texts, and for whom these texts are not just, you know, research material, but part of identity.
Allison Bigelow 8:11
So the project began in the spring of 2017, when we taught a graduate seminar focused on Latin American digital humanities. This was designed to fill a particular gap at UVA: almost all of the digital humanities coursework, particularly at the graduate level, focused on Anglophone cultural productions, and students in Spanish and Latin American Studies were eager to get some sort of training. So I paired with Raf, and we developed this idea to teach a class that would focus on the Popol Wuj because of its historical and cultural importance to Mesoamerica. So the Popol Wuj is an ancient Maya K'iche' narrative that explains the origins of the universe and the emergence of the K'iche' people. The narrative arc of the story begins from the moment of creation, when Earth and Sky, which existed beforehand, become separate domains. So unlike Christian accounts of creation, which are ex nihilo, you already have a world that existed; it was just still, and existed without motion, or agency, or movement. The gods come together as a group and decide to create the earth and to give it movement and potential. Then they go through a series of attempts at creating humans. They fail a couple of times, and then they finally land on humans who are made of corn, and those humans become the progenitors of the modern K'iche' people, who then split into various political factions all throughout Mexico and Guatemala. So the story begins in the broadest possible cosmological terms and ends with very specific political lineages that terminate around 1554. So what we think happened, based on iconographic evidence and the way that the story unfolds, is that the narrative circulated in oral forms in lowland Maya communities and highland Maya communities, and in pictographic and visual forms, and that, much like the Chilam Balam of Mexico, each town probably had its own Popol Wuj.
But because of the threat of Spanish invasion, because the Spaniards were destroying so many primary sources, around 1554 writers or scribes from three ruling lineages in the highlands collaborated to write this one account. So the one Popol Wuj that we have reflects three particular political lineages. And they explain in the beginning that they're writing this down in alphabetic letters because the old Popol Wuj can no longer be seen, which suggests either that it was written in hieroglyphics, which couldn't be read anymore,
Raf Alvarado 10:55
or which was lost during the collapse, or
Allison Bigelow 10:59
that this was a totally new world of Spanish colonial rule. And so the forms of wisdom that they could access in the past aren't available to them, and they have to find a new way of doing glyphs. Yes, yeah. And so they write down the story using what we think of as an alphabet, with K'iche' in one column and Spanish in the other. The text that they wrote sometime around 1554 is lost. But a colonial friar made a copy of it in around 1701 or 1703, and that copy is the basis of the 1200 different editions that exist in 25 world languages, including 12 Mayan languages.
Raf Alvarado 11:39
The one thing I would add, when you go back to the narrative structure, and excuse the pun, but maybe it's an intended pun, is that the kernel of the story is: how do you get from this cosmology to the K'iche' people? And you mentioned that, you know, the humans were invented a couple of times. They were made of wood, they were made of dirt, and they weren't quite good enough, right, for the gods. And at one point, they're so smart that the gods say, you know what, you're way too smart, and we're going to blur your vision, because you're too smart. Anyway, the cool thing is real people emerge from corn; they eat corn. And so it's really a story, which you'll like, because you're writing a book on corn, at the center of which is how corn is invented, and it has to do with sacrifice. So in the underworld, the hero twins, to cut to the chase here, defeat death. They go and they find the Lord of Death, and they trick the Lord of Death, and they kill the Lord. So basically they killed death, which is really interesting. They trick him into killing himself. Yeah, so, you know, death is dead. So, double negation, right? And through that act, I believe, it's a condition of possibility for corn. So it's really interesting that at the kernel of it, you know, the center of it, really is this narrative about the efficacy of sacrifice and its relationship to corn. It also just shows you how important corn is to Mesoamerica. That goes back to, I don't know if you're familiar with Kirchhoff's distinctive features of Mesoamerica. It shows the extent of Mesoamerica, into Central America and up into Mexico, and there are these common features that these cultures share. And one of them, obviously, is corn. It's fundamental,
Allison Bigelow 13:24
which is also why the book is so important for language preservation, because there's a lot of vocabulary that is not necessarily practiced in communities today, but our K'iche' team and the Yucatec team are really excited about the possibility of revitalization or reintroduction of terms for specific agricultural techniques and planting practices, for foodways. Yeah, there's a lot of cool stuff you can do with the corn vocabulary and knowledge in the book once you encode it in the ways that we're trying to do. So right now, I'll give an overview of what we're doing, and then maybe we can talk about how your earlier projects are influencing the work that we're doing now. But basically, in the course we had the students encode the one surviving manuscript from the colonial period. They used the TEI Lite schema, and we gave each student a certain number of folios to work on. Then after the seminar, we realized that there was a lot more work to be done, and that we had this amazing opportunity: instead of just focusing on creating a resource for scholars and something that university teachers might use in the classroom, our work could really be extended to communities in ways that were very powerful. So in the communities where we're now working, particularly K'iche'-speaking communities and Yucatec Maya-speaking communities, there's a real question of limited access to printed materials. It is often the case in these regions of Mexico and Guatemala that there will be a major investment in printing books and grammars and resources for primary school students and secondary school students in native languages, for the goal of language preservation. But after the first print run, the books are almost impossible to find. And a lot of times the physical books don't make their way into the communities where the knowledge and the language were generated and need to be used.
However, cellphone access is extraordinarily high in the regions where we work. The figures are that about 80% of citizens over the age of six in Yucatán have a cell phone, and in Guatemala there are more cell phones than there are people. Not all of those are smartphones, and not everyone has the same degree of access. But in collaboration with our partners at Universidad Rafael Landívar (Guatemala) and Universidad de Oriente (Yucatán), we learned that digital resources would have much wider penetration in the communities than printed materials would. So with that idea, we began to kind of reframe our work, which was really focused on an intervention in colonial archives, into something that could be used as a tool for youth language learning and language preservation in two languages that score moderate risk on the scale of endangered languages, according to Ethnologue.
Raf Alvarado 16:34
The role that digitization plays in this whole thing is really interesting, because of what it took to accomplish those goals and to pivot the way that we did. So we, you know, moved from what you're calling an intervention in the archive. Our original task was, okay, let's encode this text using an XML standard called TEI, which stands for Text Encoding Initiative, and just use this as a way to represent the text and encode our interpretive decisions. And we pivoted to: let's share this text through these other forms of media that are digital and electronic, and so forth. The way that's possible is actually not trivial. If you digitize something according to certain techniques and methods, you might not be able to do that pivot. For example, if you put all of your efforts into digitizing a text using a CD-ROM, and using proprietary standards, and you're focusing on the images as opposed to the transcription of the text, it's very hard to share that. In fact, that's the state of affairs that we encountered when we started working on this project: other digital projects, laudable as they were and as scholarly as they were in terms of content, in terms of form were just not really usable. And so the digital humanities contribution is really to think about the project of text encoding as a form of data representation that is authentic to the source and independent of any particular use, so that you can use it for all these different purposes. And that's really what the Text Encoding Initiative is all about. There were actual intellectual battles fought in the 1990s about the way text should be encoded, with these arguments that it needs to be device independent, it needs to be independent of outcome, so that it can be used for all these other things later. And it turns out to be true and really helpful in our case.
So we encoded this using standards, but also a lot of methods that I developed over the years working on other projects. One project in particular that I worked on was something called the Charrette Project, which involved the encoding of an Old French manuscript of one of the romances, one of the Arthurian romances, the one of Lancelot in the cart. Anyway, I won't go into that particular story, but the work of encoding that text, and doing this overlay of marking up all of the poetic figures in the text, kind of taught me how to structure that kind of markup to capture both the text content and the semantic or interpretive content that the text bears. And so in our case, what we've done is we've taken the text and we've marked it up. We collectively took every single manuscript page and looked at every single word, down to the character: weird, anomalous characters, coffee stains, if you will, on the manuscript pages, dealing with all that paleographic stuff, but also with the goal of going through every single passage and finding elements, proper names, if you will, that represent Mayan things: you know, deities, toponyms, place names, names of lineages, names of substances like blood or rubber, which play an important role in the story. Marking up the text that way is a big part of what we did, and we're able to do it because of this particular method of using markup. So I guess I should branch off into the architecture of the project from a data science perspective. You've got this text, and it's marked up. And a text is basically a tree structure. In mathematical terms, it's what's called a directed acyclic graph. So it's just basically a tree with a whole bunch of branches. And then when you mark that up, you can take all those elements in that tree and put them into a database, which has a more tabular structure.
It's kind of like an encyclopedia where every entry is like, oh, here's Hunahpu, or here's, you know, and each one of these things has an entry. And then we've created a map between the text and the temas database. I don't know how many people we've had working on this. But the things in the database, which we call temas, or topics, basically point to the instances in the text where all these things appear. And over time, what we've done is basically extract, you know, a kind of worldview from the text, or the realm of things that are talked about, as opposed to how they're talked about in a narrative sequence. Yeah. And so that's really the architectural structure of the site as it is. Now, I should also point out, we have annotations.
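[Editor's sketch: the architecture described here, a marked-up text (a tree) whose tagged elements are pulled into a table of topics that point back to their instances, might look roughly like the following in Python. This is a minimal illustration, not the project's actual code or schema. The TEI namespace and the `rs` element with its `type` and `ref` attributes are standard TEI, but the sample fragment, the placeholder names (`NameA`, `topic_a`, and so on), and both helper functions are hypothetical.]

```python
import xml.etree.ElementTree as ET

# Standard TEI namespace; everything inside SAMPLE is made-up placeholder text.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

SAMPLE = """
<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text>
    <body>
      <div n="1">
        <l n="1">placeholder <rs type="deity" ref="#topic_a">NameA</rs> placeholder</l>
        <l n="2"><rs type="place" ref="#topic_b">NameB</rs> placeholder</l>
        <l n="3">placeholder <rs type="deity" ref="#topic_a">NameA</rs></l>
      </div>
    </body>
  </text>
</TEI>
""".strip()


def extract_mentions(xml_text):
    """Walk the tree and flatten every tagged reference into a table row."""
    root = ET.fromstring(xml_text)
    rows = []
    for line in root.iterfind(".//tei:l", TEI_NS):
        for rs in line.iterfind("tei:rs", TEI_NS):
            rows.append({
                "line": int(line.get("n")),
                "topic": rs.get("ref").lstrip("#"),
                "type": rs.get("type"),
                "surface": rs.text,
            })
    return rows


def build_index(rows):
    """Map each topic to the lines where it appears (the encyclopedia side)."""
    index = {}
    for row in rows:
        index.setdefault(row["topic"], []).append(row["line"])
    return index


mentions = extract_mentions(SAMPLE)
topic_index = build_index(mentions)
print(topic_index)
```

[The same source tree feeds both the reading view and the topic table, which is the point of keeping the encoding independent of any one use.]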
Allison Bigelow 21:15
So I feel like Raf always gives the really high-level technical definition, and then I'll do the Multepal for dummies. So this means that if you're looking at our website, and you are a student who is assigned to read the Popol Wuj (most of our traffic comes from the US, Mexico and Guatemala, primarily between Tuesdays and Thursdays, which suggests that it's being used by students who are in classes; like, no one's going to our website on Saturday), you basically go, and you see two columns. One column is the text marked up in K'iche', and next to it is the text marked up in Spanish. This is how it's written in the original manuscript. But it also allows you to compare the K'iche' original text and the Spanish translation, to spot moments where the colonial friar who copied the manuscript may have intervened in it, and it allows you to see how names change between the two. You can click on any term that appears in red, and that will take you to our encyclopedia. First you get a brief record. So if all you need is a quick description that says, this is what this animal is, then you're good. If you are still confused after you read the brief record, you can click Full Record, and that takes you to our extensive encyclopedia page, which has a much larger description of what this particular animal is. It oftentimes includes images, links to secondary sources, and links to related topics, so that you can see, okay, this is how this particular snake so tall relates to this kind of snake. There are three different forms for the word snake, which is an example of what we're trying to do with language preservation. In Yucatec Maya, there's really only one word used now, kaan. But by showing how the Yucatec and the K'iche' relate, we can generate tools for researchers and for teachers who are trying to study things like language contact and language change over time.
So that's sort of what the user sees when they go to our site. They don't necessarily see the architecture, but the architecture they can't Yeah, what happens?
Raf Alvarado 23:24
Well, yeah, let's pull back on that a little bit. That's really important, because there are two ways that people can access the information, right? One is an interactive HTML viewer, which is designed for people to use. That's what you're describing, and it sort of pulls everything together. But it's really important that we're also creating an open source and open access project. So all the raw materials are available to users if they want to dive into it and look at the raw code. The thing is, in keeping with digital humanities principles and scholarly principles, that stuff's not meant to be hidden, like, oh, don't look at that. Actually, if you're interested, you can look at it. Not only that, you can take that TEI and use it for your own purposes, right? And you can do other things with it. For example, if we develop a collection of enough texts, we might be able to do a comparative study of narrative sequences within texts. You could look at the distribution of certain entities in the text and plot those over time, or narrative time. There are just a lot of cool things you can do with the data. But it's definitely designed for, you know, maybe a second level of access, not every reader.
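[Editor's sketch: one way a downstream user might look at the distribution of entities over narrative time is to split the text into equal segments and count mentions per segment. This is a hypothetical Python illustration under stated assumptions: mentions are given as `(line_number, topic)` pairs, and the function name, the toy data, and the choice of equal-width bins are all illustrative rather than anything the project has published.]

```python
def narrative_distribution(mentions, total_lines, n_bins):
    """Count mentions of each topic in n_bins equal slices of the text.

    mentions: iterable of (line_number, topic), line numbers starting at 1.
    Returns {topic: [count_in_bin_0, count_in_bin_1, ...]}.
    """
    counts = {}
    for line, topic in mentions:
        # Integer arithmetic maps lines 1..total_lines onto bins 0..n_bins-1.
        bin_idx = min((line - 1) * n_bins // total_lines, n_bins - 1)
        counts.setdefault(topic, [0] * n_bins)[bin_idx] += 1
    return counts


# Toy data: placeholder topics across a ten-line "text".
toy_mentions = [
    (1, "topic_a"), (2, "topic_a"), (5, "topic_b"),
    (9, "topic_a"), (10, "topic_b"),
]
dist = narrative_distribution(toy_mentions, total_lines=10, n_bins=2)
print(dist)
```

[Each list could then be plotted as a curve over the course of the story, one curve per entity.]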
Allison Bigelow 24:38
But because our project has changed the way that we think about our primary user, this does influence our approach to markup a little bit. So yesterday I was working with one of the students. Raf asked how many people have worked on the text. We had nine students from the seminar, seven graduates and two undergraduates. We've also had two students who work on the project as part of the USOAR program, which is a program here at UVA that matches work-study students with faculty research mentors and trains students to get their hands wet and see what research is like. So we were stuck, because the Yucatec edition of the Popol Wuj by Fidencio Briceño Chel represents the artisan brothers as two separate characters, but the facing Spanish-language translation has it as just one. And this kind of makes sense, because in K'iche' and Yucatec the two names of the characters are Hun Batz and Hun Chuen. Chuen is the lowland Maya version of the word monkey, and Batz is the highland Maya version. So if you know both Mayan languages, they're called the same thing. And so in Spanish, they just call him Un Mono, One Monkey. But for markup, this creates a lot of challenges, because we want to show that in the Yucatec column they're following the same naming pattern that you find in the K'iche'. In the Spanish, they're making a different interpretive decision about what to name this character. And they then make all of the nouns, like "my older brother," into one; they make it singular, they say "mi hermano mayor." The Castilian doesn't, right. And so we have to encode a singular noun as a plural, because it's technically pointing to both of the characters. The reason why we have to do that is that some of our users are probably going to want to download the data set to be able to see how many times these brothers are referenced in the text, and to be able to count, you know, when does Hun Batz appear, and when do they appear as a pair.
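[Editor's sketch: the counting problem described here can be illustrated with a small Python example. It assumes, hypothetically, that each tagged mention carries one or more space-separated pointers, so that a grammatically singular noun can still point at both brothers. The record layout, the column labels, and the identifiers `brother_1` and `brother_2` are placeholders, not the project's actual data.]

```python
from collections import Counter

# Hypothetical mention records; "refs" may hold more than one pointer.
brother_mentions = [
    {"column": "spanish", "surface": "singular-noun", "refs": "#brother_1 #brother_2"},
    {"column": "yucatec", "surface": "name-1", "refs": "#brother_1"},
    {"column": "yucatec", "surface": "name-2", "refs": "#brother_2"},
    {"column": "yucatec", "surface": "pair-phrase", "refs": "#brother_1 #brother_2"},
]


def count_references(records):
    """Return (mentions per character, number of mentions naming a pair)."""
    per_character = Counter()
    pair_mentions = 0
    for record in records:
        targets = [ref.lstrip("#") for ref in record["refs"].split()]
        per_character.update(targets)
        if len(targets) > 1:
            pair_mentions += 1
    return per_character, pair_mentions


per_character, pair_mentions = count_references(brother_mentions)
print(dict(per_character), pair_mentions)
```

[If the singular Spanish noun pointed at only one character, the per-character counts in the Spanish column would undercount one brother, which is the reason for encoding it as pointing to both.]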
And so the fact that we're trying to work with these two different audiences, who have two different needs and interests in working with the texts, means that we spend a lot of time discussing, you know, very specific examples of how someone is named or not named in the text. Like Ixkik', the mother of the hero twins who are at the center of this story, is often referred to as q'apoj, which is like maiden. And in Spanish, it's doncella. And so we had a long debate at the beginning about whether we should tag those references with her name, because she's not technically named by the authors, but they're referencing her. And so if we don't tag her by this name, if we only use proper nouns, then we're missing a lot of the female characters in the text. Yeah, so we decided to mark up all of the doncellas. And pronouns? Yep. Yeah, no, pronouns and nouns. Yes, that's really, this, I
Raf Alvarado 27:44
think, raises a really interesting point about this sort of project, and actually its relationship to data science. One of the really important concepts in data science that I'm really keen on surfacing, and making sure that it's out there and understood that it's part of what we do, is that it's not just about taking data. There's this expression in data science called tidy data. And you have data, I call it cleanroom data: you've got your swivel chair, and you're working, it's all nice and ready to go, and you're doing all your statistics, but you don't really concern yourself with the actual production of data. In reality, data is really messy. It's created by humans, and it's full of all kinds of decisions. And it's super important for anyone in the pipeline working with data to know what those decisions are, and, for lack of a better phrase, how the sausage is made, right?
Allison Bigelow 28:31
And this is why our friends, the economists, call it massaging the data. And then when they escalate, they call it waterboarding. You should be married to an economist.
Raf Alvarado 28:40
What you're doing is really the thing that digital humanists have been into from the beginning, which is: when you encode data, like you have a facsimile of a manuscript, and then you have this standard of how you encode it according to a sequence of characters that can be understood by a computer, you are making all kinds of decisions about that content. And they're not just, you know, bookkeeping decisions. These are intellectual decisions. They make you think about, well, what is a noun? What, really, is the relationship between a plural and a singular? And you end up theorizing. I call this the rationalization effect, where you start to come up with theories to explain things that were normally tacit or unspoken when you just leave media as it is. But when you actually do remediation, you end up in this really interesting space where you're theorizing, and your data is a representation of that theorization. Each choice that you've made is a choice, and so that's your data. And so when someone downstream gets this TEI marked-up text, they are getting an artifact of this really deep and profound intellectual and scholarly process. So part of what one has to do is more than just what we call a data dictionary. You know, these encoded texts are also associated with commentary about the editorial process, and this stuff goes into the header, but also in documents like the one you wrote, your guide to the encoders on what to do, which basically represents your understanding of what this text is, and what Mayan textuality is, to some degree, and Mayan language.
Allison Bigelow 30:18
But yeah, because our team has so many students, and now because we have eight collaborators working in Mexico, four Yucatec Maya and four non-native, and then we have five highland Maya scholars on the Guatemalan team, who are from K'iche', Kaqchikel, and Tz'utujil communities, I realized that we had never actually written down our theory of the text. We just coded it. We had never written down encoding guidelines in a way that would guide people step by step. So that was what I was working on this week until my kid got sent home from daycare, or childcare.
Raf Alvarado 30:54
It's really fascinating, that document. Like I was telling you yesterday, I think it's one of the fruits of our labor, right, this collective labor that goes in: it's this understanding of the text. And the point is, you know, it's a false separation to think of technical and scholarly stuff as apart. This stuff happens together, it's interwoven. And this is part and parcel of what I think data science is. It's not just, you know, doing your analytics or that sort of stuff; that's obviously super important. But it's in the production of data and in the representation of the world in data. And to speak to its relationship to what we in the School of Data Science call the Four Plus One model: this is very much, obviously, in the space of value to a great degree, because it's about, you know, involving indigenous communities in the production of these texts, and in the results, and so forth. It also involves data sovereignty issues that we should probably talk about. But it's also in the space of what we call design, which is all about how the world is represented in data, which is not a trivial process, which is a human process. And that's what we're doing, basically. It's fascinating to me how much is being produced in that work.
Allison Bigelow 32:17
You know, so should we talk about what the Yucatec and the K'iche' teams are producing, and data sovereignty? Yeah, absolutely. Okay. So basically, the way that we conceived of the current iteration of the Multepal project was we reached out to scholars we knew from Native communities who work extensively in their communities, and said, we do this thing with the Popol Wuh, is there any way that this would be helpful for your scholarship, your research, or your community engagement and your teaching? And each PI said, yes, we can find a way to use that. And so each team developed its own separate project. So the K'iche' team, based in Antigua, is producing an extremely scholarly digital critical edition of the Popol Wuh in K'iche', based on the original surviving manuscript, but with modern orthography, correcting things like punctuation and word breaks. They're finding a lot of morphosyntactic change. They're finding really interesting linguistic elements in the text. And they're explaining all of their decisions about where they deviate from the manuscript or where they update things in footnotes that are in Spanish and K'iche', so that teachers in Guatemala's national program of bilingual and intercultural education can use that material to explain linguistic concepts to students in the classroom. So what they're producing is extremely detailed, painstaking work of going line by line, word by word, character by character through the 56 folios of the original manuscript, to produce something that's really designed for classroom use.
The Yucatec team identified a totally separate need for their students. They said, you know, serving the communities here, we find that there is a decent amount of printed materials. There are also things for kids, like memory games that are in Yucatec Maya to build vocabulary. So we sort of have a set amount of physical materials; what we really need is cool stuff that kids can watch on their phones to engage with the language. So they are making a series of five five-minute videos that are animated using a combination of classic Maya iconography and, like, anime-style superheroes. They have a team of student illustrators who are painting all of the background scenes, developing the characters, and then putting everything into motion with music and their own original soundscapes. The audio is entirely in Yucatec Maya, but you can put the subtitles on in Yucatec Maya, Spanish, or English, so that students can either practice systemic phonics to hear Maya and read it at the same time, or hear the Maya and have the Spanish translation so that they can build comprehension with spoken Maya, or practice their English at the same time, particularly if they're more comfortable in Mayan than they are in English. So what they're developing is a resource for youth language learning and building excitement, but also translation, for primarily, like, sixth grade through 12th grade students. And so each team, serving the language learning needs of its own people, has developed a radically different project. And then our challenge here is to design the architecture to be able to host these two projects and create a platform for them to be shared with students and teachers and faculty.
Raf Alvarado 37:19
Right, and also just in keeping with our dialectic of abstract and concrete: one of the challenges of the grant, and of our project in general, is that we have all these variant versions of the Popol Wuh, and of course there are many that already exist. There are hundreds that already exist, but there are a handful out there that are sort of more popular. In any case, there's a whole variety of variants of the Popol Wuh, and we're sponsoring the development of others. And we'd like to incorporate those into a single system that's sometimes called a multi-text, sometimes called a variorum. And the idea is that it should in principle be possible, because so many of them are based on the original oral narrative transcribed by a Franciscan friar, Dominican, excuse me, whose text is the basis of so many translations. We would like to be able to map every text to its genealogy, coming from that original text, or from something else if that is where it comes from. So you can look at a passage in the newly emerging versions of the K'iche' digital critical edition and see how it is connected to the Ximénez manuscript, then how that's connected to Christenson's version. And that's all afforded by the fact that we're using a common markup schema. A markup schema is just a data representation of text. And so once you have something in that data representation, there are ways of connecting things in very interesting ways, and then you can visualize the stuff later, or you can traverse the network of textual content. And then also produce the things you're talking about, which are these very interesting versions of the text that people can explore and read on their own.
Allison Bigelow 38:04
The only thing I can think of that we mentioned that we haven't talked about are annotations and data sovereignty.
Raf Alvarado 38:09
Yeah. So I'll mention something about annotations. An annotation, just as a side note, is a very traditional way of reading a text that goes back to ancient times. It's in biblical scholarship, classical scholarship, where you're reading a text, and the text is considered very important, and so you want to comment on that text. There are volumes, for example, of commentary on Dante's Inferno. And we've done the same thing with this text, where we have our students and other scholars who read the texts and say, oh, that's an interesting passage here. So what we've done is created a tool where people can write sort of mini essays and specify exactly where they point to in the text. And so when you look at the text, or the interactive viewer, you can see, oh, there's a commentary on this passage, let me go check it out. You click on the link, and you get to the commentary. And that's an addition to the sort of encyclopedia that we're talking about. And that's really cool. So we have a lot of interesting work there. And this work, to me, is fascinating, because these are short little pieces, but many of them have, you know, really deep scholarly contributions to the field and could be publications in their own right, especially if we pull them out and sort of group them together. So that's the annotation component. And again, that's part of the multi-text: an annotation points to an element in the text, and that element in the text may be a translation from some other element in a primary text. And so it forms this network of lexical units, if you will. Say something about data sovereignty.
Allison Bigelow 39:51
Sure. So the principle of indigenous data sovereignty really grows out of what information science studies scholar MIT said two articles. First World Indigenous communities so US, Canada, Australia. It maps somewhat uneasily onto Latin American indigenous communities and their concerns. But the principle, I think, remains the same. So the basic principle, as it's articulated by groups like the Maori in what is now New Zealand, and the Native Nations Institute at the University of Arizona, is that, since 1492, data and information and knowledge about indigenous communities has been extracted from them in extremely exploitative ways. It has not been used to benefit the community, but has instead been used to make others wealthy or famous for their publications, or it's been used as a form of social and professional currency. And so the theory of data sovereignty is that indigenous communities should get to control who knows what about them, and how they use their own data. So for our project, that means that each community that develops its own approach to the Popol Wuh, its own particular product, whether it's a text or a video, gets to control how it circulates in the world and what it looks like. We don't make content or editorial decisions about what each team does. The teams disagree too at team meetings. For example, the K'iche' team thought the Yucatec team was a little too modern in its interpretation of the Popol Wuh. But out of respect for the intellectual and data sovereignty of the Yucatec team, the K'iche' team makes its comment, shares the feedback, and then lets the team proceed as it wants, which is the same thing that we do. We weigh in as colleagues, but we don't own the data in any way.
Raf Alvarado 41:54
Yeah, and it's a really interesting issue, because there is an ethos that, I guess, data science shares with the sort of techie worldview that, you know, has been with us since the web became popular. We talked about open source and open access; there's this view that you often hear called "information wants to be free." And it's the view that motivates Wikipedia, the Internet Archive, and a lot of other efforts just to get information out there. For example, the Internet Archive's mission is to get anything of cultural value, historical value, out on the web, so that everybody can access it. And they're very aggressive about this, even going into museums and taking pictures of things that they're not supposed to take pictures of. And, if they still do that, they, you know, cut corners on copyright. And it's all in this sort of value that information wants to be free. However, that runs into, you know, severe conflict when you're looking at, for example, an Australian group who uses tjuringa to basically legitimate land-rights claims, to waterfalls and things like that. And there are these artifacts, which encode stories, that are really cool; you can look at them as art. And in fact, we have here the Kluge-Ruhe art museum in Charlottesville. Margot Smith has set up, you know, she did fieldwork there in Australia, there's all this Australian art. But the issue there is a lot of that stuff isn't, quote unquote, art. It's basically proprietary information. And it's what people use to legitimate, like I was saying just now, claims and so forth, and it's not meant to be shared. And that's just one example.
And so they want to basically own that information and be in control of who gets to see it and what you learn about it. It could be lore, for example: not even the objects themselves, but the knowledge of how to decode them, as it were. It's not meant to be shared with the world. And so those issues are, you know, front and center in projects like this. We are just a little bit different, because the primary object surfaced so long ago, and it was an extraction in itself, right? The Popol Wuh manuscript that we have was transcribed from what we think is an oral narrative from the K'iche' Maya, by a Christian priest, in order to convert them. So it was part of a project to learn about indigenous beliefs, ideas of divinity, and so forth, and to take these ideas and use them as a language both to learn about the indigenous, but also to translate Christian ideas into. And so it's part of this project already, and it's already out there. And so it's problematized by that. And then, is it true that in the Guatemalan case, do you feel like there's more of a sense of ownership of the narrative, because it is Guatemalan, and it is tied to Guatemalan nationalism these days?
Allison Bigelow 45:04
I would say the K'iche' team overall has a more traditional approach to understanding the text. But it's also hard to know how much of that is just the team leader versus the whole team, because we don't attend each of their team meetings, we only hear his updates. Right. So it's entirely possible that, like, Aj Xol, who's the poet on the team, has a different interpretation than Ajpub. And my guess is that, of the five, there are probably five different interpretations.
Raf Alvarado 45:37
To me, that's a really interesting point, too. I mean, you've probably noticed this too in terms of how translations differ. There's one thing I noticed: translators decide which metaphors they keep and which ones they bury. For example, sitting on the mat is a metaphor for rulership. And so some translators, if there's a passage where, you know, somebody sat on the mat, will say "somebody sat on the mat" and have a footnote that says sitting on the mat, by the way, means ascending to power. Others will say "he ascended to power," and they'll just elide the metaphor. And so anybody downstream reading the text won't get what I call the metaphoric, or the imagery, that sort of animates the idea. And I don't think anybody's consistent about which things they gloss and which things they treat as concrete. But when you do the work that you're doing, you have to be concrete the whole time, because you're marking up the text, and you're marking up the particular language that's used to signify something that might be signified otherwise, more abstractly, in the translation. And I think that's super important to understanding what the text says.
Allison Bigelow 46:43
And then people who use the text map that you're building can see all of those variant translations and visions that others have made.
Raf Alvarado 46:50
Yeah, so in some way, our culture is a collection of metaphors. And that's kind of what we're documenting here.