Discover how UVA professors Allison Bigelow and Rafael Alvarado are partnering with Mayan scholars and communities to preserve and revitalize K’iche and Yukatek languages. Through innovative digital tools, including text encoding and hand-drawn animations, their project bridges a centuries-old narrative with modern technology. Learn about the principles of data sovereignty, the challenges of cultural preservation, and how community-led efforts are shaping the future of Mesoamerican language and literature.
Digitalizing the Popol Wuj
Emily Mellen 0:07
Welcome to Global Research Bytes. I'm Emily Mellen, and I'm here with Allison Bigelow, Professor of Spanish in the College of Arts and Sciences, and Rafael Alvarado, Professor of Data Science in the School of Data Science. Congratulations on your recent CGII Center Grant. Could you tell us about the background of the project?
Allison Bigelow 0:24
Yeah, of course. And thank you so much for having us. It's always fun to get to talk about the work that we're doing and the ways that UVA and the wider community have helped to support us.
The Popol Wuj is the longest and most complete pre-1492 Indigenous narrative to survive the Spanish conquest and the invasion. The oldest copy that we know, the oldest version of it-- because it circulated, probably in visual forms, as an oral narrative, and in hybrid forms-- was written down around 1552 to 1556. That document is now lost, but there was a copy made of it in 1701 by a friar who, ironically, sought to eradicate all of the cultural norms that he recorded on paper. So, the one text that we have is actually deeply unreliable and suspicious, and yet it is the only one that we have. Based on that one manuscript copy from the early 18th century, some 1200 editions of the text have been published in 30 world languages, including at least 10 or 12 Mayan languages.
So, there are, as you might imagine, a range of interpretations. Everyone reads the text in different ways, and there's no one definitive way to read it. This represents really interesting opportunities for scholars and for linguists, historians, literary scholars, anthropologists and especially for native speakers, because it means that everyone has a chance to weigh in on what they think the text is saying.
So, our project essentially uses text encoding, so we take the manuscript, and we have marked it up so that if you are reading our edition online, and you get to the name of a god or a town or a material cultural artifact, like a particular stone or metal, color, a plant resin, and you don't know what it is, you can click on it, and the description will pop up in a different window, usually in English and in Spanish, with links to images and related themes from the text and a list of secondary sources. And so the idea is that we're building this database of 2500 Mesoamerican cultural topics that help readers understand the text as they read it, and eventually they'll be able to compare across editions.
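The click-through glossing Bigelow describes can be pictured with a small sketch. The project's actual schema and database are not described in this conversation, so everything below is an assumption made for illustration: the `line` and `term` elements, the `ref` attribute, the `TOPICS` dictionary, and the placeholder text are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical encoded line. The element names, attribute names, and the
# placeholder wording are illustrative only, not the project's real markup.
ENCODED_LINE = (
    '<line n="1">A line that mentions '
    '<term ref="topic-deity-example">[name of a deity]</term>'
    ' here.</line>'
)

# Hypothetical topic records, standing in for the database of cultural topics.
TOPICS = {
    "topic-deity-example": {
        "en": "Placeholder English description.",
        "es": "Descripción de ejemplo en español.",
        "sources": ["Placeholder secondary source"],
    }
}


def plain_text(xml_line):
    """Return the reading text of a line with all markup stripped."""
    root = ET.fromstring(xml_line)
    return "".join(root.itertext())


def annotations_for(xml_line, topics):
    """Return (tagged text, topic record) pairs for every tagged term."""
    root = ET.fromstring(xml_line)
    results = []
    for term in root.iter("term"):
        record = topics.get(term.get("ref"))
        if record is not None:
            results.append((term.text, record))
    return results


if __name__ == "__main__":
    print(plain_text(ENCODED_LINE))
    for text, record in annotations_for(ENCODED_LINE, TOPICS):
        print(text, "->", record["en"], "/", record["es"])
```

The point of the design is that the reading text and the explanatory material stay separate: the manuscript line only carries a pointer, and the bilingual description, images, and sources live in a record that any edition can link to.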
Emily Mellen 2:49
And in addition to being in collaboration with each other and utilizing each other's talents, you're also really grounded in the communities that you're working with. Could you share why it's important to have these community members leading the creation of digital tools and how this approach impacts the project's outcomes?
Rafael Alvarado 3:06
Well, I could speak to the two parts of this. One of them is the authenticity of just having the actual people whose language it is inform the interpretation and the encoding of text, and I'll let Allison speak to that. But there's also just a very general principle of what we call data sovereignty, which is that this idea that you go into a community and say, “Hey, we're going to encode all your texts. We're going to take this stuff, and we will kind of talk to you afterwards, or maybe include you at some point in the footnote,” is not a practice we want to condone. We think that that reproduces colonial structures, and we want to make sure that the information flow is bidirectional, if you will, and to actually include the people who actually own these texts in the process of encoding them and sharing them on the web as a resource. And so, there's a really strong, you know, emphasis there on making sure the process of what we call remediation doesn't reproduce these structures of extraction.
Allison Bigelow 4:06
Those tools are fine if we write the content based on published scholarly sources, but we are not the best positioned people to say what the text is actually trying to do, to understand the larger meaning, because we don't... We're not native speakers of K’iche or Yukatek, and we don't come from those communities. And so, the people who are best positioned to be able to interpret this very fragmented, suspicious document that we have are the people who are living in community today. So, that's why our team is-- we have three teams on our project. We run the UVA team, but we also have a Guatemala team that's led by Ajpub’ Pablo García Ixmatá, who's a Maya Tz’utujil scholar, and a Yukatek team led by Irma Yolanda Pomol Cahum, who's Yukatek, and Miguel Óscar Chan Dzul, who is also Yukatek, and each of the teams is creating a different version of the Popol Wuj to respond to the unique language learning needs of their own community.
So, in the Guatemala team, they noticed that teachers in Guatemala's national program of bilingual and intercultural education didn't have good, authentic, accessible editions of the Popol Wuj that they could use in the classroom. The easiest edition for them to get is a Spanish version that actually erases a lot of the Indigenous elements from the text, from its phonetics, from its cultural systems. So that's what they wanted to do.
The Yukatek team had felt that there was a good, reliable edition in Yukatek, but they didn't have anything that was really fun that kids would want to use to learn the language. So, they are making a series of animated videos. They're like five-minute digital shorts. Everything is drawn by hand by students at La Universidad de Oriente, which is a Maya-serving institution in Valladolid, in Mexico. And so that's why our project outcomes are sort of inseparable from the community members who are leading them, because they are scholars who are from the community, who have surveyed what their people have and don't have, and then developed a specific project to address what they don't have.
Emily Mellen 6:23
You were just mentioning the use of this in schools. What is the importance of sharing this narrative and revitalizing it for these younger generations, specifically through digital resources?
Allison Bigelow 6:35
Yeah, so digital resources represent a really important avenue for sharing knowledge and for getting kids using the language, right? Language depends upon a large body of living speakers. I don't know if you've read the Department of the Interior's report about the Indian boarding schools in the US, but it was just published a couple of months ago. One of the major conclusions was that if you are looking to enact a program of cultural genocide, the most efficient way for a state to do that is by separating children from their families, because then the language dies, and then they lose those connections to their heritage and through their culture, to ways of understanding the world and to the tools that language gives you to make sense of things.
So, the two language communities that we're working with are K’iche and Yukatek. K’iche has about a million native speakers in Guatemala and in the diaspora. Yukatek has about 700,000. But both of them face a lot of pressure from languages like Spanish and English. They don't always have good documentation on the language. There might be an edition of a dictionary that is really credible and reliable and wonderfully produced, but it's printed on paper and it lives in a government office and is not in a school room where kids can use those resources. And it is highly unlikely that the regions where we work are going to have cable and broadband, but almost everyone has a cell phone.
So, in Guatemala, there are more cell phones than there are citizens. In Yucatan, the rate is about 80% of citizens over the age of six who have a phone, a cell phone. So, mobile communication of information is actually a lot easier than print-based dissemination of texts and of literature and narrative and history that connects the next generation of leaders and scholars to the traditional knowledge that has guided their people since time immemorial. So, engaging with young people through digital resources is really critical for the long-term language preservation and revitalization.
Emily Mellen 9:06
Looking ahead, what do you hope the long-term impact will be of these digital tools on the preservation and revitalization of Mayan languages, these Mayan languages?
Rafael Alvarado 9:17
So, to answer that question, that's a really great question. I think it's helpful to think about what we mean when we're talking about the technology and what supports all the kind of activity we're doing. So, we're talking about digitization and converting it into a digital medium, but our technology choices are different than what has come before.
So, for example, there have been other efforts to digitize the Popol Wuj and resources, but they're oftentimes using tools that are not open source or not necessarily developed for scholarly purposes. They're meant for other sorts of things, like putting something in a PDF. You know, PDFs are designed for just printing. And so, the technologies that we use were meant from the beginning to support all these activities. They have two qualities that I would point out, and then I'll show how these sort of help in the future.
One is that they make these texts available in a very open way. So, you don't have to go to a library, you don't have to have a special tool to read the text. It's available online. And as Allison was saying, it's something you can read on your phone. And so, it's widely distributed. Basically, if something's not on the web going forward, and as we go even further into the future and look toward large language models, if it's not in that space, it's probably not going to exist in a certain real sense. These things become esoteric, and so putting it in that space is really helpful.
The other thing is that the encoding model that we use is really generous and adequate, I would say. So, like we were saying, we don't own the interpretation of a text, and so the technology that we use, this kind of markup language, allows you to explicitly encode your assumptions about the text, where you say, “We think this is actually a paragraph break,” or “We think this is actually a phrase that refers to a deity,” and if someone else has a disagreement, that can be encoded as well. And so, we have a very open model, if you will, for how you actually encode the text, and that kind of future-proofs it and makes it available to more people and to future changes and amendments.
When you look at the future of this, one thing that we have our eyes on is expanding this model to other texts and coming up with a corpus of Mesoamerican literature in general. The body of literature, such as it is, includes other things besides the Popol Wuj, colonial texts and so forth, though they're not quite as extensive, and we'd like to encode those as well in the same open model.
And speaking of large language models, we are increasingly becoming mediated by the use of large language models for better or for worse, for all kinds of things, for information retrieval. And it's a challenge for what we call “low resource” languages to participate in this new sphere of data, if you will. And that's one thing that I think we need to be paying attention to, is making sure that these languages do end up being part of whatever resources are out there that make use of language, and which have impacts on people's lives and decision making and representation. So, we're thinking in terms of that as well.
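Alvarado's earlier point, that an editor's assumptions about the text are encoded explicitly and a disagreement can be encoded alongside them, can be sketched in a few lines. This is only an illustration under stated assumptions: the `seg` and `interp` elements, the `resp` and `type` attributes, and the editor labels are hypothetical and are not taken from the project's markup.

```python
import xml.etree.ElementTree as ET

# Hypothetical segment carrying two competing interpretations of one phrase.
# Element names, attribute names, and editor labels are illustrative only.
SEGMENT = (
    '<seg n="12">'
    '<interp resp="editor-a" type="deity">reading A</interp>'
    '<interp resp="editor-b" type="place">reading B</interp>'
    '</seg>'
)


def readings(xml_segment):
    """Map each responsible editor to their (category, reading) pair."""
    root = ET.fromstring(xml_segment)
    return {
        node.get("resp"): (node.get("type"), node.text)
        for node in root.iter("interp")
    }


def add_reading(xml_segment, resp, kind, text):
    """Return new markup with one more interpretation appended.

    Existing interpretations are left untouched, so a later disagreement
    is recorded next to earlier readings instead of replacing them.
    """
    root = ET.fromstring(xml_segment)
    new_node = ET.SubElement(root, "interp", {"resp": resp, "type": kind})
    new_node.text = text
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    updated = add_reading(SEGMENT, "editor-c", "title", "reading C")
    for editor, (kind, text) in readings(updated).items():
        print(editor, kind, text)
```

Because every interpretation names who is responsible for it, a reader or a later editor can see which claims are editorial judgment and add their own without overwriting anyone else's.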
Emily Mellen 12:16
Absolutely. Thank you so much for talking with me today, and I look forward to hearing more about how your project continues.
Rafael Alvarado 12:23
Thank you for having us.
Allison Bigelow 12:23
Thank you! *indistinguishable*