Machine learning deciphers the world's oldest texts
AI reveals the hidden stories of cuneiform texts.
When most of us think about artificial intelligence, something futuristic probably springs to mind: self-driving cars, for example, or voice-enabled virtual assistants.
But sometimes, to understand the future, it helps to look back.
In this case, way back.
That's what UCLA researcher and University of Toronto PhD candidate Émilie Pagé-Perron is doing. She's coordinating a program called "Machine Translation and Automated Analysis of Cuneiform Languages."
Cuneiform is the oldest known writing system in the world. It's more than 5,000 years old. It evolved in ancient Mesopotamia, in what is now Iraq.
For 3,000 years, cuneiform texts, made up of symbols pressed into clay tablets, documented life in Babylonia and Assyria, where the wheel, maps, and the 60-minute hour, among numerous other inventions, were first conceived.
The Epic of Gilgamesh was originally written in cuneiform.
But as the civilization declined, cuneiform vanished, along with the Sumerian and Akkadian languages.
Today, most cuneiform tablets remain untranslated. Pagé-Perron wants to address that, and so she's turned to AI. Using the small number of existing translations as a guide, her goal is to develop algorithms and techniques that scholars, and indeed anyone, could use to translate and interpret the hundreds of thousands of texts that have never been translated.
Nora Young: So you got this huge number of texts. How big a job would it be to have human beings actually go and translate these texts?
Émilie Pagé-Perron: I'm not sure if it's possible, actually, because there are not many people who actually understand it as of right now.
NY: So you train the AI on this existing corpus of data and then the idea is that it goes out and tries to find pattern matches in the remaining sets?
EPP: Absolutely. We're using two different methods, so we are training our algorithms on a specific set that we've created manually, but we're also using methods that don't require training. We're using both and we're trying to find the best methods in both camps. And at the end of the project we hope to merge them into a pipeline that will render the best machine translation results possible.
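To make the idea of combining the two camps concrete, here is a minimal Python sketch. It is not the project's actual code: the tiny glossary, the word-aligned training pairs, and the function names (`train_glosses`, `pipeline`) are all hypothetical, chosen only to illustrate one method that learns from hand-made examples, one that needs no training, and a pipeline that falls back from the first to the second.

```python
from collections import Counter, defaultdict

# Toy hand-made glossary (illustrative data only, not the project's lexicon).
LEXICON = {"lugal": "king", "e2": "house", "udu": "sheep"}

# Toy word-aligned training pairs (hypothetical, created for this sketch).
TRAINING_PAIRS = [
    (["dingir", "gal"], ["god", "great"]),
    (["dingir"], ["god"]),
]


def train_glosses(aligned_pairs):
    """Trained method: learn the most frequent gloss for each source word."""
    counts = defaultdict(Counter)
    for source_words, target_words in aligned_pairs:
        for src, tgt in zip(source_words, target_words):
            counts[src][tgt] += 1
    return {src: c.most_common(1)[0][0] for src, c in counts.items()}


def pipeline(tokens, trained, lexicon=LEXICON):
    """Prefer the trained gloss, fall back to the glossary, else flag the token."""
    result = []
    for tok in tokens:
        gloss = trained.get(tok) or lexicon.get(tok) or f"<{tok}>"
        result.append(gloss)
    return result


trained_model = train_glosses(TRAINING_PAIRS)
print(pipeline(["lugal", "dingir", "e2"], trained_model))
# prints ['king', 'god', 'house']
```

Tokens that neither method recognizes are left in angle brackets, so a human reader can see exactly where the automated output runs out.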
NY: Aren't there some inherent problems in translating the language of a culture that's so remote from us in terms of time? This is thousands and thousands of years ago, and a very different society. Are there challenges in that respect?
EPP: Yes, absolutely. We can't go further than what we know already about those texts, of course.
NY: One of the things that we're seeing AI used more and more for is trying to assess or process very large amounts of data in new and interesting ways. How broad do you think the impact of this kind of technology is not just on your specific project, but any field of study that requires looking at these huge datasets?
EPP: For us one of the most important aspects is that we'll be able to treat the data statistically. We have a lot of studies—tons of books and articles written about research on one specific city, and one specific corpus.
But what will be really interesting then is to be able to compare all of those together using these large amounts of data, which will also be linguistically annotated. So then we have even richer data to work with.