Resurrecting Dead Languages With AI, Machine Learning

Algorithms and Dead Languages

Here is your fun fact for the day: the Rosetta Stone is actually broken. It was already a fragment of a larger slab when Napoleon’s soldiers found it in Egypt in 1799. Go figure. In a way, it’s a great metaphor. Because it carries the same decree in three scripts (Egyptian hieroglyphs, Demotic and Ancient Greek), the Rosetta Stone gave scholars the key to reading hieroglyphs, proving itself a valuable aid in putting back together the pieces of a language that had been broken and lost over time. The value, though, is not merely in being able to translate ancient languages; it’s in all the history that comes with being able to read ancient texts for the first time. Suddenly a whole new perspective on historical events opens up, or knowledge we could never have had otherwise is unlocked. Putting an ancient language back together doesn’t just open up words, it opens up literal worlds.

Now, the geniuses over at MIT have come up with another tool that we can use to unlock a few more. Researchers at the Computer Science and Artificial Intelligence Laboratory (CSAIL) have developed a new system that can actually decipher lost languages. Best of all, it doesn’t need extensive knowledge of how a lost language relates to already known languages to crack the code. The program can figure out on its own how different languages relate to one another.

So, how does that wizardry work? One of the chief insights that makes CSAIL’s program possible is that languages only develop in certain ways. A spelling can drift toward a letter that sounds similar, but it is unlikely to jump to one that sounds very different. Based on this and other insights, the researchers were able to develop an algorithm that can pick out a variety of correlations between words in different languages.
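To make that idea a little more concrete, here is a minimal sketch in Python. It is not CSAIL’s algorithm. It is just an ordinary edit distance in which swapping two similar-sounding letters is cheap and swapping two very different ones is expensive. The sound groups, the costs and the function names are all made up for illustration.

```python
# Hypothetical sound classes: letters in the same group are assumed to sound alike.
# These groupings and the costs below are illustrative assumptions, not CSAIL's model.
SOUND_CLASSES = [set("pb"), set("td"), set("kg"), set("fv"), set("sz"), set("aeiou")]


def substitution_cost(a, b):
    """Cost of letter `a` turning into letter `b` between two related languages."""
    if a == b:
        return 0.0
    for group in SOUND_CLASSES:
        if a in group and b in group:
            return 0.5  # similar sounds: a plausible shift
    return 2.0  # distant sounds: an unlikely shift


def sound_aware_distance(source, target, gap_cost=1.0):
    """Edit distance where substitutions are weighted by how alike the letters sound."""
    rows, cols = len(source) + 1, len(target) + 1
    dist = [[0.0] * cols for _ in range(rows)]
    for i in range(1, rows):
        dist[i][0] = i * gap_cost  # delete every letter of source
    for j in range(1, cols):
        dist[0][j] = j * gap_cost  # insert every letter of target
    for i in range(1, rows):
        for j in range(1, cols):
            dist[i][j] = min(
                dist[i - 1][j] + gap_cost,  # drop a letter
                dist[i][j - 1] + gap_cost,  # add a letter
                dist[i - 1][j - 1] + substitution_cost(source[i - 1], target[j - 1]),
            )
    return dist[-1][-1]
```

With these made-up costs, `sound_aware_distance("pater", "bater")` comes out at 0.5, while `sound_aware_distance("pater", "kater")` comes out at 2.0. The "p" to "b" shift is treated as a believable change, and the "p" to "k" jump is not.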

Of course, such a thing has to be tested before it can be trusted. If you don’t test your language detector, you get bad translations. That’s probably how the whole “the Maya said the end of the world would be in 2012” thing started. One intern with a bad translator program took it from, “And then I decided I could stop chiseling the years now. I’m a few centuries ahead,” to “the earth will stop completely rotating in 2012”. Fortunately, the researchers at MIT were a bit brighter than that. They tested their program against several known languages, and it correctly pointed out the relationships between them and put them in the proper language families. They are also looking to supplement their work with context to help determine the meaning of completely unfamiliar words, similar to what most people do when they come across a word they don’t know: they look at the entire sentence and try to figure out the meaning from what surrounds it.
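The family test described above can be imitated on a toy scale. The Python sketch below is only an illustration and not the CSAIL system: it uses plain edit distance and a tiny hand-picked list of three words (night, milk, eight) per language, then reports the family of whichever known language sits closest on average. The `KNOWN` table and the `guess_family` helper are hypothetical names invented for this example.

```python
def edit_distance(a, b):
    """Plain Levenshtein distance: the number of single-letter edits between two words."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                        # delete char_a
                current[j - 1] + 1,                     # insert char_b
                previous[j - 1] + (char_a != char_b),   # substitute, free if equal
            ))
        previous = current
    return previous[-1]


# Toy reference data (an assumption for illustration): family, then the words
# for "night", "milk" and "eight" in that order.
KNOWN = {
    "Spanish": ("Romance", ["noche", "leche", "ocho"]),
    "Italian": ("Romance", ["notte", "latte", "otto"]),
    "German": ("Germanic", ["nacht", "milch", "acht"]),
    "Dutch": ("Germanic", ["nacht", "melk", "acht"]),
}


def guess_family(words, known=KNOWN):
    """Return the family of the known language whose words are closest on average."""
    def average_distance(entry):
        _, known_words = entry
        total = sum(edit_distance(w, k) for w, k in zip(words, known_words))
        return total / len(words)

    family, _ = min(known.values(), key=average_distance)
    return family
```

Treating Portuguese as the "mystery" language, `guess_family(["noite", "leite", "oito"])` returns `"Romance"`. Three words prove nothing, of course, which is exactly why the real system had to be checked against languages whose relationships are already known.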

Led by Professor Regina Barzilay, the CSAIL team has developed an incredibly useful tool to help us understand not just the events of times gone by, but the way people thought back then. By better understanding the languages of the past, we can learn why people did what they did. We could gain valuable insight into cultures long dead to us. That knowledge will in turn help us to better understand our past and how we got to where we are. It gets us more information, information straight from the source, or at least closer to it. If TARTLE likes anything in the world, it’s getting information straight from the source.

After all, that’s what we preach day in and day out around here: getting our information from the source, and minimizing false assumptions and bias when it comes to analyzing it. It’s great to see that same spirit at work in one of the world’s premier research centers and to see it being applied to our past.

