Preserving Language with Machine Learning
Everyone talks about different species going extinct. And for good reason. Anytime a species goes extinct, something unique and unrepeatable has been lost. While most don’t stop to think of it, the same is true for languages.
Over the course of human history, a great many languages have been lost to the sands of time. When a language disappears, it takes away more than just a few words or sounds; it takes with it a way of thinking, of seeing the world and expressing thoughts about it. When a language is lost, we lose the most important tool for understanding a culture. In fact, you could say that when a language dies, a culture dies with it. That’s because every culture has certain concepts or ways of putting thoughts together that are simply lacking in others. For example, German has a feature that lets people string multiple words together to create one new word that represents a new concept.
There are numerous old fishing villages in Ireland. In many ways, these villages are the last vestiges of the old Irish language. Not only are there still those who speak the language of their ancestors, there are words and concepts used that are unique to each village, words for the different waves and for different tools that might not exist anywhere else.
There are a lot of different ways languages are lost. Sometimes, a language evolves so much that it becomes an entirely new one for all intents and purposes. Just try to read a copy of Beowulf in the original Old English. In the past, it was not unheard of for a conquering power to outlaw the language of its defeated enemy in order to destroy their culture and assimilate them into that of the victor. Other times, the loss of language is a function of trade. As an upstart company or industry moves in, people will adopt the language that opens up the most economic opportunity. Coupled with the fact that shifting economics can do away with the need for certain concepts, it’s easy to understand how people might unconsciously let the old words and concepts disappear.
What can be done to preserve languages that are on the verge of being lost? There must be something beyond finding the couple of villages that are still speaking Gaelic and putting them in an isolated biosphere. What if we could actually use machine learning to help us preserve at-risk languages?
These old words can be collected into databases, like giant digital dictionaries. Not only the words and their meanings but also their concepts and histories can be stored in an easily searchable format. On top of that, machine learning (as has been previously discussed here) is very good at recognizing patterns. As such, it can be used to help fill holes in the language: to determine the meaning of words whose definition no one recalls, or even to point the way towards whole words that have gone missing. Even better, machine learning can help researchers determine how words were pronounced or how entire sentences might have been put together, and so not only preserve a dying language but resurrect one already dead.
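To make the "giant digital dictionary" idea a little more concrete, here is a minimal sketch in Python of a searchable store of expressions. It is only an illustration, not how any real project works: the names `Entry`, `SeaDictionary`, and `tokenize` are hypothetical, the entries are English glosses of the coastal Irish expressions mentioned in the podcast below (the original Irish wording is not given there), and the search is a plain keyword index rather than machine learning.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    """One dictionary record (hypothetical schema, for illustration only)."""
    expression: str  # the phrase as recorded; here an English gloss
    meaning: str     # what the phrase refers to
    region: str      # where it was collected


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


class SeaDictionary:
    """A tiny in-memory dictionary with a keyword (inverted) index."""

    def __init__(self) -> None:
        self._entries: list[Entry] = []
        self._index: dict[str, set[int]] = {}  # token -> entry positions

    def add(self, entry: Entry) -> None:
        position = len(self._entries)
        self._entries.append(entry)
        # Index every word of the expression, meaning, and region.
        text = f"{entry.expression} {entry.meaning} {entry.region}"
        for token in tokenize(text):
            self._index.setdefault(token, set()).add(position)

    def search(self, query: str) -> list[Entry]:
        """Return entries containing every word of the query, in insertion order."""
        tokens = tokenize(query)
        if not tokens:
            return []
        hits = set.intersection(*(self._index.get(t, set()) for t in tokens))
        return [self._entries[i] for i in sorted(hits)]


# Sample data: glosses taken from the conversation below, not real headwords.
demo = SeaDictionary()
demo.add(Entry("white flowers on the fisherman's garden",
               "whitecaps on a choppy sea", "coastal Ireland"))
demo.add(Entry("like a salted herring from the bottom of a barrel",
               "a sarcastic person", "coastal Ireland"))
demo.add(Entry("[headword not given in the source]",
               "a three-bladed knife on a long pole, used for harvesting kelp",
               "coastal Ireland"))

for entry in demo.search("sea"):
    print(f"{entry.expression}: {entry.meaning}")
```

A real archive would also want to keep audio, spelling variants, and the history of each word, and any machine learning would sit on top of records like these rather than replace them.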
Why, though? Why is any of this important? Because these languages, these cultures, are a part of our past, and anyone with a hint of historical knowledge will tell you that if we want to know where we are heading, we need to know where we have been. If we want to preserve anything of our own culture, we had best learn why others disappeared in order to prevent ours from taking the same route.
What’s your data worth?
Speaker 1 (00:07):
Welcome to Tartle Cast with your hosts, Alexander McCaig and Jason Rigby, where humanity steps into the future and source data defines the past.
Alexander McCaig (00:25):
So, you ever heard someone speak Gaelic?
Jason Rigby (00:29):
I may have. I probably have.
Alexander McCaig (00:31):
Difficult language. Also when you hear it, you're like, "I don't know what the hell is going on." And then imagine if you had a derivative of Gaelic or you had like a proto-Irish. Specific to very specific regions meaning very specific things, depending on the cultural working. So, like a group of fishermen, they have their own things that they talk about, certain nuances. There's a lot to be learned here from cultures, subcultures and sub subcultures and the languages used and how people view and interpret things.
Alexander McCaig (01:08):
So, there's a phenomenal article on this data surrounding languages like that, languages that are essentially ceasing to exist and why it's so important that we maintain a record, a historical record that we don't delete or omit because maybe there's a little bit of turmoil and may seem wrong at the time and we just want to erase it from human memory. No, keep it there.
Jason Rigby (01:32):
Yeah. And I think whenever we look at endangered species and we see that animal and we can see it and we're like, "Oh my goodness, this guy is gone," it's like a reality. Like it's, okay, there's no more tigers, there's no more lions. I'm not saying there is now, but in the future. We look at giraffes and we see these beautiful giraffes, and then we're like, "They're gone forever." It makes your heart stop, like oh.
Alexander McCaig (01:56):
Jason Rigby (01:56):
But there are languages that are going extinct, and we have the ability right now with the responsibility of researchers to take these languages and put them in because they can help us solve the problems of what our world is facing, because we can always go back in history and you know this.
Alexander McCaig (02:17):
And we can better understand human nature. So much is baked into story, myth, legend and the way it's spoken and the context carried within that language and the way people understand ideas. German, for instance, has the ability to string together many different words to create a conceptual idea as one word. We can't do that in English. Imagine if that just eviscerated, like gone. Bye. What do you do? Then that concept, that idea, that way of thinking, that dynamicism, that explicitness of something no longer exists. You can't test against it. We need to have this record of language of the way people speak, understanding their behaviors, their mindsets. It's so important. Just as important as to look at someone's location, where they go.
Jason Rigby (03:03):
Well, it's even important when we look to the future in the sense of understanding... Machine learning loves language. It's very easy to decipher language, to understand language.
Alexander McCaig (03:12):
So much input.
Jason Rigby (03:13):
Yeah. And so whenever we're... And there's rules and all that stuff. So, it loves language. But what if we had an AI in Ireland and it's just based off of modern society?
Alexander McCaig (03:25):
Jason Rigby (03:27):
Instead of looking like... This article talks about there's coastal Irish words that were being used, and they talk about them and I'm not going to get into them, but they were talking about there's this one where it's a three bladed knife on a long pole and there's a name for that, and there's no other place in the world that has the name for that specific-
Alexander McCaig (03:47):
Yeah, for harvesting kelp. Think about what it teaches you about that culture. It's so rich. Why do you think people were so bummed out when Napoleon just broke the Rosetta Stone?
Jason Rigby (03:56):
Alexander McCaig (03:57):
It's a bummer, you know? Because we want to translate, we want to understand, we need to understand the context of history. Because our perspective of certain ideas now is completely different from how people viewed things. So, we can't take how we view things now and then go back and start to translate maybe a dead language and say that they thought exactly the same way. We may interpret that word completely out of context.
Jason Rigby (04:19):
Yeah. This almost extinct coastal Irish fishing language... Here's one example. This is really fun. A sarcastic person. And I can't speak this, but it talks about... It's described like this. So, instead of us saying, "Hey Alex, you're sarcastic," here's what it is. "Like a salted herring from the bottom of a barrel." And if you think about that, it's so descriptive. It's so beautiful.
Alexander McCaig (04:43):
Think about the sensation of a salted herring. It's the one at the bottom. It's absorbing everything. And they're so sarcastic. It's so sharp. You're like, "My God." It's almost like a little jarring when you put- [crosstalk 00:04:54]
Jason Rigby (04:55):
But what did you say? It's so descriptive because they're observing... So, a sarcastic person, this ancient wisdom, you're absorbing everything around you and that... I don't want to get too psychological here, but in you absorbing all of that around you, you're creating this negative bias.
Alexander McCaig (05:12):
And the pressure of your species on top of you? You are going to be a little sarcastic, you know? These are the phenomenal things, and the preservation of this data is deeply, deeply important. So even, for instance, the data packet on Tartle, where you begin to capture specific aspects of dialogue relative to these locales, micro or macro locales, all over the globe.
Jason Rigby (05:31):
Yes, exactly. And the European Capital of Culture is helping this, Galway's designation, in the Ireland Atlantic communities. And I love this. They're creating a sea dictionary and on this link, there's an online database for recording this, but it's more than just taking a scribe and recording it and putting it in a book or whatever, when we put data online and we have that data forever. So, we have the ability. Languages? We have the ability for them not to go extinct. Animals? I mean, we can better the climate, be safe to get rid of all the people just killing animals. Great. We can do all that, but language is an easy win for us.
Alexander McCaig (06:20):
Yeah. It's a great way for us to go back, test and retest things. Here's a favorite one of mine here from this article, how they describe a choppy sea, you only get white caps. You know the caps of the waves, how they crest over small ones. So, they call them white flowers on the fisherman's garden. The ocean is the garden of the fishermen and the white flowers are these white capping. The sea's a little rough right now. It's a phenomenal way- [crosstalk 00:06:49].
Jason Rigby (06:51):
How could we take this language and say we're gardeners of the sea? What are we doing right now with the sea?
Alexander McCaig (06:58):
I don't want to use that word. We're pillaging the sea.
Jason Rigby (07:02):
What if we decided to be gardeners of the sea? What would that look like?
Alexander McCaig (07:06):
Oh, you mean like we care for the soil. We try and create diversity within it. We don't try and take too much because it will end up killing the garden. Right? You want to only take what needs to be taken. The fruit, the germination at the top, I don't need to scrape the whole base of the ecosystem. I don't have to rip the coral out. You know what I mean?
Jason Rigby (07:26):
Yes. Yeah. I love it. And to close us out, with so much uncertainty that we have now, and then so much heritage and history that we have, we can go into the past and extract meaning to help us for the future. And how do we do that with Tartle specifically when we look at companies that are coming in, because you can sign up for Tartle, that's easy, but I want to put responsibility on corporations. I want them to see how important it is for them not to take a person's data, but to purchase it from them. There's a philosophical view on that, that we have at Tartle, why it's so important.
Alexander McCaig (08:09):
Because when you meet the person where they are and say, "I recognize the value in how you think. I recognize the value in your culture. I recognize the value in your actions. I want to give you something that resembles how much I value that information. Let me pay you for that data. Let me be responsible and respectful of you as a human being in what you create. Let me share in that with you. And I want to pay you to share because I have the resources to do so. I want to meet you where you are. I want to understand better. I want to know your story as a human being." That's a phenomenal act that a corporation could do as these resource holders, as these ones that have all the brains to do the analysis. It's a wonderful, wonderful, ethical way to source that data, like these fishermen on the Irish coast. That's how you want to be doing it. Be in touch with nature, be in touch with human beings, do not be disconnected from them, do not abuse them.
Jason Rigby (09:11):
So, how as a corporation... This will be the last question. You can go to Tartle.co and sign up. But how can they be the gardener of data?
Alexander McCaig (09:17):
You can be the gardener of data by fostering relationships with the garden, the garden being those people, the ones that germinate those thoughts and ideas, that germinate the roots and seeds of that data. And then you can come back to them, establish a gardening relationship. Give them that resource, that water, that economic fuel. You financially pay them for it. That is the water. That causes them to continue to come back and bloom for you with that data. It's not that hard to be a gardener.
Jason Rigby (09:46):
And not look at it as a one-time use, or "I've got to be passive" and go around and put people in buckets.
Alexander McCaig (09:52):
No. When a flower blooms, you never really know the nature of how creation will decide the layout of that genetics and the beauty. Every single flower is individually different. So, let it define its difference and then change your view of the world and people and how you analyze it.
Speaker 1 (10:21):
Thank you for listening to Tartle Cast with your hosts, Alexander McCaig and Jason Rigby, where humanity steps into the future and source data defines the past. What's your data worth?