Tartle Best Data Marketplace

Algorithms and Dead Languages

Here is your fun fact for the day: the Rosetta Stone was already broken when Napoleon’s soldiers found it in Egypt in 1799. In a way, it’s a great metaphor. The stone is only a fragment of a larger slab, yet the decree carved on it in Egyptian hieroglyphs, Demotic script, and ancient Greek gave scholars the key to reading hieroglyphs again, putting back together a language that had been broken and lost over time. The value, though, is not merely in being able to translate an ancient language; it’s in all the history that comes with being able to read ancient texts for the first time. Suddenly a whole perspective on historical events opens up, and knowledge we could never have had otherwise is unlocked. Putting an ancient language back together doesn’t just open up words, it opens up whole worlds.

Now, the geniuses over at MIT have come up with another tool that can unlock a few more of those worlds. A new system developed by MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) can decipher lost languages. Best of all, it doesn’t need extensive prior knowledge of how a lost language relates to known languages in order to crack the code: the program can work out on its own how different languages relate to one another.

So, how does that wizardry work? One of the chief insights behind CSAIL’s program is that languages change in patterned ways. Sounds, and the spellings that record them, tend to shift toward similar sounds rather than very different ones: a “p” in a parent language may become a “b” in a descendant language, but is far less likely to become a “k”, because those two sounds are so far apart. Based on this and other insights, the team was able to develop an algorithm that picks out a variety of correlations between languages.
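To make the sound-change idea concrete, here is a toy sketch in Python, not CSAIL’s actual model: a weighted edit distance in which a letter turning into a similar-sounding letter is cheap and turning into a distant one is expensive. The sound groups and the costs (0.5 for a plausible change, 2.0 for an implausible one, 1.0 for adding or dropping a letter) are assumptions chosen for illustration.

```python
# Toy sketch: a weighted edit distance in which a letter turning into a
# similar-sounding letter (p -> b) costs less than turning into a distant one
# (p -> k). The sound groups and costs are illustrative assumptions, not the
# values used by the CSAIL system.

# Hypothetical groups of letters whose sounds are close to one another.
SIMILAR_SOUNDS = [set("pbfv"), set("tdsz"), set("kgc"), set("aeiou")]


def substitution_cost(a: str, b: str) -> float:
    """Cost of one letter changing into another over time."""
    if a == b:
        return 0.0
    if any(a in group and b in group for group in SIMILAR_SOUNDS):
        return 0.5  # plausible sound change
    return 2.0      # implausible sound change


def change_cost(old: str, new: str, indel: float = 1.0) -> float:
    """Minimum total cost of turning `old` into `new` (weighted Levenshtein)."""
    rows, cols = len(old) + 1, len(new) + 1
    dist = [[0.0] * cols for _ in range(rows)]
    for i in range(1, rows):
        dist[i][0] = i * indel  # drop every letter of `old`
    for j in range(1, cols):
        dist[0][j] = j * indel  # add every letter of `new`
    for i in range(1, rows):
        for j in range(1, cols):
            dist[i][j] = min(
                dist[i - 1][j] + indel,  # a letter is dropped
                dist[i][j - 1] + indel,  # a letter is added
                dist[i - 1][j - 1] + substitution_cost(old[i - 1], new[j - 1]),
            )
    return dist[-1][-1]


print(change_cost("pater", "bater"))  # 0.5
print(change_cost("pater", "kater"))  # 2.0
```

A lower cost suggests a more plausible historical link between two words, so a decipherment system built along these lines would prefer word pairings that are cheap under such a measure.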

Of course, such a thing has to be tested before it can be trusted. If you don’t test your translation tool, you get bad translations. That’s probably how the whole “the Maya said the world would end in 2012” thing started. One intern with a bad translator program took it from “And then I decided I could stop chiseling the years now. I’m a few centuries ahead,” to “the earth will stop rotating in 2012.” Fortunately, the researchers at MIT were a bit brighter than that. They tested their program against several known languages, and it correctly identified the relationships between them and placed them in the proper language families. They are also looking to supplement their work with historical context to help determine the meaning of completely unfamiliar words. That is similar to what most people do when they come across a word they don’t know: they look at the entire sentence and try to work out the meaning from the surrounding context.
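The context idea can be sketched in a few lines of Python. This is a generic word-context similarity illustration under our own assumptions (a made-up three-sentence corpus, a two-word window, cosine similarity), not the MIT team’s method: an unknown word is matched to whichever known word appears in the most similar surroundings.

```python
from collections import Counter
from math import sqrt


def context_vector(word: str, sentences: list[str], window: int = 2) -> Counter:
    """Count the words appearing within `window` positions of `word`."""
    counts: Counter = Counter()
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, token in enumerate(tokens):
            if token != word:
                continue
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            counts.update(tokens[j] for j in range(lo, hi) if j != i)
    return counts


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors (0.0 if either is empty)."""
    dot = sum(a[k] * b[k] for k in a)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def closest_known_word(unknown: str, known: list[str], sentences: list[str]) -> str:
    """Guess an unknown word's meaning from the known word with the most similar contexts."""
    target = context_vector(unknown, sentences)
    return max(known, key=lambda w: cosine(target, context_vector(w, sentences)))


corpus = [
    "the farmer milked the cow at dawn",
    "the farmer milked the zorp at dawn",  # "zorp" is the unknown word
    "the king rode his horse to war",
]
print(closest_known_word("zorp", ["cow", "horse"], corpus))  # cow
```

Because “zorp” shows up next to “milked” and “dawn” exactly where “cow” does, the sketch guesses it means something like a cow, which is the same reasoning a reader applies to an unfamiliar word in a sentence.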

Led by Professor Regina Barzilay, the CSAIL team has developed an incredibly useful tool to help us understand not just the events of times gone by, but the way people thought back then. By better understanding the languages of the past, we can learn why people did what they did. We could gain valuable insight into cultures long dead to us. That knowledge will in turn help us to better understand our past and how we got to where we are. It gets us more information, information straight from the source, or at least closer to it. If TARTLE likes anything in the world, it’s getting information straight from the source. 

After all, that’s what we preach day in and day out around here: get information from the source, and minimize false assumptions and bias when analyzing it. It’s great to see that same spirit at work in one of the world’s premier research centers and to see it being applied to our past.

What’s your data worth?

Deep Machine Learning   

Computers have helped drive the medical industry toward better and faster diagnoses, as well as helping to develop new treatments for a variety of conditions. Until recently, they have primarily relied on conventional machine learning models, often linear ones. This type of model is great if you already know what kind of answer you are looking for. Simple things like “does this person have lung cancer, or is a brain tumor causing these problems?” But what happens when the situation is more complex? What happens when you don’t already know what you are looking for, when you know there is a problem but have no idea what it could be? How does a conventional machine learning model handle that? In short, it doesn’t. Such a model can solve for ‘x’ very well so long as it knows that ‘x’ is either apples or oranges. When ‘x’ might not even be fruit, you need a different approach.
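Here is a minimal sketch of the kind of conventional, linear model described above, with made-up feature names, weights, and bias standing in for a model that has already been trained. It can only weigh the two features it was built around and can only give one of the two answers it knows, so even something that isn’t fruit comes back as an apple or an orange.

```python
# Hypothetical, already-learned parameters of a linear classifier.
WEIGHTS = {"weight_g": -0.01, "skin_smoothness": 4.0}
BIAS = -1.0


def classify(sample: dict[str, float]) -> str:
    """Score the predetermined features and pick one of two known answers."""
    score = BIAS + sum(w * sample[name] for name, w in WEIGHTS.items())
    return "apple" if score > 0 else "orange"


print(classify({"weight_g": 150, "skin_smoothness": 0.9}))  # apple
print(classify({"weight_g": 200, "skin_smoothness": 0.3}))  # orange
# The model was never told about tennis balls, but it still has to pick one of its two answers:
print(classify({"weight_g": 58, "skin_smoothness": 0.2}))   # orange
```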

That’s why deep learning models were developed. These models loosely mimic the way the human brain takes in and processes information. Instead of looking at a set of predetermined variables and solving for one particular answer (like you used to do in algebra class), a deep learning model takes in all the information at once and looks for correlations and patterns. Think of it this way. When you look at a picture of someone, your brain takes in all the information right away. You can see whether the person is sweating, whether their pupils are dilated, how long their hair is, freckles, whiskers, and a ton of other details. If you already have a store of relevant knowledge, you might be able to make certain deductions about that person’s health. A deep learning model does something similar. Trained on a vast amount of medical data about symptoms and their causes, a deep learning model can sometimes spot a problem before a human would. That is because the machine can hold more specialized information, recall it faster, and often pays much closer attention to detail than most people do.

A recent study in Nature Communications suggests that deep learning models could be very useful indeed for medical professionals, particularly in brain imaging. The brain is immeasurably complex, with more variables than conventional machine learning models can truly account for. Deep learning models have generally been held back, though, by how long they take to produce the results needed. Data may need to be run through the model several times to train it, giving it a chance to develop the best ways to analyze the data and produce good results. Once a deep model has been ‘trained’ in this way, its results are typically better than those of the older machine learning models.
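What “running the data through a few times” means can be sketched with the smallest possible model: one weight fitted by gradient descent, where each full pass over the data is called an epoch. A real deep network has vastly more weights, but its training loop has the same shape. The data (y = 2x), the learning rate, and the epoch counts are assumptions chosen for illustration.

```python
def train(data: list[tuple[float, float]], epochs: int, lr: float = 0.01) -> float:
    """Fit y = w * x by stochastic gradient descent, passing over the data `epochs` times."""
    w = 0.0  # the model starts out knowing nothing
    for _ in range(epochs):
        for x, y in data:
            error = w * x - y
            w -= lr * 2 * error * x  # step against the gradient of the squared error
    return w


def mean_squared_error(w: float, data: list[tuple[float, float]]) -> float:
    """Average squared gap between the model's predictions and the true values."""
    return sum((w * x - y) ** 2 for x, y in data) / len(data)


data = [(float(x), 2.0 * x) for x in range(1, 6)]  # true relationship: y = 2x
for epochs in (1, 5, 20):
    w = train(data, epochs)
    print(f"{epochs:>2} epochs: w = {w:.4f}, error = {mean_squared_error(w, data):.4f}")
```

Each extra pass moves w closer to 2 and shrinks the error, which is why training takes time: one pass over the data is rarely enough.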

Does that mean conventional, linear machine learning is strictly a thing of the past? Not at all. Those models still do better at simple tasks with a limited number of variables. In fact, the two approaches can even be used together: a trained deep learning model can identify the variables that matter, allowing a conventional machine learning model to take over and provide the needed answers.

Where might deep learning take us in the future? In addition to analyzing the brain and enabling early diagnosis of cancer, Alzheimer’s, and other diseases, it could be applied to more routine things, like an annual physical. Imagine going to the doctor and, instead of a lengthy examination (preceded by a long wait) and blood work that might take days to come back, you simply walk into a scanner, get a finger pricked, and within a few minutes a deep learning model tells you your overall state of health and anything that is wrong, complete with recommended treatments. It isn’t that far-fetched.

How can you help make that future a reality just a little bit faster? By signing up with TARTLE and sharing your medical data with universities and hospitals working on these models. That gives them more data with which to train those deep learning models, which could one day make your trip to the doctor’s office a lot easier and faster.

What’s your data worth? Sign up and join the TARTLE Marketplace.

Bad Machine Learning   

We all know how the machine learning game is played. We search for something online, talk about a topic on Facebook, buy something on Amazon, or even have a real conversation near our phones, and within moments related ads start popping up. In theory, these ads are based on algorithms that track our interests and present us with items we might actually want to buy. Sometimes this works very well, and you see exactly the backpacking tarp you never knew you needed. Other times it’s clear that the algorithm is either drunk or badly written. For example, if you’re talking about private jets for some reason, it would be silly for your phone to present you with deals on the latest Gulfstream. Yet exactly this has happened. Clearly some context is missing.

Let’s take a less extreme example. We’ve mentioned that TARTLE works on a Sherpa model. We help others achieve their goals, just like the Sherpas at Everest help others make the dangerous and rewarding climb. Let’s say this sparked an interest in Sherpa culture and we were searching eBay for books on the subject. Then, over on the sidebar, up pops an ad for a Sherpa jacket. Seriously? Wanting to learn about Sherpas doesn’t mean you want to dress like one. This just goes to show that artificial intelligence is a lot more artificial than intelligent. 

How does this happen? Quite simply, the algorithms take a lot less into account than you might think. They very often work on keywords, focusing on one or two particular data points, and then try to apply those to a type of product you aren’t already looking at. It’s a cross-marketing strategy designed to get you to spend as much as possible. So in this case, eBay’s algorithms keyed on the word “Sherpa” and fed it to the different businesses eBay has data-sharing and marketing deals with. After cross-referencing that keyword against all the products in the database, the best match they came up with was a Sherpa jacket.
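The keyword matching described above can be sketched in a few lines. The catalog, its fields, and the matching rule are hypothetical; this is not eBay’s actual system, only an illustration of how matching a single keyword across other product categories can turn a search for books on Sherpas into an ad for a jacket.

```python
CATALOG = [  # hypothetical products
    {"title": "Sherpa history book", "category": "books", "price": 18.0},
    {"title": "Sherpa fleece jacket", "category": "clothing", "price": 129.0},
    {"title": "Ultralight backpacking tarp", "category": "outdoor", "price": 45.0},
]


def keyword_ads(query: str, current_category: str, catalog: list[dict]) -> list[str]:
    """Suggest products from *other* categories that share any word with the query."""
    keywords = set(query.lower().split())
    return [
        item["title"]
        for item in catalog
        if item["category"] != current_category
        and keywords & set(item["title"].lower().split())
    ]


print(keyword_ads("sherpa books", "books", CATALOG))  # ['Sherpa fleece jacket']
```

Nothing here knows why the shopper typed “sherpa”; a shared word plus a different category is enough to produce the jacket ad.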

Now, given that model, it’s an understandable error. How could they do better? How could they provide ads within a meaningful context? There are a few ways. One is to show ads for more books on Sherpas, or maybe a documentary or two. After all, if you’re searching for information, what you want is (wait for it)… information. That’s something eBay can do internally, or perhaps through a deal with a distributor. Or they could write a better algorithm that takes more variables into account. What sort of variables? Past purchase history is an obvious one: if someone is searching for books on Sherpas, have they ever bought anything else Sherpa-related? The amount of money a person tends to spend matters too. If the customer in question has never spent more than $60 on a single item online, a Sherpa jacket, which will probably cost more than that, is a poor bet. For that matter, the algorithm should be smart enough to figure out whether you even buy clothes online.
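A sketch of that better approach, again with a hypothetical catalog and purchase history: it still starts from the keyword, but it drops anything priced above the most the customer has ever spent on one item, and anything in a category they have never bought from. The two rules and all the data are assumptions, one simple way to fold in the variables mentioned above.

```python
CATALOG = [  # hypothetical products
    {"title": "Sherpa history book", "category": "books", "price": 18.0},
    {"title": "Sherpa fleece jacket", "category": "clothing", "price": 129.0},
    {"title": "Sherpa life documentary", "category": "video", "price": 20.0},
]

HISTORY = [  # hypothetical past purchases by this customer
    {"category": "books", "price": 60.0},
    {"category": "video", "price": 15.0},
]


def better_ads(query: str, catalog: list[dict], history: list[dict]) -> list[str]:
    """Keyword matches filtered by the customer's own spending and buying habits."""
    keywords = set(query.lower().split())
    max_spend = max((p["price"] for p in history), default=0.0)
    bought_categories = {p["category"] for p in history}
    return [
        item["title"]
        for item in catalog
        if keywords & set(item["title"].lower().split())
        and item["price"] <= max_spend             # within their usual price range
        and item["category"] in bought_categories  # something they buy online
    ]


print(better_ads("sherpa books", CATALOG, HISTORY))
# ['Sherpa history book', 'Sherpa life documentary']
```

With an empty purchase history this version recommends nothing at all, so a real system would need a fallback, such as the plain keyword match.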

What if eBay isn’t the only place you shop online? Plenty of people buy almost everything on Amazon. That would be an excellent source of data for choosing the best possible ads. However, do we really want these different companies sharing our data like that? If you have been around TARTLE at all, you know that we aren’t cool with that. That data is yours, and you should be the one deciding whether or not eBay gets to see it. If you sync your accounts with us, your data is encrypted and stored on our servers, and you choose whether to share it. Even better, if eBay is trying to figure out the best way to market additional items to you, it can run some options by you first. It would quickly learn that you have no desire to dress like a Sherpa, but that nice new documentary from NatGeo might look pretty good. That’s what we are doing at TARTLE: connecting businesses with customers in a way that respects and benefits both.
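The opt-in idea can be sketched as a small data store that releases only the fields a user has explicitly agreed to share with a particular company. `DataVault`, `grant`, and `share_with` are hypothetical names, not TARTLE’s API, and the encryption mentioned above is left out to keep the sketch short.

```python
from dataclasses import dataclass, field


@dataclass
class DataVault:
    """Hypothetical per-user store: the user's data plus their sharing choices."""

    data: dict[str, object]
    consents: dict[str, set[str]] = field(default_factory=dict)  # company -> fields

    def grant(self, company: str, *fields: str) -> None:
        """Record the user's choice to share the given fields with one company."""
        self.consents.setdefault(company, set()).update(fields)

    def share_with(self, company: str) -> dict[str, object]:
        """Return only the fields this company has been allowed to see."""
        allowed = self.consents.get(company, set())
        return {k: v for k, v in self.data.items() if k in allowed}


vault = DataVault({"amazon_purchases": ["hiking boots"], "ebay_searches": ["sherpa books"]})
print(vault.share_with("eBay"))  # {} until the user says yes
vault.grant("eBay", "amazon_purchases")
print(vault.share_with("eBay"))  # {'amazon_purchases': ['hiking boots']}
```

The default is to share nothing: a company sees a field only after the user has granted it, which mirrors the choose-to-share model described above.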

What’s your data worth? Sign up and join the TARTLE Marketplace.