Tartle Best Data Marketplace

Big Data Bias

Data can do, and has done, great things to improve life for people everywhere. That has only accelerated in the digital age (yes, data existed before computers; we just gathered it with our senses) as we've been able to collect more data, faster, from more people. Learning how to sort and analyze that data quickly has also been a game-changer in forming policy and developing new products. We've been able to better pinpoint where a problem may lie in a company, which roads need to be fixed, how to distribute medicine more effectively, and the list goes on.

Of course, as with so many things, there is a downside. We've often looked at how companies profit off the data you generate for them, either without your consent to gather it in the first place or, even if you have consented, in ways you might not appreciate. Many people would withdraw that consent if they knew how their data was actually being used. However, there is another downside to the way data is currently handled that we haven't discussed nearly as much: bias in how that data gets sorted, and in the fact that it gets sorted in certain ways at all.

The government, companies and other organizations often sort data into different categories based on race, cultural background, income, and shopping habits. What do you notice about all of that? Those are all attributes of people. Yes, it is often useful to classify and sort information into different categories. Yet, aren’t people more than the sum of a few superficial attributes? Aren’t people more than their race? More than their paycheck? TARTLE would like to think so.

What are some examples? Some universities will sort applicants based on these kinds of categories and then run the results through a set of predictive algorithms to determine who they should and shouldn't admit. So you have a kid from the inner city: low income, no father, a couple of petty robberies on his rap sheet. The algorithm rejects him. It's easy to see why. Yet, what if this kid is eager to turn his life around and do better, to get out of a crappy cycle? What if all he needs is a chance? The algorithm won't catch that. It doesn't care that the kid is a human being and not a collection of attributes.
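The admissions example can be made concrete with a toy sketch. Everything here is hypothetical (the attribute names, weights, and threshold are all invented for illustration, not taken from any real admissions system); the point is structural: a scoring model can only weigh the attributes it was given as inputs, so qualities that were never encoded, like a desire to turn one's life around, can never affect the outcome.

```python
# Toy illustration of attribute-based scoring (hypothetical weights,
# not any real admissions system). The model's entire "view" of a
# person is the handful of fields below; nothing else can influence it.

def admission_score(applicant: dict) -> float:
    """Weighted sum over a few superficial attributes."""
    weights = {                  # hand-picked, purely illustrative
        "household_income": 0.4,
        "prior_offenses": -2.0,
        "two_parent_home": 1.0,
    }
    return sum(weights[k] * applicant.get(k, 0) for k in weights)

# The kid from the example: low income, petty offenses on record.
applicant = {
    "household_income": 0.1,    # normalized to the 0..1 range
    "prior_offenses": 2,
    "two_parent_home": 0,
    # Note what is missing: "eager_to_turn_life_around" is not a
    # field the model knows about, so it cannot count in his favor.
}

score = admission_score(applicant)      # 0.4*0.1 - 2.0*2 + 0 = -3.96
admitted = score > 0.0                  # False: the model rejects him
```

The design flaw isn't a bug in the arithmetic; the arithmetic is fine. The flaw is that the inputs reduce a person to a few attributes, and no amount of tuning the weights can recover information that was never collected.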

Another example: at least one town used predictive analytics to determine who in the area was likely to become a criminal. That led to a lot of harassment when police officers took the information their algorithm spit out and started trying to catch those people in the act. Beyond the obvious injustice of being treated like a criminal before ever committing a crime, it also meant that resources weren't getting directed where they needed to be. A number of crimes might have been prevented if the police weren't focused on the people their computers flagged. Not to mention that by repeatedly harassing some people, you might actually create a few criminals when they lash out.

This has, of course, infected the corporate world as well. Some companies grade their employees' productivity based in part on how much digital interaction they engage in. People at these companies can be rated as productive simply for sending a lot of emails and participating in the company's group chat. Of course, the emails could be a string of memes, and the group chat could be about someone's new car or any number of other silly things that have absolutely nothing to do with productivity. It's possibly the worst metric ever.

All of these examples point to a central and significant problem, one that pervades Big Data: forgetting that behind all of those data points is a person, a person who probably will not fit neatly into the box an algorithm tries to shove them in. What is the solution? How can we gather and analyze data without losing sight of the people behind it? By going to the people themselves. By getting to know them, asking them questions, and learning what their goals really are, instead of letting algorithms decide for everyone. That is the mission of TARTLE: to get organizations to go to the source of the data, to go to you, so they can get real information, information that will actually contribute to understanding what is going on in the world. Something no algorithm will ever be able to do.

What’s your data worth?