Have you ever been to an old library, or a used bookstore full of old books? It’s great. The musty smell, the feel of the yellowed pages that crinkle slightly as you turn them: there really is nothing like it. It evokes a sense of depth, wisdom, permanence.
That’s exactly the feeling I hoped to have when I visited the National Archives in Washington, D.C. I was looking forward to rows upon rows of filing cabinets full of the personal writings of Washington, Franklin, Teddy Roosevelt, FDR, and more. However, I was disappointed. Instead of archives, there were TVs everywhere. Nearly everything was digital. For many reasons this was a bit of a letdown. I was really looking forward to those poorly lit rooms full of dusty documents. Yet, looked at from another angle, this was actually hopeful.
The United States has the goal of having all of its documents completely digitized by 2022. As of this writing in the fall of 2020, that is right around the corner, and if a recent meeting of all the alphabet-soup agencies on the matter is any indication, it would be wise not to hold your breath. When it comes to any significant change, all bureaucracies tend to move at a pace that would embarrass a glacier. Why might that be?
As beneficial as digitizing all the information at the government’s disposal is (which we’ll get to shortly), the fact is that it’s difficult. While technology has improved greatly in recent years, including software from Google that will automatically strip the text from a photographed document, that technology isn’t necessarily available to everyone. In some cases it might be expensive, there may be security concerns, and of course there is training. Think of when you first started introducing your older friends and family to email, but much, much harder. Addressing all of these issues takes both time and money.
There are also incentives to slow down the process. One is simply money. The government, of course, doesn’t have the resources and know-how to do this entirely on its own. That means contracting with major software companies, companies that are more than willing to offer their expertise for a government-sized paycheck. Anyone who has followed NASA for more than a few years knows that virtually every government-related project comes in behind schedule and over budget. Sometimes, that’s because the project in question is genuinely more difficult than expected. Other times, it’s because the contractor draws it out, getting as much money as they can.
Another incentive to delay the process of digitization lies in one of its benefits. The main purpose of putting that information in a digital format is to make it searchable, to be able to run it through an Artificial Intelligence/Machine Learning (AI/ML) program to find connections between different sets of data and patterns that could not be identified any other way. It would also find inefficiencies, something government is infamous for. Needless to say, there may be a few people in government eager to make use of an inefficiency or two in order to keep the rest of the inefficiencies from being discovered.
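To make "searchable" a little more concrete, here is a minimal sketch of what becomes possible once a document's text exists in digital form. It builds a simple inverted index (a map from each word to the documents that contain it) and answers keyword queries against it. The document ids, the sample text, and the function names `build_index` and `search` are all hypothetical, made up for illustration; this is not a description of any system the government or its contractors actually use.

```python
import re
from collections import defaultdict


def build_index(documents):
    """Map each lowercase word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(doc_id)
    return index


def search(index, query):
    """Return the ids of documents that contain every word in the query."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return set()
    # Start with the matches for the first word, then narrow down.
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


# Hypothetical digitized records, invented for this example.
documents = {
    "memo-001": "Budget request for the records office",
    "memo-002": "Records transfer between agencies delayed",
    "letter-003": "Budget approved for agency transfer",
}

index = build_index(documents)
print(search(index, "budget"))
print(search(index, "records transfer"))
```

A paper file in a cabinet can only be found by someone who already knows where to look. Even an index this crude lets anyone ask which records mention a given set of terms, and that is the starting point for the pattern-finding described above.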
Finally, there is a fear that the government would come to be run by machines, that those AI/ML programs would be doing more than just analyzing data and would be setting policy. In truth, the goal is simply what was said above: the software will be used to analyze data and find correlations, not to come to conclusions about what to do with that data. And even if that were the case, such software is limited by its programming.
Think of the old, trench-coat-wearing detective. He is typically shown with a wall of pictures, newspaper clippings, and scribbled notes, all connected by red strings showing the different connections. AI/ML programs will do the same thing, yet they won’t replace the detective. He (just like the policy maker) still has to ask the questions, draw the conclusions, and make the decisions. The data is only as good as the questions asked and the decisions made with it.
So, given all of these difficulties and fears, is it worth going to all the trouble just to be able to crunch data faster? The answer is clearly yes. While those musty rooms of old books are great, they have one key weakness – they are stagnant. They are full of information that just sits there, not doing anything. Digitization makes it much easier to access and make use of all that information. And information, data, is only valuable, only useful when it is used, when it is moving. To keep that data inert in a vault deprives us all of important insights and advances that can only be made through the movement of data. With data moving, its true power is unleashed through faster and better decisions.
What’s your data worth? Sign up and join the TARTLE Marketplace with this link here.
Speaker 1 (00:07):
Welcome to TARTLE Cast with your hosts Alexander McCaig and Jason Rigby, where humanity steps into the future and source data defines the path. The path.
Alexander McCaig (00:26):
Hello. Hello, let's step into the future. I just drank too much kombucha.
Jason Rigby (00:29):
Love it. It's-
Alexander McCaig (00:30):
Why do I always do that?
Jason Rigby (00:31):
It's clear mind.
Alexander McCaig (00:31):
Jason Rigby (00:32):
So now you're going to have a clear mind.
Alexander McCaig (00:34):
I pride myself to burp on this mic constantly.
Jason Rigby (00:36):
What if you become clairvoyant now-
Alexander McCaig (00:39):
Jason Rigby (00:39):
... off the kombucha.
Alexander McCaig (00:40):
Spirits are just coming to me.
Jason Rigby (00:42):
Maybe we just started something. People that are out there listening to this, a clairvoyant kombucha and then have like esoteric symbols all over the label-
Alexander McCaig (00:48):
Jason Rigby (00:48):
Alexander McCaig (00:49):
I would buy that brand.
Jason Rigby (00:50):
Alexander McCaig (00:51):
I'm all about that type of branding.
Jason Rigby (00:53):
And we invent all kinds of things.
Alexander McCaig (00:53):
Yeah. That's what we do, we're big idea people.
Jason Rigby (00:56):
Yes. That's funny.
Alexander McCaig (01:02):
We've talked a lot of philosophy and fundamental and explanatory things regarding TARTLE, its effects on society, what have you. What I want to now bring the series into... I got to stop with the booch. What I want to bring the series into is current events.
Jason Rigby (01:25):
Alexander McCaig (01:26):
With what's going on with data and the applicability of these events, and I kind of want to give our two cents on it. I know you've curated a lot of fresh stuff, but I think this is something that we should start moving into because it's important to talk about where we currently are in society with technology and where we kind of fit into that with TARTLE. And I'll kind of give, frankly, my insights and opinion on it.
Jason Rigby (01:49):
Yeah. And that's what I want to get into. I think the first one that people need to understand is there is a reaction, and I want to use that word reaction, to large companies, to government entities with what is going on with decentralization in the government.
Alexander McCaig (02:09):
Jason Rigby (02:11):
And not just government, but these... And I'm not talking about Google, Facebook because everybody always kind of gets the rap on them, but I'm talking about banks, large hedge funds-
Alexander McCaig (02:21):
Sure. Yeah, yeah.
Jason Rigby (02:24):
Goldman Sachs. When you're going over into Europe and you're seeing companies that have literally been in place since the 1700s.
Alexander McCaig (02:31):
Jason Rigby (02:32):
So they're looking at data and they're saying, "How can we monetize this? What should we do with it? We know the direction that we should go, we pay to have all these reports done for us and we're seeing this." And it's a lot nearer than people think because what we're going to get into today, a lot of these large corporations and large governments and banks and stuff are all using 2025 and 2030, and that's not that far away.
Alexander McCaig (02:58):
No, it's not. I mean when we heard about 2020, it was like that's [crosstalk 00:03:03], that's quick.
Jason Rigby (03:05):
Yeah. And this one I want to talk about the National Archives and Records Administration. That's our government agency here in the United States-
Alexander McCaig (03:11):
Jason Rigby (03:12):
And they hold all the government files.
Alexander McCaig (03:14):
Have you been to the National Archives in DC?
Jason Rigby (03:16):
A long time ago, yeah.
Alexander McCaig (03:18):
I was a little bummed out. I went in there recently.
Jason Rigby (03:21):
You're trying to find UFOs.
Alexander McCaig (03:22):
Yeah. I want to know all about it.
Jason Rigby (03:25):
Who killed JFK?
Alexander McCaig (03:25):
Yeah, exactly. But when I went in there, it's just like TVs and bad exhibits.
Jason Rigby (03:34):
Alexander McCaig (03:35):
There's no archives.
Jason Rigby (03:36):
Alexander McCaig (03:36):
Most of the stuff's made digital, everything's locked up. I can't get my hands on the musty smell of the old paper.
Jason Rigby (03:40):
Right. Yeah, yeah.
Alexander McCaig (03:41):
You know what I mean?
Jason Rigby (03:41):
You want to actually see.
Alexander McCaig (03:42):
You know when you go into a library. Well for me, when I go into a library, I just want to really grip on it. I want to see the archives.
Jason Rigby (03:48):
Yes, yes. Yeah.
Alexander McCaig (03:49):
Even if it's a server room, someone's like, "Who's in the archives?" You know what I mean? Like, "Oh cool. It's behind glass, that's awesome." No, I just get some crap exhibit.
Jason Rigby (03:56):
What was the University? They have a little thing where they invented the internet. They're in California and they have all the old computers and all the modules and stuff like that.
Alexander McCaig (04:03):
Stanford, when they were working with DARPA and stuff like that.
Jason Rigby (04:05):
Yeah, they have all that available.
Alexander McCaig (04:06):
And UC Berkeley.
Jason Rigby (04:06):
So I think there's a room, and viewers can correct us on this in the show notes, but I think there's a whole room and they preserved it.
Alexander McCaig (04:12):
I hope they did.
Jason Rigby (04:13):
Like, just like how it was, like that the clock is there.
Alexander McCaig (04:15):
You know what I wish it was like, you know the Da Vinci Code?
Jason Rigby (04:19):
Alexander McCaig (04:20):
Do you know when he goes in the Vatican into the library?
Jason Rigby (04:22):
Alexander McCaig (04:22):
That's what I wanted it to be like, but I was so bummed out going to the archives, you go in, you can donate to go in here, but it's just like, it was melodramatic.
Jason Rigby (04:32):
Yeah. Exactly. Yeah.
Alexander McCaig (04:33):
At the very least. It was dark and gloomy. Make the archives fun.
Jason Rigby (04:37):
I always picture it like being in an underground lights, fluorescent lights going [inaudible 00:04:45], like almost in a horror movie. And then all these metal file cabinets, like that ugly brown color.
Alexander McCaig (04:47):
Ugly brown or like that weird teal from the 50s.
Jason Rigby (04:49):
Yeah and then in the middle is these old big tables with like, you remember the little pencils?
Alexander McCaig (04:55):
And they had no eraser, they're just cut at the top.
Jason Rigby (04:57):
Alexander McCaig (04:57):
Frankly useless writing instrument.
Jason Rigby (04:59):
Yeah and they're white.
Alexander McCaig (05:00):
Yeah. Oh that'd be awesome.
Jason Rigby (05:01):
So the National Archives and Records Administration, they made a pronouncement to all the agencies. So think of all the government agencies, thousands probably.
Alexander McCaig (05:09):
Too many three-letter agencies.
Jason Rigby (05:12):
That they're going to be a hundred percent digital, get this, by the end of 2022.
Alexander McCaig (05:17):
So they're going to be done scanning in-
Jason Rigby (05:19):
Alexander McCaig (05:19):
All the hard copy archives.
Jason Rigby (05:21):
And so the government said, you agency, whatever agency it is, NSA, you are going to be a hundred percent by... So they all had group heads all get together. And they had a meeting under the Digital Government Institute, E-Discovery.
Alexander McCaig (05:34):
Jason Rigby (05:34):
Records and Information Management. It was a virtual conference because of COVID and stuff. This just happened. And they surveyed all these leaders to get a sense of how far along they've progressed in going a hundred percent digital by the end of 2022.
Alexander McCaig (05:49):
And what did they say?
Jason Rigby (05:50):
Alexander McCaig (05:50):
NSA is like we've been doing it since the eighties.
Jason Rigby (05:52):
So here's the three main responses, they were divided. One was making progress. One was, we have a long way to go. And the other one was, don't ask.
Alexander McCaig (06:03):
So essentially nothing has gotten done.
Jason Rigby (06:06):
Alexander McCaig (06:07):
Making progress means somebody moved a piece of paper from this department-
Jason Rigby (06:11):
Alexander McCaig (06:12):
Over here, because it's the government.
Jason Rigby (06:14):
But and what I want you to speak to, is these electronic records management and the journey that they're going on-
Alexander McCaig (06:19):
Jason Rigby (06:19):
Is where a lot of companies are. But once they put that information, then it becomes data.
Alexander McCaig (06:25):
Well it does become data. And then, so let's consider something here. A lot of companies still do a lot of stuff on paper-
Jason Rigby (06:32):
Alexander McCaig (06:34):
The digital transformation, I know that's a buzzword, for these companies is slow. There's a lot of, well-trodden, even manufacturing firms here in the United States that you don't hear their names, but they're making important stuff like the ball bearings, certain chip sets, all this other stuff. Things that, to you and I or any other person would seem albeit unimportant, but they're foundational things to the economy and the things that we produce here.
Alexander McCaig (07:05):
And they do a lot of stuff still on paper.
Jason Rigby (07:06):
Alexander McCaig (07:07):
They lack that transition from hard copy to digital copy. In doing so effectively means that, one, it's like, great we're going to take a document and make it digital, but you have to train people on accessing that document. And then we have security about how it's being stored.
Jason Rigby (07:25):
Alexander McCaig (07:26):
And then how are we going to access it? Okay. How long do we keep it for?
Jason Rigby (07:30):
Alexander McCaig (07:31):
Is it worth scanning in? And then I also have to accrue my time and people and resources to go and actually get that stuff and turn it into something digital.
Jason Rigby (07:40):
Well, and one of the big concerns is integration of AI and ML capabilities. So artificial intelligence machine learning, because what they're learning is, if we can take this tremendous amount of data, since 1776.
Alexander McCaig (07:54):
Jason Rigby (07:55):
And we can put it in the cloud or wherever it goes. And we allow machine learning and AI to do its job, especially machine learning now with what we have, then what kind of information would we be... What kind of insight would we be getting back?
Alexander McCaig (08:09):
Well, here's the funny part about AI ML, it's designed to look for inefficiencies.
Jason Rigby (08:15):
Alexander McCaig (08:18):
The government, do you really think they want to put an AI ML process on all of their stuff to read the inefficiencies of what's going on. It's not going to clue them into anything other than it's inefficient.
Jason Rigby (08:27):
Alexander McCaig (08:27):
It's designed to be inefficient. But if you're talking about accessing these documents, I think it helps them in terms of receiving more funding.
Jason Rigby (08:35):
Alexander McCaig (08:35):
Saying that we're going to use AI and machine learning. We're going to partner with IBM to get it done. Okay, well big deal. What advantage really is it for you to make all this stuff digital.
Jason Rigby (08:46):
Right. It's almost like a whole new frontier, I was talking to you about, it's the wild west and so now you can get Oracle, Microsoft, all these different, big, large companies, Google, to say, "Oh okay, we'll team up beside you and help you-
Alexander McCaig (08:59):
Jason Rigby (09:00):
And become a government contractor." Now we have a whole new way to bill the government.
Alexander McCaig (09:05):
Yeah, that's essentially what's going on. And then you have this, the government becomes then a digital government at that point. It's not like we're saying we're going to apply AI machine learning to policymaking.
Jason Rigby (09:15):
Alexander McCaig (09:15):
No, it's just going to be how we index and search documents and how departments and agencies, inter-agencies speak to one another, because right now they do a very poor job.
Jason Rigby (09:23):
Alexander McCaig (09:23):
So if we can digitize the documents, we can cut back on a lot of the processes that may take weeks or months at a time-
Jason Rigby (09:30):
Alexander McCaig (09:30):
To get it from one department to the other properly and just have it sent over electronically.
Jason Rigby (09:34):
Yeah and one of the things that they talked about is they want to automatically classify, extract and enrich physical, I thought that was interesting, physical and digital content. So classify, extract and enrich.
Alexander McCaig (09:46):
What do they mean by physical?
Jason Rigby (09:47):
I was wondering if it would be something that they have physically where they would maybe take pictures of it or maybe it's satellite imaging.
Alexander McCaig (09:55):
Oh yeah so that makes sense. So for instance, on most phones that we have a thing called Google Lens.
Jason Rigby (09:59):
Alexander McCaig (09:59):
If you take a picture of a text document, it's going to actually strip all the text off.
Jason Rigby (10:03):
Alexander McCaig (10:03):
It's just a function of an algorithm and it's processed on the servers in the cloud. So that's probably something they're looking for. If they have documents that you don't really want to touch or manipulate, but take a photo at a distance and frankly absorb the observational data.
Jason Rigby (10:16):
Alexander McCaig (10:17):
Because that's what AI ML does. It's all about observing. That's the only state in development that it currently sits at. Well, that's something they're looking for, especially things that are quite fragile, you can't move them much. Say you want to document statues.
Jason Rigby (10:28):
Alexander McCaig (10:28):
I want to take a picture of the statue. And I want it to be automatically identified within our systems, with our machine learning or AI, whatever it might be, say that this statue is here, it has this much data. This is the last time it was maintenanced, all these other things that go with it.
Jason Rigby (10:39):
Right, yeah, exactly. Yeah. So what's the history behind it?
Alexander McCaig (10:43):
Jason Rigby (10:43):
All that would be attached to it. At a high level, the three-step process is going to look like this. Number one is input records. So the system uses optical character recognition, image recognition, and video processing capabilities to ingest information and save it in electronic format.
Alexander McCaig (10:59):
Okay. So really all it is, is a giant electronic scribe.
Jason Rigby (11:02):
Alexander McCaig (11:03):
It's a movie camera and a something taking notes.
Jason Rigby (11:05):
Alexander McCaig (11:05):
That's all they're saying. And we need it to sit there and we need it to run all day long, collecting all this hard copy stuff.
Jason Rigby (11:11):
And so the second one is they want to apply AI and ML, and here's how they want to do that. Using a classification process and identity extraction combined with AI ML tactics. I thought that was an interesting word. This system will automatically train a model, apply it, capture feedback, and prepare the data for the next phase.
Alexander McCaig (11:29):
Yeah, that's correct. So what they're saying is that we're going to institute our AI ML, what is our base algorithm that we're going to use? Great. So these are the fundamental outputs we're looking for, depending on the inputs-
Jason Rigby (11:40):
Alexander McCaig (11:40):
We're putting in. And the inputs are this hard data, photocopies, videography. Now that these are in the system is going to look at it, identify it and then it's going to go back and it's going to retune those initial parameters that were in the algorithm, so that as it begins to ingest more information-
Jason Rigby (11:54):
Mm-hmm (affirmative). Yes.
Alexander McCaig (11:55):
It can more accurately see where certain things belong or how they actually interact with one another.
Jason Rigby (11:59):
Yeah. And then number three, which you just spoke of, is output.
Alexander McCaig (12:02):
Jason Rigby (12:02):
Once the data is prepared it is exported into a data visualization function that enables agencies to search, analyze VI and generate deeper insights to make better decisions from that information.
Alexander McCaig (12:13):
Okay. So here's an interesting part. So they want this thing reading files so that it can find some sort of efficiency KPIs.
Jason Rigby (12:21):
Alexander McCaig (12:21):
Jason Rigby (12:22):
And then I like the word data visualization function.
Alexander McCaig (12:25):
The reason they do that is because people don't like trying to manipulate Excel files.
Jason Rigby (12:29):
Alexander McCaig (12:29):
They do not like reading numbers. They don't like reading text. People think that they can demand better decisions if they look at a pie chart quickly-
Jason Rigby (12:37):
Alexander McCaig (12:37):
Or get some sort of visual and a lot of systems now, especially some local companies here we have in New Mexico, specialize in just ingesting data and then having their system with its algorithms output some sort of visual immediately.
Jason Rigby (12:50):
Mm-hmm (affirmative). Yeah, no that makes sense there. So benefits of the AI ML approach, they kind of describe it. And then I'll have you speak to a few of these and then we'll be done with this story. Metadata for search and tracking. So, and then it has data ingestion from multiple sources.
Alexander McCaig (13:07):
Okay. Well, let's talk about the metadata for a second.
Jason Rigby (13:09):
Alexander McCaig (13:09):
So when you take a photograph, just so someone understands this, the photo sits there.
Jason Rigby (13:14):
Alexander McCaig (13:14):
But actually inside of that, it's going to inscribe the geolocation where the photo was, the time, the place. And if you're lucky, sometimes it can actually recognize who the person is and start to input profile data.
Jason Rigby (13:25):
Alexander McCaig (13:25):
Oh, this was Alexander in this photo, this was Jason in this photo. You know what I mean? And that's the metadata, that's the stuff that's actually baked into it.
Jason Rigby (13:32):
Alexander McCaig (13:32):
And also the file size.
Jason Rigby (13:33):
Alexander McCaig (13:34):
How it's formatted. Those are all the-
Jason Rigby (13:36):
What camera it was taken on.
Alexander McCaig (13:37):
Those are all the things that help index where this information goes in these systems. So that when you go back to these tables to look for this info, the machine does, it knows what things it wants to search for and almost piece together in its little puzzle.
Jason Rigby (13:50):
Yeah I always picture, in my little mind to make it simple for everyone, you remember that cool detective that has stayed up all night and it's like four o'clock in the morning, he's drank 20 cups of coffee and he's smoking cigarettes.
Alexander McCaig (14:05):
He's pulling a cigarette out, yeah.
Jason Rigby (14:05):
He doesn't care. And then you look up and there's this huge board. And then there's like strings attached-
Alexander McCaig (14:12):
It's got the red string like a [inaudible 00:14:10].
Jason Rigby (14:12):
With the serial killer.
Alexander McCaig (14:12):
Jason Rigby (14:12):
Yeah and all of a sudden he's got all these documents and files-
Alexander McCaig (14:13):
That's what it is.
Jason Rigby (14:14):
And he put it all together.
Alexander McCaig (14:15):
What you're saying right there, that is the very analog format-
Jason Rigby (14:19):
Alexander McCaig (14:19):
Of what a computer can process at a more efficient rate.
Jason Rigby (14:22):
Alexander McCaig (14:23):
But that only happens if the proper parameters are put in place [crosstalk 00:14:28] by the detective.
Jason Rigby (14:27):
If you have that bad-ass detective.
Alexander McCaig (14:30):
Yeah, the detective would have to know exactly what needs to be looked for and what shouldn't be looked for.
Jason Rigby (14:34):
Yes. Yeah. And then disseminate that information.
Alexander McCaig (14:36):
Jason Rigby (14:37):
Data integration from multiple sources, I mean, this is a huge-
Alexander McCaig (14:40):
Come on this is what TARTLE does.
Jason Rigby (14:42):
Alexander McCaig (14:42):
Data integrations from multiple sources means if you're going on TARTLE and say we have a thousand different platforms that we're integrated with. Social media, healthcare, IoT, Fitbit data, whatever it might be.
Jason Rigby (14:52):
Alexander McCaig (14:53):
Those are all these different data sources, these integrations that are coming in place. So that they get constant feed, like a pipe of information, that's always feeding the system.
Jason Rigby (15:01):
Yeah because what they're worried about is having siloed sources. And so they want to classify each piece of information by type, with the associated metadata.
Alexander McCaig (15:09):
Yeah. And that's what they want to do. One thing that happens is, a lot of information that gets collected in systems, we talked about data lakes, data silos. We talked about the movement of data.
Jason Rigby (15:16):
Alexander McCaig (15:16):
In my previous episode. They're trying to put momentum behind the stuff. It's been locked up for so long. They've done so good at recording and holding on to things, physical and nonphysical.
Jason Rigby (15:26):
Alexander McCaig (15:27):
Now it's like, "Okay, we got to put some wheels on it."
Jason Rigby (15:29):
Yeah. And I think one of the things, when that detective picks up that folder of an unsolved case from five years ago-
Alexander McCaig (15:36):
Jason Rigby (15:37):
And he's trying to, "Okay, this happened, let me put this together. There were similarities in this." What I like about this and this is what I love about TARTLE is that we're putting value to information.
Alexander McCaig (15:49):
Yeah. Of course.
Jason Rigby (15:50):
Because there was no value sitting in a file cabinet in a folder.
Alexander McCaig (15:53):
Jason Rigby (15:54):
With an unsolved case.
Alexander McCaig (15:55):
Unless it's moving, unless someone can take that data that's been purchased and associated with other data and analyze it and cross analyze it and do all these other things to find those correlations, it's worthless.
Jason Rigby (16:07):
Alexander McCaig (16:07):
It needs to be moved.
Jason Rigby (16:09):
Alexander McCaig (16:09):
It needs to be treated like an asset, but an asset that's fungible.
Jason Rigby (16:12):
Alexander McCaig (16:13):
And when I say fungible, something that can easily be traded, transported, held, moved, whatever, all these different things.
Jason Rigby (16:18):
I love that. Yeah.
Alexander McCaig (16:19):
Jason Rigby (16:19):
So, I mean, hopefully 2022.
Alexander McCaig (16:21):
Jason Rigby (16:22):
Which is like in two years-
Alexander McCaig (16:24):
I'll tell you what though-
Jason Rigby (16:24):
A hundred percent, government agencies.
Alexander McCaig (16:26):
We have beat them to the punch.
Jason Rigby (16:30):
Alexander McCaig (16:31):
Folks. You heard it here first. Thank you very much.
Speaker 1 (16:41):
Thank you for listening to TARTLE Cast with your hosts, Alexander McCaig and Jason Rigby, where humanity steps into the future and the source data defines the path. The path. What's your data worth?