Putting Artificial Intelligence to Work

January 7, 2019 • Zane Pokorny

Our guest this week is Thomas H. Davenport. He’s a world-renowned thought leader and author, and is the president’s distinguished professor of information technology and management at Babson College, a fellow of the MIT Center for Digital Business, and an independent senior advisor to Deloitte Analytics.

Tom Davenport is author and co-author of 15 books and more than 100 articles. He helps organizations to revitalize their management practices in areas such as analytics, information and knowledge management, process management, and enterprise systems. His most recent book is “The AI Advantage: How to Put the Artificial Intelligence Revolution to Work (Management on the Cutting Edge).”

Returning to the show to join the discussion is Recorded Future’s chief data scientist, Bill Ladd.

This podcast was produced in partnership with the CyberWire.

For those of you who’d prefer to read, here’s the transcript:

This is Recorded Future, inside threat intelligence for cybersecurity.

Dave Bittner:

Hello everyone, and welcome to episode 89 of the Recorded Future podcast. I’m Dave Bittner from the CyberWire.

Our guest this week is Thomas H. Davenport. He’s a world-renowned thought leader and author, and is the president’s distinguished professor of information technology and management at Babson College, a fellow of the MIT Center for Digital Business, and an independent senior advisor to Deloitte Analytics.

Tom Davenport is author and co-author of 15 books and more than 100 articles. His most recent book is titled “The AI Advantage: How to Put the Artificial Intelligence Revolution to Work (Management on the Cutting Edge).”

And returning to the show to join us is Recorded Future’s chief data scientist, Bill Ladd. Stay with us.

Tom Davenport:

Well, I think there are a variety of reasons why we’re moving in a more artificially intelligent direction.

Dave Bittner:

That’s Tom Davenport.

Tom Davenport:

One is that we just have so much data in almost all aspects of business and organizational life that we really have very little choice but to automate some aspects of the analysis of it and to learn from it effectively in creating insights and making decisions and taking actions. And there are also some other supply-side factors related to … We have a number of new algorithms and we have some powerful new types of processors that can chew through all this data really quickly and learn from it. And then, I guess you could say there’s some demand-side factors, like, we finally are starting to realize that we humans are not very good at making decisions in many cases, and a lot of the work we’re doing is very tedious and there’s just too much of it to do. So I think the demand and the supply factors add up to a pretty inevitable future for more AI in a whole variety of aspects of our lives.

Bill Ladd:

We tend to overestimate the short-term impact of technologies like these and underestimate the long-term impact.

Dave Bittner:

That’s Recorded Future’s chief data scientist, Bill Ladd.

Bill Ladd:

I went through the human genome boom back in the ’90s, and that was another case where that was exactly the case. Everyone thought that all of these drugs were just going to magically fall out of sequencing the genome, and it really didn’t work that way. It felt like there was, perhaps, a bust. But there’s not a drug discovery today that isn’t built on those technologies, on those platforms that drove that genome project. And simultaneously, the genome, now people sequence themselves for a hundred dollars. Now police use those public databases to find relatives of criminals. The way in which that work has transformed our world is tremendous. But at the time it was like, “Okay we sequenced the genome, but now what?” You tend to take a linear path when you think about where something is going to go, but it’s really where divergent or different approaches or things that are not just linear start to bump into each other that it’s just so hard to predict.

One of the reasons that that’s hard to predict is, you have to predict simultaneously where different technologies or approaches are going to be and then how they may react to each other when they get there. I think about what’s going on right now in AI, and a lot of it comes down to, there are just so many different assets that are available to people to do these kinds of projects. There are assets that are the internal data. There are the assets that are the external data. What other data can I get, more or less, for free? What software libraries are out there that I can use? We have concepts right now where we use essentially people as APIs, so I can essentially make human processes part of my computing infrastructure. What can I do in the cloud architecture? I can run experiments there that would require me to build a cluster of a hundred servers, but I’m never going to do that for an experiment.

So, all these different things are all kind of marching along on their own path and in order to really predict, you’ve got to understand all the different players and where they’re going to be at different points in time and what the potential connections are. So much of those findings are opportunistic. Someone says, “Oh, if I took this and I took that and put them together, it would be really fantastic.” It’s hard to do that, one technology or one approach, at a time.

Dave Bittner:

I think particularly when it comes to AI, and some of the hype that it’s received on the marketing side of things, I think sometimes for many people, myself included, it’s gotten a bit fuzzy as to what exactly we’re talking about when we say AI versus machine learning, those sorts of things. So in your mind, can you describe to us, how do you set those boundaries? What’s your definition of artificial intelligence?

Tom Davenport:

I sometimes say God did not see fit to provide us with clear definitions of things and we humans muck it up all the time. I view it pretty broadly, actually. AI being a collection of technologies. Machine learning is probably the most popular and common and also the one that has a variety of interesting sub-categories like neural networks and deep learning and so on.

Traditional machine learning is not so different from analytics, but when you get into some of the newer types of algorithms, the adversarial networks and deep learning networks and so on, you typically didn’t find those in traditional analytics. And you also don’t find too much of the language-oriented tools in traditional analytics. And, as you know, there is a whole realm of natural language processing-oriented technologies related to AI.

And then there’s some older ones, like rule-based technologies, which one might think are gone because that was the last generation of AI, but I just did a survey with Deloitte where about 50 percent of the companies said they were still using rule-based AI technologies.

And then there are these things that people refer to as “robotic process automation,” which aren’t terribly smart right now, but are increasingly being combined with smarter technologies like machine learning. So I just put it all in the AI bucket, although clearly some are more intelligent than others.

Dave Bittner:

It seems as though there are definitely some people who get their hackles up when … I guess, being specific about what is and what is not. I wonder sometimes, have we, just through popular use, have we reached the point where those distinctions have lost some of their usefulness?

Tom Davenport:

I think they have. I’ve seen this in a whole variety of different domains, certainly in the world of knowledge management in my past, people could debate for centuries, almost. When does a piece of data turn into information, and when does that turn into knowledge, and when does it become wisdom, and so on. And I just really said, “Let’s put it all in one big bucket.”

Dave Bittner:

Let’s explore this notion of what the role is, specifically of data scientists. Putting a perspective with your background in analytics, and then today with AI, what is the role that data scientists have to play?

Tom Davenport:

Data science started around the turn of the century in Silicon Valley and it was a mixture of analytics and activities designed to turn unstructured data into structured data so it could be analyzed. We’re not very good at analyzing things unless it gets in a row and column of numbers format. So if you want to analyze text or images or sound data or genomic data or something like that, you typically have to put it in a form involving rows and columns of numbers.

Now, in AI, we don’t really have any good, widely-accepted term for someone who works with AI. Certainly, data scientists do it to a substantial degree, and a lot of the same activities that I just described as being part of data science are still necessary for AI. There are some data scientists who understand some forms of AI, and some that are more comfortable with traditional analytics. So it’s kind of two messy categories overlapping with each other at various points.

Dave Bittner:

Now, what’s been the evolution, or the easing in, of this technology when it comes to being applied to cybersecurity?

Tom Davenport:

Well, I think that’s a good way to put it. It is easing in, and it’s early days still, for the most part. And as you suggested earlier, there’s lots of hype about it. But the general idea is that organizations have so many attempted hacks and breaches and attempts at fraud and so on that it’s virtually impossible to do it all effectively with human labor. So, more and more organizations are coming to the conclusion that we’re going to need AI to do this to identify particular patterns of threats, to analyze threat intelligence, to start to take automated action.

I mean, the other thing with the hugely connected world that we have today, a serious malware attack can spread around the world in seconds. Having a human figure out what’s going on and then determining how to react to it, by the time any of us could do that, it would be too late. So there’s more and more a need for speed, and that’s what AI is good at.

Dave Bittner:

And I suppose on the flip side of that, there’s that Hollywood perception that if we’re not careful, we’re going to end up with the Terminator.

Tom Davenport:

Well I think there are all sorts of both positive and negative attributes with applying AI to cybersecurity. People could take our “good side” AI tools and modify them and turn them into “bad guys’ side” tools relatively easily. And in fact, that’s already been done in some large-scale malware attacks. I mean, if you define, for example, the U.S. intelligence services as on the “good guys’ side,” which I’m sure some people might dispute, but those pieces of code that they developed have been adapted for malicious purposes on a variety of occasions.

There are a lot of concerns … There are concerns about … Will this just end up being more work for humans? Because in most cases, we will rely on these tools to identify threats, but to really confirm them, we might believe that a human investigation is necessary. So far, it appears that we are already having too many false positives from the use of these tools, so it’s more work for us, not less work for us.

Bill Ladd:

All right, when you talk about the Terminator, we’re talking about something which is completely autonomous in terms of how it thinks and what it can do. And the reality is that where we’re at, primarily, with AI technologies today is, we’re automating tasks that humans do. We’re not automating jobs that humans do, for the most part. So in a lot of ways, for me, what we’re really doing is, we’re focusing on augmenting what an individual can do. How do I increase that individual’s efficiency and productivity? I have a hard time imagining getting all the way to automating humans. Perhaps I’m limited in my imagination. But I see that we have the capacity to do a tremendous amount of improving the efficiency with which humans can do the things that they do.

Dave Bittner:

I think a lot of people are wary — they look at some of these AI technologies as being black boxes, and there’s not a lot of transparency for what’s going on under the hood. And so, they worry about things like biases being baked into the algorithms. What are your thoughts on that?

Tom Davenport:

That is a true story with regard to some types of AI. Traditional regression-based machine learning is typically not much of a problem because you can look at the regression equation and see what the key variables are, and so on. But once you start to move into algorithm types, like even fairly simple neural networks and in particular, deep learning algorithms, which typically have lots and lots — thousands, often, sometimes even hundreds of thousands or millions — of abstract variables that don’t really make any sense to a human observer, even a very smart data scientist, you’re exactly right. Nobody is going to be able to identify why a particular prediction was made or a particular classification was made, or something like that.

In a lot of cases, it doesn’t really matter. We don’t really care why an algorithm decided that something was a cat on the internet, but if you decide that a massive cyberattack originated in the Russian government and it leads to a response, we’re talking pretty serious allegations there and pretty serious risks of some horrible things. So I think we’re going to have to get better at interpreting these models if we’re going to use them for serious cyberwar.

Bill Ladd:

At the end of the day, it’s all algorithms and data. We’re solving individual problems with algorithms and data and just the scope of those problems gets larger and larger over time. And at some point, it starts to look like something people call artificial intelligence. But basically, you started 30 years ago looking at the data inside a system on computers that were really slow. I say inside the system — I mean we’re looking at data that was held by an individual organization. And you would have to write every line of code that operated on those because there weren’t meaningful libraries.

Today you’ve got Python libraries that can do the bulk of the math work and you’ve got massive cloud infrastructure that can hold the data and do the algorithms for you. There are more tools in the tool box. And that’s allowed us to do things that are much more complex and comprehensive than we’ve been able to do before.

Dave Bittner:

Now, in your estimation, how far along are we when it comes to developing AI? Is it still early days, are we still in the pioneering stage, or are we farther along than that?

Tom Davenport:

I think it’s pretty early days and the reason is, in cyber, we don’t have a huge … Most of these machine learning models are trained through supervised learning where we have to have labeled data to say what really was a piece of malware and what wasn’t. And we don’t really have any good worldwide databases of malware data so that we could clearly identify some code or something like that without having to go through a big data gathering exercise.

Analyzing code in general is a pretty nascent area for AI to do. There are pieces of this that can be done relatively straightforwardly today — analyzing some of the factors that might lead to breaches and hacks within organizations — that’s a pretty straightforward machine learning problem and it’s one that I’m actually working on some with Recorded Future, where I’m an advisor. We might be able to look at different attributes of a company, such as some things related to its scale and previous attacks, and so on, and identify the particular level of risk and even come up with a risk score for that organization. That, I think, is pretty straightforward. But really identifying malware or particular bad actors or something like that, very early days.
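As a rough illustration of the kind of organizational risk score described above, the sketch below combines a few company attributes through a logistic function. It is not Recorded Future's model: the feature names, the weights, and the bias are all invented for the example, and a real system would learn them from labeled breach data.

```python
import math

# Hypothetical weights, invented for illustration only.
# A real model would estimate these from labeled historical data.
WEIGHTS = {
    "log_employees": 0.3,       # rough proxy for company scale
    "previous_incidents": 0.8,  # count of known past attacks
    "exposed_services": 0.1,    # count of internet-facing services
}
BIAS = -4.0


def risk_score(company):
    """Map company attributes to a 0-100 score with a logistic function."""
    z = BIAS + sum(WEIGHTS[name] * company[name] for name in WEIGHTS)
    probability = 1.0 / (1.0 + math.exp(-z))
    return round(100 * probability)


example_company = {
    "log_employees": 3,
    "previous_incidents": 2,
    "exposed_services": 5,
}
print(risk_score(example_company))
```

Because the score is a weighted sum passed through a single function, each weight can be read directly, which is the sort of transparency attributed to regression-style models earlier in the conversation.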

Dave Bittner:

You mentioned Recorded Future and one of the things that they use is natural language processing. Could you describe to us, first of all, what are we talking about with that and what are some of the benefits that that can provide?

Tom Davenport:

There are a variety of different approaches to natural language processing, but it’s basically just making sense of language — natural language — as used by humans. Typically, you want to do it across a variety of different languages since that’s what we use in this world. At Recorded Future, they use it for identifying potential threats from all over the world. People tend to talk about their cyber exploits to some degree and attacks get publicized. So you can scrape that data — typically, off the internet, which they do at Recorded Future — and then they analyze it, classify it, count it, et cetera, so you get some sense of the broad world of threat intelligence.
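The "analyze it, classify it, count it" step can be pictured with a deliberately simple sketch. This one assumes keyword matching stands in for natural language processing. The category names, keyword lists, and sample posts are hypothetical, and production systems rely on trained, multilingual language models rather than fixed word lists.

```python
import re
from collections import Counter

# Hypothetical keyword lists standing in for a trained classifier.
THREAT_TERMS = {
    "malware": {"malware", "ransomware", "trojan"},
    "phishing": {"phishing", "phish"},
    "vulnerability": {"exploit", "vulnerability", "cve"},
}


def classify(text):
    """Return the set of threat categories whose keywords appear in the text."""
    tokens = set(re.findall(r"[a-z0-9]+", text.lower()))
    return {cat for cat, terms in THREAT_TERMS.items() if tokens & terms}


def count_categories(documents):
    """Count how many documents mention each threat category."""
    counts = Counter()
    for doc in documents:
        counts.update(classify(doc))
    return counts


# Made-up posts, used only to exercise the functions above.
SAMPLE_POSTS = [
    "New ransomware strain spreading through phishing emails",
    "Proof-of-concept exploit published for a router vulnerability",
    "Quarterly earnings call scheduled for Tuesday",
]

print(count_categories(SAMPLE_POSTS))
```

Each document is counted at most once per category, so the totals read as "number of sources mentioning this kind of threat" rather than raw word frequency.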

Dave Bittner:

And I suppose it’s a matter of being able to do it at a velocity far greater than humans could do it themselves?

Tom Davenport:

Yeah. There’s way too much data for humans to do it all, and as you suggest, we like to do it quickly enough so people could respond to it and get the right defenses in place.

Dave Bittner:

I want to switch gears a little bit and get your take on threat intelligence itself, and where you think it fits in with organizations looking to defend themselves.

Tom Davenport:

Well I think it’s hard to … It would be hard to argue that it’s not important and useful in that process. I don’t consider myself a huge expert in it, but I look at the stuff that comes out of Recorded Future and other organizations, and it basically seems like a no-brainer to me that, who wouldn’t want to take advantage of it? Now, again, we have limited ability to react to it in any automated way now, but certainly being aware of certain types of threats and knowing what other people are experiencing around the world and what might be more likely to happen in the future. As I said, we’re a little bit short of being able to make absolute predictions about what’s going to happen, but if something is happening to somebody else, like a large organization, chances are good that it could happen to you as well, so you might want to be prepared for that kind of similar situation.

Dave Bittner:

When it comes to AI, have we reached a point yet where the systems are capable of surprising humans in specific ways? I’m thinking of doing something that really smacks of intuition. We’ll talk to analysts and they’ll say, “I really couldn’t put my finger on it, but something just didn’t feel right, so I just felt like I needed to dig in a little bit more here.” Are the AI systems capable of that sort of surprise?

Tom Davenport:

Well, not yet in any sort of standard, institutionalized way. There have been some research-oriented applications, some games where … You probably remember that Microsoft research project Tay where it veered off in a racist direction. There have been some games where one AI system was trying to beat a game and did so in ways that the human creators of it didn’t anticipate. But I think it’s probably a bit too early to even understand how likely that sort of intuition or creativity is going to be. Ultimately, everything derives from the data and it could just be that there is some pattern in the data that we, as humans, did not really see, but the machine learning system is able to detect and make a decision or take an action on the basis of.

Bill Ladd:

There’s no question that algorithms can generate unexpected results. Again, it’s one of these things where we talk about the different flavors of AI. If I talk about the intelligence framework, where I am organizing information and presenting it to analysts, those algorithms are only going to focus on the things that I tell them to focus on. What that does is it gives a starting point for the human to have that intuition, not try to replicate it in the machine, but to essentially do the leg work to get the human to that point and then to support that downstream research. And those unexpected findings, where you’re most likely to find those, is essentially in your machine learning classes of AI problems — where you’re essentially looking for relationships between what I know and what I want to classify and what I want to predict.

It’s absolutely true that you quite often find that as you combine a number of input features into your machine learning problem, the features that end up being important are not the ones that you anticipated, necessarily. It’s why you do the machine learning in the first place, because you can’t do that really at scale as a human. I’m a statistician by training, that’s why we do statistics, because we don’t know what the answers are going to be. We let the math tell us, “Oh, these are the factors that are important.” And they may or may not be what we thought they were going in.

Dave Bittner:

So, do you have any advice for people who are out there in the marketplace? You go to a tradeshow and you walk around and everyone is saying, “Our systems have AI,” or, “They have machine learning.” What are some of the ways the people can cut through that hype to check and make sure that what they’re getting is what’s promised?

Tom Davenport:

You have to ask a level of questions below that: What kind of machine learning are you using? Where did you get the data? What kind of data is it trained upon? What kinds of algorithms are you generally using? How interpretable is it? And as with any other area in business and organizations these days, I think it demands some degree of sophistication in your knowledge of AI in order to be able to ask those questions and interpret the answers. But in general, I do think that one of the things that I see as a real challenge for AI in companies in any space, and not just cyberspace, is incorporating it into existing systems and processes. So, if you have a set of cyber tools that you’re happy with and those vendors start to add some AI capabilities, I think in general, you’ll find it much easier to incorporate those into your portfolio of tools than if you had a lot of standalone single purpose tools that had to be connected to everything else through APIs — you had to write code to do it, or God forbid, developing it all yourself.

We see this in a number of other domains. If you have a CRM system — customer relationship management system — it’s going to be easier for you to do machine learning-based scoring of your leads by paying a few extra bucks to Salesforce or something like that for its Einstein Lead Scoring capability than to develop it yourself. And I think we’re going to see that in a whole variety of aspects of IT, that people will find it easier to take what vendors have to offer unless they are really sophisticated and really on the cutting edge.

Bill Ladd:

On one level, it’s great that it draws … The hype cycle is great, it draws a lot of attention, maybe some investment dollars into technologies. On the flip side, it creates unrealistic expectations and unrealistic time frames that you have to manage through. I think Tom did a great job of talking about where you start with proof of concepts, that it’s so easy to underestimate the process of organizational change that’s required. You can have a great algorithm, but how does it fit into your organization? How does it fit into your technology stack? How does it fit into the way that you do business? Those are all non-trivial problems. Coming up with a great piece of AI technology doesn’t solve those problems.

The traditional approach to intelligence is the massive collection of content, a human sifting through that content and synthesizing their findings and writing reports, and then other humans consuming those reports. What we do as a company is to essentially try to automate as much of that as possible. We have a massive data collection infrastructure at the level of some nation-states. We have a massive NLP infrastructure that basically organizes that data into entities and events that defines what things we think are interesting from a risk perspective or a threat perspective about those entities, and summarizes that key information on a single Intelligence Card. Updated in real time, we have hundreds of millions of these dynamically generated reports that are available to intel analysts. Depending on what they do, that automated summary may be all that they need. In other cases, it’s the starting point for a deeper research project. So at a high level, what we’re doing is, we’re augmenting those analysts, intel analysts, with a massive collection and sifting and prioritization engine.

Now, we’re not offering the easy button. Again, it’s an augmentation issue. I’m not telling you that you’re going to be attacked by this threat actor on this day and that you need to do this to remediate it. That would be great, but we’re essentially automating a massive amount of what humans typically have done in intelligence.

Dave Bittner:

That was Bill Ladd, chief data scientist at Recorded Future. Our thanks to him for joining us.

And special thanks to Thomas H. Davenport for joining us. His latest book is “The AI Advantage: How to Put the Artificial Intelligence Revolution to Work (Management on the Cutting Edge).”

Don’t forget to sign up for the Recorded Future Cyber Daily email, where every day you’ll receive the top results for trending technical indicators that are crossing the web, cyber news, targeted industries, threat actors, exploited vulnerabilities, malware, suspicious IP addresses, and much more. You can find that at recordedfuture.com/intel.

We hope you’ve enjoyed the show and that you’ll subscribe and help spread the word among your colleagues and online. The Recorded Future podcast team includes Coordinating Producer Amanda McKeon and Executive Producer Greg Barrette. The show is produced by Pratt Street Media, with Editor John Petrik, Executive Producer Peter Kilpe, and I’m Dave Bittner.

Thanks for listening.