Sources of Threat Data (and Why They Don’t Equal Intelligence)
As data sources multiply and access to data grows exponentially, so do the opportunities to derive intelligence. But with so many sources and so much data, this is hard to do manually. Not only do you have to identify the right sources and collect data from them, which can take a great deal of effort, you must also have the resources and expertise to analyze it.
Sources of Threat Data
The term “threat intelligence” is often used to describe the sources of all this data, but in reality they are simply origins of data that must be processed before the result can be considered intelligence.
To illustrate this point, consider a large stack of reconnaissance photos. Once they have been reviewed by an expert, analyzed, and used to tell a story, they can be considered intelligence. But until then? It’s just a large stack of reconnaissance photos.
The list below, which is far from exhaustive, broadly describes the available sources of threat data. Note, once again, that at this stage we call it threat data, not threat intelligence.
This type of data is available in huge quantities, often for free. Because it is largely binary (an indicator either appears on a list or it does not), it is easy to integrate with existing security technologies, although a great deal of further analysis is needed to derive real context. These sources carry a high risk of false positives, and their results are frequently outdated. Examples include threat lists, spam, malware, and malicious infrastructure.
These sources often provide useful indicators of new and emerging threats, but it can be hard to connect them with relevant technical indicators to measure genuine risk. Examples include news, information security sites, vendor research, blogs, and vulnerability disclosures.
Social media channels undoubtedly offer masses of potentially useful data, but it’s hard to weed out false positives and misinformation. Typically, you’ll find many references to the same threats and tactics, which can place a heavy burden on human analysts. Examples include Twitter and Facebook.
Because these channels are specifically designed to host relevant discussions, they are a potentially valuable source of threat information. That said, you’ll still need to spend time on collection and analysis to identify what is truly valuable.
Dark Web
The dark web is often the source of very specific tactical and technical threat information, but it is incredibly hard to access, particularly the higher-tier criminal communities. Additionally, many of these communities are not English-speaking, so language is often a barrier.
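To make the “binary” nature of technical threat data concrete, here is a minimal Python sketch, assuming a plain-text threat list with one IP address per line and connection logs reduced to simple (timestamp, source, destination) records. The record format and the function names `load_threat_list` and `match_connections` are hypothetical. A match only tells you that an address appears on a list; it says nothing about who is behind it or whether the entry is still current, which is the context analysts still have to supply.

```python
from __future__ import annotations

import ipaddress
from typing import Iterable, NamedTuple


class Connection(NamedTuple):
    """Hypothetical, simplified connection log record."""
    timestamp: str
    src: str
    dst: str


def load_threat_list(lines: Iterable[str]) -> set[ipaddress.IPv4Address | ipaddress.IPv6Address]:
    """Parse a hypothetical one-IP-per-line threat list.

    Blank lines, '#' comments, and entries that are not valid IPs are skipped,
    since free feeds are often noisy.
    """
    indicators = set()
    for line in lines:
        entry = line.strip()
        if not entry or entry.startswith("#"):
            continue
        try:
            indicators.add(ipaddress.ip_address(entry))
        except ValueError:
            continue
    return indicators


def match_connections(conns: Iterable[Connection], indicators) -> list[Connection]:
    """Return the connections whose destination appears on the threat list.

    This is a yes/no lookup with no context attached: stale or wrong list
    entries will surface here as false positives.
    """
    hits = []
    for conn in conns:
        try:
            if ipaddress.ip_address(conn.dst) in indicators:
                hits.append(conn)
        except ValueError:
            continue  # malformed destination field in the log
    return hits


if __name__ == "__main__":
    # Addresses come from the reserved documentation ranges (RFC 5737).
    feed = ["# example feed", "203.0.113.7", "not-an-ip", "198.51.100.23"]
    logs = [
        Connection("2024-01-01T10:00:00Z", "10.0.0.5", "203.0.113.7"),
        Connection("2024-01-01T10:05:00Z", "10.0.0.8", "192.0.2.44"),
    ]
    for hit in match_connections(logs, load_threat_list(feed)):
        print("threat list match:", hit)
```

The ease of this integration is exactly why such feeds are popular, and the absence of any context in the output is why they are not yet intelligence.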
The Winning Combination
Of course, if your goal is to develop a complete picture of your threat landscape, the only route forward is to combine references from many of these sources. But as we’ve already mentioned, some of them, such as dark web communities, routinely present language barriers, which can significantly hinder effective analysis.
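As a rough illustration of combining references, the Python sketch below groups mentions of the same indicator gathered from different source types, collapses repeated mentions into a single entry, and counts how many distinct source types corroborate each indicator. The `Reference` record, the source-type labels, and the normalization rule (trimming and lowercasing) are hypothetical simplifications; real correlation also needs entity resolution and analyst judgment, and this sketch does nothing about language barriers.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Reference:
    """Hypothetical record: one mention of an indicator in one source."""
    source_type: str  # e.g. "technical", "media", "social", "forum", "dark_web"
    source_name: str
    indicator: str    # e.g. a domain name


def normalize(indicator: str) -> str:
    """Toy normalization: trim whitespace and lowercase so trivial variants match."""
    return indicator.strip().lower()


def correlate(refs):
    """Group references by normalized indicator.

    Returns {indicator: {"mentions": int, "source_types": set}}, ordered so that
    indicators seen in the most distinct source types come first.
    """
    grouped = defaultdict(lambda: {"mentions": 0, "source_types": set()})
    for ref in refs:
        entry = grouped[normalize(ref.indicator)]
        entry["mentions"] += 1
        entry["source_types"].add(ref.source_type)
    return dict(
        sorted(grouped.items(), key=lambda kv: (-len(kv[1]["source_types"]), kv[0]))
    )


if __name__ == "__main__":
    refs = [
        Reference("technical", "feed-a", "Bad.Example"),
        Reference("social", "post-1", " bad.example "),
        Reference("social", "post-2", "bad.example"),
        Reference("media", "blog-x", "other.example"),
    ]
    for indicator, info in correlate(refs).items():
        print(indicator, info["mentions"], sorted(info["source_types"]))
```

Ranking by the number of distinct source types, rather than raw mention count, is one possible design choice: it keeps a burst of repeated social media posts from outweighing independent confirmation from a different kind of source.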
Thankfully, advances in machine learning and natural language processing (NLP) mean that, with the right technology, references to threats can be rendered language-neutral and therefore analyzed by humans or machines regardless of the original language. Even more notably, some intelligence solutions that incorporate artificial intelligence (AI) components have learned the language of threats and can identify “malicious” terms.
Clearly, this combination of machine learning, NLP, and AI presents a huge opportunity for organizations looking to incorporate threat intelligence. Reducing analyst workload and removing language barriers are especially beneficial, and when combined with the ability to consider multiple data and information sources concurrently to produce genuine threat intelligence, they make it far easier to build a comprehensive map of the threat landscape.