Enabling OSINT in Activity-Based Intelligence (ABI)
August 31, 2016 • Lauren Zabierek
Activity-based intelligence, or ABI, is an intelligence methodology, developed during the wars in Iraq and Afghanistan, that is used to discover and disambiguate entities (e.g., people of interest) in an increasingly data-rich environment, most of which is unclassified and open source. It is geospatial in nature because it seeks to link entities and events through their locations rather than through text.
ABI rests on four main ideas, or pillars, which form the basis for understanding and using data to discover unknowns.
In their groundbreaking book, Activity-Based Intelligence: Principles and Applications, Patrick Biltgen, Vencore’s Director of Analytics, and Stephen Ryan, my good friend and former colleague, summarize the four pillars as follows:
- Georeference to Discover: focusing on spatially and temporally correlating multi-INT data to discover key entities and events;
- Data Neutrality: the premise that all data may be relevant regardless of the source from which it was obtained;
- Sequence Neutrality: the understanding that data collected at any time may already hold the answers to many questions we do not yet know to ask; and
- Integration Before Exploitation: correlating data as early as possible rather than relying on vetted, finished products built from single-INT data, because seemingly insignificant events in a single INT may be important when integrated across multiple INTs.
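To make the Georeference to Discover pillar a little more concrete, here is a minimal sketch of spatiotemporal correlation: pairing observations from different INTs that fall within a chosen distance and time window of each other. The `Observation` schema, the thresholds, and the sample coordinates are all hypothetical, and the brute-force pairing is only meant to illustrate the idea, not to describe how any production ABI system works.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from itertools import combinations
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


@dataclass(frozen=True)
class Observation:
    """One georeferenced, time-stamped data point from a single INT (hypothetical schema)."""
    source_int: str   # e.g. "OSINT", "SIGINT", "GEOINT"
    label: str
    lat: float
    lon: float
    time: datetime


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two points, in kilometers."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def correlate(observations, max_km: float, max_gap: timedelta):
    """Return cross-INT pairs that co-occur within max_km and max_gap.

    Brute-force O(n^2) pairing for illustration; a real system would use
    spatial and temporal indexes.
    """
    pairs = []
    for a, b in combinations(observations, 2):
        if a.source_int == b.source_int:
            continue  # only correlate across different INTs
        if abs(a.time - b.time) > max_gap:
            continue
        if haversine_km(a.lat, a.lon, b.lat, b.lon) <= max_km:
            pairs.append((a, b))
    return pairs


# Example: an OSINT geotag and a GEOINT sighting roughly 120 m and 20 minutes apart.
example = [
    Observation("OSINT", "forum post geotag", 33.3152, 44.3661, datetime(2016, 8, 1, 12, 0)),
    Observation("GEOINT", "vehicle sighting", 33.3160, 44.3670, datetime(2016, 8, 1, 12, 20)),
]
matches = correlate(example, max_km=1.0, max_gap=timedelta(minutes=30))
```

The only design point worth noting is that same-INT pairs are skipped on purpose: the pillar is about what emerges when sources are correlated with one another.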
In his keynote speech at GEOINT 2016, the director of NGA, Robert Cardillo, stated that his challenge to NGA is to succeed in the open. Mr. Cardillo also called for the rejection of “outdated ideas about the value of open source data.” ABI analysts have long rejected those ideas and demanded better access to OSINT because we adhere to the pillar of Data Neutrality.
We KNOW that the web offers a wealth of information, but until now its size and scale have presented analysts with a number of challenges: web data is vast, unstructured, and lacks context, which makes it difficult to collect and process. Once we had solved the problem of accessing the World Wide Web safely, the next question we faced was, “Where do I even start?”
In this blog post, I will focus on how Recorded Future complements the Data Neutrality pillar through structured open source intelligence, or OSINT.
How Recorded Future Structures the Web
Recorded Future is inherently data neutral, as we value the intelligence that we glean from the breadth of our coverage. Our intelligence engine harvests data from more than 750,000 sources (and growing), all of them unstructured text, across the open, deep, and dark web.
This data is then given structure by the automated creation and recognition of entities and events — terms all ABI analysts understand — which can be anything that we want to discover, understand, and resolve.
Of note, these terms are broader in Recorded Future than in the traditional ABI lexicon, as they include proxies, locations, and transactions: Twitter handles, threat actor groups, or locations in the geopolitical realm, as well as IP addresses connected to domains, phishing emails delivering malware, and exploits in the cyber domain.
When Recorded Future ingests a reference from the web (e.g., something that somebody posts on the internet, whether on Twitter, an information security blog, or a forum), it catalogs that data point around its entities and/or events. We accomplish this through machine learning and natural language processing, which means that the collection and processing of data is automated.
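To illustrate what cataloging references around entities can look like, here is a minimal sketch of an entity-centric index. It is not Recorded Future’s data model: the `Reference` and `EntityCatalog` names are hypothetical, and the entity lists are supplied by hand in place of the machine learning and NLP extraction step.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Reference:
    """One harvested piece of web text (hypothetical schema)."""
    ref_id: str
    source: str   # e.g. "twitter", "infosec blog", "forum"
    text: str
    # In practice entities would come from an upstream ML/NLP step; here they are given by hand.
    entities: List[str] = field(default_factory=list)
    event_type: Optional[str] = None


class EntityCatalog:
    """Files each reference under every entity it mentions."""

    def __init__(self) -> None:
        self._by_entity: Dict[str, List[Reference]] = defaultdict(list)

    def ingest(self, ref: Reference) -> None:
        # A reference that mentions an entity twice is still filed only once under it.
        for entity in set(ref.entities):
            self._by_entity[entity].append(ref)

    def references_for(self, entity: str) -> List[Reference]:
        # .get() avoids creating empty entries for entities never seen.
        return list(self._by_entity.get(entity, []))

    def co_mentioned(self, entity: str) -> Dict[str, int]:
        """Count how often other entities share a reference with `entity`."""
        counts: Dict[str, int] = defaultdict(int)
        for ref in self._by_entity.get(entity, []):
            for other in set(ref.entities):
                if other != entity:
                    counts[other] += 1
        return dict(counts)
```

Indexing by entity rather than by document is what lets an analyst pivot from one entity (say, a piece of malware) to every reference and co-mentioned entity around it.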
The effect is significant. First, it lifts the burden of collecting and processing data off the analyst (a task that, as I can tell you from experience, takes an inordinate amount of time and bandwidth). Second, it creates an ever-growing pool of data points that an analyst can query and set alerts on for specific information. Queries like these might include:
- “Give me all domains ever used with X piece of malware.”
- “Show me all tweets with negative sentiment within a one-kilometer radius of X location.”
- “Show me all tweets and foreign news reports about X location.”
- “What are the latest zero-day exploits being discussed in criminal forums?”
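The first two queries above can be sketched against a handful of hypothetical structured records. The `Record` fields, the `kind` labels, and the -1.0 to 1.0 sentiment scale are assumptions for illustration only, not Recorded Future’s schema or API; the radius test uses the haversine great-circle distance.

```python
from dataclasses import dataclass
from math import asin, cos, radians, sin, sqrt
from typing import Iterable, List, Optional, Set

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


@dataclass(frozen=True)
class Record:
    """A structured data point (hypothetical schema)."""
    kind: str                          # e.g. "tweet" or "malware_domain"
    text: str = ""
    lat: Optional[float] = None
    lon: Optional[float] = None
    sentiment: Optional[float] = None  # assumed scale: -1.0 (negative) to 1.0 (positive)
    malware: Optional[str] = None
    domain: Optional[str] = None


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two points, in kilometers."""
    phi1, phi2 = radians(lat1), radians(lat2)
    a = (sin((phi2 - phi1) / 2) ** 2
         + cos(phi1) * cos(phi2) * sin(radians(lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def domains_used_with(records: Iterable[Record], malware: str) -> Set[str]:
    """'Give me all domains ever used with X piece of malware.'"""
    return {
        r.domain for r in records
        if r.kind == "malware_domain" and r.malware == malware and r.domain
    }


def negative_tweets_near(records: Iterable[Record], lat: float, lon: float,
                         radius_km: float = 1.0, threshold: float = 0.0) -> List[Record]:
    """'Show me all tweets with negative sentiment within a one-kilometer radius of X.'"""
    return [
        r for r in records
        if r.kind == "tweet"
        and r.sentiment is not None and r.sentiment < threshold
        and r.lat is not None and r.lon is not None
        and haversine_km(lat, lon, r.lat, r.lon) <= radius_km
    ]
```

Once the web has been reduced to records like these, each question becomes a simple filter, which is the practical payoff of structuring the data in the first place.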
Finding this information quickly, persistently, and comprehensively through traditional internet search engines or from a handful of favorite open source sites is nearly impossible.
Recorded Future mitigates this challenge for the analyst, enabling safe and efficient access to this wealth of information through cloud technology, data encryption, two-factor authentication, and the decoupling of user information.
This means that an analyst can work on the unclassified web, a must for truly realizing OSINT’s potential, confident that his or her presence and searches are protected.
Dark Web Sources
Let’s not gloss over Recorded Future’s coverage of deep and dark web sources.
There is a wealth of information in these areas (such as black markets and criminal forums) that no standard internet search engine, or OSINT analyst for that matter, can access. The “chatter” on these sites holds myriad clues that could help analysts connect the dots across a variety of intelligence issues. Giving analysts access to this kind of information without requiring them to visit these sites is nothing short of revolutionary.
You might be thinking, what if these sites are in foreign languages?
Recorded Future has you covered with our natural language processing, or NLP. Currently, we natively process data in seven languages — English, Spanish, French, Russian, Farsi, Chinese, and Arabic — with two more languages on the horizon. This means that Recorded Future understands what is being discussed and can pick up threat information in these languages.
Furthermore, we provide in-platform translation: if you see a reference written in Chinese, you don’t have to go out to Google Translate; you can simply click the Translate button from within the platform.
In a nod to Sequence Neutrality, where the answers to our intelligence questions might be held in the data we collected previously, Recorded Future maintains a repository of six years’ worth of data. This allows an analyst to query historical data when another data point leads him or her there, and potentially find the key to unlock previously unanswered questions.
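Here is a minimal sketch of that kind of look-back, assuming a hypothetical time-ordered archive of references (the `ArchivedRef` and `Archive` names are illustrative, not Recorded Future’s API): when a new data point surfaces an entity, the archive is searched for earlier mentions within a chosen window, such as the six years of data mentioned above.

```python
from bisect import bisect_left
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import FrozenSet, Iterable, List


@dataclass(frozen=True)
class ArchivedRef:
    """A past reference kept in the archive (hypothetical schema)."""
    published: datetime
    entities: FrozenSet[str]
    text: str


class Archive:
    """Time-ordered store of historical references."""

    def __init__(self, refs: Iterable[ArchivedRef]) -> None:
        self._refs = sorted(refs, key=lambda r: r.published)
        self._times = [r.published for r in self._refs]  # parallel list for bisect

    def earlier_mentions(self, entity: str, as_of: datetime,
                         lookback: timedelta) -> List[ArchivedRef]:
        """References mentioning `entity` published in [as_of - lookback, as_of)."""
        start = bisect_left(self._times, as_of - lookback)
        end = bisect_left(self._times, as_of)
        return [r for r in self._refs[start:end] if entity in r.entities]
```

For example, `archive.earlier_mentions("bad.example", datetime(2016, 8, 31), timedelta(days=6 * 365))` would return every archived reference to that domain from roughly the preceding six years. Keeping the archive sorted by time means the window can be located by binary search before filtering on the entity.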
Finally, in response to Mr. Cardillo’s challenge to companies exhibiting at GEOINT to offer more trial accounts and API keys, Recorded Future provides no-cost “pilots” for prospective clients, as well as the option to purchase an API token to pull in data.
How would this look? In the most traditional interpretation, structured, georeferenced data would be pulled from Recorded Future’s data repository and incorporated into a single GIS framework — such as ArcGIS — for correlation with data from other “INTs.”
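As a minimal sketch of that hand-off, assuming the structured results have already been pulled down as plain Python dictionaries with hypothetical `lat` and `lon` keys, the records can be written out as GeoJSON (RFC 7946), a format that ArcGIS and most other GIS packages can import. The field names and the `to_geojson` and `write_geojson` helpers are illustrative only.

```python
import json
from typing import Any, Dict, Iterable, List


def to_geojson(records: Iterable[Dict[str, Any]]) -> Dict[str, Any]:
    """Build a GeoJSON FeatureCollection of Point features from georeferenced records."""
    features: List[Dict[str, Any]] = []
    for rec in records:
        lat, lon = rec.get("lat"), rec.get("lon")
        if lat is None or lon is None:
            continue  # skip records that could not be georeferenced
        # Everything except the coordinates travels along as feature properties.
        props = {k: v for k, v in rec.items() if k not in ("lat", "lon")}
        features.append({
            "type": "Feature",
            # RFC 7946 orders positions as [longitude, latitude].
            "geometry": {"type": "Point", "coordinates": [lon, lat]},
            "properties": props,
        })
    return {"type": "FeatureCollection", "features": features}


def write_geojson(path: str, records: Iterable[Dict[str, Any]]) -> None:
    """Write the FeatureCollection to disk for import into a GIS."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(to_geojson(records), fh, indent=2)
```

The longitude-first coordinate order is the detail most often gotten wrong; swapping it silently moves every point to the wrong place on the map.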
This sort of access to all parts of the web that I have described above has never before been possible, which is why I am so excited about this technology.
Fortunately, I was able to represent Recorded Future at GEOINT 2016 and explain to GEOINT officers how our technology enables ABI analysis.
ABI requires access to all available sources of data; access to OSINT is a mandate for today’s threat intelligence capability. Analysts must be able to observe human activities, networks and relationships, and events and transactions across all domains of the operational environment. Recorded Future is an enabling technology, one that provides analysts with access to structured data from the open, deep, and dark web. Indeed, as those outside of government begin to understand this methodology, enabling technologies such as those developed by Recorded Future will be key to success across the industry.
The recently announced partnership between Recorded Future and Vencore will “leverage the OSINT collection capabilities of Recorded Future in support of Vencore’s mission to support and integrate technologies, tools, and data sources” for activity-based intelligence and other advanced analytics, such as object-based production (OBP).
Stay tuned for my next blog post about how Recorded Future complements OBP!