What Is Open Source Intelligence (OSINT)?

Open Source Intelligence (OSINT) is the practice of gathering, analyzing, and disseminating information from publicly available sources to address specific intelligence requirements.

Of all the threat intelligence subtypes, open source intelligence (OSINT) is perhaps the most widely used, which makes sense. After all, it’s mostly free, and who can say no to that?

Unfortunately, much like the other major subtypes — human intelligence, signals intelligence, and geospatial intelligence, to name a few — open source intelligence is widely misunderstood and misused.

This widespread use and the growing sophistication of OSINT are reflected in market projections. In fact, according to a report by Future Market Insights, the OSINT industry is predicted to reach a staggering $58 billion by 2033, highlighting its increasing importance and integration into various sectors.

If you have ever asked yourself, 'What does OSINT stand for?', 'What is open source intelligence (OSINT)?', or 'What does OSINT mean?', keep reading to find out.

In this blog post, we’re going to cover the fundamentals of open source intelligence, including how it’s used and the tools and techniques that can be employed to gather and analyze it. We’ll also explore what OSINT means and the significant role it plays in intelligence gathering.

Key Takeaways

Open source intelligence is derived from data and information that is available to the general public. It’s not limited to what can be found using Google, although the so-called “surface web” is an important component.
As valuable as open source intelligence can be, information overload is a real concern. Most of the tools and techniques used to conduct open source intelligence initiatives are designed to help security professionals (or threat actors) focus their efforts on specific areas of interest.
There is a dark side to open source intelligence: anything that can be found by security professionals can also be found (and used) by threat actors.
Having a clear strategy and framework in place for open source intelligence gathering is essential — simply looking for anything that could be interesting or useful will inevitably lead to burnout.
Using OSINT tools to discover and protect sensitive data from potential attackers is crucial to reduce the risk of cybersecurity threats.

OSINT Definition

Before we look at common sources and applications of open source intelligence, it’s important to understand what it actually is.

According to U.S. public law, open source intelligence:

Is produced from publicly available information
Is collected, analyzed, and disseminated in a timely manner to an appropriate audience
Addresses a specific intelligence requirement

The important phrase to focus on here is “publicly available.”

The term “open source” refers specifically to information that is available for public consumption. If any specialist skills, tools, or techniques are required to access a piece of information, it can’t reasonably be considered open source.

Crucially, open source information is not limited to what you can find using the major search engines. Web pages and other resources that can be found using Google certainly constitute massive sources of open source information, but they are far from the only sources.

For starters, a huge proportion of the internet (over 99 percent, according to former Google CEO Eric Schmidt) cannot be found using the major search engines. This so-called “deep web” is a mass of websites, databases, files, and more that (for a variety of reasons, including the presence of login pages or paywalls) cannot be indexed by Google, Bing, Yahoo, or any other search engine you care to think of. Despite this, much of the content of the deep web can be considered open source because it’s readily available to the public.

OSINT tools can be used to access and analyze information from sources beyond traditional search engines. Tools such as SpiderFoot, searchcode, Searx, Twint, and Metagoofil gather data from public and open sources, including social media networks and the deep web. They discover and store large quantities of data, find links and patterns among different pieces of information, and collate what they find into actionable intelligence.

In addition, there’s plenty of freely accessible information online that can be found using online tools other than traditional search engines. We’ll look at this more later on, but as a simple example, tools like SecurityTrails and others can be used to find IP addresses, networks, open ports, webcams, printers, and pretty much anything else that’s connected to the internet.

Information can also be considered open source if it is:

Published or broadcast for a public audience (for example, news media content)
Available to the public by request (for example, census data)
Available to the public by subscription or purchase (for example, industry journals)
Could be seen or heard by any casual observer
Made available at a meeting open to the public
Obtained by visiting any place or attending any event that is open to the public

At this point, you’re probably thinking, “Man, that’s a lot of information …”

And you’re right. We’re talking about a truly unimaginable quantity of information that is growing at a far higher rate than anybody could ever hope to keep up with. Even if we narrow the field down to a single source of information — let’s say Twitter — we’re forced to cope with hundreds of millions of new data points every day.

This, as you’ve probably gathered, is the inherent trade-off of open source intelligence.

History of OSINT

The term OSINT refers to the practice of collecting information from publicly available sources to be used in an intelligence context. This practice has been around for a while, but it’s the digital era that really propelled OSINT into a league of its own.

The roots of OSINT can be traced to the formation of the Foreign Broadcast Monitoring Service (FBMS) in 1941. This organization was charged with monitoring and analyzing international broadcasts for signs of suspicious activity.

According to the Association of Former Intelligence Officers (AFIO): “The US military first coined the term OSINT in the late 1980s”. This development stemmed from the recognition that intelligence reform was needed to effectively meet the rapidly changing informational demands, particularly at the tactical battlefield level.

Initially, OSINT was a tool used primarily by intelligence agencies and law enforcement to gather publicly available information to assist in national security and cybercriminal investigations. The methods involved were time-consuming, often requiring individuals to manually sift through public records, newspapers, and other documents to find relevant information. This manual gathering of information was often difficult due to the vast amount of data one had to sift through. However, as the web evolved, so did the methods of collecting and analyzing publicly available data.

The advent of the internet significantly amplified the means through which information could be collected. Suddenly, a wide range of data became publicly available and easily accessible online, from government reports to academic papers, and everything in between. Websites became a primary data source for OSINT practitioners. These developments led to a boom in the creation of OSINT tools designed to automate the process of data collection and analysis. These tools could quickly gather information from various sources, including publicly accessible databases, social media platforms, and many other sources available in the digital realm.

As the field of cybersecurity experienced rapid growth, the application of OSINT expanded. Cybersecurity professionals began to see the value in utilizing OSINT to identify vulnerabilities, assess potential security threats, and bolster organizational security. OSINT tools became essential in monitoring publicly accessible web servers, analyzing metadata, and assessing security vulnerabilities. The data gathered became invaluable in understanding the security posture of an organization, identifying potential threats, and developing strategies to mitigate risks.

One notable trend is the integration of machine learning and analytics in OSINT tools to enhance the process of identifying patterns and trends from the collected data. This integration has not only made OSINT tools more effective but has also expanded the range of applications in which they can be utilized. For instance, security researchers now use OSINT to perform penetration tests, while businesses use it to gain insights into their competitors and the market environment.

Moreover, the community of OSINT researchers has grown over time, with forums, conferences, and groups forming to share knowledge, discuss best practices, and develop new OSINT techniques. Many OSINT tools, including frameworks like Recon-ng, have communities of developers on platforms like GitHub, working to improve, customize, and create modules to extend the capabilities of these tools. The collective effort of these communities has played a significant role in refining the OSINT practice, making it a vital component in intelligence gathering and cybersecurity.

How Open Source Intelligence Works

Now that we’ve covered the basics of open source intelligence, we can look at how it is commonly used for cybersecurity. The intelligence community plays a crucial role in utilizing OSINT for national security and cybersecurity efforts.

How is Open Source Intelligence Used?

There are two common use cases:

1. Ethical Hacking and Penetration Testing

Security professionals use open source intelligence to identify potential weaknesses in friendly networks so that they can be remediated before they are exploited by threat actors. Commonly found weaknesses include:

Accidental leaks of sensitive data, for example through social media (identifying and protecting this data is crucial to reducing the risk of cybersecurity threats)
Open ports or unsecured internet-connected devices
Unpatched software, such as websites running old versions of common CMS products
Leaked or exposed assets, such as proprietary code on pastebins
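
To make the "unpatched software" item a little more concrete: many CMS products advertise their version in a generator meta tag, and that tag is visible to anyone who loads the page. The sketch below uses only Python's standard library to read that tag from page HTML and flag versions older than a threshold you choose. The sample HTML, the MIN_SAFE_VERSION value, and the helper names are all assumptions made up for illustration; they do not reflect real advisories, and a check like this should only be run against sites you are authorized to assess.

```python
from html.parser import HTMLParser
import re


class GeneratorTagParser(HTMLParser):
    """Collects the content of <meta name="generator" ...> tags."""

    def __init__(self):
        super().__init__()
        self.generators = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr_map = {name.lower(): (value or "") for name, value in attrs}
        if attr_map.get("name", "").lower() == "generator":
            self.generators.append(attr_map.get("content", ""))


def parse_version(text):
    """Extract the first dotted version number as a tuple of ints, or None."""
    match = re.search(r"(\d+(?:\.\d+)+)", text)
    if not match:
        return None
    return tuple(int(part) for part in match.group(1).split("."))


def find_outdated_cms(html, minimum_version):
    """Return (generator, version) pairs older than minimum_version."""
    parser = GeneratorTagParser()
    parser.feed(html)
    findings = []
    for generator in parser.generators:
        version = parse_version(generator)
        if version is not None and version < minimum_version:
            findings.append((generator, version))
    return findings


# Hypothetical page content and threshold, for illustration only.
SAMPLE_HTML = '<html><head><meta name="generator" content="WordPress 4.9.8"></head></html>'
MIN_SAFE_VERSION = (6, 0)

if __name__ == "__main__":
    for generator, version in find_outdated_cms(SAMPLE_HTML, MIN_SAFE_VERSION):
        print(f"Possibly outdated: {generator}")
```

A missing generator tag does not mean a site is patched, since many administrators remove it, so a result like this is best treated as one data point to verify rather than a finding in its own right.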

2. Identifying External Threats

As we’ve discussed many times in the past, the internet is an excellent source of insights into an organization’s most pressing emerging threats. From identifying which new vulnerabilities are being actively exploited to intercepting threat actor “chatter” about an upcoming attack, open source intelligence enables security professionals to prioritize their time and resources to address the most significant current threats.

In most cases, this type of work requires an analyst to identify and correlate multiple data points to validate a threat before action is taken. OSINT tools are used to gather and analyze data from public and open sources, including social media networks and the deep web, to identify and correlate multiple data points for threat validation. For example, while a single threatening tweet may not be cause for concern, that same tweet would be viewed in a different light if it were tied to a threat group known to be active in a specific industry.
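
The tweet example above can be sketched as a simple scoring exercise: each independent signal that corroborates a threat raises the score, so a post with threatening language alone ranks well below the same post from an account tied to a group active in your industry. This is a minimal sketch, not how any particular product works. The handles, group names, keywords, and weights are all hypothetical placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Post:
    author: str
    text: str


# Hypothetical reference data an analyst might maintain; not real attributions.
KNOWN_GROUP_HANDLES = {"@example_actor": "ExampleGroup"}
GROUP_TARGET_INDUSTRIES = {"ExampleGroup": {"finance", "retail"}}
THREAT_KEYWORDS = ("attack", "leak", "breach", "ddos")


def assess_post(post, our_industry):
    """Return (score, reasons) by counting independent corroborating signals."""
    score = 0
    reasons = []
    text = post.text.lower()

    # Signal 1: the post itself contains threatening language.
    if any(keyword in text for keyword in THREAT_KEYWORDS):
        score += 1
        reasons.append("threatening language")

    # Signal 2: the author is linked to a known threat group.
    group = KNOWN_GROUP_HANDLES.get(post.author.lower())
    if group:
        score += 2
        reasons.append(f"author linked to {group}")
        # Signal 3: that group is known to be active in our industry.
        if our_industry in GROUP_TARGET_INDUSTRIES.get(group, set()):
            score += 2
            reasons.append(f"{group} is active in {our_industry}")

    return score, reasons


if __name__ == "__main__":
    lone_post = Post("@random_user", "we will attack soon")
    linked_post = Post("@example_actor", "we will attack soon")
    print(assess_post(lone_post, "finance"))
    print(assess_post(linked_post, "finance"))
```

Returning the reasons alongside the score keeps the result explainable, which matters when an analyst has to justify why one alert was escalated and another was not.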

One of the most important things to understand about open source intelligence is that it is often used in combination with other intelligence subtypes. Intelligence from closed sources such as internal telemetry, closed dark web communities, and external intelligence-sharing communities is regularly used to filter and verify open source intelligence. There are a variety of tools available to help analysts perform these functions, which we’ll look at a bit later on.

Open Source Intelligence and Cybersecurity

At this point, it’s time to address the second major issue with open source intelligence: if something is readily available to intelligence analysts, it’s also readily available to threat actors.

Threat actors use open source intelligence tools and techniques to identify potential targets and exploit weaknesses in target networks. Once a vulnerability is identified, it is often an extremely quick and simple process to exploit it and achieve a variety of malicious objectives. They also seek out sensitive data that can be exploited for malicious purposes, such as launching targeted attacks or selling the information on the dark web.

This process is the main reason why so many small and medium-sized enterprises get hacked each year. It isn’t because threat groups specifically take an interest in them, but rather because vulnerabilities in their network or website architecture are found using simple open source intelligence techniques. In short, they are easy targets.

And open source intelligence doesn’t only enable technical attacks on IT systems and networks. Different types of threat actors also seek out information about individuals and organizations that can be used to inform sophisticated social engineering campaigns using phishing (email), vishing (phone or voicemail), and SMiShing (SMS). Often, seemingly innocuous information shared through social networks and blogs can be used to develop highly convincing social engineering campaigns, which in turn are used to trick well-meaning users into compromising their organization’s network or assets.

This is why using open source intelligence for security purposes is so important: it gives you an opportunity to find and fix weaknesses in your organization’s network and remove sensitive information before a threat actor uses the same tools and techniques to exploit them.

Open Source Intelligence Techniques

Now that we’ve covered the uses of open source intelligence (both good and bad), it’s time to look at some of the techniques that can be used to gather and process open source information.

First, you must have a clear strategy and framework in place for acquiring and using open source intelligence. It’s not recommended to approach open source intelligence from the perspective of finding anything and everything that might be interesting or useful — as we’ve already discussed, the sheer volume of information available through open sources will simply overwhelm you. Instead, you must know exactly what you’re trying to achieve — for example, to identify and remediate weaknesses in your network — and focus your energies specifically on accomplishing those goals.

Second, you must identify a set of tools and techniques for collecting and processing open source information. Once again, the volume of information available is much too great for manual processes to be even slightly effective.

Passive vs. Active OSINT

Broadly speaking, the collection of open source intelligence falls into two categories: passive collection and active collection.

Passive collection often involves the use of threat intelligence platforms (TIPs) to combine a variety of threat feeds into a single, easily accessible location. While this is a major step up from manual intelligence harvesting, the risk of information overload is still significant.

More advanced threat intelligence solutions like Recorded Future solve this problem by using artificial intelligence, machine learning, and natural language processing to automate the process of prioritizing and dismissing alerts based on an organization’s specific needs. Additionally, using an OSINT tool can further streamline this process by gathering and analyzing massive amounts of data from public and open sources, including social media networks and the deep web.
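
To illustrate the basic idea of passive collection, the sketch below combines several threat feeds into one view, removes duplicates, and puts the indicators an organization cares about at the top. It uses plain rule-based sorting rather than the machine learning and natural language processing described above, so it should be read as a simplified stand-in. The feed names, indicators (taken from reserved documentation ranges), and watchlist are all made up for the example.

```python
from collections import defaultdict


def merge_feeds(feeds):
    """Combine feeds into one mapping: indicator -> set of feed names reporting it."""
    merged = defaultdict(set)
    for feed_name, indicators in feeds.items():
        for indicator in indicators:
            # Normalize so the same indicator from two feeds is counted once.
            merged[indicator.strip().lower()].add(feed_name)
    return merged


def prioritize(merged, watchlist):
    """Order indicators: watchlist matches first, then by number of reporting feeds."""
    def sort_key(item):
        indicator, sources = item
        return (indicator not in watchlist, -len(sources), indicator)

    return [indicator for indicator, _ in sorted(merged.items(), key=sort_key)]


# Hypothetical feeds and watchlist, for illustration only.
FEEDS = {
    "feed_a": ["192.0.2.10", "bad.example.com"],
    "feed_b": ["BAD.example.com ", "198.51.100.7"],
    "feed_c": ["198.51.100.7", "203.0.113.5"],
}
WATCHLIST = {"203.0.113.5"}  # e.g. an address belonging to the organization

if __name__ == "__main__":
    for indicator in prioritize(merge_feeds(FEEDS), WATCHLIST):
        print(indicator)
```

Even this small example shows why prioritization matters: three short feeds already produce four distinct indicators, and real feeds can contain many thousands.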

In a similar manner, organized threat groups often use botnets to collect valuable information using techniques like traffic sniffing and keylogging.

Active collection is the use of a variety of techniques to search for specific insights or information. For security professionals, this type of collection work is usually done for one of two reasons:

A passively collected alert has highlighted a potential threat and further insight is required.
The focus of an intelligence gathering exercise is very specific, such as a penetration testing exercise.

As we’ve already explained, one of the biggest issues facing security professionals is the regularity with which normal, well-meaning users accidentally leave sensitive assets and information exposed to the internet. A series of advanced search functions called “Google dork” queries can be used to identify the information and assets these users expose.

Google dork queries are based on the search operators used by IT professionals and hackers on a daily basis to conduct their work. Common examples include “filetype:”, which narrows search results to a specific file type, and “site:”, which only returns results from a specified website or domain.

The Public Intelligence website offers a more thorough rundown of Google dork queries, in which they give the following example search:

“sensitive but unclassified” filetype:pdf site:publicintelligence.net

If you type this search term into a search engine, it returns only PDF documents from the Public Intelligence website that contain the words “sensitive but unclassified” somewhere in the document text. As you can imagine, with hundreds of commands at their disposal, security professionals and threat actors can use similar techniques to search for almost anything.
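
Because a dork query is just a phrase combined with operators, it is easy to assemble in code when you need to run the same kind of search against many domains. The sketch below rebuilds the example search above from its parts. The build_dork helper is a hypothetical name introduced here for illustration, and it covers only the two operators mentioned in this post.

```python
def build_dork(phrase=None, filetype=None, site=None):
    """Assemble a search query from an exact phrase and common operators."""
    parts = []
    if phrase:
        # Straight double quotes request an exact-phrase match.
        parts.append(f'"{phrase}"')
    if filetype:
        parts.append(f"filetype:{filetype}")
    if site:
        parts.append(f"site:{site}")
    return " ".join(parts)


query = build_dork(
    "sensitive but unclassified",
    filetype="pdf",
    site="publicintelligence.net",
)
print(query)
```

Pointing the same helper at your own organization’s domains is a simple way to check what you have exposed before someone else does.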

Start With the End in Mind

The most important factor in the success of any open source intelligence initiative is the presence of a clear strategy — once you know what you’re trying to accomplish and you’ve set objectives accordingly, identifying the most useful tools and techniques will be much more achievable.

To learn more about how Recorded Future can help organizations better understand and prevent threats, request a personalized demo today.

FAQs

Is OSINT legal?

Yes, OSINT is generally legal since it involves gathering information that is publicly available. However, the legality can become a gray area depending on how the information is used or if attempts are made to access restricted or private data under the guise of OSINT. It is crucial to ensure that while gathering information, sensitive data is protected to prevent it from being exploited by attackers.

What is the dark side of OSINT?

The dark side of OSINT emerges when sensitive data is gathered and misused for malicious purposes. This can include stalking, harassment, doxxing, or even planning cyber-attacks. While OSINT itself is a valuable tool for many positive uses, like any tool, it can be misused in the wrong hands.

How is OSINT utilized to gather intelligence from the dark web, and what precautions should be taken?

OSINT is employed to uncover hidden services and forums on the dark web, aiding in cybercrime investigations or threat detection. The use of OSINT tools, such as SpiderFoot and Twint, is crucial in uncovering and analyzing data from these hidden sources. However, venturing into the dark web requires precautions:

Legal Compliance: Ensure actions comply with legal and ethical guidelines.
Privacy Protection: Avoid infringing on individual privacy.
Secure Browsing: Use tools like Tor for secure browsing.
Malware Protection: Employ robust malware protection to guard against malicious software.
Data Verification: Verify the accuracy of the information collected through multiple sources.

These measures help ensure the safe and lawful use of OSINT in gathering intelligence from the dark web.

Is OSINT ethical?

Yes, OSINT (Open Source Intelligence) can be ethical when it involves passively collecting and analyzing publicly available information for legitimate purposes like cybersecurity, research, or journalism, while respecting privacy and legal boundaries. However, ethical concerns arise with practices like aggressive data aggregation for profiling, deception, bypassing privacy settings, using information for harm (doxxing, harassment), and spreading unverified information. Responsible OSINT practitioners prioritize transparency (where appropriate), minimize data collection, verify sources, avoid harm, comply with laws, and document their methods to ensure ethical and legal conduct.

What kind of information is considered OSINT? What are the sources?

OSINT encompasses a wide array of publicly accessible information, including news media, social media, public records (like court filings and business registries), government publications, academic research, commercial data (financial reports, satellite imagery), and even the deep and dark web. These sources can provide diverse data points such as text, images, videos, metadata, and network information, all legally and ethically obtained for analysis and intelligence gathering.

Who uses OSINT?

OSINT is utilized by a diverse range of professionals, including cybersecurity analysts, threat intelligence teams, law enforcement agencies, national security organizations, journalists, market researchers, competitive intelligence analysts, and even individual investigators. The sources they leverage encompass publicly available information such as news media, social media platforms, government publications, public records, academic research, commercial databases, online forums, and the deep and dark web, all accessible to the public.

This article was originally published February 19, 2019, and last updated on Jun 24, 2024.