WHOISThe Potential Impact of GDPR on Security Research

There has been a lot of discussion recently around how the introduction of the GDPR in the EU is going to diminish the value of WHOIS data in security research. On the one hand, security researchers are claiming that WHOIS restrictions, especially the restriction of contact information in public records, will leave the internet open to more attacks, while many privacy advocates are claiming that there will be a drop in attacks because data will be less freely available.

Before getting into that argument, it’s important to step back and define some terms. The first is GDPR, or General Data Protection Regulation. GDPR is a set of privacy regulations that take effect on May 25, 2018 across the European Union. These regulations were designed to protect EU citizens’ data and prevent companies and organizations from sharing this information without the user’s consent. WHOIS is both a service and a protocol that provides registration information about a domain name. All ICANN-accredited registrars are required to maintain publicly reachable WHOIS servers that allow anyone interested to query a domain name. For example, if I wanted to know about Recorded Future, I would type, “whois recordedfuture.com” and receive the following results:

Domain Name: RECORDEDFUTURE.COM
Registry Domain ID: 1536003122_DOMAIN_COM-VRSN
Registrar WHOIS Server: whois.name.com
Registrar URL: http://www.name.com Updated Date: 2015-11-17T08:51:24Z
Creation Date: 2009-01-04T20:10:36Z
Registrar Registration Expiration Date: 2018-10-02T03:59:59Z
Registrar: Name.com, Inc.
Registrar IANA ID: 625
Reseller:
Domain Status: clientTransferProhibited https://www.icann.org/epp#clientTransferProhibited Registry Registrant ID: Not Available From Registry
Registrant Name: Martin Forssen
Registrant Organization: Recorded Future
Registrant Street: Västra Hamngatan 24
Registrant City: Göteborg
Registrant State/Province: n/a
Registrant Postal Code: 41117
Registrant Country: SE
Registrant Phone: +46.760252357
Registrant Email: operations@recordedfuture.com
Registry Admin ID: Not Available From Registry
Admin Name: Martin Forssen
Admin Organization: Recorded Future
Admin Street: Västra Hamngatan 24
Admin City: Göteborg
Admin State/Province: n/a
Admin Postal Code: 41117
Admin Country: SE
Admin Phone: +46.760252357
Admin Email: operations@recordedfuture.com
Registry Tech ID: Not Available From Registry
Tech Name: Martin Forssen
Tech Organization: Recorded Future
Tech Street: Västra Hamngatan 24
Tech City: Göteborg
Tech State/Province: n/a
Tech Postal Code: 41117
Tech Country: SE
Tech Phone: +46.760252357
Tech Email: operations@recordedfuture.com
Name Server: leah.ns.cloudflare.com
Name Server: hugh.ns.cloudflare.com
DNSSEC: signedDelegation
Registrar Abuse Contact Email: abuse@name.com
Registrar Abuse Contact Phone: +1.7203101849

I now know who the domain owner is, how to contact them, who their domain registrar is, what their authoritative DNS servers are, and more. WHOIS is a powerful tool that security researchers have used over the years to connect domains to threat actors and track down various attack groups. However, WHOIS data, as it is currently presented, is in violation of GDPR regulations. This is an important problem, because even though the European Union only accounts for about 1/14 of the world’s population, many domain registrars will not want to run the risk of running afoul of GDPR regulations, so they will apply the same registration standards to all of their customers, irrespective of where that customer is located.

The reason that security researchers have had success with using WHOIS data is twofold. The first is that many attackers, even sophisticated attackers, have had poor OPSEC (operational security) when it comes to registering domains. A surprisingly large number of attackers simply didn’t bother to take steps to hide their identity or create unique, fake aliases when registering domains. Spammers were notoriously bad at hiding their identity, making it easy to identify newly registered domains as belonging to a known spammer and immediately adding it to block lists and real-time blackhole lists (RBLs).

Secondly, even attackers who realized their mistake and changed the WHOIS information of their domains after the fact, or added privacy services to the domain, were out of luck because certain companies, such as DomainTools, keep track of historical WHOIS data, and researchers could easily see the original WHOIS information.

No one is exactly sure what WHOIS information will look like post-GDPR on May 25, but some TLD owners and registrars have been restricting WHOIS information in anticipation of GDPR. The TLD “.eu” was one of the first to restrict public WHOIS information. So, for example, WHOIS data for the domain name worldstandards.eu looks like this:

Domain: worldstandards.eu

Registrant:
NOT DISCLOSED!
Visit www.eurid.eu for webbased whois.Onsite(s):
NOT DISCLOSED!
Visit www.eurid.eu for webbased whois.Registrar:
Name: AXC
Website: http://www.axc.eu/

Name servers:
ns1.mediawax.be
ns2.mediawax.be
ns3.mediawax.be

Please visit www.eurid.eu for more info.

As you can see, there is significantly less information presented. It is basically the domain name, the registrar, and the name servers. That presents a stark contrast to standard WHOIS output and is the reason why so many people in the security community are panicking. It should be noted that, for now, researchers can pull the full record for .eu domains by going to www.eurid.eu and using their WebWHOIS feature.

The .eu TLD registrar is not the only one restricting access to information ahead of the GDPR. Both GoDaddy and Google have started restricting access to WHOIS information in a manner they deem compliant with GDPR.

This type of data restriction is actually a positive for those who register domains. Just as security researchers use WHOIS data to track down bad domains and try to tie those that are suspicious to ongoing campaigns or groups, attackers harvest WHOIS information for spamming and phishing attacks, as anyone who has ever registered a domain and been immediately inundated with spam or physical letters from unscrupulous registrars can attest.

From a security perspective, researchers will no longer be able to use public information, such as contact details or email addresses, to tie domains to groups and connect different domains to campaigns, or even specific threat actors. Again, while this method has become less effective in recent years, it is still used by researchers from time to time.

Who Is Right?

Again, no one really knows what is going to happen today. ICANN originally put forth a “cookbook” that they hoped would be seen as sufficient for GDPR compliance. Unfortunately, many of ICANN’s suggestions were roundly rejected by the EU’s Article 29 Working Party, meaning there is a lot of confusion with GDPR now being the law of the EU, which will impact the rest of the world.

In response to the Article 29 Working Party comments, ICANN introduced a draft for a new Temporary Specification for gTLD Registration Data on May 14. Given the arrival of GDPR, this document will serve as the guidance for all registrars who register gTLD domains. With no time left to implement the requirements outlined in the document, many registrars who have not taken action are scrambling to put something in place.

However, there is a strong argument to be made that WHOIS data has become less useful as a security research tool over the past few years. Lax data collection, poor enforcement of agreements by ICANN, and low-cost registrars undercutting prices has meant that registrars rarely force new domain registrants to use accurate data. This means that WHOIS data has become significantly less reliable over time when it comes to identifying a domain owner. Occasionally, there are still successful finds when mining WHOIS data, but those successes are further and further apart. Take, for example, the domain facebook-cdn.net associated with the group APT 32/OceanLotus. Here is the WHOIS output:

Domain Name: FACEBOOK-CDN.NET
Registry Domain ID: 2027877841_DOMAIN_NET-VRSN
Registrar WHOIS Server: whois.ilovewww.com
Registrar URL: http://www.ilovewww.com Updated Date: 2017-10-06T05:14:42Z
Creation Date: 2016-05-13T05:51:57Z
Registrar Registration Expiration Date: 2018-05-13T05:51:57Z
Registrar: Shinjiru MSC Sdn Bhd
Registrar IANA ID: 1741
Domain Status: clientTransferProhibited https://icann.org/epp#clientTransferProhibited Registry Registrant ID: Not Available From Registry
Registrant Name: Domain Admin
Registrant Organization: Privacy Protect, LLC (PrivacyProtect.org)
Registrant Street: 10 Corporate Drive
Registrant City: Burlington
Registrant State/Province: MA
Registrant Postal Code: 01803
Registrant Country: US
Registrant Phone: +1.8022274003
Registrant Phone Ext:
Registrant Fax:
Registrant Fax Ext:
Registrant Email: contact@privacyprotect.org
Registry Admin ID: Not Available From Registry
Admin Name: Domain Admin
Admin Organization: Privacy Protect, LLC (PrivacyProtect.org)
Admin Street: 10 Corporate Drive
Admin City: Burlington
Admin State/Province: MA
Admin Postal Code: 01803
Admin Country: US
Admin Phone: +1.8022274003
Admin Phone Ext:
Admin Fax:
Admin Fax Ext:
Admin Email: contact@privacyprotect.org
Registry Tech ID: Not Available From Registry
Tech Name: Domain Admin
Tech Organization: Privacy Protect, LLC (PrivacyProtect.org)
Tech Street: 10 Corporate Drive
Tech City: Burlington
Tech State/Province: MA
Tech Postal Code: 01803
Tech Country: US
Tech Phone: +1.8022274003
Tech Phone Ext:
Tech Fax:
Tech Fax Ext:
Tech Email: contact@privacyprotect.org
Name Server: ns3.facebook-cdn.net
Name Server: ns4.facebook-cdn.net
DNSSEC: Unsigned

As you can see, the domain has privacy protection enabled, making it impossible to know anything about who registered the domain or the registrant’s contact information from WHOIS data, so WHOIS information appears to be useless to us, in this case.

However, there is still a good deal of information that is very helpful to security researchers. Among the things that security researchers look for is the registration date — is this a new domain, or one that has been around for a while? In this case, we know the domain was first registered May 13, 2016 and information about the domain was updated on October 6, 2017. We also know that the domain was registered through Shinjiru International Inc., a domain registrar based in China that is primarily (though not always) used by Chinese citizens. We also know that the two authoritative servers for the domain are ns3.facebook-cdn.net and ns4.facebook-cdn.net. All of this information is surprisingly useful to a security researcher and will all still be available even now that GDPR is the law of the land.

In addition to the rise in the use of domain privacy, there has been significant increase in the number of country code domains (ccTLDs). While managers of ccTLDs often work with ICANN, they are not under any obligation to do so. As you can imagine, this leads to highly variable data quality, depending on the ccTLD. For example, the .gq ccTLD is notoriously spam-ridden, landing it in SPAMHAUS’s top 10 most abused TLDs. By the estimation of the team SPAMHAUS, there are 210,000 registered .gq domains, of which 78,500, or 37 percent, are malicious.

The .gq ccTLD is also very loose with required registrant information it collects from new domain registrations. Using the domain blast.gq, which resides on multiple spam RBLs, as an example:

Domain name:
BLAST.GQ

Organisation:
Equatorial Guinea Domains B.V.
Dominio GQ administrator
P.O. Box 11774
1001 GT Amsterdam
Netherlands
Phone: +31 20 5315725
Fax: +31 20 5315721
E-mail: abuse: abuse@freenom.com, copyright infringement: copyright@freenom.com

Domain Nameservers:
NS1.PYX.HOST
NS2.PYX.HOST

There is not a lot of useful information available via WHOIS for security researchers to investigate. The thing is, many security researchers have watched the degradation of the quality of WHOIS information over the last few years and have been less reliant on it than in the past. Instead, researchers focus on publicly available domain observables, such as how recently the domain was registered, the registrar, and the name servers used, as well as tracking how often the IP address associated with various subdomains changes in a short period of time. It is certainly more work, but there are a number of automated systems and services that make this data available for those serious about tracking down bad domains.

There is a push from the security community to try to carve out an exception for security researchers to still have access to WHOIS data. There is no guarantee that this will happen, but it would allow the security community to continue using WHOIS to identify cybercriminals. Remember, registrars affiliated with ICANN are still required to collect the same data, they are just no longer allowed to display it because of GDPR. Since the data is there, those with the right access, such as law enforcement, will be able to get it when needed. There is also a strong case that security researchers, who contribute to a stable internet, will want that access as well. This could have a unique side effect — malicious domain registrants, thinking their data will be hidden, may be less likely to use fake information, or more likely to reuse the same fake information across multiple domains.

The other good news is that the attention to this issue has really shined a light on what a mess the WHOIS system is and has been for a while. There are number of ICANN working groups (full disclosure: I am a member of one of these groups) that are working to improve the standardization of WHOIS data across all TLDs and registrars to ensure that whatever data is allowed to be displayed under GDPR is consistently collected and displayed.

Security researchers are already working around a broken WHOIS system, and the advent of GDPR will not change that. There is still a lot of valuable data to collect and intelligence to gather in even the most basic WHOIS output — the trick is knowing what to do with the collected data and thinking long term, versus a single domain. For example, many companies are relying on passive DNS as well as registrar and name server reputation to identify potentially bad domains. Companies like Farsight Security and Google’s VirusTotal’s passive DNS collection provides a great deal of information about bad domains and allow researchers to connect different bad domains to each other and work to tie them to a specific actor.

Many people are bemoaning that GDPR will be the death of WHOIS, but WHOIS has been on life support for a very long time. Given that there are more effective and accurate ways to hunt threat actors using DNS, maybe it is time to let WHOIS die.

WHOIS: The Potential Impact of GDPR on Security Research

Who Is Right?