By this point, you’ve hopefully gotten the message that your personal data can end up exposed in all sorts of unexpected internet backwaters. But increased awareness hasn’t slowed the problem. In fact, it’s only grown bigger—and more confounding.
Last week, security researchers Bob Diachenko and Vinny Troia discovered an unprotected, publicly accessible MongoDB database containing 150 gigabytes-worth of detailed, plaintext marketing data—including 763 million unique email addresses. The pair are going public with their findings today. The trove is not only massive but also unusual; it contains data about individual consumers as well as what appears to be “business intelligence data,” like employee and revenue figures from various companies. This diversity may stem from the information’s source. The database, owned by the “email validation” firm Verifications.io, was taken offline the same day Diachenko reported it to the company.
While you’ve likely never heard of them, validators play a crucial role in the email marketing industry. They don’t send out marketing emails on their own behalf, or facilitate automated mass email campaigns. Instead, they vet a customer’s mailing list to ensure that the email addresses in it are valid and won’t bounce back. Some email marketing firms offer this mechanism in-house. But fully verifying that an email address works involves sending a message to the address and confirming that it was delivered, which essentially means spamming people. That means evading protections of internet service providers and platforms like Gmail. (There are less invasive ways to validate email addresses, but they have a tradeoff of false positives.) Mainstream email marketing firms often outsource this work rather than take on the risk of having their infrastructure blacklisted by spam filters, or lowering their online reputation scores.
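To make the false-positive tradeoff concrete, here is a minimal sketch of one less invasive check: testing only whether an address is plausibly formatted, without sending anything. The pattern and the function names (`looks_like_email`, `split_by_syntax`) are illustrative assumptions, not a description of how Verifications.io or any other validator actually works.

```python
import re

# Deliberately simple pattern (an assumption for illustration, not a full
# RFC 5322 parser): a non-empty local part, one "@", and a dotted domain.
_EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s.]+(\.[^@\s.]+)+$")


def looks_like_email(address: str) -> bool:
    """Return True if the address is plausibly formatted.

    This says nothing about whether the mailbox exists, which is
    where the false positives come from.
    """
    return _EMAIL_PATTERN.match(address.strip()) is not None


def split_by_syntax(addresses):
    """Partition addresses into (plausible, rejected) by format alone."""
    plausible, rejected = [], []
    for address in addresses:
        if looks_like_email(address):
            plausible.append(address)
        else:
            rejected.append(address)
    return plausible, rejected
```

A check like this passes a well-formed address even if its mailbox was abandoned or never existed, which is why a format check alone cannot promise that a message won’t bounce.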
“Companies have email lists and want to start emailing them, but they’re not sure how valid they are,” says Troia, who founded the firm Night Lion Security. “So they go to a company that will essentially send out spam.” Troia speculates, but has not confirmed, that the database may be so large and varied because it comprises all of Verifications.io’s customers’ data. WIRED was unable over the course of several days to contact the company or CEO Vlad Strelkov. On Monday, the entire Verifications.io website went offline and has not been restored since.
In general, the 809 million total records in the Verifications.io trove include standard information like names, email addresses, phone numbers, and physical addresses. But many also include gender, date of birth, personal mortgage amount, and interest rate; Facebook, LinkedIn, and Instagram accounts associated with email addresses; and characterizations of people’s credit scores (like average, above average, and so on). Meanwhile, other records in the collection seem related to generating sales leads at businesses, including company names, annual revenue figures, fax numbers, company websites, and industry identifiers for categorizing companies called “SIC” and “NAIC” codes.
The data doesn’t contain Social Security numbers or credit card numbers, and the only passwords in the database are for Verifications.io’s own infrastructure. Overall, most of the data is publicly available from various sources, but when criminals can get their hands on troves of aggregated data, it makes it much easier for them to run new social engineering scams, or expand their target pool.
In the exposed database, the researchers also found evidence of test email accounts, hundreds of SMTP (email sending) servers, the text of emails, anti-spam evasion infrastructure, keywords to avoid, and IP addresses to blacklist. Diachenko suggests that in the Verifications.io workflow, customers would upload an Excel spreadsheet listing the email addresses to validate, and then Verifications.io would run its tests and return lists of clean addresses and ones that bounced back. It’s possible, given the piecemeal nature of the data and evidence that it was imported from numerous different Excel files, that Verifications.io also retained some or all of the data it received from customers after concluding its email address checks.
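The workflow Diachenko describes can be sketched in a few lines. This is a hypothetical illustration only: the function names, the `email` column, and the idea of passing in a ready-made set of bounced addresses are all assumptions, and nothing here reflects Verifications.io’s actual code.

```python
def extract_addresses(uploaded_rows, column="email"):
    """Pull just the address column out of an uploaded list.

    Each row is a dict, as a spreadsheet row might be after parsing.
    Every other field (names, phone numbers, and so on) is dropped here.
    """
    seen = set()
    addresses = []
    for row in uploaded_rows:
        address = row.get(column, "").strip().lower()
        if address and address not in seen:
            seen.add(address)
            addresses.append(address)
    return addresses


def build_report(addresses, bounced):
    """Split addresses into the two lists returned to the customer."""
    bounced_set = {a.strip().lower() for a in bounced}
    return {
        "clean": [a for a in addresses if a not in bounced_set],
        "bounced": [a for a in addresses if a in bounced_set],
    }
```

In this sketch, only the address column is needed to produce the report. Whether the rest of an uploaded spreadsheet is kept afterward is a separate decision for the operator, and that is the part the researchers can only speculate about.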
The researchers validated samples of the data with companies listed as Verifications.io customers. Troia says his own information appears in the database. WIRED spoke to the proprietor of an email marketing firm who confirmed the validity of a segment of the data. WIRED also checked for four individuals, but did not find them listed. Diachenko and Troia also note that they have no way to know whether anyone discovered and downloaded the Verifications.io data while it was publicly available and fully exposed.
“I have no idea if anyone else accessed this besides us,” Troia says. “But it was definitely out there for anyone to grab.”
‘Another Day on the Internet’
Much remains unknown about the database and Verifications.io, because the company is difficult to track. When the researchers initially contacted the company through a messaging portal on its site to disclose the database exposure, someone responded with an unsigned note. “Thank you for reporting the issue. We appreciate you reaching out and informing us,” the reply said. “This is our company database built with public information, not client data. We were able to quickly secure the database. Goes to show, even with 12 years of experience you can’t let your guard down.”
Much of the data in the database is publicly available, though it’s not clear that all of it is. When the researchers asked in the portal for the name of the owner of the company and the legal name of the company, someone wrote back declining to answer.
It is also unclear where Verifications.io is based. Most of its materials list Boca Raton, Florida, but some of its web assets are registered in California and Delaware. The Verifications.io website lists addresses in Estonia, but some of those matched up with what appear to be a museum and a government building.
Security researcher Troy Hunt is adding the Verifications.io data to his service HaveIBeenPwned, which helps people check whether their data has been compromised in data exposures and breaches. He says that 35 percent of the trove’s 763 million email addresses are new to the HaveIBeenPwned database. The Verifications.io data dump is also the second-largest ever added to HaveIBeenPwned in terms of number of email addresses, after the 773 million in the repository known as Collection 1, which was added earlier this year. Hunt says some of his own information is included in the Verifications.io exposure.
“The main takeaway for me is that this is just another case where someone has my data, and hundreds of millions of other people’s data, and I’ve absolutely no idea how they got it,” Hunt says. “I’d never heard of the company until now and I certainly can’t ever recall consenting to their use of my data. Of course, it’s entirely possible that buried in some other service’s terms and conditions it says they’re allowed to pass my data around in this fashion, but that’s not really consistent with my expectations of how my data should be used.”
As with recent data exposures from the business data aggregator Apollo and the marketing firm Exactis, there’s not a lot you can do to individually protect yourself when vast repositories of data compiled from both public and private sources leak. Check HaveIBeenPwned to see if your data was in the Verifications.io exposure, and continue your general vigilance about using strong, unique passwords, monitoring your financial statements, and giving out your Social Security number as infrequently as possible. But also know that none of those measures provide a full solution to this society-scale problem.
The disjointed nature of the exposed Verifications.io data speaks to the chaotic state of the data industry overall. People’s personal information is shared by massive companies like Facebook, bought and sold by shady marketers, or stolen from data giants and doomed to circulate endlessly in the purgatory of criminal forums. The churn makes it difficult for consumers to control who has their data and where it ends up. As Hunt puts it, “Sadly, it’s just another day on the internet.”