A database with 3.8B phone numbers from Clubhouse is up for sale

350 points • 3 years ago • twitter.com

deliberateJack • 3 years ago

I am selling a database with ten billion phone numbers. 1.25 GB file with each number compressed to a single bit. You can compare the clubhouse database against mine to determine which numbers are not in their set.
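The arithmetic behind the joke can be sketched in a few lines of Python: one bit per possible number, with bit i set when number i is "in the database". This is only an illustration. The class and function names are made up for this sketch, and the ten-billion figure is taken to mean every 10-digit string.

```python
class PhoneBitmap:
    """One bit per possible number: bit i is set when number i is in the set."""

    def __init__(self, space_size: int) -> None:
        self.space_size = space_size
        self.bits = bytearray((space_size + 7) // 8)

    def add(self, number: int) -> None:
        if not 0 <= number < self.space_size:
            raise ValueError("number outside the bitmap's range")
        self.bits[number >> 3] |= 1 << (number & 7)

    def __contains__(self, number: int) -> bool:
        if not 0 <= number < self.space_size:
            return False
        return bool(self.bits[number >> 3] & (1 << (number & 7)))


def bitmap_size_bytes(space_size: int) -> int:
    """Bytes needed to hold one bit per slot, rounded up to a whole byte."""
    return (space_size + 7) // 8


# Tiny demo on a 7-digit space so it runs instantly.
demo = PhoneBitmap(10**7)
demo.add(3332222)

# Ten billion slots at one bit each is 1,250,000,000 bytes, i.e. 1.25 GB.
full_size = bitmap_size_bytes(10**10)
```

Comparing two such bitmaps is then a bytewise AND/XOR, which is what makes the "compare against mine" offer trivially true.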

Scoundreller • 3 years ago

Knowing which numbers are capable of receiving SMS and which aren't has some value.

Especially in a world of number portability where you can't just say "oh, that's an old number, it must be POTS".

But I guess, here, if a number is from your contact list, it may still be POTS.

But at least you have higher assurance that it's an active user. If you wardial one day, you quickly find out how many numbers never lead to a human for various reasons. In theory, some of these are trap numbers and quickly flag the caller as suspicious, but I doubt it.

rsync • 3 years ago

"Knowing which numbers are capable of receiving SMS and which aren't has some value."

This isn't difficult - I wrote a shell script named "lookup" that will give me background info for any phone number I feed it and tell me what kind of number it is, what carrier it is, who it belongs to, etc.:

  # lookup 415-333-2222

  {"caller_name": {"caller_name": "WIRELESS CALLER", "caller_type": null, "error_code": null}, "country_code": "US", "phone_number": "+14153332222", "national_format": "(415) 333-2222", "carrier": {"mobile_country_code": "311", "mobile_network_code": "489", "name": "Verizon Wireless", "type": "mobile", "error_code": null}, "add_ons": null, "url": "https://lookups.twilio.com/v1/PhoneNumbers/+14153332222?Type=carrier&Type=caller-name"}

... which is very useful since I often send (personal) SMS from the command line and sometimes I need to know if a number can receive it ...

I'm not going to paste the entire script here but the meat of it is:

  /usr/local/bin/curl -X GET "https://lookups.twilio.com/v1/PhoneNumbers/$number?Type=carrier&Type=caller-name" -u "$accountsid:$authtoken"

... and each lookup costs a penny or a half a penny or something ... I forget ...
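For the "can it receive SMS" part, a rough sketch of how the JSON above could be interpreted, in Python rather than shell. It makes no network call and just parses a response shaped like the one pasted above. The function name is invented here, and treating a carrier type of "mobile" as "can receive SMS" is an assumption of this sketch; other types, such as VoIP numbers, may well accept SMS too. The landline sample is hypothetical.

```python
import json

# Response copied from the lookup output above (reflowed across lines).
SAMPLE = """
{"caller_name": {"caller_name": "WIRELESS CALLER", "caller_type": null, "error_code": null},
 "country_code": "US",
 "phone_number": "+14153332222",
 "national_format": "(415) 333-2222",
 "carrier": {"mobile_country_code": "311", "mobile_network_code": "489",
             "name": "Verizon Wireless", "type": "mobile", "error_code": null},
 "add_ons": null,
 "url": "https://lookups.twilio.com/v1/PhoneNumbers/+14153332222?Type=carrier&Type=caller-name"}
"""

# Hypothetical response for a landline, made up for this example.
LANDLINE_SAMPLE = '{"carrier": {"name": "Example Telco", "type": "landline"}}'


def can_probably_receive_sms(lookup_json: str) -> bool:
    """Heuristic: only a carrier type of "mobile" counts as SMS-capable."""
    carrier = json.loads(lookup_json).get("carrier") or {}
    return carrier.get("type") == "mobile"


mobile_ok = can_probably_receive_sms(SAMPLE)
landline_ok = can_probably_receive_sms(LANDLINE_SAMPLE)
```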

EGreg • 3 years ago

How would your script obtain this information though? Relying on twilio?

rospaya • 3 years ago

In some countries mobile phone numbers have a prefix so you know by that.

gsich • 3 years ago

Also, some POTS providers will accept SMS and either read it to you or let you read it in a web portal (or possibly on the router).

simfree • 3 years ago

The Local Routing Number provides this value in the USA, and multiple carriers (e.g. Twilio) offer daily deactivation reports from the cellular carriers so you can tell which numbers are unroutable.

Scoundreller • 3 years ago

Canada isn't as progressive. Only telecoms can see which telecom a number points to and for the purpose of call-routing only.

fisherjeff • 3 years ago

Great. It’s the weekend and I can theoretically now stop thinking about software, and yet here I am thinking of ways to efficiently compress lists of phone numbers

perihelions • 3 years ago

There was a thread about that last month,

https://news.ycombinator.com/item?id=27549075 ("Sorted Integer Compression")

fisherjeff • 3 years ago

The rabbit hole deepens…

quchen • 3 years ago

Just enumerate them all, if none is missing it's fairly easy to compress. (And 1b per number is really inefficient) ;-)

main = traverse print [1..99999999]

WJW • 3 years ago

The Kolmogorov complexity of the set of all phone numbers is pretty low. All phone numbers with a few missing is also pretty low.

In fact, I now wonder if you can even compress the 3.8b phone number set to less than 1 bit per phone number. It should be pretty doable since a significant chunk of the number space is not valid.
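A back-of-the-envelope check of the "less than 1 bit" idea, as a Python sketch. For a subset of k numbers out of a space of N, the count of possible subsets is C(N, k), and for large N its log is roughly N times the binary entropy of k/N. The 10^10 space size is a hypothetical stand-in (the real worldwide number space is not stated in this thread), and the bound treats the set as a uniformly random subset, so any real structure such as invalid ranges could push it lower still.

```python
import math


def binary_entropy(p: float) -> float:
    """Entropy in bits of a coin that lands heads with probability p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def min_bits_per_slot(members: int, space_size: int) -> float:
    """Approximate lower bound on bits per slot for a random subset."""
    return binary_entropy(members / space_size)


# Hypothetical: 3.8 billion members in a space of 10 billion possible numbers.
bits_per_slot = min_bits_per_slot(3_800_000_000, 10**10)
```

Under those assumptions the bound comes out at about 0.96 bits per slot, so only slightly better than the plain bitmap. The saving grows as the set gets closer to empty or to full.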

dillondoyle • 3 years ago

But not all numbers are valid? 911. Not all area codes exist.

luckman212 • 3 years ago

What language is that?

WJW • 3 years ago

Haskell

H8crilA • 3 years ago

Presumably all non-american ones are not on your list?

gihtas • 3 years ago

How much?

saiya-jin • 3 years ago

I have even better - for every country, just covering all their operators' prefixes and then 99999-9999999 numbers in that range. Definitely the biggest dataset around, and bigger is always better, right?

astatine • 3 years ago

The 3.8B figure is really meaningless in isolation. This is the problem of plenty - 10K numbers with a very specific profile might be a lot more valuable. The real worry would be the info on the relationships between the numbers (which number is connected to whom). This leak seems to have a count of relations rather than the actual connections.

axegon_ • 3 years ago

Well the facebook data that was published everywhere earlier this year could hold some value when combined with this one: While the facebook data is somewhat outdated, I'm pretty sure you'd get millions of people with relevant and up to date information.

ttam • 3 years ago

https://twitter.com/UnderTheBreach/status/141888964970820813...

this tweet says it's BS (they validated the japan sample)

PragmaticPulp • 3 years ago

According to the Tweet, the leaker provides a claimed data sample that is a list of phone numbers without any additional information.

A list of 3.8 billion phone numbers that simply exist is useless. The leak would only have value if the numbers were associated with some identifying information.

If it’s really only phone numbers, I wonder if it’s a leak or if someone brute-forced all possible phone numbers against a ClubHouse API that leaked information about whether or not the number existed in their database.

sebmellen • 3 years ago

If Clubhouse can’t detect >3.8B erroneous requests and shut down that API/microservice, that destroys my confidence more than a data breach.

mohanmcgeek • 3 years ago

Clubhouse didn't have 3.8B users.. why would they have 3.8B phone numbers?

This whole thing seems made up.

mm983 • 3 years ago

they didn't "validate" anything, they just opened the csv. also i'd be interested in their take on the second column, that looks like clubhouse's scoring system (which they ran without telling anyone, likely for marketing purposes, according to this* article). if so, you can in fact tell which numbers are more significant than others.

*https://futurezone.at/apps/clubhouse-leakt-38-milliarden-tel...

zinekeller • 3 years ago

Hmm, so the "highest" numbers would be publicly-knowable numbers anyway (because they are the numbers to dial and contact the government/customer service of a private company).

If this is only a list of numbers and their relative popularity, the best you can do is an accusation of adultery (and even then, you could say that you're "popular" because coworkers also store your number).

FabianBeiner • 3 years ago

https://zerforschung.org/posts/clubhouse-telefonnummern-en/

koolba • 3 years ago

They should combine it with that zero click remote iMessage bug. That’d be some serious black hat marketing synergy.

anigbrowl • 3 years ago

Enough phone numbers for half the population of the world? Cool story, bro.

I refer here to the aspiring salespeople, not the person reporting it. I suspect this list will be available for free on the dark web within a couple of months. Much as I like to collect interesting data this doesn't seem useful.

paxys • 3 years ago

I wonder how feasible a business model it is to collect all the data from all leaks which make their way to the internet, massage the data a little bit, and sell it as a brand new "hack" of some popular service. You can probably do this a few times a year without a problem.

d110af5ccf • 3 years ago

Why fake a new data leak at all? It's likely to be illegal either way. Depending on the quality of your work I suspect it would be easy to find buyers for aggregated and cross validated data sets on the black market.

For that matter, I have to assume that the shadier businesses silently make use of publicly available leaks. The data is just too valuable to ignore depending on your business model.

mam3 • 3 years ago

Billions?? On Clubhouse?

mcintyre1994 • 3 years ago

Clubhouse does the classic “share your contacts with us to find your friends here” thing, but it sounds like they just upload your entire list into their database instead of doing anything remotely privacy aware. I’m mostly curious how much else they uploaded with the numbers - is this name + number + email etc? And if this dump is just numbers, do Clubhouse have the rest somewhere else?

FabianBeiner • 3 years ago

According to the screenshot: All members plus every single number in each of their phone books.

oliv__ • 3 years ago

Even if they had 10M users (which I doubt), at 100 contacts per user that's 1B contacts.

BatteryMountain • 3 years ago

They forgot to "select distinct"?

chovybizzass • 3 years ago

It includes every user's contact list from their phone. So likely damn near everyone on the planet with a cell phone.

coldcode • 3 years ago

Are people really that stupid to give some mobile app company access to their contact list? On iPhone you have to explicitly give permission, I presume on Android as well. I find it hard to believe everyone is doing it.

hdjjhhvvhga • 3 years ago

Many apps will refuse to work if you don't allow access to your contacts, so people just give in and allow it.

Google is the biggest abuser in this area just grabbing all your contacts and linking them to your Google account once you add any Google account (like Gmail or Youtube) to your Android device.

nemothekid • 3 years ago

>Are people really that stupid to give some mobile app company access to their contact list?

Almost every social media startup in the last 15 years was bootstrapped this way.

alpaca128 • 3 years ago

Afaik WhatsApp (on Android at least) requires you to give access to your contacts. So roughly speaking a huge chunk, probably the majority, of smartphone users shared their contact list with at least one company, which strictly speaking might not even be legal in many cases.

After all that's how WhatsApp populates its contact list, it looks which users have each other's phone numbers. That way it doesn't need a user login and friend/contact requests, but in return you give up your privacy.

wngr • 3 years ago

Not true. It'll work without, it's just very inconvenient.

CapitalistCartr • 3 years ago

Everyone doesn't have to. If one person with your number gives up their contact list, they have yours. I'd guess about 10-12% of the populace would have to cooperate.

FabianBeiner • 3 years ago

That was what made Clubhouse so famous:

"After registering, the clubhouse app asks for access to your address book. This must be granted if you want to invite friends."

ipaddr • 3 years ago

I keep no contacts on my phone and gladly give that info away. I'm surprised people don't use multiple phones for privacy.

eclipxe • 3 years ago

Most people don’t care about privacy.

patja • 3 years ago

Based on the popularity of WhatsApp, yes most people don't give it a second thought.

Bjartr • 3 years ago

Yes, constantly.

sneak • 3 years ago

Yes.

justinclift • 3 years ago

Yeah, not sure either. Suspecting it's some other Clubhouse, not the main (project planning) one (https://clubhouse.io).

SahAssar • 3 years ago

It's the audio chat one: https://www.joinclubhouse.com/

justinclift • 3 years ago

Thanks, that makes more sense. :)

afrcnc • 3 years ago

It's fake: https://twitter.com/troyhunt/status/1419013520763539457

Sebguer • 3 years ago

It's not fake, it's just not valuable.

Edit: Or rather, the tweet doesn't claim it's fake, just that it's not valuable.

afrcnc • 3 years ago

No. It's fake. Even Clubhouse said so. It's just randomly generated data.

robertwt7 • 3 years ago

How does it work for the seller when the FBI is the one who ends up buying that list and then busts him in the auction?

Genuinely asking.. might be a dumb question

vmception • 3 years ago

If the seller gets caught that is how it works

If the seller doesn’t get caught due to the purchasing methods and general routine OPSEC, then it's just another example of the Fed reliably monetizing everything, meaning there will always be a buyer and everyone should sell more.

dmitriid • 3 years ago

That's what law enforcement does all the time: when there are illegal goods for sale, and a chance to catch the seller, they will go in, make the purchase and arrest the seller.

finger • 3 years ago

Sorry for the stupid question, but isn’t it illegal to buy illegal stuff? How do the police get away with that?

For instance in Denmark it is technically illegal to buy stolen goods, even if you genuinely aren’t aware of them being stolen. I'm sure this applies to most countries.

zenexer • 3 years ago

LEOs often seem to be exempt when acting in an official capacity. I’m not sure what the restrictions are—do they need a court order in a situation like this?—but LEOs are definitely allowed to break laws and buy illegal wares.

noxer • 3 years ago

Illegal is defined by law, and laws apply to a subset of people. What do you think the police do with illegal substances? Not confiscate them because "owning" them is illegal? No, the police do not take ownership, the state does, and the laws do not apply to the state. There is nothing out there in the world that is illegal for everyone to handle: not drugs, not nukes, not illegal media, etc. Someone has to have the right to handle it somehow.

dmitriid • 3 years ago

This differs from country to country. There's some info on Wikipedia: https://en.wikipedia.org/wiki/Sting_operation?wprov=sfti1

unnouinceput • 3 years ago

Let's play devil's advocate here and assume I am the dude selling the list.

I would ask for monero and would not care if the FBI is the buyer. The most they can do is to watch exchanges where monero is exchanged for dollars or other cryptocoins. Then do this a few times over and start buying goods with those, then sell the goods on Amazon/eBay for hard $$$. In small amounts, and even at 50 cents on the dollar, it's still worth it for one person.

sennight • 3 years ago

I've wondered about the feasibility of using state run lotteries for laundering in a cash based criminal enterprise. The known odds of low cost/return scratch-offs and the need to only account for claimed winnings would make it tempting... if it wasn't so labor intensive.

clavigne • 3 years ago

I don't think it would be a good idea, given that you'd have to claim the winnings. It might work once or twice but not over and over again.

Additionally in most cases I'd think the lottery odds would be lower than the cost of traditional laundering (smurfing, through crooked banks, using cash based businesses like taxis etc.) Especially if you have to pay people to buy tickets.

sennight • 3 years ago

> It might work once or twice but not over and over again.

Except for when it does: there are a bunch of people who have repeatedly jackpotted state lotteries, they're usually described as 'reclusive mathematicians'. But that isn't what I'm talking about. I just checked the TX Lottery Commission's site and it looks like scratchoffs would run, worst case, a 30% return. I can't be bothered to calculate the upper bounds, but I'd expect it to be 40%-ish. That seems good to me, I especially like that you can skip the part where you have to drive out to some hotel to meet an undercover Secret Service agent pretending to be a Wells Fargo employee responding to your help wanted notice in Soldier of Fortune.

Aeolun • 3 years ago

Isn’t it great that a lot of high-tech crime is prevented by the people capable of it being too lazy to bother?

sennight • 3 years ago

I learned a long time ago that the most effective way to correct a vice is to play it against another vice, sloth being an easy goto. But in this case... I'm not a drug dealer, so I don't need to launder large amounts of small bills. But... if I wanted to launder a bunch of public ledger based crypto: instead of using a loud and proud "bitcoin tumbler", I'd use something like satoshibet. Of course, that is likely why the original no longer exists - and I imagine anyone standing up a replacement (without a sufficiently invasive KYC implementation) would face similar hostility. Anyway, I expect that'll change when a state run satoshibet eventually emerges.

edoceo • 3 years ago

Can't go wrong with Quick Pick.

ptr2voidStar • 3 years ago

Check mate.

michelb • 3 years ago

How realistic would it be to send (anonymous) mass SMS messages with phishing or other malicious links to those numbers? I’m occasionally getting SMS messages with bogus sender info (I cannot reply, nor get contact info), and always wonder how spammers pull that off so easily.

Scoundreller • 3 years ago

As a challenge, I try to take down these things by reporting them to Google Safebrowsing, their SSL provider (if they have one), their host, their URL shortener, etc.

Though in Canada, I'm seeing them apply some cloaking measures so they don't get removed as quickly.

I think there's two streams of this:

1. a crooked telecom that has low-level access

2. buy a bunch of SIM cards and dump them into one of these aliexpress machines that has 16 wireless modems in them that let you do whatever you want:

https://www.aliexpress.com/item/4000462982086.html

Can even network them to a bank thingy that'll hold 128 cards:

https://www.aliexpress.com/item/4000462976225.html

agumonkey • 3 years ago

Ah I wonder if that's related to the bot flood I got recently.

TechBro8615 • 3 years ago

I’ve been getting this since the FB hack (by “hack” I mean the recent bulk enumeration of 500m phone numbers that Facebook facilitated for an unknown party).

fabiandesimone • 3 years ago

Hey @fabianbeiner how can I get in touch with you?

stackedinserter • 3 years ago

Is clubhouse still a thing in July 2021? How do you use it? (and who do you talk to?)

ALittleLight • 3 years ago

It's funny how the hacker who is selling stolen private data is also complaining about GDPR compliance and privacy. On the one hand, he's right that Clubhouse (if this is true) has done something bad, but the hacker is much worse.

mm983 • 3 years ago

They are done for this time. Leaking people's numbers who haven't even signed up yet because of their economy flame approach for literally anything, oh boy...

qpiox • 3 years ago

If you have enough cash and time you can legally create your own list of all possible numbers in the world. Pick a number, dial and see if it exists. Hang up to prevent further charges.

jsjohnst • 3 years ago

> create your own list of all possible numbers on the world. Pick a number, dial and see if it exists.

Let’s say you had the ability to do that 1,000x a minute using an automated dialer. Just in the US alone that would take you over a year to complete and how many of those numbers you verified changed active/disconnected status during that time?

(PS, I didn’t downvote you, just pointing out a problem with your theory)
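The estimate is easy to sanity-check with a short Python sketch. The candidate count is an assumption: 10^10 is every 10-digit string, which overstates the number of assignable US numbers, so treat the result as an upper bound. The function name is made up for this example.

```python
MINUTES_PER_DAY = 60 * 24


def days_to_dial(total_numbers: int, calls_per_minute: int) -> float:
    """Days needed to dial every candidate once at a constant rate."""
    return total_numbers / calls_per_minute / MINUTES_PER_DAY


# Assumed upper bound: all 10-digit strings, dialed at 1,000 calls a minute.
days = days_to_dial(10**10, 1000)
years = days / 365
```

With those inputs the sweep takes roughly 6,900 days, or about 19 years, and even a tenth of that number space would still take close to two years. Either way it is well over a year, which leaves plenty of time for numbers to change status before the sweep finishes.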

riffic • 3 years ago

You've invented wardialing

https://en.wikipedia.org/wiki/Wardialing