I am selling a database with ten billion phone numbers. 1.25 GB file with each number compressed to a single bit. You can compare the clubhouse database against mine to determine which numbers are not in their set.
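The arithmetic behind the joke holds up: ten billion one-bit flags come to 10^10 / 8 = 1.25 × 10^9 bytes. Here is a minimal sketch of such a bitmap, assuming each number is used directly as a bit index. The `PhoneBitmap` class and its method names are invented for illustration, and the demo uses a tiny universe so it runs instantly.

```python
class PhoneBitmap:
    """One bit per possible number: bit i is set when number i is in the set."""

    def __init__(self, universe_size: int) -> None:
        self.universe_size = universe_size
        # Round up to a whole number of bytes.
        self.bits = bytearray((universe_size + 7) // 8)

    def add(self, number: int) -> None:
        # Byte index is number // 8, bit position inside the byte is number % 8.
        self.bits[number >> 3] |= 1 << (number & 7)

    def __contains__(self, number: int) -> bool:
        return bool(self.bits[number >> 3] & (1 << (number & 7)))


def bitmap_size_bytes(universe_size: int) -> int:
    """Bytes needed to hold one bit per possible number."""
    return (universe_size + 7) // 8


# Ten billion possible numbers at one bit each.
full_size = bitmap_size_bytes(10**10)

# Small demo universe so the example stays cheap to run.
demo = PhoneBitmap(1000)
demo.add(555)
```

A bitmap with every bit set contains every number, which is exactly why it tells a buyer nothing.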
The 3.8B figure is really meaningless in isolation. This is the problem of plenty: 10K numbers with a very specific profile might be a lot more valuable. The real worry would be the info on the relationships between the numbers (which number is connected to which). This leak seems to have a count of relations rather than the actual connections.
Well, the Facebook data that was published everywhere earlier this year could hold some value when combined with this one. While the Facebook data is somewhat outdated, I'm pretty sure you'd get millions of people with relevant and up-to-date information.
https://twitter.com/UnderTheBreach/status/141888964970820813...
This tweet says it's BS (they validated the Japan sample).
According to the Tweet, the leaker provides a claimed data sample that is a list of phone numbers without any additional information.
A list of 3.8 billion phone numbers that simply exist is useless. The leak would only have value if the numbers were associated with some identifying information.
If it’s really only phone numbers, I wonder if it’s a leak or if someone brute-forced all possible phone numbers against a Clubhouse API that leaked information about whether or not the number existed in their database.
If Clubhouse can’t detect >3.8B erroneous requests and shut down that API/microservice, that destroys my confidence more than a data breach.
Clubhouse didn't have 3.8B users... why would they have 3.8B phone numbers?
This whole thing seems made up.
A fair share of my phone numbers are bogus (old numbers, info I store as a phone number even if it's not), so the DB extracted from here would be dubious.
I'd say it's even more of a dark pattern than that. They didn't encourage me to "upload my contact list" but rather to "give access to my contacts" (or something like that). Perhaps the difference is trivial in how it's coded, yet even though I've removed their access to my contacts, they still have my contacts. I think they should have to delete them whenever I remove their access, or not upload them in the first place and just read them when necessary.
Also, some apps seem to do this with photos, asking for access, does anyone know if these apps also upload all of one's photos once the user grants permission on iOS?
That would only be true if it were 380 _unique_ contacts per person. Surely there is significant overlap from user to user.
Shouldn't it be 380 distinct people?
They didn't "validate" anything, they just opened the CSV. Also, I'd be interested in their take on the second column, which looks like Clubhouse's scoring system (which they ran without telling anyone, likely for marketing purposes, according to this* article). If so, you can in fact tell which numbers are more significant than others.
*https://futurezone.at/apps/clubhouse-leakt-38-milliarden-tel...
Hmm, so the "highest" numbers would be publicly-knowable numbers anyway (because they are the numbers to dial and contact the government/customer service of a private company).
If this is only a list of numbers and their relative popularity, the best you can do is accusation of adultery (and even in that, you could say that you're "popular" because coworkers also store your numbers).
They should combine it with that zero click remote iMessage bug. That’d be some serious black hat marketing synergy.
Enough phone numbers for half the population of the world? Cool story, bro.
I refer here to the aspiring salespeople, not the person reporting it. I suspect this list will be available for free on the dark web within a couple of months. Much as I like to collect interesting data this doesn't seem useful.
I wonder how feasible a business model it is to collect all the data from all leaks which make their way to the internet, massage the data a little bit, and sell it as a brand new "hack" of some popular service. You can probably do this a few times a year without a problem.
Why fake a new data leak at all? It's likely to be illegal either way. Depending on the quality of your work I suspect it would be easy to find buyers for aggregated and cross validated data sets on the black market.
For that matter, I have to assume that the shadier businesses silently make use of publicly available leaks. The data is just too valuable to ignore depending on your business model.
Billions?? On Clubhouse?
Clubhouse does the classic “share your contacts with us to find your friends here” thing, but it sounds like they just upload your entire list into their database instead of doing anything remotely privacy aware. I’m mostly curious how much else they uploaded with the numbers - is this name + number + email etc? And if this dump is just numbers, do Clubhouse have the rest somewhere else?
According to the screenshot: All members plus every single number in each of their phone books.
Even if they had 10M users (which I doubt), at 100 contacts per user that's 1B contacts.
They forgot to "select distinct"?
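The gap between the two counts is easy to see on toy data. A small sketch with made-up address books (the names and numbers are invented for illustration): summing list lengths counts a shared contact once per uploader, while a set union is the `select distinct` version.

```python
# Hypothetical address books; overlapping entries model shared contacts.
address_books = {
    "alice": ["+15550001", "+15550002", "+15550003"],
    "bob": ["+15550002", "+15550003", "+15550004"],
    "carol": ["+15550003", "+15550004", "+15550005"],
}

# Raw row count: every uploaded entry, duplicates included.
total_rows = sum(len(book) for book in address_books.values())

# Distinct count: each phone number once, however many people uploaded it.
distinct_numbers = set().union(*address_books.values())

# The thread's back-of-envelope figure is the raw row count,
# so it is only an upper bound on distinct numbers.
users = 10_000_000
contacts_per_user = 100
upper_bound = users * contacts_per_user
```

Here nine uploaded rows collapse to five distinct numbers. How much a real dump would shrink depends on how much the address books overlap, which the thread does not tell us.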
It includes every user's contact list from their phone. So likely damn near everyone on the planet with a cell phone.
Are people really that stupid to give some mobile app company access to their contact list? On iPhone you have to explicitly give permission, I presume on Android as well. I find it hard to believe everyone is doing it.
Many apps will refuse to work if you don't allow access to your contacts, so people just give in and allow it.
Google is the biggest abuser in this area just grabbing all your contacts and linking them to your Google account once you add any Google account (like Gmail or Youtube) to your Android device.
Maybe not for smaller apps but apps with large user bases are under different rules than the rest.
It's extremely annoying to add a number to Telegram without adding it as a contact first, and allowing Telegram access to the contact list.
>Are people really that stupid to give some mobile app company access to their contact list?
Almost every social media startup in the last 15 years was bootstrapped this way.
Afaik WhatsApp (on Android at least) requires you to give access to your contacts. So roughly speaking a huge chunk, probably the majority, of smartphone users shared their contact list with at least one company, which strictly speaking might not even be legal in many cases.
After all that's how WhatsApp populates its contact list, it looks which users have each other's phone numbers. That way it doesn't need a user login and friend/contact requests, but in return you give up your privacy.
Not true. It'll work without, it's just very inconvenient.
Everyone doesn't have to. If one person with your number gives up their contact list, they have yours. I'd guess about 10-12% of the populace would have to cooperate.
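A rough way to sanity-check that guess, under a strong simplifying assumption: each person's number sits in `address_books_listing_you` other people's phones, and each of those people independently uploads their contacts with probability `share_fraction`. Both parameter names and the independence assumption are mine, not the commenter's, and real social graphs are clustered, so treat this as a sketch rather than an estimate.

```python
def coverage(share_fraction: float, address_books_listing_you: int) -> float:
    """Probability that at least one person holding your number uploads it.

    Assumes uploads are independent, which real social graphs are not.
    """
    nobody_uploads = (1.0 - share_fraction) ** address_books_listing_you
    return 1.0 - nobody_uploads


# 10% of people share; your number is in 10 or in 100 address books.
sparse = coverage(0.10, 10)
dense = coverage(0.10, 100)
```

Under these assumptions a 10% sharing rate captures about 65% of people who appear in ten address books and nearly everyone who appears in a hundred.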
That was what made Clubhouse so famous:
"After registering, the clubhouse app asks for access to your address book. This must be granted if you want to invite friends."
It must be granted to invite friends but you can deny it access and still use Clubhouse, just that until you grant access you can’t invite others.
I keep no contacts on my phone and gladly give that info away. I'm surprised people don't use multiple phones for privacy.
Most people don’t care about privacy.
Based on the popularity of WhatsApp, yes most people don't give it a second thought.
Yes, constantly.
Yes.
Yeah, not sure either. Suspecting it's some other Clubhouse, not the main (project planning) one (https://clubhouse.io).
It's the audio chat one: https://www.joinclubhouse.com/
Thanks, that makes more sense. :)
It's not fake, it's just not valuable.
Edit: Or rather, the tweet doesn't claim it's fake, just that it's not valuable.
No. It's fake. Even Clubhouse said so. It's just randomly generated data.
How does it work for the seller when the FBI is the one who ends up buying that list and then busted him in the auction?
Genuinely asking... might be a dumb question.
If the seller gets caught, that is how it works.
If the seller doesn’t get caught due to the purchasing methods and general routine OPSEC, then it's just another example of the Fed reliably monetizing everything, meaning there will always be a buyer and everyone should sell more.
That's what law enforcement does all the time: when there are illegal goods for sale, and a chance to catch the seller, they will go in, make the purchase and arrest the seller.
Sorry for the stupid question, but isn’t it illegal to buy illegal stuff? How do the police get away with that?
For instance, in Denmark it is technically illegal to buy stolen goods, even if you genuinely aren’t aware that they are stolen. I'm sure this applies to most countries.
LEOs often seem to be exempt when acting in an official capacity. I’m not sure what the restrictions are—do they need a court order in a situation like this?—but LEOs are definitely allowed to break laws and buy illegal wares.
Illegal is defined by law, and laws apply to a subset of people. What do you think the police do with illegal substances? Not confiscate them because "owning" them is illegal? No: the police do not take ownership, the state does, and the laws do not apply to the state. There is nothing out there in the world that is illegal for everyone to handle: not drugs, not nukes, not illegal media, etc. Someone has to have the right to handle it somehow.
This differs from country to country. There's some info on Wikipedia: https://en.wikipedia.org/wiki/Sting_operation?wprov=sfti1
You're describing entrapment
Let's play devil's advocate here and assume I am the dude selling the list.
I would ask for Monero and would not care if the FBI is the buyer. The most they can do is to watch exchanges where Monero is exchanged for dollars or other cryptocoins. Then do this a few times over, start buying goods with those, and sell the goods on Amazon/eBay for hard $$$. In small amounts, and even at 50 cents on the dollar, it is still worth it for one person.
I've wondered about the feasibility of using state run lotteries for laundering in a cash based criminal enterprise. The known odds of low cost/return scratch-offs and the need to only account for claimed winnings would make it tempting... if it wasn't so labor intensive.
I don't think it would be a good idea, given that you'd have to claim the winnings. It might work once or twice but not over and over again.
Additionally in most cases I'd think the lottery odds would be lower than the cost of traditional laundering (smurfing, through crooked banks, using cash based businesses like taxis etc.) Especially if you have to pay people to buy tickets.
> It might work once or twice but not over and over again.
Except for when it does: there are a bunch of people who have repeatedly jackpotted state lotteries, they're usually described as 'reclusive mathematicians'. But that isn't what I'm talking about. I just checked the TX Lottery Commission's site and it looks like scratchoffs would run, worst case, a 30% return. I can't be bothered to calculate the upper bounds, but I'd expect it to be 40%-ish. That seems good to me, I especially like that you can skip the part where you have to drive out to some hotel to meet an undercover Secret Service agent pretending to be a Wells Fargo employee responding to your help wanted notice in Soldier of Fortune.
Isn’t it great that a lot of high-tech crime is prevented by the people capable of it being too lazy to bother?
I learned a long time ago that the most effective way to correct a vice is to play it against another vice, sloth being an easy go-to. But in this case... I'm not a drug dealer, so I don't need to launder large amounts of small bills. But... if I wanted to launder a bunch of public ledger based crypto: instead of using a loud and proud "bitcoin tumbler", I'd use something like satoshibet. Of course, that is likely why the original no longer exists - and I imagine anyone standing up a replacement (without a sufficiently invasive KYC implementation) would face similar hostility. Anyway, I expect that'll change when a state run satoshibet eventually emerges.
Can't go wrong with Quick Pick.
Check mate.
How realistic would it be to send (anonymous) mass SMS messages with phishing or other malicious links to those numbers? I’m occasionally getting SMS messages with bogus sender info (I cannot reply, nor get contact info), and I always wonder how spammers pull that off so easily.
As a challenge, I try to take these things down by reporting them to Google Safebrowsing, their SSL provider (if they have one), their host, their URL shortener, etc.
Though in Canada, I'm seeing them apply some cloaking measures so they don't get removed as quickly.
I think there are two streams of this:
1. a crooked telecom that has low-level access
2. buy a bunch of SIM cards and dump them into one of these aliexpress machines that has 16 wireless modems in them that let you do whatever you want:
https://www.aliexpress.com/item/4000462982086.html
Can even network them to a bank thingy that'll hold 128 cards:
Ah I wonder if that's related to the bot flood I got recently.
I’ve been getting this since the FB hack (by “hack” I mean the recent bulk enumeration of 500m phone numbers that Facebook facilitated for an unknown party).
Hey @fabianbeiner how can I get in touch with you?
Is clubhouse still a thing in July 2021? How do you use it? (and who do you talk to?)
It's funny how the hacker who is selling stolen private data is also complaining about GDPR compliance and privacy. On the one hand, he's right that Clubhouse (if this is true) has done something bad, but the hacker is much worse.
They are done for this time. Leaking the numbers of people who haven't even signed up yet because of their economy flame approach for literally anything, oh boy...
If you have enough cash and time you can legally create your own list of all possible numbers in the world. Pick a number, dial and see if it exists. Hang up to prevent further charges.
> create your own list of all possible numbers in the world. Pick a number, dial and see if it exists.
Let’s say you had the ability to do that 1,000x a minute using an automated dialer. Just in the US alone that would take you over a year to complete and how many of those numbers you verified changed active/disconnected status during that time?
(PS, I didn’t downvote you, just pointing out a problem with your theory)
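How long this takes depends heavily on how many numbers you decide to dial. As a rough upper bound, the sketch below counts every number that fits the North American format NXX-NXX-XXXX (area code and exchange each starting with 2-9) and divides by the dialing rate from the comment. The function names are made up for illustration, and the count ignores reserved codes and unassigned blocks, so the real target list would be smaller.

```python
MINUTES_PER_YEAR = 60 * 24 * 365


def nanp_number_space() -> int:
    """Count numbers matching NXX-NXX-XXXX, where N is 2-9 and X is 0-9.

    This is an upper bound: reserved and unassigned codes are not removed.
    """
    area_codes = 8 * 10 * 10
    exchanges = 8 * 10 * 10
    lines = 10**4
    return area_codes * exchanges * lines


def years_to_dial(numbers: int, calls_per_minute: int) -> float:
    """Years needed to dial each number once at a constant rate."""
    return numbers / calls_per_minute / MINUTES_PER_YEAR


space = nanp_number_space()
years = years_to_dial(space, 1000)
```

At 1,000 calls a minute the full format space takes roughly twelve years, so "over a year" is comfortably true even if only a fraction of the space is worth dialing.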
You've invented wardialing
Knowing which numbers are capable of receiving SMS and which aren't has some value.
Especially in a world of number portability where you can't just say "oh, that's an old number, it must be POTS".
But I guess, here, if a number is from your contact list, it may still be POTS.
But at least you have higher assurance that it's an active user. If you wardial one day, you quickly find out how many numbers never lead to a human for various reasons. In theory, some of these are trap numbers and quickly flag the caller as suspicious, but I doubt it.
"Knowing which numbers are capable of receiving SMS and which aren't has some value."
This isn't difficult - I wrote a shell script named "lookup" that will give me background info for any phone number I feed it and tell me what kind of number it is, what carrier it is, who it belongs to, etc.:
... which is very useful since I often send (personal) SMS from the command line and sometimes I need to know if a number can receive it ...
I'm not going to paste the entire script here but the meat of it is:
... and each lookup costs a penny or a half a penny or something ... I forget ...
How would your script obtain this information though? Relying on Twilio?
In some countries mobile phone numbers have a prefix so you know by that.
Also some POTS provider will accept SMS and either read it to you, or you can read them in some web portal (or the router possibly).
The Local Routing Number provides this value in the USA, and multiple carriers (e.g. Twilio) offer daily deactivation reports from the cellular carriers so you can tell which numbers are unroutable.
Canada isn't as progressive. Only telecoms can see which telecom a number points to and for the purpose of call-routing only.
Great. It’s the weekend and I can theoretically now stop thinking about software, and yet here I am thinking of ways to efficiently compress lists of phone numbers
There was a thread about that last month,
https://news.ycombinator.com/item?id=27549075 ("Sorted Integer Compression")
The rabbit hole deepens…
Just enumerate them all, if none is missing it's fairly easy to compress. (And 1b per number is really inefficient) ;-)
main = traverse print [1..99999999]
The Kolmogorov complexity of the set of all phone numbers is pretty low. All phone numbers with a few missing is also pretty low.
In fact, I now wonder if you can even compress the 3.8b phone number set to less than 1 bit per phone number. It should be pretty doable since a significant chunk of the number space is not valid.
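Whether "less than 1 bit per phone number" is reachable depends on how densely the list fills the space of valid numbers. A sketch of the standard counting bound, assuming the list is an arbitrary subset with no other structure to exploit: storing k items out of N takes about N·H(k/N) bits, so H(p)/p bits per listed number with p = k/N. The 4 billion "valid numbers" figure below is a made-up universe for illustration, not a real count.

```python
import math


def binary_entropy(p: float) -> float:
    """Entropy in bits of a coin that lands heads with probability p."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def bits_per_listed_number(listed: float, universe: float) -> float:
    """Approximate lower bound on bits per listed number for an arbitrary subset."""
    p = listed / universe
    return binary_entropy(p) / p


# 3.8 billion numbers out of all ten-digit strings.
sparse_case = bits_per_listed_number(3.8e9, 1e10)

# Same list, but measured against a hypothetical 4 billion valid numbers.
dense_case = bits_per_listed_number(3.8e9, 4e9)
```

Against the full ten-digit space the bound is about 2.5 bits per listed number, while against the hypothetical 4 billion valid numbers it drops to about 0.3. Under this model the bound crosses 1 bit when the list covers somewhere between 75% and 80% of the universe.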
But not all numbers are valid? 911. Not all area codes exist.
What language is that?
Haskell
Presumably all non-american ones are not on your list?
How much?
I have something even better - for every country, just covering all their operators' prefixes and then 99999-9999999 numbers in that range. Definitely the biggest dataset around, and bigger is always better, right?