Curl-impersonate: Special build of curl that can impersonate the major browsers

245 points | 9 hours ago | github.com
davidsojevic1 hour ago

There's a fork of this that has some great improvements on top of the original, and it is also actively maintained: https://github.com/lexiforest/curl-impersonate

There are also Python bindings for the fork, for anyone who uses Python: https://github.com/lexiforest/curl_cffi
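
For anyone curious, a minimal curl_cffi sketch; the impersonation target name and the test URL are just illustrative, so check the project README for the currently supported targets:

```python
# Minimal curl_cffi sketch: the `impersonate` argument selects which
# browser's TLS/HTTP2 fingerprint to present.
from curl_cffi import requests

# "chrome" picks a recent Chrome target; specific versions such as
# "chrome110" are also accepted (assumption: consult the curl_cffi docs
# for the exact target names in your installed version).
r = requests.get("https://tls.browserleaks.com/json", impersonate="chrome")
print(r.json())
```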

jchw8 hours ago

I'm rooting for Ladybird to gain traction in the future. Currently, it is using cURL proper for networking. That is probably going to have some challenges (I think cURL is still limited in some ways, e.g. I don't think it can do WebSockets over h2 yet) but on the other hand, having a rising browser engine might eventually remove this avenue for fingerprinting since legitimate traffic will have the same fingerprint as stock cURL.

rhdunn8 hours ago

It would be good to see Ladybird's cURL usage improve cURL itself, such as the WebSockets-over-h2 example you mention. It is also a good test of cURL, identifying what functionality it is missing w.r.t. real-world browser workflows.

nonrandomstring5 hours ago

When I spoke to these guys [0] we touched on those quirks and foibles that make a signature (including TCP stack stuff beyond control of any userspace app).

I love curl, but I worry that if a component takes on the role of deception in order to "keep up", it accumulates a legacy of hard-to-maintain "compatibility" baggage.

Ideally it should just say... "hey I'm curl, let me in"

The problem of course lies with a server that is picky about dress codes, and that problem in turn is caused by crooks sneaking in disguise, so it's rather a circular chicken and egg thing.

[0] https://cybershow.uk/episodes.php?id=39

immibis5 hours ago

What should instead happen is that Chrome should stop sending so much of a fingerprint, so that sites won't be able to fingerprint it. That won't happen, since it's against Google's interests.

gruez4 hours ago

This is a fundamental misunderstanding of how TLS fingerprinting works. The "fingerprint" isn't from Chrome sending a "fingerprint: [random uuid]" attribute in every TLS negotiation. It's derived from various properties of the TLS stack, like which ciphers it can accept. You can't "stop sending as much of a fingerprint" without every browser agreeing on the same TLS stack. It's already minimal as it is, because there's basically no aspect of the TLS stack that users can configure, and Chrome bundles its own, so you'd expect every Chrome user to have the same TLS fingerprint. It's only really useful for distinguishing "fake" Chrome users (e.g. curl with a custom header set, or Firefox users with a user-agent spoofer) from "real" Chrome users.
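
For a concrete sense of what gets hashed, here is a rough sketch following the published JA3 recipe (mentioned further down the thread); the ClientHello values below are invented for illustration:

```python
# Rough sketch of how a JA3-style TLS fingerprint is derived: the raw numeric
# values offered in the ClientHello are dash-joined per field, the fields are
# comma-joined, and the result is MD5-hashed.
import hashlib

def ja3_hash(tls_version, ciphers, extensions, curves, point_formats):
    fields = [
        str(tls_version),
        "-".join(map(str, ciphers)),
        "-".join(map(str, extensions)),
        "-".join(map(str, curves)),
        "-".join(map(str, point_formats)),
    ]
    return hashlib.md5(",".join(fields).encode()).hexdigest()

# Hypothetical ClientHello contents: two clients offering different
# cipher/extension sets produce different hashes even with identical headers.
print(ja3_hash(771, [4865, 4866, 4867], [0, 11, 10], [29, 23, 24], [0]))
```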

thaumasiotes3 hours ago

> Ideally it should just say... "hey I'm curl, let me in"

What? Ideally it should just say "GET /path/to/page".

Sending a user agent is a bad idea. That shouldn't be happening at all, from any source.

eesmith7 hours ago

I'm hoping this means Ladybird might support ftp URLs.

navanchauhan6 hours ago

and even the Gopher protocol!

ryao6 hours ago

Did they also set IP_TTL so the TTL value matches the platform being impersonated?

If not, then fingerprinting could still be done to some extent at the IP layer. If the TTL value in the IP header is below 64, it is obvious the sender is either not running modern Windows, or is running a modern Windows machine whose default TTL has been changed, since by default packets from modern Windows start with a TTL of 128 while most other platforms start at 64. Because those other platforms have no trouble communicating over the internet, IP packets from modern Windows will practically always be seen by the remote end with TTLs at or above 64 (likely just above).

That said, it would be difficult to fingerprint at the IP layer, although it is not impossible.
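
A sketch of that heuristic, assuming the common default initial TTLs of 64 (most Unix-likes), 128 (Windows) and 255 (some network gear):

```python
# Observed TTLs sit a few hops below a platform's initial value, so rounding
# up to the nearest common default hints at the sender's OS family.
COMMON_INITIAL_TTLS = {
    64: "Linux/macOS/BSD (and most Unix-likes)",
    128: "Windows",
    255: "some network gear / Solaris",
}

def guess_initial_ttl(observed_ttl: int) -> tuple[int, str]:
    for initial in sorted(COMMON_INITIAL_TTLS):
        if observed_ttl <= initial:
            return initial, COMMON_INITIAL_TTLS[initial]
    return 255, COMMON_INITIAL_TTLS[255]

# A packet arriving with TTL 116 likely started at 128 (12 hops away),
# which points at Windows rather than a default Linux curl host.
print(guess_initial_ttl(116))  # (128, 'Windows')
print(guess_initial_ttl(52))   # (64, 'Linux/macOS/BSD (and most Unix-likes)')
```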

gruez4 hours ago

>That said, it would be difficult to fingerprint at the IP layer, although it is not impossible.

Only if you're using PaaS/IaaS providers that don't give you low-level access to the TCP/IP stack. If you're running your own servers, it's trivial to fingerprint all manner of TCP/IP properties.

https://en.wikipedia.org/wiki/TCP/IP_stack_fingerprinting

fc417fc8023 hours ago

What is the reasoning behind TTL counting down instead of up, anyway? Wouldn't we generally expect those routing the traffic to determine if and how to do so?

sadjad2 hours ago

The primary purpose of TTL is to prevent packets from looping endlessly during routing. If a packet gets stuck in a loop, its TTL will eventually reach zero, and then it will be dropped.

fc417fc8022 hours ago

That doesn't answer my question. If it counted up then it would be up to each hop to set its own policy. Things wouldn't loop endlessly in that scenario either.

knome2 hours ago

It does make traceroute feasible, where each packet is fired with one more available hop than the last, whereas counting 'up' wouldn't. Of course, then we'd just start with max-hops and walk the number down, I suppose. I still expect it would be inconvenient during debugging for various devices to have various ceilings.
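
A rough traceroute sketch along those lines, using scapy (a third-party library; needs raw-socket privileges, and the hostname is a placeholder):

```python
# Probe with TTL=1, 2, 3, ... ; each hop that decrements the TTL to zero
# reports itself back via an ICMP "time exceeded" message.
from scapy.all import IP, ICMP, sr1  # third-party: `pip install scapy`

def trace(host: str, max_hops: int = 30):
    for ttl in range(1, max_hops + 1):
        reply = sr1(IP(dst=host, ttl=ttl) / ICMP(), timeout=2, verbose=0)
        if reply is None:
            print(f"{ttl:2d}  *")             # hop didn't answer
        elif reply.type == 11:                # ICMP time-exceeded from a router
            print(f"{ttl:2d}  {reply.src}")
        else:                                 # echo reply: reached the destination
            print(f"{ttl:2d}  {reply.src}  (destination)")
            break

trace("example.com")
```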

burnished2 hours ago

This is a wild guess, but I am under the impression that the early internet was built somewhat naively, so I'd guess the sender sets it because they know best how long it stays relevant for, i.e. when it makes sense to restart or fail rather than wait.

xrisk6 hours ago

Wouldn’t the TTL value of received packets depend on network conditions? Can you recover the client’s value from the server?

ralferoo5 hours ago

The argument is that if many (maybe the majority) of systems are sending packets with a TTL of 64 and they don't experience problems on the internet, then it stands to reason that almost everywhere on the internet is reachable in fewer than 64 hops (personally, I'd be amazed if any routes are actually as long as 32 hops).

If everywhere is reachable in under 64 hops, then packets sent from systems that use a TTL of 128 will arrive at the destination with a TTL still over 64 (or else they'd have been discarded for all the other systems already).

jamal-kumar4 hours ago

This tool is pretty sweet in little bash scripts combo'd up with GNU parallel on red team engagements, for mapping HTTPS endpoints within scoped address ranges that will only respond to proper browsers (for whatever reason) or only with the SNI stuff in order. Been finding it super sweet for that. It can also take all the normal curl switches, like -H for header spoofing.
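
A rough Python equivalent of that workflow, using the curl_cffi bindings mentioned upthread; the hostnames, worker count and timeout are placeholders:

```python
# Probe a list of hosts in parallel while presenting a Chrome-like
# TLS/HTTP2 fingerprint, so endpoints that only answer "proper browsers"
# still respond.
from concurrent.futures import ThreadPoolExecutor
from curl_cffi import requests

def probe(host: str):
    try:
        r = requests.get(f"https://{host}/", impersonate="chrome", timeout=10)
        return host, r.status_code
    except Exception as exc:
        return host, f"error: {exc}"

targets = ["10.0.0.1", "10.0.0.2", "app.internal.example"]  # scoped range here
with ThreadPoolExecutor(max_workers=16) as pool:
    for host, result in pool.map(probe, targets):
        print(host, result)
```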

VladVladikoff7 hours ago

Wait a sec… if the TLS handshakes look different, would it be possible to have an nginx level filter for traffic that claims to be a web browser (eg chrome user agent), yet really is a python/php script? Because this would account for the vast majority of malicious bot traffic, and I would love to just block it.

aaron42net7 hours ago

Cloudflare uses JA3 and now JA4 TLS fingerprints, which are hashes of various TLS handshake parameters. https://github.com/FoxIO-LLC/ja4/blob/main/technical_details... has more details on how that works, and they do offer an Nginx module: https://github.com/FoxIO-LLC/ja4-nginx-module
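
A toy sketch of the filtering idea VladVladikoff asked about: compare the claimed User-Agent with the TLS fingerprint actually observed during the handshake. The function name and hash values here are hypothetical; in practice the JA3/JA4 hash would come from something like the ja4-nginx-module or a fronting proxy:

```python
# Flag requests whose User-Agent claims Chrome but whose TLS fingerprint
# doesn't match any known Chrome hash (placeholder values, not real ones).
KNOWN_CHROME_JA3 = {
    "cd08e31494f9531f560d64c695473da9",  # placeholder, not an actual Chrome hash
}

def looks_like_fake_chrome(user_agent: str, ja3: str) -> bool:
    claims_chrome = "Chrome/" in user_agent
    return claims_chrome and ja3 not in KNOWN_CHROME_JA3

print(looks_like_fake_chrome(
    "Mozilla/5.0 ... Chrome/124.0 ...",   # UA says Chrome
    "e7d705a3286e19ea42f587b344ee6865",   # but the handshake hash isn't on the list
))  # -> True, so the request gets challenged or blocked
```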

gruez7 hours ago

That's basically what security vendors like Cloudflare do, except with even more fingerprinting, like a JavaScript challenge that checks the JS interpreter/DOM.

walrus017 hours ago

JS can check user-agent things like screen and window dimensions as well, which legit browsers will have; bots will also present them, but with a more uniform and predictable set of x and y dimensions per set of source IPs. Lots of possibilities for JS endpoint fingerprinting.

jrochkind15 hours ago

Well, I think that's exactly what the OP is meant to keep you from doing.

immibis5 hours ago

Yes, and sites are doing this and it absolutely sucks because it's not reliable and blocks everyone who isn't using the latest Chrome on the latest Windows. Please don't whitelist TLS fingerprints unless you're actually under attack right now.

fc417fc8022 hours ago

If you're going to whitelist (or block at all really) please simply redirect all rejected connections to a proof of work scheme. At least that way things continue to work with only mild inconvenience.
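
A minimal hashcash-style proof-of-work sketch, the kind of challenge Anubis (mentioned downthread) hands out; the difficulty and challenge size here are arbitrary:

```python
# The client must find a nonce such that sha256(challenge + nonce) falls
# below a target, i.e. starts with a given number of zero bits. Cheap for
# one visitor, costly for a scraper hammering every page.
import hashlib
import os

def solve(challenge: bytes, difficulty_bits: int = 18) -> int:
    target = 1 << (256 - difficulty_bits)
    nonce = 0
    while True:
        digest = hashlib.sha256(challenge + nonce.to_bytes(8, "big")).digest()
        if int.from_bytes(digest, "big") < target:
            return nonce
        nonce += 1

def verify(challenge: bytes, nonce: int, difficulty_bits: int = 18) -> bool:
    digest = hashlib.sha256(challenge + nonce.to_bytes(8, "big")).digest()
    return int.from_bytes(digest, "big") < (1 << (256 - difficulty_bits))

challenge = os.urandom(16)   # issued by the server per rejected request
nonce = solve(challenge)     # solved client-side, then sent back for verification
assert verify(challenge, nonce)
```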

croemer8 hours ago

Back then (2022) it was Firefox only

jruohonen4 hours ago

The notion of real-world TLS/HTTP fingerprinting was somewhat new to me, and it looks interesting in theory, but I wonder what this build's use case really is, given that you have heavy-handed JavaScript checks running everywhere now.

bossyTeacher7 hours ago

Cool tool, but it shouldn't matter whether the client is a browser or not. I feel sad that we need such a tool in the real world.

jimt12347 hours ago

About six months ago I went to a government auction site that required Internet Explorer. Yes, Internet Explorer. The site was active, too; the auction data was up-to-date. I added a user-agent extension in Chrome, switched the user agent to IE, retried, and it worked; all functionality on the site was fine. So yeah, I was both sad and annoyed. My guess is this government office paid for a website 25 years ago and it hasn't been updated since.

jorvi5 hours ago

In South Korea, ActiveX is still required for many things like banking and government services. So they're stuck with both IE and the gaping security hole that is ActiveX.

pixl974 hours ago

SK: "Why fix a problem when we're going extinct in 3 generations anyway"

IMSAI80806 hours ago

Yeah it's probably an ancient web site. This was commonplace back in the day when Internet Explorer had 90%+ market share. Lazy web devs couldn't be bothered to support other browsers (or didn't know how) so just added a message demanding you use IE as opposed to fixing the problems with the site.

brutal_chaos_7 hours ago

You may enter our site iff you use software we approve. Anything else will be seen as malicious. Papers please!

I, too, am saddened by this gatekeeping. IIUC, custom browsers (or user agents) written from scratch will never work on Cloudflare sites and the like until the UA has enough clout (money, users, etc.) to sway them.

DrillShopper6 hours ago

This was sadly always going to be the outcome of the Internet going commercial.

There's too much lost revenue in open things for companies to embrace fully open technology anymore.

jrockway6 hours ago

It's kind of the opposite problem as well; huge well-funded companies bringing down open source project websites. See Xe's journey here: https://xeiaso.net/blog/2025/anubis/

One may posit "maybe these projects should cache stuff so page loads aren't actually expensive" but these things are best-effort and not the core focus of these projects. You install some Git forge or Trac or something and it's Good Enough for your contributors to get work done. But you have to block the LLM bots because they ignore robots.txt and naively ask for the same expensive-to-render page over and over again.

The commercial impact is also not to be understated. I remember when I worked for a startup with a cloud service. It got talked about here, and suddenly every free-for-open-source CI provider IP range was signing up for free trials in a tight loop. These mechanical users had to be blocked. It made me sad, but we wanted people to use our product, not mine crypto ;)

burnished2 hours ago

>> Otherwise your users have to see a happy anime girl every time they solve a challenge. This is a feature.

I love that human, what a gem

ec1096858 hours ago

There are APIs that Chrome provides that allow servers to validate whether a request came from an official Chrome browser. That would detect that this curl isn't really Chrome.

It’d be nice if something could support curl’s arguments but drive an actual headless chrome browser.

darrenf8 hours ago

Are you referring to the Web Environment Integrity[0] stuff, or something else? 'cos WEI was abandoned in late 2023.

[0] https://github.com/explainers-by-googlers/Web-Environment-In...

do_not_redeem7 hours ago

Siblings are being more charitable about this, but I just don't think what you're suggesting is even possible.

An HTTP client sends a request. The server sends a response. The request and response are made of bytes. Any bytes Chrome can send, curl-impersonate could also send.

Chromium is open source. If there was some super secret handshake, anyone could copy that code to curl-impersonate. And if it's only in closed-source Chrome, someone will disassemble it and copy it over anyway.

gruez7 hours ago

>Chromium is open source. If there was some super secret handshake, anyone could copy that code to curl-impersonate. And if it's only in closed-source Chrome, someone will disassemble it and copy it over anyway.

Not if the "super secret handshake" is based on hardware-backed attestation.

do_not_redeem7 hours ago

True, but beside the point.

GP claims the API can detect the official chrome browser, and the official chrome browser runs fine without attestation.

dist-epoch5 hours ago

> someone will disassemble it and copy it over anyway.

Not if Chrome uses homomorphic encryption to sign a challenge. It's doable today. But then you could run a real Chrome and forward the request to it.

do_not_redeem3 hours ago

No, even homomorphic encryption wouldn't help.

It doesn't matter how complicated the operation is, if you have a copy of the Chrome binary, you can observe what CPU instructions it uses to sign the challenge, and replicate the operations yourself. Proxying to a real Chrome is the most blunt approach, but there's nothing stopping you from disassembling the binary and copying the code to run in your own process, independent of Chrome.

binarymax8 hours ago

I'm interested in learning more about this. Are these APIs documented anywhere, and are there server-side implementation examples that you know of?

EDIT: this is the closest I could find. https://developers.google.com/chrome/verified-access/overvie... ...but it's not generic enough to lead me to the declaration you made.

KTibow5 hours ago

I think they confused Chrome and Googlebot.

bowmessage8 hours ago

There’s no way this couldn’t be replicated by a special build of curl.

anon63628 hours ago

Set a UA and any headers and/or cookies with regular cURL compiled with HTTP/3. This can be done with wrapper scripts very easily. 99.999% of problems solved with no special magic buried in an unclean fork.

mmh00008 hours ago

You should really read the "Why" section of the README before jumping to conclusions:

> Some web services use the TLS and HTTP handshakes to fingerprint which client is accessing them, and then present different content for different clients. These methods are known as TLS fingerprinting and HTTP/2 fingerprinting respectively. Their widespread use has led to the web becoming less open, less private and much more restrictive towards specific web clients.
>
> With the modified curl in this repository, the TLS and HTTP handshakes look exactly like those of a real browser.

For example, this will get you past Cloudflare's bot detection.

01HNNWZ0MV43FF8 hours ago

The README indicates that this fork is compiled with NSS (from Firefox) and BoringSSL (from Chromium) to resist fingerprinting based on the TLS library. CLI flags won't do that.

psanford8 hours ago

That doesn't solve the problem of TLS handshake fingerprinting, which is the whole point of this project.

andrewmcwatters8 hours ago

That’s not the point of this fork.

And “unclean fork” is such an unnecessary and unprofessional comment.

There’s an entire industry of stealth browser technologies out there that this falls under.