Coding with LLMs in the summer of 2025 – an update

600 points | 9 days ago | antirez.com
dakiol9 days ago

> Gemini 2.5 PRO | Claude Opus 4

Whether it's vibe coding, agentic coding, or copy-pasting from the web interface to your editor, it's still sad to see the normalization of private (i.e., paid) LLM models. I like the progress that LLMs introduce and I see them as a powerful tool, but I cannot understand how programmers (whether complete nobodies or popular figures) don't mind adding a strong dependency on a third party in order to keep programming. Programming used to be (and still is, to a large extent) an activity that can be done with open and free tools. I am afraid that in a few years that will no longer be possible (as in: most programmers will be so tied to a paid LLM that not using one would be like not using an IDE or vim nowadays), since everyone is using private LLMs. The excuse "but you earn six figures, what's $200/month to you?" doesn't really capture the issue here.

simonw9 days ago

The models I can run locally aren't as good yet, and are way more expensive to operate.

Once it becomes economical to run a Claude 4 class model locally you'll see a lot more people doing that.

The closest you can get right now might be Kimi K2 on a pair of 512GB Mac Studios, at a cost of about $20,000.

QRY9 days ago

Have you considered the Framework Desktop setup they mentioned in their announcement blog post[0]? Just marketing fluff, or is there any merit to it?

> The top-end Ryzen AI Max+ 395 configuration with 128GB of memory starts at just $1999 USD. This is excellent for gaming, but it is a truly wild value proposition for AI workloads. Local AI inference has been heavily restricted to date by the limited memory capacity and high prices of consumer and workstation graphics cards. With Framework Desktop, you can run giant, capable models like Llama 3.3 70B Q6 at real-time conversational speed right on your desk. With USB4 and 5Gbit Ethernet networking, you can connect multiple systems or Mainboards to run even larger models like the full DeepSeek R1 671B.

I'm futzing around with setups, but adding up the specs would give 384GB of VRAM and 512GB total memory, at a cost of about $10,000-$12,000. This is all highly dubious napkin math, and I hope to see more experimentation in this space.

There's of course the moving target of cloud costs and performance, which makes break-even analysis even more precarious. So even if this sort of setup would work, its cost-effectiveness is a mystery to me.

[0] https://frame.work/be/en/blog/introducing-the-framework-desk...

lhl9 days ago

Strix Halo does not run a 70B Q6 dense model at real-time conversational speed - it has a real-world memory bandwidth (MBW) of about 210 GB/s. A 40GB Q4 will clock just over 5 tok/s. A Q6 would be slower.

It will run some big MoEs at a decent speed (e.g., Llama 4 Scout 109B-A17B Q4 at almost 20 tok/s). The other issue is its prefill - only about 200 tok/s, due to very under-optimized RDNA3 GEMMs. From my testing, you usually have to trade off prompt processing (pp) for token generation (tg).
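Those tok/s figures fall out of simple bandwidth-bound napkin math: during decode, each generated token has to stream the active weights from memory. A hedged sketch (the 210 GB/s figure is the real-world estimate above; the weight sizes are rough, and the function is illustrative):

```python
# Decode speed when memory-bandwidth bound (a rough upper bound):
# tok/s ≈ usable memory bandwidth / bytes of weights read per token.

def decode_toks_per_sec(bandwidth_gbs: float, weights_gb: float) -> float:
    """Bandwidth-bound ceiling on token generation speed."""
    return bandwidth_gbs / weights_gb

# A 70B dense model is ~40 GB at Q4 and ~55-58 GB at Q6.
print(decode_toks_per_sec(210, 40))  # ~5.3 tok/s ceiling for the Q4
print(decode_toks_per_sec(210, 57))  # ~3.7 tok/s for the Q6, i.e. slower
```

Real systems land somewhat below this ceiling, which is why "just over 5 tok/s" for a 40GB Q4 is consistent with ~210 GB/s.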

If you are willing to spend $10K for hardware, I'd say you are much better off w/ EPYC and 12-24 channels of DDR5, and a couple fast GPUS for shared experts and TFLOPS. But, unless you are doing all-night batch processing, that $10K is probably better spent on paying per token or even renting GPUs (especially when you take into account power).

Of course, there may be other reasons you'd want to inference locally (privacy, etc).

moffkalast9 days ago

Yeah, it's only really viable for chat use cases. Coding is the most demanding in terms of generation speed: to keep the workflow usable, it needs to spit out corrections in seconds, not minutes.

I use local LLMs as much as possible myself, but coding is the only use case where I still entirely defer to Claude, GPT, etc., because you need both max speed and bleeding-edge model intelligence for anything close to acceptable results. Once Qwen3-Coder lands, running it on RunPod might be a viable low-end alternative, but likely still a major waste of time when you actually need to get something done properly.

cheeze9 days ago

I love Framework but it's still not enough IMO. My time is the most valuable thing, and a subscription to $paid_llm_of_choice is _cheap_ relative to my time spent working.

In my experience, something like Llama 3.3 works really well for smaller tasks. For "I'm lazy and want to provide minimal prompting for you to build a tool similar to what is in this software package already", paid LLMs are king.

If anything, I think the best approach for free LLMs would be to run on rented GPU capacity. I feel bad knowing that I have a 4070 Ti Super that sits idle 95% of the time. I'd rather share an a1000 with a bunch of folks and have it run at close to max utilization.


smcleod9 days ago

The Framework Desktop isn't really that compelling for work with LLMs: its memory bandwidth is very low compared to GPUs and Apple Silicon Max/Ultra chips - you'd really notice how slow LLMs are on it, to the point of frustration. Even a 2023 MacBook Pro with an M2 Max chip has twice the usable bandwidth.

komali29 days ago

They demo'd it live at computex and it was slooooow. Like two characters a second slow. Iirc he had 4 machines clustered.

oblio9 days ago

One second, don't LLMs generally run in VRAM? If you put them in regular RAM, don't they have to go through the CPU which kills performance?

zackify9 days ago

The memory bandwidth is crap and you’ll never run anything close to Claude on that, unfortunately. They should have shipped something 8x faster, at least 2 TB/s of bandwidth.

NiloCK9 days ago

> Once it becomes economical to run a Claude 4 class model locally you'll see a lot more people doing that.

By that time Claude 5 (or whatever) will be available over API.

I am grateful for upward pressure from models with published binaries - I do believe this is fundamental floor-raising technology.

Choosing frontier-1 for the sake of privacy, autonomy, etc. will always be a hard sell, and only ever to a pretty niche market. Even me: I'm ideologically part of this market, but I'm already priced out hardware-wise.

smallerize9 days ago

I don't have to actually run it locally to remove lock-in. Several cloud providers offer full DeepSeek R1 or Kimi K2 for $2-3/million output tokens.
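Part of why this avoids lock-in: most of these hosts expose OpenAI-compatible endpoints, so swapping providers is a config change rather than a rewrite. A hedged sketch (the base URLs and model names below are illustrative; check each provider's current docs):

```python
# Map of interchangeable OpenAI-compatible backends (illustrative values).
PROVIDERS = {
    "openrouter": {"base_url": "https://openrouter.ai/api/v1", "model": "deepseek/deepseek-r1"},
    "deepseek":   {"base_url": "https://api.deepseek.com/v1",  "model": "deepseek-reasoner"},
    "local":      {"base_url": "http://localhost:11434/v1",    "model": "qwen2.5-coder:32b"},
}

def client_kwargs(provider: str, api_key: str = "...") -> dict:
    """Kwargs for an OpenAI-style client; only base_url/key change per host."""
    cfg = PROVIDERS[provider]
    return {"base_url": cfg["base_url"], "api_key": api_key}

# e.g. openai.OpenAI(**client_kwargs("openrouter")) vs. ("local"):
# the rest of the calling code stays identical either way.
```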

ketzo9 days ago

In what ways is that better for you than using eg Claude? Aren’t you then just “locked in” to having a cloud provider which offers those models cheaply?

jmb999 days ago

What’s your budget and speed requirement? A quad-CPU Xeon E7 v4 server (Supermicro X10QBI, for example) with 1TB of RAM gives you ~340GB/s memory bandwidth and enough actual memory to host a full DeepSeek instance, but it will be relatively slow (a few tokens/s max in my experience). Up front cost a bit under $1k, less if you can source cheap 32GB DDR3 RAM. Power consumption is relatively high, ~1kW under load. But I don’t think you can self host a large model cheaper than that.

(If you need even more memory you could equip one of those servers with 6TB of DDR3 but you’ll lose a bit of bandwidth if you go over 2TB. DDR4 is also a slightly faster option but you’re spending 4x as much for the same capacity.)

pmarreck9 days ago

This would require massively more power than the Mac Studios.

jmb998 days ago

Yep, ~1kW as mentioned. Depending on your electrical rate, break even might be years down the line. And obviously the Mac Studios would perform substantially better.

Edit: And also, to get even half as much memory, you'd need to spend $10k. If you want to host actually-large LLMs (not quantized/distilled versions), you'll need to spend close to that much. Maybe you can get away with 256GB for now, but that won't even host full DeepSeek (and I'm not sure 512GB would either, once you account for OS overhead and a large context window).

theshrike798 days ago

I think we're at the early-2010s Bitcoin markets here.

People were buying stores empty of GPUs to mine for BTC.

Then people built custom ASICs that couldn't do anything but mine BTC, but did it a lot cheaper and with a lot less electricity required -> nobody GPU mines anymore pretty much.

I'm waiting for a similar thing to happen to local AI.

theshrike798 days ago

This is the thing. I'm waiting for the equivalent of Google Coral[0], but powerful enough for LLM workloads.

You can plug in the $60 Coral to a Raspberry Pi and get real-time image recognition running in Frigate.

When I can have:

1) Something similar inside my computer/laptop

2) Something I can plug in to my computer via USB-C

3) Something I can buy and install on my LAN so all devices in my home can connect to it

I'll buy it instantly.

What I don't want is a massive generic GPU that just happens to be good at AI workloads, I want custom hardware that's more efficient and cheaper.

(Off topic, but my guess is that Apple is aiming for #3 with an Apple TV variant so you can have more power than your phone, but still keep it 100% local)

[0] https://coral.ai/products/accelerator/

Abishek_Muthian9 days ago

What type of code do you write for which the open-source models aren't good enough?

I use Qwen2.5 coder for auto complete and occasional chat. I don't want AI to edit my code and so this works well for me.

I agree that the hardware investment for local AI is steep but IMO the local models are good enough for most experienced coders who just want a better autocomplete than the one provided by the IDE by default.
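For context, autocomplete with a local code model is usually driven by fill-in-the-middle (FIM) prompting rather than chat. A hedged sketch using the FIM special tokens documented for the Qwen2.5-Coder family (the helper function and the endpoint mention are illustrative):

```python
# Build a fill-in-the-middle prompt: the model generates the code that
# belongs between `prefix` and `suffix` (the cursor position in the editor).

def fim_prompt(prefix: str, suffix: str) -> str:
    return f"<|fim_prefix|>{prefix}<|fim_suffix|>{suffix}<|fim_middle|>"

prompt = fim_prompt("def add(a, b):\n    return ", "\n\nprint(add(1, 2))")
# Send `prompt` to a local server (e.g. Ollama's /api/generate with raw
# mode enabled, so a chat template isn't applied on top of the FIM tokens).
```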

InterviewFrog9 days ago

Using AI for autocomplete is like using a racecar to pick up groceries. This is exactly the kind of ideological or psychological refusal to use LLMs that the author describes.

manmal7 days ago

Nothing's wrong with using autocomplete in addition to agents.

arvinsim9 days ago

The people who are subscribing to private LLM models are not doing it for better autocomplete. These are the people who want more features like agents.

mythz9 days ago

Whilst it's not economically feasible to self-host, using premier OSS models like Kimi K2 / DeepSeek via OpenRouter gets you a great price, with the fallback safety net of being able to self-host should the proprietary model companies collude and try to squeeze more ROI out of us. Hopefully by then the hardware to run the OSS models will be a lot cheaper.

poulpy1238 days ago

Yeah. I cannot even run significantly worse models on any machine I have at home.

oblio9 days ago

The thing is, code is quite compact. Why do LLMs need to train on content bigger than the size of the textual internet to be effective?

Total newb here.

airspresso9 days ago

Many reasons, one being that LLMs are essentially compressing the training data to unbelievably small data volumes (the weights). When doing so, they can only afford to keep the general principles and semantic meaning of the training data. Bigger models can memorize more than smaller ones of course, but are still heavily storage limited. Through this process they become really good at semantic understanding of code and language in general. It takes a certain scale of training data to achieve that.
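The compression framing can be made concrete with rough arithmetic (all figures below are order-of-magnitude estimates, not measurements):

```python
# A frontier-scale corpus vs. the weights that "compress" it.
training_tokens = 15e12      # ~15T training tokens (rough, frontier-scale)
bytes_per_token = 4          # ~4 bytes of UTF-8 text per token (rough)
corpus_bytes = training_tokens * bytes_per_token    # ~60 TB of text

params = 70e9                # a 70B-parameter model
weight_bytes = params * 2    # bf16: 2 bytes per parameter, ~140 GB

ratio = corpus_bytes / weight_bytes
print(round(ratio))          # ~429x: only general principles can survive
```

At a ~400x squeeze there is simply no room to memorize the corpus verbatim, which is the storage-limit argument above.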

otabdeveloper49 days ago

a) Rent a GPU server. b) Learn to finetune your models. You're a programmer, right? Whatever happened to knowing your tools?

OP is right, these people are posers and fakers, not programmers.

simonw8 days ago

Have you had any success finetuning models? What did you do?

otabdeveloper47 days ago

Not yet. That day will come though.

zer00eyz9 days ago

> Once it becomes economical to run a Claude 4 class model locally you'll see a lot more people doing that.

Historically these sorts of things happened because of Moore's law. Moore's law is dead. For a while we scaled on the back of "more cores" and process shrink. It looks like we've hit the wall again.

We seem to be near the limit of scaling (physics): we're not seeing a lot in clock (some, but not enough), and IPC is flat. We are also having power (density) and cooling (air won't cut it any more) issues.

The requirements to run something like Claude 4 locally aren't going to reach household consumers any time soon. Simply put, the very top end of consumer PCs looks like 10-year-old server hardware, and very few people are running that because there isn't a need.

The only way we're going to see better models locally is if work (research, engineering) is put into it. To be blunt, that isn't really happening, because FB/MS/Google are scaling in the only way they know how: throw money at it to capture and dominate the market, lock the innovators out of your API, then milk the consumer however you can. Smaller and local is antithetical to this business model.

Hoping for the innovation that gives you a moat, that makes you the next IBM, isn't the best way to run a business.

Based on how often Google cancels projects, and on how often the things Zuck swears are "next" face-plant (metaverse), one should not have a lot of hope about AI.

Aurornis9 days ago

> We seem to be near the limit of scaling (physics) we're not seeing a lot in clock (some but not enough), and IPC is flat. We are also having power (density) and cooling (air wont cut it any more) issues.

This is an exaggeration. CPUs are still getting faster. IPC is increasing, not flat. Cooling on air is fine unless you’re going for high density or low noise.

This is just cynicism. Even an M4 MacBook Pro is substantially faster than an M1 from a few years ago, which is substantially faster than the previous versions.

Server chips are scaling core counts and bandwidth. GPUs are getting faster and faster.

The only way you could conclude scaling is dead is if you ignored all recent progress or you’re expecting improvements at an unrealistically fast rate.

zer00eyz8 days ago

> IPC is increasing, not flat.

Benchmarks going up is not IPC increasing. These are separate things.

Please look at IPC for the latest GPUs from Nvidia and the latest CPUs from AMD. IPC is flat. See Intel losing credibility over failing processors, with power problems from aggressive clocking, because IPC is flat.

> Even an M4 MacBook Pro is substantially faster than an M1

Again, clocking. The M4 (non-Pro) and the M1 are so close in IPC on common tasks that the difference is negligible. The performance gains between the two come from memory bandwidth, not core performance.

> Server chips are scaling core counts

Parallelism is not the same as performance. Intel shipping the "Core Duo" at 2GHz 20 years ago was an admission that single-thread scaling was ending. 20 years on, we're 20 cores deep (consumer) and only at ~4GHz with "boost clocks" (back to that pesky power and cooling problem).

And that class of product still exists today: the N150 (close enough). It has lower power consumption and more cores. And what was the single-core performance gain? A 35% improvement in 20 years.

None of these things are running any of the LLMs that power the tools we're talking about. Those are in the datacenter, where 700-core CPUs and 400-800Gbps top-of-rack switching are the bleeding edge. This is where power and cooling have hit the wall. The spacing requirements of a bleeding-edge Nvidia install drive up the cost of interconnect between systems: lots of fiber, plus systems spread out because of power/heat, adds up to a boatload of extra networking costs. Half-empty racks because of density limits are now a reality.

And you see these same issues at home: power demands of GPUs for consumers and workstations are through the roof. We're past what the PCIe spec can provide, and all that power is heat that has to go somewhere; sometimes it burns up poorly designed connectors. The latest gen consumes even more power to push clocks higher, for very little gain (see Nvidia's flat IPC).

esafak9 days ago

Model efficiency is outpacing Moore's law. That's what DeepSeek V3 was about. It's just that we're simultaneously finding ways to increase model capacity, and that's growing even faster...

mleo9 days ago

Why wouldn’t third-party hardware vendors continue to work on reducing the cost of running models locally? If there is a market opportunity for someone to make money, it will be filled. Just because the cloud vendors don’t develop such hardware doesn’t mean nobody will. Apple has a vested interest in making hardware that runs better models locally, for example.

zer00eyz9 days ago

> Why wouldn’t 3rd party hardware vendors continue to work on reducing costs of running models locally?

Everyone wants this to happen and they are all trying, but...

EUV, which has gotten us down to 3nm and below, is HARD. Reduction in chip size has led to increases in density and lower costs. But now yields are DOWN, and the design concessions needed to make the processes work are hurting costs and performance. There are a lot of hopes and prayers riding on the 1.8nm-class nodes, but things look grim.

Power is a massive problem for everyone. It is a MASSIVE problem IN the data center, and it is a problem for GPUs at home. Considering that "locally" means a PHONE for most people, it's an even bigger problem. With all this power come cooling issues. The industry is starting to look at all sorts of interesting ways to move heat away from cores... ones that don't involve air.

Design has hit a wall as well. If you look at Nvidia's latest offering, its IPC (that's Instructions Per Clock cycle) is flat. The only gains between the latest generation and the previous one have come from small frequency upticks. Those gains came from using "more power!!!", and that's a problem because...

Memory is a problem. There is a reason the chips for GPUs are soldered onto the boards next to the processors. There is a reason laptops have them soldered on too. CAMM tries to fix some of this, but the results are, to say the least, disappointing thus far.

All of this has been hitting CPUs slowly, but we have also had the luxury of "more cores" to throw at things. A top-end server from 10-15 years ago is about the same as a top-end desktop today (core count, single-core perf). Because of all the above issues, I don't think you are going to get 700+ core consumer desktops in a decade (the current high end for server CPUs)... because of power, costs, etc.

Unless we see some foundational breakthrough in hardware (it could happen), you won't see the normal generational lift in performance that you saw in the past (and I would argue we already haven't been seeing it). Someone is going to have to make MAJOR investments in the software side, and there is NO MOAT in doing so. Simply put, it's a bad investment... and if we can't lower the cost of compute (and it looks like we can't), it's going to be hard for small players to get in and innovate.

It's likely you're seeing a very real wall.

KronisLV8 days ago

The software is largely there: you can run Ollama, vLLM or whatever else you please today.

The models are somewhat getting there: even the smaller ones like Qwen3-30B-A3B and Devstral-23B are okay for some use cases and can run decently fast. They’re not amazing, but better than much larger models a year or two ago.

The hardware is absolutely not there: most development laptops will be too weak to run a bunch of tools, IDEs and local services alongside a LLM and will struggle to do everything at the pace of those cloud services.

Even if you seek compromise and get a pair of Nvidia L4 cards or something similar and put them on a server somewhere, the aforementioned Qwen3-30B-A3B will run at around 60 tokens/second for a single query, but it will slow down as you throw a bunch of developers at it who all need chat and autocomplete. The smaller Devstral model will more than halve that starting performance because it’s dense.
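The dense-vs-MoE gap here is the usual bandwidth arithmetic: per generated token, only the active parameters have to stream from memory. A hedged sketch (0.55 bytes/param approximates a Q4 quant, 300 GB/s is the L4's nominal bandwidth, and real multi-user throughput lands well below this single-stream ceiling, as the ~60 tok/s figure above shows):

```python
# Bandwidth-bound decode ceiling for an MoE model: per token, only the
# active experts' weights are read, not the full parameter count.

def moe_decode_ceiling(bandwidth_gbs: float, active_params_b: float,
                       bytes_per_param: float = 0.55) -> float:
    active_gb = active_params_b * bytes_per_param  # 3B active at Q4 ≈ 1.65 GB
    return bandwidth_gbs / active_gb

print(moe_decode_ceiling(300, 3))   # ~182 tok/s ceiling for a 30B-A3B model
print(moe_decode_ceiling(300, 23))  # ~24 tok/s for a ~23B dense model
```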

Tools like GitHub Copilot allow an Ollama connection pretty easily, Continue.dev also does but can be a bit buggy (their VS Code implementation is better than their JetBrains one), whereas the likes of RooCode only seem viable with cloud models cause they generate large system prompts and need more performance than you can squeeze out of somewhat modest hardware.

That said, with more MoE models and better training, things seem hopeful. Just look at the recent ERNIE-4.5 release, their model is a bit smaller than Qwen3 but has largely comparable benchmark results.

Those Intel Arc Pro B60 cards can’t come soon enough. Someone needs to at least provide a passable alternative to Nvidia, nothing more.

wizee8 days ago

On my M4 Max MacBook Pro, with MLX, I get around 70-100 tokens/sec for Qwen 3 30B-A3B (depending on context size), and around 40-50 tokens/sec for Qwen 3 14B. Of course they’re not as good as the latest big models (open or closed), but they’re still pretty decent for STEM tasks, and reasonably fast for me.

I have 128 GB RAM on my laptop, and regularly run multiple multiple VMs and several heavy applications and many browser tabs alongside LLMs like Qwen 3 30B-A3B.

Of course there’s room for hardware to get better, but the Apple M4 Max is a pretty good platform running local LLMs performantly on a laptop.

amelius8 days ago

Didn't Karpathy, in his latest talk, say something along the lines of: don't bother with less capable models, they are just a waste of time?

loudmax8 days ago

It probably depends what your objective is. One of the benefits you get from running less capable models is that it's easier to understand what their limitations are. The shortcomings of more powerful models are harder to see and understand, because the models themselves are so much more capable.

If you have no interest in the inner workings of LLMs and you just want the machine to spit out some end result while putting in minimal time and effort, then yes, absolutely don't waste your time with smaller, less capable models.

stingraycharles8 days ago

And models like Qwen3 really don’t match the quality of Opus 4 and Gemini 2.5 Pro. And even if you manage to get your hands on 512GB of GPU RAM, it will be slow.

There’s simply so much going on under the hood at these LLM providers that is very hard to replicate locally.

justatdotin8 days ago

It's impossible to catch up, but there is still much fertile and promising territory within reach.

ozgung9 days ago

> The excuse "but you earn six figures, what's $200/month to you?" doesn't really capture the issue here.

Just like every other subscription model, including the one in the Black Mirror episode "Common People". The value is too good to be true for the price at the beginning, but you become their prisoner in the long run, with increasing prices and degrading quality.

Aurornis9 days ago

I don’t get it. There are multiple providers. I cancel one provider and sign up for someone new in a few minutes when I feel like changing. I’ve been doing this every few months.

I think the only people worried about lock-in or Black Mirror themes are the people who are thinking about these subscriptions in an abstract sense.

It’s really easy to change providers. They’re all improving. Competition is intense.

darkoob128 days ago

In the early days of the Web, competition in the search engine market was intense, but eventually one player won and became the only viable option. I expect this will happen to AI as well: in the future, one AI company will dominate the market and people will have no choice but to use it.

dbingham9 days ago

The same, in theory, applies to social media. But they've all enshittified in very similar ways now that they've captured their audiences. In theory there is intense competition between Meta, Twitter, TikTok, etc., but in actuality the same market forces drive the same enshittification across all of those platforms. They have convergent interests: if they all force more ads and suggested posts on you, they all make more money, and you have nowhere to go.

People are reasonably worried that the same will happen to AI.

PeterStuer8 days ago

There is that chance. In other instances commoditization occurs before market consolidation.

As of now, specifically for coding-assistance LLMs, workflows remain generic enough that switching costs between the different models are relatively low.

lencastre9 days ago

Can you expand on your argument?

majormajor9 days ago

I don't think it's subscriptions so much as consumer startup pricing strategies:

Netflix/Hulu were "losing money on streaming"-level cheap.

Uber was "losing money on rides"-level cheap.

WeWork was "losing money on real-estate" level cheap.

Until someone releases wildly profitable LLM company financials it's reasonable to expect prices to go up in the future.

Course, advances in compute are much more reasonable to expect than advances in cheap media production, taxi driver availability, or office space. So there's a possibility it could be different. But that might require capabilities to hit a hard plateau so that the compute can keep up. And that might make it hard to justify the valuations some of these companies have... which could also lead to price hikes.

But I'm not as worried as others. None of these have lock-in. If the prices go up, I'm happy to cancel or stop using it.

For a current student or new grad who has only ever used the LLM tools, this could be a rougher transition...

Another thing that would change the calculation is if it becomes impossible to maintain large production-level systems competitively without these tools. That's presumably one of the things the companies are betting on. We'll see if they get there. At that point many of us probably have far bigger things to worry about.

klabb39 days ago

> But I'm not as worried as others. None of these have lock-in.

They will. And when they do it will hit hard, especially if you’re not just a consumer but relying on it for work.

One vector is personalization. Your LLM gets to know you and your history. They will not release that to a different company.

Another is integrations. Perhaps you’re using LLMs for assistance, but only Gemini has access to your calendar.

Cloud used to be "rent a server". You could do it anywhere, but AWS was good and cheap. Now how easy is it to migrate? Can you even afford the egress? How easy is it to combine offerings from different cloud providers?

x______________9 days ago

Not OP, but something from a few days ago that might be interesting for you:

  Anthropic tightens usage limits for Claude Code without telling users (techcrunch.com)
  395 points by mfiguiere 2 days ago | 249 comments
  https://news.ycombinator.com/item?id=44598254

nicce9 days ago

There is a reason why companies throw billions into AI and still are not profitable. They must be the first to hook users, make the service a necessary part of users’ lives, and then increase the price.

mleo9 days ago

Or expect the cost of delivering the service to become cheaper. Or both.

nico9 days ago

Currently in the front page of HN: https://news.ycombinator.com/item?id=44622953

It isn’t specific to software/subscriptions but there are plenty of examples of quality degradation in the comments

signa119 days ago

enshittification/vendor-lockin/stickiness/… take your pick

jordanbeiber9 days ago

The argument is perhaps ”enshittification”, and that becoming reliant on a specific provider or even set of providers for ”important thing” will become problematic over time.

overgard9 days ago

I think it's an unlikely future.

What I think is more likely is that people will realize that every line of code written is, to an extent, a liability, and that generating massive amounts of sloppy, insecure, poorly performing code is a massive liability.

That's not to say that AI's will go away, obviously, but I think when the hype dies down and people get more accustomed to what these things can and can't do well we'll have a more nuanced view of where these things should be applied.

I suppose what's still not obvious to me is what happens if the investment money dries up. OpenAI and Anthropic, as far as I know, aren't anywhere near profitable and they require record breaking amounts of capital to come in just to sustain what they have. If what we currently see is the limit of what LLM's and other generative techniques can do, then I can't see that capital seeing a good return on its investment. If that's the case, I wonder if when the bubble bursts these things become massively more expensive to use, or get taken out of products entirely. (I won't be sad to see all the invasive Copilot buttons disappear..)

kossae9 days ago

The point on investment is apt. Even if they achieve twice as much as they’re able to today (some doubts amongst experts here), when the VC funding dries up we’ve seen what happens. It’s time to pay the piper. The prices rise to Enterprise-plan amounts, and companies start making much more real ROI decisions on these tools past the hype bubble. Will be interesting to see how that angle plays out. I’m no denier nor booster, but in the capitalist society these things inevitably balance out.

MoreQARespect8 days ago

The same thing happened with the first internet bubble. It didn't prevent the rise of the internet; it just meant some players who, for instance, overinvested in infrastructure ended up taking an L while other players bought up their overbuilt assets for a song and capitalized on them later.

rapind9 days ago

Not an issue and I'll tell you why.

If the gains plateau, then there's really no need to make productivity sacrifices here for the societal good, because there's so much competition, and various tiers of open models that aren't far behind, that there will be no reason to stick with a hostile and expensive service unless its tooling stays leaps ahead of the competition.

If the gains don't plateau, well then we're obsolete anyways, and will need to pivot to... something?

So I sympathize, but pragmatically I don't think there's much point in stressing it. I also suspect the plateau is coming and that the stock of these big players is massively overvalued.

worldsayshi9 days ago

> If the gains don't plateau, well then we're obsolete anyways

I think there's room for more nuance here. It could also be a situation of diminishing returns but not a sharp plateau. That could favour the big players. I think I find that scenario most likely, at least in between major breakthroughs.

rapind9 days ago

Well, diminishing returns will have the same effect as a plateau. If you're on a logarithmic curve alongside your (much cheaper, Chinese) competition, then your advantage very quickly becomes microscopic.

muglug9 days ago

> Programming used to be (and still is, to a large extent) an activity that can be done with open and free tools.

Yet JetBrains has been a business longer than some of my colleagues have been alive, and Microsoft’s Visual Basic/C++/Studio made writing software for Windows much easier, and did not come cheap.

dakiol9 days ago

I see a big difference: I do use Jetbrains IDEs (they are nice), but I can switch to vim (or vscode) any time if I need to (e.g., let's say Jetbrains increase their price to a point that doesn't make sense, or perhaps they introduce a pervasive feature that cannot be disabled). The problem with paid LLMs is that one cannot easily switch to open-source ones (because they are not as good as the paid ones). So, it's a dependency that cannot be avoided, and that's imho something that shouldn't be overlooked.

jacobr19 days ago

Seems the same to me. If you are using the LLM as a tool to build your product (rather than as a dependency within it for functionality), you can easily switch to a different model or IDE/agentic coder, in the same way you can switch between vim and emacs. It might be a `worse` experience for you or have fewer features, but you aren't locked in, other than in the sense of your preference for productivity. In fact, it seems likely to me that the tools I'll be using a year from now will be different from today's, and almost certainly a different model will be leading. For example, Google surprised everyone with the quality of 2.5.

rolisz9 days ago

I was a hardcore vim user 10 years ago, but now I just use PyCharm to work. I'm paid to solve problems, not to futz around with vim configs.

Can you make vim work roughly the same way? Probably you can get pretty close. But how many hours do I have to sink into the config? A lot. And suddenly the PyCharm license is cheap.

And it's exactly the same thing with LLMs. You want hand crafted beautiful code, untainted by AI? You can still do that. But I'm paid to solve problems. I can solve them faster/solve more of them? I get more money.

Aurornis9 days ago

> The problem with paid LLMs is that one cannot easily switch to open-source ones (because they are not as good as the paid ones). So, it's a dependency that cannot be avoided

How is that any different than JetBrains versus vim?

Calling LLMs a strong dependency or a lock-in also doesn’t make sense. It’s so easy to switch LLMs or even toggle between them within something like Copilot.

You can also just not use them and write the code manually, which is something you still do in any non-trivial app.

I don’t understand all of these people talking about strong dependencies or vendor lock in, unless those comments are coming from people who haven’t actually used the tools?

throwaway8879p9 days ago

People who understand the importance of this choice but still opt for closed source software are the worst of the worst.

You won’t be able to switch to a meaningful vim if you channel your support to closed source software, not for long.

Best to put money where the mouth is.

muglug9 days ago

Until you’ve run a big open-source project you won’t quite understand how much time and energy it can eat up. All that effort won’t feed your family.

dakiol9 days ago

I don't contribute to vim specifically, but I do contribute to other open source projects. So I do like to keep this balance between making open source tools better over time and using paid alternatives. I don't think that's possible with LLMs at the moment, though (and I don't think it will be possible in the future, but of course I could be wrong).

eevmanu9 days ago

Open-weight and open-source LLMs are improving as well. While there will likely always be a gap between closed, proprietary models and open models, at the current pace the capabilities of open models could match today’s closed models within months.

layer89 days ago

> because they are not as good as the paid ones

The alternative is to restrict yourself to the “not as good” ones right now.

wepple9 days ago

Anyone can switch from Claude to Llama?

dakiol9 days ago

I don't think so. Let's do a silly experiment: antirez, could you ditch Gemini 2.5 PRO and Claude Opus 4, and instead use Llama? Like never again go back to Gemini/Claude. I don't think he can (I don't think he would want to). And this is not on antirez; this is on everyone who's paying for LLMs at the moment: they are paying for them because they are so damn good compared to the open source ones... so there's no incentive to switch. But again, it's like climate change: there's no incentive to pollute less (well, perhaps to save ourselves, but money is more important).

ta126534219 days ago

Ah, there have been community editions of the most important tools for 10+ years now, and I doubt e.g. MS will close the VS.NET Community version in the future.

LeafItAlone9 days ago

>I am afraid that in a few years, that will no longer be possible (as in most programmers will be so tied to a paid LLM

As of now, I’m seeing no lock-in for any LLM. With tools like Aider, Cursor, etc., you can switch on a whim. And with Aider, I do.

That’s what I currently don’t get in terms of investment. Companies (in many instances, VCs) are spending billions of dollars, and tomorrow someone else eats their lunch. They are going to need to find that method of lock-in at some point, but I don’t see it happening with the way I use the tools.

jerrygenser9 days ago

They can lock in by subsidizing the price if you use their tool, while making the default price higher for wrappers. This can draw people away from a wrapper that supports multiple models to the specific CLI that supports the proprietary model.

aseipp9 days ago

Anthropic or Google offering a product and having margins they leverage is not "lock in" when there are dozens of alternatives at many various price points, including ones that can be run entirely locally (at high capex cost). It's like market fact #0 that, today, there is very little moat here other than capital, which is why OpenAI has now got multiple viable competitors despite their head start. Their APIs get copied, their tools get copied, the only way they remain competitive is with huge investments back into the core product to retain their leads. This is just what a competitive market looks like right now, and these offerings exist exactly because of downward pressure from other forces. The goal is of course to squeeze other players as much as possible, but these products have not yet proven to be sticky enough for their mere existence to do that. And there are many other players who have a lot of incentive to keep that downward pressure applied.

What you're describing is really just called "Offering a product for sale" and yes typically the people doing it will do, say, and offer things that encourage using their product over the competitors. That isn't "lock in" in any sense of the word. What are they supposed to do? Say "Our shit sucks and isn't price effective compared to others and we bring nothing to the table?" while giving you stuff for free?

LeafItAlone8 days ago

At present, the tools are effectively the same. Claude Code, OpenAI Codex, Google Gemini, etc. are basically the same CLI tools. Every so often one will introduce a new feature (e.g. MCP support), but it’s not long before the others also include it. It is easy to swap between them (and Aider) and on tasks where I want a “second opinion”, I do.

Even if they make their tooling cheaper, that’s not going to lock me in. It has to be the best model or have some killer feature. Which, again, could be usurped the next day with the rate these tools are advancing.

antirez9 days ago

It’s not that bad: K2 and DeepSeek R1 are at the level of frontier models from one year ago (K2 may be even better: I have enough experience only with V3/R1). We will see more coming, since LLMs are incredibly costly to train but very simple in their essence (it’s as if their fundamental mechanics were built into the physical nature of the computation itself), so the barrier to entry is large but not insurmountable.

Fervicus9 days ago

> I cannot understand how programmers don't mind adding a strong dependency on a third party in order to keep programming

And how they don't mind freely opening up their codebase to these bigtech companies.

geoka99 days ago

> > I cannot understand how programmers don't mind adding a strong dependency on a third party in order to keep programming
>
> And how they don't mind freely opening up their codebase to these bigtech companies.

And how they don't mind opening up their development machines to agents driven by a black-box program that is run in the cloud by a vendor that itself doesn't completely understand how it works.

LeafItAlone9 days ago

You mean the same companies they are hosting their VCS in and providing the infrastructure they deploy their codebases to? All in support of their CRUD application that is in a space with 15 identical competitors? My codebase is not my secret sauce.

Fervicus9 days ago

Sure, the codebase itself isn't special. But it's the principle and ethics of it all. These companies trained their models unethically without consequence, and now people are eating up their artificially inflated hype and are lining up to give them money and their data on a silver platter.

LeafItAlone8 days ago

My opinion is that the value that I’ve been getting out of these tools in both my personal and professional projects is greater than the value that they (and others using the downstream effects of them) get out of having my particular codebases.

Also, many of my personal projects are open sourced with pretty permissive licensing, so I’m not all that mad someone else has the code.

positron269 days ago

If the models are getting cheaper, better, and freer even while we use paid ones, then right now is the time to develop the techniques, user interfaces, and workflows that will become the inspirations and foundations of a future world of small, local, and phenomenally powerful models that have online learning, that can formalize their reasoning, and that can bake deduction into their own weights and code.

Aurornis9 days ago

> but I cannot understand how programmers (whether complete nobodies or popular figures) dont mind adding a strong dependency on a third party in order to keep programming.

I don’t understand how people consider this a strong dependency.

Changing LLMs is trivial. Some times I’ll switch between LLMs on a whim to experiment. I can close one coding agent app and open another in seconds.

These claims about vendor lock-in and strong dependencies seem to mostly be coming from people watching from a distance, not the people on the ground using these tools.

bgwalter9 days ago

Yes, and what is worse is that the same mega-corporations who have been ostensibly promoting equity until 2025 are now pushing for a gated development environment that costs the same as a monthly rent in some countries or more than a monthly salary in others.

That problem does not even include lock-in, surveillance, IP theft and all other things that come with SaaS.

edanm9 days ago

> The excuse "but you earn six figures, what' $200/month to you?" doesn't really capture the issue here.

Why?

If I want to pick up many hobbies, not to mention lines of professional work, I have to pay for tools. Why is programming any different? Why should it be?

Complaining that tools that improve your life cost money is... weird, IMO. What's the alternative? A world in which people gift you life-and-work-improving tools for free? For no reason? That doesn't exist.

> Programming used to be (and still is, to a large extent) an activity that can be done with open and free tools.

Btw, I think this was actually less true in the past. Compilers used to cost money in the 70s/80s. I think it's actually cyclical - most likely tools will cost money today, but then down the line, things will start getting cheaper again until they're free.

HarHarVeryFunny8 days ago

> Btw, I think this was actually less true in the past. Compilers used to cost money in the 70s/80s. I think it's actually cyclical - most likely tools will cost money today, but then down the line, things will start getting cheaper again until they're free.

I don't see it as an inevitable cycle. Free tools (gcc, emacs) and OSes (Linux) came about as idealistic efforts driven by hobbyists, with no profit goal/expectation, then improved as large companies moved to support them out of self-interest. Companies like RedHat have then managed to make a business model out of selling support for free software.

Free LLMs or other AI-based tools are only going to happen where there are similar altruistic and/or commercial interests at play, and of course the dynamics are very different given the massive development costs of LLMs. It's not a given that SOTA free tools will emerge unless it is the interest of some deep-pocketed entity to make that happen. Perhaps Meta will develop and release SOTA models, but then someone would have to host them which is also expensive. What would the incentive be for someone to host them for free or at-cost?

fragmede9 days ago

> Programming used to be (and still is, to a large extent) an activity that can be done with open and free tools.

Not without a lot of hard, thankless work by people like RMS to write said tools. Programming for a long while was the purview of the Microsoft Visual Studio family, which cost hundreds, if not thousands, of dollars. There existed other options, some of which were free, but, as is the case today with LLMs you can run at home, they were often worse.

This is why making software developer tools is such a tough market and why debugging remains basically in the dark ages (though there are the occasional bright lights like rr). Good quality tools are expensive, for doctors and mechanics, why do we as software developers expect ours to be free, libre and gratis?

bizzletk9 days ago

Programmers have the ability to write tools that make their jobs/lives easier. This is the perfect alignment of incentives where the person who benefits from the production of a high-quality tool has the ability to deliver one into their hands.

And once the tool has been made, many people just give it away for others to benefit from too.

jonas219 days ago

How is it a strong dependency? If Claude were to disappear tomorrow, you could just switch to Gemini. If all proprietary LLMs were to disappear tomorrow (I don't know how that would happen, but let's suppose for the sake of argument), then you switch to free LLMs, or even just go back to doing everything by hand. There's very little barrier to switching models if you have to.

righthand9 days ago

It’s weird that programmers will champion paying for LLMs but not for ad-free web search.

positron269 days ago

Ad-free search doesn't by itself produce a unique product. It's just a product that doesn't have noise, noise that people with attention spans and focus don't experience at all.

Local models are not quite there yet. For now, use the evil bad tools to prepare for the good free tools when they do get there. It's a self-correcting form of technical debt that we will never have to pay down.

righthand9 days ago

“To prepare for the good free tools”

Why do I have to prepare? Once the good free tools are available, it should just work no?

positron269 days ago

It should not be shocking that we do things better the more we do them. The designs take time to emerge. Experience makes better ideas develop.

As an Emacs user, I anticipate further programming to refine how the glove fits the hand. The solutions we will want as individuals have a lot of dispersion, so I cannot rely on others for everything.

There are plenty of times where I download what others have written and use it as it is, within the bounds of some switches and knobs. Do you want to have such a hands off approach with your primary interface to a computer?

haiku20779 days ago

I pay for search and have convinced several of my collaborators to do so as well

righthand9 days ago

I think the dev population mostly uses free search, just based on the fact no one has told me to “Kagi it” yet.

conradkay9 days ago

They have adblock

azan_9 days ago

Paid models are just much, much better.

dakiol9 days ago

Of course they are. I wouldn't expect otherwise :)

But the price we're paying (and I don't mean money) is very high, imho. We all talk about how good engineers write code that depends on high-level abstractions instead of low-level details, allowing us to replace third-party dependencies easily and test our apps more effectively, keeping the core of our domain "pure". Well, isn't it time we started doing the same with LLMs? I'm not talking about MCP, but rather an open source tool that can plug into either free and open source LLMs or private ones. That would at least allow us to switch to a free and open source version if the companies behind the private LLMs go rogue. I'm afraid that wouldn't be enough, though, but it's a starting point.

To give an example: what would you think if you needed to pay for every single Linux process on your machine? Or for every Git commit you make? Or for every debugging session you perform?
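A minimal sketch of what that abstraction layer could look like, assuming backends that expose an OpenAI-compatible endpoint (the URLs and model ids below are illustrative assumptions, not endorsements):

```python
# Hypothetical provider registry: application code asks for a role
# ("default", "local"), never a vendor. Endpoints and model ids are
# illustrative placeholders.
PROVIDERS = {
    "default": {"base_url": "https://openrouter.ai/api/v1", "model": "anthropic/claude-opus-4"},
    "local":   {"base_url": "http://localhost:11434/v1",    "model": "llama3.3"},
}

def chat_payload(provider: str, prompt: str) -> dict:
    """Build an OpenAI-style chat request from the registry entry."""
    cfg = PROVIDERS[provider]
    return {
        "url": cfg["base_url"].rstrip("/") + "/chat/completions",
        "body": {
            "model": cfg["model"],
            "messages": [{"role": "user", "content": prompt}],
        },
    }
```

With something like this, switching to a free and open source model if a vendor goes rogue is a one-line edit to the registry; the core of the application never names a vendor.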

vunderba9 days ago

> an open source tool that can plug into either free and open source LLMs or private ones

Fortunately there are many of these that can integrate with offline LLMs through systems like LiteLLM/Ollama/etc. Off the top of my head, I'd look into Continue, Cline and Aider.

https://github.com/continuedev/continue

https://github.com/cline/cline

https://github.com/Aider-AI/aider

azan_9 days ago

> I'm not talking about MCP, but rather an open source tool that can plug into either free and open source LLMs or private ones. That would at least allow us to switch to a free and open source version if the companies behind the private LLMs go rogue. I'm afraid that wouldn't be enough, though, but it's a starting point.

There are open source tools that do exactly that already.

simonw9 days ago

> I'm not talking about MCP, but rather an open source tool that can plug into either free and open source LLMs or private ones.

I have been building that for a couple of years now: https://llm.datasette.io

ghm21809 days ago

> I'm not talking about MCP, but rather an open source tool that can plug into either free and open source LLMs or private ones.

Has someone computed/estimated the at-cost $$$ value of utilizing these models at full tilt: several messages per minute and at least 500,000-token context windows? What we need is a Wikipedia-like effort to support something truly open and continually improving in its quality.

IanCal8 days ago

> Well, isn't it time we started doing the same with LLMs? I'm not talking about MCP, but rather an open source tool that can plug into either free and open source LLMs or private ones.

Almost all providers and models can be used with the OpenAI API, and swapping between them is trivial.
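As a sketch of that swap (no request is actually sent here; the endpoints and model ids are illustrative assumptions), the same OpenAI-style chat-completions payload can be aimed at a hosted router or a local server by changing only two strings:

```python
import json
import urllib.request

def chat_request(base_url: str, model: str, prompt: str, api_key: str = "unused"):
    """Build an OpenAI-style /chat/completions request for any compatible backend."""
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode()
    return urllib.request.Request(
        base_url.rstrip("/") + "/chat/completions",
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )

# Same request shape, different backends; only base_url and model change:
hosted = chat_request("https://openrouter.ai/api/v1", "moonshotai/kimi-k2", "hi")
local  = chat_request("http://localhost:11434/v1", "llama3.3", "hi")
```

Under this shape, "switching vendors" is a config change rather than a rewrite.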

airstrike9 days ago

None of that applies here since we could all easily switch to open models at a moment's notice with limited costs. In fact, we switch between proprietary models every few months.

It just so happens that closed models are better today.

moron4hire9 days ago

They are a little better. Sometimes that little bit is an activation-energy level of difference. But overall, I don't see a huge amount of difference in quality between the open and closed models. Most of the time, it just takes a little more effort to get as good of results out of the open models as the closed ones.

blackoil9 days ago

That's a development-time dependency, which is of least concern. That's like saying a company can't hire a developer because they'd then have a dependency.

Code already written will continue to run, and you can use an alternate LLM to write further code. Unless you are a pure vibe coder, you can still type the code yourself. If programmers stop learning how to write code, that's on them, not on Claude or antirez.

We are using the best tool available to us; right now it is Claude Code, tomorrow it may be something from OpenAI, Meta, or DeepSeek.

dirkc8 days ago

I've personally felt this way about many proprietary tech ecosystems in the past. Still do. I don't want to invest my energy to learn something if the carpet can be pulled from under my feet. And it does happen.

But that is my personal value judgement. And it doesn't mean other people will think the same. Luckily tech is a big space.

nominallyfree9 days ago

Us FORTH and LISP hackers will be doing free range code forever.

We can use cheap hardware that can be fixed with soldering irons and oscilloscopes.

People said for decades our projects just become weird DSLs. And now whatever little thing I want to do in any mainstream language involves learning some weird library DSL.

And now people be needing 24h GPU farm access to handle code.

In 50 years my grandkids that wish to will be able to build, repair and program computers with a garage workbench and old wrinkled books. I know most of the software economy will end up in the hands of major corporations capable of paying through the nose for black box low code solutions.

Doesn't matter. Knowledge will set you free if you know where to look.

smokel8 days ago

> Programming used to be (and still is, to a large extent) an activity that can be done with open and free tools.

This was largely not the case before 2000, and may not be the case after, say 2030. It may well be that we are living in extraordinary times.

Before 2000 one had to buy expensive compilers and hardware to do serious programming. The rise of the commercial internet has made desktop computers cheaper, and the free software movement (again mostly by means of commercial companies enjoying the benefits) has made a lot of free software available.

However, we now have mostly smartphones, and serious security risks to deal with.

webappguy9 days ago

I personally can’t wait for programming to ‘die’. It has stolen a decade of my life minimum. Like veterinarians being trained to help pets ultimately finding out a huge portion of the job is killing them. I was not sufficiently informed that I’d spend a decade arguing languages, dealing with thousands of other developers with diverging opinions, legacy code, poorly if at all maintained libraries, tools, frameworks, etc if you have been in the game at least a decade please don’t @. Adios to programming as it was (happily welcoming a new DIFFERENT reality whatever that means). Nostalgia is for life, not staring at a screen 8hrs a day

llbbdd9 days ago

You got some arguably rude replies to this but you're right. I've been doing this a long time and the stuff you listed is never the fun part despite some insistence on HN that it somehow is. I love programming as a platonic ideal but those moments are fleeting between the crap you described and I can't wait for it to go.

vunderba9 days ago

> It has stolen a decade of my life minimum.

Feels like this is a byproduct of a poor work-life balance more than an intrinsic issue with programming itself. I also can't really relate since I've always enjoyed discussing challenging problems with colleagues.

I'm assuming by "die" you mean some future where autonomous agentic models handle all the work. In this world, where you can delete your entire programming staff and have a single PM who tells the models what features to implement next, where do you imagine you fit in?

I just hope for your sake that you have a fallback set of viable skills to survive in this theoretical future.

lbrito9 days ago

Maybe it's just not for you.

I've been programming professionally since 2012 and still love it. To me the sweet spot must've been the early mid 2000s, with good enough search engines and ample documentation online.

midasz9 days ago

Damn dude. I'm just having fun most of the time. The field is so insanely broad that if you've got an ounce of affinity there's a corner that would fit you snugly AND you'd make a decent living. Take a look around.

bluefirebrand9 days ago

Feel free to change careers and get lost, no one is forcing you to be a programmer.

If you feel it is stealing your life, then please feel free to reclaim your life at any time.

Leave the programming to those of us who actually want to do it. We don't want you to be a part of it either

oblio9 days ago

Don't be rude.

hsuduebc29 days ago

Did you expect computer programming not to involve this much time at a computer screen? Most modern jobs especially in tech do. If it’s no longer fulfilling, it might be worth exploring a different role or field instead of waiting for the entire profession to change.

I understand your frustration but the problem is mostly people. Not the particular skill itself.

firecall9 days ago

A big issue will be if we see a further decline in SDK and API documentation.

Anecdotally, I recently find myself building a simple enough Android app for a client based on an iOS app I have already built.

I don’t really know Android dev, but know enough to get started having patched an App somewhat recently.

So started from scratch and having heard that the Android SDK had something similar to SwiftUI that’s what I went for.

In building the app I have found that Gemini is far more useful than Google's own documentation and tutorials.

The tutorials for basic things like NavBars are just wrong in places, maybe due to being outdated.

I’ve reported issues using the feedback tool. But still….

hahahal9 days ago

> Gemini 2.5 PRO | Claude Opus 4

What I thought you were going to say is “Gemini- wha?”.

I’ve used Gemini 2.5 PRO and would definitely not use it for most of my coding tasks. Yes, it’s better at hard things, but it’s not great at normal things.

I’ve not used Claude 4 Opus yet- I know it’s great at large context- but Claude 4 Sonnet Thinking is mostly good unless the task is too complex, and Claude 4 Sonnet is good for basic operations in 1-2 files- beyond that it’s challenged and makes awful mistakes.

atleastoptimal9 days ago

The advantage of having smarter models is greater than the risk/harm of them being closed source, especially so when speed of execution is a major factor.

michaelbuckbee8 days ago

Genuine question as I don't understand the deeper aspects of this, but is it possible that we will see AI specific hardware in the future that will make local AI's more possible?

My general thinking is that we're using graphics processors which _work_ but aren't really designed for AI (lack of memory, etc.).

PeterStuer8 days ago

I share your concerns absolutely, and would run an open model locally (or self-hosted), but for now the reality is that what is available as such is not satisfactory compared to the closed frontier models only available through SaaS.

I hope this will change before (captured) regulation strangles open models.

zaptheimpaler9 days ago

Where is the strong dependency? I can point Cursor at Openrouter and use any LLM I want, from multiple providers. Every LLM provider supports the same OpenAI completions API.

I wish the stuff on top - deep research modes, multimodal, voice etc. had a more unified API as well but there's very little lock-in right now.

dinkumthinkum9 days ago

If you are so tied to an LLM that you cannot program without one then you are not a programmer. It is not the same as with an IDE or whatever, those are essentially ergonomic and do not speak to your actual competence. This will probably be an unpopular take but its just reality.

paulddraper9 days ago

And you thought Visual Studio was free?

Or Windows?

Or an Apple Developer License?

There are free/free-ish, options, but there have always been paid tools.

asadotzler9 days ago

Paid tools were rarely monthly subscriptions without which you could not produce code or executables.

Further, I can write and compile an application for Mac, Windows, or Linux with entirely free tools including tools provided directly by Apple and Microsoft.

This discussion is about a monthly subscription, without which most up-and-coming coders, certainly "vibe coders," are completely dead in the water.

These two dependencies are not the same. If that's not obvious to you, I don't know what else to say to that.

paulddraper8 days ago

You can write code with free tools.

ls-a9 days ago

Programming on/for Apple has never been free, so it's not a surprise to some engineers. You're right that programming might go the way of Apple in the future. However, I think engineers should rejoice, because AI is the best thing that has happened to them.

asadotzler9 days ago

I paid ~$600 for my first Windows compiler, over a grand in today's money. But I didn't have to keep paying every month forever to be able to code at all. Take Claude or whatever away from a vibe coder and they're completely dead in the water.

Apple's fee is like that Visual Studio purchase I made, a fee that lets me compile for their platforms. It's not a subscription without which I can't code anything at all.

Creating a new dependency on monthly subscriptions to unsustainable companies or products is a huge step away from accessible programming of the last 50 years and one that should not so casually be dismissed.

vitaflo9 days ago

Most people who develop software expect other people to pay for it. That's why devs make six figures. Yet devs want everything for free to create the software they charge money for? That's a bit hypocritical.

floucky9 days ago

Why do you see this as a strong dependency? The beauty of it is that you can change the model whenever you want. You can even just code yourself! This isn't some no-code stuff.

asadotzler9 days ago

Change the model, learn how to talk to a new and poorly documented model, and get entirely different results. Yep, easy as pie.

kelvinjps109 days ago

LLMs are basically free? Yes, you're rate limited, but I've only just started paying for them; before that I'd bounce around between the providers, still for free.

jacobr19 days ago

The most cutting edge-models aren't usually free, at least at first.

ipaddr9 days ago

They are good enough for 90% of people and 90% of cases that I would trust an LLM for.

What advantages are people getting on these new models?

dcre9 days ago

To be pedantic, I think it’s too late for a product with close to a billion monthly active users to be “normalized.”

sneak9 days ago

Strong dependency? I can still code without LLMs, just an order of magnitude slower.

There is no dependency at all.

wahnfrieden8 days ago

The dependency arises from rising productivity expectations from employers and the market.

A cabbie is still dependent on the car even if they can still drive a horse buggy an order of magnitude slower.

You are only not dependent if you are not doing this professionally and can enjoy the leisure of tending to your horses.

kelvinjps109 days ago

Doesn't this already happen, with some people being unable to code without Google or similar?

irthomasthomas8 days ago

Kimi K2 competes with these. Even beats them in some evals.

20k9 days ago

+1, I use exclusively free tools for this exact reason. I've been using the same tools for 15 years now (GCC + IDE), and they work great

There is a 0% chance that I'm going to subscribe to being able to program, because it's actively a terrible idea. You have to be very naïve to think that any of these companies will still be around and supporting your tools in 10-20 years' time, so if you get proficient with them you're absolutely screwed

I've seen people say that AI agents are great because instead of using git directly, they can ask their AI agent to do it. Which would be fine if it were a free tool, but you're subscribing to the ability to even start and maintain projects

A lot of people are about to learn an extremely blunt lesson about capitalism

moron4hire9 days ago

A lot of people's problems with Git would go away if they just took a weekend and "read the docs." It's shocking how resistant most people are to the idea of studying to improve their craft.

I've been spending time with my team, just a few hours a week, on training them on foundational things, vs every other team in the company just plodding along, trying to do things the same way they always have, which already wasn't working. It's gotten to where my small team of 4 is getting called in to clean up after these much larger teams fail to deliver. I'm pretty proud of my little junior devs.

throw-number99 days ago

> Programming used to be (and still is, to a large extent) an activity that can be done with open and free tools. I am afraid that in a few years, that will no longer be possible .. The excuse "but you earn six figures, what' $200/month to you?" doesn't really capture the issue here.

Yeah, coding (and to a lesser extent IT in general) at one point was a real meritocracy, where skill mattered more than expensive/unnecessary academic pedigree. Not perfect of course, but real nevertheless. And coders were the first engineers who really said "I won't be renting a suit for an interview, I think an old t-shirt is fine" and we normalized that. Part of this was just uncompromisingly practical.. like you can either do the work or not, and fuck the rest of that noise. But there was also a pretty punk aspect to this for many people in the industry.. some recognition that needing to have money to make money was a bullshit relic of closeted classism.

But we're fast approaching a time where both the old metrics (how much quality code are you writing, how fast, and what's your personal open source portfolio like?) and the new metrics (are you writing a blog post every week about your experience with the new models? is your personal computer fast enough to even try to run crappy local models?) are going to favor those with plenty of money to experiment.

It's not hard to see how this will make inequality worse and disadvantage junior devs, or just talented people that didn't plan other life-events around purchasing API credits/GPUs. A pay-to-play kind of world was ugly enough in politics and business so it sucks a lot to see it creeping into engineering disciplines but it seems inevitable. If paying for tokens/GPU ever allows you to purchase work or promotion by proxy, we're right back to this type of thing https://en.wikipedia.org/wiki/Purchase_of_commissions_in_the...

cafp129 days ago

IMO it's no different from all the other "dev" tools we use. There are tons of free and open tools that usually lag a bit behind the paid versions. People pay for JetBrains, for macOS, and even to search the web (Google ads).

You have very powerful open-weight models, though they are not the cutting edge. Even those you can't really run locally, so you'd have to pay a 3rd party to run them.

Also the competition is awesome to see: these companies are all trying hard to get customers and build the best model, driving prices down and giving you options. No one company has all of the power; it's great to see capitalism working.

mirekrusin9 days ago

You don't pay for macOS, you pay for apple device, operating system is free.

kgwgk9 days ago

You do pay for the operating system. And for future upgrades to the operating system. Revenue recognition is a complex and evolving issue.

cafp129 days ago

Thanks captain missing the point

mirekrusin8 days ago

You're welcome.

belter9 days ago

The issue is somebody will have to debug and fix what those LLM Leeches made up. I guess then companies will have to hire some 10x Prompters?

jacooper9 days ago

Kimi k2 exists now.

TacticalCoder9 days ago

I rely on these but there's zero loyalty. The moment something better is there, like when Gemini 2.5 Pro showed up, I immediately switch.

That's why I drink the whole tools kool-aid. From TFA:

> In this historical moment, LLMs are good amplifiers and bad one-man-band workers.

That's how I use them: write a function here, explain an error message there. I'm still in control.

I don't depend on LLMs: they just amplify.

I can pull the plug immediately and I'm still able to code, as I was two years ago.

Should DeepSeek release a free SOTA model? I'll then use that model locally.

It's not because I use LLMs that I have a strong dependency on them.

Just like using JetBrains' IntelliJ IDEA back when many here were still kids (and, yup, it was lightyears better than NetBeans and Eclipse) didn't give me a strong dependency on JetBrains tools.

I'm back to Emacs and life is good: JetBrains IDEs didn't make me forget how to code, just as LLMs won't.

They're just throwaway tools and are to be regarded as such.

glitchc9 days ago

I'm certain these are advertorials masquerading as personal opinions. These people are being paid to promote the product, either through outright cash, credits on their platform or just swag.

simonw9 days ago

I recommend readjusting your advertorial-detecting radar. antirez isn't taking kickbacks from anyone.

I added a "disclosures" section to my own site recently, in case you're interested: https://simonwillison.net/about/#disclosures

amirhirsch9 days ago

It started out as an innocent kv cache before the redis industrial complex became 5% of the GDP

tptacek9 days ago

So, just so I have this straight, you think antirez is being paid by Google to hype Gemini.

Herring9 days ago

A lot of people are really bad at change. See: immigration. Short of giving everyone jazz improv lessons at school, there's nothing to be done.

To be fair, change is not always good. We still haven't fixed fitness/obesity issues caused (partly) by the invention of the car, 150 years later. I think there's a decent chance LLMs will have the same effect on the brain.

jstummbillig9 days ago

> Programming used to be (and still is, to a large extent) an activity that can be done with open and free tools.

Since when? It starts with the computer, the main tool, and its architecture not being free, and goes from there. Major compilers used to not be free. Major IDEs used to not be free. For most things there were decent and (sometimes) superior free alternatives. The same is true for LLMs.

> The excuse "but you earn six figures, what' $200/month to you?" doesn't really capture the issue here.

That "excuse" could exactly capture the issue. It does not, because you chose to make it a weirder issue. Just as before: you will be free to either not use LLMs, or use open-source LLMs, or use paid LLMs, just as before in the many categories that pertain to programming. It all comes at a cost that you might be willing to pay, and that somebody else is free not to care that much about.

randallsquared9 days ago

> Major compilers used to not be free. Major IDEs used to not be free.

There were and are a lot of non-free ones, but since the 1990s, GCC and interpreted languages and Linux and Emacs and Eclipse and a bunch of kinda-IDEs were all free, and now VS Code is one of the highest marketshare IDEs, and those are all free. Also, the most used and learned programming language is JS, which doesn't need compilers in the first place.

jstummbillig9 days ago

There are free options and there continue to be non-free options. The same is true for LLMs.

bluefirebrand9 days ago

> Major compilers used to not be free

There's never been anything stopping you from building your own

Soon there will be. The knowledge of how to do so will be locked behind LLMs, and other sources of knowledge will be rarer and harder to find as a result of everything switching to LLM use

jstummbillig9 days ago

For the past decades knowledge was "locked" behind search engines. Could you have rolled your own search engine indexing the web, to unlock that knowledge? Yes, in the same theoretical way that you can roll your own LLM.

quantumHazer9 days ago

I'm going a little offtopic here, but I disagree with the OPs use of the term "PhD-level knowledge", although I have a huge amount of respect for antirez (beside that we are born in the same island).

This phrasing can be misleading and points to a broader misunderstanding about the nature of doctoral studies, one that has been influenced by the marketing and hype discourse surrounding AI labs.

The assertion that there is a defined "PhD-level knowledge" is pretty useless. The primary purpose of a PhD is not simply to acquire a vast amount of pre-existing knowledge, but rather to learn how to conduct research.

antirez9 days ago

Agree with that. Read it as expert-level knowledge without all the other stuff LLMs can't do as well as humans. LLMs' way of expressing knowledge is kind of alien, as it is different, so indeed those are all poor simplifications. For instance an LLM can't code as well as a top human coder, but it can write a non-trivial program from the first to the last character without iterating.

spyckie29 days ago

Hey antirez,

What sticks out to me is Gemini catching bugs before production release, was hoping you’d give a little more insight into that.

Reason being, we expect AI to create bugs and we catch them; but if Gemini is spotting bugs by acting as a QA (not just by writing and passing tests), then that piques my interest.

jacobr19 days ago

Our team has pretty aggressively started using LLMs for automated code review. It will look at our PRs and post comments. We keep adding more material for it to consider, from a summarized version of our API guidelines to general prompts like: "You are an expert software engineer and QA professional, review this PR and point out any bugs or other areas of technical risk. Make concise suggestions for improvement where applicable." It catches a ton of stuff.

Another thing we've started doing is having it look at build failures and write a report on suggested root causes before a human even looks at it, which saves time.

Or (and we haven't rolled this out automatically yet but are testing a prototype) having it triage alarms from our metrics, with access to the logs and codebase to investigate.
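As a concrete (and entirely hypothetical) sketch of that setup, the review prompt can be assembled from the diff and a guidelines summary before being sent to whatever model the pipeline uses; the function name and inputs here are illustrative, not the commenter's actual code:

```python
def build_review_prompt(diff: str, guidelines: str) -> str:
    """Assemble a PR-review prompt: role instruction, then the summarized
    guidelines, then the diff itself. (Hypothetical helper.)"""
    return (
        "You are an expert software engineer and QA professional. "
        "Review this PR and point out any bugs or other areas of "
        "technical risk. Make concise suggestions for improvement "
        "where applicable.\n\n"
        f"API guidelines (summary):\n{guidelines}\n\n"
        f"Diff:\n{diff}"
    )

# Example: a one-line diff plus one guideline, ready to send to a model.
prompt = build_review_prompt(
    diff="--- a/app.py\n+++ b/app.py\n+    return None",
    guidelines="Handlers must never return None.",
)
```

The same skeleton extends naturally to the build-failure and alarm-triage cases: only the role instruction and the attached material change.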

ghm21809 days ago

> but rather to learn how to conduct research

Further, I always assumed "PhD level of knowledge" meant coming up with the right questions. I would say it is at best a "lazy knowledge-rich worker": it won't explore hypotheses if you don't *ask it* to. A PhD would ask those questions of *themselves*. Let me give you a simple example:

The other day Claude Code (Max Pro subscription) commented out a bunch of test assertions as part of a related but separate test suite it was coding. It did not care to explore why it was commenting out what was a serious bug, because of a faulty assumption in the original plan. I had to ask it to change the plan by doing the ultra-think, think-hard trick to explore why it was failing, amend the plan and fix it.

The bug was that the ORM object had null values because it was not refreshed after the commit, and it had been fetched earlier by another DB session that had since been closed.
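For illustration, the same failure mode can be reproduced without any ORM, using plain sqlite3 with two separate connections standing in for the two DB sessions (a sketch of the pattern, not the actual codebase):

```python
import os
import sqlite3
import tempfile

# A shared on-disk database stands in for the app's datastore.
path = os.path.join(tempfile.mkdtemp(), "demo.db")

# Session 1: create the row and read it while email is still NULL, then close.
s1 = sqlite3.connect(path)
s1.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
s1.execute("INSERT INTO users (id, email) VALUES (1, NULL)")
s1.commit()
stale = s1.execute("SELECT email FROM users WHERE id = 1").fetchone()
s1.close()

# Session 2: another part of the code fills in the value and commits.
s2 = sqlite3.connect(path)
s2.execute("UPDATE users SET email = 'a@example.com' WHERE id = 1")
s2.commit()

# The object read through the old, now-closed session still holds NULL;
# it has to be re-fetched (refreshed) after the commit to see the real value.
assert stale == (None,)
fresh = s2.execute("SELECT email FROM users WHERE id = 1").fetchone()
assert fresh == ("a@example.com",)
s2.close()
```

In an ORM this is the classic "refresh the instance after commit" fix; the point is that the LLM papered over the symptom (failing assertions) instead of asking why they failed.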

vl9 days ago

It's "ultrathink", one word, not "ultra-think" (see below).

I use Claude Code with Opus and had the same experience: I was pushing it hard to implement a complex test, and it gave me an empty test function with the test plan inside a comment (lol).

I do want to try Gemini 2.5 Pro, but I don't know of a tool that would make the experience comparable to Claude Code. Would it make sense to use it with Cursor? Do they try to limit context?

  ~/.nvm/versions/node/v22.16.0/lib/node_modules/@anthropic-ai/claude-code $ npx prettier cli.js | ack ultrathink -C 20
  var jw1 = { HIGHEST: 31999, MIDDLE: 1e4, BASIC: 4000, NONE: 0 },
  Yk6 = {
    english: {
      HIGHEST: [
        { pattern: "think harder", needsWordBoundary: !0 },
        { pattern: "think intensely", needsWordBoundary: !0 },
        { pattern: "think longer", needsWordBoundary: !0 },
        { pattern: "think really hard", needsWordBoundary: !0 },
        { pattern: "think super hard", needsWordBoundary: !0 },
        { pattern: "think very hard", needsWordBoundary: !0 },
        { pattern: "ultrathink", needsWordBoundary: !0 },
      ],
      MIDDLE: [
        { pattern: "think about it", needsWordBoundary: !0 },
        { pattern: "think a lot", needsWordBoundary: !0 },
        { pattern: "think deeply", needsWordBoundary: !0 },
        { pattern: "think hard", needsWordBoundary: !0 },
        { pattern: "think more", needsWordBoundary: !0 },
        { pattern: "megathink", needsWordBoundary: !0 },
      ],
      BASIC: [{ pattern: "think", needsWordBoundary: !0 }],
      NONE: [],
    },
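For the curious, here is a rough Python rendering of what that minified table appears to encode: trigger phrases mapped to thinking-token budgets, matched with word boundaries. The dispatch function is my guess at the surrounding logic, not decompiled code:

```python
import re

# Token budgets per tier, as in the minified source (MIDDLE is 1e4 there).
BUDGETS = {"HIGHEST": 31999, "MIDDLE": 10000, "BASIC": 4000, "NONE": 0}

# Trigger phrases per tier, copied from the snippet above.
PATTERNS = {
    "HIGHEST": ["think harder", "think intensely", "think longer",
                "think really hard", "think super hard", "think very hard",
                "ultrathink"],
    "MIDDLE": ["think about it", "think a lot", "think deeply",
               "think hard", "think more", "megathink"],
    "BASIC": ["think"],
}

def thinking_budget(prompt: str) -> int:
    # Check tiers from highest to lowest; each phrase is matched with word
    # boundaries, mirroring the needsWordBoundary flag in the original.
    text = prompt.lower()
    for tier in ("HIGHEST", "MIDDLE", "BASIC"):
        for phrase in PATTERNS[tier]:
            if re.search(r"\b" + re.escape(phrase) + r"\b", text):
                return BUDGETS[tier]
    return BUDGETS["NONE"]
```

So "please ultrathink this" buys the largest budget, plain "think" gets the basic one, and "rethink" matches nothing thanks to the word boundary.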
andrew_k9 days ago

Google has gemini-cli, which is pretty close to Claude Code in terms of experience https://github.com/google-gemini/gemini-cli and has a generous free tier. Claude Code is still superior in my experience; Gemini CLI can go off-course pretty quickly if you accept auto edits. But it is handy for code reviews and planning with its large context window.

chis9 days ago

If you understand that a PhD is about much more than just knowledge, it's still the case that having easy access to that knowledge is super valuable. My last job we often had questions that would just traditionally require a PhD-level person to answer, even if it wasn't at the limit of their research abilities. "What will happen to the interface of two materials if voltage is applied in one direction" type stuff, turns out to be really hard to answer but LLMs do a decent job.

quantumHazer9 days ago

Have you checked experimentally the response of the LLM?

Anyway I don't think this is ""PhD-knowledge"" questions, but job related electrical engineering questions.

pcrh8 days ago

Quite. "PhD-level knowledge" is the introduction to one's PhD thesis. The point of doing a PhD is to extend knowledge beyond what is already known, i.e. that which cannot be known by an LLM.

ramraj079 days ago

Except during the data science craze of the mid-2010s, there was never a situation where you could just have a PhD in any field and get any "PhD-level job", so whatever pedantic idea you have of what PhDs learn, not a single person who's hiring PhDs agrees with you. On the contrary, even most PhD professors treat you as only a vessel of the very specific topic you studied during your PhD. Go try to get a postdoc in a top lab when your PhD was not exactly what they work on already. I know, I tried! Then gave up.

kgwgk9 days ago

> The primary purpose of a PhD is not simply to acquire a vast amount of pre-existing knowledge, but rather to learn how to conduct research.

It’s not like once you have a PhD anyone cares about the subject, right? The only thing that matters is that you learnt to conduct research.

quantumHazer9 days ago

I can't understand why once you have a PhD anyone should care more about the subject.

airstrike9 days ago

I think all conversations about coding with LLMs, vibe coding, etc. need to note the domain and choice of programming language.

IMHO those two variables are 10x (maybe 100x) more explanatory than any vibe coding setup one can concoct.

Anyone who is befuddled by how the other person {loves, hates} using LLMs to code should ask what kind of problem they are working on and then try to tackle the same problem with AI to get a better sense for their perspective.

Until then, every one of these threads will have dozens of messages saying variations of "you're just not using it right" and "I tried and it sucks", which at this point are just noise, not signal.

cratermoon9 days ago

They should also share their prompts and discuss exactly how much effort went into checking the output and re-prompting to get the desired result. The post hints at how much work it takes for the human, "If you are able to describe problems in a clear way and, if you are able to accept the back and forth needed in order to work with LLMs ... you need to provide extensive information to the LLM: papers, big parts of the target code base ... And a brain dump of all your understanding of what should be done. Such braindump must contain especially the following:" and more.

After all the effort getting to the point where the generated code is acceptable, one has to wonder: why not just write it yourself? The time spent typing is trivial compared to all the cognitive effort involved in describing the problem, and describing the problem in a rigorous way is the essence of programming.

keeda9 days ago

> After all the effort getting to the point where the generated code is acceptable, one has to wonder, why not just write it yourself?

Because it is still way, way, way faster and easier. You're absolutely right that the hard part is figuring out the solution. But the time spent typing is in no way trivial or cognitively simple, especially for more complex tasks. A single prompt can easily generate 5 - 10x the amount of code in a few seconds, with the added bonus that it:

a) figures out almost all the intermediate data structures, classes, algorithms and database queries;

b) takes care of all the boilerplate and documentation;

c) frequently accounts for edge cases I hadn't considered, saving unquantifiable amounts of future debugging time;

d) and can include tests if I simply ask it to.

In fact, these days once I have the solution figured out I find it frustrating that I can't get the design in my head into the code fast enough manually. It is very satisfying to have the AI churn out reams of code, and immediately run it (or the tests) to see the expected result. Of course, I review the diff closely before committing, but then I do that for any code in any case, even my own.

gf0009 days ago

> frequently accounts for edge cases I hadn't considered, saving unquantifiable amounts of future debugging time;

And creates new ones you wouldn't even consider before, creating just as much, if not more future debugging :D

sothatsit9 days ago

I have actually been surprised at how few subtle bugs like this actually come up when using tools like Claude Code. Usually the bugs it introduces are glaringly obvious, and stem from a misunderstanding of the prompt, not due to the code being poorly thought out.

This has been a surprise to me, as I expected code review of AI-generated code to be much more difficult than it has been in practice. Maybe this has been because I only really use LLMs to write code that is easy to explain, and therefore probably not that complicated. If code is more complicated, then I will write it myself.

keeda8 days ago

That's what the code review is for :-) But to echo the sibling comments, I've not caught a subtle edge case or bug in the generated code in more than a year and half. There are mistakes and failure modes for sure, but they are very glaring, to the extent that I simply throw that code away and try again.

That said, I've adopted a specific way of working with AI that is very effective for my situation (mentioned in my comment history, but echoes a lot of what TFA advises.)

IshKebab9 days ago

Sure but you know about all the edge cases you know about, so you can fix bugs in those. The AI sometimes points out edge cases you didn't think of.

cratermoon7 days ago

> A single prompt can easily generate 5 - 10x the amount of code in a few seconds

That doesn't sound like a great outcome. Every line of code is a potential bug and a liability. The least important skill in programming is typing speed.

datastoat9 days ago

> They should also share their prompts

Here's a recent ShowHN post (a map view for OneDrive photos), which documents all the LLM prompting that went into it:

https://news.ycombinator.com/item?id=44584335

UncleEntity9 days ago

> After all the effort getting to the point where the generated code is acceptable, one has to wonder, why not just write it yourself?

You know, I would often ask myself that very question...

Then I discovered the stupid robots are good at designing a project, you ask them to produce a design document, argue over it with them for a while, make revision and changes, explore new ideas, then, finally, ask them to produce the code. It's like being able to interact with the yaks you're trying to shave, what's not to love about that?

tines9 days ago

I would assume the argument is that you only need to provide the braindump and extensive information one time (or at least, collect it once, if not upload it once), and then you can take it easy as the LLM uses that for many tasks.

skydhash9 days ago

The thing is, no one writes that much code, at least not anyone who cares about code reuse. Most of the time is spent collecting information (especially communicating with stakeholders) and verifying that the code you wrote didn't break anything.

spaceman_20208 days ago

As someone who just used AI to create a very comprehensive creative visual marketing campaign, there is absolutely a massive gap in output depending on your prompting skills

You can get images that can go straight to print or end up with pure slop based on your prompt

skippyboxedhero9 days ago

Have used Claude's GitHub action quite a bit now (10-20 issue implementations, a bit more PR reviews), and it is hit and miss so agree with the enhanced coding rather than just letting it run loose.

When the change is very small, self-contained feature/refactor it can mostly work alone, if you have tests that cover the feature then it is relatively safe (and you can do other stuff because it is running in an action, which is a big plus...write the issue and you are done, sometimes I have had Claude write the issue too).

When it gets to a more medium size, it will often produce something that will appear to work but actually doesn't. Maybe I don't have test coverage and it is my fault but it will do this the majority of the time. I have tried writing the issue myself, adding more info to claude.md, letting claude write the issue so it is a language it understands but nothing works, and it is quite frustrating because you spend time on the review and then see something wrong.

And anything bigger, unsurprisingly, it doesn't do well.

PR reviews are good for small/medium tasks too. Bar is lower here though, much is useless but it does catch things I have missed.

So, imo, still quite a way from being able to do things independently. For small tasks, I just get Claude to write the issue, and wait for the PR...that is great. For medium (which is most tasks), I don't need to do much actual coding, just directing Claude...but that means my productivity is still way up.

I did try Gemini but I found that when you let it off the leash and accept all edits, it would go wild. We have Copilot at work reviewing PRs, and it isn't so great. Maybe Gemini better on large codebases where, I assume, Claude will struggle.

milofeynman9 days ago

The problem here is the infrastructure required to demo the changes to the user. Like yeah, you made a code change, but now I have to pull it, maybe set up data to get it in the right state, and check if it's functioning how I want it to. Looking at the code it produced in a diff can waste a lot of your time if it doesn't even work as expected.

1024core9 days ago

I have found that if I ask the LLM to first _describe_ to me what it wants to do without writing any code, then the subsequent code generated has much higher quality. I will ask for a detailed description of the things it wants to do, give it some feedback and after a couple of iterations, tell it to go ahead and implement it.
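The describe-first flow can be sketched as a plain two-phase message sequence (the wording and structure here are illustrative, not tied to any particular API):

```python
# Phase 1: ask for a plan only, with code generation explicitly deferred.
conversation = [
    {"role": "user", "content": (
        "Describe in detail how you would add retry logic to the "
        "fetch_data() helper. Do not write any code yet."
    )},
]

# The model replies with a plan, appended as an assistant turn; the user
# then gives feedback over a couple of iterations (elided here).
conversation.append(
    {"role": "assistant", "content": "Plan: wrap the call in a retry loop..."}
)

# Phase 2: only after the plan is agreed, green-light the implementation.
conversation.append(
    {"role": "user", "content": "The plan looks good. Go ahead and implement it."}
)
```

Keeping both phases in one conversation means the model implements against the plan it just committed to, rather than improvising.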

hyperadvanced9 days ago

Seconded. Winning strategy: it plays to the strengths of the LLM without letting it run wild.

Keyframe9 days ago

Unlike OP, from my still limited but intense month or so diving into this topic, I had better luck with Gemini 2.5 Pro and Opus 4 on a more abstract level (architecture etc.), then handing things off to Sonnet for the coding. I found 2.5 Pro, and to a lesser degree Opus, were hit or miss; a lot of instances of them circling around the issue and correcting themselves when coding (Gemini especially so), whereas Sonnet would cut to the chase, but needed an explicit take on it to be efficient.

khaledh9 days ago

This is my experience too. I usually use Gemini 2.5 Pro through AI Studio for big design ideas that need to be validated and refined. Then I take the refined requirements to Claude Code, which does an excellent job most of the time of coding them properly. Recently I tried Gemini CLI, and it's not even close to Claude Code's sharp coding skills. It often makes syntax mistakes and gets stuck trying to get itself out of a rut; its output is so verbose (and fast) that it's hard to follow what it's trying to do. Claude Code has a much better debugging capability.

Another contender in the "big idea" reasoning camp: DeepSeek R1. It's much slower, but most of the time it can analyze problems and get to the correct solution in one shot.

antirez9 days ago

Totally possible. In general I believe that while more powerful in their best outputs, Sonnet/Opus 4 are in other ways (alignment / consistency) a regression on Sonnet 3.5v2 (often called Sonnet 3.6), as Sonnet 3.7 was. Also models are complex objects, and sometimes in a given domain a given model that on paper is weaker will work better. And, on top of that: interactive use vs agent requires different reinforcement learning training that sometimes may not be towards an aligned target... So also using the model in one way or the other may change how good it is.

jpdus9 days ago

This is also confirmed by internal Cline statistics, where Opus and Gemini 2.5 Pro both perform worse than Sonnet 4 in real-world scenarios:

https://x.com/pashmerepat/status/1946392456456732758/photo/1

headcanon8 days ago

> Gemini 2.5 PRO | Claude Opus 4

Glad to see my experience is reflected elsewhere. I've found Gemini 2.5 PRO to be the best bang-for-buck model: good reasoning and really cheap to run (counts as 1 request in cursor, where opus can blow my quotas out of the water). Code style works well for me too, its "basic" but thats what I want. If I have only one model to take to my deserted island this is the one I'd use right now.

For the heady stuff, I usually use o3 (but only to debug, its coding style is a bit weird for me), saving Opus 4 for when I need the "big guns".

I don't have Claude Code (cursor user for now), but if I did I'd probably use Opus more.

cadamsdotcom9 days ago

Having done a few months of this new “job” of agentic coding I strongly agree with everything in this post.

Frontier LLMs are easiest to work with for now. Open models _will_ catch up. We can be excited for that future.

You are able to learn things from LLMs, and you can ask them for recommendations for an approach to implement something. Or just tell your LLM the approach to take. Sometimes it overcomplicates things. You’ll develop an instinct for when that’s likely. You can head off the overcomplication ahead of time or ask for refactorings after the initial cut is built for you. After a while you get an instinct for which way will get the work done soonest. Most fascinating of all, it’ll all change again with the next round of frontier models.

You don’t need frontier models for every task. For instance I’ve been positively surprised by Github Copilot for straightforward features and fixes. When it’s obvious how to implement, and you won’t need to go back and forth to nail down finer design details, getting an initial PR from Copilot is a great starting place.

To everyone starting out, enjoy the ride, know that none of us know what we’re doing, and share what you learn along the way!

delduca9 days ago

I have a good example of how sometimes AI/LLM can write very very inefficient code: https://nullonerror.org/2025/07/12/ai-will-replace-programme...

dawnerd9 days ago

Similarly, AI is really bad at code golf. You’d think it would be great, able to know all the little secrets but nope. It needs verbose code.

verbify9 days ago

I wonder what fine tuning an LLM on code golf examples would produce.

lysecret9 days ago

IMO Claude Code was a huge step up. We have a large and well-structured Python code base revolving mostly around a large and complicated adapter pattern; Claude is almost fully capable of implementing a new adapter if given the right prompt/resources.

nlh9 days ago

Can anyone recommend a workflow / tools that accomplishes a slightly more augmented version of antirez’ workflow & suggestions minus the copy-pasting?

I am on board to agree that pure LLM + pure original full code as context is the best path at the moment, but I’d love to be able to use some shortcuts like quickly applying changes, checkpoints, etc.

My persistent (and not unfounded?) worry is that all the major tools & plugins (Cursor, Cline/Roo) all play games with their own sub-prompts and context “efficiency”.

What’s the purest solution?

afro889 days ago

You can actually just put Cursor in manual mode and it's the same thing. You 100% manage the context and there's no agentic loop.

If your codebase fits in the context window, you can also just turn on "MAX" mode and it puts it all in the context for you.

bGl2YW5j9 days ago

I use JetBrains AI Assistant for its great integration with the editor and the codebase, and have been experimenting with Claude Code too. JetBrains Assistant still has better editor integration for things like reviewing generated diffs and generating code based on currently selected code.

My augmented workflow is to "chat" with GPT, because it's free and powerful, to refine my ideas and surface things I hadn't thought about. Then I start writing code to get into the flow of things. I've found that if I use the LLM straight away, I disengage, become lazy and lose context and understanding over the code. In these situations, I've had to redo the code more often than not. Lack of understanding is one part of why, but more importantly, disengaged prompting leads to vague and incorrect outcomes.

When I’m very clear in my head about my goal, I create a prompt either directly from my cursor, or if the changes are larger, I ask the LLM to not apply the changes but instead show them to me. I do both these things within the IDE in the chat window. I review the code and sometimes I’m happy applying it as is, other times I copy and paste it and tweak it manually.

I’ve got barebones rules set up; I haven’t felt the need to go overboard. Jetbrains Assistant does a good job of passing relevant context to the model.

I keep my prompts top-down (https://en.wikipedia.org/wiki/BLUF_(communication)) and explicit. Sometimes I’m much more detailed than others, and I’ve found that extra detail isn’t always necessary for a good result.

williamzeng09 days ago

I'm actually building something for JetBrains, it's called https://sweep.dev. We're trying to bring next-edit prediction (like in Cursor) to JetBrains IDEs.

senko9 days ago

I use the agent panel in my editor of choice (Zed).

For each task I always start with a new (empty) context and manually tag relevant files to include (this is trivial since I know the codebase well).

First I use Claude 4 Sonnet in thinking mode (I could also use Gemini 2.5 Pro or Opus as per antirez' recommendations) to come up with a detailed plan on how to implement something (research/planning phase). I provide feedback and we iterate on the plan.

Then, in the same conversation I switch to Sonnet 4 non-thinking and tell it to implement what we just devised.

I manually review the changes and test them. If something needs fixing or (more often) if I notice I missed some edge case/caveat, I tell it to do that (still same convo).

Commit, clear convo, next task.

For research that isn't directly tied to the code, I use ChatGPT or Claude (web apps) to brainstorm ideas, and sometimes copy/paste these into the editor agent as a starting point.

cheeseface9 days ago

Claude Code has worked well for me. It is easy to point it at the relevant parts of the codebase and see what it decides to read itself, so you can provide missing pieces of code when necessary.

afro889 days ago

This is almost the opposite of what OP is asking, and what the post from antirez describes.

schneehertz8 days ago

GitHub Copilot's Edit mode allows you to manually specify the context, and it runs only once each time, writing code with diff checking, without entering an agent loop.

bgwalter9 days ago

Translation: His company will launch "AI" products in order to get funding or better compete with Valkey.

I find it very sad that people who have been really productive without "AI" now go out of their way to find small anecdotal evidence for "AI".

brokencode9 days ago

I find it even more sad when people come out of the woodwork on every LLM post to tell us that our positive experiences using LLMs are imagined and we just haven’t realized how bad they are yet.

skippyboxedhero9 days ago

Some people got into coding to code, rather than build things.

If the AI is doing the coding then that is a threat to some people. I am not sure why, LLMs can be good and you can enjoy coding...those things are unrelated. The logic seems to be that if LLMs are good then coding is less fun, lol.

Cheer21719 days ago

Software jobs pay more than artist jobs because coding builds things. You can still be a code artist on your own time. Nobody is stopping you from writing in assembler.

+1
skippyboxedhero9 days ago
cratermoon9 days ago

We don't just tell you they were imagined, we can provide receipts.

https://metr.org/blog/2025-07-10-early-2025-ai-experienced-o...

steveklabnik9 days ago

> We do not provide evidence that:

> AI systems do not currently speed up many or most software developers

> We do not claim that our developers or repositories represent a majority or plurality of software development work

brokencode9 days ago

Certainly an interesting result, but remember that a single paper doesn’t prove anything. This will no doubt be something studied very extensively and change over time as tools develop.

Personally, I find the current tools don’t work great for large existing codebases and complex tasks. But I’ve found they can help me quickly make small scripts to save me time.

I know, it’s not the most glamorous application, but it’s what I find useful today. And I have confidence the tools will continue to improve. They hardly even existed a few years ago.

nojito9 days ago

Cursor is an old way of using LLMs.

Not to mention that fewer than half of the participants in the study had ever used it before.

+3
roywiggins9 days ago
on_the_train9 days ago

If LLMs were actually useful, there would be no need to scream it everywhere. On the contrary: it would be a guarded secret.

neuronexmachina9 days ago

In my experience, devs generally aren't secretive about tools they find useful.

+1
fellowniusmonk9 days ago
logsr9 days ago

posting a plain text description of your experience on a personal blog isn't exactly screaming. in the noise of the modern internet this would be read by nobody if it wasn't coming from one of the most well known open source software creators of all time.

people who believe in open source don't believe that knowledge should be secret. i have released a lot of open source myself, but i wouldn't consider myself a "true believer." even so, i strongly believe that all information about AI must be as open as possible, and i devote a fair amount of time to reverse engineering various proprietary AI implementations so that i can publish the details of how they work.

why? a couple of reasons:

1) software development is my profession, and i am not going to let anybody steal it from me, so preventing any entity from establishing a monopoly on IP in the space is important to me personally.

2) AI has some very serious geopolitical implications. this technology is more dangerous than the atomic bomb. allowing any one country to gain a monopoly on this technology would be extremely destabilizing to the existing global order, and must be prevented at all costs.

LLMs are very powerful, they will get more powerful, and we have not even scratched the surface yet in terms of fully utilizing them in applications. staying at the cutting edge of this technology, and making sure that the knowledge remains free, and is shared as widely as possible, is a natural evolution for people who share the open source ethos.

bgwalter9 days ago

If consumer "AI", and that includes programming tools, had real geopolitical implications it would be classified.

The "race against China" is a marketing trick to convince senators to pour billions into "AI". Here is who is financing the whole bubble to a large extent:

https://time.com/7280058/data-centers-tax-breaks-ai/

brokencode9 days ago

So ironic that you post this on Hacker News, where there are regularly articles and blog posts about lessons from the industry, both good and bad, that would be helpful to competitors. This industry isn’t exactly Coke guarding its secret recipe.

hobs9 days ago

I think many devs are guarding their secrets, but the last few decades have shown us that an open foundation can net huge benefits for everyone (and then you can put your secret sauce in the last mile.)

victorbjorklund9 days ago

If Internet was actually useful there would be no need to scream it everywhere. Guess that means the internet is totally useless?

alwillis9 days ago

> If LLMs were actually useful, there would be no need to scream it everywhere. On the contrary: it would be a guarded secret.

LLMs are useful—but there’s no way such an innovation should be a “guarded secret” even at this early stage.

It’s like saying spreadsheets should have remained a secret when they amplified what people could do when they became mainstream.

halfmatthalfcat9 days ago

Could it not be that those positive experiences are just shining a light that the practices before using an LLM were inefficient? It’s more a reflection on the pontificator than anything.

jstanley9 days ago

Tautologically so! That doesn't show that LLMs are useless, it perfectly shows how they are useful.

johnfn9 days ago

Sure, but even then the perspective makes no sense. The common argument against AI at this point (e.g. OP) is that the only reason people use it is that they are intentionally trying to prop up high valuations; they seem unable to understand that other people have a different experience than they do. Just because there are some cases where it doesn't work doesn't necessarily mean that 100% of it is a sham. At worst it's just down to individual taste, but that doesn't mean everyone who doesn't share your taste is wrong.

Consider cilantro. I’m happy to admit there are people out there who don’t like cilantro. But it’s like the people who don’t like cilantro are inventing increasingly absurd conspiracy theories (“Redis is going to add AI features to get a higher valuation”) to support their viewpoint, rather than the much simpler “some people like a thing I don’t like”.

bgwalter9 days ago

"Redis for AI is our integrated package of features and services designed to get your GenAI apps into production faster with the fastest vector database."

antirez9 days ago

Did you read my post? I hope you didn’t.

This post has nothing to do with Redis and is even a follow up to a post I wrote before rejoining the company.

syntheticnature9 days ago

This is HN. We don't read posts here.

kgwgk9 days ago

Amen. I have to confess that I made an exception here though. This may be the first submission I read before going into the comments in years.

babuloseo9 days ago

Please send your thoughts and prayers to Gemini 2.5 Pro; hopefully they can recover and get well soon. I hope Google lets them out of the hospital and discharges them soon; the last 3 weeks have been hell for me without them.

babuloseo9 days ago

OP, as a free user of Gemini 2.5 Pro via AI Studio, my friend was hit by the equivalent of a car crash approximately 3 weeks ago. I hope they can recover soon; it is not easy for them.

wg09 days ago

I don't understand.

Is the author suggesting manually pasting redis C files into the Gemini Pro chat window on the web?

thefourthchime9 days ago

I was mostly nodding my head until he got to this part.

The fundamental requirement for the LLM to be used is: don’t use agents or things like editor with integrated coding agents.

So right, is he like actually copying and pasting stuff into a chat window? I did this before Co-Pilot, but with cursor I would never think of doing that. He never mentioned Cursor or Claude Code so I wonder if he's even experienced it.

libraryofbabel9 days ago

Right, this didn’t make much sense to me either. Who’d still recommend copy-and-paste-into-chat coding these days with Claude Code and similar agents available? I wonder if he’s thinking of agents / IDEs like Windsurf, Copilot, Cursor, etc., where there is more complexity between you and the frontier LLM and various tricks to minimize token use. Claude Code, Gemini CLI, etc. aren’t like that and will just read whole files into the context so that the LLM can see everything, which I think achieves what he wants but with all the additional magic of agents like edits, running tests, etc. as well.

Implicated9 days ago

> agents / IDEs like windsurf, copilot, cursor etc where there is more complexity between you and the frontier LLM and various tricks to minimize token use.

This is exactly why he's doing it the way he is and why what he describes is still the most effective, albeit labor intensive, way to work on hard/complex/difficult problems with LLMs.

Those tricks are for saving money. They don't make the LLM better at its task. They just make it so the LLM will do what you could/should be doing. We're using agents because we're lazy or don't have time or attention to devote, or the problems are trivial enough to solved with these "tricks" and added complexities. But, if you're trying to solve something complex or don't want to have a bunch of back and forth with the LLM or don't want to watch it iterate and do some dumb stuff... curate that context. Actually put thought and time into what you provide the LLM, both in context and in prompt - you may find that what you get is a completely different product.

Or, if you're just having it build views and buttons - keep vibing.

torginus9 days ago

I'm a little baby when it comes to Claude Code and agentic AI, that said I was a heavy user of Cursor since it came out, and before agents came out, I had to manually select which files would be included in the prompt of my query.

Now Cursor's agent mode does this for me, but it can be a hit or miss.

schneehertz8 days ago

The edit mode of GitHub Copilot requires manually providing context files and does not have RAG or other agent tools. I think this mode is much easier to use than the agent mode.

Implicated9 days ago

I'm a huge fan of claude code, max subscriber, etc. But... if you're working on a very specific problem and you're not being lazy, you're _better off_ manually curating the context yourself rather than relying on Claude Code to collect it or know what to use. Even @'ing the files into the context... didn't we just see them extend the limit on the number of lines this would _actually_ push into the context?

Before claude code was around I had already learned very quickly that exactly what antirez is describing, literally copy/pasting code into the chat verbatim, was _wildly_ effective. So much so that I had Claude build me a jetbrains plugin to streamline the process for me - I would select the files I wanted to include in context in a tree-style view of the project and it simply had a 'Copy' button that would, when pressed, compile the files into a single markdown document with simple metadata about each included file (filename, path) and a simple "divider" between files.
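A rough Python sketch of the packing step described above (the helper name and format details are my guesses, not the actual plugin's):

```python
from pathlib import Path

FENCE = "`" * 3  # build fences dynamically so this snippet nests cleanly

def pack_context(paths):
    """Pack hand-picked files into one markdown document: a filename/path
    header per file, fenced contents, and a simple divider between files."""
    parts = []
    for p in map(Path, paths):
        parts.append(f"## {p.name} ({p})\n{FENCE}\n{p.read_text()}\n{FENCE}")
    return "\n\n---\n\n".join(parts)
```

Select the files you care about, run this, and paste the result into a fresh chat.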

And, also like OP states, it's wildly more effective to take this approach combined with a new context/session/chat for each new task. (So, with Claude, I used projects heavily for this and this copied context markdown document was added to the project files and then I'd just start a new chat in that project any time I wanted to work on something that included them/that context I had crafted)

Claude Code and the "agent" approach is wildly effective, no doubt. But it doesn't really support this sort of very specific and "heavy" approach to the "problem". Rather, it's actually working pretty hard to use as few tokens as possible, weaving through the maze of trying to be token efficient while getting/finding/using context that _is_ available to it but not presented directly to it. Instead, if you just feed it the whole ass class/module/package and provide a detailed, focused and intentful prompt, you might be really surprised by what you get.

These agents are _really_ going to thrive when the context window isn't a problem anymore. Claude (not Code, but the desktop app, so using a "project") essentially one-shot the updates needed for phpredis (a PHP extension for Redis, written in C) to support the new vector set data type. I first cloned the phpredis repo locally, initialized Claude Code, told it what I wanted to do and what repo it was working with, and asked it to analyze the code and provide a list of the files that would be essential to have in context when building out that new functionality. It provided the list; I packed those files up in a Claude project and started over there. The only issues were related to it mixing up/assuming some things about standard Redis commands and versions, and its own tests caught and fixed that.

I only used this extension in my local/dev environments, since I wanted to test vector sets in Redis from within the same ecosystem my application was already working in (phpredis is significantly more performant than its alternatives that do support the new commands). It also serves as a perfect example of another of antirez's points: the utility these tools provide for "throwaway" proofs of concept is just amazing. I'm not a C developer, and I knew nothing about compiling PHP extensions (or how weird that whole ecosystem is in PHP-land). Before, I would never have gone through that effort just to test one method of doing something; I would have used what was available rather than even contemplating forking an extension in a language I'm not familiar with and fixing it up the way I needed it to work (only because of time; I'm always down for a challenge, but I can't always afford one).

Context is king. Your ability to communicate to the box what you want is next in line for the throne.

cheschire9 days ago

I find agentic coding to be best when using one branch per conversation. Even if that conversation is only a single bugfix, branch it. Then do 2 or 3 iterations of that same conversation across multiple branches and choose the best result of the 3 and destroy the other two.
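A sketch of that flow as plain git commands, run here against a throwaway demo repo (branch names are hypothetical):

```shell
# one branch per LLM conversation; demo repo so the commands are runnable
git init -q demo && cd demo
git -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "base"
base=$(git rev-parse --abbrev-ref HEAD)
for i in 1 2 3; do
  git checkout -q -b "bugfix-attempt-$i" "$base"
  # ...run one LLM conversation on this branch and commit its result...
  git checkout -q "$base"
done
# then review all three, merge the winner, and destroy the rest, e.g.:
# git merge bugfix-attempt-2 && git branch -D bugfix-attempt-1 bugfix-attempt-3
git branch --list 'bugfix-attempt-*'
```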

dcre9 days ago

“Always be part of the loop by moving code by hand from your terminal to the LLM web interface: this guarantees that you follow every process. You are still the coder, but augmented.”

I agree with this, but this is why I use a CLI. You can pipe files instead of copying and pasting.
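For instance (the file layout here is a stand-in, and so is the final `llm` command; substitute whatever CLI you actually use):

```shell
# create a stand-in source file so the pipeline below is runnable
mkdir -p src && printf 'int main(void) { return 0; }\n' > src/demo.c

# concatenate curated files, with a header per file, instead of pasting by hand
for f in src/*.c; do printf '### %s\n' "$f"; cat "$f"; done > context.txt

# then pipe it straight to the model, e.g.:
# llm "review the resize logic for off-by-one errors" < context.txt
```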

lmeyerov9 days ago

Yeah, it is also a bit of a shibboleth: vibe coding, when I'm productive for the 80% case with Claude Code, is about the LLM cranking for 10-20 min. I'm instructing and automating the LLM on how to do its own context management, vs. artisanally making every little decision.

Ex: implementing a spec, responding to my review comments, adding wider unit tests, running a role play for usability testing, etc. The main time we do what he describes, manually copying into a web IDE, is occasionally for a better short use of a model, like only at the beginning of some plan generation, or debugging from a bunch of context we have gathered manually. For example, we recently solved a nasty GPU code race this way, using a careful mix of logs and distributed code. Most of our job is using Boring Tools to write Boring Code, even if the topic/area is neato: you do not want your codebase to work like an adventure for everything, so we invest in making it look boring.

I agree the other commenter said: I manage context as part of the skill, but by making the AI do it. Doing that by hand is like slowly handcoding assembly. Instead, I'm telling Claude Code to do it. Ex: Download and crawl some new dependency I'm using for some tricky topic, or read in my prompt template markdown for some task, or generate and self-maintain some plan.md with high-level rules on context I defined. This is the 80% case.

Maybe one of the disconnects is task latency vs. throughput as trade-offs in human attention. If I need the LLM to get to the right answer faster, so the task is done faster, I have to lean in more. But my time is valuable and I have a lot to do. I'd rather spend 50% less of my time per task, even if the task takes 4x longer, by letting the LLM spin longer. In the saved human time, I can work on another task: I typically have 2-3 terminals running Claude, so I only check in every 5-15 min.

airstrike9 days ago

Your strategy only works for some domains.

lmeyerov9 days ago

Totally

We do this ~daily for:

* Multitier webapps

* DevOps infrastructure: docker, aws, ci systems, shell scripts, ...

* Analytics & data processing

* AI investigations (logs, SIEMs, ..) <--- what we sell!

* GPU kernels

* Compilers

* Docs

* Test amplification

* Spec writing

I think ~half the code happening by professional software engineers fits into these, or other vibes friendly domains. The stuff antirez does with databases seems close to what we do with compilers, GPU kernels, and infra.

We are still not happy with the production-grade frontend side of coding, though by being strong on API-first design and keeping logic vs. UI separated, most of our UI code is friendly to headless.

amluto8 days ago

> Always be part of the loop by moving code by hand from your terminal to the LLM web interface

Is there any good tooling for making this part easier and less error prone, short of going to a full-fledged agent system?

faxmeyourcode8 days ago

I don't aggree 100% with the OP on this suggestion. I use an agent to build a feature, then if it's significantly important enough I will have it open a PR that I can review traditionally after testing.

Copying and pasting code may work but when the agent is right there and can go read a file on its own I don't see the point in copying and pasting this way.

Really though, like another comment said, I'm probably working on different problems than antirez, so your mileage may vary.

iandanforth9 days ago

The most interesting and divergent part of this post is this bit:

"Don’t use agents or things like editor with integrated coding agents."

He argues that the copy/paste back and forth with the web UI is essential for maintaining control and providing the correct context.

theodorewiles9 days ago

My question on all of the “can’t work with big codebases” is: what would a codebase designed for an LLM look like? Composed of many many small functions that can be composed together?

antirez9 days ago

I believe it’s the same as for humans: different files implementing different parts of the system with good interfaces and sensible boundaries.

afro889 days ago

Well documented helps a lot too.

You can use an LLM to help document a codebase, but it's still an arduous task because you do need to review and fix up the generated docs. It will make, sometimes glaring sometimes subtle, mistakes. And you want your documentation to provide accuracy rather than double down on or even introduce misunderstanding.

dkdcio9 days ago

this is a common pattern I see -- if your codebase is confusing for LLMs, it's probably confusing for people too

physicles9 days ago

This fact is one of the most pleasant surprises I’ve had during this AI wave. Finally, a concrete reason to care about your docs and your code quality.

aitchnyu8 days ago

"What helps the human helps the AI" in https://blog.nilenso.com/blog/2025/05/29/ai-assisted-coding/

In future I'll go "In the name of our new darling bot, let us unit test and refactor this complicated thing".

exitb9 days ago

And on top of that, can you steer an LLM to create this kind of code? In my experience the models don’t really have a “taste” for detecting complexity creep and reengineering for simplicity, in the same way an experienced human does.

lubujackson9 days ago

I am vibe coding a complex app. You can certainly keep things clean, but the trick is to enforce a rigid structure. This does add a veneer of complexity, but it simplifies "implement this new module" or "add this feature across all relevant files".

Hasnep9 days ago

And my question to that is how would that be different from a codebase designed for humans?

__MatrixMan__9 days ago

I think it means finer toplevel granularity re: what's runnable/testable at a given moment. I've been exploring this for my own projects and although it's not a silver bullet, I think there's something to it.

----

Several codebases I've known have provided a three-stage pipeline: unit tests, integration tests, and e2e tests. Each of these batches of tests depend on the creation of one of three environments, and the code being tested is what ends up in those environments. If you're interested in a particular failing test, you can use the associated environment and just iterate on the failing test.

For humans with a bit of tribal knowledge about the project, humans who have already solved the get-my-dev-environment-set-up problem in a more or less uniform way, this works OK. Humans are better at retaining context over weeks and months, whereas you have to spin up a new session with an LLM every few hours or so. So we've created environments for ourselves that we ignore most of the time, but that are too complex to be bite-sized for an agent that comes on the scene as a blank slate every few hours. There are too few steps from blank slate to production, and each of them is too large.

But if successively more complex environments can be built on each other in arbitrarily many steps, then we could achieve finer granularity. As a nix user, my mental model for this is function composition where the inputs and outputs are environments, but an analogous model would be layers in a Dockerfile where you test each layer before building the one on top of it.

Instead of maybe three steps, there are eight or ten. The goal would be to have both whatever code builds the environment, and whatever code tests it, paired up into bite-sized chunks so that a failure in the pipeline points you to a specific stage, which is more precise than "the unit tests are failing". Ideally test coverage and implementation complexity get distributed uniformly across those stages.

Keeping the scope of the stages small maximizes the amount of your codebase that the LLM can ignore while it works. I have a flake output and nix devshell corresponding to each stage in the pipeline, and I'm using pytest to mark tests based on which stage they should run in. So I run the agent from the devshell that corresponds with whichever stage is relevant at the moment, and I introduce it to only the tests and code that are relevant to that stage (the assumption being that all previous stages are known to be in good shape). Most of the time, it doesn't need to know that it's working on stage 5 of 9, so it "feels" like a smaller codebase than it actually is.
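The pytest half of that setup might look something like this (the marker name is made up; register it in your pytest config to avoid warnings):

```python
import pytest

# tests tagged by pipeline stage; the agent's devshell runs only its own
# stage, e.g. `pytest -m stage5`
@pytest.mark.stage5
def test_stage5_roundtrip():
    assert round(3.14) == 3
```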

If evidence emerges that I've engaged the LLM at the wrong stage, I abandon the session and start over at the right level (now 6 of 9 or somesuch).

victorbjorklund9 days ago

I found that it is beneficial to create more libraries. If I for example build a large integration to an API (basically a whole api client) I would in the past have it in the same repo but now I make it a standalone library.

Keyframe9 days ago

like a microservice architecture? overall architecture to get the context and then dive into a micro one?

stillsut8 days ago

Overall strong piece of writing. This part resonated with me as aptly described:

> more/better in the same time used in the past — which is what I do), when left alone with nontrivial goals they tend to produce fragile code bases that are larger than needed, complex, full of local minima choices, suboptimal in many ways.

And this part felt like a "bitter lesson" anti-pattern:

> Avoid any RAG that will show only part of the code / context to the LLM. This destroys LLMs performance. You must be in control of what the LLM can see when providing a reply.

Ultimately I think cli agents like claude-code and gemini-cli and aider will be controlling the context dynamically, and the human should not be spending premature optimization time on this activity.

If anyone's interested I've got some very exact stats on prompts and accepted solution linked in my LLM proof of concept repo: https://github.com/sutt/agro/blob/master/docs/dev-summary-v1...

apwell239 days ago

> ## Provide large context

I thought large contexts are not necessarily better and sometimes have the opposite effect?

antirez9 days ago

LLMs performance will suffer from both insufficient context and context flooding. Balancing is an art.

NitpickLawyer9 days ago

I found it depends very much on the task. For "architect" sessions you need as much context as you can reasonably gather. The more the merrier. At least gemini2.5 pro will gather the needed context from many files and it really does make a difference when you can give it a lot of it.

On coding you need to aggressively prune it, and only give minimum adjacent context, or it'll start going on useless tangents. And if you get stuck just refresh and start from 0, changing what is included. It's often faster than "arguing" with the LLM in multi-step sessions.

(the above is for existing codebases. for vibe-coding one-off scripts, just go with the vibes, sometimes it works surprisingly well from a quick 2-3 lines prompt)

brainless9 days ago

Lovely post @antirez. I like the idea that LLMs should be directly accessing my codebase and there should be no agents in between. Basically no software that filters what the LLM sees.

That said, are there tools that make going through a codebase easier for LLMs? I guess tools like Claude Code simply grep through the codebase and find out what Claude needs. Is that good enough or are there tools which keep a much more thorough view of the codebase?

apwell239 days ago

> Coding activities should be performed mostly with: Claude Opus 4

I've been going down to sonnet for coding over opus. maybe i am just writing dumb code

stpedgwdgfhgdd9 days ago

That is also what Anthropic recommends. In edge cases use Opus.

Opus is also way more expensive. (Don’t forget to switch back to Sonnet in all terminals)

northern-lights9 days ago

In my experience as well, Sonnet 4 is much better than Opus. Opus is great at the start of a project, where you need to plan things, structure the project, and figure out how to execute, but it cannot beat Sonnet at actually executing. It is also a lot cheaper.

cyral9 days ago

Opus is too expensive, and I find it often goes way off the rails (just writing way, way too much; maybe that could be controlled with a better prompt on my end). Sonnet gives me more realistic code that isn't overengineered.

leemoore9 days ago

Same. If you don't give Opus big enough problems, it's more likely to go off the rails. Not much more likely, but a little.

jfkfibkririfk8 days ago

Did your meditation pay off? Did you hit stream entry?

jtonl9 days ago

Most of the time Sonnet 4 just works, but you need to refine the context as much as you can.

krupan9 days ago

What is the overall feedback loop with LLMs writing code? Do they learn as they go like we do? Do they just learn from reading code on GitHub? If the latter, what happens as less and less code gets written by human experts? Do the LLMs then stagnate in their progress and start to degrade? Kind of like making analog copies of analog copies of analog copies?

steveklabnik9 days ago

> Do they learn as they go like we do?

It's complicated. You have to understand that when you ask an LLM something, you have the model itself, which is kind of like a function: put something in, get something out. However, you also pass an argument to that function: the context.

So, in a literal sense, no, they do not learn as they go, in the sense that the model, that function, is unchanged by what you send it. But the context can be modified. So, in some sense, an LLM in an agentic loop that goes and reads some code from GitHub can include that information in the context it uses in the future, so it will "learn" within the session.
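A toy illustration of that point (not a real model API): the "function" never changes, and in-session "learning" is just growing the context you pass in:

```python
def llm(context):
    # stand-in for a frozen model: its only "knowledge" of this session
    # is whatever you put in `context`
    return "summary of: " + " | ".join(context)

context = ["user: what does dict.c do?"]
answer = llm(context)
# "learning" within the session = appending to the context, not changing the model
context += ["assistant: " + answer, "tool: <contents of dict.c>"]
```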

> If the latter, what happens as less and less code gets written by human experts?

So, this is still a possible problem, because future trainings of future LLMs will end up being trained on code written by LLMs. If this is a problem or not is yet to be seen, I don't have a good handle on the debates in this area, personally.

Herring9 days ago

Code and math are similar to chess/go, where verification is (reasonably) easy so you can generate your own high-quality training data. It's not super straightforward, but you should still expect more progress in coming years.

cesarb9 days ago

> Code and math are similar to chess/go, where verification is (reasonably) easy

Verification for code would be a formal proof, and these are hard; with a few exceptions like seL4, most code does not have any formal proof. Games like chess and go are much easier to verify. Math is in the middle; it also needs formal proofs, but most of math is doing these formal proofs themselves, and even then there are still unproven conjectures.

Herring9 days ago

Verification for code is just running it. Maybe "verification" was the wrong word. The model just needs a sense of code X leads to outcome Y for a large number of (high-quality) XY pairs, to learn how to navigate the space better, same as with games.
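A toy sketch of that idea (illustrative only): label candidate snippets by executing them, so the (code X, outcome Y) pairs come for free:

```python
def label(src):
    """Run a snippet and record what happened to the variable `x`."""
    env = {}
    try:
        exec(src, env)
        return ("ok", env.get("x"))
    except Exception as e:
        return ("error", type(e).__name__)

# each pair is (code, outcome), usable as a training signal
pairs = [(s, label(s)) for s in ["x = 2 + 2", "x = 1 / 0"]]
```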

+1
krupan8 days ago
cushychicken9 days ago

I’m super curious to see the reactions in the comments.

antirez is a big fuggin deal on HN.

I’m sort of curious if the AI doubting set will show up in force or not.

sitkack9 days ago

I find it serendipitous that Antirez is into LLM based coding, because the attention to detail in Redis means all the LLMs have trained extensively on the Redis codebase.

Something that was meant for humans, has now been consumed by AI and he is being repaid for that openness in a way. It comes full circle. Consistency, clarity and openness win again.

lettergram9 days ago

Contrary to this post, I think the AI agents, particularly the online interface of OpenAI's Codex to be a massive help.

One example: I had a PR up that was being reviewed by a colleague. I was driving home from vacation when I saw the 3-4 comments come in. I read them when we stopped for gas, went to OpenAI / Codex on my phone, dictated what I needed, and had it open a PR to my branch. Then I got back on the road. My colleague saw the PR, agreed, and merged it in.

I think of it as having a ton of interns, the AI is about the same quality. It can help to have them, but they often get stuck, need guidance, etc. If you treat the AI like an intern and explain what you need it can often produce good results; just be prepared to fallback to coding quickly.

fumeux_fume9 days ago

I currently use LLMs as a glorified Stack Overflow. If I want to start integrating an LLM like Gemini 2.5 PRO into my IDE (I use Visual Studio Code), what's the best way to do this? I don't want to use a platform like Cursor or Claude Code which takes me away from my IDE.

Tiksi9 days ago

There's a gemini vscode plugin that does autocomplete and chatbot modes:

https://cloud.google.com/gemini/docs/codeassist/write-code-g...

physicles9 days ago

Cursor is an IDE. You can use its powerful (but occasionally wrong) autocomplete, and start asking it to do small coding tasks using the Ctrl+L side window.

fumeux_fume9 days ago

I don't want to leave my IDE

Philpax9 days ago

Worth noting that Cursor is a VS Code fork and you can copy all of your settings over to it. Not saying that you have to, of course, but that it's perhaps not as different as you might be imagining.

cyral9 days ago

I don't either but unfortunately Cursor is better than all the other plugins for IDEs like JetBrains. I just tab over to cursor and prompt it, then edit the code in my IDE of choice.

anthomtb9 days ago

Does running a Claude Code command in VSCode's integrated terminal count as leaving your IDE?

(We may have differing definitions of "leaving" ones IDE).

hedgehog9 days ago

GitHub Copilot is pretty easy to try within VS Code

fumeux_fume9 days ago

I want to use Gemini 2.5 PRO. I was an early tester of Copilot and it was awful.

+1
fumeux_fume9 days ago
haiku20779 days ago

Copilot has 2.5 Pro in the settings in github.com, along with claude 4

jwpapi9 days ago

Thank you very much, this is exactly my experience. I sometimes let it vibe code frontend features that are easy to test in an already typed code base (add a field to this form), but most of the time it's my sparring partner to review my code and evaluate all options. While it often recommends bollocks or has logical flaws, it helps me do the obvious thing and not miss a solution! Sometimes we have fancy play syndrome and want to code the complicated thing because of a fundamental leak we have. LLMs have done a great job of reducing those flaws of mine.

But just because I’ve not been lazy…

freeone30008 days ago

This is possibly the first HN AI article that actually matches my experience: the models are good enough for small pieces other people have done before, or for cases where you might otherwise write a macro, but they write shitty code for anything beyond the scope of a single file; and regardless, they always have to be hand-held.

It's a far cry from the "vibe code everything" and "this will eliminate jobs" lines that the current hype train is pushing, despite clearly using the agentic approach with large context provided by Opus.

lunarcave9 days ago

> Despite the large interest in agents that can code alone, right now you can maximize your impact as a software developer by using LLMs in an explicit way, staying in the loop.

I think this is key here. Whoever has the best UX for this (right now, it's Cursor IMO) will get the bulk of the market share. But the switching costs are so low for this set of tooling that we'll see a rapid improvement in the products available, and possibly some new entrants.

abhi31889 days ago

A good way to get a model to answer questions about a codebase without overwhelming it or exceeding its token count is to:

1. just give it the directory structure

2. ask it questions based on that

3. after it answers a question, ask it if there are any specific code files it needs to better answer the question you asked

4. attach only those files so it can confirm its answer and back it up with code
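
A minimal sketch of step 1, assuming a Python helper (the function name and depth limit are illustrative): render just the directory layout, ready to paste into the chat.

```python
import os

def directory_outline(root, max_depth=3):
    """Render an indented outline of a codebase's directory structure,
    suitable for pasting into an LLM prompt without any file contents."""
    lines = []
    root = root.rstrip(os.sep)
    base_depth = root.count(os.sep)
    for dirpath, dirnames, filenames in os.walk(root):
        depth = dirpath.count(os.sep) - base_depth
        if depth >= max_depth:
            dirnames[:] = []  # stop descending past the depth limit
            continue
        dirnames[:] = sorted(d for d in dirnames if not d.startswith("."))
        indent = "  " * depth
        lines.append(indent + os.path.basename(dirpath) + "/")
        for name in sorted(filenames):
            lines.append(indent + "  " + name)
    return "\n".join(lines)
```

Running `print(directory_outline("."))` from the project root gives the model the shape of the codebase in a few hundred tokens instead of the whole tree's contents.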

alexalx6669 days ago

You can upload a zip of the feature's files as they are in the project and ask it to implement the feature on another platform in ChatGPT 4.5; works pretty well.

speedgoose9 days ago

Thanks for writing this article.

I used a similar setup until a few weeks ago, but coding agents became good enough recently.

I don't find context management and copy-pasting fun; I let GitHub Copilot Insiders or Claude Code do it. I'm still very much in the loop while vibe coding.

Of course it depends on the code base, and Redis may not benefit much from coding agents.

But I don’t think one should reject vibe coding at this stage, it can be useful when you know what the LLMs are doing.

YetAnotherNick9 days ago

> Coding activities should be performed mostly with:

> * Gemini 2.5 PRO > * Claude Opus 4

I think trying out all the LLMs for each task is highly underappreciated. There is no Pareto-optimal LLM across all skills. I give each prompt to 8 different LLMs using a Mac app. In my experience, while Gemini is consistently in the top 3 of 8, the difference between the best output and Gemini Pro's can be huge.
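
A sketch of that fan-out, assuming a placeholder `query(model, prompt)` callable standing in for the real provider clients: the point is just sending one prompt to several models in parallel and comparing the replies side by side.

```python
from concurrent.futures import ThreadPoolExecutor

def fan_out(prompt, models, query):
    """Send one prompt to several models concurrently; return replies by model.
    `query(model, prompt) -> str` is a stand-in for each provider's client."""
    with ThreadPoolExecutor(max_workers=len(models)) as pool:
        futures = {m: pool.submit(query, m, prompt) for m in models}
        return {m: f.result() for m, f in futures.items()}
```

Since the requests are I/O-bound, threads are enough; 8 models answer in roughly the time of the slowest one.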

tomwphillips9 days ago

I'm surprised IDE integration is written off. I've been pleased with Junie's agent mode in IntelliJ. Works well.

DSingularity9 days ago

Sorry if I missed it in the article — what’s your setup? Do you use a CLI tool like aider or are you using an IDE like cursor?

antirez9 days ago

Terminal with vim on one side, the official web interface of the model on the other side, and the pbcopy utility to pass stuff through the clipboard. I believe models should be used in their native interface: when there are other layers, sometimes the model served is not exactly the same, other times it misbehaves because of RAG, and in general you have no exact control over the context window.
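
A sketch of that copy-paste flow in Python (file paths are placeholders; pbcopy is the macOS-specific part):

```python
import subprocess
from pathlib import Path

def build_context(paths):
    """Concatenate source files with a filename header per chunk,
    so the model knows which file each piece came from."""
    return "\n".join(
        "===== " + str(p) + " =====\n" + Path(p).read_text() for p in paths
    )

def copy_to_clipboard(text):
    # pbcopy is macOS-only; on Linux, xclip -selection clipboard is the usual stand-in
    subprocess.run(["pbcopy"], input=text.encode(), check=True)
```

Something like `copy_to_clipboard(build_context(["src/server.c", "src/networking.c"]))` puts a focused slice of the codebase on the clipboard, ready to paste into the model's own web interface.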

lisa_coicadan4 days ago

I’ve seen this exact workflow (PDF → extract data → update structured files) come up a lot, and it’s impressive that Claude handled it end-to-end like that. We’ve been building Retab.com to handle those kinds of tasks more reliably, especially when you want structured output (like JSON) from messy documents like PDFs, scans, or even images. Instead of writing ad-hoc scripts or chaining LLM calls, you just upload the file, define what you want (via schema), and it gives you clean structured data. It’s AI-native but deterministic, no need to install PyPDF2 or debug model behavior. Just wanted to share in case others are solving similar problems repeatedly.

js29 days ago

This seems like a lot of work depending upon the use case. e.g. the other day I had a bunch of JSON files with contact info. I needed to update them with more recent contact info on an internal Confluence page. I exported the Confluence page to a PDF, then dropped it into the same directory as the JSON files. I told Claude Code to read the PDF and use it to update the JSON files.

It tried a few ways to read the PDF before coming up with installing PyPDF2, using that to parse the PDF, then updated all the JSON files. It took about 5 minutes to do this, but it ended up 100% correct, updating 7 different fields across two dozen JSON files.

(The reason for the PDF export was to get past the Confluence page being behind Okta authentication. In retrospect, I probably should've saved the HTML and/or let Claude Code figure out how to grab the page itself.)

How would I have done that with Gemini using just the web interface?

clscott5 days ago

Do I understand correctly that you deliberately entered personal contact information into LLM?

If so, I would be reprimanding anyone in my org who did this. While it's more effort, I'd use the LLM to write a script to read the page with the Confluence API, parse it, write out the JSON files, and push them where they need to go.

Add in basic assertions to check that the data is present, in the expected format, and that there is enough of it, plus alerting when the assertions fail. Then I can schedule it and forget about it.

This is where LLMs shine, I can now build a robust solution in an hour instead of a day.
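
The "basic assertions" step might look like this (the field names are made up for illustration; a real script would match whatever schema the JSON files use):

```python
import json
from pathlib import Path

REQUIRED_FIELDS = ("name", "email", "phone")  # hypothetical contact schema

def validate_contact(record):
    """Return a list of problems with one contact record; empty means it passes."""
    problems = []
    for field in REQUIRED_FIELDS:
        if not str(record.get(field) or "").strip():
            problems.append("missing or empty field: " + field)
    email = record.get("email") or ""
    if email and "@" not in email:
        problems.append("malformed email: " + email)
    return problems

def check_directory(path):
    """Validate every JSON contact file in a directory; map filename -> problems."""
    report = {}
    for f in Path(path).glob("*.json"):
        problems = validate_contact(json.loads(f.read_text()))
        if problems:
            report[f.name] = problems
    return report
```

If `check_directory` returns a non-empty report, the scheduled job alerts instead of pushing bad data.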

quantumHazer9 days ago

He uses vim and copy-pastes code from web interfaces because he wants to maintain control and understanding of the code. You can see proof of this setup on his YouTube channel [https://www.youtube.com/@antirez]

antirez9 days ago

Thanks. Also, depending on the coding rig you use, models may not match the performance of what is served via the web. Or may not be as cheap. For instance, the $20 Gemini 2.5 Pro account is very hard to saturate with queries.

indigodaddy9 days ago

Since I've heard gemini-cli is not yet up to snuff, has anyone tried opencode + Gemini? I've heard that with opencode you can log in with a Google account (I have NOT confirmed this, but if anyone has experience, please advise), so I'm not sure if that would get extra mileage from Gemini's limits vs. using a Gemini API key.

entropyneur9 days ago

Interesting. This is quite contrary to my experience. Using LLMs for things ouside my expertise produces crappy results which I can only identify as such months later when my expertise expands. Meanwhile delegating the boring parts that I know too well to agents proved to be a huge productivity boost.

benreesman9 days ago

Opus 4 just showed me Claude Code style work-evasion heuristics for the first time today. I had been cautiously optimistic that they were just going to run the premium product at the exorbitant price: you don't always want to pay it, but it's there.

Untrustworthy is worse than useless.

vl9 days ago

I use Claude Code with Opus, and the article recommends Gemini 2.5 Pro. I want to try it as well, but I don't know of a tool that would make the experience comparable to Claude Code. Would it make sense to use it with Cursor? Do they try to limit context?

Karrot_Kream9 days ago

Gemini has its own Claude Code like tool you can use. https://github.com/google-gemini/gemini-cli

ramraj079 days ago

My experience, which seems fairly isolated, is that using Gemini's web chat interface and pasting entire sections of my codebase beats any other agent I've seen. Some come close, and some are good with very large files, etc., but if you have a decently organized codebase then using Gemini like this beats anything else.

vl9 days ago

I work on established project with medium-size codebase, copying to chat and back is just not practical.

This is why Claude Code rocks - it quite often finds relevant parts itself.

ramraj079 days ago

I use tools like 16x prompt to copy just the relevant files. Sure, tools like Claude Code find the relevant files, but what happens after they find the files is still suboptimal compared to what I get if I paste them into Gemini myself.

ok1234569 days ago

One way to utilize these CLI coding agents that I like is to have them run static analysis tools in a loop, along with whatever test suite you have set up, systematically improving crusty code beyond the fixes that the static analysis tools offer.
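
One way that loop might be sketched (ruff and pytest here stand in for whatever analyzers and test runner the project uses): collect the failing checks, hand their output to the agent, and repeat until everything passes.

```python
import subprocess

def failing_checks(checks):
    """Run each check command; return (command, output) for the ones that fail.
    The collected output is what gets fed back to the coding agent."""
    failures = []
    for cmd in checks:
        result = subprocess.run(cmd, capture_output=True, text=True)
        if result.returncode != 0:
            failures.append((cmd, result.stdout + result.stderr))
    return failures

# Example check list; substitute your own linters and test runner.
CHECKS = [
    ["ruff", "check", "."],
    ["pytest", "-q"],
]
```

The agent iterates until `failing_checks(CHECKS)` comes back empty, which catches the regressions the static analysis tools alone can't fix.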

qweiopqweiop9 days ago

This matches my take, but I'm curious if OP has used Claude code.

antirez9 days ago

Yep, when I use agents I go for Claude Code. For example I needed to buy too many Commodore 64 than appropriate lately, and I let it code a Telegram bot advising me when popular sources would have interesting listings. It worked (after a few iterations); then I looked at the code base and wanted to puke, but who cares in this case? It worked, it was much faster, and I had zero to learn in the process of doing it myself. I published a Telegram library for C in the past and know how it works and how to do scraping and so forth.

Keyframe9 days ago

> For example I needed to buy too many Commodore 64 than appropriate lately

Been there, done that!

For those one-off small things, LLMs are rather cool, especially Claude Code and Gemini CLI. I was given an archive of some really old movies recently, but the files bore title names in Croatian instead of the original (mostly English) ones. So I claude --dangerously-skip-permissions into the directory with the movies, and in a two-sentence prompt I asked it to rename the files into a given format (the one I tend to use in my archive) and, for each title, to find the original name and year of release and use them in the file name... but, before committing the rename, to give me a list of before and after for approval. It took, like what, a minute of writing a prompt.

Now, for larger things, I'm still exploring a way, an angle, what and how to do it. I've tried everything from yolo prompting to structured and uber-structured approaches, all the way to mimicking product/PRD - architecture - project management/tasks - developer/agents. So far, unless it's a rather simple project, I don't see it happening that way. The most luck I've had was with "some structure" as context and inputs, and then guided prompting during sessions and reviewing stuff. Almost pair programming.

SamInTheShell9 days ago

I like how this is written in a way that an LLM doing planning can probably infer what to do. Let me know if I hit the nail on the head with what you’re thinking @antirez

karel-3d8 days ago

ok when even someone like antirez is doing this stuff, maybe I will eventually try it out

...one day.

neves8 days ago

Well, here in the Southern hemisphere it's winter time.

amelius8 days ago

Not an AI winter, I hope?

ycombadmin18 days ago