The premise may well be true, but as a genuinely seasoned Python developer, I've taken a look at one file: https://github.com/dx-tooling/platform-problem-monitoring-co...
All of it smells of a (lousy) junior software engineer: from configuring the root logger at the top, at module level (which relies on module import caching not to be reapplied), to building a config file parser by hand instead of using one from the stdlib, to a race in load_json, where the file's existence is checked with an if and then the code carries on as if the file is certainly there...
In a nutshell, if the rest of it is like this, it simply sucks.
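For the curious, that load_json issue is the classic TOCTOU (time-of-check to time-of-use) race. A minimal sketch of the fix, assuming a function of roughly this shape (the repo's actual signature may differ):

    import json
    from pathlib import Path
    from typing import Any

    def load_json(path: Path) -> dict[str, Any]:
        # EAFP: just try to open. The file can vanish between an
        # exists() check and the open() call, so checking first only
        # hides the race instead of removing it.
        try:
            with path.open(encoding="utf-8") as f:
                return json.load(f)
        except FileNotFoundError:
            return {}  # or raise a domain-specific error instead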
As a long-time hobby coder, like 25 years, and I think I'm pretty good(?), this whole LLM/vibecoding thing has sapped my creativity the past year or so. I like the craft of making things. I use tools I enjoy working with and learn new ones all the time (never got on the JS/React train). Sometimes I get the entrepreneur bug and want to create a marketable solution, but I often just like to build. I'm also the kind of guy that has a shop he built, builds his own patio deck, does home remodeling, tinkers with robotics, etc. I kind of just like to be a maker following my own creative pursuits.
All said, it's hard on me knowing it's possible to use an LLM to spit out a crappy but functional version of whatever I've dreamt up, without the satisfaction of building it. Yet it also now feels demotivating to spend the time crafting it when I know an LLM could do a majority of it. So I'm in a mental quagmire: this past year has been the first year since at least 2000 that I haven't built anything significant in scale. It's indirectly ruining the fun for me for some reason. Kind of just venting, but curious if anyone else feels this way too?
I can echo your sentiment. Art is the manifestation of creativity, and to create any good art you need to train in whatever medium you choose. For the decade I've been a professional programmer, I've always argued that writing code was a creative job.
It's been depressing to listen to people pretend that LLM-generated code is "the same thing"; to trivialize the thoughtful lessons one has learned honing their craft. It's the same reason the Studio Ghibli AI image trend gives me the ick.
Fascinating. It's gone the other way for me. Because I can now whip up a serious contender to any SaaS business in a week, it's made everything more fun, not less.
I followed a lot of Twitter people who were vibecoding their way to SaaS platforms because I thought it would be interesting to watch.
So far none of them are having a great time after their initial enthusiasm. A lot of it is people discovering that there’s far more to a business than whipping up a SaaS app that does something. I’m also seeing a big increase in venting about how their progress is slowing to a crawl as the codebase gets larger. It’s interesting to see the complaints about losing days or weeks to bugs that the LLM introduced that they didn’t understand.
I still follow because it’s interesting, but I’m starting to think 90% of the benefit is convincing people that it’s going to be easy and therefore luring them into working on ideas they’d normally not want to start.
Yeah, I see that perspective, but I guess my thought process is "what's the point, if everyone else can now do the same?"
I had long ago culled many of those ideas based on my ability to execute the marketing plan, or on the "do I really even want to run that kind of business?" test. I already knew I could build whatever I wanted to exist, so my days of pumping out side projects ended long ago and I became more selective with my time.
Which turns it into passion. The side project that I'm only interested in because it could maybe make some money? Eh.
A project in a niche where I live and breathe the fumes off the work, and where I can help the whole ecosystem with their workflow? Sign me up!
So you can create a serious contender to Salesforce or Zapier in a week?
Like an Eventbrite or a Shopmonkey. But yeah, you don't think you could? Salesforce is a whole morass: not every customer uses every corner of it, and Salesforce will nickel-and-dime you with their consultants and add-ons and plugins. If you can be more specific about which bit of Salesforce you want to provide to a client, we can go deep.
That prompt looks horrifying.
I am not going to spend half an hour coming up with that prompt, tweaking it, and then spend many hours (on the optimistic side) tracking down all the hallucinated code and hidden bugs. I've been there once; never going to do that again.
I'd rather do it myself and have peace of mind.
I wonder how much time it would take with some samples from GitHub and various Python documentation lying around (language, cheatsheets, libraries)...
Increasingly I’m realizing that in most cases there is a SIGNIFICANT difference between how useful AI is on greenfield projects vs how useful it is on brownfield projects. For the former: pretty good! For the brownfield, it’s often worse than useless.
Right, but AI could then change the ratio of greenfield vs. brownfield ("I'll be faster if I rewrite this part from scratch").
I struggle to wrap my head around how this would work (and how AI can be used to maintain and refine software in general). Brownfield code got brown by being useful and solving a real problem, and doing it well enough to be maintained. So the AI approach is to throwaway the code that's proved its usefulness? I just don't get it.
My experience on brownfield projects is the opposite.
I think there's a similar analogy here for products in the AI era.
Bolting AI onto existing products probably doesn't make sense. AI is going to produce an entirely new set of products with AI-first creation modalities.
You don't need AI in Photoshop / Gimp / Krita to manipulate images. You need a brand new AI-first creation tool that uses your mouse inputs like magic to create images. Image creation looks nothing like it did in the past.
You don't need Figma to design a webpage. You need an AI-first tool that creates the output - Lovable, V0, etc. are becoming that.
You don't need AI in your IDE. Your IDE needs to be built around AI. And perhaps eventually even programming languages and libraries themselves need AI annotations or ASTs.
You don't need AI in Docs / Gmail / Sheets. You're going to be creating documents from scratch (maybe pasting things in). "My presentation has these ideas, figures, and facts" is much different than creating and editing the structure from scratch.
There is so much new stuff to build, and the old tools are all going to die.
I'd be shocked if anyone is using Gimp, Blender, Photoshop, Premiere, PowerPoint, etc. in ten years. These are all going to be reinvented. The only way these products themselves survive is if they undergo tectonic shifts in development and an eventual complete rewrite.
Just for the record, Photoshop's first generative 'AI' feature, Content Aware Fill, is 15 years old.
That's a long time for Adobe not to have figured out what you're saying.
In my travels I find writing code to be natural and relaxing--a time to reflect on what I'm doing and why. LLMs haven't helped me out too much yet.
Coding by prompt is the next lowering of the bar and vibe coding even more so. Totally great in some scenarios and adds noise in others.
Very good concrete examples. AI is moving very fast so it can become overwhelming, but what has held true is focusing on writing thorough prompts to get the results you want.
Senior developers have the experience to think through and plan out a new application for an AI to write. Unfortunately a lot of us are bogged down by working our day jobs, but we need to dedicate time to create our own apps with AI.
Building a personal brand has never been more important, so I envision a future where devs have a personal website with thumbnail links (like a fancy YouTube thumbnail) to all the small apps they have built. Dozens of them, maybe hundreds, all with beautiful, modern UIs. The prompts they used can be the new form of blog articles. At least that's what I plan to do.
This is fascinating, and finally something that feels tangible, as opposed to vibes-based ideas about how AI will "take everyone's jobs" while failing to fill in the gaps between. This actually fills in those gaps.
I find it quite interesting how we can do a very large chunk of the work up front in design, in order to automate the rest of the work. It's almost as if waterfall was the better pattern all along, but we just lacked the tools at the time to make it work out.
Waterfall has always been the best model as long as specs are frozen, which is never the case.
When I first started in dev, on a Unix OS, we did 'waterfall' (though we just called it releasing software, thirty years ago). We did a major release every year, minor releases every three months, and patches as and when. All this software was sent to customers on mag tapes, by courier. Minor releases were generally new features.
Definitely times were different back then. But we did release software often, and it tended to be better quality than now (because we couldn't just fix-forward). I've been in plenty of Agile companies whose software moves slower than the old days. Too much haste, not enough speed.
Specs were never frozen with waterfall.
Sure, but if you're generating the code from the specs in a very small amount of time, then suddenly it's no longer the code that is the source; it's the specs.
That's what waterfall always wanted to be, and it failed because writing the code usually took a lot longer than writing the specs. Now, perhaps, that is no longer the case.
This is interesting, thanks for posting. I've been searching for some sort of 'real' usage of AI-coding. I'm a skeptic of the current state of things, so it's useful to see real code.
I know Python, but have been coding in Go for the last few years. So I'm thinking how I'd implement this in Go.
There's a lot of code there. Do you think it's too much, or does it not matter? It seems reasonably clear, though; easy to understand.
I'd have expected better documentation/in-line comments. Is that something that you did/didn't specify?
> Once again, the AI agent implemented this entire feature without requiring me to write any code manually.
> For controllers, I might include a small amount of essential details like the route name: [code]
Commit history: https://github.com/dx-tooling/platform-problem-monitoring-co...
Look, I honestly think this is a fair article with some good examples, but what is it with this inane "I didn't write any of it myself" claim, clearly false, that every one of these articles keeps bringing up?
What’s wrong with the fact you did write some code as part of it? You clearly did.
So weird.
I completely agree, as a fellow senior coder. It allows me to move significantly faster through my tasks and makes me much more productive.
It also makes coding a lot less painful: because I'm not making typos or weird errors (since so much code autocompletes), I spend less time debugging, too.
Hot take: I don't see a problem with this, and in fact I think we'll see in a few years that senior engineers are needed less.
I have a business that's turning over millions in ARR at the moment (built during the pandemic). It's a pest control business, and we have a small team with only one experienced senior engineer; we used to have five, but with AI we reduced it to one, whom we still pay well.
Even with maintenance, we plan ahead for this with an LLM and make changes accordingly.
I think we will see more organizations opting for smaller teams and reducing engineer count, now that generated code works, speeds up development, and is "good enough".
This is excellent, and matches my experience.
Those lamenting the loss of manual programming: we are free to hone our skills on personal projects, but for corporate/consulting work, you cannot ignore a 5x speed advantage. It's over. AI-assisted coding won.
Is it really 5x? I'm more surprised by the 25+ years of experience combined with being hard-pressed to learn enough Python to code the project. It's not like he's learning programming again, or was only recently exposed to OOP. Especially when you can find working code samples for the subproblems in the project.
What a high quality article, packed with gems. What a treat.
That free GitHub Copilot though. Microsoft is a relentless drug dealer. If you haven't tried Copilot Edits yet, hold on to your hat. I started using it in a clean Express project and a Vue3 project in VS Code. Basically flawless edits from prompt over multiple files, new files...the works. Easy.
But can it center a div?
This maps pretty well to my experience.
Other devs will say things like "AI is just a stupid glorified autocomplete, it will never be able to handle my Very Special Unique Codebase. I even spent 20 minutes one time trying out Cursor, and it just failed"
Nope, you're just not that good, obviously. I am literally 10x more productive at this point. Sprint goals have become single afternoons. If you are not tuned in to what's going on here and embracing it, you are going to be completely obsolete in the next 6 months, unless you are some extremely niche high-level expert. It won't be a dramatic moment where anyone gets "fired for AI". Orgs will just simply not replace people lost through attrition when they see productivity staying the same (or even increasing) as headcount goes down.
At all the jobs I had, the valuable stuff was shipped features. The baseline was for them to work well and to be released on time. The struggle was never writing the code, it was to clarify specifications. By comparison, learning libraries and languages was fun.
I don't really need AI to write code for me, because that's the easy part. The aspect that it needs to be good at is helping me ship features that works. And to this date, there's never been a compelling showcase for that one.
That's the problem. The new norm will be 10x pre-AI productivity; nobody will be able to justify hand-writing code. And until the quality bar of LLMs/their successors gets much better (see e.g. comments above looking at the details in the examples given), you'll get an accumulation of errors higher than what decent programmers produce. With higher LOC and more uninspected complexity, you'll get significantly lower quality overall. The coming wave of AI-coded bugs will be fun for all. GOTO FAIL;
After spending a week coding exclusively with AI assistants, I got functional results but was alarmed by the code quality. I discovered that I didn't actually save much time, and the generated code was so complex and unfamiliar that I was scared to modify it. I still use Copilot and Claude and would say I'm able to work through problems 2-3x faster than I would be without AI but I wouldn't say I get a 10x improvement.
My projects are much more complex than standard CRUD applications. If you're building simple back-office CRUD apps, you might see a 10x productivity improvement with AI, but that hasn't been my experience with more complex work.
Giving your anecdotal experience is only useful if you include anecdotal context: seniority, years of experience, technology, project size, sprint goal complexity…
Can you talk through specifically what sprint goals you’ve completed in an afternoon? Hopefully multiple examples.
Grounding these conversations in an actual reality affords more context for people to evaluate your claims. Otherwise it’s just “trust me bro”.
And I say this as a Senior SWE who's successfully worked with ChatGPT to code up some prototype stuff, but who hasn't been able to dedicate 100+ hours to working through all the minutiae of learning how to drive daily with it.
I think experiences vary. AI can work well with greenfield projects, small features, and helping solve annoying problems. I've tried using it on a large Python Django codebase and it works really well if I ask for help with a particular function AND I give it an example to model after for code consistency.
But I have also spent hours asking Claude and ChatGPT for help with several annoying Django problems, and I have reached the point, multiple times, where they circle back and give me answers that did not previously work in the same context window. Eventually, when I figure out the issue myself, I have fun and ask "well, does it not work as expected because the existing code chained multiple filter calls in Django?" and all of a sudden the AI knows what is wrong! To be fair, there is only one sentence in the Django documentation that mentions not chaining filter calls on many-to-many relationships.
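For anyone who hits the same wall: on multi-valued (e.g. many-to-many) relationships, conditions inside a single filter() must be satisfied by the same related object, while chained filter() calls may each match a different one. A sketch using the Blog/Entry models from the Django docs (not the commenter's actual code):

    # One filter(): both conditions must hold for the SAME related entry.
    Blog.objects.filter(
        entry__headline__contains="Lennon",
        entry__pub_date__year=2008,
    )

    # Chained filter(): each call may match a DIFFERENT related entry,
    # so this selects blogs with some "Lennon" entry and some 2008 entry,
    # a subtly wider result set (and it can also yield duplicate rows
    # without .distinct()).
    Blog.objects.filter(entry__headline__contains="Lennon").filter(
        entry__pub_date__year=2008
    )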
Also, somewhat strangely, I've found Python output has remained bad, especially for dataframe tasks/data analysis. For remembering matplotlib syntax I still find most of them pretty good, but for handling dataframes, very bad and extremely counterproductive.
That said, for typed languages like TypeScript and C#, they have gotten very good. I suspect this is related to the semantic information that can be found in typed languages but not in hard-to-follow unstructured blobs like dataframes, which are therefore not well reproduced by LLMs.
The more I browse through this, the more I agree. I feel like one could delete almost all comments from that project without losing any information – which means, at least the variable naming is (probably?) sensible. Then again, I don't know the application domain.
Also…
there is a lot of obviously useful abstraction being missed, wasting lines of code that will all need to be maintained.

The scary thing is: I have seen professional human developers write worse code.
> I feel like one could delete almost all comments from that project without losing any information
I'm far from a heavy LLM coder, but I've noticed a massive excess of unnecessary comments in most output. I'm always deleting the obvious ones.
But then I started noticing that the comments seem to help the LLM navigate additional code changes. It’s like a big trail of breadcrumbs for the LLM to parse.
I wouldn’t be surprised if vibe coders get trained to leave the excess comments in place.
>The scary thing is: I have seen professional human developers write worse code.
This is kind of the rub of it all. If the code works, passes all relevant tests, is reasonably maintainable, and can be fitted into the system correctly with a well-defined interface, does it really matter? At that point it's kind of like looking at the output of a bytecode compiler and going "wow, what a mess". And it's not like they can't write code up to your stylistic standards; it's literally a matter of prompting for that.
> If the code works, passes all relevant tests, is reasonably maintainable, and can be fitted into the system correctly with a well defined interface, does it really matter?
You're not wrong here, but there's a big difference in programming one-off tooling or prototype MVPs and programming things that need to be maintained for years and years.
We did this song and dance pretty recently with dynamic typing. Developers thought it was so much more productive to use dynamically typed languages, because it is in the initial phases. Then years went by, those small, quick-to-make dynamic codebases ended up becoming unmaintainable monstrosities, and those developers who hyped up dynamic typing invented Python/PHP type hinting and Flow for JavaScript, later moving to TypeScript entirely. Nowadays nobody seriously recommends building long-lived systems in untyped languages, but they are still very useful for one-off scripting and more interactive/exploratory work where correctness is less important, i.e. Jupyter notebooks.
I wouldn't be surprised to see the same pattern happen with low-supervision AI code; it's great for popping out the first MVP, but because it generates poor code, the gung-ho junior devs who think they're getting 10x productivity gains will wise up and realize the value of spending an hour thinking about proper levels of abstraction instead of YOLO'ing the first thing the AI spits out when they want to build a system that's going to be worked on by multiple developers for multiple years.
> those small, quick-to-make dynamic codebases ended up becoming unmaintainable monstrosities
In my experience, type checking / type hinting already starts to pay off when more than one person is working on even a small-ish code base. Just because it helps you keep in mind what comes from and goes to the other guy's code.
What are you going to do when something suddenly doesn't work and Cursor endlessly spins without progress, no matter how many "please don't make mistakes" you add? Delete the whole thing and try to one-shot it again?
At the very least, if a professional human developer writes garbage code you can confidently blame them and either try to get them to improve or reduce the impact they have on the project.
With AI they can simply blame whatever model they used and continually shovel trash out there instantly.
Ok - not wrong at all. Now take that feedback and put it in a prompt back to the LLM.
They're very good at honing bad code into good code with good feedback. And when you can describe good code faster than you can write it (for instance, when it uses a library you're not intimately familiar with), this kind of coding can be enormously productive.
> They’re very good at honing bad code into good code with good feedback.
And they're very bad at keeping other code good across iterations. So you might find that while they might've fixed the specific thing you asked for—in the best case scenario, assuming no hallucinations and such—they inadvertently broke something else. So this quickly becomes a game of whack-a-mole, at which point it's safer, quicker, and easier to fix it yourself. IME the chance of this happening is directly proportional to the length of the context.
This typically happens when you run the chat too long. When it gives you a new codebase, fire up a new chat so the old stuff doesn't poison the context window.
Nah. This isn’t true. Every time you hit enter you’re not just getting a jr dev, you’re getting a randomly selected jr dev.
So, how did I end up with a logging.py, config.py, config in __init__.py and main.py? Well I prompted for it to fix the logging setup to use a specific format.
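For reference, the conventional shape the earlier critique alludes to is a single explicit setup call at process start, rather than a module-level side effect; a minimal sketch (the file layout and format string here are my own, not the repo's):

    # main.py: configure logging exactly once, at process start, not as a
    # side effect of importing some module.
    import logging

    def main() -> None:
        logging.basicConfig(
            level=logging.INFO,
            format="%(asctime)s %(levelname)s %(name)s: %(message)s",
        )
        logging.getLogger(__name__).info("starting up")

    if __name__ == "__main__":
        main()

    # In every other module: take a named logger, configure nothing.
    # logger = logging.getLogger(__name__)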
I use Cursor. It can spit out code at an amazing rate and has reduced the amount of docs I need to read to get something done. But after its second attempt at something, you need to jump in, do it yourself, and most likely debug what was written.
Are you reading a whole encyclopedia each time you're assigned a task? The one thing about learning is that it compounds. You get faster the longer you use a specific technology. So unless you use a different platform for each task, I don't think you have to read that much documentation (understanding it is another matter).
I do plan on experimenting with the latest versions of coding assistants, but last I tried them (6 months ago), none could satisfy all of the requirements at the same time.
Perhaps there is simply too much crappy Python code around that they were trained on as Python is frequently used for "scripting".
Perhaps the field has moved on and I need to try again.
But looking at this, it would still be faster for me to type this out myself than go through multiple rounds of reviews and prompts.
Really, no senior has reviewed this, whatever their language (raciness throughout, not just this file).
Here's a real-life example from today:
I asked $random_llm to give me code to recursively scan a directory and give me a list of file names relative to the top directory scanned and their sizes.
It gave me working code. On my test data directory it needed ... 6.8 seconds.
After 5 minutes of eliminating obvious inefficiencies, the new code needed ... 1.4 seconds. And I didn't even read the docs for the functions used yet; I just changed what seemed to generate too many filesystem calls per file.
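The commenter doesn't show either version, but the usual inefficiency here is re-stat'ing every path instead of reusing what the directory scan already fetched; a sketch of what such a cleanup typically looks like (my reconstruction, not their code):

    import os

    def scan_tree(top: str) -> list[tuple[str, int]]:
        """Return (path relative to top, size in bytes) for files under top."""
        results: list[tuple[str, int]] = []
        stack = [top]
        while stack:
            current = stack.pop()
            with os.scandir(current) as it:
                for entry in it:
                    if entry.is_dir(follow_symlinks=False):
                        stack.append(entry.path)
                    elif entry.is_file(follow_symlinks=False):
                        # DirEntry.stat() reuses data the directory scan
                        # already fetched (on most platforms), avoiding the
                        # extra per-file stat call that a naive
                        # os.walk + os.path.getsize version pays for.
                        size = entry.stat(follow_symlinks=False).st_size
                        results.append((os.path.relpath(entry.path, top), size))
        return results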
Nice, sounds like it saved you some time.
You "AI" enthusiasts always try to find a positive spin :)
What if I had trusted the code? It was working after all.
I'm guessing that if I had asked for string manipulation code, it would have done something worth posting on Accidentally Quadratic.
I "love" this part:
An extremely useful and insightful comment. Then you look where it's actually used, ... so like, the entire function and its call (and its needlessly verbose comment) could be removed, because the existence of the directory is being checked anyway by pathlib.

This might not matter here because it's a small, trivial example, but if you have 10, 50, 100, 500 developers working on a codebase, and they're all thoughtlessly slinging code like this in, you're going to have a dumpster fire soon enough.
I honestly think "vibe coding" is the best use case for AI coding, because at least then you're fully aware the code is throwaway shit and don't pretend otherwise.
edit: and looking deeper, `ensure_dir_exists` actually makes the directory, except it's already been made before the function is called, so... sigh. Code reviews are going to be pretty tedious in the coming years, aren't they?
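For reference, the whole helper collapses into one idiomatic stdlib call (the path here is hypothetical):

    from pathlib import Path

    # Creates the directory and any missing parents; a no-op if it already
    # exists. No separate existence check, no ensure_dir_exists helper.
    Path("output/reports").mkdir(parents=True, exist_ok=True)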