Simple: Release Early. Release Often until it Ain't Broke. Don't change things that Ain't Broke.
I don't think there's much consternation over actual bugfixes. There's consternation over constantly-shifting UIs and APIs.
The only time there's any tension between those two things is when people conflate bugs with poor design. The messy but correct thing to do when you've correctly implemented a poor design is to maintain backward compatibility while also offering the option to use the newer, more correct behavior. See, e.g., strcat() → strncat() → strlcat()
Preferably also provide assistance to people using the old things, in the form of making the old API a wrapper for the new API where possible, adding warnings for the old API, adding migration tools etc.
Meanwhile Canonical: "Let's fix what ain't broke at least once every 2 years."
(I think it would be strcat() → strlcat(), since strncat() is not really safer / newer.)
btw, strlcpy() is now being removed from Linux kernel code. strscpy() is the latest "secure" string copy function:
Definitely not safer (perhaps even more dangerous due to its unexpected semantics), but I do think it's slightly newer. It doesn't appear in K&R 1st edition, released in 1978, but does appear in K&R 2nd edition from 1988.
OpenBSD claims it first appeared in Version 7 Unix, which was released in 1979.
Just always use memcpy. You should always know the size of your buffers, and if you don't, you will get a buffer overflow at some point.
I've heard this advice before, and the problem with this advice is that people who just use memcpy in practice often write code with bugs in it. So the advice is bad advice. Good advice will result in code that has fewer bugs, or bugs with lower severity.
It can also lead to accidentally quadratic behavior which is why the modernized string ops that get so much hate these days rely on one iteration over the input.
Man, I feel this.
I worked for a corporation that was so change-averse, that we needed to buy our development systems on eBay.
They actually had a point, and it was hard to argue against, but I found it absolutely infuriating, as the solution was to plan for change, and establish a process to be constantly evaluating and refining for new developments.
Instead, it was "Wait until we can't bear it any longer, then have a huge screaming match meeting." A new budget would be approved for new machines (and, thus, new operating systems and development tools), processes would be updated, and that would sit, until it was no longer new.
As I said, it was hard to argue against, because it was a 100-year-old company that had been successfully delivering really high-end stuff, since Day One. They did that by being so conservative that they hadn't discovered fire, yet. Measure 300 times, cut once, etc.
When there was a problem, it escalated quickly (and was used as fuel to tighten things down even more), as the company was held to standards that are probably up there with NASA.
Speaking of NASA, I can't help but notice that this rather plucky little outfit, called SpaceX, seems to be running circles around them. They seem to have figured out how to "fail fast," yet also deliver insanely high Quality stuff.
Might be worth ignoring their CEO's tweets, and look at what they are doing...
SpaceX shows the success of using modern, commodity hardware rather than purchased-by-government contract obsolete components for space applications. The same is true for the cubesat revolution.
I honestly think the idea of "if it ain't broke, don't fix it" is pretty silly as applied to large, engineered systems. If your system is more than 1000 lines of code, I guarantee you have a latent bug somewhere. Even if you don't have any bugs, the ecosystem around you changes so fast that you eventually will.
I'd be curious though, in 20 years, what kind of technical debt SpaceX will accumulate.
I mean, as vendors, part of the debt we inherit is that of our customers. Then our customers switch to something else. As vendors, how do we support both the past and the future? By making changes opt-in if possible, but still, that technical debt is there, and requires devs and other support staff.
I don't work in anything rocket-science, but I stopped asking clients if they want to upgrade. We decide when to do it for them, and we take the blame if something goes sideways.
I'd still rather have something break than lose a client because they didn't realize they could upgrade instead of opting for another product (and probably silently grew resentful at the lack of features in the old version).
> I'd be curious though, in 20 years, what kind of technical debt SpaceX will accumulate.
They're already 20 years in. Wonder if there's anything left from the beginning (or if it all got blown up).
I mean, Falcon 9 is the tech debt. They're no longer developing it, only maintaining it. Their goal is to get a new stack of tech working before Falcon 9 becomes too outdated. Falcon 9 had such a head start that it's still better than everything else in its class, but it won't be long now until it becomes just another rocket on the market. I give it 5-10 years before it isn't special anymore. SpaceX is betting that Starship will be larger and cheaper and working by then.
NASA also doesn't have the luxury of picking its own direction, or tech stack, or doing whatever it wants. NASA is burdened by Congress dictating which rocket to build, even if it isn't the rocket NASA wants. Look at the rover program, though: NASA still does cutting-edge stuff with cutting-edge hardware, at really high quality. Just don't look at SLS or the ISS.
Don’t fix what isn’t broken is tough, because broken is debatable.
Is your stable Univac mainframe application broken? Operationally, probably not. But your business cannot adapt because it relies on a platform for which everyone who knows it is dead.
I think libraries (used as dependencies by other software) and "top-level" software/applications (used by users directly, whether that user is a developer or not) -- have different pressures and concerns.
Libraries should be much more conservative. I don't think "release early and often" was said about libraries.
And yes, it's tricky that it's not always a clear line between these two categories. A unix command line utility is sort of top-level software, but also likely to be used in a scripted/automated fashion.
So it's not always cut and dry, but the more you are aware of software depending on your software, the more careful you should be with releases.
A great example of an end-user application that is like a library is Microsoft Excel. Sure, the UX can change, but the formula engine has to be extremely conservative, staying bug-compatible with basically every version, or all hell breaks loose.
I notice the same pattern on an infrastructural level.
If I work on a test system, or a system only a small number of people depend on... a year or two ago this would've hurt my pride, but why spend 4 hours planning a change when you can just muddle through the problems in 2? The user impact will be nil or low, so move quickly, break things, fix things, document the problems and fixes. Everything's good.
Yet I also have systems a dozen or more teams rely upon. For these systems, we have to move much more slowly and deliberately. We cannot touch some of our databases with larger changes unless we have a way in, a way out, and a prediction of how long each of these will take, as well as an announcement to the customers, and so on, and so on. We'd love to update those faster, but it's rough.
I agree. As a developer using libraries and components, I really hate "release early and often".
As a user, I also hate "release early and often".
In both roles, releasing early and often effectively means I'm always using beta software, with all the headaches using such software brings.
However, as part of an internal development process -- that is, before customers see the release, "fail fast" is a totally legitimate and decent approach.
I think it depends a lot on where the library is in its lifecycle.
A new library should iterate rapidly and be very clear about that. Big warnings, obvious version numbering, and good communications. You look for an audience of actual users that can keep up with that. That's how you figure out what the right library is, both in terms of the big abstractions and the small details.
But then once you hit a 1.0 release, things change. You're shifting from an audience of fellow explorers to an audience who wants stability. You can still do exploratory work, but it has to be additive in the 1.x series, and eventually you need to start a 2.x series as you learn more about what needs to change. So the iteration still happens, it just happens away from the people who just want the basic thing you got right via the initial burst of iteration.
I would love to read a Malcolm Gladwell book on this topic. It touches on a core truth, not only in engineering but in society as a whole. Three examples from different spheres:
In software, I've seen repeatedly how new hires with big ideas (and youthful confidence that they know a better way of doing things) will come into a company and want to rewrite things "properly." In one management position, much of my role involved the politics of defending a well-working, well-maintained, big-revenue-driving application from a steady onslaught of such ambitious exuberance.
At the time, I thought "this must be the same phenomenon that drives shrinkflation": new MBA arrives at pickle-making company, seeks to bolster their career by demonstrably saving the company millions, convinces execs to put one less pickle in each pickle jar, consumers won't notice; step 3: profit!! In 2022, when you buy a cereal box, it's half air on the inside. In 1980, it was only 20% air.
I also shake my head when media pundits equate the success of a particular congressional session with the amount of new legislation they pass. I can imagine no simpler way to cruft-paralyze a democracy (while following the rules) than "releasing laws early and often."
> In software, I've seen repeatedly how new hires with big ideas (and youthful confidence that they know a better way of doing things) will come into a company and want to rewrite things "properly."
In my experience, these new hires are right a certain percentage of the time, and wrong a certain percentage of the time. I can take no default position here.
I tell new hires that they're gonna see something that offends their sensibilities about the way code should be written. That puts them in the center of a conflict. On the one hand, they believe the code should be written differently. On the other hand, someone else wrote the code that way for a reason. Their first step is to dig in and figure out the reasons on both sides--why it was written one way, why they think it should be written another. The second step is to figure out what the consequences and risks are of making changes or not making changes.
The way I think about it, I want to simultaneously keep the software working, and protect the "flame" that new engineers have to improve things--something which can be all too easily extinguished.
That’s a sensible way to temper newbies’ enthusiasm with experience.
Newbies may be right in that a system would be better rewritten; but often wrong that they think they can do it ‘properly’. At least, if they dive straight into it.
So making them use the Chesterton’s fence approach is a good one.
Shrink-flation is driven by short-term quarterly-earnings thinking. Reducing the amount of your product (and therefore its overall quality compared to competitors) is an act of cannibalization of a brand.
It may increase profits for a short time, but as consumers realize what's happened, ultimately they see the brand as not having standards, and it becomes a commodity. A small group of long-term invested owners would never do such a thing, because that would be 'killing the goose that laid the golden egg', but modern shareholding has no loyalty to quality.
They're not opposites.
"If it ain't broke don't fix it" => don't try to "optimize" something that works well. Maybe it will be able to handle more load after your big refactor, but if it already handles the load nicely and there is no indication load will go up, don't bother.
"Release early and often" => release after each change; don't bundle everything into a big release that happens every 6 months.
The first one is about deciding what to do, the second one is how you release the work you did.
> "Release early and often" => release after each change, don't bundle everything into a big release that happen every 6 months
After years of dealing with software that does this, I've soured on it completely. It has resulted in greater difficulty using software, and the software being less solid.
I've seen this happen as well; perhaps for different reasons.
My early career was mostly fin-tech and when installing into banks or other financial institutions, they _very much_ want the every year or 6 mo release cycle. And not just because of "old stodgy", but rather they want a good bit of ramp-up time to ensure things are running right. In a Wall St trading firm I worked at, a release ran in parallel with real data for MONTHS before the old version was retired.
So, I'm used to that model.
From the customer's standpoint, they could "gear up" for the next release, get their ducks in a row, get the right people to assess things. I get that some of that is because the infrequent releases are "big", but also, as a human customer, it's more fun to see a shiny new toy truck than a coat of anti-rust paint on its exhaust pipe, then a replacement for the broken window crank, then a new color on the odometer numbers, then ...
As a developer I also miss the Big Project Ramp-Up, then a period of a lot of work (which can be Waterfall, or Agile, or bespoke artisanal project management, which 99% of the time is just CALLED "Agile"), then a Big Project Release (+ party, some downtime, and prepare for the next one).
SAAS release early/often is like death by 1000 boring dull cuts. They cut out the spikiness _so much_ that there is so little novelty that it's just not fun to work on it.
I've seen this happen as well, but I think it's more of a problem of bundling too small a changeset in a release. A partial implementation of a feature shouldn't warrant a release (without feature flags to disable it), but bundling a year's worth of work in a single release is, in my opinion, a recipe for disaster (and this is sadly something that happens all too often in enterprise and non-software engineering businesses).
Exactly! False dichotomy.
Those are two different things... nothing to balance. But put it in a graph or drawing, and suddenly it's true and supported by 'science'.
Release early, don't fix it
Creating bugs? No, I am creating learning opportunities for more junior developers.
Are you my manager?
"Move Fast and Break Things" at maximum speed!
This is the way.
More often than not its "if it ain't broke don't fix it because we are too busy releasing early and often".
The problem is a bunch of overpaid devs treat anything that isn't absolutely bleeding-edge as a "bug", and we get into an endless cycle of UI refreshes and updates that exist solely to exist.
How about using feature flags and shipping daily? Decouple your release schedule/deployments from your product iteration/launch timeline.
Never miss an opportunity to add complexity!
Centralized coordination complexity vs. decentralized FF rollout complexity. The bigger the team contributing to a deploy, the better the latter gets, because the engineers rolling out their FF have all the context they need.
Feature flags, or another way to release bleeding-edge features to only some of your users, can sometimes be the best way to iterate over new features quickly when most of your user base is risk-averse.
Unofficial RedHat motto.
I think I know your meaning, though I would rephrase it as decoupling your deployments from your releases/launches.
These are not mutually exclusive. It seems like the whole premise here is just lacking nuance.
We have relevant case of this. We’re on Angular 8 and the app is working just fine - customer is happy, it’s pretty easy to maintain, well-tested and hardened, etc., but we are now facing having to update to 14.x “or else”. It’s gonna be a long and difficult migration.
Well, depending on the size of your codebase, your npm dependencies, and the Angular APIs used, it might or might not be a big deal. You can get a rough estimate of the complexity by running 'ng update'. Yes, major releases come at a rapid pace and it is hard to always be on the latest version. We update regularly once we are 2-3 major versions behind, and usually it's not so awful.
Lord Vetinari in the Discworld lives by the mantra that "If it ain't broke, don't fix it". Other characters noted that, under his governance, "things that didn't work ... got broken."
Break it often and don't fix it.
TFA seems to try to pin DEC's collapse and failure on their commitment to backward compatibility. Yet Microsoft has essentially done the same thing with Windows, and has not suffered the way DEC did. Seems like a weak argument.
How do people disentangle "ship the incorrect thing and let customers tell us what they really want" vs. "ship the correct thing but built incorrectly and let customers tell us why it's wrong"? I hate the second with a burning passion, because it seems bound to give the impression that you really don't know what you are doing.

I've been put in situations too many times where I just could not get the point across that "yes, I understand that we may not be giving the customers what they want, but what you are telling me to build is clearly not correct." It's maddening. Sorry if this is off topic, but it seems at least adjacent to the article's balance.

This seems to happen with PMs who don't have a deep enough technical background, and who reject the feedback because accepting it would expose that poor understanding. "Agile" environments seem to enable this kind of behavior. I'm beginning to think I need to pivot to PM in my career so I can avoid the madness as a developer.
My personal take is that "built incorrectly" should almost never be tolerated. The customer doesn't need to know how it was built, merely how long it will take. But if by "built incorrectly" you mean an unwise business decision (logging in with a phone number for instance) then yeesh, I wish I had an answer. Putting your foot down sometimes means leaving a job.
That depends on your customer. Some are cool with 'something is better than nothing'. Some are not. Some also are very 'I put it in the contract just do it that way' others are 'I just want the right thing'. Some just do not know what they want at all and just want you to do it and put all the liability off onto you but do not really want to spell out what that means at all. Being a PM is that turned up to 11 but now you do not control the code at all.
I guess what I'm talking about is this:
1) PO wants feature X for customer P.
2) Architect in meeting with PM and PO indicates that feature X is a simple addition to Y
3) Engineer is assigned X and says no this isn't a simple addition to Y because on deeper inspection it will fail for cases 1,2,3, we would need a different architecture.
4) PM says build it the naive way, let the customers find case 1,2,3 before we fix, we are an Agile team after all.
5) I quit.
Without details, it is really, really, really hard to say whether cases 1, 2, 3 are real, important issues that need to be fixed, or just needless complexity that 99.9% of users are never going to care about. The PM is trying to distinguish between those two cases because they've been burned before by developers doubling, tripling, and quadrupling their estimates as they find "one more thing" that is inelegant or could potentially be improved, delaying the launch schedule. You need to convincingly make the case to the architect (and to the PM) that issues 1, 2 and 3 are critical to the users, fundamental to the design, and going to be much harder to clean up later if we don't spend the time to get them right now. Or you're wrong, and you should just suck it up and build the simpler thing today, leaving the complexity of the new thing for the future, when you can decide whether the feature is even worth the cost of whatever new architecture it would require.
Trust the Engineer. PMs care only for date.
Exactly. To add to that, it depends on your business model for selling software. Some customers are perfectly fine with MVP and iteration. Others will want everything spelled out beforehand in a contract 3 inches thick. That often depends on how you are selling your work and what the company you are selling to expects.

One place I worked, we always shipped MVP, then got customers to pay for any new features. "New features" could mean cases 2 and 3 do not work correctly for a customer, while case 1 works just fine for the first customer, who never uses 2 and 3. It can come down to how your company's budget works, which drives the business model you have.

Now, if you are shipping 'boxed' software, that mentality could end your product: instead of a reputation of 'works nicely with the customers', you become 'this junk software is broken out of the box'. It is one of the things I ask during an interview: I want to know what sort of shop it is, 'how do you sell your software?'. Each is a viable way to make money, but some people do not like working one way or the other. I am flexible, but I would like to know up front.
Some truths to keep in mind:
- Every company is navigating the marketplace, and making decisions with imperfect information
- Not every decision will be perfect (or even, good)
- Not every decision-maker will be perfect (or even, good)
- Even a collection of individually smart/reasonable people, can end up collectively making pretty awful/illogical decisions
- "Good" decisions don't guarantee market success; conversely, "bad" decisions can still result in good outcomes
- Judging the quality of decisions and decision-makers based on outcomes, is an imperfect measure of the actual "quality" of those things/people
In your first example, you were presumably an entry-level engineer, but you either mistakenly took on too much burden (emotional or practical) in terms of decision-making yourself, or you misunderstood what types of expectations you should have for the actual decision-makers.
Decision-makers are allowed / expected to make such bets: "how many and which corners can we cut as a company, to get a product out to market, that clients will want to purchase, in a sensible time-frame?" This is not unusual, this happens all the time, at every single company, all around the world. The companies who do this more successfully, are the ones who find a sweet spot between cost-cutting, efficiency, time-to-market, and customer demands/satisfaction/delight. This is a very difficult thing to juggle, and really really smart business leaders consistently fail to find the right balance, or make the wrong calls. Hopefully the mistakes aren't fatal to a company, but unavoidably sometimes they will be. So yes, your company leaders made a bad call based on the outcome, but that on its own is not enough to indict the decision or the decision-makers as being fundamentally wrong.
The fact that your company made a set of decisions that ultimately led to failure, doesn't necessarily prove that they were a bad company. And to be a devil's advocate for a second, even "mislabeling standard X" might be forgivable under certain circumstances, such as launching a product with an "X pending" label, even though you didn't finish certification process for X yet, or maybe you didn't even start (but hey not starting doesn't mean it can't say "pending").
As a manager, I actually actively filter-in for what Amazon would call "Have Backbone" as a value, when interviewing engineers, and I ask them to provide examples of times where they fundamentally disagreed with the product team, disagreed with what they were asked to build, disagreed with a proposed architecture, etc. I want engineers on my team who will speak up, who are opinionated, who care enough about their work to take pride in it and put forth effort to improve beyond the status quo.
That being said, your examples seem to indicate a rigidity of black/white thinking, all-or-nothing thinking, and an inability to collaborate towards finding a solution. These were probably the most extreme examples you had, so I'm not judging every interaction or your entire personality as being so rigid, but hopefully you have by now experienced other examples in your career, where collaborative problem-solving was possible, where you did more than point out fatal flaws but also helped formulate a path to mitigate or solve them. The companies where that was more encouraged or made possible, are the ones you probably want to work for.
Well, this isn't prima facie unreasonable without more info (or knowledge of the implicit assumptions you are making but not publishing).
- Are cases 1,2,3 named that way, because they are the top priority cases (i.e. the #1, #2, and #3 most important product features that customers care about)? Even if they are, what is the cost of a new architecture? Will it take you 3 years and 20 engineers, to rebuild Y or to make Y.v2, just so you can support X "properly"? By then the market may have moved on, the feature may be worthless, so it may make perfect sense to deliver a bad version of X that relies on Y.
- Or, are cases 1,2,3 legitimately either rare, or low-impact, or do have viable manual workarounds? If so, then it's entirely reasonable to defer/punt on doing new architecture right now, because either you know these cases are unimportant, or at least you don't have positive proof that these cases are important enough to justify new architecture. With more data, or clear customer demand, you can make a better case for rebuilding Y "properly". The real problem comes later: what happens if you do get strong signals of customer demand, you can prove the current solution is not scalable or extensible, and yet the business still decides that Y is good enough to never touch... well that's a business that doesn't want to stay in business.
Agile is about practicality/pragmatism, over adherence to dogma or preconceived notions. Just because Y is the wrong architecture to deliver X, does not mean it is the wrong decision to ship partial feature X. Don't be dogmatic about "correct architecture", if you care about for-profit software engineering as a profession.
Of course, if your goal is different, if SWE is a craft or a hobby or an ivory tower pursuit for you, then feel free to make whatever decisions you want that don't fit your vision of "correctness".
See this comment for more specifics: https://news.ycombinator.com/item?id=32953007
I think it was a little unclear there: by "case" I mean test cases that would have to pass to fulfill a single feature. E.g., "I need an addition feature for a calculator," but the naive implementation works for 1+1 and fails for all other inputs.
Seems like you’re working with a PM who thinks agile is an excuse to release bad software. MVP should still solve a customer pain point.
To play devils advocate though, maybe case 1,2,3 are low enough risk to release. Having metrics set to watch if these are actually big problems could be “good enough.”
I would do 4.5) escalate to the PM's boss every time 1, 2, or 3 happens.
> PM says build it the naive way
Why does the PM get to make that call here?
> will fail for cases 1,2,3
a) tip QA to test these cases
b) add cases 1,2,3 to "Known issues"
After all that is the reason they call it "beta": "Cause it beta then nothing"
And if it ain't broke, fix it 'til it is!
Just keep things backwards compatible, then do whatever you want.
“If it ain’t broke, don’t fix it” is the absolute worst mindset to have! I've seen systems that are literally garbage because of it.
Error Budgets are your friend
tl;dr continually test, continually deploy.
Grotesque generalities. The devil hides in the details: planned obsolescence, and Red Hat (IBM) with its glibc symbol-versioning frenzy instead of a clean ELF ABI set of libs (as an example).
Yep. Open source alone is not enough; open source that is lean and stable over time is required.
> Customers often ask me, why pour time into updates when what I have runs just fine? The answer is that a bug in your old version might assert itself at the worst possible time and force you to forego sleep for the next five days to recover
Why not do proper testing? I know it's expensive. And when you have an OS as an init system, it's even more difficult. It is sad that the UNIX philosophy is dying, being replaced with the Windows philosophy.
You cannot test everything to perfection. Shit will break in production. The question is: how do you minimize shit breaking in production, and hopefully eliminate the critical bugs that can cause revenue/data loss?