
Balancing “If it ain’t broke, don’t fix it” vs. “Release early and often”

87 points | 8 hours ago | redhat.com
torstenvl 5 hours ago

Simple: Release Early. Release Often until it Ain't Broke. Don't change things that Ain't Broke.

I don't think there's much consternation over actual bugfixes. There's consternation over constantly-shifting UIs and APIs.

The only time there's any tension between those two things is when people conflate bugs with poor design. The messy but correct thing to do when you've correctly implemented a poor design is to maintain backward compatibility while also offering the option to use the newer, more correct behavior. See, e.g., strcat() → strncat() → strlcat()
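
To make the progression concrete, here is a minimal sketch (not from the original comment) of the same append done with each generation of the API; note that strlcat() is a BSD/libbsd extension rather than ISO C, so its availability is an assumption here:

    #include <string.h>
    /* On Linux, strlcat() typically needs libbsd: #include <bsd/string.h> */

    /* Append `name` to `dst` (capacity `dstsize`), three generations of API. */
    void append_name(char *dst, size_t dstsize, const char *name)
    {
        /* strcat(dst, name);                              no bounds at all       */
        /* strncat(dst, name, dstsize - strlen(dst) - 1);  bounds the SOURCE only */
        strlcat(dst, name, dstsize);    /* bounds the DESTINATION, always NUL-terminates */
    }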

pabs3 5 hours ago

Preferably also provide assistance to people using the old things, in the form of making the old API a wrapper for the new API where possible, adding warnings for the old API, adding migration tools etc.
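
A minimal sketch of that wrapper pattern, with hypothetical function names and a GCC/Clang deprecation attribute standing in for "adding warnings":

    /* Hypothetical API: parse_v2() is the new entry point; the old parse()
     * survives as a thin wrapper so existing callers keep compiling, but
     * they get nudged toward the replacement at build time. */
    struct parse_options { int strict; };

    int parse_v2(const char *input, const struct parse_options *opts);

    __attribute__((deprecated("parse() is legacy; migrate to parse_v2()")))
    static inline int parse(const char *input)
    {
        struct parse_options legacy = { 0 };   /* old behavior == the defaults */
        return parse_v2(input, &legacy);
    }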

moffkalast 2 hours ago

Meanwhile Canonical: "Let's fix what ain't broke at least once every 2 years."

klodolph 4 hours ago

(I think it would be strcat() → strlcat(), since strncat() is not really safer / newer.)

cpeterso 3 hours ago

btw, strlcpy() is now being removed from Linux kernel code. strscpy() is the latest "secure" string copy function:

https://lwn.net/Articles/905777/
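
For anyone curious what strscpy()'s contract looks like, here is a rough user-space approximation (illustrative only; the real implementation lives in the kernel's lib/string.c and returns -E2BIG on truncation):

    #include <stddef.h>
    #include <string.h>

    /* Rough imitation of the kernel's strscpy(): copy src into dst (capacity
     * `size`), always NUL-terminate, and return the length copied or a
     * negative value when src did not fit. Not the real implementation. */
    ptrdiff_t strscpy_like(char *dst, const char *src, size_t size)
    {
        if (size == 0)
            return -1;                       /* the kernel returns -E2BIG here */

        size_t len = strnlen(src, size);
        if (len == size) {                   /* src does not fit: truncate */
            memcpy(dst, src, size - 1);
            dst[size - 1] = '\0';
            return -1;                       /* the kernel returns -E2BIG here */
        }
        memcpy(dst, src, len + 1);           /* includes the terminating NUL */
        return (ptrdiff_t)len;
    }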

torstenvl 3 hours ago

Definitely not safer (perhaps even more dangerous due to its unexpected semantics), but I do think it's slightly newer. It doesn't appear in K&R 1st edition, released in 1978, but does appear in K&R 2nd edition from 1988.

OpenBSD claims it first appeared in Version 7 Unix, which was released in 1979.

https://man.openbsd.org/strncat.3 https://en.wikipedia.org/wiki/Version_7_Unix

david2ndaccount 4 hours ago

Just always use memcpy. You should always know the size of your buffers, and if you don't, you will get a buffer overflow at some point.
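
For what it's worth, a sketch of what that looks like when the lengths are carried explicitly (the names here are made up for illustration):

    #include <string.h>

    struct buf { char *data; size_t len; size_t cap; };

    /* Append srclen bytes to b using memcpy and explicit sizes.
     * Returns 0 on success, -1 if the data would not fit. */
    int buf_append(struct buf *b, const char *src, size_t srclen)
    {
        if (b->cap - b->len < srclen)
            return -1;                 /* caller decides: grow, truncate, or error */
        memcpy(b->data + b->len, src, srclen);
        b->len += srclen;
        return 0;
    }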

klodolph 4 hours ago

I've heard this advice before, and the problem with this advice is that people who just use memcpy in practice often write code with bugs in it. So the advice is bad advice. Good advice will result in code that has fewer bugs, or bugs with lower severity.

kevin_thibedeau 3 hours ago

It can also lead to accidentally quadratic behavior which is why the modernized string ops that get so much hate these days rely on one iteration over the input.
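
The classic version of that trap, sketched out (illustrative only, and not necessarily what the parent had in mind; both functions assume dst is large enough):

    #include <string.h>

    /* Each strcat() rescans dst from the start to find the end, so
     * appending n pieces costs O(n^2) overall. */
    void join_quadratic(char *dst, const char *pieces[], size_t n)
    {
        dst[0] = '\0';
        for (size_t i = 0; i < n; i++)
            strcat(dst, pieces[i]);        /* rescans everything appended so far */
    }

    /* One pass: remember where the end is instead of re-finding it each time. */
    void join_linear(char *dst, const char *pieces[], size_t n)
    {
        char *end = dst;
        for (size_t i = 0; i < n; i++) {
            size_t len = strlen(pieces[i]);
            memcpy(end, pieces[i], len);
            end += len;
        }
        *end = '\0';
    }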

ChrisMarshallNY 4 hours ago

Man, I feel this.

I worked for a corporation that was so change-averse, that we needed to buy our development systems on eBay.

They actually had a point, and it was hard to argue against, but I found it absolutely infuriating, as the solution was to plan for change, and establish a process to be constantly evaluating and refining for new developments.

Instead, it was "Wait until we can't bear it any longer, then have a huge screaming match meeting." A new budget would be approved for new machines (and, thus, new operating systems and development tools), processes would be updated, and that would sit, until it was no longer new.

As I said, it was hard to argue against, because it was a 100-year-old company that had been successfully delivering really high-end stuff, since Day One. They did that by being so conservative that they hadn't discovered fire, yet. Measure 300 times, cut once, etc.

When there was a problem, it escalated quickly (and was used as fuel to tighten things down even more), as the company was held to standards that are probably up there with NASA.

Speaking of NASA, I can't help but notice that this rather plucky little outfit, called SpaceX, seems to be running circles around them. They seem to have figured out how to "fail fast," yet also deliver insanely high Quality stuff.

Might be worth ignoring their CEO's tweets, and look at what they are doing...

pclmulqdq 4 hours ago

SpaceX shows the success of using modern, commodity hardware rather than obsolete components purchased through government contracts for space applications. The same is true for the cubesat revolution.

I honestly think the idea of "if it ain't broke, don't fix it" is pretty silly as applied to large, engineered systems. If your system is more than 1000 lines of code, I guarantee you have a latent bug somewhere. Even if you don't have any bugs, the ecosystem around you changes so fast that you eventually will.

mgbmtl 3 hours ago

I'd be curious though, in 20 years, what kind of technical debt SpaceX will accumulate.

I mean, as vendors, part of the debt we inherit is that of our customers. Then our customers switch to something else. As vendors, how do we support both the past and the future? By making changes opt-in if possible, but still, that technical debt is there, and requires devs and other support staff.

I don't work in anything rocket-science, but I stopped asking clients if they want to upgrade. We decide when to do it for them, and we take the blame if something goes sideways.

I still prefer to have something break than to lose a client because they didn't realize they could upgrade instead of opting for another product (and they probably silently grew resentful at the lack of features in the old version).

soperj 3 hours ago

> I'd be curious though, in 20 years, what kind of technical debt SpaceX will accumulate.

They're already 20 years in. Wonder if there's anything left from the beginning (or if it all got blown up).

Beached 44 minutes ago

I mean, Falcon 9 is the tech debt. They are no longer working on it, only maintenance mode. Their goal is to get a new stack of tech working before Falcon 9 becomes too outdated. Falcon 9 had such a head start that it is still better than everything else in its class, but it won't be long now until it becomes just another rocket in the market. I give it 5-10 years before it isn't special anymore. SpaceX is betting that Starship will be larger and cheaper and working by then.

Beached 46 minutes ago

NASA also doesn't have the luxury of picking their own direction, or tech stack, or doing whatever they want. NASA is burdened by Congress dictating what rocket to build, even if it isn't the rocket NASA wants. Look at the rover program, though: NASA still does cutting-edge stuff with cutting-edge hardware, at really high quality. Just don't look at SLS or ISS.

Spooky23 3 hours ago

Don’t fix what isn’t broken is tough, because broken is debatable.

Is your stable Univac mainframe application broken? Operationally, probably not. But your business cannot adapt because it relies on a platform for which everyone who knows it is dead.

jrochkind1 5 hours ago

I think libraries (used as dependencies by other software) and "top-level" software/applications (used by users directly, whether that user is a developer or not) -- have different pressures and concerns.

Libraries should be much more conservative. I don't think "release early and often" was said about libraries.

And yes, it's tricky that it's not always a clear line between these two categories. A unix command line utility is sort of top-level software, but also likely to be used in a scripted/automated fashion.

So it's not always cut and dried, but the more you are aware of software depending on your software, the more careful you should be with releases.

wongarsu 5 hours ago

A great example of an end-user application that is like a library is Microsoft Excel. Sure, the UX can change, but the formula engine has to be extremely conservative, staying bug-compatible with basically every version, or all hell breaks loose.

tetha 3 hours ago

I notice the same pattern on an infrastructural level.

If I work on a test system, or a system a small number of people depend on... a year or two ago this would've hurt my pride, but why spend 4 hours planning a change if you can just muddle through problems in 2 hours? The user impact will be none or low, so move quickly, break things, fix things, document the problems and fixes. Everything's good.

Yet I also have systems a dozen or more teams rely upon. For these systems, we have to move much slower and more deliberately. We cannot touch some of our databases with larger changes unless we have a way in, a way out, and a prediction of how long all of these take, as well as an announcement to the customers, and so on, and so on. We'd love to update those faster, but it's rough.

JohnFen 4 hours ago

I agree. As a developer using libraries and components, I really hate "release early and often".

As a user, I also hate "release early and often".

In both roles, releasing early and often effectively means I'm always using beta software, with all the headaches using such software brings.

However, as part of an internal development process -- that is, before customers see the release, "fail fast" is a totally legitimate and decent approach.

wpietri 4 hours ago

I think it depends a lot on where the library is in its lifecycle.

A new library should iterate rapidly and be very clear about that. Big warnings, obvious version numbering, and good communications. You look for an audience of actual users that can keep up with that. That's how you figure out what the right library is, both in terms of the big abstractions and the small details.

But then once you hit a 1.0 release, things change. You're shifting from an audience of fellow explorers to an audience who wants stability. You can still do exploratory work, but it has to be additive in the 1.x series, and eventually you need to start a 2.x series as you learn more about what needs to change. So the iteration still happens, it just happens away from the people who just want the basic thing you got right via the initial burst of iteration.

fendy3002 3 hours ago

Then let me introduce you to https://0ver.org/.

jjslocum3 4 hours ago

I would love to read a Malcom Gladwell book on this topic. It touches on a core truth, not only in engineering but in society as a whole. Three examples from different spheres:

In software, I've seen repeatedly how new hires with big ideas (and youthful confidence that they know a better way of doing things) will come into a company and want to rewrite things "properly." In one management position, much of my role involved the politics of defending a well-working, well-maintained, big-revenue-driving application from a steady onslaught of such ambitious exuberance.

At the time, I thought "this must be the same phenomenon that drives shrinkflation": new MBA arrives at pickle-making company, seeks to bolster their career by demonstrably saving the company millions, convinces execs to put one less pickle in each pickle jar, consumers won't notice; step 3: profit!! In 2022, when you buy a cereal box, it's half air on the inside. In 1980, it was only 20% air.

I also shake my head when media pundits equate the success of a particular congressional session with the amount of new legislation they pass. I can imagine no simpler way to cruft-paralyze a democracy (while following the rules) than "releasing laws early and often."

klodolph 4 hours ago

> In software, I've seen repeatedly how new hires with big ideas (and youthful confidence that they know a better way of doing things) will come into a company and want to rewrite things "properly."

In my experience, these new hires are right a certain percentage of the time, and wrong a certain percentage of the time. I can take no default position here.

I tell new hires that they're gonna see something that offends their sensibilities about the way code should be written. That puts them in the center of a conflict. On the one hand, they believe the code should be written differently. On the other hand, someone else wrote the code that way for a reason. Their first step is to dig in and figure out the reasons on both sides--why it was written one way, why they think it should be written another. The second step is to figure out what the consequences and risks are of making changes or not making changes.

The way I think about it, I want to simultaneously keep the software working, and protect the "flame" that new engineers have to improve things--something which can be all too easily extinguished.

strawhatguy 3 hours ago

That's a sensible way to temper newbies' enthusiasm with experience.

Newbies may be right that a system would be better rewritten, but often wrong in thinking they can do it 'properly'. At least, if they dive straight into it.

So making them use the Charleston’s fence approach is a good one.

strawhatguy 3 hours ago

Gah! Chesterton's

SQueeeeeL 4 hours ago

Shrink-flation is driven by short-term quarterly-earnings thinking. Reducing the amount of your product (and therefore its overall quality compared to competitors) is an act of cannibalization of a brand.

It may increase profits for a short time, but as consumers realize what's happened, they ultimately see the brand as not having standards, and it becomes a commodity. A small group of long-term-invested owners would never do such a thing, because that would be 'killing the goose that laid the golden egg', but modern shareholding has no loyalty to quality.

eloisant 5 hours ago

They're not opposite.

"If it ain't broke don't fix it" => don't try to "optimize" something that works well. Maybe it will be able to handle more load after your big refactor, but if it already handle the load nicely and there is no indication load will go up, don't bother.

"Release early and often" => release after each change, don't bundle everything into a big release that happen every 6 months.

The first one is about deciding what to do, the second one is how you release the work you did.

JohnFen 4 hours ago

> "Release early and often" => release after each change, don't bundle everything into a big release that happen every 6 months

After years of dealing with software that does this, I've soured on it completely. It has resulted in greater difficulty using software, and the software being less solid.

michaelcampbell 3 hours ago

I've seen this happen as well; perhaps for different reasons.

My early career was mostly fintech, and when installing into banks or other financial institutions, they _very much_ want a yearly or 6-month release cycle. And not just because they're "old and stodgy"; rather, they want a good bit of ramp-up time to ensure things are running right. In a Wall St trading firm I worked at, a release ran in parallel with real data for MONTHS before the old version was retired.

So, I'm used to that model.

From the customer's standpoint, they could "gear up" for the next release, get their ducks in a row, and get the right people to assess things. I get that some of that is because the infrequent releases are "big", but also, as a human customer, it's fun to see a shiny new toy truck rather than a coat of anti-rust paint on its exhaust pipe, then a replacement for the broken window crank, then a new color on the odometer numbers, then ...

As a developer I also miss the Big Project Ramp Up, then a period of a lot of work (which can be Waterfall, or Agile, or bespoke artisanal project management, which is pretty much 99% of the time just CALLED "Agile"), then a Big Project Release (+ party, some down time, and preparing for the next one).

SAAS release early/often is like death by 1000 boring dull cuts. They cut out the spikiness _so much_ that there is so little novelty that it's just not fun to work on it.

mdtusz 4 hours ago

I've seen this happen as well, but I think it's more of a problem of bundling too small a changeset in a release. A partial implementation of a feature shouldn't warrant a release (without feature flags to disable it), but bundling a year's worth of work in a single release is, in my opinion, a recipe for disaster (and this is sadly something that happens all too often in enterprise and non-software engineering businesses).

drewcoo 3 hours ago

Exactly! False dichotomy.

jbverschoor 6 hours ago

Those are two different things. Nothing to balance. But put it in a graph or drawing, and it means it's true and supported by 'science'.

henriquecm8 6 hours ago

Release early, don't fix it

tomalaci 5 hours ago

Creating bugs? No, I am creating learning opportunities for more junior developers.

devteambravo 4 hours ago

Are you my manager?

BurningFrog 5 hours ago

"Move Fast and Break Things" at maximum speed!

badrabbit 6 hours ago

This is the way.

jmartrican 38 minutes ago

More often than not it's "if it ain't broke don't fix it because we are too busy releasing early and often".

exabrial 2 hours ago

The problem is a bunch of overpaid devs conflate anything that isn't absolutely bleeding edge with a "bug", and we get into an endless cycle of UI refreshes and updates that exist solely to exist.

silentsea90 6 hours ago

How about using feature flags and shipping daily? Decouple your release schedule/deployments from your product iteration/launch timeline.
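
A bare-bones sketch of the idea (the flag name and the env-var mechanism are placeholders for illustration; real setups usually use a flag service or config file):

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    /* Deploy the new code path every day, but only *release* it when the
     * flag flips on. Here the flag is read from an environment variable. */
    static int flag_enabled(const char *name)
    {
        const char *v = getenv(name);
        return v != NULL && strcmp(v, "1") == 0;
    }

    void run_checkout(void)
    {
        if (flag_enabled("FF_NEW_CHECKOUT"))
            puts("new checkout flow");    /* rolled out gradually behind the flag */
        else
            puts("old checkout flow");    /* the default until launch day */
    }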

allendoerfer 6 hours ago

Never miss an opportunity to add complexity!

silentsea90 4 hours ago

Centralized coordination complexity vs. decentralized FF rollout complexity. The bigger the team contributing to a deploy, the better the latter gets, because engineers rolling out their FF have all the context they need.

int0x2e 4 hours ago

Feature flags, or another way to release bleeding-edge features to only some of your users, can sometimes be the best way to iterate on new features quickly when most of your user base is risk-averse.

generalizations 6 hours ago

Unofficial RedHat motto.

alldayeveryday 5 hours ago

I think I know your meaning, though I would rephrase it as decoupling your deployments from your releases/launches.

silentsea90 4 hours ago

Edited.

bradwood 52 minutes ago

These are not mutually exclusive. It seems like the whole premise here is just lacking nuance.

temporallobe 2 hours ago

We have a relevant case of this. We're on Angular 8 and the app is working just fine - the customer is happy, it's pretty easy to maintain, well-tested and hardened, etc. - but we are now facing having to update to 14.x "or else". It's gonna be a long and difficult migration.

kmitz 2 hours ago

Well, depending on the size of your codebase, npm dependencies, and Angular APIs used, it might or might not be a big deal. You can get a rough estimate of the complexity by running 'ng update'. Yes, major releases come at a rapid pace and it is hard to always be on the latest version. We update regularly once we are 2-3 major versions behind and usually it's not so awful.

pdpi 2 hours ago

Lord Vetinari in the Discworld lives by the mantra that "If it ain't broke, don't fix it". Other characters noted that, under his governance, "things that didn't work ... got broken."

chuso 5 hours ago

Break it often and don't fix it.

PaulDavisThe1st 3 hours ago

TFA seems to try to pin DEC's collapse and failure on their commitment to backward compatibility. Yet Microsoft has essentially done the same thing with Windows and has not suffered the way DEC did. Seems like a weak argument.

peteradio 7 hours ago

How do people disentangle "ship the incorrect thing and let customers let us know what they really want" vs. "ship the correct thing but built incorrectly and let customers know why it's wrong"? I hate the second with a burning passion, because it seems bound to give the impression that you really don't know what you are doing. I've been put into situations too many times where I just could not get the point across that "yes, I understand that we may not be giving the customers what they want, but what you are telling me to build is clearly not correct." It's maddening. Sorry if it's off topic, but it seems at least adjacent to the article's balance. This seems to happen with PMs who do not have a deep enough background in the technical details and yet reject the feedback given, because it would reflect that poor understanding. "Agile" environments seem to enable this kind of behavior. I'm beginning to think I need to pivot to PM in my career so I can avoid the madness as a developer.

MetaWhirledPeas 5 hours ago

My personal take is that "built incorrectly" should almost never be tolerated. The customer doesn't need to know how it was built, merely how long it will take. But if by "built incorrectly" you mean an unwise business decision (logging in with a phone number for instance) then yeesh, I wish I had an answer. Putting your foot down sometimes means leaving a job.

sumtechguy 7 hours ago

That depends on your customer. Some are cool with 'something is better than nothing'. Some are not. Some are very 'I put it in the contract, just do it that way'; others are 'I just want the right thing'. Some just do not know what they want at all, want you to do it, and want to put all the liability off onto you, but do not really want to spell out what that means at all. Being a PM is that turned up to 11, but now you do not control the code at all.

peteradio 6 hours ago

I guess what I'm talking about is this:

1) PO wants feature X for customer P.

2) Architect, in a meeting with the PM and PO, indicates that feature X is a simple addition to Y.

3) Engineer is assigned X and says no, this isn't a simple addition to Y, because on deeper inspection it will fail for cases 1, 2, 3; we would need a different architecture.

4) PM says build it the naive way and let the customers find cases 1, 2, 3 before we fix them; we are an Agile team, after all.

5) I quit.

nightpool 5 hours ago

Without details, it is really, really, really hard to say whether cases 1, 2, 3 are real, important issues that need to be fixed or just needless complexity that 99.9% of users are never going to care about. The PM is trying to distinguish between those two cases because they've been burned before on developers doubling, tripling, and quadrupling their estimates as they find "one more thing" that is inelegant or could potentially be improved, delaying the launch schedule. You need to convincingly make the case to the architect (and to the PM) that issues 1, 2, and 3 are critical to the users, fundamental to the design, and going to be much harder to clean up later if we don't spend the time to get them right now. Or you're wrong, and you should just suck it up and build the simpler thing today and leave the complexity of the new thing for the future, when you can decide whether the feature is even worth the cost of having whatever new architecture it would require.

flerchin 4 hours ago

Trust the Engineer. PMs care only for date.

MetaWhirledPeas 5 hours ago

+1

peteradio 5 hours ago

+1

strgcmc 5 hours ago

Well, this isn't prima facie unreasonable without more info (or knowledge of the implicit assumptions you are making but not publishing).

- Are cases 1, 2, 3 named that way because they are the top-priority cases (i.e. the #1, #2, and #3 most important product features that customers care about)? Even if they are, what is the cost of a new architecture? Will it take you 3 years and 20 engineers to rebuild Y or to make Y.v2, just so you can support X "properly"? By then the market may have moved on and the feature may be worthless, so it may make perfect sense to deliver a bad version of X that relies on Y.

- Or are cases 1, 2, 3 legitimately rare, low-impact, or covered by viable manual workarounds? If so, then it's entirely reasonable to defer/punt on doing new architecture right now, because either you know these cases are unimportant, or at least you don't have positive proof that these cases are important enough to justify new architecture. With more data, or clear customer demand, you can make a better case for rebuilding Y "properly". The real problem comes later: what happens if you do get strong signals of customer demand, you can prove the current solution is not scalable or extensible, and yet the business still decides that Y is good enough to never touch... well, that's a business that doesn't want to stay in business.

Agile is about practicality/pragmatism, over adherence to dogma or preconceived notions. Just because Y is the wrong architecture to deliver X, does not mean it is the wrong decision to ship partial feature X. Don't be dogmatic about "correct architecture", if you care about for-profit software engineering as a profession.

Of course, if your goal is different, if SWE is a craft or a hobby or an ivory tower pursuit for you, then feel free to make whatever decisions you want that don't fit your vision of "correctness".

peteradio 4 hours ago

See this comment for more specifics: https://news.ycombinator.com/item?id=32953007

I think there it's a little unclear: by case, I mean test cases that would fail to pass to fulfill a single feature. E.g. "I need an addition feature for a calculator", but a naive implementation will result in it working for 1+1 and failing for all others.

skellera 6 hours ago

Seems like you’re working with a PM who thinks agile is an excuse to release bad software. MVP should still solve a customer pain point.

To play devil's advocate though, maybe cases 1, 2, 3 are low enough risk to release. Having metrics set up to watch whether these are actually big problems could be "good enough."

antupis 6 hours ago

I would do 4.5) escalate to the PM's boss every time 1, 2, 3 happens.

helge9210 6 hours ago

> PM says build it the naive way

Why does the PM have the option to choose here?

> will fail for cases 1,2,3

a) tip QA to test these cases

b) add cases 1,2,3 to "Known issues"

After all, that is the reason they call it "beta": "'Cause it beta than nothing."

esaym 3 hours ago

And if it ain't broke, fix it 'til it is!

unity1001 4 hours ago

Just keep things backwards compatible, then do whatever you want.

gdsdfe 4 hours ago

“If it ain’t broke, don’t fix it” is the absolute worst mindset to have! I've seen systems that are literally garbage because of it.

AaronM 5 hours ago

Error Budgets are your friend

scombridae 4 hours ago

tl;dr continually test, continually deploy.

sylware 6 hours ago

Grotesque generalities. The devil hides in the details: planned obsolescence, and Red Hat (IBM) with its glibc symbol versioning frenzy and no clean ELF ABI set of libs (as an example).

Yep. Open source is far from enough; open source that is lean and stable over time is required.

hulitu 7 hours ago

> Customers often ask me, why pour time into updates when what I have runs just fine? The answer is that a bug in your old version might assert itself at the worst possible time and force you to forego sleep for the next five days to recover

Why not do proper testing? I know it's expensive. And when you have an OS as an init system, it is even more difficult. It is sad that the UNIX philosophy is dying, being replaced with the Windows philosophy.

codegeek 4 hours ago

You cannot test everything to perfection. Shit will break in production. The question is: how to minimize shit breaking in production and hopefully eliminate critical bugs that can cause revenue/data loss.