
Balancing “If it ain’t broke, don’t fix it” vs. “Release early and often”

128 points · 3 years ago · redhat.com
torstenvl3 years ago

Simple: Release Early. Release Often until it Ain't Broke. Don't change things that Ain't Broke.

I don't think there's much consternation over actual bugfixes. There's consternation over constantly-shifting UIs and APIs.

The only time there's any tension between those two things is when people conflate bugs with poor design. The messy but correct thing to do when you've correctly implemented a poor design is to maintain backward compatibility while also offering the option to use the newer, more correct behavior. See, e.g., strcat() → strncat() → strlcat()
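The progression mentioned above can be sketched in code. The helper below, `my_strlcat`, is a hand-rolled illustration of OpenBSD `strlcat()` semantics (the name and implementation are mine, not libc's):

```c
#include <stddef.h>
#include <string.h>

/* Illustrative stand-in for OpenBSD strlcat(): append src to dst,
 * never writing more than dstsize bytes total, always NUL-terminating,
 * and returning the length it *tried* to create so callers can detect
 * truncation. Contrast with strcat() (no bound at all) and strncat()
 * (a bound on the source, which is easy to misuse). */
size_t my_strlcat(char *dst, const char *src, size_t dstsize)
{
    size_t dlen = strnlen(dst, dstsize);
    size_t slen = strlen(src);

    if (dlen == dstsize)                 /* dst isn't even terminated */
        return dstsize + slen;

    size_t space = dstsize - dlen - 1;   /* bytes left for src */
    size_t ncopy = slen < space ? slen : space;
    memcpy(dst + dlen, src, ncopy);
    dst[dlen + ncopy] = '\0';
    return dlen + slen;                  /* >= dstsize means truncated */
}
```

A caller checks `my_strlcat(buf, s, sizeof buf) >= sizeof buf` to detect truncation, while old `strcat()` call sites keep compiling untouched; that is the backward-compatible migration path being described.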

proc03 years ago

The line for when software is ready to be used is not clear, and I think it has been pushed forward such that the user increasingly becomes the tester. This would be unacceptable in hardware products; imagine if kitchen appliances were sold with known issues and you had to take them back to the store for fixes and updates. Unfortunately, users don't know any better, and I think that's why so much of the industry gets away with serving them a user experience that is bad because, in essence, they are testing it.

kqr3 years ago

The software we write is far more complex than a kitchen appliance.

For simple systems, yes, test all components, build in redundancies, etc. All those things that made reliable system design possible in the 1950s.

For complex systems you're quickly wasting your time on diminishing returns. The failure modes of complex systems lie in obscure interactions between components conditional on rare input combinations. You have to find that stuff out in production. You have to make users into testers.

However, there are better and worse ways to do this. Ideally, a failure should only be experienced by a user with a fallback solution, not a plain failure to accomplish the task entirely.

bbjJqJmxFNbAcg93 years ago

That's a tautology. You're saying that we shouldn't fix the complexity and bloat that gives rise to bugs because we'd have to fix the complexity and bloat that gives rise to bugs.

We landed on the moon with a reliable computer and reliable software in the 1950s. What software that you've written is "far more complex" than that?

(Personally, I'd posit that a lot of the complexity is the ten or fifty layers of framework and language and library between the programmer and the computer. But I digress...)

proc03 years ago

Sure, there could be exceptions, like parts of security systems that would have to be tested in production because of limitations of data/threat modeling. I'm thinking mostly of end-user software, UIs, games, etc., that increasingly ship expecting to patch along the way.

foobiekr3 years ago

How does this line of thinking address security?

moffkalast3 years ago

Meanwhile Canonical: "Let's fix what ain't broke at least once every 2 years."

gw993 years ago

At least it's not Gnome "Let's rewrite again what we didn't get around to finishing and fixing 2 years ago"

pabs33 years ago

Preferably also provide assistance to people using the old things, in the form of making the old API a wrapper for the new API where possible, adding warnings for the old API, adding migration tools etc.

klodolph3 years ago

(I think it would be strcat() → strlcat(), since strncat() is not really safer / newer.)

cpeterso3 years ago

btw, strlcpy() is now being removed from Linux kernel code. strscpy() is the latest "secure" string copy function:

https://lwn.net/Articles/905777/
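The contract `strscpy()` fixes can be sketched as follows. `my_strscpy` is an illustrative stand-in, not the kernel's code (the kernel returns `-E2BIG` rather than `-1` on truncation):

```c
#include <stddef.h>

/* Illustrative stand-in for the kernel's strscpy() contract: copy at
 * most size-1 bytes, always NUL-terminate, and return the number of
 * bytes copied, or -1 on truncation (the kernel returns -E2BIG).
 * Unlike strlcpy(), the return value never requires walking the whole
 * source string, so an unterminated source can't cause an over-read. */
long my_strscpy(char *dst, const char *src, size_t size)
{
    if (size == 0)
        return -1;

    size_t i;
    for (i = 0; i + 1 < size && src[i] != '\0'; i++)
        dst[i] = src[i];
    dst[i] = '\0';

    return src[i] == '\0' ? (long)i : -1;   /* -1 signals truncation */
}
```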

torstenvl3 years ago

Definitely not safer (perhaps even more dangerous due to its unexpected semantics), but I do think it's slightly newer. It doesn't appear in K&R 1st edition, released in 1978, but does appear in K&R 2nd edition from 1988.

OpenBSD claims it first appeared in Version 7 Unix, which was released in 1979.

https://man.openbsd.org/strncat.3 https://en.wikipedia.org/wiki/Version_7_Unix

david2ndaccount3 years ago

Just always use memcpy. You should always know the size of your buffers, and if you don't, you will get a buffer overflow at some point.
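In the spirit of the parent's advice, here is a sketch of what "know your sizes" looks like in practice: carry the length alongside the buffer so every append is a bounds-checked `memcpy` (the struct, names, and sizes are illustrative, not a standard API):

```c
#include <stddef.h>
#include <string.h>

/* Illustrative sketch: track the length explicitly so appends are
 * bounds-checked memcpy calls. A side benefit: no repeated strlen()
 * walk over the destination, which is what makes naive strcat loops
 * accidentally quadratic. */
struct sbuf {
    char data[64];
    size_t len;
};

/* Returns 0 on success, -1 if the bytes (plus a NUL) would not fit. */
int sbuf_append(struct sbuf *b, const char *src, size_t n)
{
    if (n >= sizeof b->data - b->len)
        return -1;                    /* would overflow: refuse */
    memcpy(b->data + b->len, src, n);
    b->len += n;
    b->data[b->len] = '\0';
    return 0;
}
```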

klodolph3 years ago

I've heard this advice before, and the problem with this advice is that people who just use memcpy in practice often write code with bugs in it. So the advice is bad advice. Good advice will result in code that has fewer bugs, or bugs with lower severity.

kevin_thibedeau3 years ago

It can also lead to accidentally quadratic behavior, which is why the modernized string ops that get so much hate these days rely on a single iteration over the input.

ChrisMarshallNY3 years ago

Man, I feel this.

I worked for a corporation that was so change-averse, that we needed to buy our development systems on eBay.

They actually had a point, and it was hard to argue against, but I found it absolutely infuriating, as the solution was to plan for change, and establish a process to be constantly evaluating and refining for new developments.

Instead, it was "Wait until we can't bear it any longer, then have a huge screaming match meeting." A new budget would be approved for new machines (and, thus, new operating systems and development tools), processes would be updated, and that would sit, until it was no longer new.

As I said, it was hard to argue against, because it was a 100-year-old company that had been successfully delivering really high-end stuff, since Day One. They did that by being so conservative that they hadn't discovered fire, yet. Measure 300 times, cut once, etc.

When there was a problem, it escalated quickly (and was used as fuel to tighten things down even more), as the company was held to standards that are probably up there with NASA.

Speaking of NASA, I can't help but notice that this rather plucky little outfit, called SpaceX, seems to be running circles around them. They seem to have figured out how to "fail fast," yet also deliver insanely high Quality stuff.

Might be worth ignoring their CEO's tweets, and look at what they are doing...

pclmulqdq3 years ago

SpaceX shows the success of using modern, commodity hardware for space applications, rather than obsolete components purchased through government contracts. The same is true for the cubesat revolution.

I honestly think the idea of "if it ain't broke, don't fix it" is pretty silly as applied to large, engineered systems. If your system is more than 1000 lines of code, I guarantee you have a latent bug somewhere. Even if you don't have any bugs, the ecosystem around you changes so fast that you eventually will.

Gibbon13 years ago

SpaceX shows the advantage of not allowing aerospace lobbyists and Congress to design things. That's how you get the SLS with its inane H2/LOX first stage and solid propellant boosters.

mgbmtl3 years ago

I'd be curious though, in 20 years, what kind of technical debt SpaceX will accumulate.

I mean, as vendors, part of the debt we inherit is that of our customers. Then our customers switch to something else. As vendors, how do we support both the past and the future? By making changes opt-in if possible, but still, that technical debt is there, and requires devs and other support staff.

I don't work in anything rocket-science, but I stopped asking clients if they want to upgrade. We decide when to do it for them, and we take the blame if something goes sideways.

I still prefer having something break to losing a client because they didn't realize they could upgrade instead of opting for another product (while probably growing silently resentful at the lack of features in the old version).

soperj3 years ago

> I'd be curious though, in 20 years, what kind of technical debt SpaceX will accumulate.

They're already 20 years in. Wonder if there's anything left from the beginning (or if it all got blown up).

Beached3 years ago

I mean, Falcon 9 is the tech debt. They are no longer working on it, only maintenance mode. Their goal is to get a new stack of tech working before Falcon 9 becomes too outdated. Falcon 9 had such a head start that it is still better than everything else in its class, but it won't be long now until it becomes just another rocket in the market. I give it 5-10 years before it isn't special anymore. SpaceX is betting that Starship will be larger and cheaper and working by then.

izacus3 years ago

I mean, it's pretty clear: the early Teslas all had dying onboard computers due to flash wear, which they even refused to fix because the warranty had passed.

Those aren't companies that build anything lasting or environmentally friendly. It's just all throwaway toys.

Beached3 years ago

NASA also doesn't have the luxury of picking its own direction, or tech stack, or doing whatever it wants. NASA is burdened by Congress dictating what rocket to build, even if it isn't the rocket NASA wants. Look at the rover program, though: NASA still does cutting-edge stuff with cutting-edge hardware, at really high quality. Just don't look at SLS or the ISS.

bbjJqJmxFNbAcg93 years ago

NASA hasn't really had the proper funding to do... anything... in a few decades.

And if SpaceX had gone to space before NASA it would probably look like NASA could run circles around SpaceX...

"Fail fast" is great when the only thing at risk is your own money but when you throw human life into the mix not so much.

How do you think their fatality rate will compare to that of NASA?

Spooky233 years ago

Don’t fix what isn’t broken is tough, because broken is debatable.

Is your stable Univac mainframe application broken? Operationally, probably not. But your business cannot adapt because it relies on a platform for which everyone who knows it is dead.

jjslocum33 years ago

I would love to read a Malcolm Gladwell book on this topic. It touches on a core truth, not only in engineering but in society as a whole. Three examples from different spheres:

In software, I've seen repeatedly how new hires with big ideas (and youthful confidence that they know a better way of doing things) will come into a company and want to rewrite things "properly." In one management position, much of my role involved the politics of defending a well-working, well-maintained, big-revenue-driving application from a steady onslaught of such ambitious exuberance.

At the time, I thought "this must be the same phenomenon that drives shrinkflation": new MBA arrives at pickle-making company, seeks to bolster their career by demonstrably saving the company millions, convinces execs to put one less pickle in each pickle jar, consumers won't notice; step 3: profit!! In 2022, when you buy a cereal box, it's half air on the inside. In 1980, it was only 20% air.

I also shake my head when media pundits equate the success of a particular congressional session with the amount of new legislation they pass. I can imagine no simpler way to cruft-paralyze a democracy (while following the rules) than "releasing laws early and often."

klodolph3 years ago

> In software, I've seen repeatedly how new hires with big ideas (and youthful confidence that they know a better way of doing things) will come into a company and want to rewrite things "properly."

In my experience, these new hires are right a certain percentage of the time, and wrong a certain percentage of the time. I can take no default position here.

I tell new hires that they're gonna see something that offends their sensibilities about the way code should be written. That puts them in the center of a conflict. On the one hand, they believe the code should be written differently. On the other hand, someone else wrote the code that way for a reason. Their first step is to dig in and figure out the reasons on both sides--why it was written one way, why they think it should be written another. The second step is to figure out what the consequences and risks are of making changes or not making changes.

The way I think about it, I want to simultaneously keep the software working, and protect the "flame" that new engineers have to improve things--something which can be all too easily extinguished.

strawhatguy3 years ago

That’s a sensible way to temper newbies’ enthusiasm with experience.

Newbies may be right that a system would be better rewritten, but they are often wrong in thinking they can do it ‘properly’, at least if they dive straight into it.

So making them use the Charleston’s fence approach is a good one.

strawhatguy3 years ago

Gah! Chesterton's

SQueeeeeL3 years ago

Shrink-flation is driven by short-term quarterly-earnings thinking. Reducing the amount of your product (and therefore its overall quality compared to competitors) is an act of cannibalization of a brand.

They may increase profits for a short time, but as consumers realize what's happened, they ultimately see the brand as not having standards, which leads to it becoming a commodity. A small group of long-term invested owners would never do such a thing, because that would be 'killing the goose that laid the golden egg', but modern shareholding has no loyalty to quality.

jrochkind13 years ago

I think libraries (used as dependencies by other software) and "top-level" software/applications (used by users directly, whether that user is a developer or not) -- have different pressures and concerns.

Libraries should be much more conservative. I don't think "release early and often" was said about libraries.

And yes, it's tricky that it's not always a clear line between these two categories. A unix command line utility is sort of top-level software, but also likely to be used in a scripted/automated fashion.

So it's not always cut and dry, but the more you are aware of software depending on your software, the more careful you should be with releases.

wongarsu3 years ago

A great example of an end-user application that is like a library is Microsoft Excel. Sure, the UX can change, but the formula engine has to be extremely conservative, staying bug-compatible with basically every version, or all hell breaks loose.

JohnFen3 years ago

I agree. As a developer using libraries and component, I really hate "release early and often".

As a user, I also hate "release early and often".

In both roles, releasing early and often effectively means I'm always using beta software, with all the headaches using such software brings.

However, as part of an internal development process -- that is, before customers see the release, "fail fast" is a totally legitimate and decent approach.

wpietri3 years ago

I think it depends a lot on where the library is in its lifecycle.

A new library should iterate rapidly and be very clear about that. Big warnings, obvious version numbering, and good communications. You look for an audience of actual users that can keep up with that. That's how you figure out what the right library is, both in terms of the big abstractions and the small details.

But then once you hit a 1.0 release, things change. You're shifting from an audience of fellow explorers to an audience who wants stability. You can still do exploratory work, but it has to be additive in the 1.x series, and eventually you need to start a 2.x series as you learn more about what needs to change. So the iteration still happens, it just happens away from the people who just want the basic thing you got right via the initial burst of iteration.

fendy30023 years ago

Then let me introduce you to https://0ver.org/.

paulryanrogers3 years ago

In defense of ReactOS they have a roadmap and are sticking to it. It's also very ambitious considering the resources they have.

wpietri3 years ago

I am suitably horrified.

tetha3 years ago

I notice the same pattern on an infrastructural level.

If I work on a test system, or a system a small number of people depend on... a year or two ago this would've hurt my pride, but why spend 4 hours planning a change if you can just muddle through the problems in 2 hours? The expected user impact is none or low, so move quickly, break things, fix things, document the problems and fixes. Everything's good.

Yet, I also have systems a dozen teams or more rely upon. For these systems, we have to move much more slowly and deliberately. We cannot touch some of our databases with larger changes unless we have a way in, a way out, a prediction of how long all of these will take, an announcement to the customers, and so on, and so on. We'd love to update those faster, but it's rough.

peteradio3 years ago

How do people disentangle "ship the incorrect thing and let customers let us know what they really want" vs. "ship the correct thing but built incorrectly and let customers know why it's wrong"? I hate the second with a burning passion, because it seems bound to give the impression that you really don't know what you are doing. I've been put into situations too many times where I just could not get the point across that "yes, I understand that we may not be giving the customers what they want, but what you are telling me to build is clearly not correct." It's maddening. Sorry if it's off topic, but it seems at least adjacent to the article's balance.

This seems to happen with PMs who do not have a deep enough background in the technical detail, and yet reject the feedback given because it would reflect that poor understanding. "Agile" environments seem to enable this kind of behavior. I'm beginning to think I need to pivot to PM in my career so I can avoid the madness as a developer.

MetaWhirledPeas3 years ago

My personal take is that "built incorrectly" should almost never be tolerated. The customer doesn't need to know how it was built, merely how long it will take. But if by "built incorrectly" you mean an unwise business decision (logging in with a phone number for instance) then yeesh, I wish I had an answer. Putting your foot down sometimes means leaving a job.

sumtechguy3 years ago

That depends on your customer. Some are cool with 'something is better than nothing'. Some are not. Some also are very 'I put it in the contract just do it that way' others are 'I just want the right thing'. Some just do not know what they want at all and just want you to do it and put all the liability off onto you but do not really want to spell out what that means at all. Being a PM is that turned up to 11 but now you do not control the code at all.

peteradio3 years ago

I guess what I'm talking about is this:

1) PO wants feature X for customer P.

2) Architect in meeting with PM and PO indicates that feature X is a simple addition to Y

3) Engineer is assigned X and says no this isn't a simple addition to Y because on deeper inspection it will fail for cases 1,2,3, we would need a different architecture.

4) PM says build it the naive way, let the customers find case 1,2,3 before we fix, we are an Agile team after all.

5) I quit.

nightpool3 years ago

Without details, it is really, really, really hard to say whether cases 1, 2, 3 are real, important issues that need to be fixed or just needless complexity that 99.9% of users are never going to care about. The PM is trying to distinguish between those two cases because they've been burned before by developers doubling, tripling, and quadrupling their estimates as they find "one more thing" that is inelegant or could potentially be improved, delaying the launch schedule. You need to convincingly make the case to the architect (and to the PM) that issues 1, 2 and 3 are critical to the users, fundamental to the design, and going to be much harder to clean up later if we don't spend the time to get them right now. Or you're wrong, and you should just suck it up and build the simpler thing today and leave the complexity of the new thing for the future, when you can decide whether the feature is even worth the cost of whatever new architecture it would require.

peteradio3 years ago

+1

flerchin3 years ago

Trust the Engineer. PMs care only for date.

MetaWhirledPeas3 years ago

+1

strgcmc3 years ago

Well, this isn't prima facie unreasonable without more info (or knowledge of the implicit assumptions you are making but not publishing).

- Are cases 1,2,3 named that way, because they are the top priority cases (i.e. the #1, #2, and #3 most important product features that customers care about)? Even if they are, what is the cost of a new architecture? Will it take you 3 years and 20 engineers, to rebuild Y or to make Y.v2, just so you can support X "properly"? By then the market may have moved on, the feature may be worthless, so it may make perfect sense to deliver a bad version of X that relies on Y.

- Or, are cases 1,2,3 legitimately either rare, or low-impact, or do have viable manual workarounds? If so, then it's entirely reasonable to defer/punt on doing new architecture right now, because either you know these cases are unimportant, or at least you don't have positive proof that these cases are important enough to justify new architecture. With more data, or clear customer demand, you can make a better case for rebuilding Y "properly". The real problem comes later: what happens if you do get strong signals of customer demand, you can prove the current solution is not scalable or extensible, and yet the business still decides that Y is good enough to never touch... well that's a business that doesn't want to stay in business.

Agile is about practicality/pragmatism, over adherence to dogma or preconceived notions. Just because Y is the wrong architecture to deliver X, does not mean it is the wrong decision to ship partial feature X. Don't be dogmatic about "correct architecture", if you care about for-profit software engineering as a profession.

Of course, if your goal is different, if SWE is a craft or a hobby or an ivory tower pursuit for you, then feel free to make whatever decisions you want that don't fit your vision of "correctness".

peteradio3 years ago

See this comment for more specifics: https://news.ycombinator.com/item?id=32953007

I think it's a little unclear there that by "case" I mean test cases that would have to pass to fulfill a single feature. E.g., "I need an addition feature for a calculator", but the naive implementation will work for 1+1 and fail for all others.

skellera3 years ago

Seems like you’re working with a PM who thinks agile is an excuse to release bad software. MVP should still solve a customer pain point.

To play devil's advocate though, maybe cases 1, 2, 3 are low enough risk to release. Having metrics set up to watch whether these are actually big problems could be “good enough.”

helge92103 years ago

> PM says build it the naive way

Why does the PM have an option to choose here?

> will fail for cases 1,2,3

a) tip QA to test these cases

b) add cases 1,2,3 to "Known issues"

After all that is the reason they call it "beta": "Cause it beta then nothing"

antupis3 years ago

I would do 4.5) escalate to the PM's boss every time 1, 2, 3 happens.

eloisant3 years ago

They're not opposite.

"If it ain't broke don't fix it" => don't try to "optimize" something that works well. Maybe it will be able to handle more load after your big refactor, but if it already handles the load nicely and there is no indication load will go up, don't bother.

"Release early and often" => release after each change; don't bundle everything into a big release that happens every 6 months.

The first one is about deciding what to do, the second one is how you release the work you did.

JohnFen3 years ago

> "Release early and often" => release after each change, don't bundle everything into a big release that happen every 6 months

After years of dealing with software that does this, I've soured on it completely. It has resulted in greater difficulty using software, and the software being less solid.

michaelcampbell3 years ago

I've seen this happen as well; perhaps for different reasons.

My early career was mostly fin-tech and when installing into banks or other financial institutions, they _very much_ want the every year or 6 mo release cycle. And not just because of "old stodgy", but rather they want a good bit of ramp-up time to ensure things are running right. In a Wall St trading firm I worked at, a release ran in parallel with real data for MONTHS before the old version was retired.

So, I'm used to that model.

From the customer's standpoint, they could "gear up" for the next release, get their ducks in a line, get the right people to assess things. I get that some of that is because the infrequent releases are "big", but also as a human customer it's fun to see a shiny new truck toy rather than a coat of anti-rust paint on its exhaust pipe, then a replacement for the broken window crank, then a new color on the odometer numbers, then ...

As a developer I also miss the Big Project Ramp Up, then a period of a lot of work (which can be Waterfall, or Agile, or bespoke artisanal project management, which is, pretty much 99% of the time, just CALLED "Agile"), then a Big Project Release (+ party, some down time, and prepare for the next one).

SaaS release early/often is like death by 1000 boring dull cuts. They cut out the spikiness _so much_ that there is so little novelty that it's just not fun to work on it.

mdtusz3 years ago

I've seen this happen as well, but I think it's more of a problem of bundling too small a changeset in a release. A partial implementation of a feature shouldn't warrant a release (without feature flags to disable it), but bundling a year's worth of work in a single release is, in my opinion, a recipe for disaster (and this is sadly something that happens all too often in enterprise and non-software engineering businesses).

drewcoo3 years ago

Exactly! False dichotomy.

jbverschoor3 years ago

Those are two different things... nothing to balance. But put it in a graph or drawing, and it means it's true and supported by 'science'.

henriquecm83 years ago

Release early, don't fix it

tomalaci3 years ago

Creating bugs? No, I am creating learning opportunities for more junior developers.

devteambravo3 years ago

Are you my manager?

BurningFrog3 years ago

"Move Fast and Break Things" at maximum speed!

badrabbit3 years ago

This is the way.

sedatk3 years ago

I mention this problem in my book Street Coder in the section titled "Fix it even if it ain't broken". There's value in changing code, but there's also risk in doing it.

One of my suggestions is that any component you avoid breaking deserves to be broken the earliest. The rigidity of code (the resistance to change and the inclination to break) is a sign of greater trouble in the future, so it should be addressed the soonest. That means I don't fully agree with the "don't fix it if it ain't broken" sentiment. If that component's so valuable, it must be covered by a hell of a lot of tests, making it impossible to break with changes in the first place. If you can break it without knowing, that's a problem that needs to be addressed immediately. I say: break it, identify the breaking points, add tests at those points, so the component becomes flexible.

My second suggestion is to make the changes, but discard them afterwards. I also emphasize this in the section titled "Write it from scratch". By doing that, you gain a certain level of insight into that rigid design and improve your understanding of the code base, which eventually leads you to the phase where you can finally code changes that you're comfortable sharing with your colleagues.

You may think of writing code only to throw it away as a wasteful exercise, but I argue that it's not as great a loss as you may anticipate.

silentsea903 years ago

How about use Feature flags and ship daily? Decouple your release-schedule/deployments from your product iteration/launch timeline.
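A minimal sketch of the decoupling being described: merge and deploy the code dark behind a flag, then flip the flag later without a new release. The names and the in-memory table are purely illustrative, not any particular feature-flag service's API:

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Purely illustrative in-memory flag table; a real system would read
 * this from config or a flag service so it can change without a deploy. */
struct flag {
    const char *name;
    bool enabled;
};

static struct flag flags[] = {
    { "new_checkout_flow", false },   /* merged and deployed, not launched */
    { "fast_search",       true  },   /* launched */
};

/* Unknown flags default to off, so half-finished code ships dark. */
bool flag_enabled(const char *name)
{
    for (size_t i = 0; i < sizeof flags / sizeof flags[0]; i++)
        if (strcmp(flags[i].name, name) == 0)
            return flags[i].enabled;
    return false;
}
```

Call sites then read `if (flag_enabled("new_checkout_flow")) { ... }`, which is what lets daily deploys safely carry work that isn't ready to launch.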

allendoerfer3 years ago

Never miss an opportunity to add complexity!

silentsea903 years ago

Centralized coordination complexity v/s decentralized FF rollout complexity. The bigger the team contributing to a deploy, the better the latter gets because engineers rolling out their FF have all the context they need.

int0x2e3 years ago

Feature flags, or another way to release bleeding-edge features to only some of your users, can sometimes be the best way to iterate over new features quickly when most of your user base is risk-averse.

generalizations3 years ago

Unofficial RedHat motto.

alldayeveryday3 years ago

I think I know your meaning, though I would rephrase it as decoupling your deployments from your releases/launches.

silentsea903 years ago

Edited.

exabrial3 years ago

The problem is a bunch of overpaid devs treating anything that isn't absolute bleeding edge as a "bug", and we get into an endless cycle of UI refreshes and updates that exist solely to exist.

m4633 years ago

I think there is a continuum...

server folks want nothing to change, ever.

desktop folks want the latest and greatest.

mobile folks get spurious changes forced down their neck and want... well who cares what they want, this is what they are getting.

pjbster3 years ago

Odd that Kaizen doesn't get a mention here. The article draws battle lines between the two ends of the spectrum but there could be a middle way.

In Kaizen, "Fail Fast, Fail Early" would never get out of the gate. The game here is all about making other, smaller bets which gradually ratchet the company to a better place without ever having to come off the rope.

Perhaps this organisational conflict is rooted in the realities of burning through capital whilst still searching for a sustainable revenue model.

In many ways the idea of Scrum is very Kaizen-like. It mandates experimentation and measurement, trying stuff even when what you have appears to be working, and documenting failed experiments and moving on. Nothing risky about any of it, but you wouldn't know it when you see the typical reaction to "how about we move the standups to the afternoon instead of first thing in the morning?"

The innovation pipeline described is encouraging but it's already too dogmatic in its proposed implementation. Kaizen is much more organic and, dare I say it, cultural. One of the key principles of Kaizen is empowerment and the innovation pipeline doesn't offer that at all.

gdsdfe3 years ago

“If it ain’t broke, don’t fix it” is the absolute worst mindset to have! I've seen systems that are literally garbage because of it.

PaulDavisThe1st3 years ago

TFA seems to try to pin DEC's collapse and failure on their commitment to backward compatibility. Yet Microsoft has essentially done the same thing with Windows, and has not suffered the way DEC did. Seems like a weak argument.

pdpi3 years ago

Lord Vetinari in the Discworld lives by the mantra that "If it ain't broke, don't fix it". Other characters noted that, under his governance, "things that didn't work ... got broken."

chuso3 years ago

Break it often and don't fix it.

GnarfGnarf3 years ago

"If it ain't broke don't fix it"

Would you fly on an airline that followed this principle?

Wait till the engine fails before fixing it.

In the air.

temporallobe3 years ago

We have a relevant case of this. We're on Angular 8 and the app is working just fine (the customer is happy, it's pretty easy to maintain, well-tested and hardened, etc.), but we are now facing having to update to 14.x "or else". It's gonna be a long and difficult migration.

kmitz3 years ago

Well depending on the size of your codebase, npm dependencies and Angular APIs used, it might or might not be a big deal. You can have a rough estimate of the complexity by running 'ng update'. Yes major releases come at a rapid pace and it is hard to always be on the latest version. We update regularly once we are 2-3 major versions behind and usually it's not so awful.

cratermoon3 years ago

I bet Equifax wishes they hadn't looked at their Apache Struts-based website and thought "It ain't broke".

jmartrican3 years ago

More often than not its "if it ain't broke don't fix it because we are too busy releasing early and often".

teawrecks3 years ago

The problem with "if it ain't broke, don't fix it" is that it's always broken for someone.

bradwood3 years ago

These are not mutually exclusive. It seems like the whole premise here is just lacking nuance.

kazinator3 years ago

Release early and often, but make sure you're fixing something broken each time?

unity10013 years ago

Just keep things backwards compatible, then do whatever you want.

arminiusreturns3 years ago

Release to prod so you can fix stuff in lower envs.

esaym3 years ago

And if it ain't broke, fix it 'til it is!

AaronM3 years ago

Error Budgets are your friend
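
To make the idea concrete: an error budget is just the amount of failure your SLO permits, and releases keep flowing as long as the budget isn't spent. A minimal sketch (the 30-day period and 99.9% target are illustrative assumptions, not from the comment):

```python
# Hypothetical sketch: converting an availability SLO into an error budget.
def error_budget_minutes(slo: float, period_minutes: int = 30 * 24 * 60) -> float:
    """Minutes of downtime the SLO allows over the period (defaults to 30 days)."""
    return (1.0 - slo) * period_minutes

# A 99.9% SLO leaves roughly 43.2 minutes of acceptable downtime per month.
# While the budget has headroom, you release early and often; once it is
# exhausted, you stop shipping features and fix reliability instead.
print(round(error_budget_minutes(0.999), 1))  # prints 43.2
```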

jillesvangurp3 years ago

Reducing cycle times for changes minimizes integration testing effort. Effort and risk increase non-linearly with the amount of change and time. So, release as often as you can to minimize the risk and effort while balancing the cost of doing so. Some release processes just involve a certain amount of heavy process that takes time and money. E.g. app store releases are a PITA, so doing that multiple times per day is not worth it. Server updates on the other hand are fine. Create a PR, tests pass, merge it, and it goes live right away. We automated that process. A decision to merge implies rolling out the change. The minimum cycle time is the time it takes to build and deploy (about 7 minutes for the PR and another 10 to deploy) plus whatever time we need to do a change. Some changes are as small as 1 character. We have a production branch that we merge to from our master branch. We test and use the master branch intensively and merge to production multiple times per week. No point in sitting on changes that work fine. Get it out and create some value for your customers and company, and keep feedback loops short as well.

Short cycle times are also why I use a rolling release linux distribution (Manjaro) and browser (Firefox). Always fresh and up to date. And even though I'm on the Firefox Beta channel, I never have to deal with it breaking or being unstable. It's stable because they have frequent nightly builds. By the time builds hit the beta channel they are already rock solid. I was on the nightly channel for a while and never experienced many issues there either. Great example of short cycle times. With the Beta channel I'm a few weeks separated from changes happening and me seeing the feature. With the release channel it's another few weeks.

Not updating because it ain't broken is very valid until the time comes when you finally have to upgrade and all hell breaks loose, because you are two years behind on dealing with breaking changes and have to do a massive project to make it happen. It was getting increasingly more broken while you were doing nothing; you just did not know about it. It's still technical debt. And now you get to deal with the non-linear effort to fix it and pay the price.

So, on all projects where I'm in charge we update everything very frequently. If something doesn't work I want to know ASAP and mitigate now instead of not even knowing stuff is not going to work for another few years. If you stay on top of changes like that, the effort for this is very low. Mostly stuff just works. Occasionally some library has an issue. And then we fix it, work around it or wait for the next version (and document why we can't update). Easy stuff. Basic project hygiene. The first thing I do when working on a project I haven't touched in a while is update dependencies. If I'm working on it all dependencies have to be current. I get annoyed with being a few minor versions behind. I might wait a few dot releases with major releases. But generally, I want to get that over with ASAP. If it breaks, I'll at least know that I need to deal with that. Rolling back is always an option.
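
The merge-implies-deploy setup described above might look something like the following in CI. This is a hypothetical sketch in GitHub Actions syntax; the comment does not name a CI system, and the script names and timings are assumptions, not the poster's actual pipeline:

```yaml
# Hypothetical sketch of a merge-implies-deploy pipeline (GitHub Actions syntax).
# Master is tested and used intensively; a push (merge) to the production
# branch triggers an immediate build and rollout, with no manual gate.
name: deploy-on-merge
on:
  push:
    branches: [production]
jobs:
  build-and-deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: ./build.sh   # assumed build script (~7 minutes in the comment's setup)
      - run: ./deploy.sh  # assumed deploy script (~10 minutes to roll out)
```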

scombridae3 years ago

tl;dr continually test, continually deploy.

sylware3 years ago

Grotesque generalities. The devil hides in the details: planned obsolescence, and redhat (IBM) with its glibc symbol versioning frenzy and no clean ELF ABI set of libs (as an example).

Yep. Open source is far from enough; open source that is lean and stable over time is required.

hulitu3 years ago

> Customers often ask me, why pour time into updates when what I have runs just fine? The answer is that a bug in your old version might assert itself at the worst possible time and force you to forego sleep for the next five days to recover

Why not do proper testing? I know it's expensive. And when you have an OS as an init system, it is even more difficult. It is sad that the UNIX philosophy is dying, being replaced with the Windows philosophy.

codegeek3 years ago

You cannot test everything to perfection. Shit will break in production. The question is: how to minimize shit breaking in production, and hopefully eliminate critical bugs that can cause revenue/data loss.