Disasters I've seen in a microservices world

256 points | 3 years ago | world.hey.com
throwawaaarrgh3 years ago

In the modern world of computing, the one thing I have seen literally everywhere I've been is people implementing systems that they absolutely do not understand.

Most microservices I've seen implemented are not actually microservices. Most teams (unless they are developing "simple" web or mobile applications) have no idea who is consuming their services, or how, or if what they've made is working correctly. They frequently don't have an understanding of the different models of releasing software, much less of a complete system architecture, failure modes, consistency models, reliability estimates, performance limits, etc. Mostly what I see time and again is a team of people who just write some code that seems to work for them, and then go home, without ever considering how or if it's working in the real world.

I don't know anything about modern computer science education. But it appears that new developers have absolutely no idea how anything but algorithms work. It's like we taught them how hammers and saws and wrenches work, and then told them to go build a skyscraper. There are only two ways I know of for anyone today to correctly build a modern large-scale computer system: 1) Read every single book and blog post, watch every conference talk, and listen to every podcast that exists about building modern large-scale computing systems, and 2) Spend 5+ years making every mistake in the book as you try to build them yourself.

It feels like the industry mostly just re-learns the same mistakes over and over like we're in Groundhog Day (we're the extras, not Bill Murray). But it's equally possible that I just lack perspective and am expecting too much. Maybe the auto industry at the turn of the 20th century also spent decades re-learning the same lessons over and over, as the knack of mass-producing complex systems continued to elude it. Hell, the new auto companies still don't get it right.

silisili3 years ago

> It feels like the industry mostly just re-learns the same mistakes over and over like we're in Groundhog Day

This is true, but also in life. I'm so sad I can't just pass my knowledge to my daughter to give her a 30 year head start. I remember telling her 'don't touch that stove, it's hot'; she nods, and soon after is crying because she burnt her hand. I wanted to be mad, but it made me realize the one universal truth: advice and information are great, but humans just seem to learn most from mistakes. It made me realize how much time and energy and knowledge is lost, as every human learns a lot of the same things over and over. I'm sure there was a caveman telling his kid not to touch the fire thousands of years ago, too. And here we are. So don't be too cynical and depressed about this cycle, it's pretty natural.

adjkant3 years ago

I just want to give a giant explicit +1 to this sentiment and understanding of how humans learn, and then add a bit from there to connect it back to the problem highlighted in the first comment.

IMO, there's a few ways to try and take a pragmatic approach to the problem above:

1. Make simple ways to do complex things. A library, a language, a framework, they all go after the same thing when designed well: batteries included complex functionality that the end user can't shoot themselves in the foot with. It seems like in this domain (microservices), we are still early and need better support for small companies to spin this up. I've found my current company runs microservices pretty well, but I know it took a TON of infrastructure, support, and education to get it to that point. Few companies can afford to have that.

The danger with this approach is that people who understand the internals become rare, and surface level only knowledge can lead to issues when you get to edge cases. I'll still take it, but wanted to call it out. The linux kernel is a great example of what that looks like down the line.

2. Make as many places as possible for developers in training to fail and make those 5 years of mistakes with low stakes consequences. In some ways, the way the industry hires already does this, but I'm sure there are better ways we can come up with versus the current sandboxes. If your frustration boils over, this is a place to channel it positively.

axaxs3 years ago

Thanks. I really like idea number 2. In training, have people fail nearly purposefully.

I think some of the problem is that there's no real standardized training for developers. Many have 4 weeks of 'bootcamp' and are then cut loose into the wild.

And I can't be too negative about it, I didn't even have that. I would just say I learned from 20 years of reading and mistakes.

hyperman13 years ago

There is another way: the junior dev does support and micro bug fixing. After a year of hell, you'll have a medior dev who knows the value of logging, troubleshooting, testing, and above all simplicity - a.k.a. someone who can be trusted to write code.

np-3 years ago

People shouldn't always just blindly accept what they're told as truth either - they need the ability to determine whether something is true on their own, and experiencing/verifying things yourself is a powerful way to do that. So overall it's a balancing act. There's also a lot of information that's lost when you merely tell someone something. There's just no way to know exactly how bad it feels to touch a stove unless you have been through the experience yourself; there's no way for a human to convey that level of feeling to another human verbally, and as such it's inevitable that humanity's curse is that we have to relive things over and over again.

ZephyrBlu3 years ago

I think you're slightly wrong here. I'd say most, or possibly almost all humans don't learn from other people's mistakes. But _some_ people do learn _some_ things from other people's mistakes instead of their own.

I don't think this is some sort of innate human thing, I think it's purely down to ego. You think you know more or you're better than the person who made the mistake, and therefore you don't listen to them because you would never do that.

strken3 years ago

It's easier to learn from other people making mistakes when it happens directly in front of you than when they tell you about it afterwards. I'm never going to use heroin, or try to move a broken leg without splinting it first, for example.

pfarrell3 years ago

  “There aren’t any new problems; only new engineers.”
  - Quote I read online a long time ago

Agree with all your points, but there is a third way: apprenticeship with older engineers. I advise new engineers to find their balance between not accepting the status quo and realizing software is way more complicated than they think. For example: successfully versioning an API over time while supporting live deployments is (I think) not something you’re going to learn in school.

As a further aside... I’ve heard older attorneys complain about newly minted lawyers. They work to pass school and then the bar. But when they start lawyering, new grads don’t know the first thing about how to actually file something at the courthouse. I’m saying, I don’t think it’s constrained to our industry.

sangnoir3 years ago

> Agree with all your points, but there is a third way: apprenticeship with older engineers.

That's unlikely to work: our industry has an ageism problem, combined with the low prospects of promotions/raises which results in engineers having to frequently job-hop to get ahead. This results in most organizations having very limited institutional memory. Other compounding problems are a discussion for another day (title inflation: e.g. "Senior engineer" with 2 years total experience)

Spooky233 years ago

I think computing is following industry. I’d argue that most companies are younger and dumber than they were when I entered the professional world about 20 years ago.

A few months ago, I engaged to help fix a troubled system. The thing was built around Kubernetes, and the architecture was basically aligned with how a bunch of contractors had been hired. Nobody really understood how it worked and it performed poorly. So poorly, in fact, that I reimplemented a core component in a shell script, mostly to quickly understand how it worked, and to my surprise, the thing ran faster on my laptop at moderate load than the actual system!

Sounds like a bad IT project, but to be honest whole companies run this way, saving pennies while setting dollars on fire.

mbreese3 years ago

> most companies are younger and dumber than they were when I entered the professional world about 20 years ago

Channeling a post from earlier today — I’m not sure if companies are younger and dumber today, but I’m sure that you are older and wiser now than you were then. And I’m sure you are better at seeing the idiocy in companies. Bad projects have always been around. But it isn’t until you have more experience that you can fully appreciate a bad project — particularly when you’re in the middle of it!

akiselev3 years ago

> Maybe the auto industry at the turn of the 20th century also spent decades re-learning the same lessons over and over, as the novelty of mass-producing complex systems continued to elude us. Hell, the new auto companies still don't get it right.

I think you have a misconception about what manufacturing is like. Waterfall development is just as unrealistic in manufacturing as it is in software, the scales are just different. People create CAD models of the machines in a factory all the time but those are architectural diagrams - when it comes time to build, the people on the ground (the equivalent of the software engineers) have to figure out all of the details, often using no more than trial and error guided by a little intuition. Experienced individuals know more about the pitfalls and gotchas but they're still creating something new even if they're going off some grand design. Becoming a competitive manufacturer in something simple like ball point pen balls is a years long process of mastering the "craft," gathering real world data, and incrementally improving. Factors like a single critical employee's physical disability can decide the layouts of entire factories so there are no cookie cutter factories for them to build.

The real world is a lot messier than software. Just like we spend most of our time chasing bugs and implementing features, they spend most of their time building or troubleshooting interfaces between machines and processes, optimizing, and so on.

collyw3 years ago

> Waterfall development is just as unrealistic in manufacturing as it is in software

Is it? Aren't a lot of the largest systems built using waterfall? They certainly were in the past. When I was taught waterfall at university you were allowed to reiterate steps, yet there is a constant straw man that you only get to move down the waterfall.

gonzo413 years ago

Well, maybe the way to compare software waterfall to manufacturing waterfall is that the CI/CD pipeline that's in use is the waterfall. But even car companies have a development stage that's messy, where they have to retool production lines. They manage it the same way we do, by making things as standard as possible. Like how most SUVs share the same chassis and only really differ in internal fit-out and body panels.

wildrhythms3 years ago

I attended a community college and learned Java, C#, PHP, HTML, CSS, Javascript... the classes were more or less an introduction to full stack web development. I remember when I transferred to a four-year school for my Bachelor's, the other students in the CS program (we were all Juniors) struggled to put together a year-end project requiring a GUI, like a text input and some buttons, in simple HTML/CSS or Swing (popular at the time). They didn't even know where to start with GUI development. Is this kind of thing still not being taught?

Izkata3 years ago

At my college, none of that was in the Computer Science (CS) curriculum - it was all part of Information Technology and Management (ITM).

CS was focused mostly on the broad underlying structure of computing: things like how software interacts with hardware (this was gutted a year or two before me), algorithms, programming paradigms (functional/etc), the theory behind databases (3rd normal form sound familiar?), and so on. The idea was to give you a really good foundation such that you could self-learn much more easily after college, but also, unfortunately, that you had to do some of that self-learning right away because it didn't prep you for getting things done.

I only took one or two ITM courses out of curiosity, so I'm less confident in describing it in broad strokes, but the impression I got is that it was for teaching specific skills while not "wasting" time on better understanding how they work. Very get-things-done oriented. The web development one, for example, used a lot of jQuery, but never touched on what was going on under the hood. I can see it working as preparation for being able to do a job, but it left you kind of stuck if you hit a roadblock and didn't know where to look for answers.

If I had to stereotype them: a CS student would spend time writing an algorithm for something instead of finding an existing library, while an ITM student would look for a library and, if they can't find one, say it can't be done. But the ITM student wouldn't get lost implementing an end-to-end project as long as they could use tech they already knew, whereas the CS student has never done such a thing before.

swrj3 years ago

I can only speak for my own experience (large, top-25 state school in the US) - this sort of thing is still not taught. There were maybe one or two optional courses that taught software engineering related material; the rest was just a lot of computer science theory. I definitely struggled the first couple of months at my first job out of college because of this.

barbarbar3 years ago

Welcome to agile development. We fix it in the next sprint. Except we don't - since we need the new button.

pm903 years ago

Computer Science curriculum in most universities is focused on theory. Mine had a large component of working with and understanding digital electronics, Operating Systems, Networking, Discrete Math. Most of the ‘popular’ electives were AI related. Very few courses on actually building maintainable and functional systems. Most of what I learned I had to learn from experience and from others I worked with.

jjav3 years ago

> But it appears that new developers have absolutely no idea how anything but algorithms work.

Possibly a case of teaching to the test (interview, in this case)?

My CS education is a couple decades back, but while we had an algorithm course, most credit hours were spent on more practical things like building compilers, building an OS, building a shell, SQL, networking stacks, encryption, software engineering complex systems and so forth.

Is the current CS landscape more algorithm-focused because the FAANGs have made that the only stick to measure by?

dagw3 years ago

When I went to school they offered both Computer Engineering and Computer Science degrees[1]. The Computer Engineering degree was more like how you described your degree (plus hardware) and the Computer Science degree was more math, algorithms and theory focused. I personally feel that this is the obvious way to do it.

[1]they later added a Software Engineering degree which focused more on architecture and designing large scale software system, software project management and so on.

diveanon3 years ago

Most software developers these days are just glorified technicians stringing together APIs.

There is a big bubble in tech salaries that is going to get popped as more and more people realize you just don't need a developer for most things and can get away with no-code for the majority of your needs.

This will act as a force multiplier for some, but I think for most developers it will see a correction in their salaries to more reasonable levels.

deckard13 years ago

I was going to comment on this very thing in the other thread about software glue[1]. In that article there is a youtube video on Multics vs Unix[2] which really outlines why microservices were always doomed.

Someone should coin a new law for how programmers have to rediscover Brooks's law every 5-10 years. The issue with microservices, as it has always been, is that you need an enormous company to brute force the communication pathways and maintenance overhead for it to all work. And by work, I don't mean function efficiently (as the Multics vs Unix video shows). I mean just function. Just work at all. The Multics team had all the devs and the Unix team was two guys doing laps around the Multics team. Because they had the mathematics on their side.
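The mathematics here is worth spelling out (a quick illustrative sketch in Python; the team sizes are made up). Brooks's observation is that pairwise communication channels grow quadratically, as n(n-1)/2:

    # Pairwise communication channels between n people: n * (n - 1) / 2
    for n in (2, 10, 50, 200):
        print(f"{n:>3} people -> {n * (n - 1) // 2:>5} channels")
    # 2 -> 1, 10 -> 45, 50 -> 1225, 200 -> 19900

Every service boundary adds its own producer/consumer conversations on top of that, which is why only an enormous company can brute-force it.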

Remember the bad old days of memory thrashing? That's what happens to teams that do not have enough bandwidth to properly maintain the dozens of services they are responsible for. Your organization gets frozen.

This is what we all get for taking advice from the Googles and Facebooks of the world. Google has like a billion lines of code in a monorepo. They do not do things remotely like 99% of the businesses out there. They are sitting on huge piles of money that lets them be incredibly inefficient for decades.

[1] https://news.ycombinator.com/item?id=27482832

[2] https://www.youtube.com/watch?v=3Ea3pkTCYx4

h43k3r3 years ago

Googler here. You are absolutely right. I would never ever do something the Google way outside of Google.

Google has a dedicated organisation for maintaining all the infrastructure required for and around microservices. We are talking about 500-1000 engineers just improving and maintaining the infrastructure.

I don't have to worry about things like distributed tracing, monitoring, authentication, logs, log search, log parsing for exceptions, anomaly detection on logs, deployment, release management, etc. They are already there, and it's all glued together well enough that 90 percent of engineers don't have to spend more than a day to understand it all.

collyw3 years ago

To me a microservices application is a monolithic application, with unreliable network calls between the components. That can't possibly make things simpler to understand or maintain.

jillesvangurp3 years ago

People do microservices for the wrong reasons. There are only a few somewhat valid reasons:

1) something has different run time needs than something else. Think CPU, memory, network. Breaking stuff up allows you to make different choices here. Although I have dodged this by simply deploying the same monolith and configuring it to do different things.

2) Something needs to be developed by a different team and for whatever reasons you don't want those teams to be too dependent. It's a bad reason but it's valid in a lot of companies where certain teams just need to be engineered around or where there is a fundamental lack of trust between different parts of the org chart. Conway's law is a thing. It's the most common reason to do micro services.

3) You have two things depending on each other (cyclical dependency) and you want to reuse that thing. Extracting it to a third thing is a common way out. It's true for almost any component technology. If you have two components, you'll find a reason to create a third. And a fourth. And so on. However, consider using something less dramatic. E.g. code libraries are a valid choice. Or having an extra module in your source tree.

Everything else is just needlessly/prematurely increasing overhead, deployment friction, etc. You get more things to monitor, deploy, manage the roadmap of, worry about, specialize in, etc. Big bloated organizations do microservices because they are big and bloated. Many smart startups keep this nonsense to a minimum. Of course some startups start out overfunded and bloat too early. VC money is great and sometimes requires over engineering like this (i.e. impress the suits). I've heard more than a few CTOs boast about their multi cloud strategy and microservice architecture. In my mind that translates to: we funnel a lot of VC money to Amazon and pay people full time to do just that. Ridiculous monthly bills and no users or traction is a common pattern in that world.

jmchuster3 years ago

Hmm, i don't think our needs fall into any of those three. But our reasons may just be, as you say, the wrong reasons.

We split up our services when the majority of changes made to a service can be made independently of everyone else. So then each of our services is a different codebase, and they each have a completely different rate of change from each other. You very rarely make changes across all services at once, only when changing what's being communicated between some set of services. Some services are large, some are small, and many of them fall into the category of: developed for a while, then stabilized, and now rarely ever touched - it just works.

So then the advantage is that you always have a small mental working set. You're focusing on one service at a time, and have less to worry about breaking everything else with your changes. And then when you deploy, even if everything goes horribly wrong, it's just your one service that is down; everything else is up and running, and you'll just have to process your queued messages once you come back online.

And then of course each service is smaller, so less code, less tests, faster to compile, faster to run through the pipeline, faster to release. And you're only ever doing rolling deploys on a small sub-section of your infrastructure, never the whole thing at once.

AlexCoventry3 years ago

Modular interfaces are great, but why require communication over those interfaces to go via network traffic? Can't you just use a library/package?

gravypod3 years ago

Some reasons may be:

1. Multiple languages/tech stacks. Have something like protobufs and now any language can "import" your library, instead of only things that can handle a C FFI.

2. State management. For example, redis couldn't be an in-process library. There are in-process kv stores but that's not the idea of redis.

3. Transparent monitoring for all languages. I can passively detect all retries, latencies, request counts, etc since there is a protocol.

4. Centralized fixes. Similarly to dynamic linking I can now update a single component and "fix the world" rather than redeploying every binary.

5. Canaries: I can deploy version N+1 to a mirror of production traffic and make sure it doesn't explode. Then I can route 1%, 10%, 50%, 100% of traffic and check those metrics as well.

6. If something goes horribly wrong I can rewrite the entire service for performance or maintenance or whatever within a tightly bounded amount of time. Discord has many blog posts about doing this for some of their services.

Great talk: https://www.youtube.com/watch?v=-UKEPd2ipEk

spaetzleesser3 years ago

That's what I'm always wondering. People can't manage libraries, so they think services will solve the problem. It seems to me that a lot of microservices are creating technical debt that will come due when the requirements change in a way that requires a big overhaul of the system.

jmchuster3 years ago

Because there is state?

rualca3 years ago

>>So then the advantage is that you always have a small mental working set.

You don't need microservices architectures for that. Even a clean architecture on a monolith ensures that you can compartmentalize the problem domain.

At most, just refactor out that responsibility into a library/component/module.

jmchuster3 years ago

It's a little bit hard to cleanly separate each grouping of database+cache+background tasks+apis+web ui.

Diggsey3 years ago

There are plenty more reasons than the three you've mentioned...

However, the one thing that's not a reason to use microservices and which the article brought up a lot, is to increase resilience. Microservices do not increase resilience: at best they can avoid reducing it.

One of the most compelling reasons for me to use microservices is to limit the potential damage of bad decisions.

In a monolithic application, one developer can have a bad day and write some code that eg. leaks a database object into an area of the program which should deal with business logic only. Or perhaps they try out some new technology that turns out to have problems.

If that manages to get through code review, and is then copied elsewhere by other developers who are just following the example, you can quickly end up in a situation where fixing the problem requires everyone to stop feature work so that a significant rewrite or at least refactoring can take place.

In most cases, the business cannot afford to stop all feature work in this way, and so the problem will persist forever, becoming a permanent drop in productivity. It will also make your developers miserable.

In a microservices architecture, a single mistake can only grow to the size of a single service. In the worst case you have to stop feature work on that one service for a period of time, but work on other areas of the product can continue uninterrupted.

Of course, you could still make a mistake whilst defining the boundaries between services. Luckily those decisions are much less frequent and involve many more people, so there's less chance of a freak bad decision. And even if you do get it wrong, you're only looking at two or three affected services rather than your whole product.

macspoofing3 years ago

>you can quickly end up in a situation where fixing the problem requires everyone to stop feature work so that a significant rewrite or at least refactoring can take place.

But that's normal and expected on any non-trivial codebase that has been evolving over some amount of time. We call that 'maintenance'. You'll never have 100% of your development resources pushing new features out. And if you do, well... we call that 'incurring technical debt'. You should be investing in this kind of maintenance continuously, because as your product evolves, new problem areas will emerge.

>In most cases, the business cannot afford to stop all feature work in this way, and so the problem will persist forever, becoming a permanent drop in productivity.

Yes.. That's called 'technical debt'. If your business does not invest in maintenance today, then it will cost them in the future. This isn't limited to software. If you're maintaining physical infrastructure, same deal. You're always fixing things. Microservices are not a solution to this.

>In a microservices architecture, a single mistake can only grow to the size of a single service.

Not necessarily. Microservices tend to be tightly coupled to other microservices. I've worked on a system that suffered from sporadic 'failure storms' where a failure in one service propagated across the entire system. It took days for us to track down the root cause. Microservices aren't a panacea. In fact, everything is easier with monolithic systems.

Diggsey3 years ago

> But that's normal and expected on any non-trivial codebase that has been evolving over some amount of time.

Having to stop all feature work is not normal. A healthy business should be able to have a certain proportion of its engineers working on tech debt at all times, it shouldn't have to stop all feature work to address tech debt.

candiddevmike3 years ago

A single mistake can only impact one service? If a microservice depends on said broken microservice, you could experience a cascade of failures, or possibly data corruption, because microservice development is typically predicated on trusting the other components following their API spec.

Diggsey3 years ago

See my other reply: I'm not talking about bugs, I'm talking about tech debt and design mistakes.

rualca3 years ago

> In most cases, the business cannot afford to stop all feature work in this way, and so the problem will persist forever, becoming a permanent drop in productivity. It will also make your developers miserable.

This scenario is only possible if the project has no automated testing, invests no effort in QA, and no one pays any attention whatsoever to whatever is running in prod.

Meanwhile, one of the basic features of CICD is automated rollbacks.

If this sort of problem happens on a project, the problem was not caused by a developer having a bad day. The problem is caused by an entire team enduring a culture that allows problems to fester without anyone doing anything about it.

Diggsey3 years ago

I'm not talking about bugs, I'm talking about tech debt and design problems. Tests can't catch those things, and you can't rollback six months of work...

yxhuvud3 years ago

Well, there are different cases of resilience. Creating a separate service for some part that has the potential for network traffic spikes can be fine and can make the system more resilient against that.

But you are right in that it makes the code resilience worse, as it makes logic discovery and overview harder.

DasIch3 years ago

Big organizations do microservices because it reduces the number of people who you need to coordinate changes with to a reasonably small number per service. This is critical for the ability to effectively maintain services and make progress at a reasonable pace.

Important to note here is that a single microservice can actually be quite large, with anywhere from 3-12 people working on just one or a few services. A single service could be larger than a startup's monolith.

I would argue that this is really the only reasonable use of microservices. If you can fully understand your monolith, you shouldn’t change to microservices. If you fully understand or even know about all microservices in your organization, chances are that you’re doing it wrong.

yakshaving_jgt3 years ago

As far as I can tell, there's no reason why any of the constraints you have described could not be solved with libraries, and a function call is always going to be less complex than a network request.

DasIch3 years ago

There are many reasons for why a library might not be sufficient. The most common one is that what you’re doing is stateful and requires some sort of data store.

That has a huge impact because now you need to set up the data store and potentially perform migrations, which means you may need to change how you do deployments. It may introduce a bottleneck such as number of connections to that store, so you now have limitations to think about when scaling.

raffraffraff3 years ago

How do you ensure that everybody uses the correct version of the library? If a library change comes with a database schema change, how do you coordinate? With microservices, the owner of the service is responsible for the database schema change and service upgrade. Nobody else needs to be involved. (Genuine question, I'm an infrastructure guy, not a software engineer)

yakshaving_jgt3 years ago

Where I work, we use Haskell's type system to keep all of these boundaries in sync. All of the "services" we have extracted are indeed libraries. These libraries use type classes to interface with the core of our application which manages data persistence. This also means that each "service" can be written mostly in isolation and use a different data store (like an in-memory DB or something), depending on our needs.

spaetzleesser3 years ago

"Important to note here is that a single microservice can actually be quite large with anywhere from 3-12 people working on just one or a few services. A single service could be larger than a startups monolith.".

This seems quite reasonable In some companies it seems it's more like 1 developer managing 3-12 services :-(

rualca3 years ago

> 1) something has different run time needs than something else.

This is supposedly the #1 technical reason to move to microservices: horizontal scaling + distributed system (in the rare cases the app needs low latency across the world)

The other one is being able to reuse services, whether provided by third parties or that you can download from docker hub/GitHub and run yourself.

> 2) Something needs to be developed by a different team and for whatever reasons you don't want those teams to be too dependent.

This is supposedly the single core reason to adopt a microservices architecture. This is what it says on the tin. If you want multiple teams to work independently on separate components, the components need to be loosely coupled and deployed independently.

This is not brain surgery.

> Everything else is just needlessly/prematurely increasing overhead, deployment friction, etc.

What else is there? The rationale for microservices was from the start organizational, with a lesser scope on operations/technical aspects.

jaredcwhite3 years ago

The problem with the hype around microservices was that it's the web app deployment equivalent of writing device drivers in C. Sure, some people can do it. Some people have to do it. Yet most people shouldn't even make the attempt.

I've been on teams where we can't even reliably deploy code over time to one service. The idea of our team maintaining multiple services is madness. That's not a knock on any individual's technical merits. Deploying code to the cloud is just hard, period—and that's even when talking about a traditional monolith!

I'm glad the pendulum is swinging back. The microservices pattern is useful for the areas in which it's useful…it just so happens that problem space is way smaller than the hype cycle cared to admit a few years ago.

dwaite3 years ago

> I've been on teams where we can't even reliably deploy code over time to one service. The idea of our team maintaining multiple services is madness. That's not a knock on any individual's technical merits. Deploying code to the cloud is just hard, period—and that's even when talking about a traditional monolith!

If devops has one good philosophy, it is this: if something is hard, you should do it more often. When something is an intermittent pain, it can be avoided or treated as a one-off; if you are deploying your web app, say, once a week or more, people will start to figure out how to automate the manual processes and optimize the slow ones, rewrite troublesome components, and come up with strategies for things like incremental database migrations.

More generally - everything is a trade-off, and you shouldn't blindly accept more complexity when you don't need the benefits that it is supposed to provide. But sometimes you need to embrace the complexity of doing things the right way when you are hitting up against limitations of doing it the wrong way (in this example - slow, painful, error-prone deployments)

Microservices are useful when the monolith becomes too complex for local reasoning and management, so you instead compartmentalize things into components with contracts and reason about those, while you are responsible for managing just your own component. You're taking on complexity (and latency, and additional resource utilization) because your system started to hit up against the limits of everyone working within the same project.

If you haven't hit the point where your database is falling over in production because of concurrent loads, or where development is hindered by the infrastructure/dependency or time requirements of doing a local test, or various other problems - you likely don't need to invest in doing microservices.

ownagefool3 years ago

I'd argue it's just the word micro.

People draw the boundaries out in very odd places when they're following microservices like a religion, but here's an example where it might not suck.

Pull out the auth service.

Should a startup with a single app do this? Probably not.

But as soon as you're big enough that someone's yelling at you to power multiple services, and you have compliance requirements that require you to solve protective monitoring, you probably have a project for a small team of experts, a reason to run this independently, and a stable enough API that you can do it well.

Alternatively you could pay Okta & co in blood. :)

pm903 years ago

The author doesn’t say the pendulum is swinging back. They’re providing many (valid) failure modes which were not evident at the start of the “revolution”. We’re just building tools that make it easier to build and operate micro services and being deliberate about when services should be split.

kevmo3143 years ago

> Some teams were suffering from servicitis. Even worse than that, it generated a lot of friction while developing. One could not just look into a project in their IDE, but it required to have multiple projects open simultaneously to make sense of all that mess.

This is real: I've worked in projects where purely transformational code was offloaded into a "service". Refactoring it into a library reduced lines of code, computational cost, and code complexity dramatically.

But wouldn't it be cool if there were a framework where the developer didn't have to demarcate where services started and ended? In principle, any pure asynchronous function could be abstracted out to a service. It would be neat if the compiler did that for me and deployment of the application was more like "deploy the cluster" instead of deploying each individual service.

aprdm3 years ago

Lol, that sounds like Java in 2011... put an annotation on a function and it's a service :) SOAP and all that.

Maybe they had something right! It did cause problems, though, when users put something in a tight loop that was actually a remote call and not easy to spot.

Anyho... what's old is new!

kitd3 years ago

The one I was thinking of was SCA. Define the interface and implement the backend, either as a service or library, and the SCA layer in between would handle how to call it.

rsynnott3 years ago

Also Java in 1996; remember RMI?

ourmandave3 years ago

I used RMI but put a queue in front of it.

One at a time, please wait your turn...

brown3 years ago

Next up, leftpad.io, leftpad as a service.

kevmo3143 years ago

Haha someone actually made this: http://left-pad.io/

coding1233 years ago

This is comedy gold:

> `left-pad.io` is 100% REST-compliant as defined by some guy on Hacker News with maximal opinions and minimal evidence.

slver3 years ago

> But wouldn't it be cool if there were a framework where the developer didn't have to demarcate where services started and ended?

Erlang/Elixir, for JVM Akka, for .NET Akka.NET.

jerf3 years ago

No, it's a bit of a myth that those techs just magically scale everything. They don't make that problem go away at all. You still need to find homes for the services to run on. It does make it a single function call to potentially start up a new service on a remote node, but if you need to be careful about resource use you're hardly any better off than you are with anything else. In a way, it can end up being too easy to just add work to systems rather willy-nilly; the ceremony in Kubernetes or any other more explicit system can be a good thing.

slver3 years ago

The question wasn't "how to magically scale everything".

The question was why not eliminate the difference between in-process async call and a service, and that's basically what Erlang and actors do.

wrnr3 years ago

That is what lots of actor frameworks are trying: first specify a DAG of transformations and then instantiate that graph on a physical network of computers such that the async boundaries can be scaled independently. It's cumbersome, like writing the same program squared just to get scaling for "free".

Personally I think the Persistency-as-a-library and Consistency-as-a-library architecture is a saner alternative.

MichaelMoser1233 years ago

servicitis sounds like the absence of an experienced system architect, or the absence of any kind of system architecture as such.

ai_ja_nai3 years ago

Erlang?

spaetzleesser3 years ago

I recently got talked into developing a new project with Kubernetes and microservices. It's an interesting journey but the complexity this adds is just enormous. Debugging is hard, refactoring is hard once it touches service boundaries, coordinating releases between services is hard and so on. I highly doubt that we will ever scale to a size where the complexity pays off.

I feel this kind of architecture especially appeals to people who like to write only new code instead of understanding existing code. They don't like to read old code to see where new functionality might fit in, so they spin up new services.

We are complaining about maintenance of old COBOL code but God be with the poor people who in 20 years will have to maintain the monstrosities we are creating today.

mech4223 years ago

>>We are complaining about maintenance of old COBOL code but God be with the poor people who in 20 years will have to maintain the monstrosities we are creating today.

I'm hoping for some major tax changes right before my retirement so I can dust off my COBOL and get phat lewt!

I wonder how many people that have actually written COBOL will be left by 2030? No one mentions it here, but I'm sure there must be programmers deep in the bowels of banks and insurance companies that are learning COBOL even now?

Oh - and RPG... I wonder if there's any of that left?

yurishimo3 years ago

There are definitely new devs learning COBOL. I'm not sure where I saw it, but I could have sworn I saw an article or blog post about these mission critical companies paying insane wages and signing bonuses to devs who would sign-on to learn & maintain their infrastructure. And since these jobs would likely be on prem for security reasons, they don't have to pay insane Silicon Valley wages either.

Imagine living in Cincinnati making $250k maintaining COBOL? Some people are totally fine with that paradigm and will keep the system running as long as it needs to.

mech4223 years ago

I was down until the on-prem thing..

Still, I wouldn't be surprised about insane wages for COBOL. You make money at the bleeding edge of tech, or way back on the tail end - scarcity breeds profit :-)

spaetzleesser3 years ago

I think most of the highly paid COBOL devs are a myth. I looked into this a while ago, and after talking to some people I got the impression that these jobs don't pay well at all. This may be the reason why nobody wants to do it. I bet otherwise they would have no problem finding plenty of people who are OK with making 250k in Cincinnati.

cratermoon3 years ago

> Timeouts, retries, and resilience

At a previous employer I was responsible for a critical service that was starting to show strain as traffic ramped up. It used Hystrix as the circuit breaker for calls to backend services, including the DB, and at peak times the thread pool would fill and start rejecting additional requests. I was tasked with fixing that.

There's a very simple formula for tuning the number of threads:

> requests per second at peak when healthy × 99th percentile latency in seconds + some breathing room

The catch is that getting good RPS and latency numbers in a distributed system deployed across three geographically separated datacenters is the opposite of simple. In particular, the legacy of the system meant that we had one write instance of the DB, in one datacenter, meaning that latency was different depending on which DC was the source of the call, so there was no one setting that worked for all instances.
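To make the formula concrete (a sketch in Python with made-up numbers, not the actual figures from that system):

    # Hystrix-style pool sizing: peak healthy RPS x p99 latency + headroom
    peak_rps = 300        # requests/second at peak, when healthy
    p99_latency = 0.060   # 99th percentile latency, in seconds
    headroom = 4          # breathing room for bursts
    pool_size = int(peak_rps * p99_latency) + headroom
    print(pool_size)      # 18 + 4 = 22 threads

The formula is just Little's law: average concurrency equals arrival rate times time in the system. As noted above, the hard part is that the inputs differ per datacenter, so no single answer fits all instances.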

sackerhews3 years ago

I was once working with an engineer so hell bent on splitting everything up into microservices that at one point logging became incompatible with his solution.

He then argued that our banking solution didn't need logging, because it was so well tested that failure rates would be extremely low.

I'm not making this up.

e67f70028a46fba3 years ago

Putting a network connection between your application abstractions was always a dicey proposition. (See EJB 1.0)

Making it the entire basis for application abstraction is lunacy, the sort of extremely clever idiocy that can only occur in the tech world.

discreteevent3 years ago

The funny thing is that Martin Fowler had a First Law of Distributed Objects: Don't distribute your objects. But then he switched and helped popularize microservices (he's rowing back a bit nowadays).

I can only assume it's because he saw his clients doing it and for a consultant the client is always right. (And I think the client was doing it to indulge their hard to get developers who wanted complexity and the freedom to use their programming language of the week)

It feels like a cowboy industry when this is the 'leadership' we have.

e67f70028a46fba3 years ago

Yep.

Look at the REST/JSON API debacle that has unfolded over the last two decades. It is trivially obvious that REST is both difficult to implement and largely pointless outside of a hypermedia system, but the thought leaders never said a thing about it.

edgyquant3 years ago

How is REST difficult to implement? And as for pointless, what is a better way to present an API in your opinion?

e67f70028a46fba3 years ago

JSON isn’t a hypermedium so creating a uniform interface is hard.

I don’t have a strong opinion on alternatives. GraphQL seems ok for server to server stuff.

tootie3 years ago

There are some valid cases for microservices, specifically when you have divergent scaling needs for different types of services. But almost everything else about them can be solved with good code modularity. I think microservices grew out of crummy package managers that either didn't work very well (npm, everything Python has tried) or were powerful but no one knew how to use them properly (maven, nuget).

gravypod3 years ago

How do your applications persist data? If you're running a compute resource today there's very likely a TCP connection between your application and your data somewhere.

Also, in things like kube, it's possible to schedule jobs next to each other so your TCP connection is essentially going over localhost.

There are many tricks you can do to have this even out perform a single process application.

e67f70028a46fba3 years ago

That’s why I said application abstractions rather than system abstractions.

slver3 years ago

Disaster #1 is services that are too small, and Disaster #4 is huge databases shared between many services.

Which reaffirms my overall opinion that most people writing services have no idea what a service is and what it encapsulates (it encapsulates its own state, for example; it's basically distributed OOP).

BTW, remember when "your service should be 100 lines of code, tops" was considered a best practice?

Why is it so hard for most to resist this hype nonsense wave when it's oncoming, and so hard to resist the anti-hype wave that inevitably follows it? Because hype and anti-hype are simply the oscillation of a sea of empty minds in search of a solution to a problem they don't understand.

I've been writing service oriented apps for decades. In my "world" nothing has changed.

unlimit3 years ago

> Why is it so hard for most to resist this hype nonsense wave when it's oncoming, and so hard to resist the anti-hype wave that inevitably follows it?

Because then how will one do career development? We are building something now for a client because the client wants it; I can feel it in my bones that what we are building is far more complex than it needs to be and will be difficult to maintain. I am no expert in microservices, but this hunch is just common sense.

slver3 years ago

In your situation it seems like a non-technical person in charge of technical decisions. Those decisions are by definition poor quality.

But the really bad moment is when the developers themselves make those bad choices entirely on their own.

unlimit3 years ago

> In your situation it seems like a non-technical person in charge of technical decisions. Those decisions are by definition poor quality.

The client is to blame. The client let go of all the people who knew the technical side of the product and has hired an architect to re-architect everything. And the client is in a hurry.

yxhuvud3 years ago

I think one major contributing reason is that the number of programmers is still rising so fast, with each new generation being larger than the previous. Every generation sees the tail end of the previous generation and rebels against it to solve its issues. That the previous generation will have moderated after a couple of oscillations doesn't matter, as there are so few of them in most workplaces.

jeffbee3 years ago

No, I don't remember when 100-line services were the "best practice" because I don't elevate every stupid idea that some inexperienced kid blogs about to "best practice". I know 1000s of people in the industry and exactly none who would agree that is a best practice. Successful large-scale systems are built around much larger services, like "deliver this e-mail message" or "store this blob of data".

slver3 years ago

You legit called 1000s of people in the industry to ask them what they think about the 100 lines meme. Great, appreciate your effort there. /s

aprdm3 years ago

Really good write up! I think the sweet spot in a < 50 eng organization is either a monolith or a microservice per domain (instead of per functionality).

Once you have thousands of engineers, then you either need extreme discipline and a huge team maintaining the "devops" pipeline that everything goes through, or, it's basically everyone for themselves and a "devops" team trying to help others setting standards, best practices and whatnot.

FpUser3 years ago

In places where micro/services were *really* needed, people/orgs were implementing those even decades back. I personally was doing it in the 90s. For myself, the criterion for making something a service was simple: it would really cost the organization not to have it as a service.

Now, as with many things, that nothingburger suddenly got overhyped and ended up being shoved into every hole regardless of any technical rationale.

trixie_3 years ago

Another one for http://microservices.fail/

There are so many developers who think microservices are the answer to every ill in the world it is infuriating. It’s like one of those ‘nosql is webscale we should use it’ conversions.

helge92103 years ago

> I've seen many engineers ignoring these because it's "an edge case", to realize later they have a massive data integrity problem.

Massive optimism detected in the "to realize later" part.

Also, if you have an "edge case" with a high (close to 1, but below 1) probability of a successful outcome, then with more and more tries the probability of all outcomes being successful goes to zero (a product of values below 1), while the probability of at least one failed outcome goes to 1 (the complement of that product). You still have to handle low probability edge cases.
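A quick check of that math in Python (the 99.9% per-try success rate is an assumption for illustration):

    p = 0.999  # probability a single try succeeds
    for n in (100, 1_000, 10_000):
        ok = p ** n  # probability that all n tries succeed
        print(f"{n:>6} tries: all succeed {ok:.5f}, at least one fails {1 - ok:.5f}")
    #    100 tries: all succeed 0.90479, at least one fails 0.09521
    #   1000 tries: all succeed 0.36770, at least one fails 0.63230
    #  10000 tries: all succeed 0.00005, at least one fails 0.99995

At any real traffic volume, the "edge case" is a certainty.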

tjpnz3 years ago

The E2E testing one is driving me nuts. I've told people time and time again that we'll never be able to have hundreds (or even a dozen) of them working reliably. Yet people still try, and to date we've lost even more hours trying to make them reliable. The idea is seductive, but in my experience it's an exercise in futility. At the same time I've proposed alternatives such as semantic monitoring, but people typically recoil in horror when they learn what it means.

throwawaaarrgh3 years ago

It depends on what your goals are. At Cisco, we absolutely needed to do E2E testing because we were shipping people physical devices that they relied on to work as intended. We built automation test frameworks and reusable/configurable tests. We built labs full of equipment and time-sharing systems to keep equipment constantly in use. Quality engineers wrote tens of thousands of (mostly) no-code tests, that would each do tens of thousands of permutations of tests. We wrote new test tools and bought high-performance test gear. In the end, we were able to do via automated testing what it would have taken thousands more people to do individually.

Your organization and product goals may be totally different and not need that level of testing. But if it is needed, you can do it. It's up to your business to decide how much to invest in it.

mavelikara3 years ago

A response someone wrote to this article: https://medium.com/productboard-engineering/countering-micro...

seibelj3 years ago

> Three languages

> ”At the heart of our Engineering Strategy is a rather simple document. A manifesto if you’d like. We call it Core Engineering Principles.”

> Lists 7 fairly complex rules

> ”There are about 30 more touching stacks, architecture, and org structure.”

I’m dying. This could only be written with a straight face by a microservices enthusiast.

zizee3 years ago

I read the full post linked to by the grand-parent, and I think you are mischaracterizing things.

Settling on 3 programming languages is a good compromise between "one language to rule them all" and "anything goes". Being limited to a single language's talent pool can be quite limiting, as is having only one language to solve problems in. Having no limits on language choice also has obvious drawbacks.

The "7 fairly complex rules" are not _that_ complex considering they are rules for trying to tame a domain that is inherently complex. If you try and scale any software engineering team much beyond 30 engineers you will need to have similar rules regardless of whether you have a majestic monolith, microservices, or some mix of the two.

ferdowsi3 years ago

It's interesting to hear about stability concerns. Overall I think my organization moving to microservices improved our resiliency story. It allowed us to freeze sensitive legacy services and gradually build other surrounding services that incrementally replaced those legacy services with better-performing Go services. Rolling out new services is not onerous due to our Kubernetes platform (which was nowhere near as difficult to build on as some might suggest).

Strong service boundaries helped us, they didn't hold us back.

mbrodersen3 years ago

If you are not a good enough developer to build monoliths then you are not a good enough developer to build micro-services.

gravypod3 years ago

> One could not just look into a project in their IDE, but it required to have multiple projects open simultaneously to make sense of all that mess

Why not both? You can set things up so your go-to-def can understand your API calls and head to the right place. This is very easy to do with a monorepo + protobuf setup.

> How much does it cost to spin 200 services in a cloud provider? Can you do it? Can you also spin up the infrastructure needed to run them?

Assuming 256 MB of RAM per service, 200 services is about 50 GB - still well within one-machine territory. Once you get above one-machine territory you can set things up so that you can:

1. Build really good integration testing tooling so that devs don't really need to interface with all services. In a test, spin up everything you need plus deps, run an API call, tear everything down (see the sketch after this list). This can be cached if your build system does that. You can run into issues if you have situations where one API call hits every service, but if you've done that you've already messed up. In those cases the best you can do is mock a step in the chain, run a few tests that hit the entire chain before release, but then have devs run against the mock in their integration tests.

2. Hybrid environments. You run a dev cluster that has all of your basic infrastructure that doesn't change much and provide a way for developers to launch new tasks that don't get routed to unless the driver has a feature flag flipped. Essentially you have a "dev" cluster that is continuously delivered from your master repo, each developer has the ability to launch new tasks in this cluster, and they can say "all traffic from alice for FoobarService should go to `{namespace=bob,service=Foobar}`.
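A minimal sketch of the spin-up/hit/tear-down pattern from point 1 (pytest, requests, and docker compose assumed; the port, health endpoint, and API are made up):

    import subprocess
    import time

    import pytest
    import requests

    @pytest.fixture(scope="session")
    def api():
        # Assumes a docker-compose.yml describing the service and its deps.
        subprocess.run(["docker", "compose", "up", "-d"], check=True)
        base = "http://localhost:8080"
        for _ in range(30):  # wait until the stack answers its health check
            try:
                if requests.get(base + "/healthz", timeout=1).ok:
                    break
            except requests.RequestException:
                pass
            time.sleep(1)
        yield base
        subprocess.run(["docker", "compose", "down"], check=True)

    def test_create_widget(api):
        # One real API call through the running stack, then assert.
        resp = requests.post(api + "/widgets", json={"name": "x"})
        assert resp.status_code == 201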

> As you can imagine, end-to-end tests have similar problems to development environments. Before, it was relatively easy to create a new development environment using virtual machines or containers. It was also fairly simple to create a test suite using Selenium to go through business flows and assert they were working before deploying a new version.

Why is it not simple anymore? I've implemented this at more than one company.

> Aside from being an obvious single-point-of-failure, defeating some of the service-oriented architecture's principles, there's more. Do you create a user per service? Do you have fine-grained permissions so service A can only read or write from specific tables? What if someone removes an index unintentionally? How do we know how many services are using different tables? What about scaling?

Come up with a convention for your company and stick to it. If you can automate it, that's better. If you build some way for the task you are running to know "who" it is, that information can be injected into other libraries. For example, you can inject the following environment variables into a container:

    FOOBAR_DB_ABCD=pg
    FOOBAR_DB_ABCD_PASSWORD=...
    FOOBAR_DB_ABCD_HOST=...
    FOOBAR_DB_ABCD_PORT=...
You can then have some library you write expose an `OpenDatabase("abcd")` that connects and injects everything. A security operator can then provision accounts transparently. If you generate those env vars from some automated config management tool, you don't even have to see the passwords.
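A minimal sketch of what that library could look like (Python and psycopg2 chosen for illustration; the `_USER` variable and the database name are assumptions, since the convention above doesn't spell them out):

    import os
    import psycopg2

    def OpenDatabase(name):
        # Reads the FOOBAR_DB_<NAME>_* convention from above; a security
        # operator provisions these, so the app never handles raw secrets.
        prefix = "FOOBAR_DB_" + name.upper()
        engine = os.environ[prefix]  # e.g. "pg"; engine dispatch elided
        assert engine == "pg", "only Postgres handled in this sketch"
        return psycopg2.connect(
            host=os.environ[prefix + "_HOST"],
            port=int(os.environ[prefix + "_PORT"]),
            password=os.environ[prefix + "_PASSWORD"],
            user=os.environ.get(prefix + "_USER", "app"),  # assumed variable
            dbname=name,  # assumption: DB named after the logical name
        )

    db = OpenDatabase("abcd")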

> Instead of having a monolith getting all of the traffic, now you have a home-made Spring Boot service getting all of it! What could go wrong? Engineers quickly realize this is a mistake, but as there are many customizations, sometimes they cannot substitute this piece for stateless, scale-friendly ones.

I don't think this is a single point of failure. This is a single point of failure for a specific subset of your infrastructure and at that it should be a very simple (mostly-pass-through) component. If your mobile gateway dies, your backend one shouldn't. If all of your API gateways die then your integrations to third parties that are required for legal compliance should stay up, etc.

> I've seen teams using circuit breakers and then increase the timeouts of an HTTP call to a service downstream.

You should always decrease timeouts for operations if you're attempting to retry calls. You can also use a load balancer that already knows the liveness state of all of your instances.
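One way to honor the "decrease timeouts" rule (a sketch, not a prescription; Python with requests): keep one overall deadline and give each attempt only the remaining budget, so retries shrink the per-attempt timeout rather than multiplying the total:

    import time
    import requests

    def get_with_deadline(url, deadline=1.0, attempts=3):
        # The overall budget is fixed; each retry gets whatever is left,
        # so a slow attempt steals time from its own retries.
        start = time.monotonic()
        for _ in range(attempts):
            remaining = deadline - (time.monotonic() - start)
            if remaining <= 0:
                break
            try:
                return requests.get(url, timeout=remaining)
            except requests.RequestException:
                pass
        raise TimeoutError(f"{url}: no success within {deadline}s")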

Aside from that one bit, I mostly agree with this section; what it says about timeouts, retries, etc. is correct. If you are tackling a single problem that's simple, don't break it into a distributed system. If you are saying "I want to have X but we should implement a Y", where Y is a completely different thing that doesn't need to talk to X directly, then why not implement it in a separate binary? There's no reason they can't share code to make the burden of operation low.

jameshart3 years ago

I guess you just won’t get traction on HN writing ‘Disasters I’ve seen in a monolithic world’.

If you have never gone through a significant dependency upgrade on a large monolithic codebase, you might not appreciate the value of microservice architecture. I’ve been at companies where the number one technical achievement for a tech organization for an entire calendar year was ‘successfully moved the app from .net 2 to .net 4’. Have you ever gone through the process of integrating an acquired company’s monolith into an acquiring company’s monolith? It’s futile. With a microservice architecture though I’ve seen acquisitions integrate significant systems (things like billing and auth) in a matter of weeks where monolith projects dragged on for years.

Sure, there are no silver bullets. But there are a lot of problems with monoliths which microservices eliminate. Not without trade-offs, naturally.

dspillett3 years ago

> I guess you just won’t get traction on HN writing ‘Disasters I’ve seen in a monolithic world’.

Though I think mostly that is because the discussion there has been well trodden already. Microservice Architectures are relatively new, less completely explored, and less familiar to many.

There are new and interesting mistakes to make (the biggest two being to use this model where it is not at all appropriate, or to cargo-cult it without really understanding its advantages and so not actually taking advantage of them), whereas most of the "interesting" problems to be had in the monolithic world have been well documented for quite some time (which is why new models pop up: to try to remove the potential for these problems in some, many, or even most cases; anyone who says method X is better in all cases should be treated with deep suspicion).

jeffshek3 years ago

In those organizations, I highly doubt microservices or monolithic decisions would have made a difference.

Disorganized management won't magically get fixed if the root problems (ASAP deadlines, shifting priorities, technical debt, etc.) aren't fixed.

The fact that the organization was trying to combine two large monolithic codebases into one is an obvious smell that technical decisions were made by someone non-technical.

jameshart3 years ago

Well yes, acquisitions are frequently decided by nontechnical people. That’s where the money comes from.

For better or worse, nontechnical people sometimes make decisions. One of the jobs of a software architecture is to be robust in accommodating the consequences of those decisions. Monolithic architectures frequently aren't.

jokethrowaway3 years ago

Microservices was a terrible buzzword and definitely made me re-evaluate the Kool-Aid I had previously drunk from Martin Fowler.

We already had services and service oriented architecture. Finding the correct size and establishing boundaries between services is a huge part of a correct architecture.

Building a monolith was a first step, then companies were branching out different parts in other services, when and if needed (eg. when doing acquisitions, as you mentioned).

Microservices just pushed companies to write services that were too small (I still remember thought leaders praising 20-line microservices) and to declare everyone doing monoliths a dinosaur.

Your ‘successfully moved the app from .net 2 to .net 4’ becomes `updated all versions of the logger in all microservices`.

jameshart3 years ago

Key difference for me between microservice architecture and service oriented architecture is the move away from a shared database. Switching to small, single purpose datastores is the much bigger shift in thinking than small, single purpose components.

ryanthedev3 years ago

I work in a 300+ project monolith. If you think microservices are an issue, I can't help you.

I have worked in both. Until you realize that both worlds have their pros and cons, just stop.

frays3 years ago

Great read, thanks.