
Learnings from 5 years of tech startup code audits

798 points · 2 years ago · kenkantzer.com
jacquesm2 years ago

That's a very interesting set of findings. What is important to realize when reading this is that it is a case of survivorship bias. The startups that were audited were obviously still alive, and any that suffered from flaws severe enough to be fatal had most likely already left the pool.

In 15 years of doing technical due diligence (200+ jobs) I have yet to come across a company where the tech was what eventually killed them. But business-case problems, for instance being unaware of the true cost of fielding a product and losing money on every transaction, are extremely common. Improper cost allocation, product-market mismatch, wishful thinking, founder conflicts, founder-investor conflicts, relying on non-existent technology while faking it for the time being, and so on have all killed quite a few companies.

Tech can be fixed, and if everything else is working fine there will be budget to do so. These other issues usually can't be fixed, no matter what the budget.

izacus2 years ago

I've found that the slowdown from tech debt has killed as many companies as any other issue. It's usually caused by business owners constantly pivoting; being too slow on the pivot and too slow to bring customer wishes to fruition (due to poor technical decisions and tech debt) is probably one of the top 5 reasons for dead companies I've seen.

jacquesm2 years ago

That's a good point, tech debt can be a killer. But the more common pattern I've seen is that companies that accumulate tech debt but are doing well commercially eventually gather enough funds to address it; the companies that try to get the tech 'perfect' the first time out of the gate lose a lot more time, and so run a much larger chance of dying. The optimum is to allow some tech debt to accumulate but to address it periodically, either by abandoning those parts through small, local rewrites (never the whole thing all at once, that is bound to fail spectacularly) or by having time marked out for refactoring.

The teams that drown in tech debt tend to have roadmaps that are strictly customer-facing work. That can get you very far, but in the end you'll stagnate in ways that are not easy to fix; technical work related to doing things right, once you know exactly what you need, pays off.

bcrosby952 years ago

Maybe once you get to that stage it doesn't really matter. Maybe if you're going for a billion dollar earth shaking idea, it doesn't really matter.

However, I've worked for a small company for quite a while now. We've had several successful projects and several failures.

In my experience, technical debt taken too early can easily double the time it takes you to find out if a project is a dud. That matters to us.

My general rule is: push off technical debt as late as you can. Always leave code slightly better than you found it. Fix problems as you recognize them.

I think a big mistake developers make is thinking "make code better" should be on some roadmap. You should be making code better every time you touch it. Nothing about writing a new feature says that you have to integrate it in the sloppiest way possible.

jimbokun2 years ago

> I think a big mistake developers make is thinking "make code better" should be on some roadmap. You should be making code better every time you touch it. Nothing about writing a new feature says that you have to integrate it in the sloppiest way possible.

I vehemently agree.

One of my first jobs was working for a mathematician at a bank, who could code well enough to compute his mathematical ideas but was not a software engineer, so he hired me to do more of the coding for his team.

He would say "Jim, just get this done but don't spend time making it fancy." In other words, don't spend time refactoring or cleaning up the code, expressed in his own words.

I would say "sure" and then proceed to refactor and clean up the code as I went. It took less time than it would have to write the code "ugly" then deal with the accumulated tech debt, and I finished everything on time so he was happy.

bombcar2 years ago

Even just commenting on weirdness you discovered while working on tech-debt code can be invaluable; a little note on the two hours you spent on it could save days later when trying to figure out why something isn't working. Sometimes the problems can't be fixed right then, but you can mark them so that later, when it does break, you have a hint as to what is going wrong.

jakey_bakey2 years ago

Exactly this. People forget that the point of the metaphor is that debt is a tool you use to grow faster.

Credit card debt (e.g. sloppy code and test-free critical backend processes) is pretty bad and should be paid down ASAP.

Mortgage debt (e.g. no UI tests on the front-end) is quite safe and you can kick the can down the road.

azemetre2 years ago

In my experience, when you don't design and deliver code with testing or accessibility in mind, you end up rewriting entire components. This all drastically adds to the end costs. Most leadership thinks this is "efficient" but it's not really. If you do it correctly the first time you can consistently deliver features throughout the entire year, rather than having to take several months to duct-tape everything to keep it from falling apart.

I never liked the "debt" metaphor. If a housing developer neglected to build a proper foundation, would you call that "debt"? I feel like it's very similar; it's a bad metaphor for a concept that has very little to do with finance.

p_l2 years ago

That's missing the case where the tech debt results in lowered commercial performance, as the things necessary to keep customers happy enough to provide the cash flow get harder and harder to deliver.

fullstackchris2 years ago

The double challenge here is doing all this while essentially keeping it in the background, out of any customer sight. Even if you know exactly what you need after a while from a business perspective, you still need to reimplement it in a way that doesn't cause your product / service / platform to lose customers. I find this always to be an extreme challenge. It's a bit of a treadmill: doing it this way (without causing breaking changes) certainly takes longer. So it all piles up into a big messy stack of work :)

nonameiguess2 years ago

I feel like the biggest problem from a business strategy perspective is that all we have are these personal opinions and gut feels. Even this article mentions having done 20 code audits, but presents nothing but qualitative findings. Ideally, some business school out there would be embedding researchers in randomly selected startups to know for sure how often you fail because of tech debt versus failing because of worrying too much about tech debt. That's an empirical question, yet all we get are informed expert opinions, with no auditable, reproducible research evidence. It's all so unscientific.

Not to say you're wrong, but we have no real way of even deciding. All I can do is lean on my own experience, but I've seen nowhere near every product team out there and the ones I have seen are nothing close to randomly sampled or blinded.

theptip2 years ago

It’s one of those systemic health type things. It’s really hard to die of tech debt on its own, but if you move slower, you’ll die more often from other shocks.

Another way of thinking about it is that you have N months of runway, and based on your velocity you can pull off a pivot in M months, and the more tech debt, the more time it will take to successfully pivot. If you don’t have a full pivot worth of runway remaining, and you need to pivot, you die. (Of course this oversimplifies by holding pivot magnitude equal but hopefully this illustrates the point.)
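
To make the arithmetic concrete, a toy model (a hedged Python sketch; all numbers invented):

    # Die if the debt-inflated pivot takes longer than the remaining runway.
    def survives_pivot(runway_months, base_pivot_months, debt_drag):
        # debt_drag: fractional slowdown from tech debt, e.g. 0.5 = 50% slower
        return base_pivot_months * (1 + debt_drag) <= runway_months

    print(survives_pivot(12, 6, 0.5))  # needs 9 of 12 months -> True, survives
    print(survives_pivot(12, 6, 1.5))  # needs 15 of 12 months -> False, dies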

I do agree that away from the margin, companies that are incredibly successful can afford to punt harder on tech debt. I suppose “know thyself” might be useful advice here; it’s probably not good advice for the median startup to ignore tech debt completely IMO.

I think the main point though is to optimize for agility; tech debt can let you move faster in the short and even medium term, so sometimes it’s right to tactically take on some debt. But not so much that you get bogged down later; make sure you carve out time to fix the stuff that starts to be painful.

phphphphp2 years ago

I’m not sure I agree. Technical debt is a symptom, it’s the consequence of bad management that leads to working on the wrong things.

If you’re running a startup and haven’t yet found your feet in terms of a product offering, and you’re building your product(s) in such a way that technical debt builds up through continuously layering half-baked on half-baked, it’s indicative that you’re not actually pivoting and not actually evolving, you’re just adding new half-baked ideas to a half-baked system… and being able to do that at twice the speed isn’t going to address the real problem: half-baked ideas don’t make a product, whether that’s 10 half-baked ideas or 100.

My experience is that any company in which evolution/experiments/pivoting is constrained within the boundaries of what already exists because of the sunk cost fallacy has made a grave error at a leadership level, not at a code level. If you can’t validate something without mashing more code into your system, that’s the problem to address.

I’ve seen companies with horrendous tech debt die, and you could certainly frame their death as being a consequence of the tech debt (“if they had just got the perfect system…”) but that assumes the perfect system would somehow prevent them from making the mistakes that got them there in the first place. It wouldn’t. The technical debt is an expression of their mistakes, not the cause. You could dump the perfect system at their feet and they’d be surrounded by garbage again a few years from now.

icedchai2 years ago

I worked at a company that was mired in tech debt. At least 4 different UI frameworks were in use, one of which was totally unsupported. Multiple versions of the app were left accessible, with links from the new to the old, because the new version was not feature complete. "Feature flags" were expressed in at least 3 different ways. It was a nightmare to figure out if something was on or off, and why. The back end was based on an unsupported language version, with several old, deprecated third-party packages as a result. The company appeared organized, superficially, but at the lower levels of implementation it was a total dumpster fire.

They were constantly "pivoting", but leaving the old junk around.

goto112 years ago

Technical debt is a sensible strategy when you are a startup aiming for growth. If you become successful, you can hire enough developers to pay back the debt in due time. If you fail, the debt doesn't matter.

Take Facebook: they built an empire on PHP. Now they have built some clever compilers on top of PHP in order to make it safe and performant without breaking their existing piles of legacy code. Overall this is probably ridiculously inefficient compared to just using a safe and performant platform from the beginning. But using PHP in the beginning allowed them to move fast in the critical growth phase.

Tabular-Iceberg2 years ago

>If you become successful, you can hire enough developers to pay back the debt in due time.

How many successful companies actually do this?

At one point in my career I transitioned from a startup to one that had been acquired some 12 years before, and found it to be even more chaotic than the startup. Instead of playing a frantic game of whack-a-mole with all the pivots and feature ideas of the founders, you had a few dozen teams playing whack-a-mole with the pet projects of their respective product managers, who were trying to make a name for themselves. That was much worse, because you had to coordinate with every one of those other teams, and of course work with all the integrations with the parent company.

Charitably speaking, maybe these older successful companies are bad simply because the field of software engineering was still too immature when they came about, and today's startups will actually pay back their debt when they become successful in the future. Sure, we have better tools now than then, but we still don't have a static analysis tool that can determine if we built the right or wrong thing for an ever changing market.

quickthrower22 years ago

Sounds like a version of the mythical man month. Throwing 100 or 1000 developers at it will not reduce tech debt alone. It is probably harder to eliminate debt with more developers.

randomdata2 years ago

I have not found working on the wrong things to be problematic so long as you take the time to eliminate the wrong things once they have established themselves as being wrong.

Not taking time is, at heart, where tech debt is born. That can manifest as debt across all areas of the development process. Pressure to not take time can certainly come from management, but I have also witnessed the reverse on numerous occasions, where management asks developers to slow down and take the time to do the work well; sometimes to no avail.

Either way, your underlying thesis is quite true that given the perfect system an imperfect team will quickly reintroduce said problems into the system. This is why many software companies have become hyper-picky (even before the tech crash) about hiring. They want to try and avoid building a team that wants to shortcut that time.

JamesBarney2 years ago

What killed them was that they never found PMF (product-market fit). Eventually the tech debt slowed them down so they couldn't take as many swings at finding it.

But in the counterfactual, if they'd tried really hard to avoid tech debt, that would have slowed them down at the beginning. Not to mention there are plenty of organizations that write very complex, abstract code to avoid tech debt and end up making the code base incredibly painful to work with. So overall, did they get fewer swings?

I've worked on a lot of old code bases and the biggest issues I've run into, issues that crippled development velocity, were 95% boneheaded decisions and overengineering. And never the types of code quality issues someone like Uncle Bob talks about in Clean Code.

spdionis2 years ago

Well, why are those companies pivoting so often in the first place? Isn't the root cause probably in GP's list?

new_stranger2 years ago

And keeping 100% of all features instead of removing the least-used features as you add new ones to keep tech debt from growing indefinitely and reaching a point where new features take months to ship.

brainzap2 years ago

we need data on this

trivialsoup2 years ago

> What is important to realize when reading this that it is a case of survivorship bias.

This is totally true, but taken too seriously it leads to inability to learn anything from almost any information whatsoever. What’s more, whatever you do (whether you take the advice of those who have gone before or not), you will not be able to decide whether you made good decisions or merely “survived”.

How does one proceed when anything can be survivorship bias, and determining cause and effect for large-scale operations like running a business is essentially impossible?

(When I say “anything can be survivorship bias” I specifically mean that no matter the cohort you cannot decide whether you’ve accidentally excluded unknown failures, and hence you have no assurance of the actual strength of any analysis you do).

leroman2 years ago

> In 15 years of doing technical due diligence (200+ jobs) I have yet to come across a company where the tech was what eventually killed them.

Not my experience..

What does a “tech failure” look like? Do the servers catch fire? Is the web site down? Maybe people are unable to login to their stations?

Hi-tech business is “Tech”, so the failure of the business is in fact the tech failing. More specifically, the business was unable to direct the tech to solve real problems and solve them well enough.. New hires took too long to onboard.. Engineers were only superficially productive.. Communication between the stakeholders and engineers was lacking.. etc.. etc..

Take note that in all scenarios above “work” is being done, “progress” is being made.. ceremonies are everywhere and success is seemingly around the corner.. Or is it?

It’s just very hard to see these issues, they are hidden under layers of meetings, firings, hiring, pivots, milestones with little progress in actual business value.

marginalia_nu2 years ago

I think the harder you scrutinize the distinction between a tech problem and a business problem, the harder it becomes to find it.

When there appears to be such a distinction, that's usually a manifestation of something like Conway's law, a symptom that there exists an unhealthy organizational divide between business and technology.

leroman2 years ago

I suppose that is my point: sayings such as "I haven't seen a single start-up failing due to tech" are not possible to defend..

jackblemming2 years ago

>relying on non-existent technology while faking it for the time being and so on have all killed quite a few companies

This doesn’t count as a tech problem?

jacquesm2 years ago

Obviously not. It's a problem to decide to fake non-existing tech, but that is more of a management decision than a problem with the technology itself. There is an infinite number of things that don't exist, no matter how much you want them to exist, and if you are not capable of coming up with a working solution and rely on the world around you to move fast enough to bail you out, then I would say that is a psychological problem more than anything else.

A common theme right now is 'AAI', using people to fake an AI that may not come into being at all, let alone before your runway (inevitably) runs out.

lozenge2 years ago

I saw one "secretary AI" that schedules meetings over email in your calendar. Just cc it to start using it (once you signed up). The idea seemingly being, fake it with low cost outsourcing to prove there's a demand for this and then make it.

The developers you'd hire to make it an actual AI and the developers you'd hire to make it a Mechanical Turk are very different skill sets.

stonemetal122 years ago

I wouldn't count Theranos' failure a tech problem. I would consider it a fraud problem.

pphysch2 years ago

I'd hazard a guess that many cases of startup fraud start out as good-faith delusions of grandeur, and only pivot to bad-faith fraud when the founders realize it's the only option to keep the lights on. Because the product results aren't there.

That is, plan A is Stripe, plan B is Theranos.

jacquesm2 years ago

You may well be on to something here. But I've seen a couple where plan A was Theranos.

mathattack2 years ago

I was in a startup that failed in part due to tech issues. The AI model just didn’t work. There were a lot of other problems but if the tech worked, they could have easily gotten paying customers.

rr8082 years ago

> yet to come across a company where the tech was what eventually killed them

I would think that a poor quality product, or one not as good as competitors would be a big killer. Google, Facebook, Amazon have amazingly superior products. I think you're missing something.

aiisjustanif2 years ago

> In 15 years of doing technical due diligence (200+ jobs) I have yet to come across a company where the tech was what eventually killed them.

How about the cases where it caused fines due to failed security compliance? That certainly didn’t help the situation. Thinking of fintech companies especially.

jacquesm2 years ago

I've yet to come across a company killed by fines, are there any examples of those? If anything I think the fines are still much too light.

lupire2 years ago

The fines are trivial in the US.

thewarrior2 years ago

Interesting. Have you written more about this somewhere? If not, you should.

jacquesm2 years ago

Never really thought about it, I'm typically under NDA but in aggregate I could probably do something with this without breaking those NDAs.

dzonga2 years ago

> Simple Outperformed Smart. As a self-admitted elitist, it pains me to say this, but it’s true: the startups we audited that are now doing the best usually had an almost brazenly ‘Keep It Simple’ approach to engineering.

I've written about this before: as an industry, we have made writing software complex for complexity's sake.

> imagine a world where there wasn't a concept of clean code, just writing code as simply as possible. not having thousands of classes with indirection.

what if your code logic was simple functions and a bunch of if statements. not clever right, but it would work.

what if your hiring process was not optimizing for algorithm efficiency but that something simply works reliably.

imagine a world where the tooling used by software engineers wasn't fragile but simple to use and learn. oh the world would be a wonderful place, but the thing is most people don't know how to craft software. but here we're building software on a house of cards [0]

[0] - https://news.ycombinator.com/item?id=30166677

TrackerFF2 years ago

Hot take: The current trend of writing code, AND hiring engineers, is the way it is because everyone thinks they're gonna be the next FAANG-sized company, and need to be able to write FAANG-quality code and engineer FAANG-quality architecture from the start - with respect to scalability.

Have you seen the personal blogs of devs today? What should be a simple HTML + CSS website with the simplest hosting option possible, is now written in a framework with thousands of dependencies, containerized, hosted on some enterprise level cloud service using k8s.

That's great and all if you suddenly need to scale your blog to some LARGE N number of readers, but the mentality is persistent: when one should be focused on core features and functionality, in the simplest way possible, you're instead bogged down with trying to configure and glue together enterprise-level software.

Maybe it's a bit unfair to put it that way - a lot of engineers know the various systems and services in and out, and prefer to do even the simplest things that way. But I've lost count of how many times I've encountered devs that BY DEFAULT start with the highest level of complexity, to solve the simplest problems, for no other reason than "but what if" and "it feels wrong that it should be that easy".

kevin_nisbet2 years ago

So a couple of thoughts.

> Have you seen the personal blogs of devs today?

I don't know that this is a fair comparison, because side projects can be, and often are, a way to explore ideas, understand tech, play around, etc. So I don't know that I'd agree it's a great extrapolation of the way an engineer works, based on side projects or a blog that may have different objectives.

I do agree with the sentiment though, that we want to be watching for indicators to how a team member approaches problems.

> the way it is because everyone thinks they're gonna be the next FAANG-sized company, and need to be able to write FAAANG-quality code and engineer FAANG-quality architecture from the start

I don't know that it's fair to say everyone, but it is something I agree companies, especially startups, should filter for. When I acted as hiring manager and was trying to build SRE, as an example, I would continuously remind candidates and the team that we're not Google. So while we want to bring in ideas and approaches from what Google has published as "SRE", we do need to consciously leave large parts out, as appropriate to our needs and stage of maturity.

Calamitous2 years ago

I disagree that they go down the path they do because they think they’re going to be FAANG-sized, but rather it’s a case of cargo-culting, “we’ll use these tools/architectures because the best companies use them, therefore they must be the best tools/architectures.”

brianlweiner2 years ago

I don't even know if it's cargo culting as much as engineers using their day jobs as opportunities to learn marketable skills for job hopping.

Nearly every new technology introduced at places I've worked was because someone was keen to get it onto their resume.

eugenekolo2 years ago

I believe this might fall under "resume driven development"

giraffe_lady2 years ago

I think your general point is true but the personal blogs of devs angle is maybe not the most illustrative one.

We tend to apply industrial strength tools to our personal projects because it's some combination of what we already know, or we're trying to learn or refine an unfamiliar skill.

If you just gave me a linux shell I would not be able to confidently provision a secure webserver for static hosting. But I do know how to write cloudformation and deploy it. Sure this is a personal moral weakness by the standards of HN whatever, but it's where my career has led me so these are the tools I have.

ryandrake2 years ago

> If you just gave me a linux shell I would not be able to confidently provision a secure webserver for static hosting. But I do know how to write cloudformation and deploy it. Sure this is a personal moral weakness by the standards of HN whatever, but it's where my career has led me so these are the tools I have.

I wouldn't say it's a moral weakness, maybe more of a failing of the tech education ecosystem. It seems bizarre to me that in software, we teach complex high-level things before we teach simple low-level things. Programming students learn very complex high level languages in year 1, and then maybe by year 3 or 4 learn assembly, or what a CPU register is, or how RAM and cache works. It's like teaching a carpenter how to build a high-rise apartment building before teaching them how to measure or use a hammer.

giraffe_lady2 years ago

Well I didn't have any formal tech education, just what I picked up on the job and through my own curiosity.

But I mean you don't teach a car mechanic metallurgy and aerodynamics, except to the extent they'll need to apply that knowledge towards specific goals. At some point the discipline is mature enough that people genuinely don't need to, and can't, know every level of it from the ground up.

I think coding is approaching or already at the point where "cs/fundamentals of computation" should be a different degree from "professional software development."

lupire2 years ago

I don't think anyone using Jekyll or whatever for their blog is doing it because they use Jekyll at work.

photochemsyn2 years ago

side note: FAANG is an obsolete acronym according to The Economist, it's now Microsoft - Amazon - Meta - Apple - Alphabet, leading to the new acronym, MAMAA, which has the nice result that we can now talk about the outsized influence of Big MAMAA in the tech world.

jamil72 years ago

I never understand this idea of picking an arbitrary set of language features and saying, “what if your code logic was simple functions and a bunch of if statements”. The complexity won't magically go away, it'll just appear in a different set of problems.

ryanbrunner2 years ago

I think it's helpful to divide complexity into complexity in the business logic / problem you're trying to solve, which cannot be eliminated from a pure technical perspective (you should still try to simplify it through discussions with stakeholders though!), and complexity that isn't necessary to solve the problem.

Oftentimes the latter category could be necessary if you were at much higher scale, or if the business evolved in some way, etc., which is where this sort of stuff tends to originate. Just yesterday we were talking at my company about extracting a service in Go, since it's very high scale, very simple, and doesn't change much. On one hand, it's pretty likely we'll need to do that at some point, but on the other, it's not causing any issues right now, so there's not much point in doing it at the moment. Had we gone forward, that would have added complexity for a theoretical concern that may or may not happen in the future.

dzonga2 years ago

If this were the case we wouldn't have the AbstractFactory problem that has plagued the Java ecosystem. If this were the case Golang wouldn't be here seeking to simplify things by not having classes, and having err handling like it does; it's not pretty but it works. I pick on Java because its ecosystem is broad. However, the over-engineered complexity that resides there makes you wanna stay away.

Too2 years ago

One could also use this example to argue the other way: the AbstractFactory pattern would not be needed if Java had had a richer feature set to begin with, in this particular example anonymous functions (which I believe it nowadays has). Patterns emerge when the foundation isn’t solid enough by itself to stand on.

People needed modularity, DI and callback functions (essential complexity) but since the only way to do that with the language was classes, you had to invent AbstractFactory pattern (accidental complexity).
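
A minimal sketch of the difference in Python, which has first-class functions (names invented for illustration):

    def open_connection():             # stand-in for acquiring some resource
        return "connection"

    # Accidental complexity: a class whose only job is to defer one call.
    class ConnectionFactory:
        def create(self):
            return open_connection()

    def run_job_with_factory(factory):
        print("using", factory.create())

    # Essential complexity only: pass the function itself; the class vanishes.
    def run_job(make_connection):
        print("using", make_connection())

    run_job_with_factory(ConnectionFactory())
    run_job(open_connection)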

goto112 years ago

Everyone pays lip service to simplicity, but in reality simplicity is really difficult.

If you have seven conditions driving a decision, a bunch of if's might be the simplest implementation. If you have hundreds of conditions, a tree of if's becomes impenetrable. There is no one-size-fits-all when it comes to simplicity.

Some problems are inherently complex. You can't design a payroll system or tax calculation system which is simpler than the set of rules and regulations it has to implement.
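
The usual escape hatch when an if-tree stops scaling is to push the conditions into data. An untested Python sketch, with invented rates that are nothing like real tax law:

    # A table the code walks, instead of a hundred-branch if-tree.
    TAX_BRACKETS = [              # (upper bound, marginal rate)
        (10_000, 0.00),
        (40_000, 0.20),
        (float("inf"), 0.40),
    ]

    def marginal_rate(income):
        for upper_bound, rate in TAX_BRACKETS:
            if income <= upper_bound:
                return rate

    print(marginal_rate(25_000))  # 0.2

But note this only moves the conditions into the table; the rules themselves don't get any simpler.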

lupire2 years ago

> If you have hundreds of conditions, a tree of if's becomes impenetrable.

I mean, it worked for Amazon. I saw the code.

goto112 years ago

Fair enough, but you probably wouldn't call it simple.

commandlinefan2 years ago

> a tree of if's becomes impenetrable

Even in that case, a tree of if's isn't that bad (it's not great), but far worse is when you have the same set of if statements copied and pasted around dozens of places. Because you will forget to update one of them at some point.

thinkharderdev2 years ago

Thousands of classes with indirection is not clean code, and "write code as simply as possible" is a tautology. Of course it should be as simple as possible. The interesting question is what counts as simple.

Setting that aside though, the author seemed to mostly be talking about architectural simplicity in the article. He specifically called out "premature move to microservices, architectures that relied on distributed computing, and messaging-heavy designs" which I think is spot on. Distributed systems are fundamentally hard and involve a lot of difficult tradeoffs. But somehow we have convinced ourselves as a profession that distributed systems are somehow easier.

papito2 years ago

The hardest job as a software engineer is to come up with simple and obvious solutions to a hard problem.

Or you can stitch together eight different cloud services and let someone else debug that crap in prod. Not to mention subpar performance and an astronomical cloud bill.

naijaboiler2 years ago

It takes a lot of knowledge, experience and smarts to find simple solutions for hard problems

samhw2 years ago

> write code as simply as possible is tautology

What? That has nothing whatsoever to do with tautology. It's just a statement you agree with. If everyone else agreed with it, it might at most be a truism or an uninteresting statement, but evidently they do not. (They might claim to, but reality shows they optimise for other things - in my experience the simplest work, which does not always mean the simplest code, especially when you're accustomed to the mystic rituals of the Javanese tribes.)

thinkharderdev2 years ago

Fair enough, maybe tautology is the wrong word, but I do think everyone agrees with it. Who ever says "we need to have more complicated code"? The question is how do you define simplicity, because it is not always obvious. Every overly-abstracted mess I've ever seen was done in the name of "simplicity". Basically, let's add an abstraction so we can "simply" swap in another database in the future, or handle X hypothetical use case by only changing configurations. Likewise, I've seen 1500 line methods with dizzying, incomprehensible control flow that was nevertheless composed entirely of "simple" if/then/else statements. And a well chosen abstraction or two made things much simpler to read, understand and modify.

cgdub2 years ago

Thousands of classes with indirection is absolutely Clean Code. It's in the book.

nyanpasu642 years ago

PipeWire is an example of building a Linux audio daemon on "microservices, architectures that relied on distributed computing, and messaging-heavy designs":

- It takes 3 processes to send sound from Firefox to speakers (pipewire-pulse to accept PulseAudio streams from Firefox, pipewire to send audio to speakers, and wireplumber to detect speakers, expose them to pipewire, and route apps to the default audio device).

- pipewire and pipewire-pulse's functionality is solely composed of plugins (SPA) specified in a config file and glued together by an event loop calling functions, which call other functions through dynamic dispatch through C macros (#define spa_...). This makes reading the source less than helpful to understand control flow, and since the source code is essentially undocumented, I've resorted to breakpointing pipewire in gdb to observe its dynamic behavior (oddly I can breakpoint file:line but not the names of static TU-local functions). In fact I've heard you can run both services in a single daemon by merging their config files, though I haven't tried.

- wireplumber's functionality is driven by a Lua interpreter (its design was driven by the complex demands of automotive audio routing, which is overkill on desktops and makes stack traces less than helpful when debugging infinite-loop bugs).

- Apps are identified by numeric ids, and PipeWire (struct pw_map, not to be confused with struct spa_dict) immediately reuses the IDs of closed apps. Until recently, rapidly closing and reopening audio streams caused countless race conditions in pipewire, pipewire-pulse, wireplumber, and client apps like plasmashell's "apps playing audio" list. (I'm not actually sure how they resolved this bug, perhaps with a monotonic counter ID alongside reused IDs? There's a sketch of that idea below.)

I feel a good deal of this complexity is incidental (queues pushed to in one function and popped from synchronously on the same thread and event callback, perhaps there's a valid reason or it could be removed in refactoring; me and IDEs are worse at navigating around macro-based dynamic dispatch than C++ virtual functions; perhaps there's a way to get mixed Lua-C stacktraces from wireplumber). I think both the multi-process architecture and ID reuse could've been avoided without losing functionality. Building core functionality using a plugin system rather than a statically traceable main loop may have been part of the intrinsic complexity of building an arbitrarily-extensible audio daemon, but I would prefer a simpler architecture with constrained functionality, either replacing pipewire, or as a more learnable project (closer to jack2) alongside pipewire.
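
For what it's worth, the generation-counter idea speculated about above looks roughly like this (a Python sketch of the general technique, not PipeWire's actual code):

    # Slots are reused, but each reuse bumps a generation counter, so a stale
    # ID held for a closed app can never alias a newly created one.
    class SlotMap:
        def __init__(self):
            self.slots = []   # list of (generation, value) pairs
            self.free = []    # slot indices available for reuse

        def insert(self, value):
            if self.free:
                idx = self.free.pop()
                gen = self.slots[idx][0] + 1   # bump generation on reuse
                self.slots[idx] = (gen, value)
            else:
                idx, gen = len(self.slots), 0
                self.slots.append((gen, value))
            return (idx, gen)                  # the full ID is both parts

        def get(self, key):
            idx, gen = key
            slot_gen, value = self.slots[idx]
            return value if slot_gen == gen else None   # stale ID -> miss

        def remove(self, key):
            idx, gen = key
            if self.slots[idx][0] == gen:
                self.slots[idx] = (gen, None)
                self.free.append(idx)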

layer82 years ago

> we have made writing software complex for complexity's sake.

I think it’s rather that complexity naturally expands to fill up the available capacity (of complexity-handling ability). That is, unless conscious and continuous effort is spent to contain and reduce complexity, it will naturally grow up to the limit where it causes too many problems to be viable anymore (like a virus killing its host and thus preventing further spread). This, in turn, means that the software industry tends to continually live on the edge of the maximum complexity its members can (barely) handle.

ryandrake2 years ago

> I think it’s rather that complexity naturally expands to fill up the available capacity (of complexity-handling ability). That is, unless conscious and continuous effort is spent to contain and reduce complexity, it will naturally grow up to the limit where it causes too much problems to be viable anymore

I disagree that this is something that "naturally" happens. A lot of this thread is about how adding complexity is either a deliberate choice made by software developers or just that the developer simply was never taught how to do it the simple way--both of which illustrate a gap in software development education. When the tutorial about How To Create a TODO App starts with "Step 1: Install Kubernetes", I'd argue we have an education problem.

layer82 years ago

I’d argue that the fact these choices are being made is natural (otherwise you’d have to explain what the “unnatural” root causes are), and preventing or counteracting them exactly requires the conscious and continuous effort mentioned.

bacza22 years ago

The problem is that 'simple' means a different thing in a small codebase than in a big one. A bunch of if statements in a codebase that is small enough to understand in full is OK, but when it becomes big it's hard to understand the flow of data.

I do favor simple code, but some complexity/abstraction is needed to make it easier to understand.

emn132 years ago

But picking the right abstractions that aren't leaky in any of the aspects you really care about is critical, hard to measure (leakiness isn't obvious, nor what kind of aspects you care about), hard to get right, and hard to maintain (because your abstraction may need to evolve, which is extra tricky).

Obviously, getting that right makes subsequent developments much, much easier, but it's hardly a simple route to success.

ItsMonkk2 years ago

I see tech debt and simplicity as a mixture between 'tyranny of small decisions' and each individual coders 'cleanliness' level.

Each individual coder has a code cleanliness level, similar to how every friend's Mom growing up would always remark "Sorry the house is a mess" when it was spotless. If you're used to a 9/10 and it's a 7, that looks like a wreck. If you're used to a 5 and it's a 7, that looks great. I urge other coders to increase their cleanliness level, and to look for others with high cleanliness for guidance. If you are coding next to people to whom a 5 looks good, no matter how much they try to pay down technical debt, they never will.

I think the tyranny of small decisions is ultimately showing us that the tooling we currently have makes it much trickier than it should be to evolve those abstractions. Partially this is because bad abstractions caused bad tooling, and bad tooling caused bad abstractions. Because it's so difficult, we don't do it. We take the small decision and work slightly harder in a slightly buggier environment to get the new thing done. But of course now the problem is bigger, which means it's even less likely we'll ever actually pay down that debt.

> “I’m sorry I wrote you such a long letter. I didn’t have time to write you a short one.” – Blaise Pascal

commandlinefan2 years ago

> there wasn't a concept of clean code but to write code as simply as possible

Sounds good "on paper" - in fact, is tautologically true - but it's hard to find two people who agree on the definition of "simple". You say "not having thousands of classes with indirection", and I've definitely seen that over-design of class hierarchies create an untouchable mess, but I've seen designs in the other direction (one giant main routine with duplicated code instead of reusing code) that were defended as "simple".

llanowarelves2 years ago

A lot of complexity comes from premature scaling due to cargo cult or ergonomics.

But I'd argue a lot of complexity and bugs come from poor/unclear/conflicting thinking. Especially when the code crosses boundaries between multiple developers who had to modify it but didn't truly internalize that part/design of the code.

roflyear2 years ago

A bunch of if statements can be described as not simple. Some things in code can only be described as simple. Do those things.

mattbillenstein2 years ago

I've seen most of the architectural problems in consulting - it's amazing how a team of clever engineers can take a simple thing and make it sooo convoluted.

Microservices, craptons of cloud-only dependencies, no way to easily create environments, ORMs and tooling that wraps databases creating crazy schemas... The list goes on and on; if you're early, you could do a lot worse than create a sensible monolith in a monorepo that uses just postgres, redis, and nginx and deploys using 'git pull' ...

claytonjy2 years ago

The worst architecture I ever saw came from consultants, who built the initial bits of a startup I was hired into. It was nice to have a no-longer-present scapegoat to shake fists at when frustrated, but over time I came to realize their most maddening choices were at the behest of one of our founders, who had no software experience.

teaearlgraycold2 years ago

I saw the same thing. Founders asking the world of consultants who would try to deliver and then fail to be a responsible engineer. I started my previous job by telling the founders they were asking for the wrong things and the consultants work needed to be thrown out. Thankfully they listened and we ended up with a TypeScript monorepo monolith deployed to Heroku.

bornfreddy2 years ago

Nitpick: no need for Redis if you have Postgres. It can have comparable performance when similar tradeoffs are used.
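
For example, an UNLOGGED table skips the write-ahead log, trading crash-safety for speed, which is roughly the Redis tradeoff. An untested sketch assuming psycopg 3 and a local database:

    import psycopg
    from psycopg.types.json import Jsonb

    with psycopg.connect("dbname=app") as conn:
        conn.execute("""
            CREATE UNLOGGED TABLE IF NOT EXISTS cache (
                key        text PRIMARY KEY,
                value      jsonb,
                expires_at timestamptz
            )
        """)
        # Upsert, like a Redis SET with a TTL.
        conn.execute(
            "INSERT INTO cache VALUES (%s, %s, now() + interval '5 minutes')"
            " ON CONFLICT (key) DO UPDATE SET value = EXCLUDED.value,"
            " expires_at = EXCLUDED.expires_at",
            ("user:42", Jsonb({"name": "example"})),
        )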

dalyons2 years ago

That’s just not true as a categorical statement. Performance aside, Redis has all sorts of interesting data types, operations and primitives that pg doesn’t, that you might want to leverage. It fulfills a different role.

moneywoes2 years ago

Can you elaborate? Is postgres viable for caching?

commandlinefan2 years ago

> Microservices, craptons of cloud-only dependencies, no way to easily create environments, ORMs and tooling that wraps databases

So, Spring Boot you mean?

noisy_boy2 years ago

A Spring Boot service doesn't have to be a microservice - you can happily fatten it up into a monolith. Cloud-only dependencies would come into play for Spring Cloud (or something that is using cloud-specific features) - for a "vanilla" CRUD app, they are not needed. Creating virtual/physical environments is out of Spring Boot's scope and better left to external tools, though it has support for separate environments via profiles. ORMs/tooling that wraps the database doesn't have to be part of Spring Boot - using Hibernate/JPA isn't mandatory; a plain JdbcTemplate with hand-written SQL would work fine.

lifeisstillgood2 years ago

>>> Business logic flaws were rare, but when we found one they tended to be epically bad.

oh yes ...

I always bang on to my junior staff that their job was known as "analyst programmer" for a reason. The analyst part probably matters even more than the programmer part. In large companies, just discovering what needs to be coded is 90% of the job (securely coding it within the constraints of the enterprise is the other 90%, while the final 90% is marketing your solution internally).

Anyway .. yes

watwut2 years ago

> In large companies just discovering what needs to be coded is 90% of the job

Yes, but that is a quite massive dysfunction of those companies. Meaning, we can yell at analyst-programmers as much as we want; what really needs to be fixed is the process that makes finding out requirements so ridiculously hard.

And yes, I work in one of those companies, it very clearly is dysfunction.

carlmr2 years ago

I think this can only change when we, as a society, expect code literacy from every person that finishes high school.

I don't mean expert programmers, but at least being able to read basic pseudocode algorithms.

It's hard to describe a problem if you don't even understand any language.

lifeisstillgood2 years ago

oh hell yes. Software literacy in my book (30,000 words still no end in sight) is literally, literacy.

Look, I automate almost everything I can see. And where I put in effort and focus, the software is a force multiplier for my brain (or a bicycle for the mind, if you like).

But so often, in a large company or in normal life, there is a great gulf that the virtual world cannot - yet - cross. But more and more, we shall.

One thing that's just silly is I take photos on my iPhone of bills and letters. I cannot be arsed to navigate the awful Dropbox API, but I would like to store them under "insurance" or whatever. Fuck having some AI monster read the bill. So I played with Pythonista and can just run an action after a photo - and it gets moved. It's my solution, not an app. And that's software literacy - where you can write, not on paper, but on the world.

sicp-enjoyer2 years ago

The "magic AI" has undone years of coaching management about software expectations.

commandlinefan2 years ago

> discovering what needs to be coded is 90% of the job,

But you still have to predict based on a two-sentence description in a JIRA ticket how many "story points" it's going to take with 95% accuracy a dozen times within the span of a single "sprint planning session" every two weeks.

grvdrm2 years ago

Oh my god - I hadn’t heard the phrase “story points” in a few weeks and now I will have nightmares tonight!

zeristor2 years ago

This goes with doing the first 90% of the work, then the second 90% of the work then the last 90% of the work.

And engineers multiplying their initial estimate by 3, the project manager then multiplying that by 3 and rounding it up to be ten times more than the initial estimate.

carlmr2 years ago

>I always bang on to my junior staff that their job was known as "analyst programmer" for a reason.

I can't help but think about Tobias Fünke. Especially with you banging on your junior staff.

OJFord2 years ago

I suspect it's a British (perhaps commonwealth) colloquialism - 'to bang on [about something]' is to go on and on and on talking about it, with some implication of 'too much' or obsessiveness.

(Also, notice it's 'bang on to' the staff, not 'bang on' them. That is, the staff are the indirect object; the thing which is being said - banged on about - is the direct object.)

lifeisstillgood2 years ago

Yes, I bang on to my staff (talk endlessly to them) rather than bang my staff (have sex with them) ... or another colloquialism, to "bang my staff" which is a solitary activity that frankly you can guess from here.

etblg2 years ago

The world's first combined analyst and programmer -- an Analrammer for short.

carlmr2 years ago

Nice, I was thinking Analpro, but that's also good!

chiefalchemist2 years ago

> ...just discovering what needs to be coded is 90% of the job,...

Absolutely. The tech part is relatively easy. Deciding what to build, that's where the friction and magic happens.

lupire2 years ago

Your wording is ambiguous.

Are senior staff also analysts? Why or why not?

huma2 years ago

> Generally, the major foot-gun that got a lot of places in trouble was the premature move to microservices, architectures that relied on distributed computing, and messaging-heavy designs.

Finally, someone said it

wuliwong2 years ago

It is interesting. I've been at a company for a few years now and we've been slowly trying to break apart the two monoliths that most of the revenue-generating traffic passes through. I am on board with the move to microservices. It's been a lot of fun, but also a crazy amount of work, time, and effort has been spent to do this.

I've pondered both sides of this argument. On one hand, if this move had been done earlier it might not have been as difficult a multi-year project. On the other hand, when I look at the Rails application in particular, it was coded SO poorly that if it had just been written better initially, it wouldn't even need to be deconstructed at this point. Also, if the same engineers that wrote that Rails app had tried to write a bunch of distributed, event-driven microservices instead, we would probably be in even worse shape. (ᵔ́∀ᵔ̀)

BackBlast2 years ago

Have you considered a serious refactor instead of a migration?

I mean, just start with a cleanup session and proceed from there. Work on one bit at a time and don't get too far from a working system.

ravedave52 years ago

Are you me? o_0. Shockingly similar situation.

thatsnotmepls2 years ago

You two might be colleagues lol

galdosdi2 years ago

Usually a link to a humorous YT video would be inappropriately uninteresting on HN, but this classic and brief satire of microservices is actually quite on point about precisely what is so dangerous about a microservices architecture run amok

https://www.youtube.com/watch?v=y8OnoxKotPQ

Summary: really trivial sounding feature requests like displaying a single new field on a page can become extremely difficult to implement and worse, hard to explain to stakeholders why.

suzzer992 years ago

This was 100% true for that startup I worked for as a side job. They would have been so much better off just building a standard java, PHP or .NET back end and calling it a day.

The head engineer (who had known the guy funding the thing since childhood) had no clue how node, stateless architecture, or asynchronous code worked. He had somehow figured out how to get access to one particular worker of a node instance, through some internal ID or something, and used that to make stateful exchanges where one worker in some microservice was pinned to another. Which goes against everything node and microservices is about.

I tried to talk some sense into them but they didn’t want to hear it. So for the last six months I just did my part and drained the big guy’s money like everyone else. I hate doing that - way more stressful than working your ass off.

7speter2 years ago

It's kind of discouraging to see the part where he says almost no one gets web tokens right the first time. Working on projects as someone entering the industry, it's pretty clear that security is the most important part of a web app, and it's seemingly so easy to get woefully wrong, especially if you're learning this stuff on your own and trying to build working CRUD projects.

photon122 years ago

It's a chicken egg problem. Developers use JWTs because it's what they think they know. Companies build libraries to support what developers are using. Security engineers say JWTs are easy to screw up [1]. Newer frameworks offer ways to move off of JWTs. New programming language comes out. New frameworks built for that programming language. What is someone most likely to build first as an integration? What developers are using. JWTs become defacto for a new framework. Security engineers report the same bugs they've seen. Even more languages and frameworks come out. Rinse. Lather. Repeat. Write up the same OAuth bug for the 15th time.

[1] http://cryto.net/~joepie91/blog/2016/06/19/stop-using-jwt-fo...

Edit: I was actually writing this code tonight myself for a project instead of it already being baked into the platform framework because SSO is only available as an "enterprise" feature and it's $150 a month for static shared password authentication. So market forces incentivize diverging standards.

throwaway20372 years ago

That flow chart in the shared link is very funny! Just this year, I was forced to migrate to a new internal authentication framework that... drumroll... uses JWTs for session management. Google tells me that it was already discussed on HN here: https://news.ycombinator.com/item?id=18353874

doublerebel2 years ago

JWTs solve problems about statelessness. Most companies don’t have these problems and are better off with stateful basic auth tokens/cookies that are widely understood and supported and can be easily revoked.

Also, signed and/or encrypted communication is usually easier to implement without involving JWTs.

Best thing to do in security is to not roll your own and instead use trusted libraries that have industry-reviewed sane defaults. One way to check: look at the issues and PRs in the public repo and see if security-focused issues are promptly addressed, especially including keeping docs up-to-date. Security professionals are pedantic (for good reason).

samhw2 years ago

Asymmetric cryptography solves problems of statelessness: i.e. encrypt your sensitive|read-only data with your public key, decrypt it with your private key, beep boop, you can now use your client as a database. JWTs are a whole other unnecessary lasagne of complexity – not good complexity but random complexity, like the human appendix – which invites bugs and adds nothing above the former in most implementations. (Hell, my current company generates JWTs and then uses them as plain old 'random' keys to look up the session data in a database. It's hilarious but also awful.)

patrakov2 years ago

Well, asymmetric cryptography is not even needed in the most common case, i.e. when you are using the client as a database. Symmetric crypto is enough, because it's your server that both encrypts/signs and decrypts/verifies. Asymmetric crypto may be strictly needed only if the sender and the recipient are different. And there is still an issue that the malicious client can return old and outdated but validly signed data - which you can't solve without either a server-side database or accepting old data up to a certain limit.
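
A minimal sketch of the symmetric version with the Python cryptography package; Fernet tokens also embed a timestamp, which covers the "accepting old data up to a certain limit" part:

    from cryptography.fernet import Fernet

    key = Fernet.generate_key()   # stays on the server, never leaves it
    f = Fernet(key)

    # Encrypt-and-sign some state and hand it to the client to hold for us.
    token = f.encrypt(b'{"user_id": 42, "role": "admin"}')

    # Later: rejects anything tampered with, or older than one hour.
    state = f.decrypt(token, ttl=3600)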

samhw2 years ago

Yeah, that's true, actually. As best I recall, I just meant that that is what people use JWT for regardless, and I wanted to convey that the only part doing the useful work there is the 'asymmetric crypto' part. I didn't want to get into the territory of providing alternative suggestions, only breaking down what is useful about JWTs when used for that purpose.

As for old and outdated data, I should think that's easily solved by having a 'created' and 'modified' stamp in the encrypted data, much like you have on an inode.

ravenstine2 years ago

> Most companies don’t have these problems

Can anyone cite a single real world example of a fully stateless system being run for the purpose of business? I ask this every time JWTs come up and no one can answer it.

As soon as you tap the database on a request for any reason, whether it's for authorization or anything else, you might as well kiss JWTs goodbye.

Then again, just don't use them anyway, because they have no benefit. Zero. Disagree? Prove it. I'm sure there's some infinitesimally small benefit if you could measure it, but the reality is that JWTs are saving you from an operation that databases and computers themselves are designed to be extremely good at.

Don't use JWTs. They're academic flim-flam being used to sell services like Auth0.

nijave2 years ago

They can be helpful if you have services that need to call other services on behalf of a user request.

For instance, user A calls Product service for Product information but that response also includes Recommended Products and Advertisements from those two services. Product service can pass the JWT from the client to Recommended Products and Advertisements which removes the need to establish trust between those internal services (since authentication and authorization info are just passed around from what the client provided).

You can also use them in federated auth schemes where the issuing system is separate from the recipient. I think the use cases are pretty similar to SAML for this type of system but with a smaller "auth token" size.

Just because you're accessing a database on a request doesn't mean you're accessing the database that stores the authorization and authentication info.
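
A hedged sketch of that pass-through (service URLs invented): the first service just forwards the caller's bearer token, and each downstream service validates the same JWT for itself:

    import requests

    def get_product_page(product_id, incoming_auth_header):
        headers = {"Authorization": incoming_auth_header}  # "Bearer eyJ..."
        product = requests.get(f"http://products.internal/{product_id}",
                               headers=headers).json()
        recs = requests.get(f"http://recs.internal/for/{product_id}",
                            headers=headers).json()
        return {**product, "recommended": recs}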

patrakov2 years ago

The problematic word is "THE" database. The subsystem that you hit can be stateful, but it can use a separate database that doesn't contain the authentication data.

Nihilartikel2 years ago

I can only provide verification of the counter example.

Having worked on some VERY large web services, the session was tracked on the back end and instantly and trivially revocable.

ryanbrunner2 years ago

It's nuts to me that so many companies have moved off cookies for web app auth state. They're simple, they're well supported, they require very little work on the browser side, and the abstractions around them on the server side are basically bulletproof at this point.

I see all this talk about authentication, and it's just literally never been a problem or concern for my company.

treis2 years ago

Aren't JWTs just fancy cookies?

freeqaz2 years ago

JWTs are frequently stored in LocalStorage which means that any XSS is able to leak the JWT.

Cookies, on the other hand, can be configured to be HTTP-Only and inaccessible to JavaScript on the page. That prevents somebody with XSS from leaking the value without a second server-side vulnerability or weakness.

In addition, JWTs are impossible to revoke without revoking _all_ sessions. This is the biggest weakness, imo, and the reason that they shouldn't be used client-side.

I'm a huge fan of the approach Ory is taking with Oathkeeper and Kratos: https://www.ory.sh/docs/kratos/guides/zero-trust-iap-proxy-i...

nijave2 years ago

+1

ryanbrunner2 years ago

Sure, but browsers have done a lot of work to make cookies far more convenient (they're automatically sent with requests, you have browser APIs to work with them), and secure (Secure, HttpOnly, SameSite, etc.)

nijave2 years ago

Cookies are a storage and transport mechanism and JWTs are signed JSON blobs. You could put a JWT inside a cookie.

stragio2 years ago

Why not look into an open source auth solution such as supertokens? It's almost free and you can self-host. That way you implement your own auth system, but the security issues are mostly dealt with by them.

danjc2 years ago

Yesterday I was working on updating code that implements Microsoft Open ID Connect (produces a JWT).

Their documentation [1] is exceptional - all the gotchas and reasons for practices are clearly explained and there’s a first class library to handle token validation for you. I even ended up using the same library to validate tokens from Google.

Perhaps not all vendors produce equally well written documentation but I think it’s a lot easier to get it right today than it was 5 years ago.

1. https://docs.microsoft.com/en-us/azure/active-directory/deve...

yourapostasy2 years ago

That's usually because security is a bolt-on instead of bake-in within the control and data structures themselves. Too many people interpret "Make It Work Make It Right Make It Fast" to mean security is implemented at the "Make It Right" stage, when it should be at the "Make It Work" stage. That's if they're the lucky ones who get security designed in from the beginning into the architecture.

We're paying for the sins of that in Unix these days; the kernel attack surface is infeasibly large to remediate to correctness anytime soon (if ever?).

makeitdouble2 years ago

I think there is still more to it than just not taking it seriously or planning for it.

JWT in particular has weird quirks you need to know about to prevent algorithm-swapping attacks, and I'm sure there are more traps I myself am not aware of. At this point I think security can be seen on the same plane as legal: assuming a random dev will be able to plan for and navigate all the issues by sheer common sense hasn't been a viable approach for a long time now.
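
(The classic trap is trusting the token's own header; a small sketch with the PyJWT library, which sidesteps it by pinning the algorithm list - the secret is a placeholder:)

    import jwt  # PyJWT

    token = jwt.encode({"sub": "user123"}, "server-secret", algorithm="HS256")

    # Pin the algorithms you actually issue with; never derive them from
    # the untrusted "alg" field inside the token header.
    claims = jwt.decode(token, "server-secret", algorithms=["HS256"])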

yourapostasy2 years ago

> At this point I think security can be seen on the same plane as legal:...

Considering how Uber ignored legal ramifications of ride sharing intersecting with incumbent regulations until they were dragged into courts, that paints a potentially rather grim picture of the equivalent in software security. But your gist sounds more along the lines of, "include the experts along at the beginning of the ride".

When I said security as a "bolt-on", I should have been more clear. Most of the time when I see it happening, it has been at the behest of the business stakeholders overriding the earnest developers trying to include the security teams from the beginning, but waved off with "it can be added later".

The business stakeholders see in their real life housing contractors walk into finished houses, attach some doodads, pop in some batteries to wireless sensors and the central base station, and ta da!, they "have security"! And think, "just how hard can it be to do the same in software?", dismissing what their tech leads try to tell them.

There is a large element of the principal-agent problem here as well. Shiny proofs of concepts and shallow implementations get immediate bonuses and promotions. Taking 1.1-2.0X as long to implement the right way, the result of which is no drama and no discernible difference to the casual business user, gets no or even negative recognition. The incentives structure the choices. There are no incentives that structure payouts over the long haul tying back to original historical choices, with an increasing gradient of the payout the longer the original choices prove sound. Naturally, since measuring that accurately would be impossible.

The closest I've come to an analogy that works in these discussions but not as often as I'd like is this. I don't throw together four tilt-walls, top off with a roof, move in with a 20-ton safe, open the doors for business and call it a regional bank depository. There are bedrock anchors, sensors, inner reinforced concrete walls, SOP's, audits, man traps, insurance reviews, and on and on, that get designed in before the foundation is even poured.

Clients who didn't find this convincing wave it off with a, "haha, this isn't that important lol". I want a better analogy.

makeitdouble2 years ago

That's an interesting angle. Uber ignoring legal ramifications had wildly different effects depending on the country: some completely shut out Uber as a result, while more lax places accepted dealing with the consequences as they surfaced one after the other.

I'm in a country from the former group, and I see a bunch of naive projects pitched by the business side that get shut down pretty fast by the legal team as nightmares in the making (e.g. stuff that boils down to "shouldn't it be easier to take money from a variety of sources and move it to other users?") that would sink the whole company when shit hits the fan.

My hopes would be on more security issues slowly becoming legal issues (not unlike GDPR, breach disclosure duty and associated penalties etc.) but I can understand how dire it feels in countries where legal grounds were shaky in the first place.

sboomer2 years ago

That's how a software implementation by a newbie works. You can't expect a newbie to take security into account before the software is implemented. Instead, there should be a custom of rectifying all the security errors at the end, before the software is pushed to the server.

josephg2 years ago

That’s an almost impossible task. Code gets immensely more expensive to understand or modify as it ages. If you don’t bother thinking about security until the 11th hour, it’s too late. Things will slip through.

zer012 years ago

This is an interesting write up!

The only question I have is around your point on monorepos - every monorepo I’ve seen has been a discoverability nightmare with bespoke configurations and archaic incantations (and sometimes installing Java!) necessary to even figure out what plugs in to what.

How do you reason about a monorepo with new eyeballs? Do you get read in on things from an existing engineer? I struggle to understand how they’d make the job of auditing a software stack easier, except for maybe 3rd party dependency versions if they’re pulled in and shared.

pianoben2 years ago

Monorepos do require upkeep beyond that of single-product repositories. You need some form of discipline for how code is organized (is it by Java package? by product? etc). You need to decide how ownership works. You need to decide on (and implement) a common way to set up local environments. Crucially, you need to reevaluate all these decisions periodically and make changes.

On the other hand... this is all work you'd have to do anyways with multiple repositories. In the multi-repo scenario, it's even tougher to coordinate the dev environment, ownership, and organization principles - but the work isn't immediately obvious on checkout, so people don't always consider it.

Regarding auditing, I have always found that having all the code in one place is tremendously useful in terms of discoverability! Want to know where that class comes from? Guaranteed if it's not third-party, you know where it is.

Not to minimize the pain of poorly-managed monorepos - it's not a one-size-fits-all solution, and can definitely go sideways if left untended.

yardstick2 years ago

Probably because:

1) It's easy to miss a repo, if you don't have a list of them all somewhere.

2) It's easy to get out of sync with what version of your software corresponds to what branch/tag in each repo.

yourapostasy2 years ago

> 2) It's easy to get out of sync with what version of software corresponds to what branch/tag in each repo.

I'd like to hear how others solve this. The way I've addressed it is to bake into the build pipeline a step that dumps to a text file all the version control metadata I could ever want for re-building the software from scratch. This text file is then embedded into the software's primary executable itself, in some platform-compatible manner. Then I make sure the support team has the tooling to identify it in a trivial manner, whether via a token-auth curl call over a REST API, or what have you. This goes well beyond the version number the users see, and supports detailed per-client patching information for temporary client-specific branches until they can be merged back into main, without exposing those hairy details in the version number.

While this works for me and no support teams have come to me yet with problems using this approach, it strikes me as inelegant and I'm for some reason dissatisfied with "it ain't broke so don't fix it".
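
(A stripped-down sketch of that kind of pipeline step, assuming a git checkout and a `build_info.json` that gets bundled into the artifact:)

    import json
    import subprocess

    def git(*args: str) -> str:
        return subprocess.check_output(("git", *args), text=True).strip()

    build_info = {
        "commit": git("rev-parse", "HEAD"),
        "branch": git("rev-parse", "--abbrev-ref", "HEAD"),
        "describe": git("describe", "--tags", "--always", "--dirty"),
    }

    # Shipped inside the artifact and exposed to support tooling at runtime.
    with open("build_info.json", "w") as f:
        json.dump(build_info, f, indent=2)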

yardstick2 years ago

In our case we abandoned individual repos and went back to a monorepo to solve this issue. In theory the separation of code was nice, but in practice it was a real pain when a service added new APIs and you wanted to update another service to use them.

All of our services do also print out in their startup logs what version they are based on git branch name and commit. Monorepo or not this was useful.

treis2 years ago

We have a releases repo that takes in the git version SHA for each application and handles deploys. It's... ok I guess. Just another example of complexity to meet the growing complexity of the system.

marcosdumay2 years ago

> 2) It's easy to get out of sync with what version of your software corresponds to what branch/tag in each repo.

That's what the `[dependencies] my-lib = "1.0"` was supposed to solve.

ge962 years ago

The thing I'm working on has 5 main repos that all have to run (yarn start) for the app to be fully functional.

I need to write down somewhere the startup order and which branches match up.

cerved2 years ago

  find / -type d -name .git
yardstick2 years ago

As an auditor you don't have anything checked out locally yet, so no .git will exist. If you ask an individual developer or randomly picked developers, they will only have their specific repos checked out. If you look at the server hosting the repos then yes you may get them all. Assuming they are all on one server...

gusbremm2 years ago

Once I worked on a team where none of the engineers knew that the JWT payload was readable on the frontend. They were in shock when I extracted the payload and started asking questions about the data structure.
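
(The payload really is just base64url-encoded JSON; a few stdlib lines, shown here with a toy token, are enough to read it:)

    import base64, json

    token = ("eyJhbGciOiJIUzI1NiJ9."       # header
             "eyJzdWIiOiJ1c2VyMTIzIn0."    # payload
             "signature-goes-here")        # toy signature

    payload = token.split(".")[1]
    payload += "=" * (-len(payload) % 4)   # restore stripped padding
    print(json.loads(base64.urlsafe_b64decode(payload)))  # {'sub': 'user123'}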

lmc2 years ago

It's kinda baffling that JWTs are unencrypted by default, to be fair.

bpicolo2 years ago

It's the whole point - they're signed, not encrypted.

You should use opaque tokens instead if you don't want the frontend or other services that have access to the token to read it.

lmc2 years ago

In many cases, the front end doesn't need to read the JWT, just pass it on to some API.

An encrypted JWT is still convenient as it can be decrypted and deserialized into a common data structure using existing libraries.

bpicolo2 years ago

One benefit of JWT as specced is that the APIs you pass it on to don't need to share an encryption key; a shared key would make rolling it without causing downtime impractical. With OIDC, for example, frequent key rotation helps you create a better security posture.

The benefit of signing versus encryption is many services are able to verify the authenticity without needing a shared secret. That includes untrusted services, which is frequently the case with OAuth 2.

You can encrypt a JWT, but at that point it's not semantically a JWT anymore. It can be any JSON at all and doesn't need to match the JWT structure. The first and last parts of a JWT are a header naming the signing algorithm and the signature, respectively.

mosdave2 years ago

I, for one, enjoy not needing to coordinate an encryption key between my service and my IdP.

lmc2 years ago

I also enjoy not worrying about how the next field I add to my JWT can be exploited after a base64 decode :-)

bornfreddy2 years ago

How else could the frontend read them? If you don't need this then regular cookies are better.

lmc2 years ago

It's the other way round - the front-end shouldn't need to read JWTs, just pass them on.

mosdave2 years ago

if your frontend is interrogating the jwt you're doing it wrong

nijave2 years ago

+1

samhw2 years ago

I mean, I'd be rather surprised too. What were you using JWTs for, if not asymmetric crypto? Presumably you weren't using it to sign the tokens, if they were surprised the client could access them? And I can't see many contexts where you would use it with a shared secret, where just sending JSON over HTTPS wouldn't suffice. (I'm assuming 'frontend' here denotes a client on the other side of the trust boundary.)

FlorianRappl2 years ago

I'm not getting your comment. The payload is not encrypted. I think you're referring to the signature. The payload can always be decoded; it's just JSON encoded as base64.

samhw2 years ago

Ah, sorry, that was what I was referring to when I said "Presumably you weren't using it to sign the tokens, if they were surprised the client could access them?". I classed that as too obvious for it to be what you meant.

chrisandchris2 years ago

For SSO? The biggest advantage (besides being stateless) of a JWT is that it is signed with an asymmetric key and the client can validate the authenticity of the content. You can encrypt the content of the token, but that does not make too much sense (because the client needs to decrypt it anyway).

lstamour2 years ago

> For example because it’s so fast, [MD5 is] often used in automated testing to quickly generate a whole lot of sudo-random GUIDs.

Actually, it’s because programmers are lazy. GUIDs or UUIDs are 128 bits and MD5 produces 128 bits. A string like “not-valid” is not a valid UUID, but MD5(“not-valid”) can both be formatted like a UUID when output as hex (with dashes) and is self-descriptive - so you can name the token when generating it in a fixture function and know how to regenerate it later in a test, for example.

All the normal ways of generating UUIDs, including v6 and v7, are about trying to make them unique and collision resistant. But that’s nonsense when you want deterministic, reproducible tests. Hard-coding 32 characters is too much work, ain’t nobody got time for that. Magic numbers? Pfft. Just MD5 and write your own text…

Pro tip: have data model creator helper functions include a counter that resets every test (every time the database resets) and then assign a UUID like MD5(`InsertTableName-${counter}`) that way you have a unique ID that’s also easy to predict/regenerate.
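
(A minimal Python sketch of that counter trick; the names are illustrative:)

    import hashlib
    import uuid
    from itertools import count

    _counter = count(1)  # reset whenever the test database resets

    def fixture_id(table_name: str) -> uuid.UUID:
        # MD5 emits exactly 128 bits, so the digest maps directly onto a UUID.
        digest = hashlib.md5(f"{table_name}-{next(_counter)}".encode()).digest()
        return uuid.UUID(bytes=digest)

    print(fixture_id("Product"))  # deterministic: same ID on every run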

That said… I’ve always personally found simple database IDs generally preferable to UUIDs. It’s easier to understand THING 20 as an ID than 32-odd characters. But UUIDs are an industry standard, so they end up in your test code everywhere anyway…

jve2 years ago

> I’ve always personally preferred simple database IDs to be generally preferable over using UUIDs.

Unless you start migrating data between environments and want references to be alive.

Anyway, if you need a hardcoded GUID for tests or what, paste this into PowerShell: [Guid]::NewGuid()

Not arguing, just developing for a system that uses guids as primary IDs and writing tests for that system. I don't even need to hardcode GUID, as within test bootstrap I'm creating objects with generated IDs I can reference later for comparison.

lstamour2 years ago

I’ve done that before too - but if you run tests often enough, it’s always possible that you’ll get an ID collision that randomly fails a test and causes a developer some grief. Easier to not use random sources of data as a rule of thumb within your unit tests.

contingencies2 years ago

Re. Security, predictable identifiers are often a vulnerability. Hence, don't present database IDs in public (ie. anywhere). Instead, generate unique non-predictable identifiers at creation time, and use a UNIQUE constraint (or similar). https://cwe.mitre.org/data/definitions/340.html
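
(One hedged way to do this in Python: keep the internal primary key private and expose only a random handle:)

    import secrets

    def public_id() -> str:
        # ~128 bits from the OS CSPRNG: safe to expose, infeasible to guess.
        # Store it in a UNIQUE-constrained column next to the internal key.
        return secrets.token_urlsafe(16)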

lstamour2 years ago

It’s true that in production, if it’s a security risk that IDs can be guessed, don’t make them predictable. But by that same logic you would have to stop using REST because it can let you guess an ID?

This advice is classified as "varies by context" because it doesn’t always apply. In test cases, predictable behaviour is better than randomness. There are exceptions, of course: chaos monkey, fuzzing, and literally testing algorithms for uniform randomness, etc.

That said, you could get the best of both worlds if you used MD5 HMAC to create a UUID from a predictable number and a secret preventing guessing. If that’s your goal…

Of course, the secret could be trivially reverse engineered with MD5 if someone knew the ID number and algorithm to generate it, but I’m not sure we have the patience or need to use PBKDF2 or similar to create predictable, unguessable ID numbers… after all, it would be just as easy to use regular guessable numbers and put strong authentication so it doesn’t matter if you guess correctly.
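
(Roughly what that HMAC idea would look like; the key and naming are hypothetical:)

    import hashlib, hmac, uuid

    KEY = b"fixture-secret"  # hypothetical secret

    def keyed_id(n: int) -> uuid.UUID:
        # Deterministic for a given (key, n), but unguessable without the key.
        digest = hmac.new(KEY, str(n).encode(), hashlib.md5).digest()
        return uuid.UUID(bytes=digest)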

contingencies2 years ago

Clean separation of concerns is good architectural practice. Whilst you are of course correct that you can potentially rely on mitigations (eg. authenticated APIs), if those subsystems change in future you have an emergent scenario producing undocumented vulnerabilities. Security people call this 'defense in depth' - ie. make sure you cover your ass religiously, all the time.

lifeisstillgood2 years ago

At what point is there something beyond a framework - a SaaS in a box, perhaps - that just avoids many of these basic problems (oh, and the HR, legal, etc. problems) of starting a startup? Startups are not snowflakes apart from that one little core competency. In short, most serial founders say the second one was easier, simply because they followed the template ground out in the first.

Would it be easier to start with that template?

czue2 years ago

There are a ton of products like this out there that build on popular frameworks:

Saas Pegasus (https://www.saaspegasus.com/) for Python/Django, Bullet Train (https://bullettrain.co/) and JumpStart (https://jumpstartrails.com/) for Rails, Spark (https://spark.laravel.com/) for Laravel, Gravity (https://usegravity.app/) for JS

You can find an even bigger list here: https://github.com/smirnov-am/awesome-saas-boilerplates though those are the market leaders (I make one of them and follow things closely)

gunnr152 years ago

There are some good open source options like https://getzero.dev/

scottharveyco2 years ago

In the Rails world we have https://bullettrain.co and https://jumpstartrails.com which both have open source templates for building SaaS services.

MathCodeLove2 years ago

I've seen boilerplate applications for <insert tech stack> but the open-source ones tend not to be great, and the closed source ones could be great - but I'm not willing to pay $XXX for code I haven't seen.

alfonsodev2 years ago

All of that exists as SaaS products that target non-technical cofounders, and it is very hard to justify to co-founders, investors, advisors... any investment of time in something that is not your core problem, and I think that is for a good reason.

criddell2 years ago

Does learnings ever mean anything different than lessons? How did this enter corporate-speak?

quesera2 years ago

It pains my prescriptivist instincts to say so, but FWIW I do interpret them differently, frequently as the complementary sides of a single event:

A learning is a successfully learned thing. Or a received lesson.

A lesson is a taught thing. When effective, this would be one path to a learning for the receiver.

criddell2 years ago

I suppose that makes sense. Personally, I would write lessons learned rather than learnings in part to get rid of the red squiggle. My dictionaries flag "learnings" as a typo.

nescioquid2 years ago

"Lesson" also bears this meaning of something learned, though that would make "learnings" more precise, and therefore distinct.

My experience seems to be that people who use "learnings" are referring to the lessons learned by others, usually subordinates, and it is used instead of "lessons" out of sensitivity to how harsh it sounds to say "group X learned several lessons".

phren0logy2 years ago

Ugh, seriously. Like utilize instead of use.

quesera2 years ago

In earlier days, I thought it'd be fun to have two versions of my resume. They had parallel content, but one was fluffed up in corporate-speak, and the other was human English.

I included links, e.g. resume-fluffy.html or resume-direct.html, and (somewhat seriously) suggested that hiring managers read the first and tech evaluators the second.

It made for some light humor in discussions with hiring groups. And also some effectively-paralyzed recruiters, which added to the fun of the former.

vemv2 years ago

Learnings can be more easily interpreted as 'something I learned', while lessons can come across as 'lessons for you'.

fluctor2 years ago

I came here for this comment.

_pete_2 years ago

it's almost - almost - as bad as 'vinyls'

alexfoo2 years ago

Nit: "...or example because it’s so fast, it’s often used in automated testing to quickly generate a whole lot of sudo-random GUIDs."

ITYM: "pseudo-random"

Although I do like the mash-up concept of "sudo random"-ness.

mtVessel2 years ago

It's higher-privileged randomness. As in, all GUIDs are random, but some are more random than others.

neilv2 years ago

> All the really bad security vulnerabilities were obvious.

All the really bad security vulnerabilities that were found were obvious?

One is more likely to find things that are obvious?

lbriner2 years ago

But the auditors were experts and used all the latest and greatest tools. I think they are implying that if they couldn't find it with code inspection then a hacker wouldn't find it by probing.

Of course, they might not find zero-days but most hackers wouldn't find those either.

blenderdt2 years ago

When a team is so focused on the todo list, they sometimes forget the obvious mistakes they still need to fix.

itsdrewmiller2 years ago

Yeah, this was a great article overall, but that stood out as sus. Also the “last few hours found the most stuff” bit. Seems like they could probably stop the audit once they found enough problems, which skews findings hard toward the easy-to-find and/or toward wherever they looked last.

mpcannabrava2 years ago

Although I strongly agree with it in principle, I'm growing seriously tired of the "simpler is better" argument. It hides all the nuance, hard work and, guess what, complexity, that goes into making something simple.

Simplicity is different for each person. What seem like unnecessary abstractions with complex inner workings often exist to actually hide other complexity away.

Know the in and outs of Kubernetes? Maybe it's easier (simpler) for you than directly provisioning different pieces of infra.

Have a team of over 10 [1] working on the same monolithic codebase? Productivity while maintaining sane separation of concerns might increase going for a more domain-service-oriented architecture [2].

How can we teach what simplicity is instead of just calling it better or saying arrogant platitudes like KISS?

[1] yes, the number is that low, and often lower

[2] yes, "micro" services do seem like a mistake in most cases

lordofmoria2 years ago

(op here) I actually completely agree - you're right: "simple outperformed smart" doesn't point to a useful, nuanced solution. I wrote in more depth here about where, slightly more specifically, the problems lie - curious for your thoughts, feel free to DM me or comment on the blog (this thread is kinda dead)! https://kenkantzer.com/5-software-engineering-foot-guns/.

idunnoman2 years ago

As a developer, I've come to believe that complexity is the worst sin we commit. Everything we talk about can be traced back to this issue.

This is largely due to paying attention to Rich Hickey and learning Clojure.

https://www.youtube.com/watch?v=SxdOUGdseq4

lars5122 years ago

Ah, yes, I’ve felt the pain of an unnecessary microservices migration. It ate time for years and the core was still a mess.

anticristi2 years ago

I think people really exaggerated with the microservices trend. Today, I recommend to keep code in the same executable unless there is a good reason not to. Good reasons include:

- Stateful vs stateless: databases and message queues should be your first (hopefully off-the-shelf) "microservices".

- Different lifecycles: API serving vs background task

- Different security needs: Frontoffice vs Backoffice code

- Different teams: But make sure to introduce a clear customer-vendor relationship.

yardstick2 years ago

> Custom fuzzing was surprisingly effective. A couple years into our code auditing, I started requiring all our code audits to include making a custom fuzzers to test product APIs, authentication, etc.

Any recommendations for a good fuzzing tool for testing both web-based APIs and language specific APIs (C and Java in my case)?
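
(Not a tool recommendation, but for the web-API side a custom fuzzer in the article's sense can start out very small; the endpoint and payload list here are illustrative, with the `requests` library assumed:)

    import random
    import string
    import requests

    PAYLOADS = ["", "'", '"><script>alert(1)</script>', "../" * 8,
                "A" * 10_000, "-1", "9" * 30, "%00", '{"$gt": ""}']

    def fuzz(url: str, runs: int = 200) -> None:
        for _ in range(runs):
            # Mix a known-nasty payload with random printable noise.
            value = random.choice(PAYLOADS) + "".join(
                random.choices(string.printable, k=random.randint(0, 64)))
            resp = requests.get(url, params={"q": value}, timeout=5)
            if resp.status_code >= 500:
                print("server error on input:", repr(value))

    fuzz("https://api.example.test/search")  # hypothetical endpoint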

deptm2 years ago

Paid but integrates with CI/CD - https://www.code-intelligence.com/

agumonkey2 years ago

I'm thinking about introducing fuzzing too. And property based testing. Manual testing only is too limited.

jrochkind12 years ago

> Surprisingly, sometimes the most impressive products with the broadest scope of features were built by the smaller teams.

Would probably not be surprising to Fred Brooks, author of _The Mythical Man-Month_, but as famous/impactful as we think that book is, it still surprises us!

arethuza2 years ago

"All the really bad security vulnerabilities were obvious."

I used to work for a company that did a lot of acquisitions and I was often involved in working with teams at newly acquired companies - although it wasn't my main focus, I did use to ask some simple security questions, and it was remarkable what these uncovered. I literally had people run from the meeting to fix services after I had asked a simple question....

spuz2 years ago

Can you give some examples of some simple questions you would ask?

arethuza2 years ago

By the nature of that particular domain a lot of systems delivered important documents (often containing data of rather extreme commercial sensitivity) to customer organisations.

A standard question I always asked was "given a URL that links to a document how do you authorise access" i.e. what happens if someone who is logged in to the site in question gets a link to a document and passes it to a friend via instant messaging.
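
(The underlying check the question probes for, as a toy Flask sketch; the ACL set stands in for whatever real authorization store exists:)

    from flask import Flask, abort, session

    app = Flask(__name__)
    app.secret_key = "dev-only"

    ACL = {("alice", "doc-1")}  # toy (user, document) permissions

    @app.route("/documents/<doc_id>")
    def get_document(doc_id):
        user = session.get("user_id") or abort(401)
        if (user, doc_id) not in ACL:  # re-authorize on every request,
            abort(403)                 # not just when the link was issued
        return f"contents of {doc_id}"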

davedx2 years ago

Ha, recognisable. A very annoying problem to solve with web tech too - there’s no perfect solution to this problem (that I know of).

MattPalmer10862 years ago

Interesting that he feels the default state of software security has improved a lot in the last few years.

Anecdotally I'd also agree with that. Certainly better defaults and more secure libraries is a major factor. I haven't noticed a huge increase in developer security awareness, although I'd say it is also better than 10 years ago.

anticristi2 years ago

Unfortunately, I get the feeling that that is compensated by increasing risk. Attackers have found clever ways to monetize their work beyond just "fun". Hence, I feel the overall "security damage" has kind of stayed constant.

MattPalmer10862 years ago

For sure, the threat level hasn't dropped. What is different is that attackers have to use different techniques, since the software isn't as easily exploitable as it used to be. Ten years ago, any pen test of a web application revealed loads of vulnerabilities. These days I rarely find anything really significant (although maybe I work at better places!).

This is not to say that software isn't exploitable any more, only that the cost has been raised sufficiently to make cheaper attacks more attractive (e.g. phishing).

JoeNr762 years ago

yes. When I get called in as a senior consultant for some business app, it's always for the same reason: development speed has slowed almost to a stop. And it is always caused by unnecessary complexity.

I blame the fact that design patterns and specific architectures are being taught to people who don't understand the problem those things are trying to solve and just apply them everywhere.

Any senior dev or architect should always live by this maxim: make it as simple as possible.

Tomis022 years ago

A lot of unnecessary complexity comes from the use of library-like objects instead of plain functions + data.

A recurring theme is "refactoring" specific functionality away into a generic object, and the consequence is a disconnect between the problem you are solving and the problem the object is solving. I often see objects that handle every possible input, ignoring that the business is only concerned with a small subset of inputs. You end up with a lot of "if impossible_condition_if_you_actually_look_at_your_data { /*some_dead_code*/ }".

Another side-effect can be similar/identical input validation done at different levels of the stack. If you have object A calling object B calling object C, you sometimes notice how each one of those does the same exact thing in isolation of the others. You end up with a lot of extra checks and error handling because developers insist on writing their code in complete isolation from the context, pretending they don't know how it will be used.

Of course, everything I described can also be "achieved" with plain functions + data, but (anecdotally) they usually produce better results, perhaps because it helps the devs not think in terms of objects.

davedx2 years ago

Great article.

To expand a little on why “Keep It Simple” is so powerful: less code = less bugs and less security issues. Less code = easier to change.

anticristi2 years ago

Also cleverness is overrated. You might be able to be clever once, but mid-term you will struggle to keep up with "collective cleverness". Sure today you might implement a better authentication code than the one offered in your favorite framework, but will you keep up with the new cleverness that will pour into the framework tomorrow?

muglug2 years ago

I don't think less code = less security issues. Often using those secure-by-default frameworks requires more code.

The simplest example in PHP (highlighted in the article for its default-insecurity):

    echo '<h1>Hello ' . $_GET['name'] . '</h1>';
is vulnerable to XSS.

    echo '<h1>Hello ' . htmlentities($_GET['name']) . '</h1>';
is not vulnerable.

TheCondor2 years ago

I’ll go much further…

I think it creates severe cultural problems. It creates the belief that problems are more difficult than they might be, it creates the belief that a particular solution may be more valuable than it actually is, and then it biases future team expansion and retention. Perhaps most importantly, if the complexity creeps in before the real challenge arrives, it radically affects the team’s ability to reason about it.

UK-Al052 years ago

Less code and simple are not often related.

r2sk5t2 years ago

Thanks for writing this down @Ken. You're another example that learning the failure modes is the main benefit of being a consultant for many clients. Since I'm sure you began each audit meeting with the CTO/VPE and possibly others like senior devs/architects, how much of what you ended up finding in the audits was predictable based on those meetings? (I'm guessing almost everything).

My follow-up question is that once you heard about their snazzy microservices architecture, were you ever surprised by it being a good decision based on the product type and how well it was engineered?

lordofmoria2 years ago

Honestly, early on in our code auditing days, there were surprises - a lot of the more meta-lessons in here fermented over the last few years, looking back, and would NOT have been something I’d have thought early on.

On the other hand, regarding the micro-services question: no, not even one surprised us positively. Now keep in mind, we didn’t audit absolutely massive FANG companies, where microservices are probably necessary for org reasons (though we did audit a few unicorns/near-unicorns).

r2sk5t2 years ago

Tangentially, I'm also guessing you can learn a lot by asking if they have an API for partners/customers, and if their application developers use the API internally, and then by looking at the API to see how well it is architected. When we integrate with 3rd party systems it's pretty easy to detect the well engineered systems from the ones built with baling wire and duct tape.

maupin2 years ago

I've been a part of 3 startups, 2 of which failed and are no longer around. What they all could have benefited from was a business audit.

peppermircat2 years ago

Interesting to see the JWT issue. I have recently found a vulnerability in a publicly traded CRM SaaS that was also about JWT claims validation. It’s also quite amazing that popular Auth SaaS rely so heavily on JWTs with 1 hour expiry times, making it impossible to log users out, as you can’t invalidate the token for the next hour.

lbriner2 years ago

I think this causes so much confusion but it really shouldn't. A bearer token means just that: if you have this token (JWT or otherwise), it proves you have access to something, period. Unlike opaque tokens, JWTs have a built-in expiry mechanism, so they can be used for time-limited operations, which is why people use them for authentication.

Yes, if you issue a long-lived token, you cannot normally revoke that after-the-fact but that is the point of the token, to avoid multiple lookups to an auth service for every single API access. In a distributed/scaled/microservices architecture, this would be unmanageable.

Now people often proffer some kind of backend system to try and maintain expired lists etc. but what is the problem you are trying to solve that couldn't be mitigated with a reasonably short-lived JWT like 1-2 minutes? Issuing a new one every 2 minutes while the user needs to do something is relatively painless compared to, perhaps 100+ calls to APIs each needing an auth call in the same time.
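
(For concreteness, a hedged PyJWT sketch of issuing such a short-lived token; the secret and TTL are placeholders:)

    import time
    import jwt  # PyJWT

    def issue(sub: str, ttl_seconds: int = 120) -> str:
        now = int(time.time())
        return jwt.encode(
            {"sub": sub, "iat": now, "exp": now + ttl_seconds},
            "server-secret", algorithm="HS256")

    # jwt.decode(...) enforces "exp" by default and raises
    # jwt.ExpiredSignatureError once the two minutes are up.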

When you logout, the tokens should be deleted by your system. If someone copies the token before it is deleted, then they had access to the system anyway so that doesn't present a risk imho. If they gave the token to someone else, they are delegating their access so they lose out.

All of that said, if you do not have a heavily API-based system, it might be easier to just use creds that need checking with each call and do it the traditional way.

anticristi2 years ago

Cool write-up of centralized vs decentralized access control!

blenderdt2 years ago

"...making it impossible to log users as you can’t invalidate the token for the next hour."

I have no idea what you are talking about here, can you explain this?

I work with systems that have a minute expire time. The only issue is that the clocks on all clients should be in sync with the auth server.

habosa2 years ago

I believe they are referring to the fact that most JWT-based auth systems use one-hour token expiry and have no ability to remotely revoke tokens. You can only revoke the user's ability to get the next token. This often leaves a one hour window between when you want the user locked out of your system and when they are practically logged out.

The only way I know of to implement instant revocation in a system like this is to keep a blocklist of users/tokens that is constantly checked, which can be slow and removes some of the benefits of JWTs in the first place (that they carry all the auth information you need).
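
(The blocklist variant, sketched against a Redis-style store; entries only need to live as long as the token's remaining lifetime:)

    import time

    def revoke(redis, claims: dict) -> None:
        # "jti" and "exp" are standard JWT claims.
        ttl = max(1, claims["exp"] - int(time.time()))
        redis.setex(f"revoked:{claims['jti']}", ttl, 1)

    def is_revoked(redis, claims: dict) -> bool:
        # Paid on every request: the lookup JWTs were meant to avoid.
        return bool(redis.exists(f"revoked:{claims['jti']}"))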

blenderdt2 years ago

Ah! Yes this is why we use an expiration of one minute. For us the extra load that the refreshes give is not a problem.

Keeping a blocklist seems unnecessary to me, you can just lower the expiration time.

lpolovets2 years ago

> Simple Outperformed Smart. As a self-admitted elitist, it pains me to say this, but it’s true: the startups we audited that are now doing the best usually had an almost brazenly ‘Keep It Simple’ approach to engineering.

My hunch is that this is related to the nature of product-market fit. If a company is very successful, there's a decent shot that market demand became overwhelming at some point early on. That demand, in turn, becomes a strong motivator to keep things simple and ship quickly, instead of writing code The Right Way.

Facebook using PHP might be one of the best examples of this: if their user base didn't explode, maybe they would've taken the time to carefully rewrite their code in Java or Python. But the fact that they would've had time to do that would've made it less likely that they'd become a $500b company today.

debacle2 years ago

You're inverting the relationship. Simple solutions (technologically) can approach product/market fit much faster.

Businesses fail for reasons besides tech, but on the tech side when businesses fail (in my experience), it's usually either from unwillingness to serve the sales cycle, or creating a technological solution that is not malleable.

jillesvangurp2 years ago

Guilty as charged. I bootstrapped our startup with just myself and 2 junior engineers in the last year during Covid. Junior in the sense that they are young. But actually they outperform many older engineers I've worked with. We are starting to close some pretty big deals and I'm dreading the moment where I have to turn this into a normal development team. In my experience velocity drops when you do that and you lose a lot of momentum. 3 people can do a lot. 6 people don't do that much more. I'm not so young myself and quite experienced. But I lean heavily on my team for doing the work. I'm the CTO, the CPO, and I need to worry about team management, sales, and a few other things. So, less than half of my time is spent coding. This is the reality of startups. You have to do all of it.

I made some technology choices early on. We use docker but not Kubernetes. There is one server, it's a monolith. There is one language, it is Kotlin. And we even use it on our frontend (web only).

The latter is not something I would do normally or recommend. But both my junior engineers only knew Kotlin and we just went with it and never actually ended up regretting this. This surprised me and at this point I don't feel React/typescript have anything that I need or want. We're doing a lot of asynchronous stuff, websockets, maps (via libremaps), etc. And it all runs smoothly and responsively. Kotlin is kind of awesome for this actually.

Originally our frontend was Android only. We ditched that app in favor of a Kotlin-js based web app that started out as a proof of concept that just more than proved the concept and became the actual thing. At the time we had a demand for IOS and web and no IOS or web developers on the team. Hence Kotlin for the web. When this looked like it was workable we actually lost our Android developer. So the decision to forget about that app was pretty easy. At that point it was half working and full of bugs and technical debt. Fixing that would only half fix our problem because we'd still need IOS and Web. So we did web first. And we are packaging it up with Cordova for those people that want something from an app store.

It's a good lesson on prototyping. If it works, do more of it. At the same time, I normally recommend minimizing risk and not building too many things in parallel. Like building 3 apps for 3 platforms instead of just a web app.

Our server is Kotlin/Spring boot and we use a lot of Elasticsearch because that's what I've been using for the last decade. A little bit of Redis, and I've so far found no excuse to use a relational database. But I'd probably end up with mysql if that ever comes up. Done right, Elasticsearch makes for a nice key value store without transactions but with optimistic locking on documents. If I get some time, I'll add a database for safety at some point. But fewer moving parts means fewer headaches. Having just one language means the distinction between backend and frontend is a bit blurry. We have an api client that we use in our spring tests that also compiles to kotlin-js. That library contains a lot of code that we use in our front-end. Model classes, caching layers, functions that call various of our APIs, etc. And it's all covered in tests. All the business logic basically. If we ever need to do native apps, we'll use that there as well.

On the devops front I'm a combination of very pragmatic but also focused. We use stuff that works that doesn't distract us. So, no terraform for a setup we only create once; in 20 minutes. Not worth spending weeks automating but worth documenting. But we do have CI/CD via github actions. So we don't manually deploy anything. And we have lots of API integration tests. If it builds, it ships. No rollbacks; roll forward only. Keeps things simple.

We use Google Cloud and keep our cost low. A couple of VMs, a loadbalancer, a managed redis, and a managed elastic cloud cluster. That's it. Nice and simple.

collaborative2 years ago

Hiring and building fast can lead to huge costs, loads of bugs and performance issues

Hiring and building slow leads to multiple rounds of performance tuning early on, which can also lead to lower costs, and gives you a chance to improve the product by focusing on your user experience, because you're not in panic mode to raise funds, overhire and conquer the world.

We could have many more good software products if companies were focused on long term quality and didn't obsess over growth

jordanbeiber2 years ago

Conway's law.

The teams and management structure will immediately become technical debt.

If we let the product "decide" where boundaries actually exist and team up accordingly, there's a chance to scale and maintain a bit of velocity.

It requires constant introspection, monitoring and scrutiny though. Something I’m constantly thinking about is how to scale that beyond 20-25 developers. Gitlab have a nice section[0] in their handbook on releases and flow of small bits and pieces, and internalizing something like that together with clear domain boundaries could be a ticket.

Basically - never try to resource optimize, always figure out what good flow looks like and find ways to keep it flowing.

[0]https://about.gitlab.com/company/culture/#freedom-to-iterate

roguas2 years ago

> So, no terraform for a setup we only create once; in 20 minutes.

While this may be right, if you do not have a way to "bootstrap" from scratch in a small enough unit of time (minute, hour, day -> whatever you find an acceptable disruption), then you are gonna get screwed badly.

You don't have to bring infrastructure up/down every day. Just one such incident will freak you out enough to not just have it in the docs. Now this doesn't mean you have to have crazy infra; I just have 3 docker hosts, running a compose.yml each -> but if I lose the docker/compose files it's gonna take 2 weeks for me to get back.

jillesvangurp2 years ago

It's about 30 minutes. All relevant files live in git of course. And I tend to be diligent about documenting things because having to figure out the same shit months later really sucks.

I have plenty of experience doing this stuff; so I know what I'm opting out of. IMHO the price of devops automation can be unreasonably high for small teams. You quickly hit the point where you start considering having somebody do this full time. IMHO that is too high of a price in most small startups. In my case, either I do feature development or devops. Meaning that if I have to pause development on a project for some massive open ended devops project, I might lose weeks/months on a tight schedule. It's never simple. You always get blocked on weird shit for hours/days on end. So, I try to take as much of the pain away. Terraform is a bit of pain that doesn't solve a problem I have. Having to manually recreate something in the case that it somehow blows itself up is OK with me. Unlikely to happen very often. Not worth spending 3 months automating something that might take me hours to figure out. I have better uses for those 3 months.

jseban2 years ago

If people would just admit, and adapt to, the fact that the browser won over native, and Oracle won over Sun, we would avoid this situation with armies of Java developers making Rube Goldberg variations of basic relational, sysadmin, and GUI programming tasks. But then again, how would we employ all the people freed up by these extreme productivity multipliers? As long as our politics and economic system keep pretending that we just had the industrial revolution and need to man the assembly lines, Parkinson's law will apply to tech work just like everything else.

l33t23282 years ago

When you say browser won over native, are you referring to the fact that software is more commonly accessed via web instead of software actually downloaded and installed on a user’s machine?

ghiculescu2 years ago

I assume this isn't content marketing because PKC doesn't seem to exist anymore. But this post made me really want to get Ken to audit our code.

Are there any vendors that do similar work that people here recommend?

bambax2 years ago

> the major foot-gun that got a lot of places in trouble was the premature move to microservices

I sometimes wonder if the move to microservices isn't just a weird consequence of Conway's law in reverse: make a department of each developer, let them have their thing.

(See also this amazing video about Conway's law: https://www.youtube.com/watch?v=5IUj1EZwpJY )

mjdiloreto2 years ago

This is absolutely what microservices are about. It's arguably their greatest strength, because (at least in theory) I can decouple my team from your team and we can _only_ communicate over a strict interface.

treis2 years ago

You can do that without introducing a HTTP/RPC boundary.

mejutoco2 years ago

You can, but it requires discipline and/or tooling. With microservices you are very incentivized (I would say forced).

treis2 years ago

+1

mjdiloreto2 years ago

Exactly! To be clear to parent commenter, I'm not endorsing microservices to solve this organizational problem, just pointing out it's part of the reason to choose microservices.

Jwarder2 years ago

At what point should I push for a code audit?

I don't think any of the codebases I worked on ever had a "real" audit. Best case was reviews pre/post acquisitions. An external audit seems like a good thing, but I have no idea how to argue for such a thing.

AtNightWeCode2 years ago

Point 2. I think this is a common misunderstanding of what engineering is. To come up with good simple solutions to complex problems often takes a lot of experience and domain knowledge.

rco87862 years ago

> the major foot-gun (which I talk about more in a previous post on foot-guns) that got a lot of places in trouble was the premature move to microservices, architectures that relied on distributed computing, and messaging-heavy designs.

Preach! Microservices are a solution to a problem that affects effectively zero startup-sized systems. That problem is scale. Microservices are hard. WAY harder than monoliths. They become necessary only once your physical hardware can no longer keep up in a monolithic fashion and parts of your system need dedicated compute and/or storage.

And no, they are not automatically necessary once your engineering team reaches N size either. Introducing network boundaries as a way to scale your engineering organization is a bad idea.

lucideer2 years ago

This is a great list.

One minor criticism on...

> Monorepos are easier to audit.

> Speaking from the perspective of security researcher ergonomics, it was easier to audit a monorepo than a series of services split up into different code bases. There was no need to write wrapper scripts around the various tools we had. It was easier to determine if a given piece of code was used elsewhere. And best of all, there was no need to worry about a common library version being different on another repo.

This is much more dependent on the auditor's personal workflows (as well as the relative hygiene of any team's monorepos), rather than being universal. I've found the opposite to be true for e.g. the current orgs that I am auditing: individually split up repos tend to be idiomatically structured, and "just work" as expected more often than monorepos, which more often than not have a lot of custom glue or unusual monorepo-management init scripts.

Comments on the other (generally very good) points in the list:

> Writing secure software has gotten remarkably easier in the last 10 years. I don’t have statistically sound evidence to back this up

I suspect compiling such statistical evidence would also be impossible as detection of security issues has also improved, so any data would never be comparable over time.

> The counterargument to this is that heavily weighting discoverability perpetuates "Security by Obscurity," since it relies so heavily on guessing what an attacker can or should know. But again, personal experience strongly suggests that in practice, discoverability is a great predictor of actual exploitation.

This is a tough circle to square because security by obscurity works. It's probably the best security measure you can have in place. But it's bad for two reasons:

(1) The process of obscuring often (doesn't need to, but very often) obscures auditing, which means you end up relying upon obscurity solely. It's not a worthwhile trade-off.

(2) In a simplistic marketing world, the idea of obscurity as a standalone measure is so tempting to non-technical decision makers that I believe it requires a bit of innocent dishonesty about its effectiveness to dissuade.

> (on auditing dependencies) Node and npm were absolutely terrifying in this regard—the dependency chains were just not auditable.

I agree with the overarching bullet point this is said within, but I see this point about NPM said a lot, and I'm not sure how people are going about auditing or how many language ecosystems they're looking at. I have found Node/NPM to be the best / second best popular system for auditing dependency chains. I have significant experience in this area: the relative consistency of package management config across the JS/TS ecosystem is enormously helpful for software composition analysis - the only package manager config I've found that may be slightly better is Composer, but the inconsistent usage of Composer by many PHP devs still makes it a little worse than NPM in practice. pip / PyPI / setuptools is an inconsistent moving target of requirements.txt (is it a lockfile?), Pipfile.lock, setup.cfg -vs- setup.py, pyproject.toml, and whatever else. Maven is a nightmare of multiple registry endpoints, and issues parsing custom <dependencyManagement> directives, extensions (without even starting on maven wrappers and pom.xml templating strings). Don't get me started on Gradle. Go's idea of package management is: just pull it from Git; good luck automating it if you've got private repos with any kind of secure ssh auth. I have less personal experience with Rust/Cargo.

> for some reason, PHP developers love to serialize/deserialize objects instead of using JSON

PHP serialize/deserialize predates the existence of the JSON spec, so that might have something to do with it. A lot of PHP code is old.

> Almost no one got JWT tokens and webhooks right on the first try.

Nor the second try...

BrissyCoder2 years ago

> All the really bad security vulnerabilities were obvious.

Isn't that just tautological? They are bad because they are obvious?