Monorepo – Our Experience

107 points | 10 hours ago | ente.io
__MatrixMan__ 34 minutes ago

Every monorepo I've ever met (n=3) has some kind of radioactive DMZ that everybody is afraid to touch because it's not clear who owns it but it is clear from its quality that you don't want to be the last person who touched it because then maybe somebody will think that you own it. It's usually called "core" or somesuch.

Separate repos for each team mean that when two teams own components that need to interact, they have to expose a "public" interface to the other team--which is the kind of disciplined engineering work that we should be striving for. The monorepo alternative is that you solve it in the DMZ, where it feels less like engineering and more like some kind of multiparty political endeavor where PR reviewers of dubious stakeholder status are using the exercise to further agendas which are unrelated to the feature except that it somehow proves them right about whatever architectural point is recently contentious.

Plus, it's always harder to remove something from the DMZ than to add it, so it's always growing, and there's this sort of gravitational attractor which eventually starts warping time such that PRs take longer to merge the closer they are to it.

Better to just do the "hard" work of maintaining versioned interfaces with documented compatibility (backed by tests). You can always decide to collapse your codebase into a black hole later--but once you start on that path you may never escape.

CharlieDigital 7 hours ago

    > Moving to a monorepo didn't change much, and what minor changes it made have been positive.
I'm not sure that this statement in the summary jibes with this statement from the next section:

    > In the previous, separate repository world, this would've been four separate pull requests in four separate repositories, and with comments linking them together for posterity.
    > 
    > Now, it is a single one. Easy to review, easy to merge, easy to revert.
IMO, this is a huge quality-of-life improvement and prevents a lot of mistakes caused by not having the right revision synced down across different repos. This alone is a HUGE improvement: a dev can no longer accidentally end up with one repo on one branch, forget to pull another repo at the matching branch, and then hit weird issues due to this basic hassle.

When I've encountered this, we've had to use another repo to keep scripts that managed this. But this was also sometimes problematic because each developer's setup had to be identical on their local file system (for the script to work) or we had to each create a config file pointing to where each repo lived.
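
Roughly, each dev's config was a little map of where everything lived, and it only worked if every repo was checked out at the same branch. A sketch (names and layout hypothetical):

    # per-developer config consumed by the sync scripts (hypothetical)
    repos:
      api: "~/work/acme-api"              # machine-specific paths
      frontend: "~/work/acme-frontend"
      shared-models: "~/src/acme-shared-models"
    branch: release/2024-06               # every repo must be on this branch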

This also impacts tracking down bugs and regression analysis; this is much easier to manage in a mono-repo setup because you can get everything at the same revision instead of managing synchronization of multiple repos to figure out where something broke.

danudey 5 hours ago

I prefer microservices/microrepos _conceptually_, but we had the same experience as your quoted text - making changes to four repos, and backporting those changes to the previous two release branches, means twelve separate PRs to make a change.

Having a centralized configuration library (a shared Makefile that we can pull down into our repo and include into the local Makefile) helps, until you have to make a backwards-incompatible change to that Makefile and then post PRs to every branch of every repo that uses that Makefile.

Now we have almost the entirety of our projects back in one repository and everything is simpler: one PR per release branch, three PRs (typically) for any change that needs backporting. Vastly simpler process and much less room for error.

taeric 6 hours ago

My only counterargument here is when those 4 things deploy independently. Sometimes people will get tricked into thinking a code change is atomic because it is in one commit, when it will lead to a mixed fleet because of deployment realities. In that world, having them separate is easier to work with, as you may have to revert one of the deployments separately from the others.

derefr 2 hours ago

That's just an argument for not doing "implicit GitOps", treating the tip of your monorepo's main branch as the source-of-truth on the correct deployment state of your entire system. ("Implicit GitOps" sorta-kinda works when you have a 1:1 correspondence between repos and deployable components — though not always! — but it isn't tenable for a monorepo.)

What instead, then? Explicit GitOps: explicit, reified release specifications (think k8s resource manifests, or Erlang .relup files), one per component that deploys on its own cadence. If you have a monorepo, these also live as a dir in the monorepo. CD happens only when these files change.

With this approach, a single PR can atomically merge code and update one or more release specifications (triggering CD for those components), if and when that is a sensible thing to do. But there can also be separate PRs for updating the code vs. "integrating and deploying changes" to a component, if-and-when that is sensible.
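
As a minimal sketch (fields and paths hypothetical; in practice this might be a k8s manifest or an Erlang .relup), one such release spec could look like:

    # releases/payments-service.yaml -- CD for this component fires only when this file changes
    component: payments-service
    revision: 3f9c2ab                     # exact source commit to build and deploy
    image: registry.example.com/payments-service:3f9c2ab
    config_version: 12

A PR that touches only source changes code without deploying; a PR that also bumps a file under releases/ is an atomic "integrate and deploy".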

ramchip 33 minutes ago

> With this approach, a single PR can atomically merge code and update one or more release specifications (triggering CD for those components), if and when that is a sensible thing to do.

How do you avoid the chicken-and-egg problem? Like if the k8s manifest contains a container tag, and the tag is created by CI when the PR is merged to main, it would seem you can’t add code and deploy that code in the same PR.

taeric 1 hour ago

I mean... sure? Yes, if you add extra structure on top of your code that is there to model the deployments, then you get a bit closer to modeling your deployments. Isn't that the exact argument for why you might want multiple repositories, as well?

scubbo 2 hours ago

...I can't believe I'd never thought about the fact that a "Deployment Repo" can, in fact, just be a directory within the Code Repo. Interesting thought - thanks!

hinkley 28 minutes ago

If the version numbers of all services built from the PR are identical, you at least have a pretty clear trail to figuring out WTF happened.

Even with a few services, we saw some pretty crunchy issues with people not understanding that service A had version 1.3.1234 of a module and service B had version 1.3.1245, and that was enough skew to cause problems.

Distinct repos tend to have distinct builds, and sooner or later one of the ten you're building will glitch out and have to be run twice, or the trigger will fail and it won't build at all until the subsequent merge, and having numbers that are close results in a false sense of confidence.

lmz 2 hours ago

Isn't a mixed fleet always the case once you have more than one server and do rolling updates?

xyzzy123 22 minutes ago

Sort of; at medium scale you can blue/green your whole system out of the monorepo (even if it's, say, 20 services) in k8s and flip the ingresses to cut over during release.

Of course k8s isn't required; you can do it in straight IaC etc. (i.e. deploy a whole parallel system and switch).

It's still "mixed fleet" in terms of any shared external resources (queues, db state, etc) but you can change service interfaces etc with impunity and not worry about compatibility / versioning between services.

Throwing temporary compute at the problem can save a lot of busywork and/or thinking about integration problems.

This stops being practical if you get _very_ big but at that point you presumably have more money and engineers to throw at the problem.

taeric 1 hour ago

Yes. And if you structure your code to explicitly do this, it is a lot easier to reason about.

ericyd 2 hours ago

I felt the same: the author seemed to downplay the success, while every effect listed in the article felt like a huge improvement.

Attummm 1 hour ago

The issue you faced stemmed from the previous best practice of "everything in its own repository." This approach caused major issues, such as the versioning challenges and data model inconsistencies you mentioned. The situations it could lead to are worthy of comedy sketches, but it's a real pain, especially when you’re part of a team struggling with these problems. And it’s almost impossible to convince a team to change direction once they’ve committed to it.

Now, though, it seems the pendulum has swung in the opposite direction, from “everything in its own repo” to “everything in one repo.” This, too, will create its own set of problems, which can also be comedic, but frustrating to experience. For instance, what happens when someone accidentally pushes a certificate or API key and you need to force an update upstream? Imagine coordinating that with 50 developers spread across 8 projects, all in a single repo.

Instead of repeating the problems we currently face, we could start out with a balanced approach. Start with one repository, or split frontend and backend if needed. For data pipelines that share models with the API, keep them in the same repository, creating a single source of truth for the data model. This method has often led to other developers telling me about the supposed benefits of “everything in its own repo.” Just as I pushed back then, I feel the need to push back now against the monorepo trend.

The same can be said for monoliths and microservices, where the middle ground is often overlooked in discussions about best practices.

They all reminded me of the concept of “no silver bullet”[0]. Any decision will face its own unique challenges. But a silver-bullet solution can create artificial challenges that are wasteful, painful, and most of all unnecessary.

[0] https://en.m.wikipedia.org/wiki/No_Silver_Bullet

lolinder 11 minutes ago

> what happens when someone accidentally pushes a certificate or API key and you need to force an update upstream

The correct approach here is typically to invalidate the certificate or API key. A force push usually doesn't work.

If you're using GitHub, the dangerous commit lives on effectively forever in an awkward "not in a repository" state. Even if you're not on GitHub and your system actually garbage collects, the repo has been cloned onto enough build machines and dev machines that you're better off just treating the key or cert as compromised than trying to track down all the places where it might have been copied.

wongarsu 4 hours ago

It's not as much of a pain if your tooling supports git repos as dependencies. For example, a typical multi-repo PR for us with Rust is: 1) PR against the library; 2) PR against the application that points the dependency at the PR's branch and makes the changes; 3) PR review; 4) PR 1 is approved and merged; 5) PR 2 is updated to point at the new commit on master; 6) PR 2 is approved and merged.

Same idea if you use some kind of versioning and release system. It's still a bit of a pain with all the PRs and coordination involved, but at every step every branch is consistent and buildable: you just check it out and hit build.

This is obviously more difficult if you have a more loosely coupled architecture like microservices. But that's self-inflicted pain.

audunw 6 hours ago

There’s nothing preventing you from having a single pull request that merges branches across multiple repos. There’s nothing preventing you from having a parent repo with a lock file that gives you a single linear set of commits tracking the state of multiple repos.

That is, if you’re not tied to using just Github of course.
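
The parent repo's lock file can be as simple as the following hypothetical sketch (tools like Zephyr's west or Android's repo implement variations of this idea):

    # meta-repo lock file -- one commit here pins one state of the whole system
    projects:
      frontend:
        url: git@example.com:acme/frontend.git
        revision: 9b1d5f0
      backend:
        url: git@example.com:acme/backend.git
        revision: 4e77a21

Bumping a revision is a normal, reviewable commit in the parent repo, which gives you the single linear history without physically merging the repos.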

Big monorepos and multiple repo solutions require some tooling to deal with scaling issues.

What surprises me is the attitude that monorepos are the right solution to these challenges. For some projects it makes sense yes, but it’s clear to me that we should have a solution that allows repositories to be composed/combined in elegant ways. Multi-repository pull requests should be a first class feature of any serious source code management system. If you start two projects separately and then later find out you need to combine their history and work with them as if they were one repository, you shouldn’t be forced to restructure the repositories.

CharlieDigital 5 hours ago

    > Multi-repository pull requests should be a first class feature of any serious source code management system. 
But it's currently not?

    > If you start two projects separately and then later find out you need to combine their history and work with them as if they were one repository, you shouldn’t be forced to restructure the repositories.
It's called a directory copy. Cut + paste. I'd add a tag with a comment pointing to the old repo (if needed). But probably after a few weeks, no one is going to look at the old repo.

dmazzoni 2 hours ago

> It's called a directory copy. Cut + paste. I'd add a tag with a comment pointing to the old repo (if needed). But probably after a few weeks, no one is going to look at the old repo.

Not in my experience. I use "git blame" all the time, and routinely read through commits from many years ago in order to understand why a particular method works the way it does.

Luckily, there are many tools for merging git repos into each other while preserving history. It's not as simple as copy and paste, but it's worth the extra effort.

pelletier 6 hours ago

> Multi-repository pull requests should be a first class feature of any serious source code management system.

Do you have examples of source code management systems that provide this feature, and do you have experience with them? The repo-centric approach of GitHub often feels limiting.

jvolkman 5 hours ago

Apparently Gerrit supports this with topics: https://gerrit-review.googlesource.com/Documentation/cross-r...

eikenberry 2 hours ago

I thought one of the whole points behind separate (non-mono) repos was to help enforce loose coupling, and if you came to a point where a single feature change required PRs on 4 separate repos, then that was an indicator that your project needed refactoring as it was becoming too tightly coupled. The example in the article could have been interpreted to mean that they should refactor the functionality for interacting with the ML model into its own repo so it could encapsulate this aspect of the project. Instead they doubled down on the tighter coupling by putting them in a monorepo (which itself encourages tighter coupling).

marcosdumay 26 minutes ago

The issue is that you can't "enforce" loose coupling. The causality is reversed here.

Your software artifacts will have loose coupling if you divided them well enough at their creation. Once they are created, you can't do anything else to change it, except joining or splitting them.

notwhereyouare 7 hours ago

Ironically, I was gonna come and comment on that same second block of text.

We went from monorepo to multi-repo at work and it's been a huge setback and disappointment for the devs, because it's what our contractors recommended.

I've asked for a code deploy and everything and it's failed in prod due to a missing check-in.

CharlieDigital 7 hours ago

    > ...because it's what our contractors recommended
It's sad when this happens instead of taking input from the team on how to actually improve productivity/quality.

A startup I joined started with a multi-repo because the senior team came from a FAANG where it was common practice to have multiple services and a repo for each service.

Problem was that it was a startup with one team of 6 devs and each of the pieces was connected by REST APIs. So now any change to one service required deploying that service and pulling down the OpenAPI spec to regenerate client bindings. It was so clumsy and easy to make simple mistakes.

I refactored the whole thing in one weekend into a monorepo, collapsed the handful of services into one service, and we never looked back.

That refactoring and a later paper out of Google actually inspired me to write this article as a practical guide to building a "modular monolith": https://chrlschn.dev/blog/2024/01/a-practical-guide-to-modul...

eddd-ddde 7 hours ago

At least Google and Meta are heavy into monorepos; I'm really curious what company is using a _repo per service_. That's insane.

marcosdumay 10 minutes ago

Compared to all the issues with keeping a service running, pushing code to a different repo is trivial. If you think that's insane, the insanity is definitely not in the repository separation.

pc86 6 hours ago

It can make sense when you have a huge team of devs and different teams responsible for everything, where you may be on multiple teams and nobody else is responsible for exactly the same set of services you are. Depending on the security/access provisioning culture of the org, "taking half a day to manually grant access to the repos so-and-so needs access to" may actually be an easier sell than "give everyone access to all our code."

If you just have 20-30 devs and everyone is pretty silo'd (e.g. frontend or backend, data or API, etc) having 75 repos for your stuff is just silly.

stackskipton 4 hours ago

> So now any change to one service required deploying that service and pulling down the OpenAPI spec to regenerate client bindings. It was so clumsy and easy to make simple mistakes.

Why? Is your framework heavily tied to client bindings? APIs I consume occasionally get new fields added for data I don't need. My code just ignores them. We also have a policy that you cannot add a new mandatory field to an API without a version bump. So maybe the REST API would have a new field, but I didn't send it and it happily didn't care.

jayd16 6 hours ago

If prod went down because of a missing check in, there are other problems.

notwhereyouare 3 hours ago

Did I say prod went down? I just said it failed in prod. It was a logging change and only half the logging went out. To me, that's a failure.

mgaunard 5 hours ago

Doing modular right is harder than doing monolithic right.

But if you do it right, the advantage you get is that you get to pick which versions of your dependencies you use; while quite often you just want to use the latest, being able to pin is also very useful.

lukewink 5 hours ago

You can still publish packages and pull them down as (pinned) dependencies all within a monorepo.

mgaunard 2 hours ago

That's a terrible and arguably broken-by-design workflow which entirely defeats the point of the monorepo, which is to have a unified build of everything together rather than building things piecemeal in ways that could be incompatible.

For C++ in particular, you need to express your dependencies in terms of source versions, and ensure all of the build artifacts you link together were built against the same source version of every transitive dependency and with the same flags. Failure to do that results in undefined behaviour, and indeed I have seen large organizations with unreliable builds as a matter of routine because of that.

The best way to achieve that is to just build the whole thing from source, with a content-addressable store shared with the whole organization to transparently avoid building redundant things. Whether your source is in a single repo or spread over several doesn't matter so long as your tooling manages that for you and knows where to get things, but ultimately the right way to do modular is simply to synthesize the equivalent monorepo and build that. Sometimes there is the requirement that specific sources should have restricted access, which is often a reason why people avoid building from source, but that's easy to work around by building on remote agents.

Now for some reason there is no good open-source build system for C++, while Rust mostly got it right on the first try. Maybe it's because there are some C++ users still attached to the notion of manually managing ABI.

xyzzy_plugh 7 hours ago

Without indicating my personal feelings on monorepo vs polyrepo, or expressing any thoughts about the experience shared here, I would like to point out that open-source projects have different and sometimes conflicting needs compared to proprietary closed-source projects. The best solution for one is sometimes the extreme opposite for the other.

In particular, many build pipelines involving private sources or artifacts become drastically more complicated than those of their publicly available counterparts.

bunderbunder 3 hours ago

I've also seen this with branching strategies. IMO the best branching strategy for open source projects is generally the worst one for commercial projects, and vice versa.

gregmac 7 hours ago

To me, monorepo vs multi-repo is not about the code organization, but about the deployment strategy. My rule is that there should be a 1:1 relation between a repository and a release/deployment.

If you do one big monolithic deploy, one big monorepo is ideal. (Also, to be clear, this is separate from microservice vs monolithic app: your monolithic deploy can be made up of as many different applications/services/lambdas/databases as makes sense). You don't have to worry about cross-compatibility between parts of your code, because there's never a state where you can deploy something incompatible, because it all deploys at once. A single PR makes all the changes in one shot.

The other rule I have is that if you want to have individual repos with individual deployments, they must be both forward- and backwards-compatible for long enough that you never need to do a coordinated deploy (deploying two at once, where everything is broken in between). If you have to do coordinated deploys, you really have a monolith that's just masquerading as something more sophisticated, and you've given up the biggest benefits of both models (simplicity of mono, independence of multi).

Consider what happens with a monorepo with parts of it being deployed individually. You can't check out any specific commit and mirror what's in production. You could make multiple copies of the repo, check out a different commit on each one, then try to keep in mind which part of which commit is where -- but this is utterly confusing. If you have 5 deployments, you now have 4 copies of any given line of code on your system that are potentially wrong. It becomes very hard to not accidentally break compatibility.

TL;DR: Figure out your deployment strategy, then make your repository structure mirror that.

CharlieDigital 6 hours ago

It doesn't have to be that way.

You can have a mono-repo and deploy different parts of the repo as different services.

You can have a mono-repo with a React SPA and a backend service in Go. If you fix some UI bug with a button in the React SPA, why would you also deploy the backend?

Falimonda 6 hours ago

This is spot on. A monorepo can still include granular and standardized CI configuration across code paths. Nothing about a monorepo forces you to perform a singular deployment.

The gains provided by moving from polyrepo to monorepo are immense.

Developer access control is the only thing I can think of to justify polyrepo.

I'm curious if and how others who see the advantages of monorepo have justified polyrepo in spite of that.

oneplane 6 hours ago

You wouldn't, but making a repo collection into a mono-repo means your mono-deploy needs to be split into a multi-maybe-deploy.

As always, complexity merely moves around when squeezed, and making commits/PRs easier means something else, somewhere else gets less easy.

It is something that can be made better, of course: having your CI and CD be a bit smarter and more modular means you can do selective builds based on what was actually changed, and selective releases based on what you actually want to release (not merely what was in the repo at a commit, or whatever was built).

But all of that needs to be constructed too, just merging some repos into one doesn't do that.

CharlieDigital 6 hours ago

This is not very complex at all.

I linked an example below. Most CI/CD systems, like GitHub Actions[0], can easily be configured to trigger on changes to files under a specific path.

As a very basic starting point, you only need to set up simple rules to detect which monorepo roots changed.

[0] https://docs.github.com/en/actions/writing-workflows/workflo...

bryanlarsen 6 hours ago

If you don't deploy in tandem, you need to test forwards & backwards compatibility. That's tough with either a monorepo or separate repos, but arguably it'd be simpler with separate repos.

CharlieDigital 6 hours ago

It doesn't have to be that complicated.

All you need to know is "does changing this code affect that code".

In the example I've given -- a React SPA and Go backend -- let's assume that there's a gRPC binding originating from the backend. How do we know that we also need to deploy the SPA? Updating the schema would cause generation of a new client + model in the SPA. Now you know that you need to deploy both and this can be done simply by detecting roots for modified files.

You can scale this. If that gRPC change affected some other web extension project, apply the same basic principle: detect that a file changed under this root -> trigger the workflow that rebuilds, tests, and deploys from this root.
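
As a sketch (paths hypothetical), that can be as simple as each deploy workflow watching its own root plus the shared schema root:

    # deploy-spa.yml -- SPA rebuilds/deploys when its root or the shared gRPC schema changes
    on:
      push:
        branches: [main]
        paths: [web/**, proto/**]

    # deploy-backend.yml -- same trigger pattern for the Go service
    on:
      push:
        branches: [main]
        paths: [server/**, proto/**]

A change that only touches web/ deploys the SPA alone; a change under proto/ fans out to both.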

aswerty 6 hours ago

This mirrors my own experience in the SaaS world. Anytime things move towards multiple artifacts/pipelines in one repo, trying to understand what change existed where and when seems to become very difficult.

Of course the multirepo approach means you do this dance a lot more:

- Create a change with backwards compatibility and tombstones (e.g. logs for when backward compatibility is used)
- Update upstream systems to the new change
- Remove backwards compatibility and pray you don't have a low-frequency upstream service interaction you didn't know about

While the dance can be a pain - it does follow a more iterative approach with reduced blast radiuses (albeit many more of them). But, all in all, an acceptable tradeoff.

Maybe if I had more familiarity in mature tooling around monorepos I might be more interested in them. But alas not a bridge I have crossed, or am pushed to do so just at the moment.

msoad 2 hours ago

I love monorepos but I'm not sure if Git is the right tool beyond a certain scale. Where I work, doing a simple `git status` takes seconds due to the size of the repo. There have been various attempts to solve Git performance, but so far nothing comes close to what I experienced at Google.

The Git team should really invest in tooling for very large repos. Our repo is around 10M files and 100M lines of code, and no amount of hacks on top of Git (caching, sparse checkout, etc.) really solves the core problem.

Meta and Google have really solved this problem internally but there is no real open source solution that works for everyone out there.

dijit 2 hours ago

I’m secretly hoping that Google releases Piper (and Mondrian); the gaming industry would go wild.

Perforce is pretty brutal, and the code review tools are awful - but it's still the undisputed king of mixed text and binary assets in a huge monorepo.

siva7 7 hours ago

Ok, but the more interesting part - how did you solve the CI/CD part and how does it compare to a multirepo?

devjab 7 hours ago

I don’t think CI/CD should really be a big worry as far as mono-repositories go, as you can set up different pipelines and different flows with different configurations. Something you’re probably already doing if you have multiple repos.

In my experience the article is right when it tells you there isn’t that big of a difference. We have all sorts of repositories, some of which are basically mono-repositories for their business domain. We tend to separate where it “makes sense”, which for us means when what we put into a repository is completely separate from everything else. We used to have a lot of micro-repositories and it wasn’t that different, to be honest. We grouped more of them together to make it easier for us to be DORA compliant, in terms of the bureaucracy it adds to your documentation burden. Technically I hardly notice it.

JamesSwift 7 hours ago

In my limited-but-not-nothing experience working with mono vs multi repo versions of the same projects, CI/CD definitely was one of the harder pieces to solve. It's highly dependent on your frameworks and CI provider just how straightforward it is going to be, and most of them are "not very straightforward".

The basic way most work is to run full CI on every change. This quickly becomes a huge speedbump to deployment velocity until a solution for "only run what is affected" is found.

devjab 6 hours ago

Which CI/CD pipelines have you had issues with? Because that isn’t my experience at all. With both GitHub (also Azure DevOps) and GitLab you can separate your pipelines with configurations like .gitlab-ci.yml. I guess it can be non-trivial to set up proper parallelisation when you have a lot of build stages, if this isn’t something you’re familiar with. With a lot of other, more self-hosted tools like Gradle, RushJS and many others, you can set up configurations which do X if Y and make sure only to run things which are necessary.
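
For example, a minimal .gitlab-ci.yml rule (paths hypothetical) that scopes a job to one root of the repo:

    # job runs only when files under backend/ change
    build-backend:
      script:
        - make -C backend build test
      rules:
        - changes:
            - backend/**/*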

I don’t want to be rude, but a lot of these tools have rather accessible documentation on how to get up and running, as well as extensive documentation for more complex challenges in their official docs. Which is probably the only place you’ll find good ways of working with it, because a lot of the search engine and LLM “solutions” will range from horrible to outdated.

It can be both slower and faster than micro-repositories in my experience; however, you’re right that it can indeed be a Cthulhu-level speed bump if you do it wrong.

JamesSwift 5 hours ago

I implied but didn't explicitly mention that I'm talking from the context of moving _from_ an existing polyrepo _to_ a monorepo. The tooling is out there to walk a more happy-path experience if you jump in on day 1 (or early in the product lifecycle). But it's much harder to migrate to it and not have to redo a bunch of CI-related tooling.

bluGill 6 hours ago

The problem with "only run what is affected" is it is really easy to have something that is affected but doesn't seem like it should be (that is whatever tools you have to detect is it affected say it isn't). So if you have such a system you must have regular rebuild everything jobs as well to verify you didn't break something unexpected.

I'm not against "only run what is affected"; it is a good answer. It just has failings that you need to be aware of.

CharlieDigital 7 hours ago

Most CI/CD platforms will allow specification of targeted triggers.

For example, in GitHub[0]:

    name: ".NET - PR Unit Test"
    
    on:
      ## Only execute these unit tests when a file in this directory changes.
      pull_request:
        branches: [main]
        paths: [src/services/publishing/**.cs, src/tests/unit/**.cs]
So we set up different workflows that kick off based on the sets of files that change.

[0] https://docs.github.com/en/actions/writing-workflows/workflo...

victorNicollet 7 hours ago

I'm not familiar with GitHub Actions, but we reverted our migration to Bitbucket Pipelines because of a nasty side-effect of conditional execution: if a commit triggers test suite T1 but not T2, and T1 is successful, Bitbucket displays that commit with a green "everything is fine" check mark, regardless of the status of T2 on any ancestors of that commit.

That is, the green check mark means "the changes in this commit did not break anything that was not already broken", as opposed to the more useful "the repository, as of this commit, passes all tests".

plorkyeran 6 hours ago

I would find it extremely confusing and unhelpful if tests in the parent commit, which weren't rerun for a PR because nothing relevant was touched, marked the PR as red. Why would you even want that? That's not something which is relevant to evaluating the PR, and it would make you get in the habit of ignoring failures.

If you split something into multiple repositories then surely you wouldn't mark PRs on one of them as red just because tests are failing in a different one?

ants_everywhere 6 hours ago

Isn't that generally what you want? The check mark tells you the commit didn't break anything. If something was already broken, it should have either blocked the commit that broke it, or there's a flake somewhere that you can only locate by periodically running tests independent of any PR activity.

daelon 6 hours ago

Is it a side effect if it's also the primary effect?

hk1337 6 hours ago

Even AWS CodeBuild (or CodePipeline) allows you to do this now. It didn't before, but it's a fairly recent update.

CharlieDigital 2 hours ago

As a prior user of AWS Code*, I can appreciate that you qualified that with "Even" LMAO

victorNicollet 6 hours ago

Wouldn't CI be easier with a monorepo? Testing integration across multiple repositories (triggered by changes in any of them) seems more complex than just adding another test suite to a single repo.

bluGill 6 hours ago

Pros and cons. Both can be used successfully, but there are different problems with each. If you have a large project, you will have a tools team to deal with the problems of your solution.

KaiserPro 2 hours ago

Monorepos have their advantages, as pointed out: one place to review, one place to merge.

But it can also breed instability, as you can upgrade other people's stuff without them being aware.

There are ways around this, which involve having a local module store and building with named versions. Very similar to a bunch of disparate repos, but without getting lost in GitHub (GitHub's discoverability was always far inferior to GitLab's).

However, it has its drawbacks, namely that people can hold out on older versions than you want to support.

dkarl 2 hours ago

> But it can also breed instability, as you can upgrade other people's stuff without them being aware

This is why Google embraced the principle that if somebody breaks your code without breaking your tests, it's your fault for not writing better tests. (This is sometimes known as the Beyonce rule: if you liked it, you should have put a test on it.)

You need the ability to upgrade dependencies in a hands-off way even if you don't have a monorepo, though, because you need to be able to apply security updates without scheduling dev work every time. You shouldn't need a careful informed eye to tell if upgrades broke your code. You should be able to trust your tests.

stackskipton 4 hours ago

As a DevOps/SRE type person that occasionally gets stuck with builds: monorepos work well if the company will invest in the build process. However, many companies don't do well in this area, and the monorepo blast radius becomes much bigger, so individual repos it is. Also, depending on the language, standing up a private package repo to keep all the common libraries in is easy enough.

paxys 2 hours ago

All the pitfalls of a monorepo can disappear with some good tooling and regular maintenance, so much so that devs may not even realize that they are using one. The actual meat of the discussion is – should you deploy the entire monorepo as one unit or as multiple (micro)services?

bobim 1 hour ago

Started to use a monorepo + worktrees to keep related but separated developments all together with different checkouts. Anybody else on the same path?

h1fra 7 hours ago

I think the big issue around monorepo is when a company puts completely different projects together inside a single repo.

In this article almost everything makes sense to me (because that's what I have been doing most of my career), but they put their OTP app inside, which suddenly makes no sense. And you can see the problem in the CI: they have dedicated files just for this app and probably very little common code with the rest.

IMO you should have one monorepo per project (api, frontend, backend, mobile, etc. as long as it's the same project) and if needed a dedicated repo for a shared library.

fragmede 7 hours ago

> you should have one monorepo per project (api, frontend, backend, mobile, etc. as long as it's the same project)

that's not a monorepo!

Unless the singular "project" is stuff our company ships, the problem you have is of impedance mismatch between the projects, which is the problem that an actual monorepo solves. for swe's on individual projects who will never have the problem of having to ship a commit on all the repos at the "same" time, yeah that seems fine, and for them it is. the problem comes as a distributed systems engineer where, for whatever reason, many or all the repos need to be shipped at the ~same time. or worse - A needs to ship before B which needs ship before C but that needs to ship before A, and you have to unwind that before actually being able to ship the change.

h1fra 24 minutes ago

My implicit point was that most people don't want a monorepo; when they talk about a monorepo, they're talking about consolidating a project, which can span many different repos and technologies.

I'm not convinced that making completely different teams work on the same repo is making things better. In the case of cascading dependencies what usually works better than a convoluted technical solution is communication.

hk1337 6 hours ago

> that's not a monorepo!

Sure it is! It's just not the ideal use case for a monorepo, which is why people say they don't like monorepos.

vander_elst 5 hours ago

"one monorepo per project (api, frontend, backend, mobile, etc. as long as it's the same project) and if needed a dedicated repo for a shared library."

They are literally saying that multiple repos should be used, including for shared code. This is not a monorepo; these are different repos.

magicalhippo 7 hours ago

We're transitioning from an SVN monorepo to Git. We've considered doing a kind of best-of-both-worlds approach.

Some core stuff goes into separate libraries, consumed as NuGet packages by other projects. Those libraries and other standalone projects live in separate repos.

Then a "monorepo" for our main product, where individual projects for integrations etc will reference non-nuget libraries directly.

That is, tightly coupled code goes into the monorepo, the rest in separate repos.

Haven't taken the plunge just yet tho, so not sure how well it'll actually work out.

dezgeg 4 hours ago

In my experience this turns into a nightmare when (not if, when) there is a need to make changes to the libraries and the app at the same time. Especially with libraries, it's often necessary to create a client for an API at the same time to really know that the interface is any good.

magicalhippo 2 hours ago

The idea is that the libraries we put in nuget are really non-project-specific. We'll use nuget to manage library versions rather than git submodules, so hopefully they can live fine in a separate repo.

So updating them at the same time shouldn't be a huge deal, we just make the change in the library, publish the nuget package, and then bump the version number in the downstream projects that need the change.

Ideally changes to these libraries should be relatively limited.

For things that are intertwined, like an API client alongside the API provider and more project-specific libraries, we'll keep those together in the same repo.

If this is what you're thinking of, I'd be interested in hearing more about your negative experiences with such a setup.

memsom 7 hours ago

Monorepos are appropriate for a single project with many sub-parts but one or two artifacts on any given release build. But they fall apart when you have multiple products in the monorepo, each with different release schedules.

As soon as you add a second separate product that uses a different subset of any code in the repo, you should consider breaking up the monorepo. If the code is "a bunch of libraries" and "one or more end user products", it becomes even more imperative to consider breaking down stuff.

Having worked on monorepos where there are 30+ artifacts, multiple ongoing projects that each pull the monorepo into different incompatible versions, all of which have their own lifetime and their own release cycle - a monorepo is the antithesis of a good idea.

vander_elst 5 hours ago

Working on a monorepo where we have hundreds (possibly thousands) of projects, each with a different version and release schedule. It actually works quite well: the dependencies are always in a good state, it's easy to see the ramifications of a change, and it's easy to reuse common components.

memsom 4 hours ago

Good for you. For us - because we have multiple projects going on, pulling the code in different ways: code that runs on embedded, code that runs in the cloud, desktop apps (real ones written in C++ and .NET, not glorified web apps), code that is customer facing, code used by third parties for integrating our products - no, it just doesn’t work. The embedded side shares a core with other levels, and we support multiple embedded platforms (bare metal) and OSes (Windows, Linux, Android, iOS), and also have stuff that runs on the Amazon/Azure cloud platforms. You might be fine, but when you hit critical mass and you have very complicated commercial concerns, it doesn’t work well.

tomtheelder 3 hours ago

I mean it works for Google. Not saying that's a reason to go monorepo, but it at least suggests that it can work for a very large org with very diverse software.

I really don't see why anything you describe would be an issue at all for a monorepo.

munksbeer 6 hours ago

No offense but I think you're doing monorepos wrong. We have more than 100 applications living in our monorepo. They share common core code, some common signals, common utility libs, and all of them share the same build.

We release everything weekly, and some things much more frequently.

If your testing is good enough, I don't see what the issue is?

bluGill 6 hours ago

> If your testing is good enough, I don't see what the issue is?

Your testing isn't good enough. I don't know who you are, what you are working on, or how much testing you do, but I will state with confidence it isn't good enough.

It might be acceptable for your current needs, but you will have bugs that escape testing - often intentional as you can't stop forever to fix all known bugs. In turn that means if anything changes in your current needs you will run into issues.

> We release everything weekly, and some things much more frequently.

This is a negative for users. When you think you will release again soon anyway, so who cares about bugs, it means your users see more bugs. Sure, it is nice that you don't have to break open years-old code anymore, but if the new stuff doesn't have anything the user wants, is this really a good thing?

memsom 4 hours ago

No offence, but you might be a little confused by how complex your actual delivery is. That sounds simple. That sounds like it has a clear roadmap. When you don’t, and you have very agile development that pivots quickly and demands a lot of change concurrently for releases that have very different goals, it is not possible to make all your ducks sit in a row. Monorepos suck in that situation. The dependency graph is so complex it will make your head hurt. And all the streams need to converge into the main dev branch at some point, which causes huge bottlenecks.

tomtheelder 2 hours ago

The dependency graph is no different for a monorepo vs a polyrepo. It's just a question of how those dependencies get resolved.

stillbourne 4 hours ago

I like to use the monorepo tools without the monorepo repo. If that makes any god damn sense. I use NX at my job and the monorepo was getting out of hand: 6 hour pipeline builds, 2 hours of testing, etc. So I broke the repo into smaller pieces. This wouldn't have been possible if I wasn't already using the monorepo tools universally through the project, but it ended up working well.

syndicatedjelly 7 hours ago

Some thoughts:

1) Comparing a photo storage app to the Linux kernel doesn't make much sense. Just because a much bigger project in an entirely different (and more complex) domain uses monorepos, doesn't mean you should too.

2) What the hell is a monorepo? I feel dumb for asking the question, and I feel like I missed the boat on understanding it, because no one defines it anymore. Yet I feel like every mention of monorepo is highly dependent on the context the word is used in. Does it just mean a single version-controlled repository of code?

3) Can these issues with syncing repos be solved with better use of `git submodule`? It seems to be designed exactly for this purpose. The author says "submodules are irritating" a couple of times, but doesn't explain what exactly is wrong with them. They seem like a great solution to me, but I also only recently started using them in a side project.

datadrivenangel 6 hours ago

Monorepo is just a single repo. Yup.

Git submodules have some places where you can surprisingly lose branches/stashed changes.

syndicatedjelly 6 hours ago

One of my repos has a dependency on another repo (that I also own). I initialized it as a git submodule (e.g. my_org/repo1 has a submodule of my_org/repo2).

    Git submodules have some places where you can surprisingly lose branches/stashed changes.
This concerns me, as git generally behaves as a leak-proof abstraction in my experience. Can you elaborate or share where I can learn more about this issue?

klooney 6 hours ago

> Does it just mean a single version-controlled repository of code?

Yeah - the idea is that all of your projects share a common repo. This has advantages and drawbacks. Google is most famous for this approach, although I think they technically have three now - one for Google, one for Android, and one for Chrome.

> They seem like a great solution to me

They don't work in a team context because they're extra steps that people don't do, basically. And for some reason a lot of people find them confusing.

nonameiguess 5 hours ago

https://github.com/google/ contains 2700+ repositories. I don't know how many of these are read-only clones from an internal monorepo versus how many are separate projects that have actually been open-sourced, but the latter is more than zero.