Benefits and drawbacks of Git monorepos
At one of my previous jobs we started a greenfield project with a small team (about 10 developers), anticipating growth to 30–40. It was what I like to call an ‘enterprise-grade web application’ with heavy server-side logic. I was on the backend team in a technical lead role. We made some key strategic decisions about technologies at the beginning. We would go for a microservices architecture using Java EE specifications (no, the point is not whether Java EE can be considered microservices :), Maven, and a couple of Python-based services. Our services would be delivered as Docker images. We also invested heavily in CI with Jenkins from the beginning, and we envisioned and used quite a robust automated testing strategy, ranging from unit, integration and component tests to different kinds of system tests.
With new projects there are lots of decisions to make upfront. One of them is whether or not to use a monorepo (for the backend). What people often don’t realize in the beginning is how these decisions will affect the development process and team communication, with all its nuances. And I really mean the small details, such as how to handle edge cases.
There is no free lunch: both monorepos and multirepos have their costs and benefits, and both require the ability to constantly adapt the development process, team communication and CI/CD model.
Things to assess
What kind of software you are making
This affects your deployment model, number of modules, and technology stack. If you are working with microservices, you are probably aiming for rolling deployment. If you are building a distributed monolith, you may deploy everything at once. Web frontend, mobile and desktop applications are usually deployed (delivered) all at once as packages with no moving parts.
- All-at-once deployment: it’s key to keep all the modules working together consistently. With a modular architecture you can still use feature flags to hide unfinished functionality in frontend, mobile or desktop applications.
- Multiple-version deployment: this is typical of a microservices architecture, where loosely coupled services need to be backward and forward compatible. Only this way can you achieve loose time coupling between dev teams and rolling deployment. For example, service A calls service B; service B rolls out a new feature into production today; service A does not support the feature yet, but their interaction must not break.
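One common way to keep such an interaction from breaking is a ‘tolerant reader’: the receiving side requires only the fields it really needs, defaults the ones an older sender does not know about yet, and ignores fields it does not recognize. The sketch below is only an illustration of that idea. The Order type and the field names (order_id, amount_cents, gift_wrap, loyalty_points) are made up for this example and do not come from the project described in this post.

```python
from dataclasses import dataclass


@dataclass
class Order:
    order_id: str
    amount_cents: int
    # Hypothetical field introduced by a newer release; older senders omit it.
    gift_wrap: bool = False


def parse_order(payload: dict) -> Order:
    """Tolerant reader: require only what is needed, default what is
    missing, and silently ignore fields this version does not know."""
    return Order(
        order_id=payload["order_id"],
        amount_cents=payload["amount_cents"],
        gift_wrap=payload.get("gift_wrap", False),
    )


# Payload from a sender that has not been upgraded yet (no gift_wrap).
old_payload = {"order_id": "o-1", "amount_cents": 1250}

# Payload from a sender that is already ahead of this reader
# (loyalty_points is unknown here and is simply ignored).
newer_payload = {
    "order_id": "o-2",
    "amount_cents": 990,
    "gift_wrap": True,
    "loyalty_points": 12,
}
```

Both payloads parse without errors, so either side can be rolled out first. A strict parser that rejects unknown or missing fields would force both teams to deploy at the same moment.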
Development team size
If you hear that Google, Netflix, Amazon etc. use one approach on a project with a headcount of 200 developers, it does not mean your company will benefit from the same approach with its 20 developers. Spoiler alert: above a certain team size you should split your monorepo.
Throughout this post I will assume that in a multirepo environment the repositories are owned and maintained by their own teams, although one team = one repository is not always the case. In a monorepo environment multiple teams usually work in the same repository, whereas in a multirepo setup the teams agree on who owns which repository.
Do you aim for a heterogeneous or a homogeneous technical stack? With more technologies, e.g. several programming languages with their corresponding build tools and environments, you may be better off separating them at the repository level. This is more a question of taste. I like it when the whole project in one repo can be built with a single command, and having multiple languages and build tools makes that difficult.
Software engineering requires a huge amount of mental work. You need to keep technology, techniques, conventions, engineering principles, team communication, the development process etc. in your head. Keeping tools and methods simple therefore frees up developers’ mental capacity.
Where monorepos win
Simpler change tracking
It’s easy to stroll through the commits, branches and merge requests of one repository if you are looking for the changes belonging to a certain ticket. You can make this easier in multirepos as well with dev tool integrations, e.g. the Jira-Bitbucket integration, where all the changes can appear under one ticket.
Maintaining cohesion between modules (or microservices)
No matter what you do, if you are building one software system, there is always coupling between the parts. Maintaining consistency between modules of the application or microservice system is a lot easier in one single repository. Yes, you may say: don’t introduce breaking changes in your API, but it takes time to achieve that maturity level.
With monorepos you have your build-time dependencies at hand right away, so you don’t need to version your common libraries. Your runtime dependencies are also in one place for your daily local development environment and CI/CD pipelines.
Overall time of change propagation across modules
If you change an API, you can adapt its dependents right away in the same repo, on the same branch. With multiple repos (and multiple teams) the change propagation can take days until every team updates their module in their repository. Needless to say, with multiple repos it’s also easy to simply forget to change the dependent repositories. This affects the whole development process, because cross-team communication is slower. On the other hand, if you have a strong commitment to team ownership, you may not want to change another team’s module even in the same repository.
Development of the CI pipeline
Having seen CI/CD pipelines being developed, I can say that implementing them is a full-time job, so you need to put a lot of thought into keeping them simple. With monorepos you have your whole project at hand to run full integration and system tests. With a multirepo setup you need a separate pipeline for each repository, and then further pipelines that check out all the repositories to run the system tests. The whole setup is a lot more complex.
Truth be told, the other major factor that affects pipeline complexity is the variety of the technology stack. You should aim for homogeneity of programming languages, frameworks and build tools, so that you can reuse pipeline implementations even in a multirepo setup.
Enforcement of conventions and consolidation of tech stack
With everything in one repo it’s more likely that the whole development team will feel an overall responsibility for enforcing the established conventions. It’s also easier to point out problematic code. With a multirepo setup it’s easy to slip into a ‘not my repo, not my responsibility, not my problem’ attitude that slowly (or quickly) erodes overall quality.
Git pull, branch, push and merge request, code review, building the project locally, testing the project locally etc. take a considerable amount of time every day. If you force your developers to work with multiple repositories simultaneously, you just multiply this problem, not to mention the mental burden of keeping everything in mind.
Where monorepos lose
Collisions of manual merges
As the headcount grows and you keep merging manually into the ‘develop’ branch, you will run into a lot of collisions, especially since you have to wait for your CI pipeline to finish before you can merge. While your pipeline is running, another merge may have occurred, so you may need to rebase and rerun the pipeline.
To remedy the problem you can develop and/or integrate a merge queue and auto-merging capabilities into your develop branch.
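The core idea of a merge queue can be sketched in a few lines: merge requests are processed strictly one at a time, each one is tested on top of the current tip of the target branch, and only a green result moves the tip forward. This is a simplified model and not how any particular tool implements it. The function name process_merge_queue, the ci_passes callback and the branch names are assumptions made up for this illustration.

```python
from collections import deque
from typing import Callable, Iterable


def process_merge_queue(
    tip: list[str],
    queued: Iterable[str],
    ci_passes: Callable[[list[str]], bool],
) -> tuple[list[str], list[str]]:
    """Merge queued branches one by one.

    Each candidate is validated against the *current* tip (which already
    contains every previously merged candidate), so nobody has to rebase
    and rerun the pipeline by hand after somebody else's merge.
    """
    queue = deque(queued)
    rejected: list[str] = []
    while queue:
        candidate = queue.popleft()
        trial = tip + [candidate]  # what 'develop' would look like afterwards
        if ci_passes(trial):
            tip = trial  # green: the candidate becomes part of the tip
        else:
            rejected.append(candidate)  # red: send back to the author
    return tip, rejected


def fake_ci(branches: list[str]) -> bool:
    # Stand-in for a real pipeline run: pretend that feature-a and
    # feature-c are fine on their own but break when combined.
    return not {"feature-a", "feature-c"} <= set(branches)


merged, rejected = process_merge_queue(
    ["base"], ["feature-a", "feature-b", "feature-c"], fake_ci
)
```

In this toy run feature-a and feature-b are merged, while feature-c is rejected because it was tested together with feature-a before reaching the branch. With manual merges that combination would only have failed after both were already on ‘develop’.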
Isolating problematic modules
Some modules may poison the whole monorepo with their substandard code quality, overly long build times, overly difficult testing strategy, excessive memory consumption during tests, or too many or too large resource files. If you cannot remedy these problems, it’s probably best to move these modules out into a different repository.
Source code itself usually does not take up much space, but the resource files accompanying the code often do. You need to take great care here and, if possible, move those resource files out into a different repository: either a source code repository or a binary repository, e.g. Nexus.
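To see which files are worth moving out, a small script that lists everything above a size threshold in the working tree is often enough as a first step. The sketch below is one possible way to do it. The function name large_files and the threshold are assumptions for this example, and it only inspects the checked-out files, not the Git history.

```python
import os
from pathlib import Path


def large_files(root, threshold_bytes):
    """Return (size_in_bytes, relative_path) pairs for every file under
    root that is larger than threshold_bytes, largest first."""
    root = Path(root)
    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Do not descend into Git's own object store.
        dirnames[:] = [d for d in dirnames if d != ".git"]
        for name in filenames:
            path = Path(dirpath) / name
            size = path.stat().st_size
            if size > threshold_bytes:
                found.append((size, path.relative_to(root).as_posix()))
    return sorted(found, reverse=True)


if __name__ == "__main__":
    # Example: everything above 5 MiB in the current checkout.
    for size, path in large_files(".", 5 * 1024 * 1024):
        print(f"{size:>12}  {path}")
```

Files that show up here again and again (test fixtures, media, generated archives) are the usual candidates for a binary repository.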
CI becomes time consuming
Running a build and the automated CI tests on the whole repository may become time consuming. If a developer has to wait 15 minutes to merge their code, it can become a burden.
The remedy can be to run CI only for those modules that were changed by a commit or merge. This may require some custom development in your CI pipeline.
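Such custom development usually boils down to two steps: map the changed file paths to the modules they belong to, then add every module that depends on a changed one. The following is a minimal sketch under the assumption that each top-level directory is one module. The module names and the DEPENDS_ON map are invented for this example; in a real project they would come from the build tool’s own metadata.

```python
from pathlib import PurePosixPath

# Hypothetical layout: module name -> modules it depends on.
DEPENDS_ON = {
    "common-lib": set(),
    "order-service": {"common-lib"},
    "billing-service": {"common-lib"},
    "reporting-service": {"billing-service"},
}


def changed_modules(paths, known_modules):
    """Map changed file paths to the top-level module directories they touch."""
    changed = set()
    for p in paths:
        parts = PurePosixPath(p).parts
        if parts and parts[0] in known_modules:
            changed.add(parts[0])
    return changed


def modules_to_build(paths, depends_on):
    """Changed modules plus everything that directly or transitively
    depends on them."""
    affected = changed_modules(paths, depends_on.keys())
    grew = True
    while grew:  # repeat until no new dependents are found
        grew = False
        for module, deps in depends_on.items():
            if module not in affected and deps & affected:
                affected.add(module)
                grew = True
    return affected
```

The list of paths could come, for example, from the output of git diff --name-only against the target branch. A change under billing-service then triggers billing-service and reporting-service only, while a change in common-lib still rebuilds everything, which is the correct but expensive case.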
Promotes tight coupling
Loose coupling is a good software engineering practice. With a monorepo it’s more difficult to enforce loose coupling and backward/forward compatibility between modules.
I like to write opinionated posts, and you may have gathered that I favor a monorepo over a multirepo setup. But it’s not an either-or choice. You can always combine the approaches.
I recommend starting with a monorepo, which helps in establishing and keeping standards, setting up your CI/CD pipelines fast, and maintaining cohesion between modules, especially in a greenfield project. Once the teams and the development process become more mature, or you identify problematic or tightly coupled modules, you can start moving these elements into different repositories.