March 07 2026
Money isn’t going to solve the burnout problem
The xz-utils backdoor situation brought the problem of FLOSS maintainer burnout into the daylight. This in turn led to numerous discussions on how to solve the problem, and the recurring theme was funding maintenance work.
While I’m definitely not opposed to giving people money for their FLOSS work, if you think that throwing some bucks at it will actually solve the problem, and especially if you think that you can throw them once and then forget, I have bad news for you: it won’t. Sure, money is a big part of the problem, but it’s not the only reason people are getting burned out. It’s a systemic problem, it needs a systemic solution, and that involves a lot of hard work to undo everything that’s happened in the last, say, 20 years.
But let’s start at the beginning and ask the important question: why do people make free software?
Why make FLOSS?
Well, I can’t speak for everyone, but I can tell you something about people like me. We didn’t get into FLOSS because we expected to monetize our work. We didn’t publish it because we expected to get some well-paid job, even if we believed it might help. We’ve been doing FLOSS because we actually enjoyed doing it.
Perhaps we just had a creative spark that needed to find an outlet. Or perhaps we found out that we have a talent for it, and we wanted to use it to make the world a little better for others. Or maybe we just needed a tool, and we figured out: why not share it?
Of course, that might have been a while ago. Perhaps we were students with lots of free time. Or perhaps we had a good job. Perhaps we were young, healthy, fascinated by technology, and relatively free. And we enjoyed the community.
So what changed?
Job-related burnout
In my opinion, the biggest source of burnout is actually our dayjobs. I’m not saying all dayjobs are bad (and I’m really grateful for mine), but a lot of IT jobs have been becoming substantially worse over the years.
It may be hard to find time and energy to work on FLOSS while working a full-time job. It is even harder if that full-time job also involves sitting in front of a monitor. But it’s especially bad if your job is also turning to shit in every possible way: maybe you’re being overworked, or people around you are toxic, or perhaps your employer is becoming increasingly evil, or maybe you just have a depressing bullshit job. Some people can actually cope with that, and find FLOSS a joyful counterbalance. Others burn out.
Can money solve this? Perhaps. But can you give people enough money to live on, so they could quit their jobs? And can you guarantee that you won’t cut it off in a few years, leaving them suddenly without a source of income and needing to find a good job again?
And perhaps even more importantly, can you actually manage to do that without effectively turning their FLOSS experience into another bad job? Because even if you promise you don’t expect them to work on the project full-time, even if you promise you don’t expect any specific results, it will still feel like an obligation. And that may take away all the fun.
Of course, some donations can help. Especially if they are “unmarked” donations that are clearly tokens of appreciation for all the work a person has done. Perhaps they will suffice for someone to reduce their work hours (provided their employer actually allows that) and have more energy for FLOSS.
Unfortunately, dayjobs aren’t the only problem people are facing.
FLOSS-related burnout
Software projects don’t exist in a void. They create communities around them. And these communities form a part of a bigger community. Many of us enjoy FLOSS not only because of what we can create, but also because we become part of a big community of people who love FLOSS. Or at least that’s how I used to feel.
Obviously, things weren’t always perfect. Every community accumulates some people with strong opinions, and things could get quite toxic at times. Still, in retrospect you realize that in the end you all shared a common goal: you wanted to make the world a better place. Sure, you may have vehemently disagreed on how to get there, but you could generally assume you both had a constructive goal.
Nowadays, I’m not so sure anymore. Perhaps it’s just negative experiences accumulating, but I feel that the community is progressively becoming worse. On one hand, a number of corporations have figured out that they can treat the FLOSS community as a source of free labor. On the other, more and more people are joining with the intent to monetize, to sell, or to benefit in some other way. People are copying the worst corporate malpractices, or perhaps cosplaying entrepreneurs.
The FLOSS world is a world of interdependence, and if you can’t trust that others share roughly the same values as you do, how can you not burn out?
Intermission: a personal story
Perhaps Linux distribution developers are particularly affected by that, because we are effectively dependent on pretty much every other software project out there that our users may wish to use.
I felt it pretty badly when Cryptography started requiring Rust; Rust went against so many principles I believed in. It felt like saying: “Hey, have you seen this cool new language? Well, it effectively forces us to vendor hundreds of pinned dependencies, building takes forever and, oh, we declare a quarter of the architectures you supported dead; but hey, it’s open source, you can port Rust to them!”
Of course that sucked, and it still sucks. It’s as if a professional wrestler hit you repeatedly, and then told you that you just have to work out some more. In your free time, obviously. When you’re not busy doing your dayjob or being hit. Oh, and the gym is moving elsewhere every month, and you have to start over.
So yes, nowadays I’m spending significant time dealing with the fallout from projects embracing Rust. Or perhaps just random pure Python packages deciding that it would be an awesome idea to embrace uv-build, and require Rust to build pure Python packages to shave a few milliseconds off the wheel build time. That surely doesn’t contribute to me burning out.
And nowadays there’s the AI bubble. Unsurprisingly, a lot of people are enjoying being able to tell a machine to spew a lot of slop for them and then publish it. So I file bug reports about the broken code their plagiarism machine generated, and they answer me with more slop. And I do wonder: why am I even bothering? And I feel depressed, and burned out.
The sense of responsibility
In the end, when you create something, you are responsible for it.
Perhaps not all people feel that way. Perhaps some are perfectly happy to finish a project quickly, and sell it before the technical debt surfaces. Others just love to point out that if you don’t like something, you can always fork it. Because you obviously have lots of free time and energy to maintain other people’s projects while they’re busy spitting on you.
The problem is, responsibility turns a hobby project into a part-time job. Perhaps it’s just that the thrill of creating something new is replaced by the dullness of duty. Perhaps the library you use just made a huge API change that requires a major rewrite of your logic. Perhaps the task turned out more complex than you imagined, and what worked for you brought a lot of user problems. Maybe the whole design was wrong (but if that’s the case, then you may need to rewrite it, and you’ll feel the thrill for a while again, followed by the pain of having to restore feature parity). Or perhaps you just have that one user who’s a real asshole capable of making you regret opening your inbox.
Of course, that doesn’t always happen. Sometimes you keep enjoying working on something. Sometimes a great community builds up around your project. Sometimes you find co-maintainers who you enjoy working with. And sometimes, you eventually stop feeling the weight of responsibility: because you know you have other people you can trust, and eventually you can leave the project in their hands.
However, this is one more burnout source that money can’t always fix. In fact, it can make it worse: by confirming that your project is a part-time job now and cementing the sense of responsibility. It can help a lot if you can still enjoy working on the project, but just pouring money and making demands will only make things worse.
Epilogue: the spiral
Burnout is a serious problem in FLOSS. What used to be your hobby becomes a painful responsibility. Your dayjob may be sucking the life out of you, and your users may be giving you another part-time job. However, the real problem is that it’s spiraling.
Toxicity leads to burnout. Burnout leads to more toxicity. People don’t want to interact with one another. The community fractures. A new dynamic emerges: people no longer want to work together to create better software; they just want to get their job done. They don’t file bugs, they fork. They don’t address bugs, they tell you to fork. They start using LLMs because they don’t want to maintain their code anymore. Software turns into slop, which burns out even more people.
It’s a cancer that’s consuming the FLOSS community I love so much. And the hardest part is that resisting takes so much effort. You end up spending all your energy keeping things from falling apart completely, and you can’t really dedicate it to things you enjoy. And again, you burn out.
What can we do? I don’t know. I really don’t know, and it frustrates me to no end. If we just accept this as the “new normal”, maybe we’ll learn to cope and suffer less. Or maybe the dam will break and we will drown. That’s not a risk I’m willing to take right now, but I don’t know how long I’m going to be able to hold.
February 26 2026
In Memory of Hans de Graaff
We share the tragic news that Hans de Graaff (graaff), a longtime Gentoo developer, has passed away.
Hans was a dedicated member of the Gentoo community for over 20 years, near single-handedly maintaining Ruby ecosystem support. He also brought his careful attention to important security work in Gentoo in the last few years.
Kind, patient, and dedicated - we mourn the loss of a wonderful colleague.
Our deepest condolences to his family. Donations in his memory can be made for CAR T cell therapy at the LUMC Foundation.
Please join us in remembering Hans on the Gentoo forums. Details on the funeral (including an online stream) to be held on 2026-03-02 can be obtained by contacting Elvike Reitsma (elvike AT winkwaves.com).
February 16 2026
Gentoo on Codeberg
Gentoo now has a presence on Codeberg, and contributions can be submitted for the Gentoo repository mirror at https://codeberg.org/gentoo/gentoo as an alternative to GitHub. Eventually, other git repositories will also become available under the Codeberg Gentoo organization. This is part of the gradual mirror migration away from GitHub, as already mentioned in the 2025 end-of-year review. Codeberg is a site based on Forgejo, maintained by a dedicated non-profit organization, and located in Berlin, Germany. Thanks to everyone who has helped make this move possible!
These mirrors exist for the convenience of contributors; we continue to host our own primary repositories, just as we did while offering GitHub mirrors for ease of contribution.
Submitting pull requests
If you wish to submit pull requests on Codeberg, it is recommended to use the AGit approach as it is more space efficient and does not require you to maintain a fork of gentoo.git on your own Codeberg profile. To set it up, clone the upstream URL and check out a branch locally:
git clone git@git.gentoo.org:repo/gentoo.git
cd gentoo
git remote add codeberg ssh://git@codeberg.org/gentoo/gentoo
git checkout -b my-new-fixes
Once you’re ready to create your PR:
git push codeberg HEAD:refs/for/master -o topic="$title"
and the PR should be created automatically. To push additional commits, repeat the above command - be sure that the same topic is used. If you wish to force-push updates (because you’re amending commits), add “-o force-push=true” to the above command.
More documentation can be found on our wiki.
January 05 2026
2025 in retrospect & happy new year 2026!
Happy New Year 2026! Once again, a lot has happened in Gentoo over the past months: new developers, more binary packages, GnuPG alternatives support, Gentoo for WSL, improved Rust bootstrap, better NGINX packaging, … As always, here we’re going to revisit all the exciting news from our favourite Linux distribution.
Gentoo in numbers
Gentoo currently consists of 31663 ebuilds for 19174 different packages. For amd64 (x86-64), there are 89 GBytes of binary packages available on the mirrors. Each week, Gentoo builds 154 distinct installation stages for different processor architectures and system configurations, with the overwhelming majority of these fully up-to-date.
The number of commits to the main ::gentoo repository has remained at an overall high level in 2025, with a slight decrease from 123942 to 112927. The number of commits by external contributors was 9396, now across 377 unique external authors.
GURU, our user-curated repository with a trusted user model, as entry point for potential developers, has shown a decrease in activity. We have had 5813 commits in 2025, compared to 7517 in 2024. The number of contributors to GURU has increased, from 241 in 2024 to 264 in 2025. Please join us there and help packaging the latest and greatest software. That’s the ideal preparation for becoming a Gentoo developer!
Activity has slowed down somewhat on the Gentoo bugtracker bugs.gentoo.org, where we’ve had 20763 bug reports created in 2025, compared to 26123 in 2024. The number of resolved bugs shows the same trend, with 22395 in 2025 compared to 25946 in 2024. The current values are closer to those of 2023 - but clearly this year we fixed more than we broke!
New developers
In 2025 we have gained four new Gentoo developers. They are, in chronological order:
- Jay Faulkner (jayf): Jay joined us in March from Washington, USA. In Gentoo and open source in general, he’s very much involved with OpenStack; further, he’s a big sports fan, mainly ice hockey and NASCAR racing, and a long-time Gentoo enthusiast.
- Michael Mair-Keimberger (mm1ke): Michael finally joined us in June from Austria, after already amassing over 9000 commits beforehand. Michael works as a network security engineer for a big system house in Austria, likes to go jogging regularly, and hikes the mountains on weekends. In Gentoo, he’s active in quality control and cleanup.
- Alexander Puck Neuwirth (apn-pucky): Alexander, a physics postdoc, joined us in July from Italy. At the intersection of computer science, Linux, and high-energy physics, he already uses Gentoo to manage his code and sees it as a great development environment. Beyond physics, he’s also interested in continuous integration and RISC-V.
- Jaco Kroon (jkroon): Jaco signed up as a developer in October from South Africa. He is a system administrator who works for a company that runs and hosts multiple Gentoo installations, and has been around in Gentoo since 2003! Among our packages, Asterisk is one example of his interests.
Featured changes and news
Let’s now look at the major improvements and news of 2025 in Gentoo.
Distribution-wide Initiatives
- Goodbye GitHub, welcome Codeberg: Mostly because of the continuous attempts to force Copilot usage for our repositories, Gentoo currently considers and plans the migration of our repository mirrors and pull request contributions to Codeberg. Codeberg is a site based on Forgejo, maintained by a non-profit organization, and located in Berlin, Germany. Gentoo continues to host its own primary git, bugs, and other infrastructure, and has no plans to change that.
- EAPI 9: The wording for EAPI 9, a new version of the specifications for our ebuilds, has been finalized and approved, and support in Portage is complete. New features in EAPI 9 include pipestatus for better error handling, an edo function for printing a command and executing it, a cleaner environment for the build processes, and the possibility of declaring a default EAPI for the profile directory tree.
- Event presence: At FOSDEM 2025 in Brussels, Gentoo was present once more with a stand, this year together with Flatcar Container Linux (which is based on Gentoo). Naturally we had mugs, stickers, t-shirts, and of course the famous self-compiled buttons… Further, we were present at FrOSCon 2025 in Sankt Augustin with the workshops “Gentoo installation and configuration” and “Writing your own ebuilds”. Last but not least, the toolchain team represented Gentoo at the GNU Tools Cauldron 2025 in Porto.
- SPI migration: The migration of our financial structure to Software in the Public Interest (SPI) is continuing slowly but steadily, with expense payments following the moving intake. If you are donating to Gentoo, and especially if you are a recurring donor, please change your payments to be directed to SPI; see also our donation web page.
- Online workshops: Our German support association, Gentoo e.V., is grateful to the speakers and participants of four online workshops in 2025, held in German and English, on topics as varied as EAPI 9 or GnuPG and LibrePGP. We are looking forward to more exciting events in 2026.
Architectures
- RISC-V bootable QCOW2: As for amd64 and arm64, we now have ready-made bootable RISC-V disk images in QCOW2 format available for download on our mirrors, in a console and a cloud-init variant. The disk images use the rv64gc instruction set and the lp64d ABI, and can be booted via the standard RISC-V UEFI support.
- Gentoo for WSL: We now publish weekly Gentoo images for Windows Subsystem for Linux (WSL), based on the amd64 stages; see our mirrors. While these images are not present in the Microsoft store yet, that’s something we intend to fix soon.
- hppa and sparc destabilized: Since we do not have hardware readily available anymore and these architectures mostly fill a retrocomputing niche, stable keywords have been dropped for both hppa (PA-RISC) and sparc. The architectures will remain supported with testing keywords.
- musl with locales: Localization support via the package sys-apps/musl-locales has been added by default to the Gentoo stages based on the lightweight musl C library.
Packages
- GPG alternatives: Given the unfortunate fracturing of the GnuPG / OpenPGP / LibrePGP ecosystem due to competing standards, we now provide an alternatives mechanism to choose the system gpg provider and ease compatibility testing. At the moment, the original, unmodified GnuPG, the FreePG fork/patchset as also used in many other Linux distributions (Fedora, Debian, Arch, …), and the re-implementation Sequoia-PGP with Chameleon are available. In practice, implementation details vary between the providers, and while GnuPG and FreePG are fully supported, you may still encounter difficulties when selecting Sequoia-PGP/Chameleon.
- zlib-ng support: We have introduced initial support for using zlib-ng and minizip-ng in compatibility mode in place of the reference zlib libraries.
- System-wide jobserver: We have created steve, an implementation of a token-accounting system-wide jobserver, and introduced experimental global jobserver support in Portage. Thanks to that, it is now possible to globally control the number of concurrently running build jobs, correctly accounting for parallel emerge jobs, make and ninja jobs, and other clients supporting the jobserver protocol.
- NGINX rework: The packaging of the NGINX web server and reverse proxy in Gentoo has undergone a major improvement, including the splitting off of several third-party modules into separate packages.
- C++ based Rust bootstrap: We have added a bootstrap path for Rust from C++ using Mutabah’s Rust compiler mrustc, which alleviates the need for pre-built binaries and makes it significantly easier to support more configurations.
- Ada and D bootstrap: Similarly, Ada and D support in GCC now have clean bootstrap paths, which makes enabling these in the compiler as easy as switching the USE flags on gcc and running emerge.
- FlexiBLAS: Gentoo has adopted the new FlexiBLAS wrapper library as the primary way of switching implementations of the BLAS numerical linear algebra library at runtime. This automatically also provides ABI stability for linking programs and bundles the specific treatment of different BLAS variants in one place.
- Python: The default Python version in Gentoo has meanwhile reached Python 3.13. Additionally, Python 3.14 is also available as stable, fully up to date with upstream.
- KDE upgrades: As of the end of 2025, Gentoo stable provides KDE Gear 25.08.3, KDE Frameworks 6.20.0, and KDE Plasma 6.5.4. As always, Gentoo testing follows the newest upstream releases (and using the KDE overlay you can even install from git sources).
Physical and Software Infrastructure
- Additional build server: A second dedicated build server, hosted at Hetzner in Germany, has been added to speed up the generation of installation stages, ISO and QCOW2 images, and binary packages.
- Documentation: Documentation work has made constant progress on wiki.gentoo.org. The Gentoo Handbook had some particularly useful updates, and the documentation received lots of improvements and additions from the many active volunteers. There are currently 9,647 pages on the wiki, and there have been 766,731 edits since the project started. Please help Gentoo by contributing to documentation!
Finances of the Gentoo Foundation
- Income: The Gentoo Foundation took in $12,066 in fiscal year 2025 (ending 2025/06/30); the dominant part (over 80%) consists of individual cash donations from the community. On the SPI side, we received $8,471 in the same period; here too, this is all from small individual cash donations.
- Expenses: Our expenses in fiscal year 2025 were: program services (e.g. hosting costs) $8,332, management & general (accounting) $1,724, fundraising $905, and non-operating (depreciation expenses) $10,075.
- Balance: We have $104,831 in the bank as of July 1, 2025 (which is when our fiscal year 2026 starts for accounting purposes). The Gentoo Foundation FY2025 financial statement is available on the Gentoo Wiki.
- Transition to SPI: The Foundation encourages donors to ensure their ongoing contributions go to SPI; more than 40 donors had not responded to requests to move their recurring donations by the end of the year. Expenses will be moved to the SPI structure as ongoing income permits.
Thank you!
As every year, we would like to thank all Gentoo developers and all who have submitted contributions for their relentless everyday Gentoo work. If you are interested and would like to help, please join us to make Gentoo even better! As a volunteer project, Gentoo could not exist without its community.
December 26 2025
FOSDEM 2026
Once again it’s FOSDEM time! Join us at Université Libre de Bruxelles, Campus du Solbosch, in Brussels, Belgium. The upcoming FOSDEM 2026 will be held on January 31st and February 1st 2026. If you visit FOSDEM, make sure to come by our Gentoo stand (exact location still to be announced) for the newest Gentoo news and Gentoo swag. Also, this year there will be a talk about the official Gentoo binary packages in the Distributions devroom. Visit our Gentoo wiki page on FOSDEM 2026 to see who’s coming and for more practical information.
November 30 2025
One jobserver to rule them all
A common problem with running Gentoo builds is concurrency. Many packages include extensive build steps that are either fully serial, or cannot fully utilize the available CPU threads throughout. This problem becomes less pronounced when building multiple packages in parallel, but then we risk overscheduling for packages that do take advantage of parallel builds.
Fortunately, there are a few tools at our disposal that can improve the situation. Most recently, they were joined by two experimental system-wide jobservers: guildmaster and steve. In this post, I’d like to provide the background on them, and discuss the problems they are facing.
The job multiplication problem
You can use the MAKEOPTS variable to specify a number of parallel jobs to run:
MAKEOPTS="-j12"
This is used not only by GNU make, but it is also recognized by a plethora of eclasses and ebuilds, and converted into appropriate options for various builders, test runners and other tools that can benefit from concurrency. So far, that’s good news; whenever we can, we’re going to run 12 jobs and utilize all the CPU threads.
The problems start when we’re running multiple builds in parallel. This could be either due to running emerge --jobs, or simply needing to start another emerge process. The latter happens to me quite often, as I am testing multiple packages simultaneously.
For example, if we end up building four packages simultaneously, and all of them support -j, we may end up spawning 48 jobs. The issue isn’t just saturating the CPU; imagine you’re running 48 memory-hungry C++ compilers simultaneously!
Load-average scheduling to the rescue
One possible workaround is to use the --load-average option, e.g.:
MAKEOPTS="-j12 -l13"
This causes tools supporting the option not to start new jobs if the current load exceeds 13, which roughly approximates 13 processes running simultaneously. However, the option isn’t universally supported, and the exact behavior differs from tool to tool. For example, CTest doesn’t start any jobs when the load is exceeded, effectively stopping test execution, whereas GNU make and Ninja throttle themselves down to one job.
Of course, this is a rough approximation. While GNU make attempts to establish the current load from /proc/loadavg, most tools just use the one-minute average from getloadavg(), suffering from some lag. It is entirely possible to end up with interspersed periods of overscheduling while the load is still ramping up, followed by periods of underscheduling before it decreases again. Still, it is better than nothing, and can become especially useful for providing background load for other tasks: a build process that can utilize the idle CPU threads, and back down when other builds need them.
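The load-average throttling described above can be sketched in a few lines of Python. This is a simplified model of my own (the function name and the injectable `current_load` parameter are illustrative, not any tool’s actual API):

```python
import os

# Hedged sketch of "-l N"-style throttling: consult the 1-minute load
# average before starting another job. Real tools differ in detail
# (e.g. GNU make reads /proc/loadavg directly).
def may_start_job(max_load, current_load=None):
    if current_load is None:
        # getloadavg() returns 1-, 5-, and 15-minute averages;
        # the 1-minute value still lags behind reality.
        current_load = os.getloadavg()[0]
    return current_load <= max_load

# With MAKEOPTS="-j12 -l13" semantics:
print(may_start_job(13.0, current_load=4.2))   # True: start the job
print(may_start_job(13.0, current_load=20.0))  # False: hold off
```

Because the measured load trails actual activity, a loop built on this check will still oscillate between over- and underscheduling, exactly as described above.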
The nested Makefile problem and GNU Make jobserver
Nested Makefiles are processed by calling make recursively, and therefore face a similar problem: if you run multiple make processes in parallel, and they run multiple jobs simultaneously, you end up overscheduling. To avoid this, GNU make introduces a jobserver. It ensures that the specified job number is respected across multiple make invocations.
At the time of writing, GNU make supports three kinds of the jobserver protocol:
- The legacy Unix pipe-based protocol that relied on passing file descriptors to child processes.
- The modern Unix protocol using a named pipe.
- The Windows protocol using a shared semaphore.
All these variants follow roughly the same design principles, and are peer-to-peer protocols for using shared state rather than true servers in the network sense. The jobserver’s role is mostly limited to initializing the state and seeding it with an appropriate number of job tokens. Afterwards, clients are responsible for acquiring a token whenever they are about to start a job, and returning it once the job finishes. The availability of job tokens therefore limits the total number of processes started.
The flexibility of modern protocols permitted more tools to support them. Notably, the Ninja build system recently started supporting the protocol, therefore permitting proper parallelism in complex build systems combining Makefiles and Ninja. The jobserver protocol is also supported by Cargo and various Rust tools, GCC and LLVM, where it can be used to limit the number of parallel LTO jobs.
A system-wide jobserver
With a growing number of tools becoming capable of parallel processing, and at the same time gaining support for the GNU make jobserver protocol, it starts being an interesting solution to the overscheduling problem. If we could run one jobserver shared across all build processes, we could control the total number of jobs running simultaneously, and therefore have all the simultaneously running builds dynamically adjust one to another!
In fact, this is not a new idea. A bug requesting jobserver integration was filed for Portage back in 2019. The NixOS jobserver effort dates back at least to 2021, though it has not been merged yet. Guildmaster and steve joined the effort very recently.
There are two primary problems with using a system-wide jobserver: token release reliability, and the “implicit slot” problem.
The token release problem
The first problem is more important. As noted before, the jobserver protocol relies entirely on clients releasing the job tokens they acquired, and the documentation explicitly emphasizes that they must be returned even in error conditions. Unfortunately, this is not always possible: if the client gets killed, it cannot run any cleanup code and therefore return the tokens! For scoped jobservers like GNU make’s this usually isn’t that much of a problem, since make normally terminates upon a child being killed. However, a system jobserver could easily be left with no job tokens in the queue this way!
This problem cannot really be solved within the strict bounds of the jobserver protocol. After all, it is just a named pipe, and there are limits to how much you can monitor what’s happening to the pipe buffer. Fortunately, there is a way around that: you can implement a proper server for the jobserver protocol using FUSE, and provide it in place of the named pipe. The good news is that most tools don’t actually check the file type, and those that do can easily be patched.
The current draft of the NixOS jobserver provides a regular file with special behavior via FUSE, whereas guildmaster and steve both provide a character device via FUSE’s CUSE API. The NixOS jobserver and guildmaster both return unreleased tokens once the process closes the jobserver file, whereas steve returns them once the process that acquired them exits. This way, they can guarantee that a process that either can’t release its tokens (e.g. because it’s been killed), or one that doesn’t because of an implementation issue (e.g. Cargo), doesn’t end up effectively locking other builds. It also means we can provide live information on which processes are holding the tokens, or even implement additional features such as limiting token provision based on the system load, or setting per-process limits.
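The reclaim-on-disconnect behavior can be illustrated with a toy in-process model. This is my own simplification: no FUSE or CUSE involved, and clients are identified by plain keys rather than file handles:

```python
# Sketch: a server that tracks which client holds each token and
# reclaims them when the client disconnects (e.g. because it was killed).
class TrackingJobserver:
    def __init__(self, jobs):
        self.free = jobs
        self.held = {}  # client id -> number of tokens held

    def acquire(self, client):
        if self.free == 0:
            return False
        self.free -= 1
        self.held[client] = self.held.get(client, 0) + 1
        return True

    def disconnect(self, client):
        # A killed client never releases; the server does it on close.
        self.free += self.held.pop(client, 0)

js = TrackingJobserver(2)
assert js.acquire("cargo") and js.acquire("make")
assert not js.acquire("ninja")   # pool exhausted
js.disconnect("cargo")           # client dies; its token is reclaimed
assert js.acquire("ninja")       # the reclaimed token is usable again
```

The per-client bookkeeping is also what makes the extra features mentioned above possible: live token-holder reporting and per-process limits fall out of the `held` map for free.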
The implicit slot problem
The second problem is related to the implicit assumption that a jobserver is inherited from a parent GNU make process that already acquired a token to spawn the subprocess. Since the make subprocess doesn’t really do any work itself, it can “use” the token to spawn another job instead. Therefore, every GNU make process running under a jobserver has one implicit slot that runs jobs without consuming any tokens. If the jobserver is running externally and no job tokens were acquired while running the top make process, it ends up running an extra process without a job token: so steve -j12 permits 12 jobs, plus one extra job for every package being built.
Fortunately, the solution is rather simple: one needs to implement token acquisition at Portage level. Portage acquires a new token prior to starting a build job, and releases it once the job finishes. In fact, this solves two problems: it accounts for the implicit slot in builders implementing the jobserver protocol, and it limits the total number of jobs run for parallel builds.
However, this is a double-edged sword. On one hand, it limits the risk of overscheduling when running parallel build jobs. On the other, it means that a new emerge job may not be able to start immediately, but instead wait for other jobs to free up job tokens first, negatively affecting interactivity.
A semi-related issue is that acquiring a single token doesn’t properly account for processes that are parallel themselves but do not implement the jobserver protocol, such as pytest-xdist runs. It may be possible to handle these better by acquiring multiple tokens prior to running them (or possibly while running them), but in the former case one needs to be careful to acquire them atomically, and not end up with the equivalent of a deadlock: two processes each acquiring part of the tokens they require, and waiting forever for the rest.
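To illustrate the atomicity concern, here is a hypothetical all-or-nothing acquisition strategy (not actual steve or Portage code): tokens are taken with non-blocking reads, and on shortfall everything already taken is put back, so two clients can never sit on partial sets waiting for each other.

```python
import os

# Hypothetical all-or-nothing acquisition of several jobserver tokens.
# Greedily holding a partial set invites deadlock: two clients each grab
# some of the tokens they need and block forever waiting for the rest.
# Instead, we roll back on shortfall and let the caller retry later.

def try_acquire_n(read_fd, write_fd, n):
    """Try to take n tokens; return them all, or roll back and fail."""
    os.set_blocking(read_fd, False)
    got = []
    try:
        for _ in range(n):
            try:
                got.append(os.read(read_fd, 1))
            except BlockingIOError:
                # Not enough tokens available right now: return what we
                # took instead of holding a partial set.
                for token in got:
                    os.write(write_fd, token)
                return None
        return got
    finally:
        os.set_blocking(read_fd, True)

r, w = os.pipe()
os.write(w, b"+" * 3)                  # a jobserver with 3 tokens
assert try_acquire_n(r, w, 2)          # 2 of 3 available: success
assert try_acquire_n(r, w, 2) is None  # only 1 left: rolled back, not held
```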
The implicit slot problem also causes issues in other clients. For example, nasm-rs writes an extra token to the jobserver pipe to avoid special-casing the implicit slot. However, this violates the protocol and breaks clients with per-process tokens. Steve carries a special workaround for that package.
Summary
A growing number of tools are capable of some degree of concurrency: from builders traditionally being able to start multiple parallel jobs, to multithreaded compilers. While they provide some degree of control over how many jobs to start, avoiding overscheduling while running multiple builds in parallel is non-trivial. Some builders can use the load average to partially mitigate the issue, but that’s far from a perfect solution.
Jobservers are our best bet right now. Originally designed to handle job scheduling for recursive GNU make invocations, they are being extended to control other parallel processes throughout the build, and can be further extended to control the job numbers across different builds, and even across different build containers.
While NixOS seems to have dropped the ball, Gentoo is now finally actively pursuing global jobserver support. Guildmaster and steve both prove that a server-side implementation is possible, and integration is just around the corner. At this point, it’s not clear whether jobserver-enabled systems are going to become the default in the future, but it’s certainly an interesting experiment to carry out.
October 12 2025
How we incidentally uncovered a 7-year-old bug in gentoo-ci
“Gentoo CI” is the service providing periodic linting for the Gentoo repository. It is a part of the Repository mirror and CI project that I started in 2015. Of course, it all started as a temporary third-party solution, but it persisted, was integrated into Gentoo Infrastructure and grew organically into quite a monstrosity.
It’s imperfect in many ways. In particular, it has only a limited degree of error recovery, and when things go wrong beyond that, a manual fix is required. Often the “fix” is to stop mirroring a problematic repository. Over time, I’ve started having serious doubts about the project, and proposed sunsetting most of it.
Lately, things have been getting worse. What started as a minor change in the behavior of Git triggered a whole cascade of failures, leading to me finally announcing the deadline for sunsetting the mirroring of third-party repositories, and starting to rip non-critical bits out of it. Interestingly enough, this whole process led me to finally discover the root cause of most of these failures — a bug that had existed since the very early versions of the code, but happened to be hidden by the hacky error recovery code. Here’s the story of it.
Repository mirror and CI is basically a bunch of shell scripts with Python helpers, run via a cronjob (repo-mirror-ci code). The scripts are responsible for syncing the lot of public Gentoo repositories, generating caches for them, publishing them onto our mirror repositories, and finally running pkgcheck on the Gentoo repository. Most of the “unexpected” error handling is set -e -x, with dumb logging to a file and mailing on cronjob failure. Some common errors are handled gracefully though — sync errors, pkgcheck failures and so on.
The whole cascade started when Git was upgraded on the server. The upgrade involved a change in behavior where git checkout -- ${branch} stopped working; you could only specify files after the --. The fix was trivial enough.
However, once the issue was fixed, I started periodically seeing sync failures from the Gentoo repository. The scripts had a very dumb way of handling sync failures: if syncing failed, they removed the local copy entirely and tried again. This generally made sense — say, if upstream renamed the main branch, git pull would fail, but a fresh clone would be a cheap fix. However, the Gentoo repository is quite big, and when it got removed due to a sync failure, cloning it afresh from the Gentoo Infrastructure failed.
So when it failed, I did a quick hack: I cloned the repository manually from GitHub, replaced the remote and put it in place. Problem solved. Except that a while later, the same issue surfaced. This time I kept an additional local clone, so I wouldn’t have to fetch it from the server, and put it in place again. But then it got removed once more, and this was really getting tedious.
What I assumed then was that the repository was failing to sync due to some temporary problem, either network- or Infrastructure-related. If that were the case, it really made no sense to remove it and clone afresh. On top of that, since we are sunsetting support for third-party repositories anyway, there is no need for automatic recovery from issues such as branch name changes. So I removed that logic, making a failed sync stop immediately without removing the local copy.
Now, this had important consequences. Previously, any failed sync would result in the repository being removed and cloned again, leaving no trace of the original error. On top of that, the logic stopping the script early when the Gentoo repository failed meant that the actual error wasn’t even saved, leaving me only with the subsequent clone failures.
When the sync failed again (and of course it did), I was able to actually investigate what was wrong. What actually happened is that the repository wasn’t on a branch — the checkout was detached at some commit. Initially, I assumed this was some fluke, perhaps also related to the Git upgrade. I’ve switched manually to master, and that fixed it. Then it broke again. And again.
So far I had mostly been dealing with the failures asynchronously — I wasn’t around at the time of the initial failure, and only started working on it after a few failed runs. However, the issue finally resurfaced fast enough that I was able to connect the dots: the problem likely happened immediately after gentoo-ci hit a breakage and bisected it! So I started suspecting that there was another issue in the scripts, perhaps another case of a missed --, but I couldn’t find anything relevant.
Finally, I started looking at the post-bisect code. What we were doing was calling git rev-parse HEAD prior to the bisect, and then using the result in git checkout afterwards to restore the original state. This obviously meant that after every bisect, we ended up with a detached HEAD, i.e. precisely the issue I was seeing. So why didn’t I notice this before?
Of course, because of the sync error handling. Once a bisect broke the repository, the next sync failed and the repository got cloned again, so we never noticed anything was wrong. We only started noticing once cloning started failing. So after a few days of confusion and false leads, I finally fixed a bug that had been present in production code for over 7 years, and had caused the Gentoo repository to be cloned over and over again whenever a bad commit happened.
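For illustration, the difference between the two restore strategies can be reproduced in a throwaway repository (a hypothetical reconstruction; the actual gentoo-ci scripts differ):

```shell
# Hypothetical reconstruction of the bug in a throwaway repository.
repo=$(mktemp -d)
cd "$repo"
git init -q -b master
git -c user.email=ci@example.com -c user.name=ci \
    commit -q --allow-empty -m initial

# Buggy restore: saving a commit hash and checking it out detaches HEAD.
saved=$(git rev-parse HEAD)
git checkout -q "$saved"
git symbolic-ref -q --short HEAD || echo detached   # prints: detached

# Fixed restore: save the branch name, so the checkout stays on a branch.
git checkout -q master
saved=$(git symbolic-ref --short HEAD)
git checkout -q "$saved"
git symbolic-ref --short HEAD                       # prints: master
```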
July 26 2025
EPYTEST_PLUGINS and other goodies now in Gentoo
If you are following the gentoo-dev mailing list, you may have noticed that there’s been a fair number of patches sent for the Python eclasses recently. Most of them have been centered on pytest support. Long story short, I came up with what I believed to be a reasonably good design, and decided it’s time to stop manually repeating all the good practices in every ebuild separately.
In this post, I am going to shortly summarize all the recently added options. As always, they are all also documented in the Gentoo Python Guide.
The unceasing fight against plugin autoloading
The pytest test loader defaults to automatically loading all the plugins installed to the system. While this is usually quite convenient, especially when you’re testing in a virtual environment, it can get quite messy when you’re testing against system packages and end up with lots of different plugins installed. The results can range from slowing tests down to completely breaking the test suite.
Our initial attempts to contain the situation were based on maintaining a list of known-bad plugins and explicitly disabling their autoloading. The list of disabled plugins has gotten quite long by now. It includes both plugins that were known to frequently break tests, and those that frequently resulted in automagic dependencies.
While the opt-out approach allowed us to resolve the worst issues, it only worked when we knew about a particular issue. So naturally we’d miss some of the rarer issues, and learn about them only when arch testing workflows failed or users reported problems. And of course, we would still be loading lots of unnecessary plugins at the cost of performance.
So we started disabling autoloading entirely, using the PYTEST_DISABLE_PLUGIN_AUTOLOAD environment variable. At first we only used it when we needed to; however, over time we started using it almost everywhere — after all, we don’t want test suites to suddenly start failing because of a newly installed pytest plugin.
For a long time, I have been hesitant to disable autoloading by default. My main concern was that it’s easy to miss a missing plugin. Say, if you ended up failing to load pytest-asyncio or a similar plugin, all the asynchronous tests would simply be skipped (verbosely, but it’s still easy to miss among the flood of warnings). However, eventually we started treating this warning as an error (and then pytest started doing the same upstream), and I have decided that going opt-in is worth the risk. After all, we were already disabling it all over the place anyway.
EPYTEST_PLUGINS
Disabling plugin autoloading is only the first part of the solution. Once you disabled autoloading, you need to load the plugins explicitly — it’s not sufficient anymore to add them as test dependencies, you also need to add a bunch of -p switches. And then, you need to keep maintaining both dependencies and pytest switches in sync. So you’d end up with bits like:
BDEPEND="
	test? (
		dev-python/flaky[${PYTHON_USEDEP}]
		dev-python/pytest-asyncio[${PYTHON_USEDEP}]
		dev-python/pytest-timeout[${PYTHON_USEDEP}]
	)
"
distutils_enable_tests pytest

python_test() {
	local -x PYTEST_DISABLE_PLUGIN_AUTOLOAD=1
	epytest -p asyncio -p flaky -p timeout
}
Not very efficient, right? The idea then is to replace all that with a single EPYTEST_PLUGINS variable:
EPYTEST_PLUGINS=( flaky pytest-{asyncio,timeout} )
distutils_enable_tests pytest
And that’s it! EPYTEST_PLUGINS takes a bunch of Gentoo package names (without category — almost all of them reside in dev-python/, and we can special-case the few that do not), distutils_enable_tests adds the dependencies and epytest (in the default python_test() implementation) disables autoloading and passes the necessary flags.
Now, what’s really cool is that the function will automatically determine the correct argument values! This can be especially important if entry point names change between package versions — and upstreams generally don’t consider this an issue, since autoloading isn’t affected.
Going towards no autoloading by default
Okay, that gives us a nice way of specifying which plugins to load. However, weren’t we talking of disabling autoloading by default?
Well, yes — and the intent is that it’s going to be disabled by default in EAPI 9. However, until then there’s a simple solution we encourage everyone to use: set an empty EPYTEST_PLUGINS. So:
EPYTEST_PLUGINS=() distutils_enable_tests pytest
…and that’s it. When it’s set to an empty list, autoloading is disabled. When it’s unset, it is enabled for backwards compatibility. And the next pkgcheck release is going to suggest it:
dev-python/a2wsgi EPyTestPluginsSuggestion: version 1.10.10: EPYTEST_PLUGINS can be used to control pytest plugins loaded
EPYTEST_PLUGIN* to deal with special cases
While the basic feature is neat, it is not a silver bullet. The approach used is insufficient for some packages, most notably pytest plugins that run pytest subprocesses without the appropriate -p options, and expect plugins to be autoloaded there. However, after some more fiddling we arrived at three helpful features:
- EPYTEST_PLUGIN_LOAD_VIA_ENV that switches explicit plugin loading from -p arguments to PYTEST_PLUGINS environment variable. This greatly increases the chance that subprocesses will load the specified plugins as well, though it is more likely to cause issues such as plugins being loaded twice (and therefore is not the default). And as a nicety, the eclass takes care of finding out the correct values, again.
- EPYTEST_PLUGIN_AUTOLOAD to reenable autoloading, effectively making EPYTEST_PLUGINS responsible only for adding dependencies. It’s really intended to be used as a last resort, and mostly for future EAPIs when autoloading will be disabled by default.
- Additionally, EPYTEST_PLUGINS can accept the name of the package itself (i.e. ${PN}) — in which case it will not add a dependency, but load the just-built plugin.
How useful is that? Compare:
BDEPEND="
	test? (
		dev-python/pytest-datadir[${PYTHON_USEDEP}]
	)
"
distutils_enable_tests pytest

python_test() {
	local -x PYTEST_DISABLE_PLUGIN_AUTOLOAD=1
	local -x PYTEST_PLUGINS=pytest_datadir.plugin,pytest_regressions.plugin
	epytest
}
…and:
EPYTEST_PLUGINS=( "${PN}" pytest-datadir )
EPYTEST_PLUGIN_LOAD_VIA_ENV=1
distutils_enable_tests pytest
Old and new bits: common plugins
The eclass already had some bits related to enabling common plugins. Given that EPYTEST_PLUGINS only takes care of loading plugins, but not passing specific arguments to them, they are still meaningful. Furthermore, we’ve added EPYTEST_RERUNS.
The current list is:
- EPYTEST_RERUNS=... that takes a number of reruns and uses pytest-rerunfailures to retry failing tests the specified number of times.
- EPYTEST_TIMEOUT=... that takes a number of seconds and uses pytest-timeout to force a timeout if a single test does not complete within the specified time.
- EPYTEST_XDIST=1 that enables parallel testing using pytest-xdist, if the user allows multiple test jobs. The number of test jobs can be controlled (by the user) by setting EPYTEST_JOBS with a fallback to inferring from MAKEOPTS (setting to 1 disables the plugin entirely).
The variables automatically add the needed plugin, so they do not need to be repeated in EPYTEST_PLUGINS.
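Put together, a hypothetical test setup combining these variables could look like this (the plugin choice and timeout value are purely illustrative, not taken from any real package):

```shell
# Illustrative ebuild fragment; plugin and timeout values are made up.
EPYTEST_PLUGINS=( flaky )   # explicitly loaded; dependency added for us
EPYTEST_TIMEOUT=1800        # kill any single test running over 30 minutes
EPYTEST_XDIST=1             # parallelize if the user allows multiple jobs
distutils_enable_tests pytest
```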
JUnit XML output and gpy-junit2deselect
As an extra treat, we ask pytest to generate JUnit-style XML output for each test run, which can be used for machine processing of test results. gpyutils now supplies a gpy-junit2deselect tool that can parse this XML and output a handy EPYTEST_DESELECT for the failing tests:
$ gpy-junit2deselect /tmp/portage/dev-python/aiohttp-3.12.14/temp/pytest-xml/python3.13-QFr.xml
EPYTEST_DESELECT=(
  tests/test_connector.py::test_tcp_connector_ssl_shutdown_timeout_nonzero_passed
  tests/test_connector.py::test_tcp_connector_ssl_shutdown_timeout_passed_to_create_connection
  tests/test_connector.py::test_tcp_connector_ssl_shutdown_timeout_zero_not_passed
)
While it doesn’t replace due diligence, it can help you update long lists of deselects. As a bonus, it automatically collapses deselects to test functions, classes and files when all matching tests fail.
hypothesis-gentoo to deal with health check nightmare
Hypothesis is a popular Python fuzz testing library. Unfortunately, it has one feature that, while useful upstream, is pretty annoying to downstream testers: health checks.
The idea behind health checks is to make sure that fuzz testing remains efficient. For example, Hypothesis is going to fail if the routine used to generate examples is too slow. And as you can guess, “too slow” is more likely to happen on a busy Gentoo system than on dedicated upstream CI. Not to mention some upstreams plain ignore health check failures if they happen rarely.
Given how often this broke for us, we have requested an option to disable Hypothesis health checks long ago. Unfortunately, upstream’s answer can be summarized as: “it’s up to packages using Hypothesis to provide such an option, and you should not be running fuzz testing downstream anyway”. Easy to say.
Well, obviously we are not going to pursue every single package using Hypothesis to add a profile with health checks disabled. We did report health check failures sometimes, and sometimes got no response at all. And skipping these tests is not really an option, given that often there are no other tests for a given function, and even if there are — it’s just going to be a maintenance nightmare.
I’ve finally figured out that we can create a Hypothesis plugin — now hypothesis-gentoo — that provides a dedicated “gentoo” profile with all health checks disabled, and then we can simply use this profile in epytest. And how do we know that Hypothesis is used? Of course we look at EPYTEST_PLUGINS! All pieces fall into place. It’s not 100% foolproof, but health check problems aren’t that common either.
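Conceptually, the plugin boils down to a few lines; the sketch below shows the general idea (hypothesis-gentoo’s actual code may differ, and the deadline=None part is my assumption):

```python
# Sketch of the general idea behind hypothesis-gentoo: register a
# "gentoo" settings profile with every health check suppressed, so that
# busy build hosts do not fail tests on timing-based checks.
from hypothesis import HealthCheck, settings

settings.register_profile(
    "gentoo",
    suppress_health_check=list(HealthCheck),
    deadline=None,  # timing deadlines are just as unreliable downstream
)
settings.load_profile("gentoo")
```

With the profile registered, it can equally be selected through Hypothesis’ own pytest option, --hypothesis-profile=gentoo.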
Summary
I have to say that I really like what we achieved here. Over the years, we learned a lot about pytest, and used that knowledge to improve testing in Gentoo. And after repeating the same patterns for years, we have finally replaced them with eclass functions that can largely work out of the box. This is a major step forward.
April 30 2025
Urgent - OSU Open Source Lab needs your help
Oregon State University’s Open Source Lab (OSL) has been a major supporter of Gentoo Linux and many other software projects for years. It is currently hosting several of our infrastructure servers as well as development machines for exotic architectures, and is critical for Gentoo operation.
Due to drops in sponsor contributions, OSL has been operating at a loss for a while, with the OSU College of Engineering picking up the rest of the bill. Now that university funding has been cut, this is no longer possible, and unless US$250,000 can be provided within the next two weeks, OSL will have to shut down. The details can be found in a blog post by Lance Albertson, the director of OSL.
Please, if you value and use Gentoo Linux or any of the other projects that OSL has been supporting, and if you are in a position to make funds available (or if this is true for the company you work for), contact the address in the blog post. Obviously, long-term corporate sponsorships would serve best here; for what it’s worth, OSL developers have ended up at almost every big US tech corporation by now. Right now, though, probably everything helps.
February 20 2025
Bootable Gentoo QCOW2 disk images - ready for the cloud!
We are very happy to announce new official downloads on our website and our mirrors: Gentoo for amd64 (x86-64) and arm64 (aarch64), as immediately bootable disk images in qemu’s QCOW2 format! The images, updated weekly, include an EFI boot partition and a fully functional Gentoo installation; either with no network activated but a password-less root login on the console (“no root pw”), or with network activated, all accounts initially locked, but cloud-init running on boot (“cloud-init”). Enjoy, and read on for more!
Questions and answers
How can I quickly test the images?
We recommend using the “no root password” images and qemu system emulation. Both the amd64 and arm64 images have all the necessary drivers ready for that. Boot them up, use “root” as the login name, and you will immediately get a fully functional Gentoo shell. The set of installed packages is similar to that of an administration or rescue system, with more focus on the network environment and less on exotic hardware. Of course you can emerge whatever you need though, and binary package sources are already configured too.
What settings do I need for qemu?
You need qemu with the target architecture (aarch64 or x86_64) enabled in QEMU_SOFTMMU_TARGETS, and the UEFI firmware.
app-emulation/qemu sys-firmware/edk2-bin
You should disable the “pin-upstream-blobs” USE flag on qemu and update edk2-bin to at least the 2024 version. Also, since you probably want to use KVM hardware acceleration for the virtualization, make sure that your kernel supports that and that your current user is in the kvm group.
For testing the amd64 (x86-64) images, a command line could look like this, configuring 8G RAM and 4 CPU threads with KVM acceleration:
qemu-system-x86_64 \
-m 8G -smp 4 -cpu host -accel kvm -vga virtio -smbios type=0,uefi=on \
-drive if=pflash,unit=0,readonly=on,file=/usr/share/edk2/OvmfX64/OVMF_CODE_4M.qcow2,format=qcow2 \
-drive file=di-amd64-console.qcow2 &
For testing the arm64 (aarch64) images, a command line could look like this:
qemu-system-aarch64 \
-machine virt -cpu neoverse-v1 -m 8G -smp 4 -device virtio-gpu-pci -device usb-ehci -device usb-kbd \
-drive if=pflash,unit=0,readonly=on,file=/usr/share/edk2/ArmVirtQemu-AARCH64/QEMU_EFI.qcow2 \
-drive file=di-arm64-console.qcow2 &
Please consult the qemu documentation for more details.
Can I install the images onto a real harddisk / SSD?
Sure. Gentoo can do anything. The limitations are:
- you need a disk with a sector size of 512 bytes (otherwise the partition table of the image file will not work); see the “SSZ” value in the following example:
pinacolada ~ # blockdev --report /dev/sdb
RO    RA   SSZ   BSZ   StartSec            Size   Device
rw   256   512  4096          0   4000787030016   /dev/sdb
- your machine must be able to boot via UEFI (no legacy boot)
- you may have to adapt the configuration yourself to disks, hardware, …
So, this is an expert workflow.
Assuming your disk is /dev/sdb and has a size of at least 20 GByte, you can then use the qemu-img utility to decompress the image onto the raw device. Warning: this obviously overwrites the first 20 GByte of /dev/sdb (and with that the existing boot sector and partition table):
qemu-img convert -O raw di-amd64-console.qcow2 /dev/sdb
Afterwards, you can and should extend the new root partition and grow the XFS filesystem on it with xfs_growfs, create an additional swap partition behind it, possibly adapt /etc/fstab and the grub configuration, …
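As an illustration only (the device, partition number and tooling are assumptions; adapt them to your actual layout), growing the root partition and then the filesystem could look like:

```shell
# Illustrative only: assumes the root filesystem is partition 3 on
# /dev/sdb and that cloud-utils (growpart) is installed.
growpart /dev/sdb 3      # grow the partition table entry to fill the disk
mount /dev/sdb3 /mnt
xfs_growfs /mnt          # XFS grows online, addressed via the mount point
```

If you want a swap partition behind the root partition instead, grow the root partition to a fixed size and create the swap partition in the remaining space.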
If you are familiar with partitioning and handling disk images you can for sure imagine more workflow variants; you might find also the qemu-nbd tool interesting.
So what are the cloud-init images good for?
Well, for the cloud. Or more precisely, for any environment where a configuration data source for cloud-init is available. If this is already provided for you, the image should work out of the box. If not, well, you can provide the configuration data manually, but be warned that this is a non-trivial task.
Are you planning to support further architectures?
Eventually yes, in particular (EFI) riscv64 and loongarch64.
Are you planning to support legacy boot?
No, since the placement of the bootloader outside the file system complicates things.
How about disks with 4096 byte sectors?
Well… let’s see how much demand this feature finds. If enough people are interested, we should be able to generate an alternative image with a corresponding partition table.
Why XFS as file system?
It has some features that ext4 is sorely missing (reflinks and copy-on-write), but at the same time is rock-solid and reliable.
February 01 2025
Tinderbox shutdown
Due to the lack of hardware, the Tinderbox (and CI) service is no longer operational.
I would like to take this opportunity to thank all the people who have always seen the Tinderbox as a valuable resource and who have promptly addressed bugs, significantly improving the quality of the packages we have in Portage as well as the user experience.