My gripes with systemd
@@@@@@@@@@@@@@@@@@@@@@

:slug: systemd-gripes
:date: 2023-11-13 21:50:36
:tags: systemd, linux
:category: programming
:keywords: systemd, cgroup, programming, bug
:lang: en
:translation: false
:status: draft

This is mainly me complaining about systemd, in very concrete instances. I
will probably extend this in the future whenever some noteworthy bug appears
again.

I think I could love systemd…
=============================

Now, don't get me wrong. The *idea* behind systemd is a nice one: it is
awesome that we can track processes with cgroups and have fine-grained control
over them. In fact, as of now, my server uses this extensively to limit the
resources of various services and to let me always SSH to it, even when
services try to use all CPU and RAM.
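
For illustration, a minimal sketch of the kind of drop-in I mean – the path
follows the usual drop-in convention, but the limits are made-up example
values – caps what all user sessions together may consume, leaving headroom
for ``sshd``::

    # /etc/systemd/system/user.slice.d/limits.conf (example values)
    [Slice]
    CPUWeight=50
    MemoryMax=80%
    TasksMax=2000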

The idea of everything using sockets, starting asynchronously, and waiting on
the sockets of services that have not started yet also feels like a neat hack
(I do not know the downsides yet; it feels like there are some, but whatever).
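
As a sketch of what this looks like (unit names and paths are made up): the
socket unit is active from early boot, and the service is only started when a
client first connects::

    # foo.socket (hypothetical)
    [Socket]
    ListenStream=/run/foo.sock

    [Install]
    WantedBy=sockets.target

    # foo.service (hypothetical) – started on the first connection
    [Service]
    ExecStart=/usr/bin/foo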

If this worked well, I would probably be happy. This is also the feeling I get
from the uselessd project, and I am sad it died.

… but I don't
=============

The actual systemd implementation, however, has quite a few issues and a
rather steady influx of new ones. (They are being fixed, but they still make
it into official releases, making for a painful experience for bleeding-edge
users.)

Many of my gripes feel like a lack of quality control: either the devs forgot
to consider some use cases, or they did not test thoroughly enough. In a few
cases, the changes seem to have been merged without much thought. That would
be fine for a toy project, but IMHO it should not have become the main
implementation of a critical system component.

I also have the feeling that the project is too big for the devs'
capabilities, so seeing the impact of changes is hard. This can be rephrased
as "the devs are too incompetent to handle a project this big/tangled". [#reach]_

Also, a lot has been written about this by other people:

- TODO
- TODO
- TODO

I do not necessarily agree with everything on the pages linked above, but I am
definitely not alone. However, this page serves to list my own views and
experiences, so I will try not to repeat what has already been said.

Here come the concrete cases:

Disabling the user instance
---------------------------

Applies to version: 247.3-7+deb11u4

A common systemd deployment spawns the user instance automatically on login.
There does not seem to be a documented way of disabling it, but masking
``default.target`` in the user instance seems to do the job. (Another way is
masking the ``user@XXX.service`` in the system instance, if you have the
rights.)
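
Concretely, the two workarounds look like this (the UID in the unit name is an
example)::

    # as the user: mask the user instance's default target
    $ systemctl --user mask default.target

    # or, with root rights: mask the whole user instance for UID 1000
    # systemctl mask user@1000.service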

Firstly, this is undocumented (and might break in the future). Secondly, if
you try to find out the current default target, it throws a lovely bullshit
error::

    $ systemctl --user get-default
    Failed to get default target: Operation not possible due to RF-kill

You know what 'wat' means? The message was transferred over D-Bus, and I am
pretty sure the kernel did not return ERFKILL; this is some internal confusion
of systemd's.

OOM-kill kills the whole session
--------------------------------

Applies to version: TODO! 252? (fixed in 252.4 iirc)

I was so mad at this one.

Running out of memory is quite a common condition for me, because I happen to
use a few programs that leak memory. I am well aware of the risk that the
kernel's OOM killer may kill some vital process, but I also know that pretty
much all the time the leaking program gets shot, so it's not a big deal.

Then a new systemd landed. It started monitoring OOM kills, so that it can
terminate and fail a unit in which one occurred. The idea is probably not to
leave some processes of a service running when its other processes are gone.

That seems reasonable. For services. But the session is *also* a unit. No
special handling.

Yes, *any* runaway process would kill the *entire* session. Or rather, systemd
would, proactively.
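
If I understand the mechanism correctly, this is the behaviour of the
``OOMPolicy=`` service setting, so an untested sketch of a counter-measure
would be a drop-in on the system-side unit of the user instance::

    # /etc/systemd/system/user@.service.d/oom.conf (untested sketch)
    [Service]
    # keep the unit running even if one of its processes gets OOM-killed
    OOMPolicy=continue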

The issue report contained quite a few strong words, and the systemd devs said
such language was unwarranted. I am siding with the reporter: sending SIGKILL
to everything in a session is not sane behaviour, IMHO.

.. TODO: link to github

Hardcoded user.slice in logind's PAM module
-------------------------------------------

Applies to version: TODO

The PAM module, which takes care of creating the cgroup hierarchy, is
hardcoded to put everything under ``user.slice``.

.. TODO: link source code

This effectively prevents a service/monitoring SSH login from being in the
correct place in the hierarchy, i.e. somewhere under ``system.slice``. (Well,
you could skip the PAM module, but then everything ends up under
``sshd.service``.)
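
For reference, skipping the module means dropping (or commenting out) a line
like the following from the SSH daemon's PAM stack; the exact file and
spelling vary by distribution::

    # /etc/pam.d/sshd (excerpt)
    session    optional    pam_systemd.so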

In my use case, I wanted to allocate a tiny amount of resources to the
monitoring (which logs in as a system account over SSH) and leave the rest
available to the users, so that I can monitor the machine even when the users
use up all available memory or forkbomb it. Can't do.

Generators
----------

Applies to version: TODO

Systemd wants almost everything to be a unit. Unfortunately, not everything
natively is one, so systemd needs generators to create the units ad hoc.
However, generators run very early, when many of systemd's facilities are not
available yet, which makes for a very hacky feel. Namely: there is no logging,
so generators need to log to the kernel's ring buffer, and they *cannot* be
disabled in any way.

Quite naturally, generators form another layer in the stack and as such, they
can introduce bugs.

For me, the pain point came when the fstab generator (which creates ``.mount``
units) did not parse the ``noauto`` or ``_netdev`` flag correctly, hanging
every single boot of my machine for 90 seconds, waiting for a drive that I
expected not to be available.
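
The kind of entry involved looks roughly like this (the device and mount point
are made up); with ``noauto``, boot should not try to mount, or wait for, the
device at all::

    # /etc/fstab (hypothetical entry)
    //nas/backup  /mnt/backup  cifs  noauto,_netdev  0  0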

When I learned the nature of the problem, I pulled systemd out of that system.
(I was already aware of many other deficiencies of systemd at the time, this
sounded like fun, and surprisingly the first random attempts at removing
systemd worked well enough, so I did not revert.)

Spooky action at a distance w.r.t. logging from user instance
-------------------------------------------------------------

Applies to version: TODO

Services have logs, and logs are to be read. User services also have logs, but
systemd *sometimes* fails at the reading part. Namely, this

.. TODO: expand this section:
   - assert and broken login
   - D-bus and weird interactions
   - various quality of modules
   - generator and fstab

Can systemd fix this?
=====================

I think it might be possible, since most of these issues stem from not caring
very much about various use cases. An end-to-end test suite covering the
different use cases could therefore help avoid many of these problems.

I have no idea how many people work on systemd and what the community looks
like, so I don't know whether such a test suite or other remediations would be
viable.

Do I use systemd?
=================

Tough question. I use several machines at different stages of systemd
eviction:

- As said, my server uses systemd quite extensively. Each service has its own
  user, whose user instance runs the service; the system instance manages
  resource limits on the user instances (``user-XXXX.slice`` from the system
  instance's POV – see the sketch after this list). A few services for which I
  wanted the logs easily accessible run under s6.
- My desktop is the complete opposite: the system is pretty stable, so it is
  only started with a trivial shell script (without supervision or service
  management). The user's services run under s6-rc. [#stability]_ There is no
  systemd anywhere.
- My laptops and other computers usually run a system-wide systemd, but often
  with a heavily tweaked user instance – either masked altogether, or with a
  non-default setup of the slices and services, or at least with some services
  masked. And I am very much not afraid to kill the user systemd on the
  slightest issue, because I have already debugged the s6-rc solution.
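
Such a per-user limit can be expressed as a drop-in on the user's slice; a
minimal sketch, with a made-up UID and made-up limits::

    # /etc/systemd/system/user-1234.slice.d/limits.conf (hypothetical)
    [Slice]
    MemoryMax=512M
    CPUQuota=50%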

The main reason I use systemd at all is that I want to use Arch's pre-built
packages (esp. on low-end hardware) and have them work without rebuilding. (I
am considering setting up a build server to build packages for my other
machines, but it is rather low priority and thus has not happened yet.)

And systemd somehow manages to only suck sometimes :-)

-------

.. [#reach] This is a rather bold statement. If you disagree even after
   reading *the whole article*, please reach me at
   `<mailto:systemd-gripes@pokemon.ledoian.cz>`__; I would like to know your
   POV.

.. [#stability] The reason I need a service manager at all is Linux's sound
   system and its many quirks. I found out that when PipeWire (or one of its
   companions) crashes, it sometimes brings down other services (MPD), which
   is unpleasant. On the other hand, the OpenSSH daemon does not crash, like,
   ever, and IP configuration is a oneshot, so the system part needs no
   supervision.