Fixing DNS issues on the Arch cloud image on Digital Ocean
The issue
When running an Arch cloud image on Digital Ocean, DNS lookups frequently fail or take an unreasonably long time to resolve.
Look at this insanely long query time:
And this query just fails:
System updates are also slow because the mirrors take a long time to resolve, and Docker images don’t build correctly. I run linuxserver.io’s SWAG with the imagemagick mod, and this mod in particular was failing to download and get added to the image.
Welp! No imagemagick in the container means my clients won’t have imagick on their WordPress servers.
Troubleshooting
When I first started to debug this, I asked on #archlinux at Libera, thinking it was a bug in the cloud image itself. It isn’t. I was told to disable DNSSEC, which technically solves the problem, but is not a viable solution as we’re lowering security.
Arch uses systemd-resolved by default for network name resolution. Running resolvectl shows us:
It looks like Digital Ocean’s own DNS servers are being added to the list of DNS servers.
The next step was to search the OS for any files containing those IP addresses, to find out which config file sets them and what puts them there.
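A search like that can be sketched as a small shell helper. This is only an illustration: `find_files_with_ip` is a hypothetical name, and `192.0.2.53` in the usage comment is a documentation-range placeholder, not one of the actual DNS server addresses.

```shell
# Hypothetical helper: list every file under a directory that mentions an IP.
# -r recurses, -l prints only file names, -F treats the IP as a fixed string
# (so the dots are not regex wildcards). Unreadable files are skipped quietly.
find_files_with_ip() {
    ip="$1"
    root="$2"
    grep -rlF -- "$ip" "$root" 2>/dev/null
}

# Example call (placeholder address; substitute one reported by resolvectl):
#   find_files_with_ip 192.0.2.53 /etc
```

Searching `/etc` and `/var/lib` separately is usually quicker than searching from `/`, which would also walk `/proc` and `/sys`.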
What in the world? What’s cloud-init?
Turns out it’s a package that contains utilities for early initialization of cloud instances, and is needed in Arch Linux images that are built with the intention of being launched in cloud environments.
Okay, so there must be a way to have it not inject Digital Ocean’s DNS IPs, right?
The cloud-init docs say manage_resolv_conf must be set to true in order for a resolv_conf section to be applied.
Alright, let’s check the cloud-init config file.
No resolv_conf section in here. And sure enough, adding manage_resolv_conf: false does nothing either.
We saw the file /var/lib/cloud/instances/385249616/boothooks/resolver-fix in the search log earlier. Let’s see what it does.
JFC. It adds the DNS IPs as part of the cloud-init process at boot.
Searching for this on DuckDuckGo led nowhere. So I *shudder* searched on Google, which surprisingly yielded a Chinese blog post with what seemed like a solution. The author is impressed by the Digital Ocean Customer Support Staff’s confusion and inability to solve two support tickets about this.
The blog post suggests creating a systemd service file to remove the offending config file before and whenever systemd-resolved.service is started, stopped, or reloaded.
Solution
Let’s create a file /etc/systemd/system/remove-systemd-resolved-conf-d.service.
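As a rough sketch of what such a unit could contain, here is a shell helper that writes one out. The helper name `write_cleanup_unit` is hypothetical, and the path of the offending file is deliberately left as a parameter: take the real path from the resolver-fix boothook on your own droplet rather than from this example. The `Before=` and `PartOf=` lines are one way to tie the cleanup to systemd-resolved.service, not necessarily the exact unit from the blog post.

```shell
# Sketch only: writes a oneshot unit that deletes one config file.
#   $1 = file to delete (assumption: look it up in the boothook)
#   $2 = directory to write the unit into (normally /etc/systemd/system)
write_cleanup_unit() {
    offending_conf="$1"
    unit_dir="$2"
    cat > "$unit_dir/remove-systemd-resolved-conf-d.service" <<EOF
[Unit]
Description=Remove provider-injected systemd-resolved config
# Order the cleanup ahead of systemd-resolved and follow its stop/restart.
Before=systemd-resolved.service
PartOf=systemd-resolved.service

[Service]
# oneshot without RemainAfterExit: the unit goes inactive after it runs,
# so it can be started again the next time systemd-resolved starts.
Type=oneshot
ExecStart=/usr/bin/rm -f $offending_conf
EOF
}

# Example call (placeholder path, not the real one):
#   write_cleanup_unit /etc/example/placeholder.conf /etc/systemd/system
```

After writing or editing a unit file, `systemctl daemon-reload` makes systemd pick up the change.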
Then make systemd-resolved.service want it.
Now run resolvectl to check.
Okay, good. The offending DNS server IPs are gone from the Global section, and it stays this way even after a reboot.
The two network interfaces are still using those IPs though, and DNS resolution isn’t any better than before.
There are no other relevant files with these IPs in them, so this time they must be coming in dynamically.
Let’s look in /etc/systemd/network/ to see if these interfaces are defined there.
Yep!
Let’s see what’s inside them.
They’re using DHCP, so the DHCP server must be providing the DNS addresses.
This is an easy enough fix.
Just add UseDNS=no in both files.
They should look like this now:
Reload your network, and your DNS resolution woes should be yesterday’s news.
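The UseDNS=no edit can be sketched as a small shell helper. It assumes the setting goes in a `[DHCPv4]` section (older systemd releases used `[DHCP]` instead), and `disable_dhcp_dns` is a hypothetical name. If a file already has that section, it is cleaner to add the line there by hand.

```shell
# Hypothetical helper: append UseDNS=no to a systemd-networkd .network file.
# Does nothing if the setting is already present, so it is safe to re-run.
disable_dhcp_dns() {
    file="$1"
    if grep -q '^UseDNS=no$' "$file"; then
        return 0
    fi
    # Assumption: [DHCPv4] is the right section for this systemd version.
    printf '\n[DHCPv4]\nUseDNS=no\n' >> "$file"
}

# Example: apply it to every .network file in the directory.
#   for f in /etc/systemd/network/*.network; do disable_dhcp_dns "$f"; done
```

Afterwards, `networkctl reload` or `systemctl restart systemd-networkd` should apply the change.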
Let’s check resolvectl.
Awesome!
The linuxserver mod downloads and builds successfully now, and resolving domains takes milliseconds.