Nine times out of ten, when a client's browser is hanging on "Resolving host," it's a DNS problem – and it's fixable in under 10 minutes once you know where to look.

The standard advice out there tells you to switch to Cloudflare, flush your cache, and call it a day. That works for home users. For MSPs managing dozens of client environments across Windows servers, Linux endpoints, containers, and mixed AD domains, the root causes are usually messier – and the fixes require more precision.

This guide skips the basics that every other article covers and gets to the causes that actually bite MSPs in production.

Diagnose Before You Fix

Changing resolvers without measuring first is guesswork. A slow DNS lookup could be a 200ms resolver, a 5-second IPv6 timeout, or a 15-second ndots misconfiguration. They look identical from the browser. They're not the same fix.

Measure your actual lookup time first:

bash
# Linux / macOS
time dig google.com

# Windows (PowerShell)
Measure-Command { Resolve-DnsName google.com }

# Test a specific resolver directly
time dig @1.1.1.1 google.com
time dig @8.8.8.8 google.com

What you're looking for:

  • Under 50ms: DNS is probably not your bottleneck
  • 50–150ms: Suboptimal resolver, likely fixable
  • 150–500ms: Congested or geographically distant resolver
  • 500ms+: Something is broken – a timeout, a misconfiguration, or a conflict

If dig @1.1.1.1 google.com is fast but a plain dig google.com against your system DNS is slow, your configured resolver is the problem. If everything is slow including direct resolver queries, you're dealing with network-level latency or packet loss. If dig is fast but the browser is slow, you've got a client-side caching or adapter conflict.
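The thresholds above can be wrapped in a small helper for triage scripts. This is a minimal sketch rather than a standard tool: the function names classify_dns_ms and dig_query_ms and the bucket labels are made up for illustration, and the boundary handling (150 falls in the third bucket, 500 counts as broken) is an assumption, since the ranges above overlap at their edges.

```shell
# classify_dns_ms MS: map a lookup time in milliseconds to a triage bucket.
# Hypothetical helper; bucket names are illustrative.
classify_dns_ms() {
  if [ "$1" -lt 50 ]; then
    echo "ok"
  elif [ "$1" -lt 150 ]; then
    echo "suboptimal-resolver"
  elif [ "$1" -lt 500 ]; then
    echo "congested-or-distant"
  else
    echo "broken"
  fi
}

# dig_query_ms NAME [SERVER]: print the query time that dig reports, in ms.
# With no SERVER, dig uses the system's configured resolver.
dig_query_ms() {
  if [ -n "${2:-}" ]; then
    dig "@$2" "$1"
  else
    dig "$1"
  fi | awk '/Query time:/ { print $4 }'
}
```

For example, classify_dns_ms "$(dig_query_ms google.com)" checks the system resolver and classify_dns_ms "$(dig_query_ms google.com 1.1.1.1)" checks Cloudflare directly. If dig times out it prints no query time, so treat an empty result as the 500ms+ case. A repeat query is usually served from cache, so test with a name that has not been looked up recently.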

Fix 1: Switch to a Faster Resolver

ISP-provided resolvers are frequently the slowest option on the network. The usual reasons are overloaded infrastructure, poor anycast coverage, and no active optimization for query time.

Widely used public resolvers, roughly in order of typical response time (results vary by region, so measure from each site):

  • Cloudflare 1.1.1.1 – often under 15ms in public benchmarks, supports DoH and DoT
  • Google 8.8.8.8 – typically around 20ms, excellent uptime, widely cached
  • Quad9 9.9.9.9 – comparable speed, adds threat intelligence filtering
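Because the ranking shifts by location, it can be worth picking the resolver from measurements taken at the client site. A minimal sketch, assuming input lines of the form "RESOLVER MILLISECONDS"; the helper name fastest_resolver is hypothetical.

```shell
# fastest_resolver: read "RESOLVER MILLISECONDS" lines on stdin and
# print the resolver with the lowest time. Hypothetical helper name.
fastest_resolver() {
  sort -k2,2n | head -n 1 | awk '{ print $1 }'
}

# Example of producing the input (needs dig and network access):
#   for r in 1.1.1.1 8.8.8.8 9.9.9.9; do
#     echo "$r $(dig "@$r" example.com | awk '/Query time:/ { print $4 }')"
#   done | fastest_resolver
```

A single sample per resolver is noisy. Repeating the loop a few times and comparing the results gives a more reliable pick.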

For MSP deployments, the right place to set this is on the router or DHCP server, not on individual endpoints. One change propagates across all client devices.

Windows DNS client (per machine):

cmd
netsh interface ip set dns "Ethernet" static 1.1.1.1
netsh interface ip add dns "Ethernet" 8.8.8.8 index=2

On a pfSense/OPNsense router: System > General Setup > DNS Servers. Set 1.1.1.1 and 8.8.8.8. Disable "Allow DNS server list to be overridden by DHCP/PPP on WAN."

On UniFi: Networks > [Your Network] > DHCP Name Server > Manual > 1.1.1.1, 8.8.8.8.

Fix 2: The IPv6 AAAA Timeout

This one trips up MSPs constantly and barely gets mentioned in mainstream guides.

Modern operating systems – and glibc on Linux – send A (IPv4) and AAAA (IPv6) queries in parallel by default. If the DNS server doesn't respond to the AAAA query, the resolver waits for a timeout before falling back. That timeout is typically 5 seconds. So instead of a 20ms lookup, you get a 5000ms lookup. The browser shows "Resolving host" and everyone assumes the internet is broken.

This happens when:

  • The DNS server can't handle parallel A/AAAA requests
  • IPv6 is enabled on the interface but not actually working
  • A security appliance silently drops AAAA queries instead of returning an empty (NODATA) answer

Diagnose it:

bash
time dig AAAA google.com

If that hangs for 5+ seconds or times out, you've found it.

Fix on Linux (resolv.conf):

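conf
options single-request-reopen

This tells glibc to close the socket and open a new one before sending the second query when the A and AAAA requests sent from the same port are not both handled correctly. That covers appliances that answer only one of the two. If the server can't handle parallel queries at all, use options single-request instead, which makes glibc send A and AAAA sequentially. It's slower overall, but eliminates the 5-second timeout entirely.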
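Before editing client machines, it helps to know whether the option is already present. A minimal sketch that scans the options lines of a resolv.conf-style file; the helper name has_resolv_option is hypothetical and not part of any standard tool.

```shell
# has_resolv_option FILE OPTION: succeed if an "options" line in FILE
# lists OPTION as an exact token. Hypothetical helper name.
has_resolv_option() {
  awk -v opt="$2" '
    $1 == "options" { for (i = 2; i <= NF; i++) if ($i == opt) found = 1 }
    END { exit !found }
  ' "$1"
}

# Example:
#   has_resolv_option /etc/resolv.conf single-request-reopen && echo "already set"
```

On hosts where /etc/resolv.conf is generated by another service (systemd-resolved, NetworkManager, a DHCP client), manual edits can be overwritten, so the option may need to be set through that service's own configuration.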

A cleaner long-term fix is to ensure IPv6 is either properly working or properly disabled on the interface:

bash
# Check if IPv6 is actually routing
ping6 -c2 2606:4700:4700::1111

# If IPv6 isn't in use, prefer IPv4 addresses in getaddrinfo() results.
# This only reorders answers; it does not stop AAAA queries from being sent.
# Add to /etc/gai.conf:
# precedence ::ffff:0:0/96 100

On Windows: If clients are on an IPv4-only network, disable IPv6 on the adapter. Network Adapter Properties > uncheck "Internet Protocol Version 6 (TCP/IPv6)." Not ideal long-term, but eliminates the fallback penalty immediately.

Fix 3: ndots Misconfiguration on Linux and Containers

This one is almost never mentioned outside of Kubernetes circles, but it affects any Linux environment with multiple DNS search domains configured.

The ndots option in /etc/resolv.conf controls how many dots a name needs before the resolver tries it as an absolute name first, ahead of the search list. The glibc default is 1, but Kubernetes sets ndots:5 in pod resolv.conf files, and the same value turns up in hand-rolled configs. With ndots:5, a query for api.example.com (two dots) gets tried with every configured search domain appended before the resolver finally tries the actual domain.

If you have three search domains configured, that's three failed lookups before you get the right answer. On an internal network, each failed lookup can take 500ms–2s.
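To see the effect for a given setup, the lookup order can be simulated without touching the network. This is a simplified sketch of the glibc-style search logic, not the resolver's actual code; the function name query_order is hypothetical, and a real resolver stops at the first name that returns an answer.

```shell
# query_order NAME NDOTS [SEARCH_DOMAIN...]: print the names a
# glibc-style resolver would try, in order. Hypothetical helper.
# The ( ) body runs in a subshell, so its variables stay local.
query_order() (
  name=$1
  ndots=$2
  shift 2
  case $name in
    *.) echo "$name"; exit 0 ;;   # trailing dot: absolute name, search list skipped
  esac
  # Count the dots in the name.
  dots=$(($(printf '%s' "$name" | tr -cd '.' | wc -c)))
  # Enough dots: the name itself is tried first.
  if [ "$dots" -ge "$ndots" ]; then echo "$name."; fi
  for domain in "$@"; do echo "$name.$domain."; done
  # Too few dots: the name itself is tried last.
  if [ "$dots" -lt "$ndots" ]; then echo "$name."; fi
)
```

Running query_order api.example.com 5 corp.client.local internal.client.local ad.client.local prints the three suffixed names first and api.example.com. last. With ndots set to 2, the real name moves to the front of the list.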

Check your current resolv.conf:

bash
cat /etc/resolv.conf

If you see something like:

conf
search corp.client.local internal.client.local ad.client.local
options ndots:5

...and your clients are complaining about slow internet, this is likely part of the problem.

Fix: Lower ndots to 2 or 1 for most environments. FQDNs (which include a trailing dot) bypass the ndots logic entirely.

options ndots:2

For containers and VDI deployments, ndots:1 is often the right value. External domains get resolved immediately without suffix appending.

Fix 4: systemd-resolved Conflicts with dnsmasq or avahi

On modern Ubuntu and Fedora, systemd-resolved handles DNS by default; other distros, including Debian and RHEL, can run it but generally don't enable it out of the box. It's a caching stub resolver that listens on 127.0.0.53. Problems happen when other services – dnsmasq, avahi-daemon, or a VPN client – also try to own port 53 (or, in avahi's case, the mDNS port 5353).

Symptoms:

  • DNS works, then stops, then works again
  • First lookup takes 15–30 seconds; subsequent ones are fast
  • systemctl status systemd-resolved shows errors or repeated restarts

Diagnose:

bash
sudo ss -tulnp | grep -E ':(53|5353)[[:space:]]'
resolvectl status

If you see more than one process competing for port 53 (or for 5353, in the mDNS case), you've got a conflict.
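When checking many hosts, it is easier to reduce the ss output to a list of process names. A minimal sketch, assuming the users:(("name",pid=...)) process column that ss prints with -p when run as root; the helper name port53_owners is hypothetical.

```shell
# port53_owners: read `ss -tulnp` output on stdin and print each distinct
# process name bound to port 53. Hypothetical helper name.
port53_owners() {
  grep -E ':53[[:space:]]' |
    sed -n 's/.*users:(("\([^"]*\)".*/\1/p' |
    sort -u
}

# Example (needs root so that ss can show process names):
#   sudo ss -tulnp | port53_owners
```

More than one name in the output is worth a closer look. It is not automatically a conflict, since two services can bind port 53 on different addresses.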

Fix for avahi conflict (common on desktop distros):

bash
# In /etc/systemd/resolved.conf:
MulticastDNS=no

# In /etc/nsswitch.conf, replace:
# hosts: files mdns4_minimal [NOTFOUND=return] resolve [!UNAVAIL=return] dns
# with:
# hosts: files mdns_minimal [NOTFOUND=return] resolve [!UNAVAIL=return] dns

sudo systemctl restart systemd-resolved

If systemd-resolved itself is the problem (cache stale, service stuck):

bash
sudo resolvectl flush-caches
sudo systemctl restart systemd-resolved

If the service keeps hanging after restarts, consider switching to a dedicated caching resolver like Unbound – covered in Fix 6.

Fix 5: Windows DNS Client Service and Ghost Adapters

On Windows, two things cause slow DNS that go unnoticed until someone actually checks:

The DNS Client service not running: If the Windows DNS Client service is stopped or in a degraded state, every lookup goes direct to the DNS server with no caching. Lookups that should be instant take 100–300ms.

powershell
Get-Service -Name Dnscache
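# If not running:
Start-Service -Name Dnscache
Set-Service -Name Dnscache -StartupType Automatic
# On recent Windows 10/11 builds this service is protected, so these
# commands may fail with "access denied" even from an elevated prompt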

Ghost network adapters: Old VPN clients, VMware, VirtualBox, and Hyper-V all create virtual network adapters. When these aren't properly removed, Windows queries them for DNS along with your active adapter. The system waits for responses that never come.

Check for inactive adapters:

powershell
# Show ALL adapters including hidden ones
Get-PnpDevice -Class Net | Where-Object {$_.Status -ne "OK"}

To remove them properly: Device Manager > View > Show hidden devices. Expand Network Adapters. Remove anything greyed out that isn't in active use.

Flush DNS cache on Windows:

cmd
ipconfig /flushdns
ipconfig /registerdns

Also flush the Chrome internal cache separately if browser slowness is the complaint: chrome://net-internals/#dns > Clear host cache.

Fix 6: Deploy a Local Caching Resolver at Client Sites

Switching the resolver to Cloudflare or Google is a quick win. A local caching resolver is the right long-term answer for any site with more than 10 devices.

Unbound is the open-source recursive resolver MSPs should know. It runs on anything – a Raspberry Pi, a spare VM, a firewall appliance. Deployed as the site's primary DNS, it:

  • Caches responses locally (subsequent queries return in < 1ms)
  • Performs recursive resolution directly, bypassing overloaded ISP resolvers
  • Supports DNSSEC validation out of the box
  • Has no licensing cost

Basic Unbound setup on Ubuntu:

bash
sudo apt install unbound

/etc/unbound/unbound.conf:

conf
server:
  interface: 0.0.0.0
  access-control: 192.168.0.0/24 allow
  cache-max-ttl: 86400
  cache-min-ttl: 300
  prefetch: yes
  prefetch-key: yes
  num-threads: 2
  so-reuseport: yes
  hide-identity: yes
  hide-version: yes
  use-caps-for-id: yes

The prefetch: yes option is particularly useful – it refreshes popular cache entries before they expire, so clients almost never see a cold lookup.

Pi-hole pairs well with Unbound if you also want DNS-level ad/malware blocking for clients. The combination gives you: network-wide blocking + local recursive caching + a clean dashboard for per-device DNS activity. Useful data for quarterly business reviews with clients.

For the full Pi-hole + Unbound setup, we have a detailed walkthrough from members who've deployed this across hundreds of client sites. Real configs, real gotchas, not a marketing blog.

If you're building out a fully open-source monitoring stack, OpenMSP's directory covers 97 open-source MSP tools across 19 categories.

Fix 7: Active Directory DNS Issues

For MSPs working with Windows domains – which is most of them – AD DNS is its own category of problems.

Slow external name resolution: If clients are pointed directly at the domain controller for DNS, and that DC is handling external forwarding through a slow ISP resolver, every external lookup takes the long route. Fix: configure the DC's forwarders to use Cloudflare or Google for external domains. (Conditional forwarders are a different feature: they route queries for one specific domain to a specific server.)

In DNS Manager: Right-click the server node > Properties > Forwarders. Add 1.1.1.1 and 8.8.8.8.

Split-horizon DNS not resolving correctly: If internal and external zones share the same name (e.g., company.com), clients must hit the internal DNS server for internal resolution. If they're getting the external resolver for internal queries, you'll see intermittent "host not found" errors for internal services.

Verify which DNS server is being used:

cmd
ipconfig /all
nslookup internalserver.company.com

If the response comes from an external resolver, your DHCP is handing out the wrong DNS server. Fix the DHCP scope options on the DC or on your router.
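The same check can be scripted on Linux or macOS endpoints in the domain. This is a rough sketch under one stated assumption: the internal DNS servers sit on RFC 1918 private addresses, so a public answering server signals a problem. Both helper names are hypothetical.

```shell
# nslookup_server: read nslookup output on stdin and print the address of
# the DNS server that answered (the first "Address" line).
nslookup_server() {
  awk '/^Address/ { sub(/#.*/, "", $2); print $2; exit }'
}

# is_private_ipv4 ADDR: succeed if ADDR is in an RFC 1918 private range.
is_private_ipv4() {
  case $1 in
    10.*|192.168.*|172.1[6-9].*|172.2[0-9].*|172.3[01].*) return 0 ;;
    *) return 1 ;;
  esac
}

# Example:
#   server=$(nslookup internalserver.company.com | nslookup_server)
#   is_private_ipv4 "$server" || echo "answered by external resolver: $server"
```

The sub() call strips the #53 port suffix that nslookup appends on Linux; Windows output has no suffix and passes through unchanged.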

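DNS scavenging not configured: Over time, AD DNS accumulates stale A records that can cause incorrect resolutions. Enable scavenging on your zones: DNS Manager > Right-click zone > Properties > General > Aging. Set No-refresh interval to 7 days, Refresh interval to 7 days. Then enable automatic scavenging on at least one DNS server (server Properties > Advanced > Enable automatic scavenging of stale records); without that, aged records are never actually removed.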

Fix 8: Authoritative DNS – TTL and CNAME Chains

If client-side DNS is fine but the sites your clients run are loading slowly for their end users, the issue might be on the authoritative side.

TTL values that are too low: Some hosting control panels default to 60-second TTLs. That means every recursive resolver serving your visitors has to re-query your nameservers as often as once a minute. For a busy site, this hammers your nameservers, and far more visitors get a live lookup instead of a cached one.

Check your TTLs:

bash
# The TTL is the second column of each answer line
dig +noall +answer example.com A
dig +noall +answer example.com AAAA

Set records to 3600 seconds (1 hour) for stable infrastructure. 300 seconds during migrations is fine temporarily, but increase it again afterwards.
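Checking many records by eye gets tedious, so the TTL column can be filtered automatically. A minimal sketch, assuming input in the format printed by dig +noall +answer (name, TTL, class, type, data); the helper name low_ttl_records is hypothetical.

```shell
# low_ttl_records MIN: read `dig +noall +answer` output on stdin and print
# "name type ttl" for every record whose TTL is below MIN seconds.
# Hypothetical helper name.
low_ttl_records() {
  awk -v min="$1" '$2 ~ /^[0-9]+$/ && $2 + 0 < min + 0 { print $1, $4, $2 }'
}

# Example (needs dig and network access):
#   dig +noall +answer example.com A | low_ttl_records 300
```

A recursive resolver reports the time remaining in its cache, not the configured value. To see the TTL as published, point dig at one of the zone's authoritative nameservers.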

CNAME chains: Every CNAME adds a lookup. www → cdn.provider.com → anycast.provider.net → A record is three lookups before you get an IP. Flatten chains where possible. Most modern CDN providers support CNAME flattening (sometimes called ALIAS or ANAME records) at the zone apex.
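Chain length is easy to measure from the answer section, since dig lists every CNAME it followed. A minimal sketch, again assuming dig +noall +answer formatted input; the helper name cname_hops is hypothetical.

```shell
# cname_hops: read `dig +noall +answer` output on stdin and print how many
# CNAME records appear in the answer. Hypothetical helper name.
cname_hops() {
  awk '$4 == "CNAME" { n++ } END { print n + 0 }'
}

# Example (needs dig and network access):
#   dig +noall +answer www.example.com A | cname_hops
```

A result of 2 corresponds to the three-lookup chain described above: two CNAME hops plus the final A record.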

Monitoring DNS at Scale

One-off fixes aren't enough when you're managing 30+ client environments. You need to know when DNS degrades before a client tickets you.

Zabbix (open-source) has built-in DNS check items in its agent. Set up a check against each client site's DNS resolver:

text
net.dns.record[,google.com,A,3,1]

Alert if queries start failing. This item returns the record data rather than a timing, so alerting on response times above 500ms needs a separate timed check.

Prometheus + Blackbox Exporter can probe DNS endpoints and expose response times as metrics. Combine with Grafana for visibility across all client sites in one dashboard.

Both tools are free. Both integrate with TacticalRMM for MSPs already using it as their RMM platform. The OpenMSP community has shared monitoring configs for exactly this setup – contributed by MSPs who built and tested them in production.

Quick Reference: What to Check First

Symptom | Most likely cause | First fix
All sites slow, first visit only | No local DNS caching | Deploy Unbound, or restart DNS Client service
5-second hangs on specific sites | IPv6 AAAA timeout | Add single-request-reopen to resolv.conf
Slow on Linux, fast on Windows | ndots misconfiguration | Set options ndots:2 in resolv.conf
Intermittent, gets worse over time | systemd-resolved stuck | resolvectl flush-caches, restart service
Slow only on some Windows machines | Ghost network adapters | Remove via Device Manager > Show hidden
Internal names don't resolve | AD DNS / split-horizon | Check DHCP DNS options, verify DC forwarders
External sites slow from all endpoints | ISP resolver overhead | Switch to 1.1.1.1 at router level
Client's site slow for end users | Low TTL or long CNAME chain | Increase TTL, flatten CNAME

DNS issues at MSP scale aren't solved by switching to Cloudflare and calling it done. The real problems are buried in resolv.conf options, IPv6 stacks that half-work, and Windows adapters that never got cleaned up. Fix those, and you stop chasing the same tickets every three months.

If you want to compare notes on open-source DNS tooling with other MSPs who've already done this work, join the OpenMSP community. It's where MSP operators share real configs – not vendor playbooks 🦩

Kristina Shkriabina

Contributing author to the OpenMSP Platform