How does DNS work inside a Kubernetes cluster, and how do applications discover each other?

Hello. As a Senior Kubernetes platform engineer, I will break down the mechanics of in-cluster DNS. This is often the single most critical component for application stability; if DNS fails, your microservices cannot find each other, effectively bringing down the workload even if the Pods themselves are healthy.

Kubernetes utilizes an internal DNS system (standardized on CoreDNS in modern clusters) to allow applications to discover Services and Pods via human-readable names rather than ephemeral IP addresses.

1. The DNS Server: CoreDNS

In a standard cluster, DNS is provided by an add-on called CoreDNS, which typically runs as a Deployment named coredns in the kube-system namespace.

  • Service Exposure: The CoreDNS Pods are exposed via a Service, usually named kube-dns. This Service is assigned a static ClusterIP (often the 10th IP in the service CIDR range, e.g., 10.96.0.10).
  • The Kubernetes Plugin: CoreDNS is modular. It uses a specific plugin called kubernetes that watches the Kubernetes API. When you create a Service or a Pod, CoreDNS observes this event and automatically creates the corresponding DNS records.
  • Configuration: The server's behavior is defined in a ConfigMap named coredns. You can modify this to configure upstream forwarding, caching, or stub domains (a sketch follows below).
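
For orientation, here is a sketch of what that ConfigMap typically contains in a kubeadm-provisioned cluster; the exact plugin list and cluster domain vary between distributions, so treat this as illustrative rather than canonical:

yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: coredns
  namespace: kube-system
data:
  Corefile: |
    .:53 {
        errors
        health
        ready
        # Serve cluster.local (and reverse-lookup) records from the Kubernetes API
        kubernetes cluster.local in-addr.arpa ip6.arpa {
            pods insecure
            fallthrough in-addr.arpa ip6.arpa
            ttl 30
        }
        # Anything that is not a cluster name goes to the node's upstream resolvers
        forward . /etc/resolv.conf
        cache 30
        loop
        reload
        loadbalance
    }

Changing the forward line (or adding extra server blocks) is how upstream forwarding and stub domains are configured.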

2. The Client Side: How Pods are Configured

When a Pod is scheduled, the kubelet on that node is responsible for configuring DNS for its containers. It does this by populating the /etc/resolv.conf file inside each container.

A standard Pod's /etc/resolv.conf will look similar to this:

text
nameserver 10.96.0.10
search default.svc.cluster.local svc.cluster.local cluster.local
options ndots:5

  • nameserver: Points to the ClusterIP of the kube-dns Service.
  • search: This list allows you to use short names. If a Pod in the default namespace tries to reach a service named my-db, the DNS resolver will automatically expand it to my-db.default.svc.cluster.local.
  • options ndots:5: This instructs the resolver to append the search suffixes to any query containing fewer than 5 dots. This ensures internal cluster names are resolved first before reaching out to the public internet.
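
A side effect of ndots:5 is that external names with few dots (e.g., api.example.com) are first tried against every search suffix, producing extra lookups. If the defaults do not fit a workload, the resolver settings can be tuned per Pod via dnsConfig; a minimal sketch with hypothetical names and values:

yaml
apiVersion: v1
kind: Pod
metadata:
  name: dns-tuned                        # hypothetical Pod name
spec:
  containers:
    - name: app
      image: registry.k8s.io/pause:3.9   # placeholder image
  dnsPolicy: ClusterFirst                # keep normal in-cluster resolution
  dnsConfig:
    options:
      - name: ndots
        value: "2"                       # only expand search suffixes for very short names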

3. Service Discovery Mechanisms

Applications discover each other primarily through DNS records, though environment variables exist as a legacy fallback.

A. Standard Services (ClusterIP)

For standard Services, CoreDNS creates an A record (or AAAA for IPv6) that resolves the Service name to the Service's ClusterIP (Virtual IP).

  • Format: my-svc.my-namespace.svc.cluster-domain.example, where the cluster domain is typically cluster.local.
  • Traffic Flow: The client resolves the DNS to the Virtual IP. The node's network layer (kube-proxy via iptables/IPVS) intercepts traffic to that VIP and load balances it to one of the backing Pods.
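
As an illustration (the names are hypothetical), the Service below would be resolvable as my-db.default.svc.cluster.local from anywhere in the cluster, and simply as my-db from Pods in the default namespace:

yaml
apiVersion: v1
kind: Service
metadata:
  name: my-db
  namespace: default
spec:
  selector:
    app: my-db          # Pods labelled app=my-db back this Service
  ports:
    - port: 5432        # port the DNS name answers on (via the ClusterIP)
      targetPort: 5432  # container port kube-proxy forwards to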

B. Headless Services (Direct Pod Access)

If you set .spec.clusterIP: None in a Service definition, it becomes Headless. No ClusterIP is allocated, so CoreDNS does not return a single virtual IP for the Service. Instead, it generates multiple A records, one for the IP of each ready Pod backing the Service.

  • Use Case: This is critical for stateful applications (like databases or ZooKeeper) where a client needs to connect to a specific instance (e.g., "connect to the primary writer") rather than a random load-balanced instance.
  • StatefulSets: When combined with StatefulSets, this provides stable network identities (e.g., web-0.nginx.default.svc.cluster.local) that persist even if the Pod is rescheduled to a new node.
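
A minimal sketch of that pairing, reusing the web/nginx names from the example above:

yaml
apiVersion: v1
kind: Service
metadata:
  name: nginx
spec:
  clusterIP: None          # headless: DNS returns the Pod IPs directly
  selector:
    app: nginx
  ports:
    - port: 80
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: web
spec:
  serviceName: nginx       # governs the per-Pod DNS names
  replicas: 2
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
        - name: nginx
          image: nginx:1.25

Each replica then gets its own stable record (web-0.nginx.default.svc.cluster.local, web-1.nginx.default.svc.cluster.local, and so on), regardless of which node it is running on.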

C. Environment Variables (Legacy)

When a Pod starts, the kubelet injects environment variables for every Service that was active in the Pod's namespace at the time the Pod was created.

  • Format: {SVCNAME}_SERVICE_HOST and {SVCNAME}_SERVICE_PORT, where the Service name is upper-cased and dashes become underscores.
  • Constraint: This depends on ordering. If the Pod is created before the Service, the environment variables will be missing. DNS is preferred because it handles dynamic updates automatically.
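
For a Service named my-db exposing port 5432 (hypothetical values), a Pod created afterwards in the same namespace would see variables along these lines:

text
MY_DB_SERVICE_HOST=10.96.0.11
MY_DB_SERVICE_PORT=5432
MY_DB_PORT=tcp://10.96.0.11:5432
MY_DB_PORT_5432_TCP_ADDR=10.96.0.11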

4. NodeLocal DNSCache (Performance Optimization)

In high-scale clusters, routing every DNS query to the central CoreDNS Service can cause performance bottlenecks and connection tracking (conntrack) race conditions.

To solve this, you can deploy NodeLocal DNSCache. This runs a lightweight DNS caching agent on every node (as a DaemonSet).

  • Mechanism: Pods query the local agent on their own Node instead of the central Service.
  • Benefit: Connections from the caching agent to kube-dns can be upgraded from UDP to TCP, DNS traffic skips the iptables DNAT and connection-tracking rules, and latency and load on the central CoreDNS infrastructure drop significantly.
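
Deploying it typically means applying the upstream nodelocaldns DaemonSet manifest and, depending on the kube-proxy mode, pointing the kubelet's cluster DNS at the node-local listen address (169.254.20.10 is the conventional link-local choice). A sketch of that kubelet configuration fragment, assuming that address:

yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
clusterDNS:
  - 169.254.20.10   # node-local cache address instead of the kube-dns ClusterIP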