Bare-Metal LoadBalancer Services on Talos with Cilium L2 Announcements
This article documents how to replace MetalLB with Cilium’s built-in L2 announcement feature, including the IPAM pool, the announcement policy, the supporting Cilium values, and what to verify on the wire.
1. MetalLB’s role
On a bare-metal cluster, a Service of type: LoadBalancer is meaningless until something assigns it an external IP and answers ARP for that IP on the local segment. The classic answer is MetalLB in L2 mode: a controller allocates an IP from a pool; a speaker DaemonSet replies to ARP requests, advertising via gratuitous ARP after leader election.
MetalLB works, but it is a second control plane to maintain — separate CRDs, Helm chart, and RBAC. When the cluster’s CNI is Cilium, every primitive MetalLB provides is already inside Cilium.
2. Cilium components
Three Cilium subsystems combine to replace MetalLB:
- LoadBalancer IPAM. A CiliumLoadBalancerIPPool declares one or more CIDRs from which Cilium allocates IPs to LoadBalancer Services.
- L2 announcements. A CiliumL2AnnouncementPolicy selects which nodes answer ARP/NDP for which IPs on which interfaces, with leader election among the selected nodes.
- kube-proxy replacement. Cilium’s eBPF datapath services Service IPs without kube-proxy. This is required for L2 announcements to behave consistently; otherwise kube-proxy and Cilium race for ownership of the forwarding decision.
The first two are CRDs applied as ordinary manifests. The third is a Helm value on the Cilium install.
3. Cilium Helm values
The Cilium values for this setup:
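A values fragment consistent with the keys discussed in this section might look like the sketch below. Only the keys named in the field reference come from the article; ipv4NativeRoutingCIDR and encryption.enabled are assumptions added because native routing and WireGuard normally need them, and the CIDR shown is a placeholder. Other Talos-specific values are omitted.

```yaml
# Sketch of Cilium Helm values for this setup (not a complete values file).
kubeProxyReplacement: true

l2announcements:
  enabled: true

routingMode: native
autoDirectNodeRoutes: true
# Assumption: placeholder pod CIDR; set this to the cluster's actual pod subnet.
ipv4NativeRoutingCIDR: 10.244.0.0/16

encryption:
  enabled: true        # assumption: required for the type below to take effect
  type: wireguard

# Talos's local API-server load balancer
k8sServiceHost: localhost
k8sServicePort: 7445
```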
Field reference for the keys relevant here:
- kubeProxyReplacement: true. Lets Cilium service all ClusterIP / NodePort / LoadBalancer traffic in eBPF in place of kube-proxy. Required for L2 announcements. The value does not remove kube-proxy itself; on Talos that requires a machine-config change, see section 7.
- l2announcements.enabled: true. Turns on the controller that watches CiliumL2AnnouncementPolicy resources and the responder that replies to ARP. Off by default because it relies on leader election, which adds API-server load.
- routingMode: native + autoDirectNodeRoutes: true. Pods are routable on the underlay between nodes without overlay encapsulation. L2 announcements work in either mode, but native routing keeps the data path shorter.
- encryption.type: wireguard. Pod-to-pod traffic is encrypted between nodes. Independent of L2 announcements, but worth noting because it pairs naturally with the same eBPF datapath.
- k8sServiceHost: localhost + k8sServicePort: 7445. Talos’s kube-apiserver-loadbalancer config (KubePrism). Cilium reaches the API server via the local Talos LB, which itself does not depend on Cilium, so there is no circular dependency at bootstrap.
4. The IP pool
Field reference:
- blocks[]. Each entry is either a single cidr or a start/stop pair. Multiple blocks are allowed. The pool must be on the same L2 segment as the nodes that will announce it; ARP cannot traverse a router.
- serviceSelector. Restricts which Services pull from this pool. The expression above means “any Service that does not request a specific IP via the io.cilium/lb-ipam-ips annotation”. Omitting serviceSelector makes the pool match every Service.
- For multiple pools (public vs internal), label the Services and give each pool a matching serviceSelector.matchLabels.
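For the multiple-pool case in the last bullet, a pool restricted by label might look like the following sketch. The pool name, the lb-pool label, and the address range are all hypothetical; the range is only chosen to sit on the 192.168.105.0 segment used elsewhere in this article. The apiVersion is also an assumption, since newer Cilium releases may serve this CRD under cilium.io/v2.

```yaml
apiVersion: cilium.io/v2alpha1   # assumption: depends on the Cilium release
kind: CiliumLoadBalancerIPPool
metadata:
  name: internal-pool            # hypothetical name
spec:
  blocks:
    # Hypothetical range; it must be on the same L2 segment as the announcing nodes.
    - start: 192.168.105.128
      stop: 192.168.105.160
  serviceSelector:
    matchLabels:
      lb-pool: internal          # hypothetical label carried by the Services
```

A Service only draws from this pool if it carries the lb-pool: internal label; a second pool for public addresses would use a different label value.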
Services consume from the pool either implicitly (type: LoadBalancer + no selector mismatch) or by requesting a specific IP via loadBalancerIP (deprecated) or the io.cilium/lb-ipam-ips: "192.168.105.130" annotation.
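As an illustration of the annotation route, a Service that asks for a specific address might be written as below. The Service name, selector, and ports are hypothetical; only the annotation key and the example IP come from the text above. If the pool carries a serviceSelector, the Service must also satisfy it.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: gateway                  # hypothetical name
  namespace: default
  annotations:
    # Request a specific address; it must fall inside a pool that matches this Service.
    io.cilium/lb-ipam-ips: "192.168.105.130"
spec:
  type: LoadBalancer
  selector:
    app: gateway                 # hypothetical backend label
  ports:
    - name: https
      port: 443
      targetPort: 8443           # hypothetical container port
```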
5. The announcement policy
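A policy consistent with the field reference in this section can be sketched as follows. The selector, interface name, and the two boolean fields are taken from the text; the policy name is hypothetical, and the apiVersion is an assumption that may differ between Cilium releases.

```yaml
apiVersion: cilium.io/v2alpha1   # assumption: depends on the Cilium release
kind: CiliumL2AnnouncementPolicy
metadata:
  name: workers-l2               # hypothetical name
spec:
  # Only nodes without the control-plane role label take part in announcing.
  nodeSelector:
    matchExpressions:
      - key: node-role.kubernetes.io/control-plane
        operator: DoesNotExist
  # Interfaces on which ARP is answered.
  interfaces:
    - ens20
  externalIPs: false
  loadBalancerIPs: true
```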
Field reference:
- nodeSelector. Restricts which nodes participate in announcing. L2 announcements should only come from the worker nodes that run the Envoy gateway DaemonSet handling the traffic; if the Service uses externalTrafficPolicy: Local, a node that answers ARP but has no local backend will draw the traffic and then drop it. The selector above (node-role.kubernetes.io/control-plane DoesNotExist) excludes control-plane nodes from announcing, leaving only the workers to respond. On a homelab with combined control-plane/worker nodes, drop the selector.
- interfaces. List of interface names on which to answer ARP. On a multi-NIC node, listing the wrong interface makes the node answer ARP on the wrong segment. ens20 here is the Talos VM’s data network on VLAN 105.
- externalIPs: false / loadBalancerIPs: true. Declares which kinds of Service IPs to announce. externalIPs is the (less commonly used) Service spec.externalIPs field; loadBalancerIPs is status.loadBalancer.ingress[].ip. Most setups want only the latter.
Leader election among the selected nodes happens via Kubernetes Leases — exactly one node answers ARP for any given IP at a time. On leader change, Cilium emits a gratuitous ARP to push the new MAC into adjacent ARP caches; clients usually fail over in under a second.
6. Verification
Pool allocations: inspect the pool status, for example with kubectl get ciliumloadbalancerippools -o yaml.
status.conditions[] should show cilium.io/PoolConflict: False, and cilium.io/IPsAvailable should reflect the remaining capacity.
Service got an IP: kubectl get svc -A, for example, should show an address from the pool in the EXTERNAL-IP column.
ARP works from a client on the segment: for example, ping the Service IP from a Linux host and then read the resolved MAC with ip neigh show.
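The MAC address shown is the announcing node’s data-NIC MAC. To confirm failover, delete the Cilium agent pod on the current leader, for example with kubectl delete pod -n kube-system -l app.kubernetes.io/name=cilium-agent --field-selector spec.nodeName=<leader-node>, and repeat the check: the MAC changes within a few seconds and traffic continues. Without the field selector, the command deletes the agent pods on every node.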
Hubble flow inspection, for example with hubble observe --to-ip <service-ip> --follow.
Confirms eBPF-level visibility of the inbound traffic; useful when troubleshooting whether the issue is ARP, eBPF service load-balancing, or backend pod readiness.
7. Talos configuration
On Talos, kube-proxy is part of the cluster manifest, not a separately deployable workload. Disabling it requires a machine-config patch:
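A patch along these lines is one way to express that change. The cluster.proxy.disabled field turns off the bundled kube-proxy; setting the CNI name to none is an assumption that applies when Cilium is installed separately (for example via Helm) rather than by Talos.

```yaml
# Talos machine-config patch (sketch).
cluster:
  proxy:
    disabled: true     # do not deploy kube-proxy
  network:
    cni:
      name: none       # assumption: Cilium is installed out of band
```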
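Apply via talosctl apply-config and reboot. Cilium then installs as the only CNI and provides kube-proxy replacement. Confirm with, for example, kubectl -n kube-system exec ds/cilium -- cilium-dbg status (cilium status on older agent images), whose output includes a KubeProxyReplacement line.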
l2announcements is incompatible with kube-proxy running alongside Cilium: both would try to program forwarding for the same Service IPs, which is the race described in section 2.