Bare-Metal LoadBalancer Services on Talos with Cilium L2 Announcements

#cilium #kubernetes #talos #loadbalancer #networking #l2-announcements #kube-proxy-replacement

2026-06-01

This article documents how to replace MetalLB with Cilium’s built-in L2 announcement feature, including the IPAM pool, the announcement policy, the supporting Cilium values, and what to verify on the wire.

1. MetalLB’s role

On a bare-metal cluster, a Service of type: LoadBalancer is meaningless until something assigns it an external IP and answers ARP for that IP on the local segment. The classic answer is MetalLB in L2 mode: a controller allocates an IP from a pool, and a speaker DaemonSet elects one node per IP to answer ARP requests and to send gratuitous ARP when that node changes.

MetalLB works, but it is a second control plane to maintain — separate CRDs, Helm chart, and RBAC. When the cluster’s CNI is Cilium, every primitive MetalLB provides is already inside Cilium.

2. Cilium components

Three Cilium subsystems combine to replace MetalLB:

LoadBalancer IPAM. A CiliumLoadBalancerIPPool declares one or more CIDRs from which Cilium allocates IPs to LoadBalancer Services.
L2 announcements. A CiliumL2AnnouncementPolicy selects which nodes answer ARP/NDP for which IPs on which interfaces, with leader election among the selected nodes.
kube-proxy replacement. Cilium’s eBPF datapath handles Service IPs without kube-proxy. L2 announcements require it; otherwise kube-proxy and Cilium race for ownership of the forwarding decision.

The first two are CRDs applied as ordinary manifests. The third is a Helm value on the Cilium install.

3. Cilium Helm values

The Cilium values for this setup:

# cilium-values.yaml
ipv4NativeRoutingCIDR: "10.100.0.0/16"
autoDirectNodeRoutes: true
routingMode: native

k8sServiceHost: "localhost"
k8sServicePort: "7445"

kubeProxyReplacement: true

encryption:
  enabled: true
  type: wireguard

ipam:
  operator:
    clusterPoolIPv4PodCIDRList: "10.100.0.0/16"

bpf:
  masquerade: false
  datapathMode: veth

bandwidthManager:
  enabled: true
  bbr: true

l2announcements:
  enabled: true

envoy:
  enabled: false

hubble:
  enabled: true
  relay: { enabled: true }
  ui:    { enabled: true, replicas: 1 }

operator:
  replicas: 2

# Talos integration
cgroup:
  autoMount: { enabled: false }
  hostRoot: "/sys/fs/cgroup"

Field reference for the keys relevant here:

kubeProxyReplacement: true. Lets Cilium service all ClusterIP / NodePort / LoadBalancer traffic in eBPF. Required for L2 announcements. The value does not remove kube-proxy by itself; on Talos, kube-proxy is disabled in the machine config, see section 7.
l2announcements.enabled: true. Turns on the controller that watches CiliumL2AnnouncementPolicy resources and the responder that replies to ARP. Off by default because it relies on lease-based leader election, which adds API-server load that grows with the number of announced Services.
routingMode: native + autoDirectNodeRoutes: true. Pods are routable on the underlay between nodes without overlay encapsulation. L2 announcements work in either mode, but native routing keeps the data path shorter.
encryption.type: wireguard. Pod-to-pod traffic is encrypted between nodes. Independent of L2 announcements, but worth noting because it pairs naturally with the same eBPF datapath.
k8sServiceHost: localhost + k8sServicePort: 7445. The KubePrism endpoint, Talos’s node-local load balancer for the API server. Cilium reaches the API server through it, and KubePrism itself does not depend on Cilium, which avoids a circular dependency at bootstrap.
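
A minimal sketch of applying these values and checking that the two features landed in the rendered agent configuration. The Helm repository alias cilium, the release name, the kube-system namespace, and the helper names are assumptions for illustration; the cilium-config key names are what recent charts render, so verify them against your chart version.

```shell
# Install or upgrade Cilium with the values file from this section.
# Assumes the official chart repository and a release named "cilium".
install_cilium() {
  values_file=${1:-cilium-values.yaml}
  helm repo add cilium https://helm.cilium.io/
  helm repo update
  helm upgrade --install cilium cilium/cilium \
    --namespace kube-system \
    --values "$values_file"
}

# Read one key from the cilium-config ConfigMap (assumed to live in kube-system).
cilium_config_value() {
  kubectl -n kube-system get configmap cilium-config \
    -o "jsonpath={.data.$1}"
}
```

After install_cilium, both cilium_config_value enable-l2-announcements and cilium_config_value kube-proxy-replacement should print true if the values were picked up.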

4. The IP pool

apiVersion: cilium.io/v2alpha1
kind: CiliumLoadBalancerIPPool
metadata:
  name: web-pool
spec:
  blocks:
    - cidr: 192.168.105.128/27
  serviceSelector:
    matchExpressions:
      - { key: io.cilium/lb-ipam-ips, operator: DoesNotExist }

Field reference:

blocks. Each entry is either a cidr or a start/stop pair of addresses. Multiple blocks are allowed. The pool must be on the same L2 segment as the nodes that will announce it; ARP cannot traverse a router.
serviceSelector. Restricts which Services pull from this pool. It is a label selector, so the expression above matches any Service that carries no label with the key io.cilium/lb-ipam-ips. It does not inspect annotations, so a Service that requests a specific IP through the io.cilium/lb-ipam-ips annotation still matches. Omitting serviceSelector makes the pool match every Service.
For multiple pools (public vs internal), use label selectors on the Services and matching serviceSelector matchLabels.
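
The multi-pool case can be sketched as a second pool plus a Service that opts into it by label. Everything named here is hypothetical: the pool internal-pool, the label lb-pool=internal, the Service internal-api with its ports, and the CIDR 192.168.105.160/27, which is assumed to sit on the same segment as the nodes.

```shell
# Emit a second pool and a Service that selects it by label.
# Hypothetical names and CIDR; adjust to the local segment.
internal_pool_manifests() {
  cat <<'EOF'
apiVersion: cilium.io/v2alpha1
kind: CiliumLoadBalancerIPPool
metadata:
  name: internal-pool
spec:
  blocks:
    - cidr: 192.168.105.160/27
  serviceSelector:
    matchLabels:
      lb-pool: internal
---
apiVersion: v1
kind: Service
metadata:
  name: internal-api
  labels:
    lb-pool: internal
spec:
  type: LoadBalancer
  selector:
    app: internal-api
  ports:
    - port: 443
      targetPort: 8443
EOF
}
```

Apply with internal_pool_manifests | kubectl apply -f -. A pool whose selector also matches the labelled Service can still allocate to it, so in a multi-pool setup each pool likely needs a selector that excludes the other pool's Services.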

Services consume from the pool either implicitly (type: LoadBalancer + no selector mismatch) or by requesting a specific IP via loadBalancerIP (deprecated) or the io.cilium/lb-ipam-ips: "192.168.105.130" annotation.
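
As an illustration of the explicit request, here is a sketch of a Service that pins 192.168.105.130 from web-pool. The Service name web, its selector, and its ports are assumptions. Newer Cilium releases also document an lbipam.cilium.io/ips annotation key, so check which key your version expects.

```shell
# Emit a Service that asks LB-IPAM for one specific address.
# Hypothetical Service name, selector, and ports.
pinned_ip_service() {
  cat <<'EOF'
apiVersion: v1
kind: Service
metadata:
  name: web
  annotations:
    io.cilium/lb-ipam-ips: "192.168.105.130"
spec:
  type: LoadBalancer
  selector:
    app: web
  ports:
    - port: 80
      targetPort: 8080
EOF
}
```

Apply with pinned_ip_service | kubectl apply -f -. The requested address has to fall inside a pool whose selector matches the Service, otherwise no IP is assigned.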

5. The announcement policy

apiVersion: cilium.io/v2alpha1
kind: CiliumL2AnnouncementPolicy
metadata:
  name: l2-announcement-policy
spec:
  nodeSelector:
    matchExpressions:
      - key: node-role.kubernetes.io/control-plane
        operator: DoesNotExist
  interfaces:
    - ens20
  externalIPs: false
  loadBalancerIPs: true

Field reference:

nodeSelector. Restricts which nodes participate in announcing. L2 announcements should come only from the worker nodes that run the Envoy gateway DaemonSet handling the traffic: a node that answers ARP but has no local backend draws the traffic and then either forwards it over an extra hop or, with externalTrafficPolicy: Local, drops it. The selector above (node-role.kubernetes.io/control-plane DoesNotExist) excludes control-plane nodes, leaving only the workers to respond. On a homelab with combined control-plane/worker nodes, drop the selector.
interfaces. List of interface names on which to answer ARP; each entry is treated as a regular expression. On a multi-NIC node, listing the wrong interface answers ARP into the wrong segment. ens20 here is the Talos VM’s data network on VLAN 105.
externalIPs: false / loadBalancerIPs: true. Declares what kinds of Service IPs to announce. externalIPs is the (less commonly used) Service spec.externalIPs field; loadBalancerIPs is status.loadBalancer.ingress[].ip. Most setups want only the latter.

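Leader election among the selected nodes happens via Kubernetes Leases, one per Service, so exactly one node answers ARP for any given IP at a time. On leader change, Cilium emits a gratuitous ARP to push the new MAC into adjacent ARP caches, and clients pick up the new MAC almost immediately. How long it takes to notice a failed leader is bounded by the lease timings (l2announcements.leaseDuration, leaseRenewDeadline, and leaseRetryPeriod in the Helm values), so end-to-end failover after a node failure typically takes seconds.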
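
To see which node currently holds a given Service's lease, a sketch along these lines can help. It assumes Cilium runs in kube-system and that leases follow the cilium-l2announce-<namespace>-<service> naming; the helper names are illustrative.

```shell
# List the leases used by L2 announcements, one per announced Service.
l2_leases() {
  kubectl -n kube-system get lease -o name | grep cilium-l2announce
}

# Print the node holding the lease for <namespace> <service>.
l2_leader() {
  kubectl -n kube-system get lease "cilium-l2announce-$1-$2" \
    -o 'jsonpath={.spec.holderIdentity}'
}
```

For example, l2_leader default web prints the node name that should be answering ARP for the Service web in the default namespace.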

6. Verification

Pool allocations:

kubectl get ciliumloadbalancerippool web-pool -o yaml | yq .status

status.conditions[] should show cilium.io/PoolConflict with status False, and the cilium.io/IPsTotal, cilium.io/IPsUsed, and cilium.io/IPsAvailable conditions should reflect the pool's capacity.
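
As a cross-check for the capacity figures, the address range covered by the pool's CIDR can be worked out by hand. The helper below is an illustrative sketch in plain shell arithmetic, not part of Cilium. Whether the first and last address of a block are handed out may depend on the pool settings and the Cilium version, so the available count can be two lower than the total printed here.

```shell
# Convert a 32-bit integer to dotted-quad notation.
ipv4_to_dotted() {
  echo "$(( ($1 >> 24) & 255 )).$(( ($1 >> 16) & 255 )).$(( ($1 >> 8) & 255 )).$(( $1 & 255 ))"
}

# Print "<first> <last> <count>" for an IPv4 CIDR.
cidr_range() {
  addr=${1%/*}
  prefix=${1#*/}
  # Split the four octets with parameter expansion only.
  a=${addr%%.*}; rest=${addr#*.}
  b=${rest%%.*}; rest=${rest#*.}
  c=${rest%%.*}; d=${rest#*.}
  ip=$(( (a << 24) | (b << 16) | (c << 8) | d ))
  count=$(( 1 << (32 - prefix) ))
  first=$(( ip & ~(count - 1) ))
  last=$(( first + count - 1 ))
  echo "$(ipv4_to_dotted "$first") $(ipv4_to_dotted "$last") $count"
}

cidr_range 192.168.105.128/27
# prints: 192.168.105.128 192.168.105.159 32
```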

Service got an IP:

kubectl get svc -A -o wide | awk '$5 ~ /^192\.168\.105\./'

ARP works from a client on the segment:

ip neigh flush all
ping -c1 192.168.105.130
ip neigh show 192.168.105.130
# 192.168.105.130 dev wlp4s0 lladdr aa:bb:cc:dd:ee:ff REACHABLE

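The MAC address shown is the announcing node’s data-NIC MAC. To confirm failover, delete the Cilium agent pod on the current leader node only, for example kubectl delete pod -n kube-system -l app.kubernetes.io/name=cilium-agent --field-selector spec.nodeName=<leader-node>; without the field selector the label matches the agent on every node. The MAC should change within a few seconds and traffic should continue.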
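
The failover check can be timed from the client with a small polling loop. This is a sketch assuming a Linux client with iproute2 and iputils ping; the function names and the 60-second default timeout are illustrative.

```shell
# Print the link-layer address currently cached for an IP, if any.
neigh_mac() {
  ip neigh show "$1" |
    awk '{ for (i = 1; i < NF; i++) if ($i == "lladdr") print $(i + 1) }'
}

# Poll until the cached MAC differs from the starting one, then report
# how many seconds it took. Usage: wait_for_mac_change <ip> [timeout]
wait_for_mac_change() {
  target=$1
  timeout=${2:-60}
  before=$(neigh_mac "$target")
  start=$(date +%s)
  while :; do
    # Keep the neighbour entry fresh; failures are expected mid-failover.
    ping -c1 -W1 "$target" >/dev/null 2>&1
    current=$(neigh_mac "$target")
    elapsed=$(( $(date +%s) - start ))
    if [ -n "$current" ] && [ "$current" != "$before" ]; then
      echo "MAC changed from $before to $current after ${elapsed}s"
      return 0
    fi
    if [ "$elapsed" -ge "$timeout" ]; then
      echo "no MAC change within ${timeout}s"
      return 1
    fi
    sleep 1
  done
}
```

Start wait_for_mac_change 192.168.105.130 in one terminal, then delete the leader's agent pod in another.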

Hubble flow inspection:

hubble observe --to-ip 192.168.105.130 --last 50

This confirms that the eBPF datapath sees the inbound traffic, which helps when troubleshooting whether the issue is ARP, eBPF service load-balancing, or backend pod readiness.
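
The three failure domains named above can be checked one at a time. The helpers below are an illustrative sketch, not a fixed procedure; the function names are made up, and tcpdump has to run with sufficient privileges on a host attached to the segment.

```shell
# 1. ARP: capture up to 10 ARP frames that mention the VIP.
#    Usage: arp_capture <interface> <vip>
arp_capture() {
  tcpdump -n -e -i "$1" -c 10 arp and host "$2"
}

# 2. Datapath: show recent flows to the VIP that Cilium dropped.
vip_drops() {
  hubble observe --to-ip "$1" --verdict DROPPED --last 50
}

# 3. Backends: list the EndpointSlices behind <namespace> <service>.
svc_endpoints() {
  kubectl -n "$1" get endpointslices -l "kubernetes.io/service-name=$2"
}
```

No ARP replies points at the announcement policy or the interface list, drops in Hubble point at the datapath or policy, and empty EndpointSlices point at pod readiness.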

7. Talos configuration

On Talos, kube-proxy is part of the cluster manifest, not a separately deployable workload. Disabling it requires a machine-config patch:

cluster:
  proxy:
    disabled: true
  network:
    cni:
      name: none

Apply via talosctl apply-config and reboot. Cilium then installs as the only CNI and provides kube-proxy replacement. Confirm with:

kubectl get pods -A | grep -E 'kube-proxy|cilium'
# only cilium pods should appear

L2 announcements are not supported with kube-proxy running alongside Cilium: the feature requires kube-proxy replacement, and with both active, kube-proxy and Cilium would each program forwarding for the same Service IPs.
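
For an already running node, the same patch can be applied in place rather than through a full apply-config. This is a sketch under a few assumptions: the patch from this section is saved as cilium-patch.yaml (a hypothetical file name), the helper names are illustrative, and kube-proxy pods carry the usual k8s-app=kube-proxy label.

```shell
# Apply the machine-config patch to one node.
# Usage: patch_talos_node <node-ip> [patch-file]
patch_talos_node() {
  talosctl patch machineconfig --nodes "$1" --patch "@${2:-cilium-patch.yaml}"
}

# List any kube-proxy pods that are still present; expect no output.
kube_proxy_pods() {
  kubectl -n kube-system get pods -l k8s-app=kube-proxy -o name
}
```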

stat /posts/2026-06-01-cilium-l2-announcements/

2026-06-01: Initial publication of the article