I recently created an EKS Auto Mode cluster and noticed that the “ARC Zonal shift” feature was enabled on it.
Amazon Application Recovery Controller (ARC) helps simplify and automate recovery for highly available applications. It was initially known as “Route 53 ARC” and has since expanded to support Amazon EC2 Auto Scaling groups, ALB, NLB and now EKS, so it is now just Amazon ARC.
You can recover from an impaired Availability Zone (AZ) using ARC zonal shift and zonal autoshift. With zonal shift, you manually trigger a shift of traffic away from an impaired AZ. With zonal autoshift, you let AWS monitor AZ health and shift traffic on your behalf.
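To make the manual variant concrete, here is a minimal sketch of starting a zonal shift with boto3's `arc-zonal-shift` client. The helper names, the example ARN, and the AZ ID are my own placeholders, not something from the AWS docs; the sketch also assumes your credentials are allowed to start zonal shifts on the resource.

```python
from typing import Dict


def build_zonal_shift_request(resource_arn: str, away_from_az_id: str,
                              hours: int, comment: str) -> Dict[str, str]:
    """Build the keyword arguments for a manual zonal shift (hypothetical helper)."""
    if hours < 1:
        raise ValueError("a zonal shift needs a positive duration in hours")
    return {
        "resourceIdentifier": resource_arn,  # ARN of the resource registered with ARC
        "awayFrom": away_from_az_id,         # AZ ID such as "use1-az1"
        "expiresIn": f"{hours}h",            # shifts are temporary and need an expiry
        "comment": comment,                  # free-text note on why you shifted
    }


def start_manual_shift(request: Dict[str, str]) -> str:
    """Start the shift and return its ID. Needs boto3 and AWS credentials."""
    import boto3  # imported lazily so the helper above works without boto3

    client = boto3.client("arc-zonal-shift")
    response = client.start_zonal_shift(**request)
    return response["zonalShiftId"]
```

You would call `start_manual_shift(build_zonal_shift_request(...))` with your own cluster ARN and the ID of the impaired AZ. Keeping the request building separate from the API call makes it easy to review what will be sent before anything is shifted.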
If you want to run resilient and highly available applications across multiple AZs in EKS and you want to survive an AZ going down, then this is the feature (EKS Zonal Shift) you need.
Remember that this only redirects internal east-west traffic inside your EKS cluster, between your pods. If you want to redirect traffic from load balancers in the same way, you have to enable ARC Zonal shift on the ALB or NLB as well.
Performing a zonal shift enables you to achieve rapid recovery from application failures in a single AZ. This helps you stay resilient when an AZ is impaired or down.
You can enable it during EKS cluster creation or afterwards on an already running cluster. If you create an EKS Auto Mode cluster with the “Quick configuration” option, ARC Zonal shift is enabled by default.
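For an already running cluster, the following is a rough sketch of checking and enabling the setting with boto3's `eks` client. It relies on the `zonalShiftConfig` field of the EKS cluster description and of `update_cluster_config`; the function names and the cluster name you pass in are assumptions for illustration only.

```python
from typing import Any, Dict


def zonal_shift_enabled(describe_cluster_response: Dict[str, Any]) -> bool:
    """Read the zonal shift flag from an EKS describe_cluster response."""
    cluster = describe_cluster_response.get("cluster", {})
    return bool(cluster.get("zonalShiftConfig", {}).get("enabled", False))


def enable_zonal_shift(cluster_name: str) -> None:
    """Enable ARC zonal shift on a cluster if it is not enabled yet."""
    import boto3  # imported lazily; needs AWS credentials at call time

    eks = boto3.client("eks")
    if zonal_shift_enabled(eks.describe_cluster(name=cluster_name)):
        return  # nothing to do, avoids an unnecessary cluster update
    eks.update_cluster_config(
        name=cluster_name,
        zonalShiftConfig={"enabled": True},
    )
```

Checking first keeps the function safe to run repeatedly, for example against a cluster created with “Quick configuration” where the feature is already on.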
With zonal shift, you can temporarily mitigate issues and incidents by triggering a shift and redirecting in-cluster network traffic to a healthy AZ.
For this to work, you should already be running EKS worker nodes in multiple AZs (at least three) for HA and resiliency, with your applications spread across those AZs. If your application runs in three AZs and one becomes impaired, zonal shift redirects traffic away from the impaired AZ to the healthy ones. Your application then keeps running, still highly available, in the two remaining AZs.
Such a setup comes at a cost, so use it for highly critical workloads that need this level of high availability and resiliency.
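One way to sanity-check the multi-AZ prerequisite is to look at the zone label on your nodes. The sketch below assumes you feed it the parsed output of `kubectl get nodes -o json` and that your nodes carry the standard `topology.kubernetes.io/zone` label; the function names are my own.

```python
from typing import Any, Dict, Set

# Well-known Kubernetes label that holds the node's zone.
ZONE_LABEL = "topology.kubernetes.io/zone"


def node_zones(node_list: Dict[str, Any]) -> Set[str]:
    """Collect the distinct zones found in a Kubernetes NodeList document."""
    zones: Set[str] = set()
    for node in node_list.get("items", []):
        labels = node.get("metadata", {}).get("labels", {})
        zone = labels.get(ZONE_LABEL)
        if zone:
            zones.add(zone)
    return zones


def zones_left_after_shift(zones: Set[str], impaired_zone: str) -> Set[str]:
    """Zones that would still receive traffic once one zone is shifted away."""
    return zones - {impaired_zone}
```

For example, after saving the node list with `kubectl get nodes -o json > nodes.json`, you can load it with `json.load` and confirm that `node_zones(...)` returns at least three zones and that `zones_left_after_shift(...)` still leaves two. The same idea applies to pods: each critical workload should have replicas on nodes in more than one of the remaining zones.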
Check the EKS Zonal shift documentation to learn more: https://docs.aws.amazon.com/eks/latest/userguide/zone-shift.html
Follow me on LinkedIn for more content related to Kubernetes, EKS and AWS in general. Visit my website at https://vijay.eu/posts for all my posts in one place.