This blog covers the core Kubernetes architecture in depth, in an easy-to-follow way.
Why do you need to know all this?
- Before this, I already had experience working with Pods, Services, Deployments, different types of ports, and replicas.
- But it never fully clicked whenever I heard terms like control-plane, kube-proxy, kubelet and so on.
- And understanding these gets more and more important as you dive deeper into Kubernetes.
Broad overview
(src: https://kubernetes.io/docs/concepts/overview/components/)
Container Runtime
- This is the software that runs and manages containers. It is responsible for creating, starting, stopping, and managing the lifecycle of containers.
- Container Runtime Interface (CRI):
- This is a standard API in Kubernetes for interacting with container runtimes.
- It lets k8s manage different runtimes seamlessly.
- It decouples Kubernetes from any specific runtime. In this way, Kubernetes can support any runtime that follows the required standards.
- OCI (Open Container Initiative)
- For a vendor's software to work as a container runtime for Kubernetes, it is expected to follow the OCI standards.
- These contain the specifications for the image format and the runtime.
- Some such runtimes include:
- Docker Engine
- Docker Engine does not implement the Container Runtime Interface (CRI) itself. Kubernetes used to bridge the gap with a component called dockershim, which was deprecated and then removed in Kubernetes v1.24.
- containerd
- A lightweight runtime initially developed by Docker. It is CRI-compatible.
- This project is part of CNCF now!
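To make the decoupling idea concrete, here is a minimal sketch in Python. It is not the real CRI (which is a gRPC API); the class and method names below are hypothetical and only illustrate how a caller can depend on an interface instead of on one specific runtime.

```python
from abc import ABC, abstractmethod


class ContainerRuntime(ABC):
    """Hypothetical stand-in for the CRI: the only surface the caller sees."""

    @abstractmethod
    def pull_image(self, image: str) -> None:
        ...

    @abstractmethod
    def start_container(self, name: str, image: str) -> str:
        ...


class FakeRuntime(ContainerRuntime):
    """Toy runtime; `scheme` only labels which implementation answered."""

    def __init__(self, scheme: str) -> None:
        self.scheme = scheme
        self.images = set()

    def pull_image(self, image: str) -> None:
        self.images.add(image)

    def start_container(self, name: str, image: str) -> str:
        if image not in self.images:
            raise RuntimeError(f"image {image} has not been pulled")
        return f"{self.scheme}://{name}"


def run_container(runtime: ContainerRuntime, name: str, image: str) -> str:
    # The caller (think: kubelet) only talks to the interface,
    # so swapping the runtime needs no change here.
    runtime.pull_image(image)
    return runtime.start_container(name, image)


print(run_container(FakeRuntime("containerd"), "web", "nginx"))
print(run_container(FakeRuntime("cri-o"), "web", "nginx"))
```

The same run_container function works with both toy runtimes, which is the property the CRI gives Kubernetes.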
Control Plane
- The control plane is the central management layer. Its components orchestrate the cluster, maintain the desired state, and coordinate workloads across nodes. It is the core layer around which everything else in k8s is organized.
The control plane consists of several key components:
- etcd
- A distributed, highly available key-value store that serves as the store for all cluster data.
- etcd is designed to run across multiple nodes in a cluster, with each node storing a copy of the data and participating in maintaining the system's consistency. In simple terms, every node in an etcd cluster holds a full replica of the entire key-value store.
- etcd uses the Raft consensus algorithm to ensure that all nodes in the cluster agree on the same data and stay consistent.
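Raft only commits a write once a majority of members have accepted it, so the cluster size decides how many members can fail. The short sketch below just does that arithmetic; the function names are made up for illustration.

```python
def quorum(members: int) -> int:
    """Smallest majority of a cluster with `members` voting members."""
    if members < 1:
        raise ValueError("a cluster needs at least one member")
    return members // 2 + 1


def failures_tolerated(members: int) -> int:
    """How many members can be lost while a majority is still reachable."""
    return members - quorum(members)


for n in (1, 3, 4, 5):
    print(f"members={n} quorum={quorum(n)} tolerated={failures_tolerated(n)}")
```

Note that 4 members tolerate the same single failure as 3, which is why etcd clusters are usually sized with an odd number of members.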
- kube-apiserver (API server)
- This acts as the front end of the Kubernetes control plane. It exposes the Kubernetes API to users and to the other cluster components.
- Handles all REST API requests (e.g. kubectl commands) and updates the cluster's state in the etcd data store.
- It also validates and processes API requests to manage resources.
- kube-scheduler
- Assigns newly created pods to nodes based on resource requirements, constraints, and policies.
- Considers factors like CPU, memory, storage, and other rules.
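Scheduling can be thought of as two phases: filter out the nodes that cannot run the pod, then score the remaining ones. The sketch below is a toy version of that idea. The Node fields, the pick_node function, and the scoring rule are all assumptions for illustration, not the real scheduler's policy.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    name: str
    free_cpu_m: int    # free CPU in millicores
    free_mem_mib: int  # free memory in MiB


def pick_node(cpu_m: int, mem_mib: int, nodes: List[Node]) -> Optional[str]:
    # Filtering: drop nodes that cannot satisfy the pod's resource requests.
    feasible = [
        n for n in nodes
        if n.free_cpu_m >= cpu_m and n.free_mem_mib >= mem_mib
    ]
    if not feasible:
        return None  # no suitable node; the pod would stay Pending
    # Scoring: a made-up rule preferring the node with the most CPU left over.
    best = max(feasible, key=lambda n: n.free_cpu_m - cpu_m)
    return best.name


nodes = [
    Node("node-a", 500, 1024),
    Node("node-b", 2000, 4096),
    Node("node-c", 4000, 512),
]
print(pick_node(1000, 1024, nodes))  # node-a lacks CPU, node-c lacks memory
```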
- kube-controller-manager
- Ensures that the actual state of the cluster matches the desired state.
- For example, the ReplicaSet controller ensures that the correct number of pod replicas are running.
- The Deployment controller ensures that a Deployment's pods are running as specified.
- Controllers in the manager continuously compare the desired state (stored in etcd) with the actual state (observed in the cluster) and make changes when necessary to bring them into alignment.
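The compare-and-correct behaviour described above is often called a reconcile loop. Here is a minimal sketch of a single pass for a replica count. The reconcile function and its action tuples are hypothetical; a real controller would send the resulting changes to the API server.

```python
def reconcile(desired_replicas: int, running_pods: list) -> list:
    """Return the actions needed to move the actual state toward the desired state."""
    diff = desired_replicas - len(running_pods)
    if diff > 0:
        # Too few pods: ask for the missing ones to be created.
        return [("create", None)] * diff
    if diff < 0:
        # Too many pods: pick the surplus ones for deletion.
        return [("delete", name) for name in running_pods[:-diff]]
    return []  # actual state already matches desired state


print(reconcile(3, ["web-1"]))
print(reconcile(1, ["web-1", "web-2", "web-3"]))
print(reconcile(2, ["web-1", "web-2"]))
```

A controller runs this kind of comparison again and again, which is why a crashed pod gets replaced without anyone asking for it.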
- cloud-controller-manager (Optional)
- This comes into the picture when your Kubernetes cluster runs on a cloud provider and needs provider-specific configuration.
- Responsible for tasks like provisioning load balancers, persistent volumes, and assigning external IPs.
Data Plane
- This is the layer responsible for running workloads and managing the networking, storage, and other operational aspects of the application within a cluster. Worker nodes belong to this layer.
The following shows an overview of the constituents of the data plane
The data plane resides on worker nodes and includes the following:
- Pods
- The smallest deployable unit in Kubernetes. A pod runs one or more containers and is hosted on a worker node.
- kubelet
- A node agent that runs on every worker node.
- Monitors the health of pods and reports back to kube-apiserver.
- Each worker node has its own kubelet.
- Kubelet is responsible for:
- Communicating with the control plane to receive pod specifications
- Making sure that each pod, including its containers, is running in its desired state.
- kube-proxy
- A networking component that runs on each node.
- Maintains the network rules that let traffic sent to a Service reach the right pods, whether they run on the same node or on a different one.
- It manages traffic by:
- Load balancing traffic across Service endpoints
- Managing routing rules via iptables
- Service Abstraction
- Kubernetes Services provide a stable IP address (ClusterIP) and DNS name to access a group of pods.
- Behind the scenes, kube-proxy manages the routing to ensure packets destined for a Service are forwarded to the appropriate pod(s).
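The Service abstraction boils down to a lookup table: one stable address in front, several pod addresses behind. The sketch below models that table in plain Python. The real kube-proxy programs kernel rules (iptables or IPVS) instead of handling packets itself, and all the addresses and the route function here are invented for illustration.

```python
import random

# Hypothetical Service table: ClusterIP:port -> pod endpoints behind it.
service_endpoints = {
    "10.96.0.10:80": ["10.244.1.5:8080", "10.244.2.7:8080", "10.244.3.9:8080"],
}


def route(destination: str, rng: random.Random) -> str:
    """Pick a backend pod for traffic addressed to a Service."""
    endpoints = service_endpoints.get(destination)
    if not endpoints:
        raise LookupError(f"no endpoints for {destination}")
    # Random choice spreads connections across the pods behind the Service.
    return rng.choice(endpoints)


rng = random.Random(0)
print([route("10.96.0.10:80", rng) for _ in range(3)])
```

Clients only ever see the stable Service address, so pods can come and go as long as the endpoint list is kept up to date.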
- Container Runtime
- As discussed earlier, Kubernetes requires a container runtime, and it is part of the data plane.
Each worker node has its own set of pods, kubelet, kube-proxy and container runtime.
Bringing it all together
Let's consider a scenario where a user requests the creation of a pod
- User Request
- Let's assume the user is using kubectl.
- Typically, the user issues a command such as kubectl create -f pod.yaml
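For reference, here is roughly what such a manifest could contain. The original pod.yaml is not shown, so the pod name, label, image, and resource requests below are assumptions. The sketch builds the manifest as a Python dictionary and prints it as JSON, which kubectl accepts as well as YAML.

```python
import json

# Hypothetical minimal Pod manifest; every value here is illustrative.
pod = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "demo-pod", "labels": {"app": "demo"}},
    "spec": {
        "containers": [
            {
                "name": "web",
                "image": "nginx",
                "ports": [{"containerPort": 80}],
                # Requests are what the scheduler looks at when picking a node.
                "resources": {"requests": {"cpu": "250m", "memory": "64Mi"}},
            }
        ]
    },
}

manifest = json.dumps(pod, indent=2)
print(manifest)
```

Saving the output as pod.json and running kubectl create -f pod.json would submit the same kind of request as the YAML version.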
- kube-apiserver takes the request
- The user's request to create a pod is received by kube-apiserver.
- The API server acts as the central point for all control plane components and validates the user's pod request.
- If the request is valid, the API server stores the desired state in etcd.
- After storing the pod's desired state, the API server makes the change visible to the components that watch it, such as the scheduler and the kubelets.
- Scheduler decides on a Node
- Once the pod specification is stored in etcd, the kube-scheduler (which runs in the control plane) is responsible for determining which worker node (data plane) should host the pod.
- The scheduler places the pod on a node where there is sufficient capacity to meet its resource requests. If there are multiple suitable nodes, the scheduler will pick one based on scheduling policies.
- The scheduling decision is then communicated back to the API server, which updates the pod to reflect that it has been scheduled on a specific node.
- Kubelet on the worker node
- The kubelet running on the chosen worker node learns from the kube-apiserver that the pod has been scheduled to its node.
- Once the kubelet receives the pod's scheduling information from the API server, it communicates with the container runtime (e.g., Docker, containerd) to pull the required container images (if they aren't already available locally).
- The kubelet instructs the container runtime to create containers from the pod specification.
- Container runtime creates the container
- The runtime ensures that containers are started and that they conform to the requested pod definition, including resources (CPU, memory), environment variables, and volumes.
- Kubelet monitors the pod and container health
- After the containers are running, the kubelet begins monitoring their health using liveness and readiness probes as specified in the pod configuration.
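A probe does not act on a single failure. It acts after a number of consecutive failures, controlled by the probe's failureThreshold setting (3 by default). The sketch below models only that counting rule; the function name is made up, and a real kubelet runs probes on a timer instead of over a list.

```python
def threshold_reached(probe_results, failure_threshold=3):
    """Return True once `failure_threshold` consecutive probes have failed."""
    consecutive_failures = 0
    for ok in probe_results:
        # A success resets the streak; a failure extends it.
        consecutive_failures = 0 if ok else consecutive_failures + 1
        if consecutive_failures >= failure_threshold:
            return True
    return False


print(threshold_reached([True, False, False, True, False]))
print(threshold_reached([True, False, False, False]))
```

For a liveness probe, reaching the threshold leads to a container restart. For a readiness probe, the pod is taken out of Service traffic until it passes again.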
- Pod Communication and Networking
- At this point, the pod is successfully running on the worker node. Kubernetes automatically sets up networking for the pod, ensuring that the containers within the pod can communicate with each other and with other pods.
- Traffic sent to Services is managed by kube-proxy, which runs on each worker node and handles routing and load balancing across pods using iptables or IPVS.
- Continuous Monitoring
- The kube-controller-manager continuously monitors the state of resources in the cluster and ensures that the actual state matches the desired state. For example, if a pod managed by a ReplicaSet is terminated or crashes, the ReplicaSet controller ensures that a new pod is created to maintain the desired replica count.
- Final State in API server
- The API server is updated with the status of the pod, reflecting that the pod is now running and healthy on a worker node.
- Kubernetes continues to manage the pod, monitoring its health and scaling it as necessary based on the desired state defined by the user.
Knowing all this can help you troubleshoot issues more effectively, optimize your deployments, and explore advanced concepts like custom controllers, network policies, and scaling strategies.