Kubefeeds Team: A dedicated and highly skilled team at Kubefeeds, driven by a passion for Kubernetes and Cloud-Native technologies, delivering innovative solutions with expertise and enthusiasm.

Implementing Horizontal Pod Autoscaling (HPA) in Kubernetes for a Spring Boot Application 🚀🚀


Horizontal Pod Autoscaling (HPA) is a crucial feature in Kubernetes, ensuring that your application can scale based on demand. In this article, we’ll walk through the steps of setting up HPA for a Spring Boot application, using CPU utilization to automatically scale the number of pods running in your Kubernetes cluster.

What is Horizontal Pod Autoscaling (HPA)?

Horizontal Pod Autoscaling is a Kubernetes feature that automatically adjusts the number of pods in a deployment or replica set based on observed metrics like CPU utilization or memory usage. This scaling mechanism ensures that your application can handle varying levels of load effectively, providing resources when demand is high and scaling down when demand is low.

Why is HPA Important?

  • Cost Efficiency: Scaling pods automatically means you run only the replicas your workload actually needs. In cloud environments, where you pay for the resources you consume, this can significantly reduce the cost of running your application.
  • Scalability: With HPA, your application scales automatically as traffic increases or decreases, typically within seconds to minutes, without manual intervention.
  • High Availability: HPA helps keep your application available under varying traffic by adding pods when demand is high, and it removes them again when demand subsides.

Step-by-Step Guide to Implement HPA

In this example, we will use a Spring Boot application exposed on port 8082, with HPA configured to scale the pods based on CPU utilization.

Prerequisites

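    1. Install Minikube
      Minikube allows you to run a local Kubernetes cluster on your machine. To install it, follow the official Install minikube guide.
    2. Install kubectl
      kubectl is the command-line tool used throughout this guide to apply manifests and inspect the cluster.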

1. Define the Spring Boot Application Deployment

We start by defining the Deployment configuration for our Spring Boot application. This YAML file specifies the container image, resources (CPU and memory requests/limits), and replica configuration for the application.
Create deployment.yaml:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: spring-boot-hpa-app
  labels:
    app: spring-boot-hpa-app
spec:
  replicas: 1
  selector:
    matchLabels:
      app: spring-boot-hpa-app
  template:
    metadata:
      labels:
        app: spring-boot-hpa-app
    spec:
      containers:
      - name: spring-boot-hpa-app
        image: bansikah/spring-boot-hpa:latest
        ports:
        - containerPort: 8082
        resources:
          requests:
            cpu: "250m"
            memory: "512Mi"
          limits:
            cpu: "500m"
            memory: "1Gi"
  • Replicas: We set the initial number of replicas to 1.
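  • Resources: We define CPU and memory requests and limits. The CPU request (250m) matters for autoscaling: HPA measures CPU utilization as a percentage of the requested CPU, so without a request it cannot compute utilization at all.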
  • Image: The bansikah/spring-boot-hpa:latest image is used for the Spring Boot application.
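To make the role of the CPU request concrete, here is a small Python sketch of the arithmetic behind a Utilization target: current CPU usage divided by the CPU request. The helper names (cpu_to_millicores, utilization_percent) are hypothetical, and the parser only handles the plain-core and "m" (millicore) formats used in this guide.

```python
# Sketch: CPU utilization as HPA computes it for a "Utilization" target,
# i.e. current CPU usage divided by the container's CPU request.
# Helper names are hypothetical; only "250m"-style and plain-core values are handled.


def cpu_to_millicores(quantity: str) -> int:
    """Convert a CPU quantity such as "250m", "0.5" or "2" to millicores."""
    if quantity.endswith("m"):
        return int(quantity[:-1])
    # Plain numbers are whole (or fractional) cores: "0.5" -> 500 millicores
    return round(float(quantity) * 1000)


def utilization_percent(usage: str, request: str) -> float:
    """CPU usage as a percentage of the CPU request (not of the limit)."""
    return 100.0 * cpu_to_millicores(usage) / cpu_to_millicores(request)


request = "250m"  # resources.requests.cpu in deployment.yaml
limit = "500m"    # resources.limits.cpu in deployment.yaml

# A pod using 125m sits exactly on the 50% target from hpa.yaml
print(utilization_percent("125m", request))  # 50.0
# A pod running at its 500m limit reports 200%, because the base is the request
print(utilization_percent(limit, request))   # 200.0
```

Because the percentage is relative to the request, values above 100% are normal whenever the limit is higher than the request, as it is here.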

2. Define the Service

The Service configuration exposes the Spring Boot application internally and externally on a NodePort.
Create service.yaml:

apiVersion: v1
kind: Service
metadata:
  name: spring-boot-hpa-service
spec:
  selector:
    app: spring-boot-hpa-app
  ports:
    - protocol: TCP
      port: 80
      targetPort: 8082
  type: NodePort

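Inside the cluster, the Service listens on port 80 and forwards traffic to container port 8082. It also exposes the application on a NodePort on every node in your cluster; unless you set nodePort explicitly, Kubernetes assigns one from the default range 30000-32767.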

3. Define the Horizontal Pod Autoscaler (HPA)

Now, we define the Horizontal Pod Autoscaler to scale the Spring Boot application based on CPU utilization.
Create hpa.yaml:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: spring-boot-hpa-app
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: spring-boot-hpa-app
  minReplicas: 1
  maxReplicas: 5
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50

  • minReplicas: The minimum number of pods is set to 1.
  • maxReplicas: The maximum number of pods is set to 5.
  • metrics: Scaling is based on CPU utilization, with a target of 50% average utilization across all pods, measured against each pod's CPU request (50% of 250m is 125m).
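The Kubernetes documentation describes the core scaling rule as desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue). The Python sketch below applies that rule to this HPA (target 50%, 1 to 5 replicas), including the default 10% tolerance inside which no scaling happens. It is a simplification: the real controller also accounts for missing metrics, pods that are not ready yet, and stabilization windows, and the function name desired_replicas is hypothetical.

```python
import math


def desired_replicas(current_replicas: int, current_util: float, target_util: float,
                     min_replicas: int = 1, max_replicas: int = 5,
                     tolerance: float = 0.1) -> int:
    """Simplified HPA rule: ceil(current * current/target), clamped to [min, max]."""
    ratio = current_util / target_util
    # Inside the tolerance band (default 10%) the HPA keeps the current count
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    desired = math.ceil(current_replicas * ratio)
    # minReplicas/maxReplicas from hpa.yaml bound the result
    return max(min_replicas, min(max_replicas, desired))


# Target is 50% average CPU utilization, as in hpa.yaml
print(desired_replicas(1, 90, 50))   # 1 pod at 90% -> 2
print(desired_replicas(2, 150, 50))  # wants 6, capped at maxReplicas -> 5
print(desired_replicas(3, 52, 50))   # within tolerance -> stays at 3
print(desired_replicas(4, 10, 50))   # load gone -> 1 (ignoring stabilization)
```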

4. Enable the Metrics Server

HPA relies on the Metrics Server to gather resource usage data; without it, kubectl get hpa shows the CPU target as <unknown>. In Minikube, you can enable the Metrics Server as follows:

minikube addons enable metrics-server
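Once the Metrics Server is running (it may take a minute or two before metrics appear), kubectl top pods prints the per-pod CPU usage it collects. The sketch below parses that table and averages the usage against the 250m request, which approximates the figure HPA compares with the 50% target. The pod names and values in SAMPLE are made up for illustration, and the parser assumes the default NAME / CPU(cores) / MEMORY(bytes) column layout.

```python
# Sketch: average CPU utilization from `kubectl top pods` output, measured
# against the 250m request from deployment.yaml. SAMPLE is made-up output.

SAMPLE = """\
NAME                                  CPU(cores)   MEMORY(bytes)
spring-boot-hpa-app-7c9d8f6b5-aaaaa   180m         310Mi
spring-boot-hpa-app-7c9d8f6b5-bbbbb   70m          295Mi
"""


def to_millicores(cpu: str) -> int:
    """'180m' -> 180, '1' -> 1000 (kubectl top normally prints millicores)."""
    return int(cpu[:-1]) if cpu.endswith("m") else round(float(cpu) * 1000)


def parse_top_pods(output: str) -> dict[str, int]:
    """Map pod name -> CPU usage in millicores, skipping the header row."""
    usage = {}
    for line in output.strip().splitlines()[1:]:
        fields = line.split()
        if len(fields) >= 2:
            usage[fields[0]] = to_millicores(fields[1])
    return usage


def average_utilization(usage: dict[str, int], request_millicores: int = 250) -> float:
    """Mean usage across pods as a percentage of the per-pod CPU request."""
    return 100.0 * sum(usage.values()) / (len(usage) * request_millicores)


pods = parse_top_pods(SAMPLE)
print(pods)
print(average_utilization(pods))  # 50.0 -> right at the HPA target
```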

5. Deploy the Application

Once the YAML files are prepared, apply them to your Kubernetes cluster:

kubectl apply -f deployment.yaml
kubectl apply -f service.yaml
kubectl apply -f hpa.yaml
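If kubectl apply succeeds but the HPA never scales, or the Service returns no responses, a name or label that does not line up between the three manifests is a common cause. This hypothetical Python check mirrors the relevant fields of deployment.yaml, service.yaml, and hpa.yaml as plain dictionaries (with PyYAML installed you could load the real files with yaml.safe_load instead) and reports mismatches. It assumes a numeric targetPort; named ports would need an extra lookup.

```python
# Hypothetical consistency check for the three manifests in this guide.
# The dictionaries mirror only the fields that have to line up.

deployment = {
    "kind": "Deployment",
    "metadata": {"name": "spring-boot-hpa-app"},
    "spec": {
        "selector": {"matchLabels": {"app": "spring-boot-hpa-app"}},
        "template": {
            "metadata": {"labels": {"app": "spring-boot-hpa-app"}},
            "spec": {"containers": [{
                "name": "spring-boot-hpa-app",
                "ports": [{"containerPort": 8082}],
                "resources": {"requests": {"cpu": "250m", "memory": "512Mi"}},
            }]},
        },
    },
}
service = {"spec": {"selector": {"app": "spring-boot-hpa-app"},
                    "ports": [{"port": 80, "targetPort": 8082}]}}
hpa = {"spec": {"scaleTargetRef": {"apiVersion": "apps/v1", "kind": "Deployment",
                                   "name": "spring-boot-hpa-app"}}}


def selects(selector: dict, labels: dict) -> bool:
    """True if every key/value in the selector is present in the labels."""
    return all(labels.get(k) == v for k, v in selector.items())


def check_manifests(deployment: dict, service: dict, hpa: dict) -> list[str]:
    problems = []
    template = deployment["spec"]["template"]
    pod_labels = template["metadata"]["labels"]
    containers = template["spec"]["containers"]

    if not selects(deployment["spec"]["selector"]["matchLabels"], pod_labels):
        problems.append("Deployment selector does not match its pod template labels")
    if not selects(service["spec"]["selector"], pod_labels):
        problems.append("Service selector does not match the pod labels")

    container_ports = {p["containerPort"] for c in containers for p in c.get("ports", [])}
    for port in service["spec"]["ports"]:
        if port["targetPort"] not in container_ports:
            problems.append(f"Service targetPort {port['targetPort']} is not a containerPort")

    # A CPU Utilization target needs a CPU request on every container
    for c in containers:
        if "cpu" not in c.get("resources", {}).get("requests", {}):
            problems.append(f"container {c['name']} has no CPU request")

    ref = hpa["spec"]["scaleTargetRef"]
    if (ref["kind"], ref["name"]) != (deployment["kind"], deployment["metadata"]["name"]):
        problems.append("HPA scaleTargetRef does not point at the Deployment")
    return problems


print(check_manifests(deployment, service, hpa) or "manifests are consistent")
```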

6. Generate Load to Test HPA

You can use wrk, a benchmarking tool, to generate load and test the scaling of your application.
Install wrk:

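# Debian/Ubuntu
sudo apt update
sudo apt install wrk

# CentOS/RHEL: apt is not available here; install wrk with yum/dnf if one of
# your enabled repositories provides it, or build it from source with make

# macOS
brew install wrk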

Generate load by making HTTP requests to the /cpu endpoint:

minikube ip              # get your Minikube IP address
kubectl get services     # find the NodePort (the 3xxxx value in the PORT(S) column)
# 4 threads, 50 open connections, for 30 seconds
wrk -t4 -c50 -d30s http://<minikube-ip>:<node-port>/cpu

This command sends traffic to the application, which drives up its CPU usage. When the average CPU utilization across the pods rises above the 50% target, HPA scales the Deployment up. If the run ends before the autoscaler reacts, repeat it with a longer duration (for example, -d120s).
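Instead of reading the NodePort from kubectl get services by eye, you can take it from kubectl get service spring-boot-hpa-service -o json, where each entry in spec.ports carries a nodePort field. The sketch below does that and builds the wrk target URL. The IP address and NodePort in the sample are placeholders; on Minikube, minikube service spring-boot-hpa-service --url is a built-in shortcut that prints the base URL directly.

```python
import json

# Trimmed, illustrative output of:
#   kubectl get service spring-boot-hpa-service -o json
SAMPLE_SERVICE = """
{
  "metadata": {"name": "spring-boot-hpa-service"},
  "spec": {
    "type": "NodePort",
    "ports": [{"protocol": "TCP", "port": 80, "targetPort": 8082, "nodePort": 31234}]
  }
}
"""


def node_port(service_json: str, target_port: int = 8082) -> int:
    """Return the nodePort that forwards to the given container port."""
    service = json.loads(service_json)
    for port in service["spec"]["ports"]:
        if port.get("targetPort") == target_port and "nodePort" in port:
            return port["nodePort"]
    raise ValueError(f"no NodePort found for targetPort {target_port}")


def load_test_url(minikube_ip: str, port: int, path: str = "/cpu") -> str:
    """Build the URL passed to wrk."""
    return f"http://{minikube_ip}:{port}{path}"


# 192.168.49.2 is a placeholder; use the address printed by `minikube ip`
url = load_test_url("192.168.49.2", node_port(SAMPLE_SERVICE))
print(url)  # http://192.168.49.2:31234/cpu
print(f"wrk -t4 -c50 -d30s {url}")
```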

7. Monitor the Scaling

# Check the status of the Horizontal Pod Autoscaler (add -w to keep watching for changes)
kubectl get hpa

# Check the status of the pods
kubectl get pods
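For scripting or logging during a test, the HPA's status is easier to read as JSON. Requesting the autoscaling/v2 form explicitly, for example with kubectl get horizontalpodautoscalers.v2.autoscaling spring-boot-hpa-app -o json, gives status.currentReplicas, status.desiredReplicas, and status.currentMetrics. The Python sketch below pulls out the CPU figures; the sample document is illustrative rather than captured from a real cluster, and cpu_summary is a hypothetical helper.

```python
import json

# Illustrative (not captured) output of:
#   kubectl get horizontalpodautoscalers.v2.autoscaling spring-boot-hpa-app -o json
SAMPLE_HPA = json.loads("""
{
  "spec": {
    "minReplicas": 1,
    "maxReplicas": 5,
    "metrics": [{"type": "Resource",
                 "resource": {"name": "cpu",
                              "target": {"type": "Utilization", "averageUtilization": 50}}}]
  },
  "status": {
    "currentReplicas": 2,
    "desiredReplicas": 3,
    "currentMetrics": [{"type": "Resource",
                        "resource": {"name": "cpu",
                                     "current": {"averageUtilization": 74, "averageValue": "185m"}}}]
  }
}
""")


def _cpu_metric(metrics, field: str) -> dict:
    """Return the 'target' or 'current' block of the CPU Resource metric, if any."""
    for m in metrics or []:
        if m.get("type") == "Resource" and m.get("resource", {}).get("name") == "cpu":
            return m["resource"].get(field, {})
    return {}


def cpu_summary(hpa: dict) -> dict:
    spec, status = hpa.get("spec", {}), hpa.get("status", {})
    return {
        "currentReplicas": status.get("currentReplicas"),
        "desiredReplicas": status.get("desiredReplicas"),
        # None while the Metrics Server has no data yet (kubectl shows <unknown>)
        "cpuUtilization": _cpu_metric(status.get("currentMetrics"), "current").get("averageUtilization"),
        "cpuTarget": _cpu_metric(spec.get("metrics"), "target").get("averageUtilization"),
    }


print(cpu_summary(SAMPLE_HPA))
```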

Results

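[Screenshot: Scaled]

Before you generate any load, or some time after the load stops, you will see this instead:

[Screenshot: hpa first]

Depending on the workload, the number of pods can range from 1 to 5, so you might see 2, 3, or 4 replicas during the test. Once the load stops, the HPA scales back down to 1, as shown above. Scale-down is intentionally slower than scale-up: by default, the HPA waits out a 5-minute stabilization window before removing pods.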
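The delay before scaling back down comes from the HPA's scale-down stabilization window (300 seconds by default): when scaling down, the controller uses the highest replica recommendation it computed during that window. The Python sketch below models only that rule, with a recommendation every 15 seconds (the controller's default sync period) and made-up load that ends after one minute. ScaleDownStabilizer is a hypothetical name, not a Kubernetes API.

```python
from collections import deque


class ScaleDownStabilizer:
    """Hypothetical model of the HPA scale-down stabilization window."""

    def __init__(self, window_seconds: int = 300):
        self.window = window_seconds
        self.history = deque()  # (timestamp, recommended replicas)

    def stabilize(self, now: int, recommendation: int) -> int:
        self.history.append((now, recommendation))
        # Forget recommendations older than the window
        while self.history[0][0] < now - self.window:
            self.history.popleft()
        # Use the highest recent recommendation, so a drop in load only reduces
        # replicas once every recommendation in the window asks for fewer pods.
        # Increases still take effect immediately, since the newest entry counts too.
        return max(r for _, r in self.history)


def simulate(step: int = 15, duration: int = 420) -> list[tuple[int, int]]:
    """Load recommends 3 replicas for the first minute, then only 1."""
    stabilizer = ScaleDownStabilizer()
    timeline = []
    for t in range(0, duration, step):
        recommendation = 3 if t < 60 else 1
        timeline.append((t, stabilizer.stabilize(t, recommendation)))
    return timeline


first_scale_down = next(t for t, replicas in simulate() if replicas == 1)
print(first_scale_down)  # 360
```

In this simulated run the replica count stays at 3 until t = 360 s, the first 15-second tick more than five minutes after the last high recommendation at t = 45 s.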

Conclusion

Horizontal Pod Autoscaling (HPA) is an essential feature for maintaining application performance under varying loads. By automatically scaling the number of pods in a deployment, Kubernetes ensures that your application can handle increased traffic while optimizing resource usage. In this guide, we’ve set up HPA for a Spring Boot application, using CPU utilization as the metric for scaling.

This approach helps ensure that your application remains responsive and cost-efficient, automatically adjusting the number of pods based on traffic demand.

I also want to mention this repository: it is a project where I do hands-on DevOps learning and solve challenges. Here is the link: Devops repo. If you are interested in DevOps and want to contribute your knowledge, fork the repo and open a pull request following the contribution guidelines, and I will review it.
Thank you so much for following along this far 😊

Happy Coding 😊👨🏻‍💻👋
