Kubefeeds Team: A dedicated and highly skilled team at Kubefeeds, driven by a passion for Kubernetes and Cloud-Native technologies, delivering innovative solutions with expertise and enthusiasm.

Running Ollama LLM on Kubernetes Locally: A Guide for Your Laptop


Ollama makes it easy to run large language models locally, and combining it with Kubernetes can give you a flexible, containerized environment for AI development. In this guide, I’ll walk you through setting up Ollama on a local Kubernetes cluster that runs right on your laptop.

Prerequisites

  • A laptop with at least 16GB RAM (more is better for larger models)
  • Docker Desktop installed
  • Basic familiarity with Kubernetes concepts
  • kubectl command-line tool (a quick way to verify these tools is shown right after this list)
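
If you want to confirm the tooling is in place before starting, these standard version checks (not part of the original steps) will do it:

# Verify the local tooling
docker --version
kubectl version --client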

Step 1: Set Up a Local Kubernetes Cluster

First, let’s check whether a Kubernetes cluster is already running on this machine. If not, we need to set one up; for a local laptop environment I recommend minikube or kind, and this guide uses minikube. We’ll also create a dedicated namespace for the Ollama deployment.
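
A quick way to check, assuming kubectl is already installed (these are standard kubectl commands, not specific to this guide):

# Show configured contexts and confirm whether a cluster is reachable
kubectl config get-contexts
kubectl cluster-info

If no cluster is running, start minikube and create the namespace: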

# Start minikube (if not already running)
minikube start --driver=docker --cpus=4 --memory=8g

# Create a namespace for our Ollama deployment
kubectl create namespace ollama
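
Once minikube reports the cluster is up, a couple of standard sanity checks (my addition, not from the original walkthrough) confirm the node is Ready and the namespace exists:

# Confirm the node is Ready and the namespace was created
kubectl get nodes
kubectl get namespace ollama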

Step 2: Create a Deployment for Ollama

Let’s create a manifest for Ollama that defines the Deployment, a PersistentVolumeClaim for model storage, and a Service. We’ll call it ollama-deployment.yaml:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: ollama
  namespace: ollama
spec:
  replicas: 1
  selector:
    matchLabels:
      app: ollama
  template:
    metadata:
      labels:
        app: ollama
    spec:
      containers:
      - name: ollama
        image: ollama/ollama:latest
        ports:
        - containerPort: 11434
        resources:
          requests:
            memory: "4Gi"
            cpu: "2"
          limits:
            memory: "8Gi"
            cpu: "4"
        volumeMounts:
        - name: ollama-data
          mountPath: /root/.ollama
      volumes:
      - name: ollama-data
        persistentVolumeClaim:
          claimName: ollama-pvc
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: ollama-pvc
  namespace: ollama
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
---
apiVersion: v1
kind: Service
metadata:
  name: ollama-service
  namespace: ollama
spec:
  selector:
    app: ollama
  ports:
  - port: 11434
    targetPort: 11434
  type: LoadBalancer

Apply this configuration with:

kubectl apply -f ollama-deployment.yaml

Step 3: Wait for the Deployment to Complete

Let’s check the status of our deployment:

kubectl -n ollama get pods
kubectl -n ollama get services

Wait until the pod status is “Running”. Note that with a LoadBalancer Service on minikube, the external IP will show as “pending” unless you run minikube tunnel in a separate terminal; the minikube service command used in the next step works without it.
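
If you’d rather block until the rollout finishes than poll manually, this standard kubectl command works:

# Wait for the Deployment to become available
kubectl -n ollama rollout status deployment/ollama --timeout=5m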

Step 4: Pull a Language Model

Now that Ollama is running in your Kubernetes cluster, you can pull a language model. For a laptop environment, I recommend starting with a smaller model like Llama 2 7B:

# Get the service IP address
export OLLAMA_HOST=$(minikube service -n ollama ollama-service --url)

# Pull the model
curl -X POST $OLLAMA_HOST/api/pull -d '{"name": "llama2:7b"}'

This will download and prepare the model, which may take a few minutes depending on your internet connection and laptop specs.
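
To confirm the download succeeded, Ollama’s list endpoint shows the models stored in the persistent volume:

# List locally available models
curl $OLLAMA_HOST/api/tags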

Step 5: Test Your Deployment

Let’s test that everything is working correctly:

# Generate a simple response
curl -X POST $OLLAMA_HOST/api/generate -d '{
  "model": "llama2:7b",
  "prompt": "Write a haiku about Kubernetes",
  "stream": false
}'
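
If you prefer the Ollama CLI over the HTTP API, you can also exec into the running pod; this is an alternative I’m adding for convenience, not part of the original steps:

# Run a prompt through the Ollama CLI inside the pod
kubectl -n ollama exec -it deploy/ollama -- ollama run llama2:7b "Write a haiku about Kubernetes"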

Port Forwarding for Easy Access

To make it easier to access your Ollama instance, you can set up port forwarding:

kubectl -n ollama port-forward svc/ollama-service 11434:11434

This will make Ollama available at http://localhost:11434.
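
With the port-forward running, requests can go straight to localhost, for example:

# Query Ollama through the forwarded port
curl http://localhost:11434/api/generate -d '{
  "model": "llama2:7b",
  "prompt": "Explain Kubernetes pods in one sentence",
  "stream": false
}'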

Advanced Configuration: Resource Management

For laptop environments, you’ll want to be careful with resource allocation. Modify the resources section in your deployment YAML to match your laptop’s capabilities:

resources:
  requests:
    memory: "2Gi"  # Lower for less powerful laptops
    cpu: "1"
  limits:
    memory: "6Gi"  # Adjust based on your available RAM
    cpu: "2"       # Adjust based on your CPU

Cleaning Up

When you’re done working with Ollama, you can clean up your resources:

kubectl delete namespace ollama
# Or stop the whole minikube cluster; use "minikube delete" to remove it entirely
minikube stop

Conclusion

Running Ollama on Kubernetes locally gives you a flexible environment for AI development that’s portable and reproducible. This setup allows you to experiment with different models and configurations while keeping everything neatly containerized.

By following this guide, you’ve set up a local AI environment that can run various language models right on your laptop. This approach is perfect for development, testing, and learning about both Kubernetes and large language models.

Remember that the performance will depend on your laptop’s specifications, so start with smaller models and adjust resource allocations accordingly.

Happy coding!
