Deploy Agentic AI Workflows With Kubernetes and Terraform

Containerize a Python-based LLM agent, provision a Kubernetes cluster with Terraform and deploy a production-ready agentic AI workflow with autoscaling and monitoring.

Nov 26th, 2025 9:00am by Oladimeji Sowole

Image by Gerd Altmann from Pixabay.

AI agents are evolving from simple, prompt-based assistants into complex, multiagent systems capable of reasoning, memory retention and collaboration. However, most development teams still face a bottleneck: deployment. Creating a powerful agent in a notebook is one thing; running it reliably in production with scalability, resilience and automation is another.

This is where Kubernetes and Terraform shine. Kubernetes (K8s) provides scalable orchestration for containerized workloads, while Terraform allows you to define and provision your infrastructure using code. Together, they form the foundation for cloud native AI systems that can scale intelligently as workloads grow.

Let’s build and deploy an agentic AI workflow using a Python-based large language model (LLM) agent, containerize it with Docker and deploy it to a Kubernetes cluster provisioned via Terraform. Whether you’re a developer, architect or technical leader, this will show you how to move from prototype to production with confidence.

Architecture Overview

Here’s the high-level design of the system:

Agentic workflow: Build a LangChain-powered Python AI agent that answers data queries.
Docker containerization: Package the agent’s environment for portability.
Terraform infrastructure: Provision cloud resources (VMs, networking and Kubernetes cluster).
Kubernetes deployment: Run the agent workflow as a microservice with autoscaling.
Load balancing and monitoring: Enable external access and observability.

Step 1: Create the Agentic AI Workflow

Begin by creating a Python-based AI agent using LangChain and OpenAI APIs.

Python Script: `agent_app.py`

import os
from langchain_openai import ChatOpenAI
from langchain.agents import initialize_agent, Tool
from langchain.memory import ConversationBufferMemory

# Load and validate API key
openai_api_key = os.environ.get("OPENAI_API_KEY")
if not openai_api_key:
    raise ValueError("OPENAI_API_KEY must be set before running this script.")

# Initialize model
llm = ChatOpenAI(
    model="gpt-4",
    temperature=0,
    openai_api_key=openai_api_key
)

# Memory for context retention (the chat agent expects message objects, not a string)
memory = ConversationBufferMemory(memory_key="chat_history", return_messages=True)

# Simple data retrieval tool
def fetch_data(query: str):
    # Simulated data retrieval
    return f"Data retrieved for query: {query}"

tools = [
    Tool(
        name="DataFetcher",
        func=fetch_data,
        description="Fetches business data for analysis."
    )
]

# Initialize agent
agent = initialize_agent(
    tools,
    llm,
    agent="chat-conversational-react-description",
    memory=memory
)



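# REST API for interaction
from flask import Flask, request, jsonify
app = Flask(__name__)

@app.route("/ask", methods=["POST"])
def ask():
    # Reject requests without a JSON body or a "query" field
    payload = request.get_json(silent=True) or {}
    user_input = payload.get("query")
    if not user_input:
        return jsonify({"error": "Request body must include a 'query' field."}), 400
    response = agent.run(user_input)
    return jsonify({"response": response})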

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8080)

Explanation:

The LangChain agent handles multistep reasoning using GPT-4.
Memory stores the conversation history so later answers can build on earlier turns.
A Flask API exposes the agent’s logic to external users and systems.
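
The script above keeps a single `ConversationBufferMemory` for the whole process, so every caller shares one history. One way to keep conversations apart is to key the history by a session identifier. The sketch below illustrates the idea with plain Python only; `SessionMemoryStore`, `session_id` and `max_turns` are hypothetical names that are not part of LangChain or the original script.

```python
from collections import defaultdict, deque


class SessionMemoryStore:
    """Keeps a bounded chat history per session (illustrative only)."""

    def __init__(self, max_turns: int = 20):
        # Each session gets its own deque that drops the oldest turns first.
        self._histories = defaultdict(lambda: deque(maxlen=max_turns))

    def append(self, session_id: str, question: str, answer: str) -> None:
        self._histories[session_id].append((question, answer))

    def history(self, session_id: str) -> list:
        # Return a copy so callers cannot mutate the stored history.
        return list(self._histories.get(session_id, ()))


store = SessionMemoryStore(max_turns=2)
store.append("alice", "Q1", "A1")
store.append("bob", "Q2", "A2")
store.append("alice", "Q3", "A3")
store.append("alice", "Q4", "A4")  # evicts alice's oldest turn
print(store.history("alice"))
print(store.history("bob"))
```

A store like this still lives inside one process, so with several replicas each pod would see only its own sessions; sharing history across pods would need an external backend.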

Step 2: Containerize With Docker

Next, package this app into a portable container image.

Dockerfile

# Base image
FROM python:3.10-slim

# Set working directory
WORKDIR /app

# Copy files
COPY . .

# Install dependencies (initialize_agent and langchain.memory are legacy APIs, so pin LangChain below 1.0)
RUN pip install --no-cache-dir flask langchain-openai "langchain<1.0" openai

# Expose the Flask port
EXPOSE 8080

# Command to run the app
CMD ["python", "agent_app.py"]

Build and Test the Image

docker build -t agentic-ai-app:latest .
docker run -p 8080:8080 -e OPENAI_API_KEY="$OPENAI_API_KEY" agentic-ai-app:latest

Explanation:

Docker encapsulates all dependencies, making the agent easily deployable in any environment: local, cloud or on premises.

Step 3: Define Infrastructure With Terraform

Next, use Terraform to define the cloud infrastructure, including a managed Kubernetes cluster. This example uses AWS. (Note: You can adapt it for Google Cloud Platform [GCP] or Azure.)

Terraform Configuration: `main.tf`

provider "aws" {
  region = "us-east-1"
}

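# Create a VPC
resource "aws_vpc" "main" {
  cidr_block           = "10.0.0.0/16"
  enable_dns_hostnames = true
  enable_dns_support   = true

  tags = {
    Name = "agentic-ai-vpc"
  }
}

# Internet gateway and route table so the public subnets can reach the internet
resource "aws_internet_gateway" "main" {
  vpc_id = aws_vpc.main.id
}

resource "aws_route_table" "public" {
  vpc_id = aws_vpc.main.id

  route {
    cidr_block = "0.0.0.0/0"
    gateway_id = aws_internet_gateway.main.id
  }
}

# Public Subnet 1 (EKS needs subnets in at least two availability zones)
resource "aws_subnet" "subnet1" {
  vpc_id                  = aws_vpc.main.id
  cidr_block              = "10.0.1.0/24"
  availability_zone       = "us-east-1a"
  map_public_ip_on_launch = true

  tags = {
    Name = "agentic-ai-subnet-1"
  }
}

# Public Subnet 2
resource "aws_subnet" "subnet2" {
  vpc_id                  = aws_vpc.main.id
  cidr_block              = "10.0.2.0/24"
  availability_zone       = "us-east-1b"
  map_public_ip_on_launch = true

  tags = {
    Name = "agentic-ai-subnet-2"
  }
}

resource "aws_route_table_association" "subnet1" {
  subnet_id      = aws_subnet.subnet1.id
  route_table_id = aws_route_table.public.id
}

resource "aws_route_table_association" "subnet2" {
  subnet_id      = aws_subnet.subnet2.id
  route_table_id = aws_route_table.public.id
}

# EKS Cluster (inputs follow the 20.x release of the community module)
module "eks" {
  source  = "terraform-aws-modules/eks/aws"
  version = "~> 20.0"

  cluster_name    = "agentic-ai-cluster"
  cluster_version = "1.29"

  vpc_id     = aws_vpc.main.id
  subnet_ids = [
    aws_subnet.subnet1.id,
    aws_subnet.subnet2.id
  ]

  # Let kubectl reach the API server and give the Terraform caller admin access
  cluster_endpoint_public_access           = true
  enable_cluster_creator_admin_permissions = true

  # Worker nodes for the agent pods (example sizing)
  eks_managed_node_groups = {
    default = {
      instance_types = ["t3.medium"]
      min_size       = 2
      max_size       = 5
      desired_size   = 2
    }
  }

  tags = {
    Environment = "dev"
    Project     = "agentic-ai"
  }
}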

output "cluster_endpoint" {
  value = module.eks.cluster_endpoint
}

Initialize and Apply Terraform

terraform init
terraform apply -auto-approve

Explanation:

Terraform provisions your AWS virtual private cloud (VPC) and deploys an Elastic Kubernetes Service (EKS) cluster. The `output` provides your cluster’s endpoint for connection.
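
After `terraform apply` finishes, `terraform output -json` prints the declared outputs as JSON, which makes the endpoint easy to pick up in a script. The following is a minimal sketch of reading it from Python; the helper names `parse_cluster_endpoint` and `read_cluster_endpoint` are illustrative, and the sample endpoint URL is made up.

```python
import json
import subprocess


def parse_cluster_endpoint(raw_json: str) -> str:
    """Extract the `cluster_endpoint` value from `terraform output -json` text."""
    outputs = json.loads(raw_json)
    # Each output is an object with "value", "type" and "sensitive" keys.
    return outputs["cluster_endpoint"]["value"]


def read_cluster_endpoint(workdir: str = ".") -> str:
    """Run Terraform in `workdir` and return the cluster endpoint."""
    result = subprocess.run(
        ["terraform", "output", "-json"],
        cwd=workdir, capture_output=True, text=True, check=True,
    )
    return parse_cluster_endpoint(result.stdout)


# Parsing a sample payload (made-up endpoint) without calling Terraform:
sample = (
    '{"cluster_endpoint": {"sensitive": false, "type": "string", '
    '"value": "https://EXAMPLE.eks.amazonaws.com"}}'
)
print(parse_cluster_endpoint(sample))
```

Keeping the parsing separate from the subprocess call lets the logic be checked without a Terraform installation.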

Step 4: Deploy the Agent to Kubernetes

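Once your cluster is ready, point kubectl at it with `aws eks update-kubeconfig --region us-east-1 --name agentic-ai-cluster`, create the `openai-secret` secret that the manifest references (for example, `kubectl create secret generic openai-secret --from-literal=api_key=<your-api-key>`) and then deploy the agent.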

Kubernetes Deployment File: `deployment.yaml`

apiVersion: apps/v1
kind: Deployment
metadata:
  name: agentic-ai
spec:
  replicas: 2
  selector:
    matchLabels:
      app: agentic-ai
  template:
    metadata:
      labels:
        app: agentic-ai
    spec:
      containers:
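        - name: agentic-ai
          # Replace with the full URI of an image pushed to a registry the cluster can pull from (e.g., Amazon ECR)
          image: agentic-ai-app:latest
          ports:
            - containerPort: 8080
          # CPU request (example value) so the CPU-based autoscaler in Step 5 has a baseline
          resources:
            requests:
              cpu: 250m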
          env:
            - name: OPENAI_API_KEY
              valueFrom:
                secretKeyRef:
                  name: openai-secret
                  key: api_key
---
apiVersion: v1
kind: Service
metadata:
  name: agentic-ai-service
spec:
  type: LoadBalancer
  selector:
    app: agentic-ai
  ports:
    - port: 80
      targetPort: 8080

Deploy to Cluster

kubectl apply -f deployment.yaml

Explanation:

The deployment ensures high availability with replicas, while the LoadBalancer service exposes your agentic workflow to the internet.

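To test, look up the load balancer address with `kubectl get service agentic-ai-service` and send a request: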

curl -X POST http://<load-balancer-endpoint>/ask -H "Content-Type: application/json" -d '{"query": "Analyse quarterly revenue trends"}'
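
The same request can be sent from Python using only the standard library. This is a sketch, not part of the original app: `build_ask_request` and `ask_agent` are hypothetical helpers, and `base_url` stands for the load balancer address of `agentic-ai-service`.

```python
import json
import urllib.request


def build_ask_request(base_url: str, query: str) -> urllib.request.Request:
    """Build the POST request that the /ask endpoint expects."""
    body = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/ask",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask_agent(base_url: str, query: str, timeout: float = 60.0) -> str:
    """Send the query and return the agent's "response" field."""
    request = build_ask_request(base_url, query)
    with urllib.request.urlopen(request, timeout=timeout) as reply:
        return json.loads(reply.read().decode("utf-8"))["response"]


# Building the request does not touch the network:
req = build_ask_request("http://example.invalid/", "Analyse quarterly revenue trends")
print(req.get_method(), req.full_url)
```

A generous timeout is used because an LLM-backed endpoint can take several seconds to answer.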

Step 5: Add Monitoring and Autoscaling

To make the deployment production-grade, add monitoring and horizontal scaling.

Enable Autoscaling

kubectl autoscale deployment agentic-ai --cpu-percent=70 --min=2 --max=5
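
The Horizontal Pod Autoscaler created by this command sizes the deployment from the ratio of observed to target CPU utilization: the desired replica count is the current count times that ratio, rounded up, then clamped to the configured minimum and maximum. The sketch below reproduces that rule in Python for intuition only; `desired_replicas` is an illustrative helper, and the 10% tolerance is the Kubernetes default, which a cluster may override.

```python
import math


def desired_replicas(current: int, observed_cpu: float, target_cpu: float = 70.0,
                     min_replicas: int = 2, max_replicas: int = 5,
                     tolerance: float = 0.1) -> int:
    """Approximate the HPA rule: ceil(current * observed / target), clamped."""
    ratio = observed_cpu / target_cpu
    # Skip scaling when utilization is within the tolerance band of the target.
    if abs(ratio - 1.0) <= tolerance:
        return current
    wanted = math.ceil(current * ratio)
    return max(min_replicas, min(max_replicas, wanted))


print(desired_replicas(current=2, observed_cpu=140))  # 2 * 140/70 = 4
print(desired_replicas(current=2, observed_cpu=72))   # within tolerance, stays 2
print(desired_replicas(current=4, observed_cpu=210))  # 12 wanted, capped at 5
print(desired_replicas(current=4, observed_cpu=20))   # ceil(1.14) = 2
```

Utilization is measured against each container's CPU request, so the 70% target only has meaning when the pod spec sets `resources.requests.cpu` and a metrics source such as metrics-server is running in the cluster.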

Monitor Logs

kubectl logs -f deployment/agentic-ai

Tip:

For advanced monitoring, integrate Prometheus and Grafana, or use managed AWS CloudWatch dashboards.
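
Logs streamed by `kubectl logs` are easier to search and ship to a monitoring backend when each line is a structured record. Here is a minimal sketch using only the standard library; `JsonFormatter`, `make_logger` and the logger name are illustrative and not part of the original app.

```python
import io
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON line."""

    def format(self, record: logging.LogRecord) -> str:
        return json.dumps({
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        })


def make_logger(stream) -> logging.Logger:
    """Return a logger that writes JSON lines to `stream`."""
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger = logging.getLogger("agentic-ai")
    logger.handlers = [handler]
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger


buffer = io.StringIO()  # stands in for stdout so the output can be inspected
log = make_logger(buffer)
log.info("handled /ask in %d ms", 840)
print(buffer.getvalue().strip())
```

In the container, passing `sys.stdout` instead of the buffer would send the same lines to the pod's log stream.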

Step 6: Continuous Learning Pipeline (Optional Enhancement)

Incorporate continual learning by enabling the agent to store and reuse knowledge from past interactions. For example, you could integrate with Pinecone or LlamaIndex to store embeddings of previous user queries and responses.

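from llama_index.core import Document, VectorStoreIndex

# Start from an empty index; embeddings use LlamaIndex's default model
index = VectorStoreIndex.from_documents([])

# Persist new learning
def learn_from_interaction(question, response):
    doc = Document(text=f"Q: {question}\nA: {response}")
    index.insert(doc)
    index.storage_context.persist(persist_dir="./vector_memory")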

Business and Technical Takeaways

For Developers

This setup allows modular and scalable AI workflows.
Agents can run in multiple containers, handling large-scale user interactions.
Infrastructure changes are version-controlled via Terraform for traceability.

For Tech Leaders and CEOs

Deploying AI agents on Kubernetes ensures high availability, security and cost-efficiency.
Infrastructure as Code (IaC) with Terraform provides reproducibility and governance.
The system can scale horizontally, so an agent that starts small can grow to serve thousands of requests in production.

Shipping Complex, Multiagent Systems

AI innovation doesn’t end at the model level; it’s realized through deployment and scalability. By combining Terraform and Kubernetes, you can transform your intelligent agents into production-ready, cloud native systems that grow and adapt alongside your business needs. This full-stack approach bridges the gap between AI research and reliable software engineering. It empowers organizations to move beyond proof-of-concept experiments and confidently integrate AI into their infrastructure. Whether you’re deploying a customer support assistant, financial analysis agent or R&D copilot, the combination of agentic AI, Kubernetes and Terraform gives you a scalable blueprint for the future of intelligent automation.

Oladimeji Sowole is a member of the Andela Talent Network, a private marketplace for global tech talent. A Data Scientist and Data Analyst with more than 6 years of professional experience building data visualizations with different tools and predictive models...