Deploy Agentic AI Workflows With Kubernetes and Terraform
Containerize a Python-based LLM agent, provision a Kubernetes cluster with Terraform and deploy a production-ready agentic AI workflow with autoscaling and monitoring.
AI agents are evolving from simple, prompt-based assistants into complex, multiagent systems capable of reasoning, memory retention and collaboration. However, most development teams still face a bottleneck: deployment. Creating a powerful agent in a notebook is one thing; running it reliably in production with scalability, resilience and automation is another.
This is where Kubernetes and Terraform shine. Kubernetes (K8s) provides scalable orchestration for containerized workloads, while Terraform allows you to define and provision your infrastructure using code. Together, they form the foundation for cloud native AI systems that can scale intelligently as workloads grow.
Let’s build an agentic AI workflow around a Python-based large language model (LLM) agent, containerize it with Docker and deploy it to a Kubernetes cluster provisioned via Terraform. Whether you’re a developer, architect or technical leader, this will show you how to move from prototype to production with confidence.
Architecture Overview
Here’s the high-level design of the system:
Agentic workflow: Introduce a LangChain-powered Python AI agent that responds intelligently to data queries.
Docker containerization: Package the agent’s environment for portability.
Terraform infrastructure: Provision cloud resources (VMs, networking and Kubernetes cluster).
Kubernetes deployment: Run the agent workflow as a microservice with autoscaling.
Load balancing and monitoring: Enable external access and observability.
Step 1: Create the Agentic AI Workflow
Begin by creating a Python-based AI agent using LangChain and OpenAI APIs.
Python Script: `agent_app.py`
import os

from flask import Flask, request, jsonify
from langchain_openai import ChatOpenAI
from langchain.agents import initialize_agent, Tool
from langchain.memory import ConversationBufferMemory

# Load and validate API key
openai_api_key = os.environ.get("OPENAI_API_KEY")
if not openai_api_key:
    raise ValueError("OPENAI_API_KEY must be set before running this script.")

# Initialize model
llm = ChatOpenAI(
    model="gpt-4",
    temperature=0,
    openai_api_key=openai_api_key
)

# Memory for context retention; the chat-conversational agent expects
# the history as message objects, hence return_messages=True
memory = ConversationBufferMemory(memory_key="chat_history", return_messages=True)

# Simple data retrieval tool
def fetch_data(query: str) -> str:
    # Simulated data retrieval
    return f"Data retrieved for query: {query}"

tools = [
    Tool(
        name="DataFetcher",
        func=fetch_data,
        description="Fetches business data for analysis."
    )
]

# Initialize agent
agent = initialize_agent(
    tools,
    llm,
    agent="chat-conversational-react-description",
    memory=memory
)

# REST API for interaction
app = Flask(__name__)

@app.route("/ask", methods=["POST"])
def ask():
    # Reject requests without a JSON body or a "query" field
    payload = request.get_json(silent=True) or {}
    user_input = payload.get("query")
    if not user_input:
        return jsonify({"error": "Missing 'query' in JSON body."}), 400
    response = agent.run(user_input)
    return jsonify({"response": response})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8080)
Explanation:
The LangChain agent handles multistep reasoning using GPT-4.
Memory stores conversation context for adaptive responses.
A Flask API exposes the agent’s logic to external users and systems.
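To exercise the endpoint once the app is listening on port 8080, a small client can post a JSON body to `/ask`. The following is a minimal sketch using only the standard library; the base URL `http://localhost:8080` and the helper names `build_ask_request` and `ask_agent` are assumptions for illustration and are not part of the original app.

```python
import json
import urllib.request


def build_ask_request(query: str, base_url: str = "http://localhost:8080") -> urllib.request.Request:
    """Build a POST request for the /ask endpoint defined in agent_app.py."""
    body = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/ask",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask_agent(query: str, base_url: str = "http://localhost:8080", timeout: float = 60.0) -> str:
    """Send the query to a running agent and return its "response" field."""
    request = build_ask_request(query, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as reply:
        payload = json.loads(reply.read().decode("utf-8"))
    return payload["response"]
```

Calling `ask_agent("Summarize last month's sales data")` against a running instance should return the agent's text reply. The generous timeout is a guess that allows for slow LLM calls.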
Step 2: Containerize With Docker
Next, package this app into a portable container image.
Dockerfile
# Base image
FROM python:3.10-slim
# Set working directory
WORKDIR /app
# Copy files
COPY . .
# Install dependencies
RUN pip install --no-cache-dir flask langchain-openai langchain openai
# Expose the Flask port
EXPOSE 8080
# Command to run the app
CMD ["python", "agent_app.py"]
Build and Test the Image
docker build -t agentic-ai-app:latest .
docker run -p 8080:8080 agentic-ai-app
Explanation:
Docker encapsulates all dependencies, making the agent easily deployable in any environment: local, cloud or on premises.
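After `docker run`, the container may need a few seconds before Flask accepts connections. A short readiness check can confirm the port is open before you send test requests. This is a sketch under the assumption that the service is published on a local TCP port; `wait_for_port` is a hypothetical helper name.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 30.0, interval: float = 0.5) -> bool:
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect means something is listening on the port
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Not ready yet (refused or timed out); pause and retry
            time.sleep(interval)
    return False
```

For the container above, `wait_for_port("localhost", 8080)` would be the natural call. An open port only shows that the server process is up, not that the agent can reach the OpenAI API.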
Step 3: Define Infrastructure With Terraform
Define the cloud infrastructure, including a managed Kubernetes cluster, with Terraform. Here’s AWS as an example. (Note: You can adapt it for Google Cloud Platform [GCP] or Azure.)
Terraform provisions your AWS virtual private cloud (VPC) and deploys an Elastic Kubernetes Service (EKS) cluster. The `output` provides your cluster’s endpoint for connection.
Step 4: Deploy the Agent to Kubernetes
Once your cluster is ready, it’s time to configure kubectl and deploy the agent.
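One way to describe the workload is to generate the Kubernetes manifests from Python and hand them to `kubectl`, which accepts JSON as well as YAML. The sketch below builds a Deployment, a LoadBalancer Service and a CPU-based HorizontalPodAutoscaler for the agent. The resource name `agentic-ai`, the secret `openai-secret`, the replica bounds, the CPU request and the 70% utilization target are all assumptions chosen for illustration, and the image must already be pushed to a registry your cluster can pull from.

```python
import json


def build_manifests(image: str, name: str = "agentic-ai",
                    secret_name: str = "openai-secret",
                    min_replicas: int = 2, max_replicas: int = 10) -> list[dict]:
    """Return Deployment, Service and HPA manifests for the agent container."""
    labels = {"app": name}
    container = {
        "name": name,
        "image": image,
        "ports": [{"containerPort": 8080}],  # port used by agent_app.py
        # CPU request is needed for utilization-based autoscaling (value assumed)
        "resources": {"requests": {"cpu": "250m"}},
        # Read the API key from a Kubernetes Secret (hypothetical secret name)
        "env": [{
            "name": "OPENAI_API_KEY",
            "valueFrom": {"secretKeyRef": {"name": secret_name, "key": "OPENAI_API_KEY"}},
        }],
    }
    deployment = {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name},
        "spec": {
            "replicas": min_replicas,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {"containers": [container]},
            },
        },
    }
    service = {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": name},
        "spec": {
            "type": "LoadBalancer",  # external access through a cloud load balancer
            "selector": labels,
            "ports": [{"port": 80, "targetPort": 8080}],
        },
    }
    autoscaler = {
        "apiVersion": "autoscaling/v2",
        "kind": "HorizontalPodAutoscaler",
        "metadata": {"name": name},
        "spec": {
            "scaleTargetRef": {"apiVersion": "apps/v1", "kind": "Deployment", "name": name},
            "minReplicas": min_replicas,
            "maxReplicas": max_replicas,
            "metrics": [{
                "type": "Resource",
                "resource": {"name": "cpu",
                             "target": {"type": "Utilization", "averageUtilization": 70}},
            }],
        },
    }
    return [deployment, service, autoscaler]


def to_kubectl_json(manifests: list[dict]) -> str:
    """Wrap the manifests in a v1 List so a single file can be applied."""
    return json.dumps({"apiVersion": "v1", "kind": "List", "items": manifests}, indent=2)
```

Writing the output of `to_kubectl_json(build_manifests("<your-registry>/agentic-ai-app:latest"))` to a file such as `manifests.json` and running `kubectl apply -f manifests.json` would create all three resources, provided the secret exists in the target namespace.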
Incorporate continual learning by enabling the agent to store and reuse knowledge from past interactions. For example, you could integrate with Pinecone or LlamaIndex to store embeddings of previous user queries and responses.
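# llama-index 0.10 or later; older releases import from `llama_index` instead
from llama_index.core import VectorStoreIndex, Document

# Start from an empty index (uses OpenAI embeddings by default)
index = VectorStoreIndex.from_documents([])

# Persist new learning
def learn_from_interaction(question, response):
    doc = Document(text=f"Q: {question}\nA: {response}")
    index.insert(doc)
    # Write the index to disk so it survives restarts
    index.storage_context.persist(persist_dir="./vector_memory")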
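To show the store-and-reuse loop without any external service, the sketch below swaps the vector index for a deliberately simple in-memory store that ranks past interactions by word overlap (Jaccard similarity). This is a stand-in technique, not how LlamaIndex or Pinecone rank results, and `InteractionMemory`, `learn` and `recall` are hypothetical names.

```python
import re


def _tokens(text: str) -> set[str]:
    """Lowercase the text and split it into a set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


class InteractionMemory:
    """Toy memory: stores Q/A pairs and recalls them by word overlap."""

    def __init__(self) -> None:
        self._records: list[tuple[str, str]] = []

    def learn(self, question: str, response: str) -> None:
        """Remember one interaction."""
        self._records.append((question, response))

    def recall(self, query: str, top_k: int = 1) -> list[tuple[str, str]]:
        """Return up to top_k past interactions whose question overlaps the query."""
        query_tokens = _tokens(query)
        scored = []
        for question, response in self._records:
            question_tokens = _tokens(question)
            union = query_tokens | question_tokens
            # Jaccard similarity: shared words divided by all distinct words
            score = len(query_tokens & question_tokens) / len(union) if union else 0.0
            if score > 0:
                scored.append((score, question, response))
        scored.sort(key=lambda item: item[0], reverse=True)
        return [(question, response) for _, question, response in scored[:top_k]]
```

In an agent, the recalled pairs could be prepended to the prompt before the LLM is called. A real deployment would replace the overlap score with embedding similarity from a vector store.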
Business and Technical Takeaways
For Developers
This setup allows modular and scalable AI workflows.
Agents can run in multiple containers, handling large-scale user interactions.
Infrastructure changes are version-controlled via Terraform for traceability.
For Tech Leaders and CEOs
Deploying AI agents on Kubernetes helps teams achieve high availability, security and cost efficiency.
Infrastructure as Code (IaC) with Terraform provides reproducibility and governance.
The system can scale seamlessly — an agent that starts small can serve thousands of requests in production.
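The scaling behavior behind that last point can be made concrete. The Kubernetes Horizontal Pod Autoscaler documents its core rule as desired replicas = ceil(current replicas × current metric / target metric), skipping changes when the ratio is within a tolerance of 1.0 (0.1 by default). The sketch below reproduces only that arithmetic; the function name and the default bounds are assumptions, and the real controller also accounts for pod readiness, missing metrics and stabilization windows.

```python
import math


def desired_replicas(current_replicas: int, current_metric: float, target_metric: float,
                     min_replicas: int = 1, max_replicas: int = 10,
                     tolerance: float = 0.1) -> int:
    """Approximate the HPA replica calculation for a single metric."""
    if target_metric <= 0:
        raise ValueError("target_metric must be positive")
    ratio = current_metric / target_metric
    # Within tolerance of the target: leave the replica count alone
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    desired = math.ceil(current_replicas * ratio)
    # Clamp to the configured bounds
    return max(min_replicas, min(max_replicas, desired))
```

For example, two replicas averaging 90% CPU against a 60% target give `desired_replicas(2, 90, 60)`, which evaluates to 3.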
Shipping Complex, Multiagent Systems
AI innovation doesn’t end at the model level — it’s realized through deployment and scalability. By combining Terraform and Kubernetes, you can transform your intelligent agents into production-ready, cloud native systems that grow and adapt alongside your business needs.
This full-stack approach bridges the gap between AI research and reliable software engineering. It empowers organizations to move beyond proof-of-concept experiments and confidently integrate AI into their infrastructure.
Whether you’re deploying a customer support assistant, financial analysis agent or R&D copilot, the combination of agentic AI, Kubernetes and Terraform gives you a scalable blueprint for the future of intelligent automation.
Andela provides the world’s largest private marketplace for global remote tech talent driven by an AI-powered platform to manage the complete contract hiring lifecycle. Andela helps companies scale teams & deliver projects faster via specialized areas: App Engineering, AI, Cloud, Data & Analytics.