OpenAI Codex Deep Dive

7 min read · Jun 4, 2025

The AI Coding Assistant That’s Actually Worth Your Time

I’ve been knee-deep in AI coding tools for years now, and I’ll be honest — most of them feel like glorified autocomplete with some agentic behaviors. But OpenAI’s latest Codex update has me genuinely excited in a way I haven’t felt since I first discovered version control.

After spending the last few weeks putting the new Codex through its paces, I want to share what I’ve learned about this tool that’s quietly becoming the most capable AI coding assistant I’ve ever used. This isn’t just another “AI will replace developers” hot take — it’s a practical look at what Codex actually does well, where it falls short, and how you can start using it today.

What Makes the New Codex Different

Let me start with what caught my attention: the new codex-1 model isn't just another large language model trained on GitHub. It's a version of OpenAI's o3 model, fine-tuned specifically on real-world coding tasks and pull requests. The difference is immediately noticeable.

Where previous AI coding tools often felt like they were guessing at what you wanted, Codex demonstrates what I can only describe as software engineering intuition. It doesn’t just write code — it follows your project’s coding standards, writes tests, and even handles the mundane but crucial stuff like properly formatting pull requests.

The numbers back this up: OpenAI reports 75% accuracy on its internal software engineering benchmarks, beating even the general-purpose o3 model. But more importantly, the code it produces feels like something a thoughtful human developer would write, not the overly clever or oddly structured output I’ve come to expect from AI tools.

Two Ways to Work with Codex

OpenAI offers Codex in two flavors, and understanding the difference is crucial for getting the most out of it.

Codex in ChatGPT: The Cloud-Powered Approach

The ChatGPT integration runs in OpenAI’s cloud and operates as what they call a “software engineering agent.” You connect your GitHub repository, assign it a task, and it works on it in an isolated cloud environment while you focus on other things.

This approach shines when you want to:

Offload time-consuming but straightforward tasks
Work on multiple repositories simultaneously
Let the AI handle setup-heavy work (like dependency management)
Collaborate with team members who can also assign tasks

The real game-changer here is the internet access feature they added in June 2025. Your Codex agent can install packages, run tests that hit staging APIs, and handle all those network-dependent tasks that used to require manual intervention.

Codex CLI: For Terminal Devotees

The command-line interface is a different beast entirely. It’s open-source, runs locally, and integrates directly with your existing development workflow. If you’re someone who lives in the terminal (like me), this feels much more natural.

What I love about the CLI approach:

File reads, writes, and command execution all happen on your machine (only prompts and context go to OpenAI)
It works with your local git setup, linters, and test runners
You can pipe it into scripts and CI/CD workflows
It supports multimodal input — I can literally screenshot a UI mockup and tell it to build a webpage

The CLI defaults to using the o4-mini model to keep costs reasonable, but you can configure it to use more powerful models for complex tasks.
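For the scripting and CI/CD case, here is a minimal Python sketch of wrapping the CLI. It assumes the `codex` binary is on your PATH and that the `--approval-mode` and `--quiet` flags behave as described in the CLI README at the time of writing. Flag names can change between releases, so check `codex --help` before relying on it. The helper names `build_codex_command` and `run_codex` are my own, not part of Codex.

```python
import shutil
import subprocess

# Approval modes as listed in the CLI README at the time of writing.
# Assumption: verify against `codex --help` for your installed version.
APPROVAL_MODES = {"suggest", "auto-edit", "full-auto"}


def build_codex_command(prompt, approval_mode="suggest", quiet=True):
    """Return the argv list for a non-interactive Codex CLI call."""
    if approval_mode not in APPROVAL_MODES:
        raise ValueError(f"unknown approval mode: {approval_mode!r}")
    if not prompt.strip():
        raise ValueError("prompt must not be empty")
    command = ["codex", "--approval-mode", approval_mode]
    if quiet:
        command.append("--quiet")
    command.append(prompt)
    return command


def run_codex(prompt, **options):
    """Run Codex and return its stdout; raises if the CLI is missing or fails."""
    if shutil.which("codex") is None:
        raise RuntimeError("codex CLI not found on PATH")
    result = subprocess.run(
        build_codex_command(prompt, **options),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

Keeping the command construction separate from the subprocess call means the argument list can be checked in a unit test without an API key or a network connection.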

The Secret Sauce: AGENTS.md Files

Here’s where Codex gets really interesting. Most AI coding tools are like hiring a brilliant intern who doesn’t know anything about your codebase or team conventions. Codex solves this with AGENTS.md files, which are essentially instruction manuals for the AI.

These files let you specify:

How to run your tests (npm test, pytest tests/, etc.)
Your coding standards (“Use Black for Python formatting”)
PR conventions (“Title format: [Fix] Short description”)
Project-specific guidelines (“Use functional components only”)

I’ve found that spending 30 minutes creating a comprehensive AGENTS.md file dramatically improves the quality of Codex's output. It's like the difference between giving directions to someone who knows your neighborhood versus someone who's never been there before.

Here’s a simple example from one of my Python projects:

markdown

# Tests
run: pytest tests/

# Code Style
Use Black for Python formatting.
Avoid abbreviations in variable names.

# Testing
Run pytest tests/ before finalizing a PR.
All commits must pass lint checks via flake8.

# PR Instructions
Title format: [Fix] Short description
Include a one-line summary and a "Testing Done" section.
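If you maintain several repositories, it can help to generate this file from one shared set of conventions instead of copying it by hand. The sketch below is one way to do that in Python. The `render_agents_md` helper and the `sections` dictionary are hypothetical names of mine, and the one-heading-per-section layout simply mirrors the example above rather than any format Codex requires.

```python
def render_agents_md(sections):
    """Render {heading: [instruction lines]} as Markdown text."""
    blocks = []
    for heading, lines in sections.items():
        body = "\n".join(lines)
        blocks.append(f"# {heading}\n{body}")
    # Blank line between sections, single trailing newline at the end.
    return "\n\n".join(blocks) + "\n"


# Illustrative conventions, taken from the example file above.
sections = {
    "Code Style": [
        "Use Black for Python formatting.",
        "Avoid abbreviations in variable names.",
    ],
    "Testing": [
        "Run pytest tests/ before finalizing a PR.",
        "All commits must pass lint checks via flake8.",
    ],
    "PR Instructions": [
        "Title format: [Fix] Short description",
        'Include a one-line summary and a "Testing Done" section.',
    ],
}

agents_md = render_agents_md(sections)
```

Writing the result out is then a one-liner such as `pathlib.Path("AGENTS.md").write_text(agents_md)`.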

Real-World Use Cases That Actually Work

After weeks of testing, here are the scenarios where Codex genuinely saves me time:

Bug Fixes with Context: I can paste a stack trace and say “fix this error in the authentication module,” and Codex will trace through the codebase, identify the issue, and propose a fix. It’s not magic, but it’s remarkably good at connecting the dots.

Test Generation: This is where Codex really shines. Give it a function, and it’ll write comprehensive test cases, including edge cases I might have missed. It understands testing patterns and frameworks well enough to write tests that actually add value.
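To make "edge cases" concrete, here is a hand-written illustration of the kind of test file I mean. It is not actual Codex output, and the `chunk` function is a hypothetical example. The tests use plain `assert` statements in `test_*` functions, so pytest would pick them up without any extra imports.

```python
def chunk(items, size):
    """Split a list into consecutive pieces of at most `size` items."""
    if size <= 0:
        raise ValueError("size must be positive")
    return [items[i:i + size] for i in range(0, len(items), size)]


def test_even_split():
    assert chunk([1, 2, 3, 4], 2) == [[1, 2], [3, 4]]


def test_last_chunk_is_shorter():
    assert chunk([1, 2, 3], 2) == [[1, 2], [3]]


def test_empty_input():
    assert chunk([], 3) == []


def test_size_larger_than_input():
    assert chunk([1], 5) == [[1]]


def test_non_positive_size_is_rejected():
    for bad_size in (0, -1):
        try:
            chunk([1, 2], bad_size)
        except ValueError:
            pass
        else:
            raise AssertionError("expected ValueError")
```

The happy-path test is the one I would have written anyway. The empty list, the oversized chunk, and the rejected sizes are the cases that tend to get skipped.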

Scaffolding and Boilerplate: Need to set up a new API endpoint with proper error handling, validation, and documentation? Codex can generate a solid starting point that follows your project’s patterns.

Code Explanation: I’ve started using it as a “rubber duck” for understanding complex legacy code. It’s surprisingly good at explaining what a gnarly piece of code does and why it might have been written that way.

Multimodal Development: This one surprised me — with the CLI, I can screenshot a design mockup and ask Codex to build the HTML/CSS. It’s not perfect, but it’s a solid starting point that saves hours of initial markup work.

The Security Picture

One concern I had initially was security. Codex handles this better than I expected, with different approaches for each interface:

The ChatGPT version runs everything in ephemeral, isolated cloud containers. Each task gets its own sandbox, and internet access is carefully controlled with domain allowlists and admin oversight.
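A domain allowlist is a simple idea, and a short sketch shows the basic check. This is my own illustration of the concept, not OpenAI's implementation. The `is_allowed` helper and the example domains are assumptions chosen for the demo.

```python
from urllib.parse import urlsplit


def is_allowed(url, allowlist):
    """Return True if the URL's host is an allowed domain or a subdomain of one."""
    host = (urlsplit(url).hostname or "").lower()
    return any(
        host == domain or host.endswith("." + domain)
        for domain in allowlist
    )


# Illustrative allowlist for a Python project that only needs package installs.
ALLOWLIST = {"pypi.org", "pythonhosted.org"}

print(is_allowed("https://pypi.org/simple/requests/", ALLOWLIST))
print(is_allowed("https://files.pythonhosted.org/packages/x.whl", ALLOWLIST))
print(is_allowed("https://evilpypi.org/simple/requests/", ALLOWLIST))
```

Matching on the exact host or on a leading-dot suffix is what keeps a lookalike such as `evilpypi.org` from slipping through, which a plain substring check would allow.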

The CLI is even more security-conscious — it sandboxes operations on macOS using Apple’s Seatbelt technology and recommends Docker containers on Linux. In “Full Auto” mode, it completely blocks network access except to the OpenAI API.

For data privacy, OpenAI doesn’t train on Team/Enterprise users’ code, and other users can opt out of training. The CLI keeps your source code local, only sending prompts and high-level context to the API.

What It Costs and Who Can Use It

The pricing model is straightforward: Codex comes bundled with ChatGPT subscriptions rather than being a separate product.

ChatGPT Plus ($20/month): Gets you full Codex access, including the new internet features
Pro/Team/Enterprise: Higher limits and admin controls
CLI: Free to download, but requires an OpenAI API key (costs depend on model usage)

The decision to include Codex in the Plus tier was smart — it makes advanced AI coding assistance accessible to individual developers and small teams without requiring enterprise budgets.

Honest Assessment: Where It Falls Short

I don’t want to oversell this. Codex isn’t perfect, and there are scenarios where it struggles:

Complex Architecture Decisions: It’s great at implementing features within existing patterns but less helpful when you need to make high-level architectural choices.

Domain-Specific Knowledge: If you’re working with specialized libraries or uncommon frameworks, Codex might not have enough training data to be truly helpful.

Debugging Complex Issues: While it’s good at fixing obvious bugs, it can struggle with subtle race conditions, performance issues, or problems that require deep system knowledge.

Creative Problem Solving: When you need to think outside the box or come up with novel solutions, human creativity still wins.

Getting Started: My Recommendations

If you’re intrigued enough to try Codex, here’s how I’d recommend starting:

Begin with the ChatGPT version if you’re new to AI coding tools. The UI is more forgiving, and you can experiment without worrying about local setup.
Create an AGENTS.md file early. Even a basic one will dramatically improve your experience.
Start with low-stakes tasks. Try generating tests for existing functions or asking it to explain confusing code before having it write new features.
Use the CLI if you’re terminal-comfortable. The local integration feels more natural for many development workflows.
Set up proper reviews. Codex generates diffs for everything, so establish a workflow for reviewing AI-generated code just like you would for any team member.
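For the review step, I find it useful to see at a glance how much a change touches before reading it line by line. Here is a minimal sketch that summarizes a unified diff per file. The `summarize_diff` helper is a hypothetical name of mine, and the parser is deliberately naive: it assumes standard `git diff` output and does not handle renames, binary files, or content lines that happen to start with `--- ` or `+++ `.

```python
from collections import defaultdict


def summarize_diff(diff_text):
    """Count added and removed lines per file in a unified diff."""
    stats = defaultdict(lambda: {"added": 0, "removed": 0})
    current = None
    for line in diff_text.splitlines():
        if line.startswith("+++ "):
            # New-file header, e.g. "+++ b/app.py"; strip git's "b/" prefix.
            path = line[4:].strip()
            current = path[2:] if path.startswith("b/") else path
        elif line.startswith("--- "):
            continue  # old-file header, not a removed line
        elif current is None:
            continue  # preamble before the first file header
        elif line.startswith("+"):
            stats[current]["added"] += 1
        elif line.startswith("-"):
            stats[current]["removed"] += 1
    return dict(stats)


example_diff = """diff --git a/app.py b/app.py
--- a/app.py
+++ b/app.py
@@ -1,3 +1,4 @@
 import os
-print("hi")
+print("hello")
+print("bye")
"""

for path, counts in summarize_diff(example_diff).items():
    print(f"{path}: +{counts['added']} -{counts['removed']}")
```

A summary like this is no substitute for reading the diff, but it is a quick way to flag a task that changed far more than you asked for.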

The Bigger Picture

What excites me most about Codex isn’t just its current capabilities — it’s what it represents for the future of software development. We’re moving toward a world where AI handles more of the mechanical aspects of coding, freeing developers to focus on architecture, user experience, and creative problem-solving.

This shift requires us to evolve our skills. Code review becomes more important. The ability to clearly communicate requirements and constraints becomes crucial. Understanding how to work effectively with AI systems becomes a core competency.

Codex feels like the first AI coding tool that’s genuinely ready for production use by experienced developers. It’s not trying to replace us — it’s trying to make us more effective. And based on my experience so far, it’s succeeding.

What’s Next?

OpenAI is actively iterating on Codex based on community feedback. The June 2025 update that added internet access and voice dictation shows they’re listening to what developers actually need.

I’m particularly excited about the potential for more sophisticated project understanding and the possibility of Codex learning from team-specific patterns over time. The foundation they’ve built with AGENTS.md files and sandboxed execution seems designed to support much more advanced capabilities.

If you’re a developer who’s been skeptical about AI coding tools, I’d encourage you to give the new Codex a serious try. It might just change how you think about the role of AI in software development.

Have you tried the new Codex? I’d love to hear about your experiences, especially if you’ve found use cases I haven’t explored yet. The AI coding landscape is evolving rapidly, and the best insights come from developers actually using these tools in the wild.