Expectations for Agentic Coding Tools: Testing Gemini CLI
What are the 'quality-of-life' expectations for agentic applications? To learn more, we test Gemini CLI, Google's open source AI terminal app.
Jul 7th, 2025 5:28am by
- The model in use;
- The project directory;
- Any other pertinent permission or account information, including whether a working file is being watched.
Starting up Gemini
As with all cloud-based LLMs, we must show our fealty before we get access to the precious tokens. Go to Google AI Studio to generate a key. Currently you are given 100 requests a day (check the other tier limits here). We can install Gemini CLI via npm at the terminal:
npm install -g @google/gemini-cli
Next, set your API key as an environment variable — I’m doing it here in the command line on my MacBook:
Then type the command gemini and we are off:
As I mentioned in the quality-of-life section above, the start screen does the important things: it shows the active model (Gemini 2.5 Pro in this case) and reflects the project directory.
The theme selection screen disappears as soon as you press return, but I assume you can bring it back. It takes up quite a lot of space on the introduction screen.
As with Claude Code, there is a Markdown file (GEMINI.md in this case) for request customization. I won’t use it in this post.
What does “no sandbox” mean? The bad news is that Gemini starts off with no restrictions as to where your AI may roam. I’m afraid that isn’t very sensible, but Gemini gives you fairly straightforward options. The good news is that we can use macOS Seatbelt, which starts off with a sensible policy of restricting access to within the project directory.
So I’ll exit this session (type /quit) and we can restart with this basic security.
The quit screen provides some of the stats I referred to earlier:
We can use Seatbelt by just setting an environment variable in this session, then adding a flag:
Now we are good to go, as we have our seatbelt on.
As I did with Codex in a recent post, let’s try out the merge of two JSON files. As before, I’m looking for how the structure supports me, as much as the outcome. If you don’t want to read the previous post, imagine I have a city website that uses JSON data. I have a JSON file called original_cities.json:
{
  "cities": [
    {
      "id": "London",
      "text": "London is the capital of the UK",
      "image": "BigBen"
    },
    {
      "id": "Berlin",
      "text": "Great night club scene",
      "image": "Brandonburg Gate",
      "imageintended": "Reichstag"
    },
    {
      "id": "Paris",
      "text": "Held the Olympics of 2024",
      "image": "EifelTower",
    }
  ]
}
And a second file, updated_cities.json, with the changes I want to bring in:
{
  "cities": [
    {
      "id": "London",
      "text": "London is the capital and largest city in Great Britain",
      "image": "BigBen"
    },
    {
      "id": "Berlin",
      "text": "Great night club scene but a small population",
      "image": "BrandenburgGate",
      "imageintended": "Reichstag"
    },
    {
      "id": "Paris",
      "text": "Held the Olympics of 2024",
      "image": "NotreDame"
    },
    {
      "id": "Rome",
      "text": "The Eternal City",
      "image": "TheColleseum"
    }
  ]
}
I’ll use the same request I gave to Codex:
“please update the JSON file original_cities.json with the contents of the file updated_cities.json but if the ‘image’ field is different, please update or write a new ‘imageintended’ field with the new value instead”
So let’s see what it does. This task may look specific, but is actually a bit vague, which reflects a request from the average human.
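To make the vagueness concrete, here is a minimal Python sketch of one possible reading of that request. It is not what Gemini ran internally; the function name merge_cities and the rule that every non-image field is simply copied across are assumptions made for illustration. Under this reading an existing "image" value is never overwritten, a differing image is written to "imageintended" instead, and cities with an unknown id are appended.

```python
import copy


def merge_cities(original, updated):
    """Merge `updated` into a copy of `original` (one assumed reading of the request).

    - Non-image fields are overwritten with the updated values.
    - An existing "image" is never changed; a differing image from the
      updated file is stored under "imageintended" instead.
    - Cities whose id is not in the original are appended unchanged.
    """
    merged = copy.deepcopy(original)
    by_id = {city["id"]: city for city in merged["cities"]}

    for new_city in updated["cities"]:
        old_city = by_id.get(new_city["id"])
        if old_city is None:
            # Unknown id: add the whole entry as-is.
            merged["cities"].append(copy.deepcopy(new_city))
            continue

        # Copy every field except "image" from the updated entry.
        for key, value in new_city.items():
            if key != "image":
                old_city[key] = value

        # A differing image goes into "imageintended", replacing any old value there.
        if "image" in new_city and new_city["image"] != old_city.get("image"):
            old_city["imageintended"] = new_city["image"]

    return merged
```

With the two files above, this reading leaves Paris with "image": "EifelTower" and a new "imageintended": "NotreDame", and appends Rome. It also replaces Berlin's existing "imageintended" with "BrandenburgGate", which is exactly the kind of choice the request leaves open: another reasonable reading would keep "Reichstag".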
After getting confused about its project file, it gave me a perfectly good answer:
Updating text, adding the new entry and not overwriting any values in the “image” key — all done. It didn’t try to fix inconsequential spelling and didn’t get confused by the trailing comma. It was far quicker than Codex as well.
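The trailing comma after the Paris "image" value matters because strict parsers reject it. As a hedged illustration (this is not how Gemini handled the file, and load_lenient is a hypothetical helper), the sketch below tries Python's standard json module first, which raises json.JSONDecodeError on such input, and then falls back to a naive cleanup that strips trailing commas before retrying. The regular expression would also alter a comma followed by a closing bracket inside a string value, so treat it as a demonstration rather than a robust parser.

```python
import json
import re


def load_lenient(text):
    """Parse JSON text, tolerating trailing commas before '}' or ']'.

    Hypothetical helper for illustration only: the cleanup is a naive
    regular expression and does not understand string contents.
    """
    try:
        # Strict parse first; valid JSON takes this path untouched.
        return json.loads(text)
    except json.JSONDecodeError:
        # Drop any comma that is followed only by whitespace and a closer.
        cleaned = re.sub(r",\s*([}\]])", r"\1", text)
        return json.loads(cleaned)


# A fragment shaped like the Paris entry in original_cities.json.
snippet = '{"id": "Paris", "image": "EifelTower",}'
paris = load_lenient(snippet)
```

A tool that reads the file with a strict parser would stop at that comma, so an agent has to either tolerate it or repair it before it can merge anything.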
I checked the file, and indeed the changes were made. Before it answered, it didn’t quite make a plan, but gave me a fairly basic explanation of what it would do:
As the outcome was entirely correct, the process didn’t really matter. But only by checking intentions can you really correct LLM “thinking” when it takes the wrong path.
I’ll exit to show the final expenditure summary:
Conclusion
As I said, this isn’t a direct LLM comparison, but Gemini gave me an efficient agentic experience. I’m sure Google can fill in the missing quality-of-life features I mentioned (specifically, some running stats on token usage), but it is definitely ready for action right now. There is a growing coterie of agentic terminal applications out there for developers to try, and Gemini CLI is a solid addition to that list.