Code in Context: How AI Can Help Improve Our Documentation

Revisiting a documentation sprint to explore how LLM-powered tools like Unblocked can help us understand and explain complex codebases.

Apr 8th, 2024 7:12am by Jon Udell

Although I’ve written a few Steampipe plugins, they’ve only required a basic understanding of the plugin SDK. I’m surely not the only one who has struggled to grasp its more advanced mechanisms. At our annual company hackathon in 2022, I participated in a weeklong sprint to improve the SDK’s documentation, working with several team members including Steampipe’s lead developer (who is the author of the SDK). The exercise yielded a fun essay on literate programming in Go, but the results didn’t really move the needle.

I’ve read a lot of plugin code since then, and written some, but I still didn’t feel confident in my ability to understand, apply, and explain several key patterns. I’m certain that LLM assistance would have improved the outcome of our 2022 doc sprint. We can’t repeat that experiment, but I took another run at it with the help of Unblocked, a new LLM-backed tool for developers that focuses less on writing code (though it does that too) and more on understanding it. As founder Dennis Pilarinos says:

You’re almost always inheriting a code base that’s been around for a long time, so the hard part isn’t writing the code to implement the feature or fix the bug. It’s trying to understand why the application works the way it does, who the best person is to talk to, when these changes were introduced, and contextualizing all that to move forward.

To that end, Unblocked can not only ingest your code repositories, as can Sourcegraph’s Cody, but also related material — your website, your product documentation, your conversations in GitHub issues and Slack — in order to provide a service that I call context assembly. I picked up that term from Jack Ozzie, back when he was working with his brother Ray on Groove, a peer-to-peer successor to Ray’s greatest hit, Lotus Notes, which pioneered what became known as knowledge management. Like Notes, Groove brought information work into shared spaces where you could search your mail, calendars, documents, and data all at once.

Nowadays we work much too hard searching Slack and Google Docs and GitHub and email in order to build context for the work we really need to be doing. If we had effective metasearch, that would help. But even then we’d just be running parallel keyword queries and doing the taxing work of context assembly in our heads, or transferring the tax to coworkers. Unblocked aims to lighten the cognitive tax burden that software teams bear.

This was my chance to revisit our 2022 doc sprint, armed with the kind of tool I wish we’d had then. One of our goals was to enrich the docs with examples of SDK idioms, along with links to the source code where those examples can be seen in context. Unblocked’s ability to find such examples was probably the biggest benefit it delivered during this rewrite. At one point I was looking for examples of plugin code that uses a mixture of a List function (which populates some of the columns of a Steampipe table with the results of a primary API call) and a Hydrate function (which fills additional columns with results from an auxiliary API call). The words “List” and “Hydrate” appear everywhere: in the code for the plugin SDK itself, and in the code for every plugin. If you’re looking for an example that combines the two kinds of functions, a conventional code search is hopeless. But you can ask Unblocked a question like this:

Where’s a good example of List + Hydrate?

And get an answer like this:

A good example of a List function working in conjunction with a Hydrate function can be found in the steampipe-plugin-aws repository. In the table_aws_fms_app_list.go file, the listFmsAppLists function is used as the List function to fetch all AWS Firewall Manager (FMS) applications. This function fetches all FMS applications. For each application, it streams the application as a row to the table.

Then, the getFmsAppList Hydrate function is used to fetch more details for each application. This function makes an API call to AWS FMS’s GetAppsList API with a specific list ID and returns the result as a single row.

This is a common pattern in Steampipe plugins: a List function fetches a list of resources, and then one or more Hydrate functions fetch additional details for each resource.
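The List + Hydrate division of labor can be modeled in a few dozen lines of plain Go. This is a simplified sketch, not the Steampipe plugin SDK's API: `fakeAPI`, `listApps`, `hydrateApp`, `query`, and the `app_count` column are hypothetical stand-ins for `listFmsAppLists`, `getFmsAppList`, and the AWS calls behind them. The List role makes one primary call and streams a row per resource; the Hydrate role makes one auxiliary call per row to fill a column the list call can't supply.

```go
package main

import "fmt"

// AppSummary is what a hypothetical primary "list" call returns:
// just enough to identify each resource.
type AppSummary struct{ ID, Name string }

// fakeAPI stands in for a real cloud API. It counts "get" calls so the
// cost of the per-row hydrate step is visible.
type fakeAPI struct {
	summaries []AppSummary
	appCounts map[string]int // detail that only the "get" call returns
	getCalls  int
}

func (a *fakeAPI) ListApps() []AppSummary { return a.summaries }

func (a *fakeAPI) GetAppCount(id string) (int, error) {
	a.getCalls++
	n, ok := a.appCounts[id]
	if !ok {
		return 0, fmt.Errorf("app %q not found", id)
	}
	return n, nil
}

// Row is one table row, keyed by column name.
type Row map[string]any

// listApps plays the List role: one primary call, then one streamed row
// per item, filling only the columns that call can supply.
func listApps(api *fakeAPI, stream func(Row)) {
	for _, s := range api.ListApps() {
		stream(Row{"id": s.ID, "name": s.Name})
	}
}

// hydrateApp plays the Hydrate role: one auxiliary call per row to fill
// a column the list call could not supply.
func hydrateApp(api *fakeAPI, r Row) error {
	n, err := api.GetAppCount(r["id"].(string))
	if err != nil {
		return err
	}
	r["app_count"] = n
	return nil
}

// query runs List, and runs Hydrate for each row only if a requested
// column depends on it.
func query(api *fakeAPI, columns []string) ([]Row, error) {
	needsHydrate := false
	for _, c := range columns {
		if c == "app_count" {
			needsHydrate = true
		}
	}
	var rows []Row
	var err error
	listApps(api, func(r Row) {
		if err != nil {
			return // stop collecting after the first failure
		}
		if needsHydrate {
			if err = hydrateApp(api, r); err != nil {
				return
			}
		}
		rows = append(rows, r)
	})
	return rows, err
}

func main() {
	api := &fakeAPI{
		summaries: []AppSummary{{ID: "a1", Name: "alpha"}, {ID: "a2", Name: "beta"}},
		appCounts: map[string]int{"a1": 3, "a2": 5},
	}

	rows, _ := query(api, []string{"id", "name"})
	fmt.Println(len(rows), api.getCalls) // 2 0: no hydrate calls needed

	rows, _ = query(api, []string{"id", "app_count"})
	fmt.Println(rows[1]["app_count"], api.getCalls) // 5 2: one get per row
}
```

The driver hydrates only when a requested column needs it, which we assume mirrors how Steampipe avoids auxiliary API calls for columns a query doesn't ask for. The call counter makes the trade-off visible: every hydrated row costs one extra request.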

This is a profoundly different kind of search than we’re used to. I can’t pretend to know how it works, and I’m not sure anyone can fully explain it, but my naive understanding is that when you ask this kind of question of an LLM you are not looking for words, you are looking for a pattern. Tools that can identify and locate such patterns within a corpus of documents will, I think, help us finally realize the promise of knowledge management — in the software realm, but also more widely.

On Ben Lorica’s Data Exchange, Chris Wiggins delivered a masterful survey of the history of what we now call AI which, he pointed out, used to be called machine learning, but even before that was known as pattern recognition.

Here’s how Ward Cunningham introduced the pioneering Portland Pattern Repository (which was also the world’s original wiki).

Patterns are the recurring solutions to the problems of design. People learn patterns by seeing them and recall them when need be without a lot of effort. Patterns link together in the mind so that one pattern leads to another and another until familiar problems are solved. That is, patterns form languages, not unlike natural languages, within which the human mind can assemble correct and infinitely varied statements from a small number of elements.

Like other LLM-backed tools, Unblocked has access to global documents as well as the local ones it can ingest. That enabled it to help me tease out the relationships between the schemas that define Steampipe tables, the corresponding data structures in the AWS Go SDK that wraps the underlying AWS APIs, and the raw APIs themselves. Our systems are increasingly layered in these ways. Pattern-oriented search across code and documentation, in both global and local contexts, feels like a powerful way to navigate the layers.
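The three layers mentioned above can be sketched in miniature. Everything here is hypothetical: the JSON payload, the `AppsList` struct, and the `list_id`/`list_name`/`app_count` columns are illustrative stand-ins, not the actual AWS response shape or a real Steampipe table. The sketch traces one value from a raw API response, through a Go struct that mirrors it (the role the AWS Go SDK plays), into a named table column.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Layer 1 (hypothetical raw API): the JSON a service might return.
const rawResponse = `{"ListId":"a1","ListName":"prod-apps","AppCount":3}`

// Layer 2 (hypothetical SDK struct): a Go type whose field names match
// the raw JSON keys, so encoding/json can decode into it without tags.
type AppsList struct {
	ListId   string
	ListName string
	AppCount int
}

// Layer 3 (hypothetical table schema): each column names the struct
// field that feeds it.
type Column struct {
	Name    string
	Extract func(AppsList) any
}

var schema = []Column{
	{"list_id", func(a AppsList) any { return a.ListId }},
	{"list_name", func(a AppsList) any { return a.ListName }},
	{"app_count", func(a AppsList) any { return a.AppCount }},
}

// toRow walks the layers: raw JSON -> SDK-style struct -> named columns.
func toRow(raw []byte) (map[string]any, error) {
	var item AppsList
	if err := json.Unmarshal(raw, &item); err != nil {
		return nil, err
	}
	row := make(map[string]any, len(schema))
	for _, c := range schema {
		row[c.Name] = c.Extract(item)
	}
	return row, nil
}

func main() {
	row, err := toRow([]byte(rawResponse))
	if err != nil {
		panic(err)
	}
	fmt.Println(row["list_name"], row["app_count"]) // prod-apps 3
}
```

We assume real plugins declare this column-to-field mapping through the plugin SDK's column definitions rather than hand-written functions. The point is only that each column traces back through a struct field to a key in the raw API response, and that chain is exactly what's worth documenting.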

Explaining the Patterns

My goal was to clarify patterns supported by the Steampipe plugin SDK and baked into the suite of plugins built on top of the SDK. While finding examples of such patterns was Unblocked’s major contribution to the rewrite, it also helped me explain them. We weren’t starting from scratch; there was plenty of material both in source code comments and on the website. That meant I could apply rule 4 from Best Practices for Working with Large Language Models: Ask for choral explanations. In Mike Caulfield’s original formulation of the idea, question-answering sites like Quora and Stack Overflow invite choruses of explanations.

Each response takes a different approach to providing an answer. As you read multiple responses some click with you, and some don’t. Some are above your head, and some ridiculously simplified. Some exercise metaphorical thinking, others dive into math.

I apply rule 4 in a couple of ways. Often, nowadays, I’ll ask the same question of ChatGPT, Claude, and Gemini. It’s quick and easy to do that, and for any given question the answer that clicks might come from any of the three. But it can also be valuable to pose a single LLM the same question several times, phrased in different ways to elicit different kinds and levels of explanation. In this case, the pre-existing documentation was probably sufficient for an expert like José Reyes who can jump into a code base and intuit — immediately and deeply — what’s going on. I’m not like that, and I’m sure many others aren’t either. In a math class, I’d be the kind of student who can’t grok a shorthand explanation of a proof but would instead need to see the steps of the proof spelled out in detail, ideally shown in a few different ways.

It would disrupt the class to ask a teacher to do that, which is why I’m particularly excited about Khan Academy’s new AI tutor, Khanmigo.

At any point, learners can ask questions in order to reframe the material in ways that make the most sense to them. The instructor never grows impatient, and the rest of the class is never inconvenienced, it’s wonderful.

Likewise, it would have been disruptive for me to ask our architects and lead developers for such reframings. To the extent that Unblocked could deliver them I was unblocked — no pun intended! — in my effort to better understand and explain our system. I don’t want to oversell this effect, which I would characterize as limited and nascent, but it’s real and it points to a powerful new way to gain code understanding.

Reviewing the Improvements

As I worked through each section of the rewrite, I repeatedly prompted Unblocked with my proposed new version and invited review. Sometimes it found nothing to add or change. That was a signal the section was doing its job. Not an infallible signal, of course! But a useful one nevertheless.

Sometimes, though, Unblocked made substantive contributions. Here is its review of a complete draft.

These were good suggestions that I included almost verbatim. If it were possible to credit Unblocked as a co-author of the pull request that resulted from this rewriting exercise, I would. It felt like a true collaboration, which is, in my view, the best-case scenario for AI.

Stress Tests for Documentation

Unrelated to the rewrite described here, I recently had a question about Datatank, a feature of Pipes (the hosted version of Steampipe) that persists otherwise transient query results. My question was: can you edit the SQL that defines a Datatank custom query? I was pretty sure the answer was yes, but it had been a while since I’d used the feature, so I asked Unblocked and it said no.

I rechecked and confirmed that you can indeed edit that SQL, so I marked Unblocked’s answer as wrong. Then I took another look at the documentation and, sure enough, we didn’t explicitly say that you can edit the SQL. So I retracted my complaint and added this clarification to the doc.

Note: If you edit the source query and press Save, your query runs immediately to refresh the data, and then repeats according to the schedule.

Later I asked the same question and got this answer.

That was partly correct. Yes, you can edit the SQL. But no, you needn’t wait until the next scheduled update. I’d been unsure of that myself, so I made a test update to confirm that (as the note I added explains) the query runs immediately and then per schedule.

So Unblocked didn’t get it entirely right. But, as always, you should apply rule 2: Never trust, always verify. Unblocked gives you the link. Use it!

My takeaway here was something I’d never considered. Once tools like Unblocked have ingested our documentation, we can ask questions we expect the docs to answer and check to see if they actually do. Such tools could even suggest questions to ask in order to do such stress testing. And in fact, Unblocked already does that.

Adjusting the docs to account for all of these questions might be overkill, but it’s useful to consider them.

Revise and Extend

Writing documentation from scratch is as uncommon as writing code from scratch. More typically, you’re updating or expanding or refactoring existing docs. My expectation was that an LLM-powered tool primed with both code and documentation could provide a powerful assist, and Unblocked did.

I don’t know how to measure the boost it gave me. But I do know that I’ll never again want to undertake this kind of project without a tool that can help me assemble the necessary context.

Jon Udell is an author and software developer who explores software tools and technologies and explains them in writing, audio, and video. He is the author of the cult classic Practical Internet Groupware. Past gigs include Lotus, BYTE magazine, Safari...