Join our community of software engineering leaders and aspirational developers. Always
stay in-the-know by getting the most important news and exclusive content delivered
fresh to your inbox to learn more about at-scale software development.
REQUIRED
It seems that you've previously unsubscribed from our newsletter
in the past. Click the button below to open the re-subscribe form
in a new tab. When you're done, simply close that tab and continue
with this form to complete your subscription.
The New Stack does not sell your information or share it with
unaffiliated third parties. By continuing, you agree to our
Terms of Use and
Privacy Policy.
Welcome and thank you for joining The New Stack community!
Please answer a few simple questions to help us deliver the news and resources you are interested in.
REQUIRED
REQUIRED
REQUIRED
REQUIRED
REQUIRED
Great to meet you!
Tell us a bit about your job so we can cover the topics you find most relevant.
REQUIRED
REQUIRED
REQUIRED
REQUIRED
REQUIRED
Welcome!
We’re so glad you’re here. You can expect all the best TNS content to arrive
Monday through Friday to keep you on top of the news and at the top of your game.
What’s next?
Check your inbox for a confirmation email where you can adjust your preferences
and even join additional groups.
Follow TNS on your favorite social media networks.
With in-process, open source DuckDB, you can create an in-memory database that does not save data, or you can use a local file. Here's how to get started.
Featured image by Curated Lifestyle, from Unsplash+.
One of the ways in which modern databases silently helped rather poor programs during the first dot-com boom of the late ’90s was that they handled asynchronous queries — often when the rest of the system didn’t.
This led to some comments from Oracle Co-Founder Larry Ellison, to the Guardian in 2001, that are very interesting in retrospect: “Some of these New Economy companies — take pets.com — it’s good that they’re gone. Selling cat food on the internet was lunacy.”
Coming back to today, the attraction of DuckDB, an open source project that released its version 1.0 in June. is that it is an in-process database. So, I build it as part of my executable; I’m not connecting to some totally separate system. It isn’t for selling cat food.
The first thing to get your head around is that the data persistence is not the central concern here — just query processing. You can create an in-memory database that does not persist (i.e., save data) at all, or you can use a local file. The purpose of a DuckDB database is likely to be sucking up some data, querying over it, maybe making some transformations, then going away.
So for working with DuckDB, we just need a library or plug-in, not a new application or service. If I look at the website’s front page, C# is’t mentioned. However, C# is supported via an open source ADO.NET provider. That this exists already certainly proves the ecosystem is probably already quite healthy.
Design Goals
Let’s contemplate the very first line of the example code below, before we even fire up Visual Studio Code:
var duckDBConnection = new DuckDBConnection("Data Source=file.db");
Obviously, “database” is synonymous with persistence, even though that isn’t the primary purpose for DuckDB. In the example above, we use a file as a persistent data source.
The file format is forward and backward compatible, and obviously this is somewhat important for maintenance. But you would probably be unwise to commit to a long-term data strategy with a company that might have a shorter lifetime than your data. But that just underlines the same concept: persistence is not the mainstay here. If we left the data source argument out, or used the keyword :memory:, then we would have an in-memory database.
What about concurrency? The design goals of this project clearly point to not attempting to support complex scenarios, although they have simple modes to allow for optimistic concurrency. The idea is to set up the data, whack it with queries, then go.
Getting Started
So let’s get started with Visual Studio Code:
So, we get a fresh Visual Studio Code in a new project folder. I’ve written about setting up Visual Studio Code before; for now, I’ll just say these are the relevant extensions that I have installed for working with C#:
Create a new project using “.NET new project” with the command palette then add DuckDB.NET.Data.Full using “nuget: add package” with the palette again. These all have command-line equivalents, but I’ll stay in the IDE for this post.
You should now have an empty project, so let’s add the following into the file “Program.cs”:
using DuckDB.NET.Data;
var duckDBConnection = new DuckDBConnection("Data Source=file.db");
duckDBConnection.Open();
var command = duckDBConnection.CreateCommand();
command.CommandText = "CREATE TABLE integers(foo INTEGER, bar INTEGER);";
var executeNonQuery = command.ExecuteNonQuery();
command.CommandText = "INSERT INTO integers VALUES (3, 4), (5, 6), (7, NULL);";
executeNonQuery = command.ExecuteNonQuery();
command.CommandText = "Select count(*) from integers";
var count = command.ExecuteScalar();
Console.WriteLine($"Rows = {count}");
command.CommandText = "SELECT foo, bar FROM integers";
var reader = command.ExecuteReader();
Console.Write($"Columns: ")
for (var index = 0; index < reader.FieldCount; index++)
{
var column = reader.GetName(index);
Console.Write($"{column} ");
}
Console.WriteLine();
while (reader.Read())
{
for (int index = 0; index < reader.FieldCount; index++)
{
if (reader.IsDBNull(index))
{
Console.WriteLine("NULL");
continue;
}
var val = reader.GetFieldValue<int>(index);
Console.Write(val);
Console.Write(" ");
}
}
Running this, we get the following in the terminal output:
Rows = 3
Columns: foo bar
3 4 5 6 7 NULL
Let’s take a look at what happened.
First, we know we should have made a persistence file called “file.db”, and this is visible in the bin directory. Indeed, if you try to run this same code twice we get:
Unhandled exception. DuckDB.NET.Data.DuckDBException 0x0000000D):
Catalog Error: Table with name "integers" already exists!
This is correct, but only because we opened up a persistent database. Comment out the Database Definition Language (DDL) and we can run it as many times as we like. Or change the first line to:
var duckDBConnection = new DuckDBConnection("Data Source=:memory:");
for an in-memory db to run multiple times.
We then used some DDL to create the table.
command.CommandText = "CREATE TABLE integers(foo INTEGER, bar INTEGER);";
var executeNonQuery = command.ExecuteNonQuery();
I guess, as it isn’t Database Manipulation Language (DML), it is executed as a “nonquery.” Somewhat strange nomenclature, to be sure.
When we want a result, we ask specifically for the result type after execution:
command.CommandText = "Select count(*) from integers";
var count = command.ExecuteScalar();
Console.WriteLine($"Rows = {count}");
But count clearly resolves into something we can print.
For detailed results, we get a “Reader” after executing the query. (Yes, these commands could be better named.)
command.CommandText = "SELECT foo, bar FROM integers";
var reader = command.ExecuteReader();
Before we use it, we kind of know the Reader holds a result array that we will have to access via the API. We can simply summarise the rest of the code by looking at how we do this.
for (var index = 0; index < reader.FieldCount; index++) {
var column = reader.GetName(index);
Console.Write($"{column} ");
}
The FieldCount method just counts the columns, and GetName returns the column names. (We could use GetDataTypeName to get the column type.)
Finally, we do a full row-by-row read of the data.
while (reader.Read())
{
for (int index = 0; index < reader.FieldCount; index++)
{
if (reader.IsDBNull(index))
{
Console.WriteLine("NULL");
continue;
}
var val = reader.GetFieldValue<int>(index);
Console.Write(val);
Console.Write(" ");
}
}
We see that the iterator is just calling Read() until it returns null. For each row of records we can then just use GetFieldValue to extract the Integer value. This is quicker than using a separate result set or similar, but makes the code less readable as the Reader is holding both temporary and static data simultaneously.
Conclusion
DuckDB directly supports Python, R, Java, Node.js, Go and Rust, so your environment is almost certainly catered for. This should be a useful tool for both testing scenarios and transforming data on the fly. But I also like the idea of using it to gain SQL query support without worrying about the weight of a full database system.
But if you are planning on rewriting pets.com, use something else.
Percona is widely recognized as a world-class open source database software, support, and services company for MySQL®, MongoDB®, and PostgreSQL® databases. We are dedicated to helping make your databases and applications run better through a unique combination of expertise and open source software.