How to Make SQL Easier to Understand, Test and Maintain

By modularizing the code, you can make SQL easier to understand, easier to test, easier to update and simply better.

Aug 23rd, 2022 11:24am by Chris Saxon

Featued image for: How to Make SQL Easier to Understand, Test and Maintain

Feature image via Pixabay.

We all know that SQL helps us build better applications by giving us an efficient way to store, manipulate and retrieve data from databases. But we’ve also seen SQL statements hundreds of lines long that are practically impossible to test and debug. Don’t let these put you off SQL. By modularizing the code, you can make SQL easier to understand, easier to test, easier to update and simply better.

Chris Saxon

Chris is a developer advocate for Oracle Database, where his job is to help you get the best out of it and have fun with SQL. You can find him on Twitter, @ChrisRSaxon, and on his blog, All Things SQL.

Modularization pulls out bits of code that appear often in SQL statements and turns them into reusable components. Whenever you need to use those bits of code, you simply access the module. If a bug appears in a code module, you can readily identify it — and fix it in one place. This also applies if you find a better way to code a module. You can improve it once and the entire SQL statement benefits from the change. In the process, you can make a complex query shorter and clearer. In other words, modules make SQL better. Most popular cloud databases like MySQL, PostgreSQL and SQL Server enable you to modularize SQL with views and common table expressions (CTEs). For the examples below I will be using Oracle Database, which has many options to split Oracle SQL statements into reusable parts. Let’s see how that works with some examples.

Turn a Complex Query into a Function

Your boss asked you to write a complex query. This must return the total and average salaries for every department that has more than three employees, grouped by region. The first step is to find the departments with at least three staff members. You need to preserve the salaries for each employee to calculate the final average, so grouping by department here won’t work. Instead, you can turn count into an analytic function and filter its result. You must do this using a subquery, so you can use an inline view like this:

select * from (
  select department_id, salary,
     	count(*) over ( partition by department_id ) emp#
  from   hr.employees e
)
where  emp# > 3;

To read this query, you’re effectively working from the inside out. Easy enough with a short query like this. But to complete the report you still need to add the joins to find the region and calculate the totals. It’s better to place this subquery in the with clause.

Refactor Queries with Common Table Expressions

The with clause defines CTEs at the top of the statement. These are named subqueries you can refer to – this is a step toward modularization. Refactoring the previous query to a CTE looks like this

with employees_with_dept_counts as (
  select e.*,
     	count(*) over ( partition by department_id ) emp#
  from   hr.employees e
)
  select * from employees_with_dept_counts
  where  emp# > 3;

This has a few advantages over inline views:

The query reads top to bottom, instead of inside out. This more closely follows natural language, aiding readability.
You can give the query a meaningful name, helping others understand the subquery’s purpose.
You can reuse a CTE in a query, which is useful if it contains expressions you’ll access many times in one statement.

With the CTE in place, next, you need to find the region each employee is in. This requires joining the department through a series of tables like this:

select * from hr.departments d
join   hr.locations l
  using  ( location_id )
join   hr.countries c
  using  ( country_id )
join   hr.regions r
  using  ( region_id );

At this point, you could place these joins in another CTE. Doing this leads to a long statement, and it’s likely you’ll want to repeat these joins in other queries too. This highlights two drawbacks of CTEs:

The named subqueries are local to the query. If you want to reuse them in other SQL statements, you must copy and paste them.
The final statement can still be large. While CTEs help you break up monolithic queries, if you’re accessing many tables, they can still be lots to take in.

You can overcome both these shortcomings by placing the logic in a view, which is another step toward modularization. [Here I’ll note that you can see these examples in action in this recording of my Oracle Database World presentation, or you can join me and other database experts live at Oracle Cloud World in Las Vegas in October. To try the demos in this article yourself, sign up for an Always Free Oracle Cloud account and get the scripts from Live SQL.]

Define Reusable Queries with Views

Views store the text of a SQL statement. Querying a view is the same as running the statement within it. This enables you to define common joins, expressions or filters you can reuse in other queries. You can reference the view like a regular table. To link the employee rows to their region, wrap the joins above in a view and link this to the employees_with_dept_counts CTE:

create or replace view hr.department_country_details as
  select * from hr.departments d
  join   hr.locations l
  using  ( location_id )
  join   hr.countries c
  using  ( country_id )
  join   hr.regions r
  using  ( region_id );
 
with employees_with_dept_counts as (
  select e.*,
     	count(*) over ( partition by department_id ) emp#
  from   hr.employees e
)
  select region_name, salary
  from   employees_with_dept_counts e
  join   hr.department_country_details d
  using  ( department_id )
  where  e.emp# > 3;

To complete the report, you need to group by region and add the totals. You could do this all in the final select above but moving this subquery into another CTE will aid readability. To simplify the report, finish by placing the final query in a view:

create or replace view hr.region_summaries as
with employees_with_dept_counts as (
  select e.*,
     	count(*) over ( partition by department_id ) emp#
  from   hr.employees e
), filtered_employee_regions as (
  select region_name, salary
  from   employees_with_dept_counts e
  join   hr.department_country_details d
  using  ( department_id )
  where  e.emp# > 3
)
  select region_name,
     	sum ( salary ) total_salary,
     	round ( avg ( salary ) ) mean_salary
  from   filtered_employee_regions
  group  by region_name;

You can now generate the report by querying region_summaries. This summary view has a major drawback compared to the view it uses — department_country_details. The staff count filter is hardcoded to three. It’s likely you’ll want to allow users to change this value at runtime. To do this you need to add a parameter to the view. Sadly, this is invalid syntax! The inability to create parameterized views limits their reuse. To enable users to change the minimum employees per department, you had no choice but to ditch the view and reproduce the complete query where needed. Oracle Database has a way around this problem: SQL macros. Let’s see how that works in another modularization example.

Create Reusable SQL Expressions with SQL Macros

SQL macros are functions that define a SQL expression. These expressions can be one of two types: table or scalar.

Table macros return a complete select statement. You use them in the from clause of queries. These are available in Oracle Database 19c and Oracle Autonomous Database
Scalar macros return a formula. You can use them wherever it’s possible to use PL/SQL functions. For example, the select, where and order by clauses. These are available in Oracle Database 21c and Oracle Autonomous Database

Whatever their type, macros always return the expression as text. You can name any of the function’s parameters in the return string. These are placeholders that the database replaces when resolving the macro. This resolution happens at parse time. In this process, the database replaces all calls to SQL macros in the query with the resolved expression. It replaces any of the parameter placeholders in the string with the actual text you used for these parameters in the function call. This enables you to create parameterized views. For example, you could create a table macro to filter the employees per department:

create or replace function employees_filtered_by_dept_size (
  min_emps integer
) return clob sql_macro as
  stmt clob;
begin
  stmt := '
with employees_with_dept_counts as (
  select e.*,
     	count(*) over ( partition by department_id ) emp#
  from   hr.employees e
)
select  *
from	employees_with_dept_counts
where   emp# >= min_emps';
 
  return stmt;
end employees_filtered_by_dept_size;
/

When you call this macro, the database substitutes min_emps with whatever you passed for this parameter. Here are examples of the final where clause when calling this function with different parameters:

employees_filtered_by_dept_size ( 3 ) becomes where emp# >= 3
employees_filtered_by_dept_size ( :min_count_var ) becomes where emp# >= :min_count_var
employees_filtered_by_dept_size ( min_count_fn() ) becomes where emp# >= min_count_fn()

Notice that when passing bind variables and functions, the variable or function themselves become part of the statement. It is a textual find-and-replace of the placeholders (geeky note: it’s more sophisticated than that, but thinking of it as a like-for-like swap is good enough). Swapping the CTE employees_with_dept_counts for the macro employees_filtered_by_dept_size is a huge improvement in logic reuse. But the power of SQL macros goes even further. Filtering a query by the count of one of its columns may be the logic you want to apply to any table or column. In essence, you have this query template, swapping in the <source_table>, <group_column> and <min_count> as needed:

with tab_with_group_counts as (
  select t.*,
     	count(*) over ( partition by <group_column> ) grp#
  from   <source_table> t
)
select  *
from	tab_with_group_counts
where   grp# >= <min_count>;

Previously this was a task for dynamic SQL. This is clunky, hard to debug and easy to leave your application vulnerable to SQL injection. SQL macros avoid these issues by allowing you to pass table and column names as parameters. To do this, use these types first introduced with Polymorphic Table Functions:

dbms_tf.table_t — accepts a table identifier
dbms_tf.columns_t — accepts the columns pseudo-operator, which contains a list of column identifiers

Adding these to the macro enables you to filter any table by the count of any of its columns:

create or replace function filter_by_column_count (
  source_table dbms_tf.table_t,
  group_column dbms_tf.columns_t,
  min_count	integer
) return clob sql_macro as
  stmt clob;
begin
  stmt := '
with tab_with_group_counts as (
  select t.*,
     	count(*) over (
       	partition by ' || group_column (1) || '
     	) grp#
  from   source_table t
)
select  *
from	tab_with_group_counts
where   grp# > min_count';
 
  return stmt;
end filter_by_column_count;
/

This reveals a subtle yet important difference between table and column parameters. You place the table parameter directly in the string, and the database does the replacement for you. But the columns parameter is an array. There’s no implicit way in SQL to convert an array to a string. You must concatenate the values in explicitly. Swapping out the original macro for this generic version gives:

select r.region_name,
   	sum ( salary ) total_salary,
   	round ( avg ( salary ) ) mean_salary
from   filter_by_column_count (
  hr.employees, columns ( department_id ), :min_emps
)
join   hr.department_country_details r
using  ( department_id )
group  by r.region_name;

This is more concise than the original query in the view region_summaries. Each “table” in it encapsulates complex logic. To simplify the final query further, you could convert region_summaries to a table macro too. This accepts the minimum employee count to filter the departments by. Here’s the complete macro with examples of how to call it:

create or replace function region_summaries ( min_emps integer )
  return clob sql_macro as
  stmt clob;
begin
  stmt := '
select r.region_name,
   	sum ( salary ) total_salary,
   	round ( avg ( salary ) ) mean_salary
from   filter_by_column_count (
  hr.employees, columns ( department_id ), min_emps
)
join   hr.department_country_details r
using  ( department_id )
group  by r.region_name';
  return stmt;
 
end region_summaries;
/
 
/* Departments with > 1 staff */
select * from region_summaries ( 1 );
 
/* Departments with > 4 staff */
select * from region_summaries ( 4 );
 
/* Departments with > :min_emps staff */
select * from region_summaries ( :min_emps );

See the Complete Statement

At this point, you’ve made a modularized, parameterized query. But an important question remains: How do you see the whole statement, including all the underlying tables accessed? Once again Oracle Database has a function to do this. Pass your query to dbms_utility.expand_sql_text. This expands table macros and views to show all the base tables and columns in the underlying SQL.

declare
  l_clob clob;
begin
  dbms_utility.expand_sql_text (
	input_sql_text  =>
  	q'!select * from region_summaries ( :min_emps )!',
	output_sql_text => l_clob  );
  dbms_output.put_line ( l_clob );
end;
/

Conclusion

Views have long been a way to define common logic in SQL and reuse this in other statements. But their inability to accept parameters limits when this is possible, thus restricting your ability to use them to effectively modularize your SQL code. Subquery factoring aka CTEs aka the with clause makes it easier to break down large queries into their components. This also enables you to write SQL that you read in the same order as the logical flow. But CTEs are local to each query, so they do little for sharing logic across statements. The introduction of SQL macros in Oracle Database version 19c solves both these problems. You can define reusable units of logic that are fully parameterizable, not only with basic data types but with table and column names too. This enables you to create query templates you can apply to any tables you want. And the result is modular code that is easier to understand, test and improve. As mentioned above, to see these examples in action, watch the recording of my Oracle Database World presentation on this topic or join my session at Oracle Cloud World in Las Vegas. If you’d like to try the demos in this article yourself, sign up for an Always Free Oracle Cloud account, get the scripts from Live SQL and start hacking away!