How to Make SQL Easier to Understand, Test and Maintain
By modularizing the code, you can make SQL easier to understand, easier to test, easier to update and simply better.
Aug 23rd, 2022 11:24am by
Feature image via Pixabay.
Oracle sponsored this post.
Chris Saxon
Chris is a developer advocate for Oracle Database, where his job is to help you get the best out of it and have fun with SQL. You can find him on Twitter, @ChrisRSaxon, and on his blog, All Things SQL.
Turn a Complex Query into a Function
Your boss asked you to write a complex query. This must return the total and average salaries for every department that has more than three employees, grouped by region. The first step is to find the departments with at least three staff members. You need to preserve the salaries for each employee to calculate the final average, so grouping by department here won’t work. Instead, you can turncount into an analytic function and filter its result. You must do this using a subquery, so you can use an inline view like this:
select * from (
select department_id, salary,
count(*) over ( partition by department_id ) emp#
from hr.employees e
)
where emp# > 3;
Refactor Queries with Common Table Expressions
Thewith clause defines CTEs at the top of the statement. These are named subqueries you can refer to – this is a step toward modularization. Refactoring the previous query to a CTE looks like this
with employees_with_dept_counts as (
select e.*,
count(*) over ( partition by department_id ) emp#
from hr.employees e
)
select * from employees_with_dept_counts
where emp# > 3;
- The query reads top to bottom, instead of inside out. This more closely follows natural language, aiding readability.
- You can give the query a meaningful name, helping others understand the subquery’s purpose.
- You can reuse a CTE in a query, which is useful if it contains expressions you’ll access many times in one statement.
select * from hr.departments d
join hr.locations l
using ( location_id )
join hr.countries c
using ( country_id )
join hr.regions r
using ( region_id );
- The named subqueries are local to the query. If you want to reuse them in other SQL statements, you must copy and paste them.
- The final statement can still be large. While CTEs help you break up monolithic queries, if you’re accessing many tables, they can still be lots to take in.
Define Reusable Queries with Views
Views store the text of a SQL statement. Querying a view is the same as running the statement within it. This enables you to define common joins, expressions or filters you can reuse in other queries. You can reference the view like a regular table. To link the employee rows to their region, wrap the joins above in a view and link this to theemployees_with_dept_counts CTE:
create or replace view hr.department_country_details as
select * from hr.departments d
join hr.locations l
using ( location_id )
join hr.countries c
using ( country_id )
join hr.regions r
using ( region_id );
with employees_with_dept_counts as (
select e.*,
count(*) over ( partition by department_id ) emp#
from hr.employees e
)
select region_name, salary
from employees_with_dept_counts e
join hr.department_country_details d
using ( department_id )
where e.emp# > 3;
select above but moving this subquery into another CTE will aid readability.
To simplify the report, finish by placing the final query in a view:
create or replace view hr.region_summaries as
with employees_with_dept_counts as (
select e.*,
count(*) over ( partition by department_id ) emp#
from hr.employees e
), filtered_employee_regions as (
select region_name, salary
from employees_with_dept_counts e
join hr.department_country_details d
using ( department_id )
where e.emp# > 3
)
select region_name,
sum ( salary ) total_salary,
round ( avg ( salary ) ) mean_salary
from filtered_employee_regions
group by region_name;
region_summaries. This summary view has a major drawback compared to the view it uses — department_country_details. The staff count filter is hardcoded to three. It’s likely you’ll want to allow users to change this value at runtime. To do this you need to add a parameter to the view. Sadly, this is invalid syntax!
The inability to create parameterized views limits their reuse. To enable users to change the minimum employees per department, you had no choice but to ditch the view and reproduce the complete query where needed.
Oracle Database has a way around this problem: SQL macros. Let’s see how that works in another modularization example.
Create Reusable SQL Expressions with SQL Macros
SQL macros are functions that define a SQL expression. These expressions can be one of two types: table or scalar.- Table macros return a complete
selectstatement. You use them in thefromclause of queries. These are available in Oracle Database 19c and Oracle Autonomous Database - Scalar macros return a formula. You can use them wherever it’s possible to use PL/SQL functions. For example, the
select,whereandorder byclauses. These are available in Oracle Database 21c and Oracle Autonomous Database
create or replace function employees_filtered_by_dept_size (
min_emps integer
) return clob sql_macro as
stmt clob;
begin
stmt := '
with employees_with_dept_counts as (
select e.*,
count(*) over ( partition by department_id ) emp#
from hr.employees e
)
select *
from employees_with_dept_counts
where emp# >= min_emps';
return stmt;
end employees_filtered_by_dept_size;
/
min_emps with whatever you passed for this parameter. Here are examples of the final where clause when calling this function with different parameters:
employees_filtered_by_dept_size ( 3 )becomeswhere emp# >= 3employees_filtered_by_dept_size ( :min_count_var )becomeswhere emp# >= :min_count_varemployees_filtered_by_dept_size ( min_count_fn() )becomeswhere emp# >= min_count_fn()
employees_with_dept_counts for the macro employees_filtered_by_dept_size is a huge improvement in logic reuse. But the power of SQL macros goes even further.
Filtering a query by the count of one of its columns may be the logic you want to apply to any table or column. In essence, you have this query template, swapping in the <source_table>, <group_column> and <min_count> as needed:
with tab_with_group_counts as (
select t.*,
count(*) over ( partition by <group_column> ) grp#
from <source_table> t
)
select *
from tab_with_group_counts
where grp# >= <min_count>;
dbms_tf.table_t— accepts a table identifierdbms_tf.columns_t— accepts thecolumnspseudo-operator, which contains a list of column identifiers
create or replace function filter_by_column_count (
source_table dbms_tf.table_t,
group_column dbms_tf.columns_t,
min_count integer
) return clob sql_macro as
stmt clob;
begin
stmt := '
with tab_with_group_counts as (
select t.*,
count(*) over (
partition by ' || group_column (1) || '
) grp#
from source_table t
)
select *
from tab_with_group_counts
where grp# > min_count';
return stmt;
end filter_by_column_count;
/
select r.region_name,
sum ( salary ) total_salary,
round ( avg ( salary ) ) mean_salary
from filter_by_column_count (
hr.employees, columns ( department_id ), :min_emps
)
join hr.department_country_details r
using ( department_id )
group by r.region_name;
region_summaries. Each “table” in it encapsulates complex logic.
To simplify the final query further, you could convert region_summaries to a table macro too. This accepts the minimum employee count to filter the departments by.
Here’s the complete macro with examples of how to call it:
create or replace function region_summaries ( min_emps integer )
return clob sql_macro as
stmt clob;
begin
stmt := '
select r.region_name,
sum ( salary ) total_salary,
round ( avg ( salary ) ) mean_salary
from filter_by_column_count (
hr.employees, columns ( department_id ), min_emps
)
join hr.department_country_details r
using ( department_id )
group by r.region_name';
return stmt;
end region_summaries;
/
/* Departments with > 1 staff */
select * from region_summaries ( 1 );
/* Departments with > 4 staff */
select * from region_summaries ( 4 );
/* Departments with > :min_emps staff */
select * from region_summaries ( :min_emps );
See the Complete Statement
At this point, you’ve made a modularized, parameterized query. But an important question remains: How do you see the whole statement, including all the underlying tables accessed? Once again Oracle Database has a function to do this. Pass your query todbms_utility.expand_sql_text. This expands table macros and views to show all the base tables and columns in the underlying SQL.
declare
l_clob clob;
begin
dbms_utility.expand_sql_text (
input_sql_text =>
q'!select * from region_summaries ( :min_emps )!',
output_sql_text => l_clob );
dbms_output.put_line ( l_clob );
end;
/
Conclusion
Views have long been a way to define common logic in SQL and reuse this in other statements. But their inability to accept parameters limits when this is possible, thus restricting your ability to use them to effectively modularize your SQL code. Subquery factoring aka CTEs aka the with clause makes it easier to break down large queries into their components. This also enables you to write SQL that you read in the same order as the logical flow. But CTEs are local to each query, so they do little for sharing logic across statements. The introduction of SQL macros in Oracle Database version 19c solves both these problems. You can define reusable units of logic that are fully parameterizable, not only with basic data types but with table and column names too. This enables you to create query templates you can apply to any tables you want. And the result is modular code that is easier to understand, test and improve. As mentioned above, to see these examples in action, watch the recording of my Oracle Database World presentation on this topic or join my session at Oracle Cloud World in Las Vegas. If you’d like to try the demos in this article yourself, sign up for an Always Free Oracle Cloud account, get the scripts from Live SQL and start hacking away!
YOUTUBE.COM/THENEWSTACK
Tech moves fast, don't miss an episode. Subscribe to our YouTube
channel to stream all our podcasts, interviews, demos, and more.