Fun with Views and CTEs

A view is a stored query the results of which can be treated like a table. Note that it is the query that is saved and not the results of the query. Each time you use a view, its associated query is executed.

A related concept is that of the common table expression or CTE. A CTE can be thought of as a short-lived view; you can only use it within the query in which it appears (you can refer to it multiple times, however).

Let's say that you're the SQL-savvy owner of a store that sells kits for building robots. On your site, customers are guided through the process of selecting all of the components needed to build a quadrupedal robot, i.e. getting the right motors, microcontrollers, sensors, batteries, grippers, etc.

Robots have lots of component parts and you have to make sure that the correct components for

alt text
customers
alt text
purchases
alt text
component_purchases
query1
query_4
tree graph
working_table_1
inner_join_1
working_table_2
inner_join_2
working_table_3
inner_join_3
working_table_4
inner_join_4
working_table_5
subcomponents
purchases_made
pm_pc_join

order arrive or your customers will leave mean comments on social media.

Right now you have the following tables:

alt text

And those tables contain the following data:

customers
alt text
purchases
alt text
component_purchases

You want to see a list of all the purchases made and the components in those purchases. You might write a query like the following:

select p.id as purchase_id,
array_agg(concat('[ ', cp.component_id::text, ', ', c.name, ', ', cp.num_components::text, ' ]')) as component_info,
sum(cp.num_components * c.price) as purchase_total_price,
customer_id,
created_at
from
purchases as p
inner join component_purchases as cp on p.id = cp.purchase_id
inner join components as c on c.id = cp.component_id
group by p.id
order by p.created_at

Just a quick explanation in case anything in the query is unclear. We're joining the tables purchases, component_purchases, and components and then grouping the rows by purchase_id. SQLFiddle here.

For each of the components in a purchase, the sum function adds the multiple of the number of units of a component ordered and the price of a component. The array_agg function composes an array of the details of the purchase. Notice that the sum and array_agg functions act across rows, but the concat function acts across columns.

The result of our query is thus:

query1

Check out the SQLFiddle here.

It's not an exceptionally long query, but you suspect that you'll be using it quite a lot and you'd prefer not to have to write the whole thing out all of the time.

That's where views come in:

create view purchase_information as
select p.id as purchase_id,
array_agg(concat('[ ', cp.component_id::text, ', ', c.name, ', ', cp.num_components::text, ' ]')) as component_info,
sum(cp.num_components * c.price) as purchase_total_price,
customer_id,
created_at
from
purchases as p
inner join component_purchases as cp on p.id = cp.purchase_id
inner join components as c on c.id = cp.component_id
group by p.id
order by p.created_at;

Now you have a simple way to reuse this query. Let's say that you want to know all of the purchases made within the last week. Now you can just write the following:

select * from
purchase_information
where
created_at > now() - interval '7 months';

SQLFiddle here

From within the PostgreSQL interactive terminal, you can use the command \dv to see a list of all the views in the database.

Dropping a view is as simple as dropping a table:

drop view purchase_information;

Note that if you drop a table using cascade, any views dependent on that table will be dropped as well.

It's also possible to perform operations on the data in the underlying tables of some views. Updatable views are views where data can be edited, inserted, or deleted. To be updatable, a view must be really basic. Think something along the lines of:

create view customer_info as
select first_name, last_name, email
from
customers
where last_name > 'F';

Updatable views can only select rows from a single table (or updatable view). They also cannot contain 'group_by', 'limit', or 'with' clauses.

You can see all of the relevant restrictions in the Postgres docs here.

If you do have an updatable view, you can do something like

insert into customer_info values
('Anne', 'Droid', 'anne.droid@example.com');

A new row for 'Anne Droid' has been inserted into the customer table.
Note that the customer_info view only includes customers whose names begin with letters after 'E'. If you were to run the following query:

select * from customer_info;

Anne's name would not appear in the results because her last name begins with a 'D'. A new row has been inserted into the customers table using the customer_info view, but that row is not accessible via the view. SQLFiddle here

Note that you cannot insert values for columns that are present in the underlying table, but not in the view. The following query would result in an error:

insert into customer_info (first_name, last_name, email, phone) values
('Simon', 'Borg', 'simon.borg@example.com', '555-777-9999');

So, that's views in a nutshell. Let's move on to CTEs.

To start with, CTE stands for 'common table expression'. They're quite similar to views except that once the query is executed, a CTE ceases to be.

In the view purchase_information, we were able to get all of the components in a purchase, as well as the price of the purchase. Let's say I wanted a query that would provide me with purchase information, but also the customers to whom the purchases belonged. This can be achieved pretty easily using a common table expression.
You start a CTE using the keyword 'with' followed by the name of the CTE and the query itself.

with
purchases as
(select p.id as purchase_id,
array_agg(concat('[ ', cp.component_id::text, ', ', c.name, ', ', cp.num_components::text, ' ]')) as component_info,
sum(cp.num_components * c.price) as purchase_total_price,
customer_id
from
purchases as p
inner join component_purchases as cp on p.id = cp.purchase_id
inner join components as c on c.id = cp.component_id
group by p.id
order by p.created_at)
select
purchases.purchase_id,
purchases.purchase_total_price,
concat(cu.last_name, ', ', cu.first_name) as customer_name,
concat(cu.email, ', ', cu.phone) as customer_contact_info,
concat(sa.street1, ', ', sa.street2, ', ', sa.postal_code, ', ', sa.city, ', ', sa.region, ', ', sa.country) as customer_address,
component_info
from purchases
inner join customers as cu on purchases.customer_id = cu.id
inner join shipping_addresses as sa on sa.customer_id = cu.id;

SQLFiddle here

The purchases CTE can be referred to like a table. Of course, once the query is executed, purchases doesn't exist.

The results of this query are:

query_4

Phew!
Now you can send the correct robot parts to your customers.

But wait! There's some complexity to the orders that you've forgotten. Some components need to be combined with other components in order to work. For the sake of simplicity, let's say that any given component requires one or fewer other components to function.
Drawing out the relationship between our components, we see that we get a tree graph.

tree graph

To describe this in the database, we'll need to add another column to the component table. The subcomponent_id field will point to the component upon which the current component depends.

Let's say that a customer orders the component with id 4. In order to get all of the parts the customer needs, we'll need traverse the tree all the way down.

Luckily, Postgres has just the solution for this – recursive common table expressions. Recursive CTEs provide a way to allow queries to perform some action with a table, and then to perform the same action on the results of the previous action.

That's probably kind of confusing.

Below is the structure of a recursive common table expression:

with recursive cte_name as (
  query to produce initial working table

  union/union all

  recursive query
)

do something with cte_name

The recursive CTE begins much like a non-recursive CTE. You'll notice that the modifier recursive is required after with. Immediately after recursive is the name of the CTE.

Above the union operator is a query that initializes the working table. The working table is the result set upon which the recursive term is currently acting. For instance, you might put select id, name from robots where name='Cindi Mayweather'. Since there is one and only one of the illustrious Cindi Mayweather, this query will return one row.

Note that this query is not executed again.

We'll come back to what purpose the union operator serves in a bit. For now, let's talk about the recursive query. The recursive query will make a reference to the CTE name. Wherever you see the CTE name in the query, that is a reference to the working table.

Given a single component that needs to be shipped, we need to fetch the ids of all of its subcomponents, and their subcomponents, etc. To get all of the subcomponents for the component with an id of 4, we use the following query:

with recursive purchase_subcomponents as (
  select id, subcomponent_id from components
  where id = 4

  union

  select components.id, components.subcomponent_id from
  components
  inner join purchase_subcomponents
  on purchase_subcomponents.subcomponent_id = components.id
)
select id from purchase_subcomponents;

That's pretty gross-looking, so let's go through what's happening step by step.

Execution of the cte starts with the non-recursive query. Running that query returns the following:

working_table_1

This is the first working table.

We see that an inner join of the working table and the components table produces the following:

inner_join_1

From this, we get the next working table.

working_table_2

The working table is substituted for purchase_components in the recursive query, with the following result:

inner_join_2

And so on…

working_table_3

inner_join_3

working_table_4

inner_join_4

working_table_5

The recursion ends when the working table has no rows, i.e. when the values for the rows are all null. [equivalent???]

Now we come to what union is doing. The union operator acts on all of the working tables that have been generated, combining them all into one table. To ensure that there are no duplicate rows, you use union all.

Hence the result of running the query is

subcomponents

SQLFiddle here

Boom! Nailed it.

Kinda.

If you recall, up top we have a list of purchases with the associated components and customer information. Well, that component list isn't complete; we have to include subcomponents as well.

with recursive
purchases_made as
(
select p.id as purchase_id,
c.id as component_id,
c.name as component_name,
subcomponent_id,
num_components
from
purchases as p
inner join component_purchases as cp on p.id = cp.purchase_id
inner join components as c on c.id = cp.component_id
order by p.created_at
),
purchase_components as (
  select * from purchases_made

  union

  select  
  purchase_id,
  components.id,
  components.name,
  components.subcomponent_id,
  num_components
  from components
  inner join purchase_components
  on purchase_components.subcomponent_id = components.id
)
select
purchase_id,
array_agg(concat('[', num_components::text, ' x cid:', component_id, ', ', component_name, ']')) as purchase_items,
sum(num_components * components.price) as purchase_total_price,
concat(cu.last_name, ', ', cu.first_name) as customer_name,
concat(cu.email, ', ', cu.phone) as customer_contact_info,
concat(sa.street1, ', ', sa.street2, ', ', sa.postal_code, ', ', sa.city, ', ', sa.region, ', ', sa.country) as customer_address
from purchase_components
inner join components on purchase_components.component_id = components.id
inner join purchases on purchases.id = purchase_components.purchase_id
inner join customers as cu on purchases.customer_id = cu.id
inner join shipping_addresses as sa on sa.customer_id = cu.id
group by purchase_id,
cu.last_name,
cu.first_name,
cu.email,
cu.phone,
sa.street1,
sa.street2,
sa.postal_code,
sa.city,
sa.region,
sa.country;

SQLFiddle here

In this query, we have two CTEs. When you want to use multiple CTEs in a query, you place with at the start of the group and separate each CTE with a comma. Notice that, even though only one of the CTEs is recursive, the recursive modifier is placed directly after with, as opposed to directly before the recursive CTE. Each CTE is executed normally. Another nifty thing about using recursive is that its presence means your CTEs can be in any order, i.e. one CTE can make reference to a CTE that is defined later. If you're wondering how the component_id alias showed up in the final results, remember that aliases are preserved in the results of a union.

To understand how this query works, we first need to understand what the results of the purchases_made query are.

purchases_made

SQLFiddle here

This is our first working table.

Following a process similar to the one in the earlier recursive query, the first inner join results in:

pm_pc_join

The results of the purchase_components CTE can be found in this SQLFiddle.

I would recommend drawing out tables in a notebook to really hammer home some of the ideas presented in this blog post. And feel free to play around in the SQLFiddle examples provided. It's a pretty great tool.

Related articles:


Photo by Eduard Militaru on Unsplash

Leave a Reply

Your email address will not be published. Required fields are marked *