Worksheet 5: Transactions, joins, subquerries#
import numpy as np
import pandas as pd
%load_ext sql
%config SqlMagic.displaylimit = 20
%config SqlMagic.autolimit = 30
Make sure to edit the credentials.json
file with your details
# load the credentials from the credentials.json file
import json
import urllib.parse
with open('data/credentials.json') as f:
login = json.load(f)
username = login['user']
password = urllib.parse.quote(login['password'])
host = login['host']
port = login['port']
%sql postgresql://{username}:{password}@{host}:{port}/dvdrental
Exercise 1: Transactions in SQL#
rubric: {accuracy = 10}
In this exercise, you will learn how to use transactions in SQL. Transactions allow you to execute a series of SQL statements as a single unit of work. You will also learn how to declare variables, use COMMIT
, and ROLLBACK
.
Scenario#
You are working with a bookstore database that contains the following tables:
books: Contains information about books.
book_id
title
author
price
stock
sales: Contains information about book sales.
sale_id
book_id
quantity
sale_date
%%sql
-- Step 1: Drop existing tables if they exist
DROP TABLE IF EXISTS books CASCADE;
DROP TABLE IF EXISTS sales CASCADE;
-- Step 2: Create the books table
CREATE TABLE books (
book_id SERIAL PRIMARY KEY,
title VARCHAR(255),
author VARCHAR(255),
price DECIMAL(10, 2),
stock INT
);
-- Step 3: Create the sales table
CREATE TABLE sales (
sale_id SERIAL PRIMARY KEY,
book_id INT REFERENCES books(book_id),
quantity INT,
sale_date TIMESTAMP
);
-- Step 4: Insert sample data into books table
INSERT INTO books (title, author, price, stock) VALUES
('Book A', 'Author A', 10.00, 100),
('Book B', 'Author B', 15.00, 50);
-- Step 5: Insert sample data into sales table
INSERT INTO sales (book_id, quantity, sale_date) VALUES
(1, 10, '2021-01-01 00:00:00'::timestamp + (RANDOM() * (INTERVAL '1 day'))),
(2, 5, '2021-01-02 00:00:00'::timestamp + (RANDOM() * (INTERVAL '1 day')));
%%sql
SELECT * FROM books;
%%sql
SELECT * FROM sales;
Task#
Imagine a customer will order Book A
Create a transaction that performs the following operations:
Declare a variable to store the
current_stock
andorder_quantity
with INT data type. Set order_quantity to 20If the
current_stock
is larger or equal than theorder_quantity
then do the following:Insert a new sale record into the
sales
table. Use NOW() as the sale_dateUpdate the stock of the sold book in the
books
table.Commit the transaction.
Otherwise, roll back the transaction.
# BEGIN SOLUTION
%%sql
-- Step 5: Begin the transaction
BEGIN;
-- Step 6: Declare variables to store the current stock and order quantity
DO $$
DECLARE
current_stock INT;
order_quantity INT := 20; -- Set the order quantity here
BEGIN
-- Step 7: Check if there is enough stock for the sale
SELECT stock INTO current_stock
FROM books
WHERE book_id = 1;
IF current_stock >= order_quantity THEN
-- Step 8: Insert a new sale record into the sales table
INSERT INTO sales (book_id, quantity, sale_date)
VALUES (1, order_quantity, NOW());
-- Step 9: Update the stock of the sold book in the books table
UPDATE books
SET stock = stock - order_quantity
WHERE book_id = 1;
-- Commit the transaction
COMMIT;
ELSE
-- Rollback the transaction due to insufficient stock
ROLLBACK;
END IF;
END $$;
# END SOLUTION
Check if the transaction was successful
%%sql
SELECT * FROM books;
%%sql
SELECT * FROM sales;
Now try to set the order quantity to 2000 and run the code, it should roll back the transaction and nothing changes
Exercise 2: Using JOINs with the DVD Rental Database#
rubric: {accuracy = 30}
You are working with the DVD Rental database and need to retrieve information about customers, their rentals, and the films they have rented. You will use different types of JOINs to combine data from multiple tables.
Tasks#
1. Inner Join#
Retrieve a list of customers and the titles of the films they have rented. For this task, you might need to INNER JOIN
multiple tables (customer, rental, inventory, film)
Expected output:
first_name |
last_name |
title |
---|---|---|
Tommy |
Collazo |
Freaky Pocus |
Manuel |
Murrell |
Graduate Lord |
Andrew |
Purdy |
Love Suicides |
… |
# BEGIN SOLUTION
%%sql
SELECT
c.first_name,
c.last_name,
f.title
FROM
customer c
INNER JOIN
rental r ON c.customer_id = r.customer_id
INNER JOIN
inventory i ON r.inventory_id = i.inventory_id
INNER JOIN
film f ON i.film_id = f.film_id;
# END SOLUTION
2. Left Join#
Retrieve Customer Payment Information: Use a LEFT JOIN
to retrieve a list of all customers and their payment amounts. Include customers who have not made any payments.
You will need to use tables customer
and payment
for this question
Expected output:
customer_id |
first_name |
last_name |
payment_amount |
---|---|---|---|
341 |
Peter |
Menard |
7.99 |
341 |
Peter |
Menard |
1.99 |
341 |
Peter |
Menard |
7.99 |
… |
# BEGIN SOLUTION
%%sql
SELECT
c.customer_id,
c.first_name,
c.last_name,
p.amount AS payment_amount
FROM
customer c
LEFT JOIN
payment p ON c.customer_id = p.customer_id
ORDER BY
c.customer_id;
# END SOLUTION
3. Inner join#
Retrieve a list of all customers and staff members, rental_id, and rental_date. Sort the result in descending order by rental_date
You will need to use 3 tables: customer
, rental
, staff
Expected output:
customer_first_name |
customer_last_name |
staff_first_name |
staff_last_name |
rental_date |
rental_id |
---|---|---|---|---|---|
Louis |
Leone |
Jon |
Stephens |
2006-02-14 15:16:03 |
11739 |
Neil |
Renner |
Mike |
Hillyer |
2006-02-14 15:16:03 |
14616 |
Natalie |
Meyer |
Jon |
Stephens |
2006-02-14 15:16:03 |
11676 |
… |
# BEGIN SOLUTION
%%sql
SELECT
c.first_name AS customer_first_name,
c.last_name AS customer_last_name,
s.first_name AS staff_first_name,
s.last_name AS staff_last_name,
r.rental_date,
r.rental_id
FROM
customer c
INNER JOIN
rental r ON c.customer_id = r.customer_id
INNER JOIN
staff s ON r.staff_id = s.staff_id
ORDER BY
r.rental_date DESC;
# END SOLUTION
Exercise 3: Using GROUP BY and JOIN with the DVD Rental Database#
rubric: {accuracy = 10}
In this exercise, you will learn how to use the GROUP BY
clause in combination with JOIN
to query data from the DVD Rental database.
Scenario#
You are working with the DVD Rental database and need to retrieve information about the total amount spent by each customer on rentals. Specifically, you want to list each customer along with the total amount they have spent.
Task#
Retrieve Customer Spending Information: Use a
JOIN
to combine data from thecustomer
andpayment
tables.Group By Customer: Use the
GROUP BY
clause to calculate the total amount spent by each customer.
# BEGIN SOLUTION
%%sql
SELECT
c.customer_id,
c.first_name,
c.last_name,
SUM(p.amount) AS total_spent
FROM
customer c
JOIN
payment p ON c.customer_id = p.customer_id
GROUP BY
c.customer_id, c.first_name, c.last_name
ORDER BY
total_spent DESC;
# END SOLUTION
Exercise 4: Using JOIN, GROUP BY, WHERE, and HAVING with the DVD Rental Database#
rubric: {accuracy = 20}
Scenario#
You are working with the DVD Rental database and need to retrieve information about customers who have spent more than a certain amount on rentals. Specifically, you want to list each customer along with the total amount they have spent between ‘2007-02-01’ AND ‘2007-03-01’, but only include customers who have spent more than $30.
Task#
Retrieve Customer Spending Information: Use a
JOIN
to combine data from thecustomer
andpayment
tables.Filter by Date: Use the
WHERE
clause to filterpayments
made between ‘2007-02-01’ and ‘2007-03-01’.Group By Customer: Use the
GROUP BY
clause to calculate the total amount spent by each customer.Filter by Total Spent: Use the
HAVING
clause to include only customers who have spent more than $30.Order by total spent in descending order
Expected output:
customer_id |
first_name |
last_name |
total_spent |
---|---|---|---|
454 |
Alex |
Gresham |
52.90 |
178 |
Marion |
Snyder |
44.92 |
176 |
June |
Carroll |
42.92 |
26 |
Jessica |
Hall |
41.93 |
… |
# BEGIN SOLUTION
%%sql
SELECT
c.customer_id,
c.first_name,
c.last_name,
SUM(p.amount) AS total_spent
FROM
customer c
JOIN
payment p ON c.customer_id = p.customer_id
WHERE
p.payment_date BETWEEN '2007-02-01' AND '2007-03-01'
GROUP BY
c.customer_id, c.first_name, c.last_name
HAVING
SUM(p.amount) > 30
ORDER BY
total_spent DESC;
# END SOLUTION
Exercise 5: Using Subqueries with the DVD Rental Database#
rubric: {accuracy = 20}
5.1: Find Customers with No Rentals in a Specific Period#
Task#
Retrieve a list of customers who did NOT make any rentals within the specified date range (‘2007-02-01’ AND ‘2007-03-01’). Sort the result by customer_id
You will need to use customer
and rental
table
Hint: Use NOT IN
Expected output:
customer_id |
first_name |
last_name |
---|---|---|
1 |
Mary |
Smith |
2 |
Patricia |
Johnson |
3 |
Linda |
Williams |
… |
# BEGIN SOLUTION
%%sql
SELECT
c.customer_id,
c.first_name,
c.last_name
FROM
customer c
WHERE
c.customer_id NOT IN (
SELECT
r.customer_id
FROM
rental r
WHERE
r.rental_date BETWEEN '2007-02-01' AND '2007-03-01'
)
ORDER BY
c.customer_id;
# END SOLUTION
5.2: Find Customers with More Than Average Spending#
Task#
Retrieve a list of customers who have total spending less than the average total amount spent by all customers. Sort the result by total spending in descending order.
You will need to use the customer
and payment
table
Expected output:
customer_id |
first_name |
last_name |
total_spent |
---|---|---|---|
44 |
Marie |
Turner |
101.79 |
466 |
Leo |
Ebert |
101.78 |
100 |
Robin |
Hayes |
101.77 |
593 |
Rene |
Mcalister |
101.76 |
… |
# BEGIN SOLUTION
%%sql
SELECT
c.customer_id,
c.first_name,
c.last_name,
SUM(p.amount) AS total_spent
FROM
customer c
JOIN
payment p ON c.customer_id = p.customer_id
GROUP BY
c.customer_id, c.first_name, c.last_name
HAVING
SUM(p.amount) < (
SELECT
AVG(total_spent)
FROM
(
SELECT
SUM(p.amount) AS total_spent
FROM
payment p
GROUP BY
p.customer_id
) AS avg_spending
)
ORDER BY
total_spent DESC;
# END SOLUTION
Submission instructions#
{rubric: mechanics = 5}
Make sure the notebook can run from top to bottom without any error. Restart the kernel and run all cells.
Commit and push your notebook to the github repo
Double check your notebook is rendered properly on Github and you can see all the outputs clearly
Submit URL of your Github repo to Moodle