Worksheet 5: Transactions, joins, subquerries#

import numpy as np
import pandas as pd
%load_ext sql
%config SqlMagic.displaylimit = 20
%config SqlMagic.autolimit = 30

Make sure to edit the credentials.json file with your details

# load the credentials from the credentials.json file
import json
import urllib.parse

with open('data/credentials.json') as f:
    login = json.load(f)
    
username = login['user']
password = urllib.parse.quote(login['password'])
host = login['host']
port = login['port']
%sql postgresql://{username}:{password}@{host}:{port}/dvdrental

Exercise 1: Transactions in SQL#

rubric: {accuracy = 10}

In this exercise, you will learn how to use transactions in SQL. Transactions allow you to execute a series of SQL statements as a single unit of work. You will also learn how to declare variables, use COMMIT, and ROLLBACK.

Scenario#

You are working with a bookstore database that contains the following tables:

  • books: Contains information about books.

    • book_id

    • title

    • author

    • price

    • stock

  • sales: Contains information about book sales.

    • sale_id

    • book_id

    • quantity

    • sale_date

%%sql
-- Step 1: Drop existing tables if they exist
DROP TABLE IF EXISTS books CASCADE;
DROP TABLE IF EXISTS sales CASCADE;

-- Step 2: Create the books table
CREATE TABLE books (
    book_id SERIAL PRIMARY KEY,
    title VARCHAR(255),
    author VARCHAR(255),
    price DECIMAL(10, 2),
    stock INT
);

-- Step 3: Create the sales table
CREATE TABLE sales (
    sale_id SERIAL PRIMARY KEY,
    book_id INT REFERENCES books(book_id),
    quantity INT,
    sale_date TIMESTAMP
);

-- Step 4: Insert sample data into books table
INSERT INTO books (title, author, price, stock) VALUES
('Book A', 'Author A', 10.00, 100),
('Book B', 'Author B', 15.00, 50);

-- Step 5: Insert sample data into sales table
INSERT INTO sales (book_id, quantity, sale_date) VALUES
(1, 10, '2021-01-01 00:00:00'::timestamp + (RANDOM() * (INTERVAL '1 day'))),
(2, 5, '2021-01-02 00:00:00'::timestamp + (RANDOM() * (INTERVAL '1 day')));
%%sql
SELECT * FROM books;
%%sql
SELECT * FROM sales;

Task#

Imagine a customer will order Book A

Create a transaction that performs the following operations:

  1. Declare a variable to store the current_stock and order_quantity with INT data type. Set order_quantity to 20

  2. If the current_stock is larger or equal than the order_quantity then do the following:

    1. Insert a new sale record into the sales table. Use NOW() as the sale_date

    2. Update the stock of the sold book in the books table.

    3. Commit the transaction.

  3. Otherwise, roll back the transaction.

# BEGIN SOLUTION
%%sql

-- Step 5: Begin the transaction
BEGIN;

-- Step 6: Declare variables to store the current stock and order quantity
DO $$
DECLARE
    current_stock INT;
    order_quantity INT := 20;  -- Set the order quantity here
BEGIN

-- Step 7: Check if there is enough stock for the sale
SELECT stock INTO current_stock
FROM books
WHERE book_id = 1;

IF current_stock >= order_quantity THEN
    -- Step 8: Insert a new sale record into the sales table
    INSERT INTO sales (book_id, quantity, sale_date)
    VALUES (1, order_quantity, NOW());

    -- Step 9: Update the stock of the sold book in the books table
    UPDATE books
    SET stock = stock - order_quantity
    WHERE book_id = 1;

    -- Commit the transaction
    COMMIT;
ELSE
    -- Rollback the transaction due to insufficient stock
    ROLLBACK;
END IF;

END $$;
# END SOLUTION

Check if the transaction was successful

%%sql
SELECT * FROM books;
%%sql
SELECT * FROM sales;

Now try to set the order quantity to 2000 and run the code, it should roll back the transaction and nothing changes

Exercise 2: Using JOINs with the DVD Rental Database#

rubric: {accuracy = 30}

You are working with the DVD Rental database and need to retrieve information about customers, their rentals, and the films they have rented. You will use different types of JOINs to combine data from multiple tables.

Tasks#

1. Inner Join#

Retrieve a list of customers and the titles of the films they have rented. For this task, you might need to INNER JOIN multiple tables (customer, rental, inventory, film)

Expected output:

first_name

last_name

title

Tommy

Collazo

Freaky Pocus

Manuel

Murrell

Graduate Lord

Andrew

Purdy

Love Suicides

# BEGIN SOLUTION
%%sql
SELECT 
    c.first_name, 
    c.last_name, 
    f.title
FROM 
    customer c
INNER JOIN 
    rental r ON c.customer_id = r.customer_id
INNER JOIN 
    inventory i ON r.inventory_id = i.inventory_id
INNER JOIN 
    film f ON i.film_id = f.film_id;
# END SOLUTION

2. Left Join#

Retrieve Customer Payment Information: Use a LEFT JOIN to retrieve a list of all customers and their payment amounts. Include customers who have not made any payments.

You will need to use tables customer and payment for this question

Expected output:

customer_id

first_name

last_name

payment_amount

341

Peter

Menard

7.99

341

Peter

Menard

1.99

341

Peter

Menard

7.99

# BEGIN SOLUTION
%%sql
SELECT 
    c.customer_id,
    c.first_name, 
    c.last_name, 
    p.amount AS payment_amount
FROM 
    customer c
LEFT JOIN 
    payment p ON c.customer_id = p.customer_id
ORDER BY 
    c.customer_id;
# END SOLUTION

3. Inner join#

Retrieve a list of all customers and staff members, rental_id, and rental_date. Sort the result in descending order by rental_date

You will need to use 3 tables: customer, rental, staff

Expected output:

customer_first_name

customer_last_name

staff_first_name

staff_last_name

rental_date

rental_id

Louis

Leone

Jon

Stephens

2006-02-14 15:16:03

11739

Neil

Renner

Mike

Hillyer

2006-02-14 15:16:03

14616

Natalie

Meyer

Jon

Stephens

2006-02-14 15:16:03

11676

# BEGIN SOLUTION
%%sql
SELECT 
    c.first_name AS customer_first_name, 
    c.last_name AS customer_last_name, 
    s.first_name AS staff_first_name, 
    s.last_name AS staff_last_name,
    r.rental_date,
    r.rental_id
FROM 
    customer c
INNER JOIN 
    rental r ON c.customer_id = r.customer_id
INNER JOIN 
    staff s ON r.staff_id = s.staff_id
ORDER BY 
    r.rental_date DESC;
# END SOLUTION

Exercise 3: Using GROUP BY and JOIN with the DVD Rental Database#

rubric: {accuracy = 10}

In this exercise, you will learn how to use the GROUP BY clause in combination with JOIN to query data from the DVD Rental database.

Scenario#

You are working with the DVD Rental database and need to retrieve information about the total amount spent by each customer on rentals. Specifically, you want to list each customer along with the total amount they have spent.

Task#

  1. Retrieve Customer Spending Information: Use a JOIN to combine data from the customer and payment tables.

  2. Group By Customer: Use the GROUP BY clause to calculate the total amount spent by each customer.

# BEGIN SOLUTION
%%sql
SELECT 
    c.customer_id,
    c.first_name, 
    c.last_name, 
    SUM(p.amount) AS total_spent
FROM 
    customer c
JOIN 
    payment p ON c.customer_id = p.customer_id
GROUP BY 
    c.customer_id, c.first_name, c.last_name
ORDER BY 
    total_spent DESC;
# END SOLUTION

Exercise 4: Using JOIN, GROUP BY, WHERE, and HAVING with the DVD Rental Database#

rubric: {accuracy = 20}

Scenario#

You are working with the DVD Rental database and need to retrieve information about customers who have spent more than a certain amount on rentals. Specifically, you want to list each customer along with the total amount they have spent between ‘2007-02-01’ AND ‘2007-03-01’, but only include customers who have spent more than $30.

Task#

  • Retrieve Customer Spending Information: Use a JOIN to combine data from the customer and payment tables.

  • Filter by Date: Use the WHERE clause to filter payments made between ‘2007-02-01’ and ‘2007-03-01’.

  • Group By Customer: Use the GROUP BY clause to calculate the total amount spent by each customer.

  • Filter by Total Spent: Use the HAVING clause to include only customers who have spent more than $30.

  • Order by total spent in descending order

Expected output:

customer_id

first_name

last_name

total_spent

454

Alex

Gresham

52.90

178

Marion

Snyder

44.92

176

June

Carroll

42.92

26

Jessica

Hall

41.93

# BEGIN SOLUTION
%%sql
SELECT 
    c.customer_id,
    c.first_name, 
    c.last_name, 
    SUM(p.amount) AS total_spent
FROM 
    customer c
JOIN 
    payment p ON c.customer_id = p.customer_id
WHERE 
    p.payment_date BETWEEN '2007-02-01' AND '2007-03-01'
GROUP BY 
    c.customer_id, c.first_name, c.last_name
HAVING 
    SUM(p.amount) > 30
ORDER BY 
    total_spent DESC;
# END SOLUTION

Exercise 5: Using Subqueries with the DVD Rental Database#

rubric: {accuracy = 20}

5.1: Find Customers with No Rentals in a Specific Period#

Task#

Retrieve a list of customers who did NOT make any rentals within the specified date range (‘2007-02-01’ AND ‘2007-03-01’). Sort the result by customer_id

You will need to use customer and rental table

Hint: Use NOT IN

Expected output:

customer_id

first_name

last_name

1

Mary

Smith

2

Patricia

Johnson

3

Linda

Williams

# BEGIN SOLUTION
%%sql
SELECT 
    c.customer_id,
    c.first_name, 
    c.last_name
FROM 
    customer c
WHERE 
    c.customer_id NOT IN (
        SELECT 
            r.customer_id
        FROM 
            rental r
        WHERE 
            r.rental_date BETWEEN '2007-02-01' AND '2007-03-01'
    )
ORDER BY 
    c.customer_id;
# END SOLUTION

5.2: Find Customers with More Than Average Spending#

Task#

Retrieve a list of customers who have total spending less than the average total amount spent by all customers. Sort the result by total spending in descending order.

You will need to use the customer and payment table

Expected output:

customer_id

first_name

last_name

total_spent

44

Marie

Turner

101.79

466

Leo

Ebert

101.78

100

Robin

Hayes

101.77

593

Rene

Mcalister

101.76

# BEGIN SOLUTION
%%sql
SELECT 
    c.customer_id,
    c.first_name, 
    c.last_name, 
    SUM(p.amount) AS total_spent
FROM 
    customer c
JOIN 
    payment p ON c.customer_id = p.customer_id
GROUP BY 
    c.customer_id, c.first_name, c.last_name
HAVING 
    SUM(p.amount) < (
        SELECT 
            AVG(total_spent)
        FROM 
            (
                SELECT 
                    SUM(p.amount) AS total_spent
                FROM 
                    payment p
                GROUP BY 
                    p.customer_id
            ) AS avg_spending
    )
ORDER BY 
    total_spent DESC;
# END SOLUTION

Submission instructions#

{rubric: mechanics = 5}

  • Make sure the notebook can run from top to bottom without any error. Restart the kernel and run all cells.

  • Commit and push your notebook to the github repo

  • Double check your notebook is rendered properly on Github and you can see all the outputs clearly

  • Submit URL of your Github repo to Moodle