Lecture 9: Joins & subqueries in SQL#

Learning Objectives#

By the end of this lecture, students should understand:

  • What JOIN statements are and why they are used.

  • The different types of JOINs: INNER JOIN, LEFT JOIN, RIGHT JOIN, and FULL OUTER JOIN.

  • How to deal with ambiguous column names in JOINs.

  • How to join more than two tables.

  • What subqueries are and why they are used.

  • The different types of subqueries: single-row, multiple-row, and correlated subqueries.

# load the ipython-sql extension
%load_ext sql
%config SqlMagic.displaylimit = 10
import json
import urllib.parse

with open('data/credentials.json') as f:
    login = json.load(f)
    
username = login['user']
password = urllib.parse.quote(login['password'])
host = login['host']
port = login['port']

Let’s connect to your dvdrental database

%sql postgresql://{username}:{password}@{host}:{port}/dvdrental
'Connected: postgres@dvdrental'

JOIN Statements#

JOIN statements are used in SQL to combine rows from two or more tables based on a related column between them. This is essential for querying data that is spread across multiple tables.

Types of JOINs#

  1. INNER JOIN: Returns records that have matching values in both tables.

  2. LEFT JOIN (or LEFT OUTER JOIN): Returns all records from the left table, and the matched records from the right table. The result is NULL from the right side if there is no match.

  3. RIGHT JOIN (or RIGHT OUTER JOIN): Returns all records from the right table, and the matched records from the left table. The result is NULL from the left side when there is no match.

  4. FULL OUTER JOIN: Returns all records when there is a match in either left or right table. Records not matching in either table will have NULLs.

Example Tables#

For this lecture, we will create two simple tables: student and course.

Creating the student Table#

%%sql
DROP TABLE IF EXISTS student CASCADE;
CREATE TABLE student (
    student_id SERIAL PRIMARY KEY,
    first_name VARCHAR(50),
    last_name VARCHAR(50)
);

INSERT INTO student (first_name, last_name) VALUES
('John', 'Doe'),
('Jane', 'Smith'),
('Alice', 'Johnson'),
('Bob', 'Brown')
RETURNING *;
 * postgresql://postgres:***@localhost:5432/dvdrental
Done.
Done.
4 rows affected.
student_id first_name last_name
1 John Doe
2 Jane Smith
3 Alice Johnson
4 Bob Brown

Creating the course Table#

%%sql

DROP TABLE IF EXISTS course CASCADE;
CREATE TABLE course (
    course_id SERIAL PRIMARY KEY,
    student_id INT,
    course_name VARCHAR(100),
    FOREIGN KEY (student_id) REFERENCES student(student_id)
);

INSERT INTO course (student_id, course_name) VALUES
(1, 'Mathematics'),
(2, 'Science'),
(3, 'History'),
(1, 'Literature')
RETURNING *;
 * postgresql://postgres:***@localhost:5432/dvdrental
Done.
Done.
4 rows affected.
course_id student_id course_name
1 1 Mathematics
2 2 Science
3 3 History
4 1 Literature

Before performing any JOIN operations, let’s display the first few rows of the student and course tables.

%%sql
SELECT * FROM student;
 * postgresql://postgres:***@localhost:5432/dvdrental
4 rows affected.
student_id first_name last_name
1 John Doe
2 Jane Smith
3 Alice Johnson
4 Bob Brown
%%sql
SELECT * FROM course;
 * postgresql://postgres:***@localhost:5432/dvdrental
4 rows affected.
course_id student_id course_name
1 1 Mathematics
2 2 Science
3 3 History
4 1 Literature

INNER JOIN#

The INNER JOINkeyword selects records that have matching values in both tables.

%%sql
SELECT student.student_id, first_name, last_name, course_name
FROM student
INNER JOIN course
ON student.student_id = course.student_id;
 * postgresql://postgres:***@localhost:5432/dvdrental
4 rows affected.
student_id first_name last_name course_name
1 John Doe Mathematics
2 Jane Smith Science
3 Alice Johnson History
1 John Doe Literature

You can also use alias to give a table a temporary name to make the query more readable. In the example below we give student table an alias called s and the course table an alias called c:

%%sql
SELECT *
FROM student AS s
INNER JOIN course AS c 
ON s.student_id = c.student_id;
 * postgresql://postgres:***@localhost:5432/dvdrental
4 rows affected.
student_id first_name last_name course_id student_id_1 course_name
1 John Doe 1 1 Mathematics
2 Jane Smith 2 2 Science
3 Alice Johnson 3 3 History
1 John Doe 4 1 Literature

Dealing with Ambiguous Column Names#

When joining multiple tables, you may encounter ambiguous column names. This happens when two or more tables have columns with the same name. To resolve this, you can use table aliases or fully qualify the column names.

Example with Ambiguous Column Names

Consider the student and course tables both having a column named student_id.

%%sql
SELECT student_id, first_name, last_name, course_name
FROM student
INNER JOIN course ON student.student_id = course.student_id
LIMIT 10;
 * postgresql://postgres:***@localhost:5432/dvdrental
(psycopg2.errors.AmbiguousColumn) column reference "student_id" is ambiguous
LINE 1: SELECT student_id, first_name, last_name, course_name
               ^

[SQL: SELECT student_id, first_name, last_name, course_name
FROM student
INNER JOIN course ON student.student_id = course.student_id
LIMIT 10;]
(Background on this error at: https://sqlalche.me/e/14/f405)
  • Problem: The column student_id is ambiguous because it exists in both tables.

Resolving Ambiguity with Table Aliases

You can use table aliases to differentiate between columns from different tables.

%%sql
SELECT s.student_id, s.first_name, s.last_name, c.course_name
FROM student AS s
INNER JOIN course AS c ON s.student_id = c.student_id
LIMIT 10;
 * postgresql://postgres:***@localhost:5432/dvdrental
4 rows affected.
student_id first_name last_name course_name
1 John Doe Mathematics
2 Jane Smith Science
3 Alice Johnson History
1 John Doe Literature

LEFT JOIN#

The LEFT JOIN keyword returns all records from the left table (student), and the matched records from the right table (course). The result is NULL from the right side if there is no match.

%%sql
SELECT * FROM course;
 * postgresql://postgres:***@localhost:5432/dvdrental
4 rows affected.
course_id student_id course_name
1 1 Mathematics
2 2 Science
3 3 History
4 1 Literature
%%sql
SELECT s.student_id, s.first_name, s.last_name, c.course_name
FROM student AS s
LEFT JOIN course AS c 
ON s.student_id = c.student_id;
 * postgresql://postgres:***@localhost:5432/dvdrental
5 rows affected.
student_id first_name last_name course_name
1 John Doe Mathematics
2 Jane Smith Science
3 Alice Johnson History
1 John Doe Literature
4 Bob Brown None

RIGHT JOIN#

The RIGHT JOIN keyword returns all records from the right table (course), and the matched records from the left table (student). The result is NULL from the left side when there is no match.

%%sql
SELECT s.student_id, s.first_name, s.last_name, c.course_name
FROM student AS s
RIGHT JOIN course AS c 
ON s.student_id = c.student_id;
 * postgresql://postgres:***@localhost:5432/dvdrental
4 rows affected.
student_id first_name last_name course_name
1 John Doe Mathematics
2 Jane Smith Science
3 Alice Johnson History
1 John Doe Literature

FULL OUTER JOIN#

The FULL OUTER JOIN keyword returns all records when there is a match in either left (student) or right (course) table. Records not matching in either table will have NULLs.

%%sql
SELECT s.student_id, s.first_name, s.last_name, c.course_name
FROM student AS s
FULL OUTER JOIN course AS c ON s.student_id = c.student_id
;
 * postgresql://postgres:***@localhost:5432/dvdrental
5 rows affected.
student_id first_name last_name course_name
1 John Doe Mathematics
2 Jane Smith Science
3 Alice Johnson History
1 John Doe Literature
4 Bob Brown None

Joining More Than Two Tables#

You can join more than two tables by chaining multiple JOIN statements. Let’s join the student, course, and enrollment tables to get the student’s name, course name, and enrollment date.

Creating the enrollment Table

%%sql
DROP TABLE IF EXISTS enrollment CASCADE;
CREATE TABLE enrollment (
    enrollment_id SERIAL PRIMARY KEY,
    student_id INT,
    course_id INT,
    enrollment_date DATE,
    FOREIGN KEY (student_id) REFERENCES student(student_id),
    FOREIGN KEY (course_id) REFERENCES course(course_id)
);

INSERT INTO enrollment (student_id, course_id, enrollment_date) VALUES
(1, 1, '2023-01-01'),
(2, 2, '2023-01-02'),
(3, 3, '2023-01-03'),
(1, 4, '2023-01-04');
 * postgresql://postgres:***@localhost:5432/dvdrental
Done.
Done.
4 rows affected.
[]
%%sql
SELECT * FROM enrollment;
 * postgresql://postgres:***@localhost:5432/dvdrental
4 rows affected.
enrollment_id student_id course_id enrollment_date
1 1 1 2023-01-01
2 2 2 2023-01-02
3 3 3 2023-01-03
4 1 4 2023-01-04

Joining student, course, and enrollment Tables

%%sql
SELECT s.first_name, s.last_name, s.student_id, c.course_name, c.course_id, e.enrollment_date
FROM student AS s
INNER JOIN enrollment AS e ON s.student_id = e.student_id
INNER JOIN course AS c ON e.course_id = c.course_id
;
 * postgresql://postgres:***@localhost:5432/dvdrental
4 rows affected.
first_name last_name student_id course_name course_id enrollment_date
John Doe 1 Mathematics 1 2023-01-01
Jane Smith 2 Science 2 2023-01-02
Alice Johnson 3 History 3 2023-01-03
John Doe 1 Literature 4 2023-01-04

Summary#

  • INNER JOIN: Returns only the matching rows.

  • LEFT JOIN: Returns all rows from the left table, and the matched rows from the right table.

  • RIGHT JOIN: Returns all rows from the right table, and the matched rows from the left table.

  • FULL OUTER JOIN: Returns all rows when there is a match in either left or right table.

  • Dealing with Ambiguous Column Names: Use table aliases or fully qualify column names to resolve ambiguity.

  • Joining More Than Two Tables: Chain multiple JOIN statements to combine data from more than two tables.

By understanding and using these JOIN statements and techniques, you can effectively query and combine data from multiple tables in your database.

Subqueries#

A subquery, also known as an inner query or nested query, is a query within another SQL query. Subqueries can be used to perform operations that require multiple steps, such as filtering data based on the results of another query.

Types of Subqueries#

  1. Single-Row Subqueries: Return a single row and are used with comparison operators like =, <, >, etc.

  2. Multiple-Row Subqueries: Return multiple rows and are used with operators like IN, ANY, ALL.

  3. Correlated Subqueries: Refer to columns in the outer query and are evaluated once for each row processed by the outer query.

%%sql
SELECT * FROM payment
LIMIT 10;
 * postgresql://postgres:***@localhost:5432/dvdrental
10 rows affected.
payment_id customer_id staff_id rental_id amount payment_date
17503 341 2 1520 7.99 2007-02-15 22:25:46.996577
17504 341 1 1778 1.99 2007-02-16 17:23:14.996577
17505 341 1 1849 7.99 2007-02-16 22:41:45.996577
17506 341 2 2829 2.99 2007-02-19 19:39:56.996577
17507 341 2 3130 7.99 2007-02-20 17:31:48.996577
17508 341 1 3382 5.99 2007-02-21 12:33:49.996577
17509 342 2 2190 5.99 2007-02-17 23:58:17.996577
17510 342 1 2914 5.99 2007-02-20 02:11:44.996577
17511 342 1 3081 2.99 2007-02-20 13:57:39.996577
17512 343 2 1547 4.99 2007-02-16 00:10:50.996577

Single-Row Subqueries#

Single-row subqueries return a single row and are used with comparison operators like =, <, >, etc.

Example: Find all the rentals with the price that is above the average rental price#

This query looks simple. You might be tempted to try

%%sql
SELECT rental_id 
FROM payment
WHERE amount > AVG(amount);
 * postgresql://postgres:***@localhost:5432/dvdrental
(psycopg2.errors.GroupingError) aggregate functions are not allowed in WHERE
LINE 3: WHERE amount > AVG(amount);
                       ^

[SQL: SELECT rental_id 
FROM payment
WHERE amount > AVG(amount);]
(Background on this error at: https://sqlalche.me/e/14/f405)

This doesn’t work because we cannot use an aggregate function within the WHERE clause.

To answer this question, we would need to querry the data twice:

  1. To retrive the average amount

  2. To retrive the rental_id where the amount is larger than the average amount from the first querry

However, we don’t need to do that in two separate queries and manually take the data coming from the first query and use it in the second. We can use a subquery to do that for us.

First, let’s compute the average amount in payment:

%%sql
SELECT AVG(amount) FROM payment;
 * postgresql://postgres:***@localhost:5432/dvdrental
1 rows affected.
avg
4.2006056453822965

We can use this intermediate information in our original query by embedding the above query in the WHERE clause of the original query:

%%sql
SELECT rental_id, amount
FROM payment
WHERE amount > (SELECT AVG(amount) FROM payment)
ORDER BY amount
LIMIT 10;
 * postgresql://postgres:***@localhost:5432/dvdrental
10 rows affected.
rental_id amount
1780 4.99
3331 4.99
1649 4.99
1678 4.99
2766 4.99
1547 4.99
3026 4.99
1457 4.99
1475 4.99
2275 4.99

Note

A subquery can be used in the SELECT, FROM, WHERE, and HAVING clauses. Subqueries are most commonly used in the WHERE clause.

Example: Return the name of all customers whose rental price is larger than $11.00#

%%sql
SELECT customer_id
FROM payment
WHERE amount > 11.00;
 * postgresql://postgres:***@localhost:5432/dvdrental
8 rows affected.
customer_id
362
204
116
195
237
591
592
13
%%sql
SELECt * FROM customer;
 * postgresql://postgres:***@localhost:5432/dvdrental
599 rows affected.
customer_id store_id first_name last_name email address_id activebool create_date last_update active
524 1 Jared Ely jared.ely@sakilacustomer.org 530 True 2006-02-14 2013-05-26 14:49:45.738000 1
1 1 Mary Smith mary.smith@sakilacustomer.org 5 True 2006-02-14 2013-05-26 14:49:45.738000 1
2 1 Patricia Johnson patricia.johnson@sakilacustomer.org 6 True 2006-02-14 2013-05-26 14:49:45.738000 1
3 1 Linda Williams linda.williams@sakilacustomer.org 7 True 2006-02-14 2013-05-26 14:49:45.738000 1
4 2 Barbara Jones barbara.jones@sakilacustomer.org 8 True 2006-02-14 2013-05-26 14:49:45.738000 1
5 1 Elizabeth Brown elizabeth.brown@sakilacustomer.org 9 True 2006-02-14 2013-05-26 14:49:45.738000 1
6 2 Jennifer Davis jennifer.davis@sakilacustomer.org 10 True 2006-02-14 2013-05-26 14:49:45.738000 1
7 1 Maria Miller maria.miller@sakilacustomer.org 11 True 2006-02-14 2013-05-26 14:49:45.738000 1
8 2 Susan Wilson susan.wilson@sakilacustomer.org 12 True 2006-02-14 2013-05-26 14:49:45.738000 1
9 2 Margaret Moore margaret.moore@sakilacustomer.org 13 True 2006-02-14 2013-05-26 14:49:45.738000 1
599 rows, truncated to displaylimit of 10
%%sql

SELECT first_name, last_name
FROM customer
WHERE customer_id IN (
    SELECT customer_id
    FROM payment
    WHERE amount > 11.00
);
 * postgresql://postgres:***@localhost:5432/dvdrental
8 rows affected.
first_name last_name
Karen Jackson
Victoria Gibson
Vanessa Sims
Rosemary Schmidt
Tanya Gilbert
Nicholas Barfield
Kent Arsenault
Terrance Roush
%%sql
SELECT first_name, last_name, amount
FROM customer
INNER JOIN (SELECT customer_id, amount FROM payment WHERE amount > 11.00) AS p
ON customer.customer_id = p.customer_id;
 * postgresql://postgres:***@localhost:5432/dvdrental
8 rows affected.
first_name last_name amount
Nicholas Barfield 11.99
Rosemary Schmidt 11.99
Victoria Gibson 11.99
Vanessa Sims 11.99
Tanya Gilbert 11.99
Kent Arsenault 11.99
Terrance Roush 11.99
Karen Jackson 11.99

Multiple-Row Subqueries#

Multiple-row subqueries return multiple rows and are used with operators like IN, ANY, ALL.

Example with IN#

Return the name of all customers whose rental price is larger than the average rental price

%%sql

SELECT first_name, last_name
FROM customer
WHERE customer_id IN (
    SELECT customer_id
    FROM payment
    WHERE amount > (SELECT AVG(amount) FROM payment)
    ORDER BY amount DESC
)
LIMIT 10;
 * postgresql://postgres:***@localhost:5432/dvdrental
10 rows affected.
first_name last_name
Vivian Ruiz
Wanda Patterson
Dan Paine
Priscilla Lowe
Guy Brownlee
Alice Stewart
Chris Brothers
Kay Caldwell
Christina Ramirez
Yolanda Weaver

Example with ANY#

The ANY operator is used to compare a value to any value in a list or subquery.

Find customers who have made a payment greater than any payment made by customer with customer_id = 1

%%sql
SELECT amount
FROM payment
WHERE customer_id = 1;
 * postgresql://postgres:***@localhost:5432/dvdrental
30 rows affected.
amount
5.99
0.99
9.99
4.99
4.99
0.99
3.99
4.99
3.99
0.99
30 rows, truncated to displaylimit of 10
%%sql
SELECT first_name, last_name
FROM customer
WHERE customer_id IN (
    SELECT customer_id
    FROM payment
    WHERE amount > ANY (
        SELECT amount
        FROM payment
        WHERE customer_id = 1
    )
);
 * postgresql://postgres:***@localhost:5432/dvdrental
599 rows affected.
first_name last_name
Jared Ely
Mary Smith
Patricia Johnson
Linda Williams
Barbara Jones
Elizabeth Brown
Jennifer Davis
Maria Miller
Susan Wilson
Margaret Moore
599 rows, truncated to displaylimit of 10

Example: Using ALL#

The ALL operator is used to compare a value to all values in a list or subquery.

Find customers who have made a payment greater than all payments made by customer with customer_id = 1

%%sql
SELECT DISTINCT customer_id
FROM payment
WHERE amount > ALL (
    SELECT amount
    FROM payment
    WHERE customer_id = 1
    )
ORDER BY customer_id;
 * postgresql://postgres:***@localhost:5432/dvdrental
101 rows affected.
customer_id
2
3
12
13
21
29
33
45
49
50
101 rows, truncated to displaylimit of 10
%%sql
SELECT first_name, last_name
FROM customer
WHERE customer_id IN (
    SELECT customer_id
    FROM payment
    WHERE amount > ALL (
        SELECT amount
        FROM payment
        WHERE customer_id = 1
    )
);
 * postgresql://postgres:***@localhost:5432/dvdrental
101 rows affected.
first_name last_name
Patricia Johnson
Linda Williams
Nancy Thomas
Karen Jackson
Michelle Clark
Angela Hernandez
Anna Hill
Janet Phillips
Joyce Edwards
Diane Collins
101 rows, truncated to displaylimit of 10

We can also achive the same result by filter the amount by the maximum amount paid by customer_id = 1

%%sql
SELECT first_name, last_name
FROM customer
WHERE customer_id IN (
    SELECT customer_id
    FROM payment
    WHERE amount > (
        SELECT MAX(amount)
        FROM payment
        WHERE customer_id = 1
    )
);
 * postgresql://postgres:***@localhost:5432/dvdrental
101 rows affected.
first_name last_name
Patricia Johnson
Linda Williams
Nancy Thomas
Karen Jackson
Michelle Clark
Angela Hernandez
Anna Hill
Janet Phillips
Joyce Edwards
Diane Collins
101 rows, truncated to displaylimit of 10

Correlated Subqueries#

Correlated subqueries refer to columns in the outer query and are evaluated once for each row processed by the outer query.

Example: Find payments where the amount is greater than the average payment amount for the same customer.#

%%timeit
%%sql
SELECT payment_id, customer_id, amount, payment_date
FROM payment p1
WHERE amount > (
    SELECT AVG(amount)
    FROM payment p2
    WHERE p2.customer_id = p1.customer_id
);
 * postgresql://postgres:***@localhost:5432/dvdrental
7213 rows affected.
 * postgresql://postgres:***@localhost:5432/dvdrental
7213 rows affected.
 * postgresql://postgres:***@localhost:5432/dvdrental
7213 rows affected.
 * postgresql://postgres:***@localhost:5432/dvdrental
7213 rows affected.
 * postgresql://postgres:***@localhost:5432/dvdrental
7213 rows affected.
 * postgresql://postgres:***@localhost:5432/dvdrental
7213 rows affected.
 * postgresql://postgres:***@localhost:5432/dvdrental
7213 rows affected.
 * postgresql://postgres:***@localhost:5432/dvdrental
7213 rows affected.
 * postgresql://postgres:***@localhost:5432/dvdrental
7213 rows affected.
 * postgresql://postgres:***@localhost:5432/dvdrental
7213 rows affected.
 * postgresql://postgres:***@localhost:5432/dvdrental
7213 rows affected.
 * postgresql://postgres:***@localhost:5432/dvdrental
7213 rows affected.
 * postgresql://postgres:***@localhost:5432/dvdrental
7213 rows affected.
 * postgresql://postgres:***@localhost:5432/dvdrental
7213 rows affected.
 * postgresql://postgres:***@localhost:5432/dvdrental
7213 rows affected.
 * postgresql://postgres:***@localhost:5432/dvdrental
7213 rows affected.
 * postgresql://postgres:***@localhost:5432/dvdrental
7213 rows affected.
 * postgresql://postgres:***@localhost:5432/dvdrental
7213 rows affected.
 * postgresql://postgres:***@localhost:5432/dvdrental
7213 rows affected.
 * postgresql://postgres:***@localhost:5432/dvdrental
7213 rows affected.
 * postgresql://postgres:***@localhost:5432/dvdrental
7213 rows affected.
 * postgresql://postgres:***@localhost:5432/dvdrental
7213 rows affected.
 * postgresql://postgres:***@localhost:5432/dvdrental
7213 rows affected.
 * postgresql://postgres:***@localhost:5432/dvdrental
7213 rows affected.
 * postgresql://postgres:***@localhost:5432/dvdrental
7213 rows affected.
 * postgresql://postgres:***@localhost:5432/dvdrental
7213 rows affected.
 * postgresql://postgres:***@localhost:5432/dvdrental
7213 rows affected.
 * postgresql://postgres:***@localhost:5432/dvdrental
7213 rows affected.
 * postgresql://postgres:***@localhost:5432/dvdrental
7213 rows affected.
 * postgresql://postgres:***@localhost:5432/dvdrental
7213 rows affected.
 * postgresql://postgres:***@localhost:5432/dvdrental
7213 rows affected.
 * postgresql://postgres:***@localhost:5432/dvdrental
7213 rows affected.
 * postgresql://postgres:***@localhost:5432/dvdrental
7213 rows affected.
 * postgresql://postgres:***@localhost:5432/dvdrental
7213 rows affected.
 * postgresql://postgres:***@localhost:5432/dvdrental
7213 rows affected.
 * postgresql://postgres:***@localhost:5432/dvdrental
7213 rows affected.
 * postgresql://postgres:***@localhost:5432/dvdrental
7213 rows affected.
 * postgresql://postgres:***@localhost:5432/dvdrental
7213 rows affected.
 * postgresql://postgres:***@localhost:5432/dvdrental
7213 rows affected.
 * postgresql://postgres:***@localhost:5432/dvdrental
7213 rows affected.
 * postgresql://postgres:***@localhost:5432/dvdrental
7213 rows affected.
 * postgresql://postgres:***@localhost:5432/dvdrental
7213 rows affected.
 * postgresql://postgres:***@localhost:5432/dvdrental
7213 rows affected.
 * postgresql://postgres:***@localhost:5432/dvdrental
7213 rows affected.
 * postgresql://postgres:***@localhost:5432/dvdrental
7213 rows affected.
 * postgresql://postgres:***@localhost:5432/dvdrental
7213 rows affected.
 * postgresql://postgres:***@localhost:5432/dvdrental
7213 rows affected.
 * postgresql://postgres:***@localhost:5432/dvdrental
7213 rows affected.
 * postgresql://postgres:***@localhost:5432/dvdrental
7213 rows affected.
 * postgresql://postgres:***@localhost:5432/dvdrental
7213 rows affected.
 * postgresql://postgres:***@localhost:5432/dvdrental
7213 rows affected.
 * postgresql://postgres:***@localhost:5432/dvdrental
7213 rows affected.
 * postgresql://postgres:***@localhost:5432/dvdrental
7213 rows affected.
 * postgresql://postgres:***@localhost:5432/dvdrental
7213 rows affected.
 * postgresql://postgres:***@localhost:5432/dvdrental
7213 rows affected.
 * postgresql://postgres:***@localhost:5432/dvdrental
7213 rows affected.
 * postgresql://postgres:***@localhost:5432/dvdrental
7213 rows affected.
 * postgresql://postgres:***@localhost:5432/dvdrental
7213 rows affected.
 * postgresql://postgres:***@localhost:5432/dvdrental
7213 rows affected.
 * postgresql://postgres:***@localhost:5432/dvdrental
7213 rows affected.
 * postgresql://postgres:***@localhost:5432/dvdrental
7213 rows affected.
 * postgresql://postgres:***@localhost:5432/dvdrental
7213 rows affected.
 * postgresql://postgres:***@localhost:5432/dvdrental
7213 rows affected.
 * postgresql://postgres:***@localhost:5432/dvdrental
7213 rows affected.
 * postgresql://postgres:***@localhost:5432/dvdrental
7213 rows affected.
 * postgresql://postgres:***@localhost:5432/dvdrental
7213 rows affected.
 * postgresql://postgres:***@localhost:5432/dvdrental
7213 rows affected.
 * postgresql://postgres:***@localhost:5432/dvdrental
7213 rows affected.
 * postgresql://postgres:***@localhost:5432/dvdrental
7213 rows affected.
 * postgresql://postgres:***@localhost:5432/dvdrental
7213 rows affected.
 * postgresql://postgres:***@localhost:5432/dvdrental
7213 rows affected.
 * postgresql://postgres:***@localhost:5432/dvdrental
7213 rows affected.
 * postgresql://postgres:***@localhost:5432/dvdrental
7213 rows affected.
 * postgresql://postgres:***@localhost:5432/dvdrental
7213 rows affected.
 * postgresql://postgres:***@localhost:5432/dvdrental
7213 rows affected.
 * postgresql://postgres:***@localhost:5432/dvdrental
7213 rows affected.
 * postgresql://postgres:***@localhost:5432/dvdrental
7213 rows affected.
 * postgresql://postgres:***@localhost:5432/dvdrental
7213 rows affected.
 * postgresql://postgres:***@localhost:5432/dvdrental
7213 rows affected.
 * postgresql://postgres:***@localhost:5432/dvdrental
7213 rows affected.
 * postgresql://postgres:***@localhost:5432/dvdrental
7213 rows affected.
69.7 ms ± 1.41 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

The subquery is correlated because it references the customer_id from the outer query’s payment table.

You can also rewrite this subquerries as join

%%sql
SELECT p1.payment_id, p1.customer_id, p1.amount, p1.payment_date
FROM payment p1
JOIN (
    SELECT customer_id, AVG(amount) AS avg_amount
    FROM payment
    GROUP BY customer_id
) p2 ON p1.customer_id = p2.customer_id
WHERE p1.amount > p2.avg_amount;
 * postgresql://postgres:***@localhost:5432/dvdrental
7213 rows affected.
payment_id customer_id amount payment_date
17503 341 7.99 2007-02-15 22:25:46.996577
17505 341 7.99 2007-02-16 22:41:45.996577
17507 341 7.99 2007-02-20 17:31:48.996577
17508 341 5.99 2007-02-21 12:33:49.996577
17509 342 5.99 2007-02-17 23:58:17.996577
17510 342 5.99 2007-02-20 02:11:44.996577
17512 343 4.99 2007-02-16 00:10:50.996577
17513 343 6.99 2007-02-16 01:15:33.996577
17516 343 6.99 2007-02-18 18:26:38.996577
17517 343 8.99 2007-02-20 07:03:29.996577
7213 rows, truncated to displaylimit of 10

Summary#

  • Single-Row Subqueries: Return a single row and are used with comparison operators.

  • Multiple-Row Subqueries: Return multiple rows and are used with operators like IN, ANY, ALL.

  • Correlated Subqueries: Refer to columns in the outer query and are evaluated once for each row processed by the outer query.