Worksheet 4: PostgreSQL data manipulation#

import numpy as np
import pandas as pd
%load_ext sql
%config SqlMagic.displaylimit = 20
%config SqlMagic.autolimit = 30
# load the credentials from the credentials.json file
import json
import urllib.parse

with open('data/credentials.json') as f:
    login = json.load(f)
    
username = login['user']
password = urllib.parse.quote(login['password'])
host = login['host']
port = login['port']
%sql postgresql://{username}:{password}@{host}:{port}/dvdrental
'Connected: postgres@dvdrental'
%%sql
DROP TABLE IF EXISTS employees;
DROP TABLE IF EXISTS departments;
 * postgresql://postgres:***@localhost:5432/dvdrental
Done.
Done.
[]

Exercise 1: Create Table with Constraints#

Task#

Create a table named students with the following columns and constraints:

  • id: An auto-incrementing primary key.

  • first_name: A non-nullable string with a maximum length of 50 characters.

  • last_name: A non-nullable string with a maximum length of 50 characters.

  • email: A unique string with a maximum length of 100 characters.

  • age: An integer that must be greater than or equal to 18.

  • department: A string with a maximum length of 50 characters, defaulting to ‘General’.

Instructions#

  1. Write the SQL code to create the students table with the specified constraints.

  2. Write code to test each constraint:

    • Test the NOT NULL constraint on first_name and last_name.

    • Test the UNIQUE constraint on email.

    • Test the CHECK constraint on age.

    • Test the DEFAULT constraint on department.

Solution#

# BEGIN SOLUTION
%%sql
CREATE TABLE students (
    id SERIAL PRIMARY KEY,
    first_name VARCHAR(50) NOT NULL,
    last_name VARCHAR(50) NOT NULL,
    email VARCHAR(100) UNIQUE,
    age INT CHECK (age >= 18),
    department VARCHAR(50) DEFAULT 'General'
);
# END SOLUTION
 * postgresql://postgres:***@localhost:5432/dvdrental
Done.
[]

Testing the Constraints#

  1. Test the NOT NULL Constraint:

    # Attempt to insert a row with a NULL first_name
    %%sql
    INSERT INTO students (first_name, last_name, email, age) VALUES
        (NULL, 'Doe', 'john.doe@example.com', 20);
    
    # Attempt to insert a row with a NULL last_name
    %%sql
    INSERT INTO students (first_name, last_name, email, age) VALUES
        ('John', NULL, 'john.doe@example.com', 20);
    
  2. Test the UNIQUE Constraint:

    # Insert a row with a unique email
    %%sql
    INSERT INTO students (first_name, last_name, email, age) VALUES
        ('John', 'Doe', 'john.doe@example.com', 20);
    
    # Attempt to insert another row with the same email
    %%sql
    INSERT INTO students (first_name, last_name, email, age) VALUES
        ('Jane', 'Smith', 'john.doe@example.com', 22);
    
  3. Test the CHECK Constraint:

    # Attempt to insert a row with an age less than 18
    %%sql
    INSERT INTO students (first_name, last_name, email, age) VALUES
        ('John', 'Doe', 'john.doe@example.com', 17);
    
  4. Test the DEFAULT Constraint:

    # Insert a row without specifying the department column
    %%sql
    INSERT INTO students (first_name, last_name, email, age) VALUES
        ('John', 'Doe', 'john.doe@example.com', 20);
    
    # Query the table to verify the default value
    %%sql
    SELECT * FROM students;
    

By following these steps, you can verify that the constraints on the students table are enforced correctly.

Exercise 2: Add Foreign Key Constraint with Cascade#

Task#

Extend the previous exercise by creating a new table named courses and adding a foreign key constraint to the students table. The courses table should have the following columns:

  • id: An auto-incrementing primary key.

  • name: A non-nullable string with a maximum length of 100 characters.

Modify the students table to include a foreign key column course_id that references the id column in the courses table. Use the ON DELETE CASCADE and ON UPDATE CASCADE options for the foreign key constraint.

Instructions#

  1. Write the SQL code to create the courses table.

  2. Modify the students table to include the course_id column with the foreign key constraint.

  3. Write code to test the referential integrity:

    • Test the ON DELETE CASCADE constraint.

    • Test the ON UPDATE CASCADE constraint.

Solution#

# BEGIN SOLUTION
%%sql

CREATE TABLE courses (
    id SERIAL PRIMARY KEY,
    name VARCHAR(100) NOT NULL
);
# END SOLUTION
 * postgresql://postgres:***@localhost:5432/dvdrental
Done.
[]
# BEGIN SOLUTION
%%sql
ALTER TABLE students
ADD COLUMN course_id INT,
ADD CONSTRAINT fk_course
FOREIGN KEY (course_id) REFERENCES courses (id)
ON DELETE CASCADE
ON UPDATE CASCADE;
# END SOLUTION
 * postgresql://postgres:***@localhost:5432/dvdrental
Done.
[]

Testing the Referential Integrity#

  1. Test the ON DELETE CASCADE Constraint:

    # Insert sample data into courses table
    %%sql
    INSERT INTO courses (name) VALUES
        ('Mathematics'),
        ('Physics');
    
    # Insert sample data into students table
    %%sql
    INSERT INTO students (first_name, last_name, email, age, course_id) VALUES
        ('John', 'Doe', 'john.doe@example.com', 20, 1),
        ('Jane', 'Smith', 'jane.smith@example.com', 22, 2);
    
    # Delete a course
    %%sql
    DELETE FROM courses WHERE id = 1;
    
    # Query the students table to verify the ON DELETE CASCADE constraint
    %%sql
    SELECT * FROM students;
    
  2. Test the ON UPDATE CASCADE Constraint:

    # Update the id of a course
    %%sql
    UPDATE courses SET id = 3 WHERE id = 2;
    
    # Query the students table to verify the ON UPDATE CASCADE constraint
    %%sql
    SELECT * FROM students;
    

By following these steps, you can verify that the referential integrity constraints on the students table are enforced correctly.

Exercise 3: Insert Records into the Students Table#

Task#

Extend the previous exercises by inserting 10 records into the students table. Ensure that the course_id values correspond to valid entries in the courses table.

Instructions#

  1. Write the SQL code to insert 10 records into the students table.

  2. Ensure that the course_id values correspond to valid entries in the courses table.

  3. Write code to query the students table and verify the inserted records.

Expected output#

courses table

id

name

1

Mathematics

2

Physics

3

Chemistry

4

Biology

students table

id

first_name

last_name

email

age

department

course_id

1

John

Doe

john.doe@example.com

20

General

1

2

Jane

Smith

jane.smith@example.com

22

General

2

3

Alice

Johnson

alice.johnson@example.com

19

General

3

4

Bob

Brown

bob.brown@example.com

21

General

4

5

Charlie

Davis

charlie.davis@example.com

23

General

1

6

Diana

Evans

diana.evans@example.com

20

General

2

7

Eve

Foster

eve.foster@example.com

22

General

3

8

Frank

Green

frank.green@example.com

19

General

4

9

Grace

Harris

grace.harris@example.com

21

General

1

10

Hank

Ivy

hank.ivy@example.com

23

General

2

# BEGIN SOLUTION
%%sql
-- Insert sample data into courses table
INSERT INTO courses (name) VALUES
    ('Mathematics'),
    ('Physics'),
    ('Chemistry'),
    ('Biology');

-- Insert 10 records into students table
INSERT INTO students (first_name, last_name, email, age, course_id) VALUES
    ('John', 'Doe', 'john.doe@example.com', 20, 1),
    ('Jane', 'Smith', 'jane.smith@example.com', 22, 2),
    ('Alice', 'Johnson', 'alice.johnson@example.com', 19, 3),
    ('Bob', 'Brown', 'bob.brown@example.com', 21, 4),
    ('Charlie', 'Davis', 'charlie.davis@example.com', 23, 1),
    ('Diana', 'Evans', 'diana.evans@example.com', 20, 2),
    ('Eve', 'Foster', 'eve.foster@example.com', 22, 3),
    ('Frank', 'Green', 'frank.green@example.com', 19, 4),
    ('Grace', 'Harris', 'grace.harris@example.com', 21, 1),
    ('Hank', 'Ivy', 'hank.ivy@example.com', 23, 2);
# END SOLUTION
 * postgresql://postgres:***@localhost:5432/dvdrental
4 rows affected.
10 rows affected.
[]

Check your output

%%sql
SELECT * FROM students;
 * postgresql://postgres:***@localhost:5432/dvdrental
10 rows affected.
id first_name last_name email age department course_id
1 John Doe john.doe@example.com 20 General 1
2 Jane Smith jane.smith@example.com 22 General 2
3 Alice Johnson alice.johnson@example.com 19 General 3
4 Bob Brown bob.brown@example.com 21 General 4
5 Charlie Davis charlie.davis@example.com 23 General 1
6 Diana Evans diana.evans@example.com 20 General 2
7 Eve Foster eve.foster@example.com 22 General 3
8 Frank Green frank.green@example.com 19 General 4
9 Grace Harris grace.harris@example.com 21 General 1
10 Hank Ivy hank.ivy@example.com 23 General 2
%%sql
SELECT * FROM courses;
 * postgresql://postgres:***@localhost:5432/dvdrental
4 rows affected.
id name
1 Mathematics
2 Physics
3 Chemistry
4 Biology

Exercise 4: Aggregate Functions - COUNT, AVG, MIN, MAX#

Task#

Write SQL queries to perform the following aggregate functions on the students table:

  1. Count the total number of students.

  2. Calculate the average age of students.

  3. Find the minimum age of students.

  4. Find the maximum age of students.

Solution#

# BEGIN SOLUTION
%%sql
SELECT 
    COUNT(*) AS total_students,
    AVG(age) AS average_age,
    MIN(age) AS minimum_age,
    MAX(age) AS maximum_age
FROM students;
# END SOLUTION
 * postgresql://postgres:***@localhost:5432/dvdrental
1 rows affected.
total_students average_age minimum_age maximum_age
10 21.0000000000000000 19 23

Exercise 5: Aggregate Functions - GROUP BY, WHERE, HAVING, and ORDER BY#

Task#

Write SQL queries to perform the following operations on the students table:

  1. Group students by course_id and count the number of students in each course.

  2. Group students by course_id and calculate the average age of students in each course.

  3. Group students by course_id and find the minimum age of students in each course.

  4. Group students by course_id and find the maximum age of students in each course.

  5. Use the HAVING clause to filter groups where the average age of students is greater than 21.

  6. Use the WHERE clause to filter students who are older than 20 before performing the aggregation.

  7. Use the ORDER BY clause to sort the results by the number of students in each course in descending order.

# BEGIN SOLUTION
%%sql
SELECT 
    course_id, 
    COUNT(*) AS student_count,
    AVG(age) AS average_age,
    MIN(age) AS minimum_age,
    MAX(age) AS maximum_age
FROM students
WHERE age > 20
GROUP BY course_id
HAVING AVG(age) > 21
ORDER BY student_count DESC;
# END SOLUTION
 * postgresql://postgres:***@localhost:5432/dvdrental
3 rows affected.
course_id student_count average_age minimum_age maximum_age
1 2 22.0000000000000000 21 23
2 2 22.5000000000000000 22 23
3 1 22.0000000000000000 22 22

Submission instructions#

{rubric: mechanics = 5}

  • Make sure the notebook can run from top to bottom without any error. Restart the kernel and run all cells.

  • Commit and push your notebook to the github repo

  • Double check your notebook is rendered properly on Github and you can see all the outputs clearly

  • Submit URL of your Github repo to Moodle under worksheet 3