Worksheet 3: PostgreSQL data manipulation#

import numpy as np
import pandas as pd
%load_ext sql
%config SqlMagic.displaylimit = 20
%config SqlMagic.autolimit = 30

Exercise 1: Connect to the dvdrental Database and Data Type Conversion#

  1. Connect to the dvdrental Database:

    • Connect to the dvdrental database

# load the credentials from the credentials.json file
import json
import urllib.parse

with open('data/credentials.json') as f:
    login = json.load(f)
    
username = login['user']
password = urllib.parse.quote(login['password'])
host = login['host']
port = login['port']
%sql postgresql://{username}:{password}@{host}:{port}/dvdrental
'Connected: postgres@dvdrental'

Print out the first 10 rows of the rental table.

%%sql
   SELECT * FROM rental LIMIT 10;
 * postgresql://postgres:***@localhost:5432/dvdrental
10 rows affected.
rental_id rental_date inventory_id customer_id return_date staff_id last_update
2 2005-05-24 22:54:33 1525 459 2005-05-28 19:40:33 1 2006-02-16 02:30:53
3 2005-05-24 23:03:39 1711 408 2005-06-01 22:12:39 1 2006-02-16 02:30:53
4 2005-05-24 23:04:41 2452 333 2005-06-03 01:43:41 2 2006-02-16 02:30:53
5 2005-05-24 23:05:21 2079 222 2005-06-02 04:33:21 1 2006-02-16 02:30:53
6 2005-05-24 23:08:07 2792 549 2005-05-27 01:32:07 1 2006-02-16 02:30:53
7 2005-05-24 23:11:53 3995 269 2005-05-29 20:34:53 2 2006-02-16 02:30:53
8 2005-05-24 23:31:46 2346 239 2005-05-27 23:33:46 2 2006-02-16 02:30:53
9 2005-05-25 00:00:40 2580 126 2005-05-28 00:22:40 1 2006-02-16 02:30:53
10 2005-05-25 00:02:21 1824 399 2005-05-31 22:44:21 2 2006-02-16 02:30:53
11 2005-05-25 00:09:02 4443 142 2005-06-02 20:56:02 2 2006-02-16 02:30:53

List all columns and its data type in the rental table.

%%sql
SELECT column_name, data_type
FROM information_schema.columns
WHERE table_name = 'rental';
 * postgresql://postgres:***@localhost:5432/dvdrental
7 rows affected.
column_name data_type
rental_id integer
rental_date timestamp without time zone
inventory_id integer
customer_id smallint
return_date timestamp without time zone
staff_id smallint
last_update timestamp without time zone
  1. Convert Between Data Types:

    • Convert the rental_date column in the rental table to a string

%%sql
SELECT rental_date::TEXT
FROM rental
 * postgresql://postgres:***@localhost:5432/dvdrental
16044 rows affected.
rental_date
2005-05-24 22:54:33
2005-05-24 23:03:39
2005-05-24 23:04:41
2005-05-24 23:05:21
2005-05-24 23:08:07
2005-05-24 23:11:53
2005-05-24 23:31:46
2005-05-25 00:00:40
2005-05-25 00:02:21
2005-05-25 00:09:02
2005-05-25 00:19:27
2005-05-25 00:22:55
2005-05-25 00:31:15
2005-05-25 00:39:22
2005-05-25 00:43:11
2005-05-25 01:06:36
2005-05-25 01:10:47
2005-05-25 01:17:24
2005-05-25 01:48:41
2005-05-25 01:59:46
30 rows, truncated to displaylimit of 20

Exercise 2: Filtering Rows Using WHERE#

  1. Filter Using BETWEEN:

    • Filter the payment table where the amount is between 5 and 10.

%%sql
SELECT *
FROM payment
WHERE amount BETWEEN 5 AND 10
 * postgresql://postgres:***@localhost:5432/dvdrental
3511 rows affected.
payment_id customer_id staff_id rental_id amount payment_date
17503 341 2 1520 7.99 2007-02-15 22:25:46.996577
17505 341 1 1849 7.99 2007-02-16 22:41:45.996577
17507 341 2 3130 7.99 2007-02-20 17:31:48.996577
17508 341 1 3382 5.99 2007-02-21 12:33:49.996577
17509 342 2 2190 5.99 2007-02-17 23:58:17.996577
17510 342 1 2914 5.99 2007-02-20 02:11:44.996577
17513 343 1 1564 6.99 2007-02-16 01:15:33.996577
17516 343 2 2461 6.99 2007-02-18 18:26:38.996577
17517 343 1 2980 8.99 2007-02-20 07:03:29.996577
17526 346 1 1994 5.99 2007-02-17 09:35:32.996577
17529 347 2 1711 8.99 2007-02-16 12:40:18.996577
17532 347 1 3092 8.99 2007-02-20 14:33:08.996577
17533 347 1 3326 7.99 2007-02-21 07:33:16.996577
17535 348 1 2041 8.99 2007-02-17 12:47:26.996577
17539 349 2 2987 6.99 2007-02-20 07:24:16.996577
17540 349 1 3067 8.99 2007-02-20 12:27:47.996577
17545 351 2 1792 5.99 2007-02-16 18:33:16.996577
17554 353 2 1928 7.99 2007-02-17 05:16:57.996577
17555 353 2 3233 6.99 2007-02-21 01:07:57.996577
17558 354 1 2769 6.99 2007-02-19 16:20:40.996577
30 rows, truncated to displaylimit of 20
  • Filter the payment table where the paymen_date is between 2007-02-17 and 2007-02-18.

%%sql
SELECT *
FROM payment 
WHERE payment_date BETWEEN '2007-02-17' AND '2007-02-18'
 * postgresql://postgres:***@localhost:5432/dvdrental
283 rows affected.
payment_id customer_id staff_id rental_id amount payment_date
17509 342 2 2190 5.99 2007-02-17 23:58:17.996577
17514 343 2 1879 0.99 2007-02-17 01:26:00.996577
17515 343 2 1922 0.99 2007-02-17 04:32:51.996577
17526 346 1 1994 5.99 2007-02-17 09:35:32.996577
17535 348 1 2041 8.99 2007-02-17 12:47:26.996577
17541 350 2 2011 3.99 2007-02-17 10:24:35.996577
17546 351 1 1869 0.99 2007-02-17 00:36:26.996577
17554 353 2 1928 7.99 2007-02-17 05:16:57.996577
17567 357 2 1971 1.99 2007-02-17 07:52:25.996577
17568 357 2 2153 6.99 2007-02-17 21:26:30.996577
17570 358 2 1908 0.99 2007-02-17 03:39:02.996577
17571 358 1 2114 5.99 2007-02-17 18:28:51.996577
17604 365 1 1983 4.99 2007-02-17 08:50:39.996577
17631 374 2 2046 1.99 2007-02-17 13:08:16.996577
17646 378 2 2134 7.99 2007-02-17 19:42:10.996577
17651 380 1 1868 3.99 2007-02-17 00:31:48.996577
17652 380 1 1984 2.99 2007-02-17 08:53:54.996577
17653 380 1 2018 3.99 2007-02-17 11:04:24.996577
17660 381 1 1878 1.99 2007-02-17 01:23:58.996577
17675 384 1 1961 0.99 2007-02-17 07:31:24.996577
30 rows, truncated to displaylimit of 20
  1. Filter Using IN:

    • Select the rows from the customer table where the first_name is either ‘Tracy’ or ‘Kelly’

%%sql
SELECT *
FROM customer
WHERE first_name in ('Tracy', 'Kelly')
 * postgresql://postgres:***@localhost:5432/dvdrental
4 rows affected.
customer_id store_id first_name last_name email address_id activebool create_date last_update active
67 1 Kelly Torres kelly.torres@sakilacustomer.org 71 True 2006-02-14 2013-05-26 14:49:45.738000 1
108 1 Tracy Cole tracy.cole@sakilacustomer.org 112 True 2006-02-14 2013-05-26 14:49:45.738000 1
546 1 Kelly Knott kelly.knott@sakilacustomer.org 552 True 2006-02-14 2013-05-26 14:49:45.738000 1
589 1 Tracy Herrmann tracy.herrmann@sakilacustomer.org 595 True 2006-02-14 2013-05-26 14:49:45.738000 1
  1. Filter Using LIKE:

    • Select the rows from the customer table where the first_name have 4 characters and ends with ‘e’

%%sql
SELECT *
FROM customer
WHERE first_name LIKE '___e'
 * postgresql://postgres:***@localhost:5432/dvdrental
12 rows affected.
customer_id store_id first_name last_name email address_id activebool create_date last_update active
65 2 Rose Howard rose.howard@sakilacustomer.org 69 True 2006-02-14 2013-05-26 14:49:45.738000 1
77 2 Jane Bennett jane.bennett@sakilacustomer.org 81 True 2006-02-14 2013-05-26 14:49:45.738000 1
85 2 Anne Powell anne.powell@sakilacustomer.org 89 True 2006-02-14 2013-05-26 14:49:45.738000 1
176 1 June Carroll june.carroll@sakilacustomer.org 180 True 2006-02-14 2013-05-26 14:49:45.738000 1
326 1 Jose Andrew jose.andrew@sakilacustomer.org 331 True 2006-02-14 2013-05-26 14:49:45.738000 1
403 1 Mike Way mike.way@sakilacustomer.org 408 True 2006-02-14 2013-05-26 14:49:45.738000 1
407 1 Dale Ratcliff dale.ratcliff@sakilacustomer.org 412 True 2006-02-14 2013-05-26 14:49:45.738000 1
424 2 Kyle Spurlock kyle.spurlock@sakilacustomer.org 429 True 2006-02-14 2013-05-26 14:49:45.738000 1
498 1 Gene Sanborn gene.sanborn@sakilacustomer.org 503 True 2006-02-14 2013-05-26 14:49:45.738000 1
569 2 Dave Gardiner dave.gardiner@sakilacustomer.org 575 True 2006-02-14 2013-05-26 14:49:45.738000 1
593 2 Rene Mcalister rene.mcalister@sakilacustomer.org 599 True 2006-02-14 2013-05-26 14:49:45.738000 1
598 1 Wade Delvalle wade.delvalle@sakilacustomer.org 604 True 2006-02-14 2013-05-26 14:49:45.738000 1

Select the rows from the customer table where

  • first_name DO NOT ends with ‘e’ AND

  • last_name starts with ‘E’

%%sql
SELECT *
FROM customer
WHERE first_name NOT LIKE '%e' AND last_name LIKE 'E%'
 * postgresql://postgres:***@localhost:5432/dvdrental
9 rows affected.
customer_id store_id first_name last_name email address_id activebool create_date last_update active
524 1 Jared Ely jared.ely@sakilacustomer.org 530 True 2006-02-14 2013-05-26 14:49:45.738000 1
48 1 Ann Evans ann.evans@sakilacustomer.org 52 True 2006-02-14 2013-05-26 14:49:45.738000 1
417 1 Travis Estep travis.estep@sakilacustomer.org 422 True 2006-02-14 2013-05-26 14:49:45.738000 1
418 2 Jeff East jeff.east@sakilacustomer.org 423 True 2006-02-14 2013-05-26 14:49:45.738000 1
422 1 Melvin Ellington melvin.ellington@sakilacustomer.org 427 True 2006-02-14 2013-05-26 14:49:45.738000 1
466 1 Leo Ebert leo.ebert@sakilacustomer.org 471 True 2006-02-14 2013-05-26 14:49:45.738000 1
510 2 Ben Easter ben.easter@sakilacustomer.org 515 True 2006-02-14 2013-05-26 14:49:45.738000 0
535 1 Javier Elrod javier.elrod@sakilacustomer.org 541 True 2006-02-14 2013-05-26 14:49:45.738000 1
547 1 Kurt Emmons kurt.emmons@sakilacustomer.org 553 True 2006-02-14 2013-05-26 14:49:45.738000 1

Exercise 3: Derived Columns#

  1. Concatenate Columns:

    • Concatenate the first_name and last_name columns and turn all into lower cases

    • Store them in a new column called full_name

%%sql
SELECT 
   first_name,
   last_name,
   LOWER(CONCAT(first_name, last_name)) AS full_name
FROM customer
 * postgresql://postgres:***@localhost:5432/dvdrental
599 rows affected.
first_name last_name full_name
Jared Ely jaredely
Mary Smith marysmith
Patricia Johnson patriciajohnson
Linda Williams lindawilliams
Barbara Jones barbarajones
Elizabeth Brown elizabethbrown
Jennifer Davis jenniferdavis
Maria Miller mariamiller
Susan Wilson susanwilson
Margaret Moore margaretmoore
Dorothy Taylor dorothytaylor
Lisa Anderson lisaanderson
Nancy Thomas nancythomas
Karen Jackson karenjackson
Betty White bettywhite
Helen Harris helenharris
Sandra Martin sandramartin
Donna Thompson donnathompson
Carol Garcia carolgarcia
Ruth Martinez ruthmartinez
30 rows, truncated to displaylimit of 20
  1. Arithmetic Operations:

    • Calculate the payment amount assuming a discount rate of 15%

    • Store them in a new column called discount_payment

%%sql
SELECT 
   *,
   TRUNC((1-0.15)*amount,2) as discount_payment
FROM payment
 * postgresql://postgres:***@localhost:5432/dvdrental
14596 rows affected.
payment_id customer_id staff_id rental_id amount payment_date discount_payment
17503 341 2 1520 7.99 2007-02-15 22:25:46.996577 6.79
17504 341 1 1778 1.99 2007-02-16 17:23:14.996577 1.69
17505 341 1 1849 7.99 2007-02-16 22:41:45.996577 6.79
17506 341 2 2829 2.99 2007-02-19 19:39:56.996577 2.54
17507 341 2 3130 7.99 2007-02-20 17:31:48.996577 6.79
17508 341 1 3382 5.99 2007-02-21 12:33:49.996577 5.09
17509 342 2 2190 5.99 2007-02-17 23:58:17.996577 5.09
17510 342 1 2914 5.99 2007-02-20 02:11:44.996577 5.09
17511 342 1 3081 2.99 2007-02-20 13:57:39.996577 2.54
17512 343 2 1547 4.99 2007-02-16 00:10:50.996577 4.24
17513 343 1 1564 6.99 2007-02-16 01:15:33.996577 5.94
17514 343 2 1879 0.99 2007-02-17 01:26:00.996577 0.84
17515 343 2 1922 0.99 2007-02-17 04:32:51.996577 0.84
17516 343 2 2461 6.99 2007-02-18 18:26:38.996577 5.94
17517 343 1 2980 8.99 2007-02-20 07:03:29.996577 7.64
17518 343 1 3407 0.99 2007-02-21 14:42:28.996577 0.84
17519 344 1 1341 3.99 2007-02-15 10:54:44.996577 3.39
17520 344 2 1475 4.99 2007-02-15 19:36:27.996577 4.24
17521 344 1 1731 0.99 2007-02-16 14:00:38.996577 0.84
17522 345 2 1210 0.99 2007-02-15 01:26:17.996577 0.84
30 rows, truncated to displaylimit of 20

Exercise 4: Conditional Logic with CASE#

  1. Categorize Customers Based on Active Status:

    • Create a derived column called status to categorize customers as ‘Active’ or ‘Inactive’ based on column active.

    • Note that active == 0 means the customer was inactive and vice versa

%%sql
SELECT 
    *,
    CASE 
        WHEN active=0 THEN 'Inactive'
        WHEN active=1 THEN 'Active'
        ELSE Null
    END as status
from customer
 * postgresql://postgres:***@localhost:5432/dvdrental
599 rows affected.
customer_id store_id first_name last_name email address_id activebool create_date last_update active status
524 1 Jared Ely jared.ely@sakilacustomer.org 530 True 2006-02-14 2013-05-26 14:49:45.738000 1 Active
1 1 Mary Smith mary.smith@sakilacustomer.org 5 True 2006-02-14 2013-05-26 14:49:45.738000 1 Active
2 1 Patricia Johnson patricia.johnson@sakilacustomer.org 6 True 2006-02-14 2013-05-26 14:49:45.738000 1 Active
3 1 Linda Williams linda.williams@sakilacustomer.org 7 True 2006-02-14 2013-05-26 14:49:45.738000 1 Active
4 2 Barbara Jones barbara.jones@sakilacustomer.org 8 True 2006-02-14 2013-05-26 14:49:45.738000 1 Active
5 1 Elizabeth Brown elizabeth.brown@sakilacustomer.org 9 True 2006-02-14 2013-05-26 14:49:45.738000 1 Active
6 2 Jennifer Davis jennifer.davis@sakilacustomer.org 10 True 2006-02-14 2013-05-26 14:49:45.738000 1 Active
7 1 Maria Miller maria.miller@sakilacustomer.org 11 True 2006-02-14 2013-05-26 14:49:45.738000 1 Active
8 2 Susan Wilson susan.wilson@sakilacustomer.org 12 True 2006-02-14 2013-05-26 14:49:45.738000 1 Active
9 2 Margaret Moore margaret.moore@sakilacustomer.org 13 True 2006-02-14 2013-05-26 14:49:45.738000 1 Active
10 1 Dorothy Taylor dorothy.taylor@sakilacustomer.org 14 True 2006-02-14 2013-05-26 14:49:45.738000 1 Active
11 2 Lisa Anderson lisa.anderson@sakilacustomer.org 15 True 2006-02-14 2013-05-26 14:49:45.738000 1 Active
12 1 Nancy Thomas nancy.thomas@sakilacustomer.org 16 True 2006-02-14 2013-05-26 14:49:45.738000 1 Active
13 2 Karen Jackson karen.jackson@sakilacustomer.org 17 True 2006-02-14 2013-05-26 14:49:45.738000 1 Active
14 2 Betty White betty.white@sakilacustomer.org 18 True 2006-02-14 2013-05-26 14:49:45.738000 1 Active
15 1 Helen Harris helen.harris@sakilacustomer.org 19 True 2006-02-14 2013-05-26 14:49:45.738000 1 Active
16 2 Sandra Martin sandra.martin@sakilacustomer.org 20 True 2006-02-14 2013-05-26 14:49:45.738000 0 Inactive
17 1 Donna Thompson donna.thompson@sakilacustomer.org 21 True 2006-02-14 2013-05-26 14:49:45.738000 1 Active
18 2 Carol Garcia carol.garcia@sakilacustomer.org 22 True 2006-02-14 2013-05-26 14:49:45.738000 1 Active
19 1 Ruth Martinez ruth.martinez@sakilacustomer.org 23 True 2006-02-14 2013-05-26 14:49:45.738000 1 Active
30 rows, truncated to displaylimit of 20
  1. Categorize Payments Based on Amount:

    • Create a derived column called amount_category to categorize payments as ‘Low’ (<5), ‘Medium’ (5-10), or ‘High’ (>10) based on the amount and display the first 10 rows.

%%sql
SELECT 
   *,
   CASE 
      WHEN amount<5 THEN 'Low'
      WHEN 5<=amount and amount<=10 THEN 'Medium'
      WHEN amount>10 THEN 'High'
      ELSE Null
   END AS amount_category
FROM payment
LIMIT 10
 * postgresql://postgres:***@localhost:5432/dvdrental
10 rows affected.
payment_id customer_id staff_id rental_id amount payment_date amount_category
17503 341 2 1520 7.99 2007-02-15 22:25:46.996577 Medium
17504 341 1 1778 1.99 2007-02-16 17:23:14.996577 Low
17505 341 1 1849 7.99 2007-02-16 22:41:45.996577 Medium
17506 341 2 2829 2.99 2007-02-19 19:39:56.996577 Low
17507 341 2 3130 7.99 2007-02-20 17:31:48.996577 Medium
17508 341 1 3382 5.99 2007-02-21 12:33:49.996577 Medium
17509 342 2 2190 5.99 2007-02-17 23:58:17.996577 Medium
17510 342 1 2914 5.99 2007-02-20 02:11:44.996577 Medium
17511 342 1 3081 2.99 2007-02-20 13:57:39.996577 Low
17512 343 2 1547 4.99 2007-02-16 00:10:50.996577 Low

Exercise 5: Using Built-in Functions#

  1. Date Functions:

    • Extract the month from the rental_date column

    • The month should be displayed in full month name (e.g., May, June)

    • Store them in a new column called rental_month

Hint: You can use TO_CHAR() to converts the rental_date to a string representing the full month name.

%%sql
SELECT 
   rental_date
   , TO_CHAR(rental_date, 'Month') AS rental_month
FROM rental
 * postgresql://postgres:***@localhost:5432/dvdrental
16044 rows affected.
rental_date rental_month
2005-05-24 22:54:33 May
2005-05-24 23:03:39 May
2005-05-24 23:04:41 May
2005-05-24 23:05:21 May
2005-05-24 23:08:07 May
2005-05-24 23:11:53 May
2005-05-24 23:31:46 May
2005-05-25 00:00:40 May
2005-05-25 00:02:21 May
2005-05-25 00:09:02 May
2005-05-25 00:19:27 May
2005-05-25 00:22:55 May
2005-05-25 00:31:15 May
2005-05-25 00:39:22 May
2005-05-25 00:43:11 May
2005-05-25 01:06:36 May
2005-05-25 01:10:47 May
2005-05-25 01:17:24 May
2005-05-25 01:48:41 May
2005-05-25 01:59:46 May
30 rows, truncated to displaylimit of 20
  1. String Functions:

    • Convert the first_name column to uppercase and display the first 10 rows.

%%sql
SELECT 
   first_name,
   UPPER(first_name) as upper_first_name
FROM customer
LIMIT 10
 * postgresql://postgres:***@localhost:5432/dvdrental
10 rows affected.
first_name upper_first_name
Jared JARED
Mary MARY
Patricia PATRICIA
Linda LINDA
Barbara BARBARA
Elizabeth ELIZABETH
Jennifer JENNIFER
Maria MARIA
Susan SUSAN
Margaret MARGARET
  1. Aggregate Functions:

    • Calculate the total amount of payments by summing up the amount column as total_pay.

%%sql
SELECT 
   SUM(amount) AS total_pay
FROM payment
 * postgresql://postgres:***@localhost:5432/dvdrental
1 rows affected.
total_pay
61312.04

Submission instructions#

{rubric: mechanics = 5}

  • Make sure the notebook can run from top to bottom without any error. Restart the kernel and run all cells.

  • Commit and push your notebook to the github repo

  • Double check your notebook is rendered properly on Github and you can see all the outputs clearly

  • Submit URL of your Github repo to Moodle under worksheet 3