Worksheet 1: Introduction to databases#

Welcome! In this week, we will start off easy with some exercise to help you set up the environment in which you will be using for the upcoming assignment.

First off, you will need to install Miniconda, if you haven’t already

Install miniconda#

macOS

You can find the Mac ARM and Intel download links here: https://conda-forge.org/miniforge/. Make sure you use the Miniforge3 installers, not the other ones listed. We will assume you downloaded the file into your Downloads folder.

Once downloaded, open up a terminal and run the following command

bash \({HOME}/Downloads/Miniforge3.sh -b -p "\){HOME}/miniforge3”

After installation run the following commands

source “${HOME}/miniforge3/etc/profile.d/conda.sh”

conda activate

conda init

After installation, restart the terminal. If the installation was successful, you will see (base) prepending to your prompt string. To confirm that conda is working, you can ask it which version was installed:

conda –version

which should return something like this, it doesn’t have to be the exact same version:

conda 24.7.1

Note: If you see zsh: command not found: conda, see the section on Bash below to set your default Terminal shell to Bash as opposed to Zsh.

You can change the default shell in your Terminal to Bash by opening the Terminal and typing:

chsh -s /bin/bash

You will have to quit all instances of open Terminals and then restart the Terminal for this to take effect.

Windows

You can find the Mac ARM and Intel download links here: https://conda-forge.org/miniforge/. Make sure you use the Miniforge3 installers, not the other ones listed. We will assume you downloaded the file into your Downloads folder.

Once downloaded, run the installer.

Use all the default options in the installer.

The install location should look something like: C:\Users\YOUR_USER_NAME\miniforge3

Note: Do not add miniforge to PATH. We will set this up later.

After installation, open the Start Menu and search for the program called “Miniforge Prompt”. When this opens you will see a prompt similar to (base) C:\Users\your_name.

Type the following to check that your Python installation is working:

python –version

which should return Python 3.11.0 or greater:

Python 3.11.0

If not, run the following command in the Miniforge Prompt, Confirm that you are in the (base) environment. Then update the base python with:

conda install python=3.11

Integrating Python with the Git Bash terminal

To avoid having to open the separate Anaconda Prompt every time we want to use Python, we can make it available from the (Git Bash) terminal, which is what we will be using most of the time. To set this up, open the “Miniforge Prompt” again and type:

conda init bash

You will see that this modified a few configuration files, which makes conda visible to the terminal. Close all open terminal windows and launch a new one, you should now see that the prompt string has changed to include the word (base).

Set up the DASC 5410’s environment#

  • First off, clone the worksheet 1 github repo to your local computer.

If you don’t know how to clone a github repo yet, you can just open a terminal and type

git clone [insert your github URL to worksheet1 here]

For example, if I were a student, I would run git clone https://github.com/TRU-PBADS/week1-intro-nosql-quan3010

  • Second, open a terminal, and navigate to the git folder that you just cloned locally

cd [PATH to your git repo locally]

  • Third, install the conda environment based on the provided dasc_5410.yaml file

conda env create -f dasc_5410.yaml

This will take a while to set up the environment. You might be prompted to say y/n a few times. Once it’s finished, you can open a IDE (e.g., vscode or jupyterlab) and select the adsc_3610 environment

Exercise 1#

{rubric: accuracy = 5}

Follow the installation guide above. If successful, you should be able to run the following code chunk. We will try to import pymongo (to connect to mongoDB), ipython-sql which we use to run SQL querries directly in jupyter notebook, and otter which is an autograder package.

import pymongo
print(pymongo.__version__)
4.8.0
import otter
print(otter.__version__)
5.6.0
# load the ipython-sql extension
%load_ext sql
The sql extension is already loaded. To reload it, use:
  %reload_ext sql

Exercise 2#

{rubric: accuracy = 5}

In this course, there will be some autograded question. Now I would like to you test one of them to see if it’s working.

Write a simply Pythonofunction to sum up two numbers

def sum(a,b):
    # BEGIN SOLUTION
    return a+b
    # END SOLUTION

Run the cell below to test if your sum function pass the autograded test. You should see a message like “q1 passed! 🎉”

assert sum(1,2) == 3, "Your function is not implemented correctly"

If you run into an error like this

---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Cell In[2], line 1
----> 1 grader.check("q1")

NameError: name 'grader' is not defined

Then make sure you run the first cell, on top of this notebook, to initialize otter-grader

Exercise 3: Download and install PostgreSQL#

3.1#

{rubric: completion = 5}

We will be using PostgreSQL as our database management system. You can download the most recent version of PostgreSQL from here.

Follow the instructions for the installation. In the password page, type whatever password you want, and make sure you remember it.

You do not need to run “StackBuilder” at the end of the installation (if you accidentally launch the StackBuilder, click “cancel”, you don’t need to check any boxes).

To test if the installation was successful open the SQL Shell app from the LaunchPad or applications directory. You will be asked to setup your configuration:

  • Accept the default value (the one within square brackets) for the first three values by pressing enter three times,

  • Enter postgres as the default username and hit enter,

  • Finally, type in the password that you set during installation and press enter one last time.

3.2#

{rubric: completion = 5}

Download a sample dvd rental dataset here. https://www.postgresqltutorial.com/postgresql-getting-started/postgresql-sample-database/

We will be using this dataset for exercises in the future

3.3#

{rubric: completion = 5}

Follow this video tutorial to load the dvdrental file into the PostgreSQL database

https://www.youtube.com/watch?v=mrlBdrOb_6s

3.4#

{rubric: completion = 5}

Now we will try to use the ipython-sql package to connect to our local PostgreSQL database called dvdrental

First you will need to create a credentials.json file in the data folder with the following configuration

{
    "host": "localhost",
    "port": 5432,
    "user": "postgres",
    "password": "replaceyourpasswordhere"
  }
import json
import urllib.parse

with open('data/credentials.json') as f:
    login = json.load(f)
    
username = login['user']
password = urllib.parse.quote(login['password'])
host = login['host']
port = login['port']

We will now try to connect to your dvdrental database in PostgreSQL.

If successful, it should display

‘Connected: postgres@dvdrental’

%sql postgresql://{username}:{password}@{host}:{port}/dvdrental
'Connected: postgres@dvdrental'

We can test to run a SQL querry on the actor table. This should give you the first name and last name of five actors

%sql SELECT first_name, last_name FROM actor Limit 5;
 * postgresql://postgres:***@localhost:5432/dvdrental
5 rows affected.
first_name last_name
Penelope Guiness
Nick Wahlberg
Ed Chase
Jennifer Davis
Johnny Lollobrigida

Submission instructions#

{rubric: mechanics = 5}

  • Make sure the notebook can run from top to bottom without any error. Restart the kernel and run all cells.

  • Commit and push your notebook to the github repo

  • Double check your notebook is rendered properly on Github and you can see all the outputs clearly