Lecture 11: Introduction to Non-Relational Databases

Learning objectives#

By the end of this lecture, students should be able to:

Understand how relational and non-relational databases differ in terms of data structure, schema, and query language.
Explain the pros and cons of each type of database and its typical use cases.
Explain the different types of non-relational databases (i.e., column-family, key-value, graph, document).

Readings#

Slides#

Note

Download a PDF version here

Supplemental materials#

Key differences between relational and non-relational databases#

Feature	SQL Databases (Relational)	NoSQL Databases (Non-Relational)
Data Model	Structured data with predefined schema (tables, rows, columns)	Flexible schema, supports unstructured and semi-structured data
Schema Flexibility	Rigid schema, requires predefined structure	Schema-less, allowing for dynamic and flexible data models
Scalability	Vertical scaling (scaling up by adding more power to a single server)	Horizontal scaling (scaling out by adding more servers)
ACID Compliance	Strong ACID (Atomicity, Consistency, Isolation, Durability) support	Often favors eventual consistency over strong ACID compliance
Performance	Efficient for complex queries and transactions	Optimized for high-volume reads/writes and specific use cases
Querying	Powerful and standardized SQL for complex queries	Varies by database type; may require custom query languages
Data Relationships	Strong support for complex joins and relationships	Limited or no support for complex joins; relationships are usually embedded or linked
Data Integrity	Enforces data integrity through constraints and normalization	Data integrity is largely managed by the application; denormalization is common
Use Case Suitability	Best for structured data, complex transactions, and relationships	Best for big data, real-time analytics, unstructured data, and high scalability needs
Maintenance	Requires more schema maintenance (e.g., schema changes, indexing, tuning)	Generally requires less schema maintenance, but operating a distributed cluster can be complex
Learning Curve	Standardized and well-documented; widely taught	Diverse models with a steeper learning curve due to lack of standardization
Ecosystem	Mature ecosystem with a wide range of tools and support	Newer ecosystem; tools and support vary by database type
Cost	Can be expensive at scale due to the need for powerful hardware	Cost-effective for large-scale systems, often runs on commodity hardware
Examples	MySQL, PostgreSQL, Oracle, SQL Server	MongoDB, Cassandra, Redis, Neo4j
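To make the "Data Model" and "Data Relationships" rows more concrete, here is a minimal sketch in Python. It uses the standard-library `sqlite3` module as a stand-in for a relational database and a plain `dict` as a stand-in for a document. The table names (`movies`, `cast_members`) and the values are illustrative assumptions, not a schema used elsewhere in this lecture.

```python
import sqlite3

# Relational side: two normalized tables linked by a foreign key.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE movies (id INTEGER PRIMARY KEY, title TEXT NOT NULL, year INTEGER)"
)
conn.execute(
    "CREATE TABLE cast_members ("
    "movie_id INTEGER REFERENCES movies(id), name TEXT NOT NULL)"
)
conn.execute("INSERT INTO movies VALUES (1, 'Gertie the Dinosaur', 1914)")
conn.executemany(
    "INSERT INTO cast_members VALUES (?, ?)",
    [(1, "Winsor McCay"), (1, "George McManus")],
)

# A join reassembles the movie together with its cast.
rows = conn.execute(
    "SELECT m.title, c.name FROM movies m "
    "JOIN cast_members c ON c.movie_id = m.id ORDER BY c.name"
).fetchall()
relational_cast = [name for _, name in rows]

# Document side: the same information embedded in one nested record,
# so no join is needed to read the cast.
movie_doc = {
    "title": "Gertie the Dinosaur",
    "year": 1914,
    "cast": ["Winsor McCay", "George McManus"],
}
```

The relational version keeps each fact in one place and relies on the join; the document version trades that normalization for a single self-contained record.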

Different types of non-relational databases and their features#

Type	Description	Features	Key Usage	Examples
Document Store	Stores data in documents (typically JSON or BSON format).	Flexible schema, hierarchical data structures, supports nested data.	Content management, user profiles, catalogs.	MongoDB, CouchDB, Firebase Firestore
Key-Value Store	Stores data as key-value pairs.	Simple data model, high performance for lookups by key, easy to scale horizontally.	Session management, caching, real-time data.	Redis, DynamoDB, Riak
Column Family Store	Stores each row's data in groups of related columns (column families); rows do not need to share the same columns.	Optimized for read and write performance on large datasets, supports flexible column families.	Time-series data, analytics, real-time big data.	Apache Cassandra, HBase, ScyllaDB
Graph Database	Stores data as nodes and edges in a graph structure.	Optimized for relationships and connections, supports complex queries on graph data.	Social networks, recommendation systems, fraud detection.	Neo4j, ArangoDB, Amazon Neptune
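As a rough mental model, the four types can be imitated with plain Python data structures. This is only a sketch of the data shapes, not of how any of these systems store data internally; all keys, names, and the `followers_of` helper are made up for illustration.

```python
# Key-value store: an opaque value looked up by a single key.
key_value = {"session:42": "user=alice;expires=3600"}

# Document store: each record is a nested, self-describing document.
document = {"_id": 1, "name": "Alice", "address": {"city": "Vancouver"}}

# Column-family store: row key -> column family -> columns.
# Rows in the same family need not share the same columns.
column_family = {
    "user:1": {
        "profile": {"name": "Alice"},
        "activity": {"last_login": "2024-01-01"},
    },
    "user:2": {"profile": {"name": "Bob", "email": "bob@example.com"}},
}

# Graph database: nodes plus labelled edges between them.
nodes = {"alice", "bob", "carol"}
edges = [("alice", "FOLLOWS", "bob"), ("bob", "FOLLOWS", "carol")]


def followers_of(person):
    """Return the nodes that have a FOLLOWS edge pointing to `person`."""
    return sorted(
        src for src, label, dst in edges if label == "FOLLOWS" and dst == person
    )
```

For example, `followers_of("bob")` walks the edge list and returns `["alice"]`, which is the kind of relationship query a graph database is built to answer efficiently.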

Demo of connecting to MongoDB via pymongo#

Warning

You need to create a separate JSON file to securely store your MongoDB username and password.

Make sure this information is not uploaded to any public server; that means DO NOT commit the credentials.json file. By default, the .gitignore settings already prevent this file from being committed.

The json file should follow the template below:

{
  "host": "<your_host>",
  "port": 27017,
  "username": "<your_username>",
  "password": "<your_password>"
}

Below is the starter code to connect to your MongoDB database, provided that you have created a credentials.json file with the correct information.

import json  # load credentials from a JSON file
from urllib.parse import quote_plus  # percent-encode special characters

from pymongo import MongoClient  # client used to connect to MongoDB

# load credentials from json file
with open('credentials.json') as f:
    login = json.load(f)

# percent-encode the username and password so that characters such as
# '@', ':' or '/' do not break the connection string
username = quote_plus(login['username'])
password = quote_plus(login['password'])
host = login['host']

# mongodb+srv URLs look up the port through DNS, so the port field is not used here
url = f"mongodb+srv://{username}:{password}@{host}/?retryWrites=true&w=majority"

# connect to the database
client = MongoClient(url)
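The encoding step matters because a raw `@` or `/` in a password would be read as part of the URL structure. The sketch below shows the effect on a made-up password. `build_mongo_uri` is a hypothetical helper written for this note, not part of pymongo, and the host name is a placeholder.

```python
from urllib.parse import quote_plus


def build_mongo_uri(creds):
    """Build a mongodb+srv URI from a credentials dict (hypothetical helper)."""
    # Fail early with a readable message if a required field is absent or empty.
    missing = [k for k in ("host", "username", "password") if not creds.get(k)]
    if missing:
        raise KeyError(f"credentials.json is missing: {', '.join(missing)}")
    user = quote_plus(creds["username"])
    pwd = quote_plus(creds["password"])
    return f"mongodb+srv://{user}:{pwd}@{creds['host']}/?retryWrites=true&w=majority"


# Placeholder values, for illustration only.
example = {
    "host": "cluster0.example.mongodb.net",
    "username": "student",
    "password": "p@ss/word",
}
uri = build_mongo_uri(example)
```

Here the `@` and `/` in the password become `%40` and `%2F`, so the only literal `@` left in `uri` is the one that separates the credentials from the host.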

List all database names and collections

# list all databases
client.list_database_names()

['sample_airbnb',
 'sample_analytics',
 'sample_geospatial',
 'sample_guides',
 'sample_mflix',
 'sample_restaurants',
 'sample_supplies',
 'sample_training',
 'sample_weatherdata',
 'admin',
 'local']

# list all collections in the sample_mflix database
client.sample_mflix.list_collection_names()

['comments', 'sessions', 'movies', 'theaters', 'users', 'embedded_movies']

# show the first document
client.sample_mflix.movies.find_one()

{'_id': ObjectId('573a1390f29313caabcd50e5'),
 'plot': 'The cartoonist, Winsor McCay, brings the Dinosaurus back to life in the figure of his latest creation, Gertie the Dinosaur.',
 'genres': ['Animation', 'Short', 'Comedy'],
 'runtime': 12,
 'cast': ['Winsor McCay', 'George McManus', 'Roy L. McCardell'],
 'num_mflix_comments': 0,
 'poster': 'https://m.media-amazon.com/images/M/MV5BMTQxNzI4ODQ3NF5BMl5BanBnXkFtZTgwNzY5NzMwMjE@._V1_SY1000_SX677_AL_.jpg',
 'title': 'Gertie the Dinosaur',
 'fullplot': 'Winsor Z. McCay bets another cartoonist that he can animate a dinosaur. So he draws a big friendly herbivore called Gertie. Then he get into his own picture. Gertie walks through the picture, eats a tree, meets her creator, and takes him carefully on her back for a ride.',
 'languages': ['English'],
 'released': datetime.datetime(1914, 9, 15, 0, 0),
 'directors': ['Winsor McCay'],
 'writers': ['Winsor McCay'],
 'awards': {'wins': 1, 'nominations': 0, 'text': '1 win.'},
 'lastupdated': '2015-08-18 01:03:15.313000000',
 'year': 1914,
 'imdb': {'rating': 7.3, 'votes': 1837, 'id': 4008},
 'countries': ['USA'],
 'type': 'movie',
 'tomatoes': {'viewer': {'rating': 3.7, 'numReviews': 29},
  'lastUpdated': datetime.datetime(2015, 8, 10, 19, 20, 3)}}
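Because pymongo returns each document as a Python dictionary, nested fields such as `imdb` and `tomatoes` can be read with ordinary indexing. The sketch below works on a trimmed copy of the document shown above, so it runs without a database connection. `get_path` is a hypothetical convenience helper, not a pymongo function.

```python
import datetime

# A trimmed copy of the document shown above (only a few fields kept).
movie = {
    "title": "Gertie the Dinosaur",
    "genres": ["Animation", "Short", "Comedy"],
    "released": datetime.datetime(1914, 9, 15, 0, 0),
    "imdb": {"rating": 7.3, "votes": 1837, "id": 4008},
    "tomatoes": {"viewer": {"rating": 3.7, "numReviews": 29}},
}


def get_path(doc, dotted_path, default=None):
    """Follow a dot-separated path such as 'tomatoes.viewer.rating'."""
    current = doc
    for key in dotted_path.split("."):
        # Stop as soon as a level is missing instead of raising KeyError.
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current


imdb_rating = movie["imdb"]["rating"]                      # direct indexing
viewer_rating = get_path(movie, "tomatoes.viewer.rating")  # nested two levels deep
critic_rating = get_path(movie, "tomatoes.critic.rating", default="n/a")
```

MongoDB query filters use the same dot notation for nested fields, for example `client.sample_mflix.movies.find_one({"imdb.rating": {"$gt": 7}})`.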

# list all keys in the first document
keys_all = client.sample_mflix.movies.find_one().keys()
keys_all

dict_keys(['_id', 'plot', 'genres', 'runtime', 'cast', 'num_mflix_comments', 'poster', 'title', 'fullplot', 'languages', 'released', 'directors', 'writers', 'awards', 'lastupdated', 'year', 'imdb', 'countries', 'type', 'tomatoes'])
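The keys above describe only the first document. Since a document store does not enforce a fixed schema, other documents in the same collection may have fewer or different keys. One way to check this, sketched here on made-up documents rather than real query results, is to count how often each top-level key appears.

```python
from collections import Counter


def key_coverage(docs):
    """Count how many documents contain each top-level key."""
    counts = Counter()
    for doc in docs:
        counts.update(doc.keys())
    return counts


# Made-up documents standing in for a few results from a collection.
sample_docs = [
    {"_id": 1, "title": "A", "year": 1914, "tomatoes": {}},
    {"_id": 2, "title": "B", "year": 1920},
    {"_id": 3, "title": "C"},
]
coverage = key_coverage(sample_docs)

# Keys that are absent from at least one document.
optional_keys = sorted(k for k, n in coverage.items() if n < len(sample_docs))
```

With a live connection, the same function could be applied to a cursor such as `client.sample_mflix.movies.find().limit(100)`; which keys turn out to be optional depends on the data.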