Lecture 11: Introduction to Non-Relational Databases
Learning objectives
By the end of this lecture, students should be able to:
Understand how relational and non-relational databases differ in terms of data structure, schema, and query language.
Explain the pros and cons of each type of database and its use cases.
Explain the different types of non-relational databases (i.e., column-family, key-value, graph, document).
Key differences between relational and non-relational databases

| Feature | SQL Databases (Relational) | NoSQL Databases (Non-Relational) |
|---|---|---|
| Data Model | Structured data with predefined schema (tables, rows, columns) | Flexible schema, supports unstructured and semi-structured data |
| Schema Flexibility | Rigid schema, requires predefined structure | Schema-less, allowing for dynamic and flexible data models |
| Scalability | Vertical scaling (scaling up by adding more power to a single server) | Horizontal scaling (scaling out by adding more servers) |
| ACID Compliance | Strong ACID (Atomicity, Consistency, Isolation, Durability) support | Often favors eventual consistency over strong ACID compliance |
| Performance | Efficient for complex queries and transactions | Optimized for high-volume reads/writes and specific use cases |
| Querying | Powerful and standardized SQL for complex queries | Varies by database type; may require custom query languages |
| Data Relationships | Strong support for complex joins and relationships | Limited or no support for complex joins; relationships are embedded or linked |
| Data Integrity | Enforces data integrity through constraints and normalization | Data integrity is managed by the application; denormalization is common |
| Use Case Suitability | Best for structured data, complex transactions, and relationships | Best for big data, real-time analytics, unstructured data, and high scalability needs |
| Maintenance | Requires more maintenance (e.g., schema changes, indexing, tuning) | Generally requires less maintenance, but can be complex in distributed systems |
| Learning Curve | Standardized and well-documented; widely taught | Diverse models with a steeper learning curve due to lack of standardization |
| Ecosystem | Mature ecosystem with a wide range of tools and support | Emerging ecosystem; tools and support vary by database type |
| Cost | Can be expensive at scale due to the need for powerful hardware | Cost-effective for large-scale systems, often runs on commodity hardware |
| Examples | MySQL, PostgreSQL, Oracle, SQL Server | MongoDB, Cassandra, Redis, Neo4j |
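The "Schema Flexibility" and "Data Integrity" rows can be made concrete with a small sketch. It uses Python's built-in sqlite3 module as a stand-in for a relational database and a plain list of dicts as a stand-in for a document collection. The `users` table, the field names, and the sample values are all made up for illustration; this is not how MongoDB stores data internally.

```python
import json
import sqlite3

# Relational side: the schema is declared up front and enforced by the engine.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users ("
    "id INTEGER PRIMARY KEY, name TEXT NOT NULL, email TEXT NOT NULL)"
)
conn.execute(
    "INSERT INTO users (name, email) VALUES (?, ?)", ("Ada", "ada@example.com")
)

# A row that omits a NOT NULL column is rejected by the database itself.
try:
    conn.execute("INSERT INTO users (name) VALUES (?)", ("Grace",))
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

# Document side (illustrative stand-in): each record carries its own
# structure, so records in the same collection may have different fields.
users = [
    {"name": "Ada", "email": "ada@example.com"},
    {"name": "Grace", "languages": ["COBOL"], "address": {"city": "Arlington"}},
]

print("relational insert rejected:", rejected)
print("second document:", json.dumps(users[1]))
```

In the document-style list nothing stops the second record from omitting `email`, which is why the table says that integrity checks usually move into the application.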
Different types of non-relational databases and their features

| Type | Description | Features | Key Usage | Examples |
|---|---|---|---|---|
| Document Store | Stores data in documents (typically JSON or BSON format). | Flexible schema, hierarchical data structures, supports nested data. | Content management, user profiles, catalogs. | MongoDB, CouchDB, Firebase Firestore |
| Key-Value Store | Stores data as key-value pairs. | Simple data model, high performance for lookups by key, easy to scale horizontally. | Session management, caching, real-time data. | Redis, DynamoDB, Riak |
| Column Family Store | Stores data in columns rather than rows. | Optimized for read and write performance on large datasets, supports flexible column families. | Time-series data, analytics, real-time big data. | Apache Cassandra, HBase, ScyllaDB |
| Graph Database | Stores data as nodes and edges in a graph structure. | Optimized for relationships and connections, supports complex queries on graph data. | Social networks, recommendation systems, fraud detection. | Neo4j, ArangoDB, Amazon Neptune |
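To get a feel for how the four models shape data differently, the sketch below imitates each one with plain Python structures. It is only an analogy: the keys, sensor names, people, and the helpers `followed_by` and `friends_of_friends` are hypothetical, and real engines add persistence, indexing, and distribution on top.

```python
# Key-value store: opaque values looked up by a single key.
kv_store = {"session:42": "user=ada;expires=3600"}

# Document store: self-describing records that may nest sub-documents.
doc_store = {
    "movies": [
        {"title": "Gertie the Dinosaur", "year": 1914, "imdb": {"rating": 7.3}},
    ]
}

# Column family store: row key -> column family -> column -> value.
# Rows do not need to share the same columns or families.
wide_column = {
    "sensor-1": {
        "readings": {"2024-01-01T00:00": 20.5, "2024-01-01T00:05": 20.7},
        "meta": {"site": "roof"},
    },
    "sensor-2": {"readings": {"2024-01-01T00:00": 18.9}},
}

# Graph database: nodes plus labeled edges between them.
nodes = {"ada", "grace", "linus"}
edges = [("ada", "FOLLOWS", "grace"), ("grace", "FOLLOWS", "linus")]


def followed_by(person):
    """Return the nodes that `person` points to with a FOLLOWS edge."""
    return [dst for src, rel, dst in edges if src == person and rel == "FOLLOWS"]


def friends_of_friends(person):
    """Two-hop traversal, the kind of query graph databases optimize for."""
    return sorted({fof for friend in followed_by(person) for fof in followed_by(friend)})


print(kv_store["session:42"])
print(doc_store["movies"][0]["imdb"]["rating"])
print(sorted(wide_column["sensor-1"]["readings"]))
print(friends_of_friends("ada"))
```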
Demo of connecting to MongoDB via pymongo
Warning
You need to create a separate JSON file to securely store your MongoDB username and password.
Make sure this information is not uploaded to any public server, which means DO NOT commit the credentials.json
file. I have disabled committing this file by default in the .gitignore settings.
The JSON file should follow the template below:
{
    "host": "<your_host>",
    "port": 27017,
    "username": "<your_username>",
    "password": "<your_password>"
}
Below is the starter code to connect to your MongoDB database, provided that you have created a credentials.json
file with the correct information.
from pymongo import MongoClient  # client class used to connect to MongoDB
import json  # used to load the credentials file
import urllib.parse  # used to percent-encode the credentials

# load credentials from the JSON file
with open('credentials.json') as f:
    login = json.load(f)

# percent-encode the username and password so that reserved characters
# such as '@', ':' or '/' do not break the connection string
username = urllib.parse.quote_plus(login['username'])
password = urllib.parse.quote_plus(login['password'])
host = login['host']
url = "mongodb+srv://{}:{}@{}/?retryWrites=true&w=majority".format(username, password, host)

# connect to the database
client = MongoClient(url)
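Credentials that contain reserved characters such as `@`, `:`, or `/` must be percent-encoded before they are placed in a MongoDB connection string. The following is a minimal sketch of that step in isolation, using only the standard library. The helper name `build_uri`, the sample username and password, and the host `cluster0.example.mongodb.net` are made up for illustration.

```python
from urllib.parse import quote_plus


def build_uri(username, password, host):
    """Build a mongodb+srv connection string with percent-encoded credentials."""
    return "mongodb+srv://{}:{}@{}/?retryWrites=true&w=majority".format(
        quote_plus(username), quote_plus(password), host
    )


# Hypothetical values: the password deliberately contains '@', '/' and ':'.
uri = build_uri("student", "p@ss/w:rd", "cluster0.example.mongodb.net")
print(uri)
```

Without the encoding, the extra `@` and `:` in the password would be read as URI delimiters and the connection string would be parsed incorrectly.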
List all database names and collections
# list all databases
client.list_database_names()
['sample_airbnb',
'sample_analytics',
'sample_geospatial',
'sample_guides',
'sample_mflix',
'sample_restaurants',
'sample_supplies',
'sample_training',
'sample_weatherdata',
'admin',
'local']
# list all collections in the sample_mflix database
client.sample_mflix.list_collection_names()
['comments', 'sessions', 'movies', 'theaters', 'users', 'embedded_movies']
# show the first document
client.sample_mflix.movies.find_one()
{'_id': ObjectId('573a1390f29313caabcd50e5'),
'plot': 'The cartoonist, Winsor McCay, brings the Dinosaurus back to life in the figure of his latest creation, Gertie the Dinosaur.',
'genres': ['Animation', 'Short', 'Comedy'],
'runtime': 12,
'cast': ['Winsor McCay', 'George McManus', 'Roy L. McCardell'],
'num_mflix_comments': 0,
'poster': 'https://m.media-amazon.com/images/M/MV5BMTQxNzI4ODQ3NF5BMl5BanBnXkFtZTgwNzY5NzMwMjE@._V1_SY1000_SX677_AL_.jpg',
'title': 'Gertie the Dinosaur',
'fullplot': 'Winsor Z. McCay bets another cartoonist that he can animate a dinosaur. So he draws a big friendly herbivore called Gertie. Then he get into his own picture. Gertie walks through the picture, eats a tree, meets her creator, and takes him carefully on her back for a ride.',
'languages': ['English'],
'released': datetime.datetime(1914, 9, 15, 0, 0),
'directors': ['Winsor McCay'],
'writers': ['Winsor McCay'],
'awards': {'wins': 1, 'nominations': 0, 'text': '1 win.'},
'lastupdated': '2015-08-18 01:03:15.313000000',
'year': 1914,
'imdb': {'rating': 7.3, 'votes': 1837, 'id': 4008},
'countries': ['USA'],
'type': 'movie',
'tomatoes': {'viewer': {'rating': 3.7, 'numReviews': 29},
'lastUpdated': datetime.datetime(2015, 8, 10, 19, 20, 3)}}
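The document returned by find_one() is an ordinary Python dict: sub-documents such as `imdb` and `tomatoes` arrive as nested dicts, and arrays such as `genres` arrive as lists. The sketch below works on a hand-copied subset of that output, so it runs without a database connection. `get_path` is a hypothetical helper, not part of pymongo, that imitates MongoDB's dot notation on the client side.

```python
# Subset of the document shown above, copied by hand for illustration.
movie = {
    "title": "Gertie the Dinosaur",
    "genres": ["Animation", "Short", "Comedy"],
    "awards": {"wins": 1, "nominations": 0, "text": "1 win."},
    "imdb": {"rating": 7.3, "votes": 1837, "id": 4008},
    "tomatoes": {"viewer": {"rating": 3.7, "numReviews": 29}},
}


def get_path(document, dotted_path, default=None):
    """Follow a dotted path such as 'imdb.rating' through nested dicts."""
    current = document
    for key in dotted_path.split("."):
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current


print(get_path(movie, "imdb.rating"))
print(get_path(movie, "tomatoes.viewer.numReviews"))
print(get_path(movie, "awards.rank", default="n/a"))
```

Returning a default for a missing path is a deliberate choice here: with a flexible schema, not every document is guaranteed to contain every field.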
# list all keys in the first document
keys_all = client.sample_mflix.movies.find_one().keys()
keys_all
dict_keys(['_id', 'plot', 'genres', 'runtime', 'cast', 'num_mflix_comments', 'poster', 'title', 'fullplot', 'languages', 'released', 'directors', 'writers', 'awards', 'lastupdated', 'year', 'imdb', 'countries', 'type', 'tomatoes'])
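Once the field names are known, a natural next step is to filter and project documents instead of fetching only the first one. The sketch below is one possible way to do that with pymongo's `find`, `sort`, and `limit`. The helpers `build_movie_query` and `top_rated` are hypothetical names, and the genre and rating threshold are arbitrary. The query-building part runs without a database; `top_rated` assumes it is given a live pymongo collection.

```python
def build_movie_query(genre, min_rating):
    """Return a (filter, projection) pair for the movies collection."""
    # Matching a scalar against an array field selects documents whose
    # array contains that value; dot notation reaches into sub-documents.
    query = {"genres": genre, "imdb.rating": {"$gte": min_rating}}
    # Keep only a few fields and drop the _id in the results.
    projection = {"_id": 0, "title": 1, "year": 1, "imdb.rating": 1}
    return query, projection


def top_rated(collection, genre, min_rating, limit=5):
    """Run the query on a pymongo collection, highest rating first."""
    query, projection = build_movie_query(genre, min_rating)
    cursor = collection.find(query, projection).sort("imdb.rating", -1).limit(limit)
    return list(cursor)


query, projection = build_movie_query("Animation", 8.0)
print(query)
print(projection)

# Usage, assuming `client` is the MongoClient created earlier:
# top_rated(client.sample_mflix.movies, "Animation", 8.0)
```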