DASC 5410 - Data and Database Management for Data Science#
This course provides a comprehensive introduction to database systems, covering the fundamentals of data types, structures, and modeling techniques. Students will explore structured, semi-structured, and unstructured data, learning to use query languages like SQL. The course emphasizes data quality, Big Data storage (Hadoop, MapReduce), and NoSQL databases, with hands-on exercises in text analysis, temporal and spatial data, and data mining. By the end of the course, students will have the skills to design, manage, and analyze databases effectively in various real-world scenarios.
Learning objectives#
After completing this course, students should be able to:
Demonstrate an understanding of concepts related to data and databases
Perform queries on databases and use the results in other programs and programming languages
Design a conceptual and physical model for a typical transactional database
Set up a local database, import/export data, integrate data from multiple sources, and maintain data integrity
Use different measures of data quality, evaluate data sets for missing data, and analyze outliers
Prepare data for analysis by aggregation, dimensionality reduction, and discretization
Identify and articulate the challenges and benefits of a specific type or set of data
Describe the challenges associated with working with Big Data
Manipulate and retrieve data from NoSQL databases (MongoDB)
Apply core data mining algorithms and interpret the results
Schedule & office hours#
Tuesday 12:30 – 1:20 PM in OM 2612
Wednesday 12:30 – 2:20 PM in OM 1241
Office hour: Thu 11:30 AM-12:30 PM at OM 1232. Please book your timeslot here
Communication#
For any course-related questions (lectures, assignments, exams, course logistics), please post them in the discussion forum on Moodle.
For any personal matters, such as academic concessions, deadline extensions, or personal circumstances, please email me at lnguyen[at]tru[dot]ca
Response time: I will try my best to reply to your inquiries as soon as possible during normal working hours (9 AM-5 PM, Mon-Fri). If you send me a message outside of regular working hours, please expect a response on the next working day.
Lectures#
Please find the topics for each week based on the course outline:
Week Number | Topic |
---|---|
Week 1 | Introduction to database management systems |
Week 2 | Relationships in DBMSs |
Week 3 | Data query with SQL part 1 |
Week 4 | Data query with SQL part 2 |
Week 5 | Data query with SQL part 3 |
Week 6 | Introduction to non-relational databases |
Week 7 | Data modelling in MongoDB |
Week 8 | Data manipulation in MongoDB |
Week 9 | Transactions in MongoDB |
Week 10 | Big data processing with PySpark |
Week 11 | Spark SQL and data manipulation |
Week 12 | Spark machine learning |
Week 13 | Data ethics and privacy |
Assessment overview#
You are responsible for the following deliverables, which will determine your course grade:
Assignment | Note | Weight | Date |
---|---|---|---|
Weekly worksheet x 10 | Open-book. Released on Mondays | 20% | Deadline on Moodle |
Midterm 1 | Closed-book, Moodle timed quiz, 45 minutes | 25% | Wed, Oct 9 |
Midterm 2 | Closed-book, Moodle timed quiz, 45 minutes | 25% | Wed, Nov 6 |
Exam | Closed-book, Moodle timed quiz, 90 minutes | 30% | TBD |
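As a rough illustration of how these weights combine, here is a minimal Python sketch. The weights come from the table above; the function name `course_grade`, the 0-100 score scale, and the example scores are assumptions made for illustration, and penalties are not modelled.

```python
from typing import Mapping

# Weights taken from the assessment table; everything else is illustrative.
WEIGHTS = {
    "worksheets": 0.20,  # 10 weekly worksheets, 2% each
    "midterm_1": 0.25,
    "midterm_2": 0.25,
    "exam": 0.30,
}


def course_grade(scores: Mapping[str, float]) -> float:
    """Return the weighted course grade (0-100) from component scores (0-100)."""
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing components: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# Hypothetical scores, for illustration only.
example = {"worksheets": 90.0, "midterm_1": 70.0, "midterm_2": 80.0, "exam": 75.0}
print(round(course_grade(example), 2))
```

Here `"worksheets"` is assumed to be the average of your ten worksheet scores, which is equivalent to counting each worksheet at 2%.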
Weekly assignments#
Every week, you will have a worksheet that is worth 2%. These low-stakes assignments consist of multiple choice questions and small exercises that help you consolidate your understanding of the materials and serve as a formative assessment.
The worksheets will be distributed via GitHub Classroom. See Moodle for the deadline of each worksheet.
Submission instructions:
Push your work to GitHub
Make sure all the outputs are properly rendered (click "Run All Cells")
Paste the URL into the Moodle assignment AND CLICK SUBMIT!
If you forget to click submit, I will still grade your assignment (by searching for your GitHub repo), but I will apply a 5% penalty for each occurrence (i.e., if you forget a second time, the penalty is 10%, and so on).
Also, please make sure all your code output is rendered properly on GitHub. To do so, restart the kernel and run all cells one last time before submitting your notebook. If I open your notebook and cannot see the output of a code cell, a 5% penalty applies for each occurrence.
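If you would like to double-check a notebook before pushing, the sketch below is one possible way to do it. It is not part of the official submission process. It relies only on the fact that an `.ipynb` file is JSON with a `cells` list; the function names `cells_missing_output` and `check_notebook_file` and the demo notebook are hypothetical.

```python
import json
from typing import List


def cells_missing_output(notebook: dict) -> List[int]:
    """Return indices of non-empty code cells that have no stored output."""
    missing = []
    for index, cell in enumerate(notebook.get("cells", [])):
        if cell.get("cell_type") != "code":
            continue
        source = cell.get("source", "")
        # The notebook format allows "source" to be a string or a list of strings.
        text = "".join(source) if isinstance(source, list) else source
        if text.strip() and not cell.get("outputs"):
            missing.append(index)
    return missing


def check_notebook_file(path: str) -> List[int]:
    """Load an .ipynb file (JSON) and run the same check."""
    with open(path, encoding="utf-8") as handle:
        return cells_missing_output(json.load(handle))


# Hypothetical in-memory notebook, for illustration only.
demo = {
    "cells": [
        {"cell_type": "markdown", "source": ["# Worksheet 1"]},
        {"cell_type": "code", "source": ["print(1 + 1)"],
         "outputs": [{"output_type": "stream", "name": "stdout", "text": ["2\n"]}]},
        {"cell_type": "code", "source": ["print('not run yet')"], "outputs": []},
    ]
}
print(cells_missing_output(demo))
```

Some cells legitimately produce no output (imports, assignments), so a flagged index is only a prompt to look at that cell, not proof that something is wrong.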
Mid-terms & exam#
There will be two midterms and one final exam in this course.
The midterms are closed-book and take place on Moodle, with a duration of 45 minutes each. Each will consist of a mix of multiple choice, fill in the blank, and open-ended questions.
The final exam is closed-book and takes place on Moodle, with a duration of 90 minutes. It will consist of a mix of multiple choice, fill in the blank, and open-ended questions.
The final exam will cover all the materials in the course.
Attendance, late assignments, academic concessions, academic accommodation#
Attendance#
A registered student who does not attend the first two events (e.g., lectures, labs, etc.) of their course(s) and who has not made prior arrangements acceptable to the instructor(s) may, at the discretion of the instructor(s), be considered to have withdrawn from the course(s) and have their course registration(s) deleted.
Please refer to TRU’s attendance policy. In addition, we will take attendance during class via Moodle’s QR code.
In the CS department, you need at least 75% attendance to pass any course.
Academic concessions#
If you encounter situations that may impede your ability to meet course requirements, such as illness, family emergencies, or other significant life events, please notify the instructor at least 24 hours before the deadline. Academic concessions, including extensions or alternative assessments, will be considered on a case-by-case basis. You may be required to provide documentation to support your request. Concession requests made after the deadline has passed will likely be refused.
Late Assignments#
Assignments are expected to be submitted on time. Late submissions will incur a penalty of 25% per day, up to a maximum of 75%. After 3 days, late assignments will no longer be accepted and will receive a grade of zero. Extensions may be granted in exceptional circumstances, provided that you contact the instructor before the deadline.
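To make the arithmetic concrete, here is a minimal Python sketch of the late penalty. The 25% per day rate, the 75% cap, and the 3-day cutoff come from the policy above. Two points are assumptions for illustration only: that any started day counts as a full day late, and that the penalty is taken as a percentage of the earned score. The function name `late_adjusted_score` is hypothetical.

```python
import math

PENALTY_PER_DAY = 0.25  # from the policy: 25% per day
MAX_PENALTY = 0.75      # from the policy: capped at 75%
MAX_DAYS_LATE = 3       # from the policy: zero after 3 days


def late_adjusted_score(raw_score: float, hours_late: float) -> float:
    """Return the score after the late penalty.

    Assumptions (not stated in the policy): a partial day counts as a full
    day, and the penalty is a fraction of the earned score.
    """
    if hours_late <= 0:
        return raw_score  # on time, no penalty
    days_late = math.ceil(hours_late / 24)
    if days_late > MAX_DAYS_LATE:
        return 0.0  # no longer accepted
    penalty = min(PENALTY_PER_DAY * days_late, MAX_PENALTY)
    return raw_score * (1 - penalty)


# Hypothetical raw score of 80, submitted at various delays (in hours).
for hours in (0, 5, 30, 72, 73):
    print(hours, late_adjusted_score(80.0, hours))
```

If in doubt about how a specific late submission will be treated, ask before the deadline rather than relying on this sketch.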
Accessibility#
Students registered with Accessibility Services who require accommodations must provide their Letter of Accommodation to the instructor as soon as possible. This letter will outline the necessary accommodations to ensure an equitable learning environment. Please ensure that this is done early in the term to facilitate timely arrangements.
Policy on the use of generative AI#
Please refer to TRU's guidelines on the use of generative AI tools such as ChatGPT or Copilot in this course.
https://libguides.tru.ca/artificialintelligence