DASC 5410 - Data and Database Management for Data Science#

This course provides a comprehensive introduction to database systems, covering the fundamentals of data types, structures, and modeling techniques. Students will explore structured, semi-structured, and unstructured data, learning to use query languages like SQL. The course emphasizes data quality, Big Data storage (Hadoop, MapReduce), and NoSQL databases, with hands-on exercises in text analysis, temporal and spatial data, and data mining. By the end of the course, students will have the skills to design, manage, and analyze databases effectively in various real-world scenarios.

Learning objectives#

After completing this course, students should be able to:

  1. Demonstrate an understanding of concepts related to data and databases

  2. Perform queries on databases and use the results in other programs and programming languages

  3. Design a conceptual and physical model for a typical transactional database

  4. Set up a local database, import/export data, integrate data from multiple sources, and maintain data integrity

  5. Utilize different measurements for data quality, evaluate data sets for missing data, and analyze outliers

  6. Prepare data for analysis by aggregation, dimensionality reduction, and discretization

  7. Identify and articulate the challenges and benefits of a specific type or set of data

  8. Describe the challenges associated with working with Big Data

  9. Manipulate and retrieve data from NoSQL databases (MongoDB)

  10. Apply core data mining algorithms and be able to interpret the results.

Schedule & office hours#

  • Tuesday 12:30 – 1:20 pm in OM 2612

  • Wednesday 12:30 – 2:20 pm in OM 1241

Office hour: Thu 11:30 AM-12:30 PM at OM 1232. Please book your timeslot here

Communication#

  • For any course-related questions, such as lectures, assignments, exams, or course logistics, please post them in the discussion forum on Moodle.

  • For any individual-related questions, such as academic concession, deadline extension, personal circumstances, etc., please email me at lnguyen[at]tru[dot]ca

  • Response time: I will try my best to reply to your inquiries as soon as possible during normal working hours (9 AM-5 PM, Mon-Fri). If you send me a message outside of regular working hours, please expect a response on the next working day.

Lectures#

Please find the topics for each week based on the course outline:

| Week Number | Topic |
| --- | --- |
| Week 1 | Introduction to database management systems |
| Week 2 | Relationships in DBMSs |
| Week 3 | Data query with SQL part 1 |
| Week 4 | Data query with SQL part 2 |
| Week 5 | Data query with SQL part 3 |
| Week 6 | Introduction to non-relational databases |
| Week 7 | Data modelling in MongoDB |
| Week 8 | Data manipulation in MongoDB |
| Week 9 | Transactions in MongoDB |
| Week 10 | Big data processing with PySpark |
| Week 11 | Spark SQL and data manipulation |
| Week 12 | Spark Machine learning |
| Week 13 | Data ethics and privacy |

Assessment overview#

You are responsible for the following deliverables, which will determine your course grade:

| Assignment | Note | Weight | Date |
| --- | --- | --- | --- |
| Weekly worksheet x 10 | Open-book. Released on Mondays | 20% | Deadline on Moodle |
| Midterm 1 | Closed-book, Moodle timed quiz, 45 minutes | 25% | Wed, Oct 9 |
| Midterm 2 | Closed-book, Moodle timed quiz, 45 minutes | 25% | Wed, Nov 6 |
| Exam | Closed-book, Moodle timed quiz, 90 minutes | 30% | TBD |
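As a rough illustration of how these weights combine into a course grade, here is a minimal Python sketch. The weights come from the assessment overview; the function name `course_grade`, the component keys, and the example scores are made up for illustration and are not part of any official grading tool.

```python
# Weights from the assessment overview (they sum to 100%).
WEIGHTS = {
    "worksheets": 0.20,  # 10 weekly worksheets, 2% each
    "midterm_1": 0.25,
    "midterm_2": 0.25,
    "exam": 0.30,
}


def course_grade(scores: dict[str, float]) -> float:
    """Return the weighted course grade (0-100) from component scores (0-100)."""
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing scores for: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# Hypothetical student: the scores below are invented for the example.
example = {"worksheets": 90, "midterm_1": 70, "midterm_2": 80, "exam": 75}
print(round(course_grade(example), 2))  # 18 + 17.5 + 20 + 22.5 = 78.0
```

This sketch ignores any penalties or concessions; it only shows the weighted sum.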

Weekly assignments#

Every week, you will have a worksheet that is worth 2%. These low-stakes assignments consist of multiple choice questions and small exercises that help you consolidate your understanding of the materials and serve as a formative assessment.

The worksheets will be distributed via GitHub Classroom. See the deadline for each worksheet on Moodle.

Submission instructions:

  • Push your work to GitHub

  • Make sure all the outputs are properly rendered (click run all cells)

  • Paste the URL to Moodle assignment AND CLICK SUBMIT!

If you forget to click submit, I will still grade your assignment (by searching for your GitHub repo), but I will apply a penalty that grows by 5% with each occurrence (i.e., -5% the first time, -10% the second time, and so on).

Secondly, please make sure all of your code output is rendered properly on GitHub. To do so, click restart kernel and run all cells one last time before submitting your notebook. If I open your notebook and cannot see the output of a code cell, you will get -5% for each occurrence.

Mid-terms & exam#

There will be two midterms and one final exam in this course.

The midterms are closed-book and take place on Moodle, with a duration of 45 minutes each. They will consist of a mix of multiple choice, fill in the blank, and open-ended questions.

The final exam is closed-book format, taking place on Moodle for the duration of 90 minutes. It will consist of a mix of multiple choice, fill in the blank, and open-ended questions.

The final exam will cover all the materials in the course.

Attendance, late assignments, academic concessions, academic accommodation#

Attendance#

A registered student who does not attend the first two events (e.g., lectures/labs/ etc.) of their course(s) and who has not made prior arrangements acceptable to the instructor(s) may, at the discretion of the instructor(s), be considered to have withdrawn from the course(s) and have their course registration(s) deleted.

Please refer to TRU’s attendance policy. In addition, we will take attendance during class via Moodle’s QR code.

In the CS department, you need at least 75% attendance to pass any course.

Academic concessions#

If you encounter situations that may impede your ability to meet course requirements (such as illness, family emergencies, or other significant life events), please notify the instructor at least 24 hours before the deadline. Academic concessions, including extensions or alternative assessments, will be considered on a case-by-case basis. You may be required to provide documentation to support your request. Concession requests made after the deadline has passed will likely be refused.

Late Assignments#

Assignments are expected to be submitted on time. Late submissions will incur a penalty of 25% per day, up to a maximum of 75%. After 3 days, late assignments will no longer be accepted and will receive a grade of zero. Extensions may be granted in exceptional circumstances, provided that you contact the instructor before the deadline.
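To make the late policy concrete, here is a small Python sketch of the arithmetic. It rests on two assumptions that the policy text does not spell out: any started day counts as a full late day, and the penalty is taken as a fraction of the score you earned. The function name `late_adjusted_score` is hypothetical; when in doubt, ask the instructor how lateness is counted.

```python
import math

PENALTY_PER_DAY = 0.25  # 25% per day, from the late policy
MAX_LATE_DAYS = 3       # after 3 days the assignment receives zero


def late_adjusted_score(raw_score: float, hours_late: float) -> float:
    """Apply the late penalty to a raw score.

    Assumptions (not stated in the policy): a partial day counts as a
    full day, and the penalty is a fraction of the earned score.
    """
    if hours_late <= 0:
        return raw_score  # submitted on time, no penalty
    days_late = math.ceil(hours_late / 24)
    if days_late > MAX_LATE_DAYS:
        return 0.0  # no longer accepted
    return raw_score * (1 - PENALTY_PER_DAY * days_late)


print(late_adjusted_score(80, 0))   # 80 (on time)
print(late_adjusted_score(80, 5))   # 60.0 (1 day late, -25%)
print(late_adjusted_score(80, 30))  # 40.0 (2 days late, -50%)
print(late_adjusted_score(80, 72))  # 20.0 (3 days late, -75%)
print(late_adjusted_score(80, 73))  # 0.0 (more than 3 days late)
```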

Accessibility#

Students registered with Accessibility Services who require accommodations must provide their Letter of Accommodation to the instructor as soon as possible. This letter will outline the necessary accommodations to ensure an equitable learning environment. Please ensure that this is done early in the term to facilitate timely arrangements.

Policy on the use of generative AI#

Please refer to TRU’s guidelines on the use of generative AI tools such as ChatGPT or Copilot in this course.

https://libguides.tru.ca/artificialintelligence