Introduction to Databases
CMPT 321, Fall 2017
The course presents an introduction to database management systems (DBMS), with an emphasis on how to use the system for modeling and querying information. The course will focus on 4 main areas:
- Database design: how can we model the world in terms of data?
- Data analytics: how can we answer questions about the world in terms of questions on such data?
- Concurrency and robustness: how does the DBMS allow multiple users to query and modify the same data? What happens when there is a system failure?
- Efficiency and scalability: how to store very large amounts of data and process them efficiently?
Table of contents |
---|
1. Course Info
1.1. Contacts |
|
---|---|
Instructor: | Marina Barsky |
Lecture hours: | Wed, 5:40- 8:35 PM, FSH 112 |
Office hours: | Fri, 2:00 - 4:00 PM, FSH 111 |
e-mail: | [email protected] |
1.2. Textbook
"Database Systems: The Complete Book" by H. Garcia-Molina, J. D. Ullman, and J. Widom, 2nd Edition.
1.3. Deliverables |
|||
---|---|---|---|
Quizzes | 10% | (in-class) | |
Assignments: | 20% | ||
Midterm test: | 10% | September 27 (in-class) | |
Final exam*: | 30% | Scheduled by the college | |
Final project*: | 30% | Due: last week of classes |
*Score of at least 50% is required in order to pass the course.
2. Lecture handouts
- Intro to Database Management Systems. Slides Intro. Readings*: 1 (The world of database systems).
- Conceptual Design in Relational Model. Entity-relationship diagrams.
Slides 01.01. Model refinements: week entity sets, subclasses.
Slides 01.02. Converting ER diagrams to relations.
Converting schemas to physical tables. SQLite setup and data types.
Slides 01.03. Exercises: ER to tables.
W2.
Exercises: subclasses to tables.
W3.
Readings: 4.1 - 4.6 - Queries in Relational Model. Relational algebra.
Slides 02.01.
Exercises: RA queries on movies.
RA.
General approach for writing relational-algebra queries with solutions.
Slides 02.02.
Bags vs. sets. Extended operators of relational algebra.
Slides 02.03.
Exercises: RA exercises.
W4.
(Solution to the last query at the end of Slides 02.02).
Readings: 2.4 + 5.1-5.2 - Structured Query Language SQL.
Select-From-Where queries. 3-valued logic. Subqueries.
Slides 03.01.
Workshop on NULLs. Instructions.
Script.
Grouping and aggregation.
Slides 03.02.
Set operators.
Slides 03.03.
Summary (what to pay attention to): Slides 03.04. Examples: University DB script. Queries. Exercises: SQL queries. W5. Solutions to part II (run against University database). Solutions to part III (run against Movies database: db script, insert script).
Using views for query readability. Slides 03.05.
Readings: 6.1 - 6.4, 8.1 - Data analytics with SQL.
Basic Constraints: Slides 04.01.
Data Manipulation: INSERT, UPDATE, DELETE. Slides 04.02.
Using SQL for predictions and recommendations. Naol's presentation. Slides 04.03. Chapter from a wonderful book by Toby Segaran.
Interpreting query results. Simpson's paradox. Slides 04.04.
Data analytics with SQL and Excel: Workshop 1. -
Embedding SQL into conventional languages. Slides 03.07.
Python DB app: Workshop 2.
Readings: 9.3+9.6 - Preserving database consistency. PostgreSQL.
Data types Slides 05.01.
Integrity and value constraints. Slides 05.02.
Transactions. Slides 05.03.
Implementing concurrency. Slides 05.04.
Shortened version.
Controlling isolation levels with SQL. Slides 05.05.
Readings: 6.6, 7.1-7.4, 18.1-18.4 - Design theory of relational databases. Introduction to normalization.
Slides 06.01.
General algorithm for BCNF decomposition. Chase test for lossless decomposition.
Problems with BCNF: loosing functional dependencies.
4-th normal form. Slides 06.02.
Summary lecture. Slides 06.03.
Readings: 3.1 - 3.7 . Paper. - Advanced features of relational databases: Indexes.
Flat indexes. B-tree. Index selection.
Slides 07.01.
Readings: 14.1 - 14.2, 8.3 - 8.4 - Alternative data models.. Semantic data modeling.
Triplestores. Graph databases. NoSQL databases.
Slides 08.01.
Map-reduce. Slides 08.02.
Readings: 20.1 - 20.2 (especially 20.1.1), 23.2
Chapter I from "Programming the Semantic Web" by Toby Segaran.
Chapter 2 from "Mining of Massive Datasets" by Leskovec et al. (free online version).
* Unless specified otherwise, readings refer to the chapters in the main textbook.
3. Assignments
Two programming assignments:
A1. Data analytics. Due date: October 29, 11:30 PM.
Cancelled as partly included in the final project.
4. Tests
Sample Midterm with Solutions.
Midterm review: RA queries.
Midterm review: SQL queries.
Exercise: INSERT, UPDATE, DELETE.
Exercise: Constraints.
5. Final Project
Type I. Database application: Instructions.
Type II. Data analytics with SQL: Instructions.