Database System Technology
CSC 443, winter 2017
The course examines advanced topics in database system implementation. It starts with the in-depth coverage of algorithms and data structures used in the implementation of relational DBMSs. These include sorting, hashing, and indexing. The use of these techniques in query processing and query optimization is studied next.
Parallel dataflow algorithms are introduced on the example of MapReduce model.
The techniques used for implementing concurrency control and crash recovery in database systems are studied in the last part of the course.
Students should have taken the courses listed in this calendar entry or have equivalent knowledge in algorithms, data structures, relational algebra, and systems.
Course work involves two major programming assignments, mostly in C/C++.
By the end of the course students should be able to propose and implement an efficient algorithm for an out-of-core problem (i.e. when the input size exceeds the available main memory); understand the internals of relational database system implementation; apply optimization algorithms to SQL queries to produce efficient query plans; apply concurrency control and recovery algorithms to sample transaction workloads; apply the MapReduce framework to a sample big data problem.
Table of contents |
---|
1. Course Info
1.1. Contacts |
|
---|---|
Instructor: | Marina Barsky |
Office hours: | Mon, 3:15 - 5:30 PM, BA3219 |
e-mail: | [email protected] |
TA: | Erkang Zhu |
e-mail: | [email protected] |
TA: | Cheng Chang |
e-mail: | [email protected] |
TA: | Vinay Gera |
e-mail: | [email protected] |
1.2. Textbook
"Database Systems: The Complete Book" by H. Garcia-Molina, J. D. Ullman, and J. Widom, 2nd Edition.
We are interested in Part IV of the book.
1.3. Deliverables |
|||
---|---|---|---|
First Programming Assignment | 25% | ||
Part 1. Blocks: | Due*: Feb 1, 2017 | ||
Part 2. External sort: | Due: March 1, 2017 | ||
Part 3. Sort-merge join: | Due: March 15, 2017 | ||
Second Programming Assignment: MapReduce | 15% | Due: April 3, 2017 | |
Weekly quizzes: | 20% | Weekly (in-class) | |
Final exam**: | 40% | April 17, 9:00 - 12:00, TC 239 |
* Tentative due dates
** Score of at least 50% is required in order to pass the course.
2. Lecture handouts
- Intro to Database Management Systems. Slides Intro. Readings*: 1 (The world of database systems).
- Storage media. Memory Hierarchy. Disks mechanics. Computation Model. Slides 01.01.
Readings: 13.1 - 13.3.
Stable storage. Recovery from disk failures. Intermittent failures. Checksums. Disk crashes. Mirroring as a redundancy technique: Redundant Array of Independent Disks (RAID). RAID level 4. RAID level 5. Coping with multiple disk crashes: RAID level 6. Slides 01.02. Readings: 13.4.
Data layout on disk. Representing data elements. Independent readings: 13.5 - 13.8. - Algorithms for secondary storage. Two-phase, multiway merge sort (2PMMS). Slides 02.01. Analysis of 2PMMS. Impact of block size. Improving performance of 2PMMS. Comparing with tape algorithms. Slides 02.02. Readings: 15.4.1.
- Buffer management. Buffer pool. Pinning records. Replacement policies. Slides 02.03. Readings: 15.7.1 - 15.7.2.
- Data structures for secondary storage. Basic file organizations. Static indexes.
Slides 02.04. Readings: 14.1.
B-Trees. Insertion into B-Trees. Deletion from B-Trees. Slides 02.05. B-tree in action: silent movie. Readings: 14.2.
Secondary-storage hash tables. Dynamic hashing framework. Extendible hashing. Linear hashing. Slides 02.06. Readings: 14.3.
Multi-dimentional indexing. Independent readings (optional): 14.4 - 14.6. Bitmap Indexes. Slides 02.07. Readings: 14.7. - Implementing relational operators.
The model for cost estimation.
Implementing selection and projection. Main techniques: sorting, hashing, indexing.
Pipelining vs. materialization. Slides 03.01.
Readings: 15.1, 15.2.
Implementing joins. Slides 03.02. Readings: 15.3, 15.4.2 - 15.4.9, 15.5, 15.6. - Introduction to query optimization. Optimizing logical query plans.
Pushing selections. Laws for (bag) projections.
Slides 03.03.
Readings: 16.2.1 - 16.2.5, optional: 16.3.
Cost-based transformations. Estimating the size of intermediate outputs. Maintaining statistics. Dynamic Programming to select a join order. Greedy algorithm for selecting a join order. Slides 03.04. Readings: 16.4 - 16.6. - Scalability of systems and algorithms. Map-reduce and parallel dataflow.
Slides 04.01.
Algorithms in map-reduce: inverted index, join, duplicate elimination. Google PageRank algorithm and matrix-vector multiplication in map-reduce. Map-reduce vs RDBMS. Slides 04.02.
Readings: 20.2. Map-reduce paper. GFS paper. From the book on mining massive datasets: chapter on map-reduce. - Concurrency. ACID transactions. Serial and serializable schedules. Conflicts. Serializability/precedence graphs. Enforcing serializability by locks. Two phase locking (2PL) protocol. Shared/Exclusive locks. Upgrading locks. Deadlocks. Solution: Update locks. Transaction isolation levels. Slides 05.01. Readings: 18.1 - 18.4.
- Recovery from failures. Undo Logging. Checkpointing. Nonquiescent Checkpointing. Redo and Undo/Redo logging. Slides 05.02. Readings: 17.1 - 17.4.
* Unless specified otherwise, readings refer to the chapters in the main textbook.
3. Tutorials
Tutorials are held on Fridays, at 2:00 PM
UC 261: For students with family names starting with A-K
LM 158: For students with family names starting with L-Z
- Tutorial 2. 2PMMS exercises. Notes by Vinay Gera and Cheng Chang.
- Tutorial 4. B-trees. Board captures by Erkang Zhu.
- Tutorial 7. Implementing operators. Slides. Quiz 7 solutions by Erkang Zhu.
- Tutorial 8. Query optimization. Slides.
4. Assignments
Programming assignments
Assignment 1:
- Part 1. Exploring storage media. Handout. Due date: February 1. Submit through web interface on Markus.
- Part 2. External-memory sorting. Handout. Starter code: external_merge_starter. Due date: March 1. Submit through web interface on Markus.
- Part 3. Implementing joins. Handout. Due date: March 19. Submit through web interface on Markus.
Assignment 2: Thinking in MapReduce. Handout. Due date: April 10. Submit through web interface on Markus.
5. Exam preparation
Quizzes:
- Quiz1. Raid-6.
- Quiz2. EM-algorithms.
- Quiz3. Buffers.
- Quiz4. B-Tree.
- Quiz5. Dynamic hashes.
- Quiz6. Run-length encoding.
- Quiz7. Implementing relational operators.
- Quiz8. Query optimization.
- Quiz9. Order of joins.
- Quiz10. Serializable schedules.
To think about: Short in-class quizzes