Marina Barsky

Marina Barsky has been teaching Computer Science for awhile. She likes designing original new courses and especially creating interactive stories to demonstrate different concepts. Here is for example a story about Knapsack 01, and here is a Turing Machine demo. She considers Computer Science training to be somewhat similar to the martial arts training: to gain the skill you need a lot of exercise in a relaxed, fully-concentrated mode.

Marina is fascinated with very large strings. She often stares at DNA collections trying to understand what they encode. It does not seem possible to detect any patterns with naked eye, so Marina has no choice but to invent new algorithms which can do the job. In this spirit, she created a new method for discovering all-against-all approximate matches, a new method for indexing genomes using an on-disk suffix tree, and a new scalable method for suffix sorting. In one of her experiments with terabytes of human DNA, she managed to detect presence and absence of certain substrings in raw genome data of different individuals. As you can see here – based on these substrings, our entire Sapiens species can be divided into several populations. If you have your own genome sequenced, you can use this tool to find out to which population you belong.

In addition, Marina likes using her extensive Machine Learning background to discover new interesting patterns from massive datasets. It seems that with some Math and Machine Learning algorithms we can explore the world around us without leaving our room! Recently she discovered that all the coniferous trees on campus can be identified with at most 4 questions, and that Brazil is very different from Italy: Brazilians never sing sad songs, while Italians like to sing about their hurt feelings. This last insight might help her to get more in tune with local populations (cry with Italians and dance with Brazilians) – whenever she finally decides to leave her comfortable chair and start exploring the real world.

Courses

Some course materials.

Machine Learning and Data Mining. Study of learning algorithms which make it possible for a machine to learn on its own without being explicitly instructed.
- Machine Learning 2022: link.
- Machine Learning 2020 (with Jupiter Notebooks): link.
- Machine Learning 2019: link.
Databases. Two-levels: using database systems (DBMS) for developing real-life applications, and in-depth coverage of algorithms and data structures for DBMS implementations.
- Introduction to Databases: link.
- Database System Technology: link.
Algorithms and Data Structures. Data structures in Java. Algorithm design techniques. Extensive sets of problem-solving exercises.
- Data Structures 2022 link.
- Algorithm Design and Analysis 2022 link.
- Algorithm Design and Analysis 2021 link.
- Algorithms and Data Structures: link.
Introduction to Computer Science. Basic concepts and programming paradigms.
- Introduction to CS in Python: link.
- Object-oriented programming with Java: link.
System Programming. Software techniques in a Unix-style environment using scripting languages and C. Inter-process communication.
- System Programming 2016 (140 students): link.
- System Programming 2018 (25 students): link.
Discrete Mathematics. Logic, proofs, sets, trees, graphs, discrete probability, combinatorics - all in context of Computer Science applications.
Web Programming. Browser games. Multi-player games with web sockets. Full-stack web development.
- Web Programming with NodeJS: link.

Projects

Algorithmics
- Suffix Rank: a new scalable algorithm for suffix sorting. VLDB paper. VLDB presentation.
- K-core decomposition of very large graphs. VLDB paper.
- Full-text (substring) indexes in external memory. Book.
- Substring Indexes for inputs larger than main memory. Journal paper.
- Suffix Trees for large genomic sequences. SPE paper.
- Online update of B-trees. CIKM paper.
- All-against-all Approximate Substring Matching. ACM JEA paper.
- Longest Common Subsequence for a set of strings. BIBE paper.
Data Science
- Flipping Correlations. VLDB paper. VLDB slides.
- Clustering Human Genomes. Web application.
- Parsimonous Plant Classification using Decision Trees. Project page.
- Modeling topics of pop-songs in different countries. Project page.