Marina Barsky
Marina Barsky has been teaching Computer Science for awhile. She likes designing original new courses and especially creating interactive stories to demonstrate different concepts. Here is for example a story about Knapsack 01, and here is a Turing Machine demo. She considers Computer Science training to be somewhat similar to the martial arts training: to gain the skill you need a lot of exercise in a relaxed, fully-concentrated mode.
Marina is fascinated with very large strings. She often stares at DNA collections trying to understand what they encode. It does not seem possible to detect any patterns with naked eye, so Marina has no choice but to invent new algorithms which can do the job. In this spirit, she created a new method for discovering all-against-all approximate matches, a new method for indexing genomes using an on-disk suffix tree, and a new scalable method for suffix sorting. In one of her experiments with terabytes of human DNA, she managed to detect presence and absence of certain substrings in raw genome data of different individuals. As you can see here – based on these substrings, our entire Sapiens species can be divided into several populations. If you have your own genome sequenced, you can use this tool to find out to which population you belong.
In addition, Marina likes using her extensive Machine Learning background to discover new interesting patterns from massive datasets. It seems that with some Math and Machine Learning algorithms we can explore the world around us without leaving our room! Recently she discovered that all the coniferous trees on campus can be identified with at most 4 questions, and that Brazil is very different from Italy: Brazilians never sing sad songs, while Italians like to sing about their hurt feelings. This last insight might help her to get more in tune with local populations (cry with Italians and dance with Brazilians) – whenever she finally decides to leave her comfortable chair and start exploring the real world.
Courses
Some course materials.
-
Machine Learning and Data Mining. Study of learning algorithms which make it possible for a machine to learn on its own without being explicitly instructed.
-
Databases. Two-levels: using database systems (DBMS) for developing real-life applications, and in-depth coverage of algorithms and data structures for DBMS implementations.
-
Algorithms and Data Structures. Data structures in Java. Algorithm design techniques. Extensive sets of problem-solving exercises.
-
Introduction to Computer Science. Basic concepts and programming paradigms.
-
System Programming. Software techniques in a Unix-style environment using scripting languages and C. Inter-process communication.
-
Discrete Mathematics. Logic, proofs, sets, trees, graphs, discrete probability, combinatorics - all in context of Computer Science applications.
-
Web Programming. Browser games. Multi-player games with web sockets. Full-stack web development.
- Web Programming with NodeJS: link.
Projects
Algorithmics
- Suffix Rank: a new scalable algorithm for suffix sorting. VLDB paper. VLDB presentation.
- K-core decomposition of very large graphs. VLDB paper.
- Full-text (substring) indexes in external memory. Book.
- Substring Indexes for inputs larger than main memory. Journal paper.
- Suffix Trees for large genomic sequences. SPE paper.
- Online update of B-trees. CIKM paper.
- All-against-all Approximate Substring Matching. ACM JEA paper.
- Longest Common Subsequence for a set of strings. BIBE paper.
Data Science
- Flipping Correlations. VLDB paper. VLDB slides.
- Clustering Human Genomes. Web application.
- Parsimonous Plant Classification using Decision Trees. Project page.
- Modeling topics of pop-songs in different countries. Project page.