Part I. Database

Input: census 1994 data from https://archive.ics.uci.edu/ml/datasets/adult. Dataset description: 1.mp4

Script for creating census.sqlite database: script.

Online CSV to SQL converter: http://www.convertcsv.com/csv-to-sql.htm.
Replace '?' with null. Add BEGIN;END;

Education level update script: script.

Video: 2.mp4

Part II. Queries

  1. Average capital gain for each occupation group: link.
  2. Top 3 occupations according to avg capital gain: link.
  3. Number of people in each occupation group with capital gain above average in their own group: link.
  4. Totals and riches in each occupation group: link.
  5. Mode (most frequent) occupation for each education group: link.
  6. Mode education for each occupation group: link.
  7. How many people from each occupation are in each capital gain group: link.
  8. How many people from each education level are in each capital gain group: link.
  9. Distribution of capital gain groups per marital status: link.
  10. Native country and high salary: link.
  11. Top percentage of high salaries per education level: link.
  12. Working hours per occupation, working hours and high salary: link,link.

Video: 3.mp4

Part III. Analytics

  1. Most frequent demographic characteristics in the high-salry group: link.

    Video: 4.mp4

  2. Predicting salary using look-alike predictor: link.

    Video: 5.mp4

Part IV. Visualization

All queries in files: queries.zip

  1. Result of query 3: above-average capital gain per occupation. 3.csv. Excel visualization: 3.xlsx. Result: 3.pdf.
  2. Result of query 4: high salary per occupation. 4.csv. Excel visualization: 4.xlsx. Result: 4.pdf.
  3. Result of query 8: capital gain per education level. 8.csv. Excel visualization: 8.xlsx. Result: 8.pdf.
  4. Result of query 10: high salary per native country. 10.csv. Excel visualization: 10.xlsx. Result: 10.pdf.

For more analytic methods with SQL refer to this book.