Data analytics with SQL and Excel
Tutorial
In this tutorial we use our SQL skills to perform analytic tasks on a census dataset. We want to find out what demographic characteristics contribute to high earnings: age, education, marital status, native country?
You will need:
- Input files: input.zip
- SQLite installed
- Spreadsheet software (MS Excel, Open office Calc, or Google spreadsheets)
Part I. Database
Input: census 1994 data from https://archive.ics.uci.edu/ml/datasets/adult. Dataset description: 1.mp4
Script for creating census.sqlite database: script.
Online CSV to SQL converter:
http://www.convertcsv.com/csv-to-sql.htm.
Replace '?' with null. Add BEGIN;END;
Education level update script: script.
Video: 2.mp4
Part II. Queries
- Average capital gain for each occupation group: link.
- Top 3 occupations according to avg capital gain: link.
- Number of people in each occupation group with capital gain above average in their own group: link.
- Totals and riches in each occupation group: link.
- Mode (most frequent) occupation for each education group: link.
- Mode education for each occupation group: link.
- How many people from each occupation are in each capital gain group: link.
- How many people from each education level are in each capital gain group: link.
- Distribution of capital gain groups per marital status: link.
- Native country and high salary: link.
- Top percentage of high salaries per education level: link.
- Working hours per occupation, working hours and high salary: link,link.
Video: 3.mp4
Part III. Analytics
Part IV. Visualization
All queries in files: queries.zip
- Result of query 3: above-average capital gain per occupation. 3.csv. Excel visualization: 3.xlsx. Result: 3.pdf.
- Result of query 4: high salary per occupation. 4.csv. Excel visualization: 4.xlsx. Result: 4.pdf.
- Result of query 8: capital gain per education level. 8.csv. Excel visualization: 8.xlsx. Result: 8.pdf.
- Result of query 10: high salary per native country. 10.csv. Excel visualization: 10.xlsx. Result: 10.pdf.
For more analytic methods with SQL refer to this book.