Week 3: Scrubbing and Analysis

Example Data Files
Dirty Data
CitiBike Example Data
CitiBike Data Website
Historical Weather Data

Additional Videos
John Tukey Prim-9
AI and Machine Learning

Tools
OpenRefine
R
R Studio
Python Pandas
ML5js
Ghostery

Guest Speaker
Amber Thomas, Data Journalist, Storyteller and Programmer at The Pudding and Data.World. Watch videos below.

Fall 2021
Spring 2021

Today we will be talking about a necessary evil of working with data: cleaning it. Unfortunately, if you rely on someone else to collect the data for you, it's usually not in the format you need, and some data points may be incorrect. Once you have scrubbed your data, you can go on to analysis. We will talk about many tools you can use to analyze your data, but ultimately you should use what you are most comfortable with. During this analysis process we will be visualizing the data for insight only, so there is no need to worry about design at this stage.
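
To make the scrubbing step concrete, here is a minimal sketch in plain Python (standard library only). The sample data, the column names `station` and `trips`, and the helper `clean_rows` are all made up for illustration; they do not come from the CitiBike files. The sketch handles three common problems: stray whitespace and inconsistent capitalization, values that are not numbers, and duplicate rows.

```python
import csv
import io

# Hypothetical "dirty" data, invented for this example:
# stray spaces, inconsistent case, a non-numeric value, and a duplicate.
RAW = """station,trips
 Central Park ,120
central park,120
Broadway, n/a
Times Square,95
"""

def clean_rows(text):
    """Return a list of cleaned records from CSV text."""
    reader = csv.DictReader(io.StringIO(text))
    seen = set()
    cleaned = []
    for row in reader:
        # Normalize whitespace and capitalization in the text column.
        station = row["station"].strip().title()
        trips_raw = row["trips"].strip()
        # Skip rows whose count is not a plain whole number.
        if not trips_raw.isdigit():
            continue
        record = (station, int(trips_raw))
        # Skip exact duplicates (after normalization).
        if record in seen:
            continue
        seen.add(record)
        cleaned.append({"station": record[0], "trips": record[1]})
    return cleaned

rows = clean_rows(RAW)
for r in rows:
    print(r["station"], r["trips"])
```

Dropping the bad rows is only one option. Depending on your question, you might instead flag them, fix them by hand, or go back to the source. The same steps can be done in OpenRefine, R, Pandas, or a spreadsheet.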

We will also briefly talk about how large datasets can be used for creating artificial intelligence using machine learning and deep learning techniques.

Lecture Slides

Homework

Find a dataset online, clean it up, and put it in a format you can work with. Feel free to use the data you found during our last class.

Once the data is clean, use either Google Sheets or Excel to start to play with the values, visualize for insight and pull out interesting points.

Reading: Chapters “Visualizing for the Mind”, pages 111-128, and “Images in the Head”, pages 133-146

Decide on a topic to research for your midterm. Find at least two datasets based on that topic, analyze them, and visualize them for insight. Be prepared to discuss your findings next week.

Example: If you are interested in climate change legislation, you can look at data around its causes and effects (e.g., temperature data, extreme weather, carbon emissions, methane emissions, extinction rates, government environmental policies) and see if there is any correlation.
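
If you want to check for correlation with a number as well as a chart, one common measure is the Pearson correlation coefficient. It ranges from -1 to 1, where values near 0 suggest no linear relationship. Below is a minimal sketch in plain Python. The function name `pearson` and the two lists of values are invented for illustration and are not real measurements.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of the same length (at least 2 values)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations from the mean.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one list is constant")
    return cov / math.sqrt(var_x * var_y)

# Made-up numbers for illustration only, not real measurements.
emissions = [1.0, 2.0, 3.0, 4.0, 5.0]
temperature = [2.1, 3.9, 6.2, 8.0, 9.8]

r = pearson(emissions, temperature)
print(round(r, 3))
```

Keep in mind that a strong correlation does not show that one thing causes the other. Treat it as a lead worth investigating, not a conclusion.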

Some websites to get data from are:
NYC Data
US Census Data
UN Data

Watch the video of the Week 4 lecture and come with questions next week.