Example Data Files
CitiBike Example Data
CitiBike Data Website
Historical Weather Data
John Tukey Prim-9
AI and Machine Learning
Amber Thomas, Data Journalist, Storyteller and Programmer at The Pudding and Data.World. Watch videos below.
Today we will be talking about a necessary evil of working with data: cleaning it. Unfortunately, if you rely on someone else to collect the data for you, it is usually not in the format you need, and some data points may be incorrect. Once you have scrubbed your data, you can move on to analysis. We will talk about many tools you can use to analyze your data, but ultimately you should use whichever you are most comfortable with. During the analysis process we will be visualizing the data for insight only; there is no need to worry about design at this stage.
We will also briefly talk about how large datasets are used to build artificial intelligence with machine learning and deep learning techniques.
Find a dataset online, clean it up, and put it in a format you can work with. Feel free to use the data you found during our last class.
Once the data is clean, use either Google Sheets or Excel to start to play with the values, visualize for insight and pull out interesting points.
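If you would rather script the cleanup than do it by hand in a spreadsheet, the sketch below shows one possible approach using only the Python standard library. The sample rows, column names, and the `clean_rows` helper are made up for illustration and are not taken from the CitiBike files; adapt the rules to whatever problems your own dataset has.

```python
import csv
import io

# Made-up sample that imitates a messy export: stray spaces, a duplicate
# row, a text placeholder in a numeric column, and a missing value.
RAW = """station, trips ,avg_temp_f
 Central Park ,120,68.5
Central Park,120,68.5
Union Square,n/a,70.1
Times Square,95,
Battery Park, 88 ,66.0
"""


def clean_rows(text):
    """Return rows with trimmed text, numeric values, and no duplicates."""
    reader = csv.DictReader(io.StringIO(text))
    # Header cells may carry stray spaces, so normalize them first.
    reader.fieldnames = [name.strip() for name in reader.fieldnames]

    cleaned = []
    seen = set()
    for row in reader:
        try:
            record = {
                "station": row["station"].strip(),
                "trips": int(row["trips"].strip()),
                "avg_temp_f": float(row["avg_temp_f"].strip()),
            }
        except (ValueError, AttributeError):
            # Skip rows whose numeric cells are missing or not numbers.
            continue
        key = tuple(record.values())
        if key in seen:
            # Skip exact duplicates of a row we already kept.
            continue
        seen.add(key)
        cleaned.append(record)
    return cleaned


if __name__ == "__main__":
    for record in clean_rows(RAW):
        print(record)
```

Dropping bad rows is only one choice. Depending on your question, you might instead want to flag them or fix them by hand, so it is worth counting how many rows were discarded before moving on to analysis.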
Reading: Pages 111-146
Decide on a topic to research for your midterm. Find at least two datasets based on that topic, analyze them, and visualize them for insight. Be prepared to discuss your findings next week.
Example: If you are interested in climate change legislation, you can look at data on its causes and effects (e.g. temperature data, extreme weather, carbon emissions, methane emissions, extinction rates, government environmental policies) and see whether any of them are correlated.
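To check whether two columns move together, one common measure is the Pearson correlation coefficient, which ranges from -1 (perfectly opposite) to 1 (perfectly aligned). In Google Sheets or Excel the `CORREL` function computes it; the sketch below does the same calculation in plain Python. The `pearson` helper and the two short series are placeholders for illustration only, not real measurements.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of the same length, at least 2 values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations from the mean (unnormalized covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined when a series is constant")
    return cov / (spread_x * spread_y)


if __name__ == "__main__":
    # Placeholder values, NOT real data: swap in columns from your datasets.
    series_a = [1.0, 2.0, 3.0, 4.0, 5.0]
    series_b = [2.1, 3.9, 6.2, 8.1, 9.8]
    print(round(pearson(series_a, series_b), 3))
```

Keep in mind that a strong correlation does not show that one variable causes the other; treat it as a prompt for further investigation.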
Some websites to get data from are:
US Census Data
Watch the video of the Week 4 lecture and come with questions next week.