Journal #1 – The Importance of EDA
Due: Mar 21, 2019 at 11:59 PM

O’Neil and Schutt strongly advocate using exploratory data analysis (EDA) to learn from your data before building any model that will be used to answer questions or predict outcomes. What can you do to ensure that your own processes include this step when working as a data scientist or analyst? What are the consequences of skipping EDA and jumping straight into model building?

Chapter 7, "Extracting Meaning from Data," ends with a brief section on privacy. How do you think privacy concerns will shape data collection methods in the future? Would you raise a flag of concern (be a whistleblower) if you thought your organization was using data inappropriately?

Assignment 2 Warm-Up Exercise

This assignment has two parts.

The first part, worth 50 points, uses the HERD data file and requires that you compute basic descriptive statistics, create some boxplots, and perform some comparisons of expenditures for Texas schools only.
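
As a rough illustration of the first part, the sketch below filters records to Texas and computes the five-number summary that a boxplot displays, along with the mean and standard deviation. The column names (`institution`, `state`, `expenditures`) and all values are placeholders invented for this example; they are not taken from the HERD file, whose actual layout may differ.

```python
import statistics

# Placeholder rows: column names and values are invented for illustration
# and do not come from the HERD data file.
rows = [
    {"institution": "School A", "state": "TX", "expenditures": 100.0},
    {"institution": "School B", "state": "TX", "expenditures": 200.0},
    {"institution": "School C", "state": "CA", "expenditures": 900.0},
    {"institution": "School D", "state": "TX", "expenditures": 300.0},
    {"institution": "School E", "state": "TX", "expenditures": 400.0},
    {"institution": "School F", "state": "NY", "expenditures": 50.0},
    {"institution": "School G", "state": "TX", "expenditures": 500.0},
]


def five_number_summary(values):
    """Return min, Q1, median, Q3, and max: the values a boxplot is built from."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "min": min(values),
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": max(values),
    }


# Keep Texas schools only, then describe their expenditures.
texas = [row["expenditures"] for row in rows if row["state"] == "TX"]
summary = five_number_summary(texas)
mean = statistics.mean(texas)
stdev = statistics.stdev(texas)  # sample standard deviation (n - 1 denominator)

print(summary, mean, round(stdev, 2))
```

With the real file, the same filter-then-summarize pattern would apply once the rows are loaded (for example with `csv.DictReader`) and the actual column names are substituted.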

The second part uses the breast cancer data file and requires that you examine the variables to determine whether any can be used to help identify breast cancer. Boxplots and summary statistics help you determine whether any variables show statistically significant differences in measurements between benign and malignant cells.
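
One possible way to back up what the boxplots suggest is to compute a test statistic for each variable. The sketch below uses Welch's t statistic, which is an assumption on my part: the assignment does not name a specific test. The measurements are placeholder values, not rows from the breast cancer file.

```python
import math
import statistics


def welch_t(group_a, group_b):
    """Welch's t statistic for two independent samples with unequal variances."""
    mean_a, mean_b = statistics.mean(group_a), statistics.mean(group_b)
    var_a, var_b = statistics.variance(group_a), statistics.variance(group_b)
    standard_error = math.sqrt(var_a / len(group_a) + var_b / len(group_b))
    return (mean_a - mean_b) / standard_error


# Placeholder measurements for a single variable, split by diagnosis.
# These values are invented for illustration only.
benign = [1.0, 2.0, 3.0, 4.0, 5.0]
malignant = [6.0, 7.0, 8.0, 9.0, 10.0]

t_stat = welch_t(benign, malignant)
print(t_stat)
```

A large absolute value of the statistic points to a difference worth investigating. Turning it into a p-value requires the t distribution, which the standard library does not provide; if SciPy is available, `scipy.stats.ttest_ind(benign, malignant, equal_var=False)` reports both the statistic and the p-value.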