Day 2 — 66 days of data

Yesterday I learned that the first step of mastering statistics is to master the art of exploring data.

Exploring Data Types

Categorical data places each observation into one of several groups; its values are labels rather than measurements. In the R programming language, categorical variables are called factors. To explore categorical data, you will generally use a bar chart or a pie chart. The distribution of a categorical variable is described by the count, frequency, or percentage of observations in each category.
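To make "counts, frequency, or percentage" concrete, here is a minimal Python sketch that tallies a list of labels. The survey answers and the `categorical_distribution` helper are hypothetical, made up for illustration; in R the counts would come from calling `table()` on a factor, and a bar chart simply draws these counts.

```python
from collections import Counter

def categorical_distribution(values):
    """Return {category: (count, relative frequency, percentage)} for a list of labels."""
    counts = Counter(values)
    total = len(values)
    return {
        category: (count, count / total, 100 * count / total)
        for category, count in counts.items()
    }

# Hypothetical survey answers, used only for illustration
colors = ["red", "blue", "red", "green", "blue", "red", "red", "green"]

for category, (count, freq, pct) in categorical_distribution(colors).items():
    print(f"{category:<6} count={count} frequency={freq:.3f} percent={pct:.1f}%")
```

The frequencies always add up to 1 and the percentages to 100, which is why either can stand in for raw counts when comparing groups of different sizes.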

For quantitative (numerical) data, you would use a histogram, a line chart (for values recorded over time), or a stem plot (the stem plot only when the data set is small).
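A stem plot writes out every single value, which is why it only suits small data sets. The sketch below builds one in Python under a simplifying assumption: the data are non-negative whole numbers, the stem is every digit except the last, and the leaf is the last digit. The `stem_plot` name and the sample values are hypothetical; R ships a built-in `stem()` function that handles more cases.

```python
from collections import defaultdict

def stem_plot(data):
    """Return stem plot lines for non-empty data of non-negative integers.

    Stem = all digits but the last, leaf = last digit (an assumed, simplified layout).
    """
    leaves = defaultdict(list)
    for value in sorted(data):
        stem, leaf = divmod(value, 10)
        leaves[stem].append(leaf)
    lines = []
    # Walk every stem between the smallest and largest so gaps in the data stay visible
    for stem in range(min(leaves), max(leaves) + 1):
        leaf_str = "".join(str(leaf) for leaf in leaves.get(stem, []))
        lines.append(f"{stem:>3} | {leaf_str}")
    return lines

# Hypothetical small data set
for line in stem_plot([12, 15, 21, 23, 23, 28, 34, 47, 49]):
    print(line)
```

Read sideways, the leaves form the same bars a histogram with bins of width 10 would draw, but without losing the individual values.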

Exploratory Data Analysis (EDA) workflow

  • Study each individual variable
  • Study the relationships between two variables
  • Create graphs of the distribution of variables
  • Finally, add numerical summaries of specific aspects of the data

To describe the distribution of a variable, look at four things: its shape, center, spread, and outliers.

Measures of center: mean and median
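The two measures of center can disagree, and seeing why helps. This Python sketch uses the standard `statistics` module on a hypothetical data set with one unusually large value: the mean (9) is pulled well above the median (4), because the mean uses every value while the median only depends on the middle of the sorted list.

```python
import statistics

# Hypothetical data: six typical values and one unusually large value
data = [2, 3, 3, 4, 5, 6, 40]

mean = statistics.mean(data)      # sum of values / number of values = 63 / 7
median = statistics.median(data)  # middle value of the sorted list

print(f"mean = {mean}, median = {median}")
```

This is why the median is often preferred as a measure of center when the data contain outliers or are strongly skewed.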

Measures of spread: quartiles and the interquartile range (IQR)
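As a rough illustration, the Python sketch below finds the first quartile (Q1), the median, and the third quartile (Q3) using one common textbook convention: Q1 is the median of the lower half of the sorted data and Q3 the median of the upper half, leaving the overall median out of both halves when the count is odd. The IQR is then Q3 minus Q1. The `quartiles` helper and the data are hypothetical; Python's `statistics.quantiles` and R's `quantile()` use interpolation rules that may give slightly different values.

```python
import statistics

def quartiles(data):
    """Return (Q1, median, Q3) using the 'median of each half' convention.

    Assumes at least two observations. For an odd count, the overall
    median is excluded from both halves.
    """
    values = sorted(data)
    n = len(values)
    half = n // 2
    lower = values[:half]
    upper = values[half + 1:] if n % 2 else values[half:]
    return statistics.median(lower), statistics.median(values), statistics.median(upper)

# Hypothetical data set with nine observations
data = [1, 3, 4, 6, 7, 8, 9, 12, 15]
q1, q2, q3 = quartiles(data)
iqr = q3 - q1  # spread of the middle half of the data

print(f"Q1={q1}, median={q2}, Q3={q3}, IQR={iqr}")
```

Because the IQR only looks at the middle half of the data, a single extreme value barely moves it.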

Together, center and spread give a useful summary of the distribution of the data.

Shape: the data may be roughly symmetric and bell-shaped, like a normal distribution, or skewed to the right or left.
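One quick, informal way to check shape from numbers alone is to compare the mean with the median: a long right tail tends to pull the mean above the median, and a long left tail pulls it below. The Python sketch below applies that rule of thumb. The `describe_skew` name and the 0.1 threshold (a fraction of the standard deviation below which the gap counts as "roughly symmetric") are arbitrary assumptions for illustration, and the rule can mislead, so a histogram is still the better check.

```python
import statistics

def describe_skew(data):
    """Rough shape label from comparing mean and median (rule of thumb only)."""
    mean = statistics.mean(data)
    median = statistics.median(data)
    sd = statistics.stdev(data)
    # Hypothetical cutoff: a gap under 0.1 standard deviations counts as symmetric
    if abs(mean - median) < 0.1 * sd:
        return "roughly symmetric"
    return "skewed right" if mean > median else "skewed left"

print(describe_skew([1, 2, 2, 3, 3, 3, 4, 4, 5]))
print(describe_skew([1, 2, 2, 3, 3, 4, 20]))
print(describe_skew([1, 17, 18, 18, 19, 19, 20]))
```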

Tools for finding outliers

Standard deviation: measures how far the data points typically lie from the mean. A value that sits several standard deviations away from the mean is a possible outlier.
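The sketch below uses the standard deviation to flag possible outliers in Python: any value more than `threshold` standard deviations from the mean is reported. The `flag_outliers` helper and the default cutoff of 2 are assumptions for illustration; the right cutoff is a judgment call, and the data are hypothetical.

```python
import statistics

def flag_outliers(data, threshold=2.0):
    """Return values lying more than `threshold` standard deviations from the mean."""
    mean = statistics.mean(data)
    # Sample standard deviation: square root of sum((x - mean)^2) / (n - 1)
    sd = statistics.stdev(data)
    return [x for x in data if abs(x - mean) > threshold * sd]

# Hypothetical data with one unusually large value
data = [2, 3, 3, 4, 5, 6, 40]
print(flag_outliers(data))                 # cutoff of 2 standard deviations
print(flag_outliers(data, threshold=3.0))  # a stricter cutoff
```

One caveat: an extreme value inflates the standard deviation it is being measured against, so with a strict cutoff like 3 the value 40 is no longer flagged here. That is one reason to look at a graph as well as the numbers.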

Full Stack Developer | Aspiring Data Scientist | Northwestern Coding Bootcamp Student | Udacity Scholar | Foodie