For the 66 days of data science challenge, I am going back to the basics and relearning statistics. I am using the textbook Introduction to the Practice of Statistics, Sixth Edition, by Moore, McCabe, and Craig. Here is what I have learned so far.
Data are numbers with context. Before you do any statistical calculation or create a data visualization, get into the habit of forming a question: “What does the data tell me?”
The starting point of any statistical analysis is to master the art of examining data.
| Person | Age | Weight |
| --- | --- | --- |
| Buttercup | 24 | 110 |
| Bubbles | 24 | 105 |
| Blossom | 24 | 107 |
This is a table that contains data.
The individuals described by the data are also known as cases, observations, or rows. If you are into programming, you can think of each row as an object. Each object is like a noun: it describes a person, place, or thing. Objects have characteristics called variables.
Variables are also known as columns.
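To make the rows-as-objects idea concrete, here is a minimal sketch in plain Python. It assumes the table is stored as a list of dictionaries, which is only one of several reasonable ways to hold it. The names `individuals`, `variables`, and `weights` are my own, and the table does not give a unit for Weight, so the code does not assume one.

```python
# Each dict is one individual (a row); each key is a variable (a column).
# Values are copied from the table; no unit is given for Weight.
individuals = [
    {"Person": "Buttercup", "Age": 24, "Weight": 110},
    {"Person": "Bubbles", "Age": 24, "Weight": 105},
    {"Person": "Blossom", "Age": 24, "Weight": 107},
]

# The variables are the keys shared by every individual.
variables = list(individuals[0].keys())

# Reading one column means collecting the same variable from every row.
weights = [row["Weight"] for row in individuals]

print(variables)
print(weights)
```

Looking across a row tells you everything about one individual. Looking down a column tells you how one variable changes from individual to individual.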
When you plan to do an exploratory data analysis (EDA), ask yourself the following questions.
- Why? Is there a specific question that I want to answer by looking at this data? What is the purpose of this data?
- Who? What population does this data describe?
- What? How many columns does this data set have? How are these variables defined?
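The "What?" question is the one you can partly answer with code. Here is a rough sketch, again assuming the table is a list of dictionaries. The helper `describe_table` is a hypothetical function I wrote for illustration, not something from the textbook or a library. It counts the individuals and variables and reports which Python type each variable holds. The "Why?" and "Who?" questions still need a human who knows where the data came from.

```python
def describe_table(rows):
    """Return basic facts about a table stored as a list of dicts."""
    columns = list(rows[0].keys()) if rows else []
    return {
        "n_individuals": len(rows),
        "n_variables": len(columns),
        # Collect the Python type names seen in each column.
        "variable_types": {
            col: sorted({type(row[col]).__name__ for row in rows})
            for col in columns
        },
    }


# Same values as the table earlier in the post.
rows = [
    {"Person": "Buttercup", "Age": 24, "Weight": 110},
    {"Person": "Bubbles", "Age": 24, "Weight": 105},
    {"Person": "Blossom", "Age": 24, "Weight": 107},
]

summary = describe_table(rows)
print(summary)
```

A type such as `int` only tells you how a value is stored. How the variable is defined (for example, what unit Weight is measured in) has to come from the data's documentation.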