Types of data are an important concept in statistics: you need to understand them to apply statistical measures to data correctly. This becomes even more important when doing exploratory data analysis (EDA) on a dataset. Broadly, there are two types of data, quantitative and categorical, which are further broken down into discrete or continuous and ordinal or nominal, respectively.
For EDA, however, we often convert categorical data into discrete numeric data. In Python, for example, we use label encoding or one-hot encoding for this purpose.
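To make the two encodings concrete, here is a minimal sketch in plain Python. It assumes a small, hypothetical economic-status column, and the helper names `label_encode` and `one_hot_encode` are illustrative, not from any library. In practice, scikit-learn's `LabelEncoder` and `OneHotEncoder`, or pandas' `get_dummies`, do the same job.

```python
# Minimal, stdlib-only sketch of label encoding and one-hot encoding.
# The helper names and the column values are hypothetical, for illustration only.

def label_encode(values):
    """Map each distinct category to an integer code (categories sorted alphabetically)."""
    categories = sorted(set(values))
    mapping = {cat: code for code, cat in enumerate(categories)}
    return [mapping[v] for v in values], mapping


def one_hot_encode(values):
    """Turn each value into a 0/1 vector with one position per category."""
    categories = sorted(set(values))
    vectors = [[1 if v == cat else 0 for cat in categories] for v in values]
    return vectors, categories


status = ["low", "high", "middle", "low"]  # hypothetical economic-status column

codes, mapping = label_encode(status)
print(mapping)  # {'high': 0, 'low': 1, 'middle': 2}
print(codes)    # [1, 0, 2, 1]

vectors, columns = one_hot_encode(status)
print(columns)  # ['high', 'low', 'middle']
print(vectors)  # [[0, 1, 0], [1, 0, 0], [0, 0, 1], [0, 1, 0]]
```

Note that the label codes follow alphabetical order (high = 0, low = 1, middle = 2), not the natural order of the categories, which is one reason label codes should not be treated as quantities.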
Let's dive into the topic; it is an exciting one.
Table of Contents
- Quantitative Data
- Categorical Data
1. Quantitative Data
Quantitative data are numerical data that you can add, subtract, multiply, and divide, for example age, blood pressure, BMI, and pulse rate. Quantitative data are further divided into two types: continuous and discrete.
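Because quantitative values support arithmetic, operations such as differences and averages are meaningful. A minimal sketch, assuming hypothetical pulse-rate readings taken before and after some activity:

```python
# Sketch: arithmetic on quantitative data is meaningful.
# The pulse-rate readings below are hypothetical values, not real data.
from statistics import mean

pulse_before = [72, 80, 68, 90]  # beats per minute, hypothetical
pulse_after = [76, 85, 70, 99]

# Subtraction: change in pulse rate for each person
changes = [after - before for before, after in zip(pulse_before, pulse_after)]
print(changes)             # [4, 5, 2, 9]

# Addition and division: the average pulse rate before the activity
print(mean(pulse_before))  # 77.5
```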
A. Continuous Data
Data are continuous if the measurements can take any value within some range, e.g. weight, height, or the time taken to complete a homework assignment.
In the table below, ‘DiabetesPedigreeFunction’ is a continuous variable.
“Generally, continuous data come from measurements.”
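Since continuous measurements can take any value within a range, a common first step in EDA is to group them into equal-width intervals, as a histogram does. The sketch below assumes hypothetical heights in centimetres; `bin_counts` is an illustrative helper, and libraries such as NumPy (`numpy.histogram`) provide the same idea ready-made.

```python
# Sketch: group continuous measurements into equal-width intervals (histogram bins).
# bin_counts is a hypothetical helper; the heights are hypothetical values in cm.

def bin_counts(values, low, high, n_bins):
    """Count values in each of n_bins equal-width intervals covering [low, high)."""
    width = (high - low) / n_bins
    counts = [0] * n_bins
    for v in values:
        if low <= v < high:  # values outside the range are ignored
            # min() guards against float rounding pushing a value past the last bin
            index = min(int((v - low) // width), n_bins - 1)
            counts[index] += 1
    return counts


heights = [151.2, 158.7, 163.05, 167.9, 172.4, 176.35, 181.0]
print(bin_counts(heights, 150, 190, 4))  # [2, 2, 2, 1]
```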
B. Discrete Data
Data are said to be discrete if the measurements are integers or can take only certain values, e.g. the number of students in a school or the number of cars entering a city. Other examples:
– number of students late for class
– number of crimes reported to the police
“Generally, discrete data are counts.”
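See the columns No. of Pregnancies, Glucose amount, Skin Thickness, and Insulin, which hold whole-number values. Strictly, only No. of Pregnancies is a count and therefore discrete; glucose, skin thickness, and insulin are measurements recorded as whole numbers, so by the definitions above they are better treated as continuous.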
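Discrete data take only certain values, so counting how often each value occurs gives a natural summary. A sketch with hypothetical pregnancy counts, using `collections.Counter` from the standard library:

```python
# Sketch: a frequency table for a discrete (count) variable.
# The pregnancy counts below are hypothetical, not taken from any dataset.
from collections import Counter

pregnancies = [0, 1, 1, 2, 0, 3, 1, 2, 0, 1]
freq = Counter(pregnancies)

# Print each distinct value with how many times it occurs
for value in sorted(freq):
    print(value, freq[value])
# 0 3
# 1 4
# 2 2
# 3 1
```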
2. Categorical Data
This type of data arises when observations (rows of data) fall into separate, distinct categories. For example, a student's exam result can be pass or fail, and economic status can be low, middle, or high.
Categorical data can be further divided into the following types.
A. Binary Data
Binary data has exactly two categories, for example yes/no, disease/no disease, win/lose, heads/tails, and dead/alive.
– smoker, non-smoker
– lowerclassman, upperclassman
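A binary variable is often coded as 0/1, after which the mean of the codes is simply the proportion of observations in the category coded 1. A sketch assuming hypothetical smoker responses:

```python
# Sketch: code a binary variable as 0/1 and read the proportion off the mean.
# The responses below are hypothetical.

smoker = ["yes", "no", "no", "yes", "no"]
coded = [1 if s == "yes" else 0 for s in smoker]  # "yes" -> 1, anything else -> 0

print(coded)                    # [1, 0, 0, 1, 0]
print(sum(coded) / len(coded))  # 0.4, i.e. 40% smokers
```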
B. Nominal Data
Nominal data consists of unordered categories, such as blood type (O, A, B, AB), marital status, or occupation (engineer, doctor, teacher).
The diagram above is from the Zurich dogs dataset on Kaggle. Here the nominal categories are tricolor, Schwarz (black), braun (brown), and brindle.
– blonde, brown, red, black, etc.
– Caucasian, African, etc.
– smoker, non-smoker
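Because nominal categories have no order, the mean and median do not apply; the mode (most frequent category) is the usual summary. A sketch with hypothetical blood-type observations, using `statistics.multimode` (Python 3.8+), which returns every category tied for most frequent:

```python
# Sketch: summarise nominal data with the mode, since there is no order to average over.
# The blood types below are hypothetical observations.
from statistics import multimode

blood_types = ["O", "A", "O", "B", "AB", "A", "O"]
print(multimode(blood_types))  # ['O']
```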
C. Ordinal Data
Ordinal data consists of ordered categories, such as breast cancer stage (I, II, III, or IV), grades (A, B, C, D, F), ratings on a Likert scale (e.g. strongly agree, agree, neutral, disagree, strongly disagree), and age groups (10-20, 20-30, etc.).
– freshman, sophomore, junior, senior, super senior
Degree of illness
– none, mild, moderate, severe
Opinion of students about riots
– ticked off, neutral, happy
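For ordinal data, the order must be written out explicitly, because alphabetical sorting would scramble it (for example, 'mild' < 'moderate' < 'none' < 'severe'). The sketch below encodes the degree-of-illness categories above by rank and takes a median; the observations are hypothetical, and scikit-learn's `OrdinalEncoder` accepts a similar explicit `categories` list.

```python
# Sketch: encode ordinal categories by an explicit rank, then take a median.
# The observations below are hypothetical.
from statistics import median_low

# Order written out by hand; alphabetical sorting would scramble it.
ILLNESS_ORDER = ["none", "mild", "moderate", "severe"]
rank = {level: i for i, level in enumerate(ILLNESS_ORDER)}

observations = ["mild", "severe", "none", "moderate", "mild"]
codes = [rank[o] for o in observations]
print(codes)  # [1, 3, 0, 2, 1]

# median_low always returns an actual data value, so it maps back to a category
print(ILLNESS_ORDER[median_low(codes)])  # mild
```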