Types of Data in Statistics

Types of data are an important concept in statistics: you need to understand them to apply statistical measures to your data correctly. This matters most when doing exploratory data analysis (EDA) on a data set. Broadly, there are two types, quantitative and categorical. Quantitative data is further divided into discrete and continuous, and categorical data into ordinal and nominal.

 

For EDA, however, we often convert categorical data into discrete numeric data. In Python, for example, label encoding or one-hot encoding is commonly used for this purpose.
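To make the two encodings concrete, here is a minimal sketch in plain Python. The `blood_types` column is made up for illustration and is not taken from any data set in this post. In practice you would more likely reach for a library such as pandas (`get_dummies`) or scikit-learn (`LabelEncoder`, `OneHotEncoder`), but the idea is the same.

```python
# Hypothetical column of categorical values (made up for illustration).
blood_types = ["A", "O", "B", "AB", "O", "A"]

# Label encoding: map each distinct category to an integer code.
# Sorting the categories keeps the mapping deterministic.
categories = sorted(set(blood_types))
label_map = {cat: code for code, cat in enumerate(categories)}
label_encoded = [label_map[value] for value in blood_types]

# One-hot encoding: one 0/1 indicator per category for every row.
one_hot_encoded = [
    [1 if value == cat else 0 for cat in categories]
    for value in blood_types
]

print(label_map)
print(label_encoded)
print(one_hot_encoded)
```

Note that label encoding assigns integer codes in an arbitrary order, which a model may misread as a ranking. For unordered categories, one-hot encoding is usually the safer choice.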

 

Let's dive deep into the topic. It is surely an exciting one.

 

 

Table of Contents

  1. Quantitative Data
  2. Categorical Data

 

 

1. Quantitative Data

Quantitative data is numerical data that you can add, subtract, multiply, and divide. Examples include age, blood pressure, BMI, and pulse rate. Quantitative data is further divided into two types: continuous and discrete.

A. Continuous Data

Data is continuous if the measurements can take on any value within some range, e.g. weight or height.

See the table below, in which ‘DiabetesPedigreeFunction’ is a continuous variable.

More Examples:

Cholesterol level

Age

Time to complete a homework assignment, etc.

“Generally, continuous data come from measurements.”

 

B. Discrete Data

Data is said to be discrete if the measurements are integers or can only take certain values, e.g. the number of students in a school or the number of cars entering a city.

More Examples:

SAT scores

Number of students late for class

Number of crimes reported to police, etc.

“Generally, discrete data are counts.”

Columns such as No. of Pregnancies, Glucose amount, Skin Thickness, and Insulin are examples of discrete data.
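When you meet a new numeric column during EDA, a quick check can hint at whether it is discrete or continuous. The sketch below is only a rough heuristic, not a standard method: the function name `guess_numeric_type` and the sample values are made up for illustration.

```python
def guess_numeric_type(values):
    """Rough heuristic: call a column discrete if every value is a whole number."""
    all_whole = all(float(v).is_integer() for v in values)
    return "discrete" if all_whole else "continuous"


# Made-up sample values, named after columns mentioned above.
pregnancies = [0, 2, 5, 1, 3]
pedigree = [0.25, 0.61, 1.12, 0.48, 0.33]

print(guess_numeric_type(pregnancies))
print(guess_numeric_type(pedigree))
```

Keep in mind that a continuous quantity recorded in rounded form, such as age in whole years, will look discrete to this check, so the result still needs your judgment.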

 

2. Categorical Data

This type of data arises when observations (rows of data) fall into separate, distinct categories. For example, a student's exam result can be pass or fail, and economic status can be low, middle, or high.

 

Categorical data can be further divided into the following types.

A. Binary Data

Binary data has exactly two categories, for example yes/no, disease/no disease, win/lose, heads/tails, and dead/alive.

More Examples:

Smoking status: smoker, non-smoker

Attendance: present, absent

Class: lowerclassman, upperclassman

etc.
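Binary data is the easiest kind to turn into numbers: one category becomes 1 and the other becomes 0. Here is a small sketch with a made-up `smoking_status` column. Which category gets the 1 is an arbitrary choice.

```python
# Hypothetical smoking-status column (made up for illustration).
smoking_status = ["smoker", "non-smoker", "non-smoker", "smoker", "non-smoker"]

# Map the category of interest to 1 and the other category to 0.
is_smoker = [1 if status == "smoker" else 0 for status in smoking_status]

# The mean of a 0/1 indicator is the proportion of rows in the "1" category.
proportion_smokers = sum(is_smoker) / len(is_smoker)

print(is_smoker)
print(proportion_smokers)
```

A handy side effect of 0/1 coding is that the mean of the column is directly interpretable as a proportion.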

 

 

B. Nominal Data

Nominal data consists of unordered categories, such as blood type (O, A, B, AB), marital status, or occupation (engineer, doctor, or teacher).

The diagram above shows the Zurich dogs data set on Kaggle. Here the nominal categories are tricolor, Schwarz, braun, and brindle.

More Examples:

Hair color: blonde, brown, red, black, etc.

Race: Caucasian, African, etc.

Smoking status: smoker, non-smoker
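Because nominal categories have no order, measures such as the mean or median do not apply to them. Frequency counts and the mode (the most common category) are the usual summaries. A minimal sketch, using made-up hair color observations:

```python
from collections import Counter

# Hypothetical hair color observations (made up for illustration).
hair_colors = ["brown", "black", "blonde", "brown", "red", "brown", "black"]

# Count how often each category occurs.
counts = Counter(hair_colors)

# The mode is the category with the highest count.
mode_color, mode_count = counts.most_common(1)[0]

print(counts)
print(mode_color, mode_count)
```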

C. Ordinal Data

Ordinal data contains ordered categories, such as breast cancer staging (I, II, III, or IV), grades (A, B, C, D, F), ratings on a Likert scale (e.g., strongly agree, agree, neutral, disagree, strongly disagree), and age in categories (10-20, 20-30, etc.).

 

More Examples:

Class: freshman, sophomore, junior, senior, super senior

Degree of illness: none, mild, moderate, severe

Opinion of students about riots: ticked off, neutral, happy
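With ordinal data the order carries information, so an encoding should preserve it. The sketch below states the order explicitly and then takes a median, which is meaningful for ordered categories. The observations are made up for illustration.

```python
import statistics

# The order is stated explicitly; it cannot be inferred from the labels alone.
severity_order = ["none", "mild", "moderate", "severe"]
rank = {level: position for position, level in enumerate(severity_order)}

# Hypothetical observations (made up for illustration).
observations = ["mild", "severe", "none", "moderate", "mild"]

# Ordinal encoding: replace each label with its position in the order.
encoded = [rank[level] for level in observations]

# median_low always returns an actual data point, so it maps back to a category.
median_rank = statistics.median_low(encoded)
median_level = severity_order[median_rank]

print(encoded)
print(median_level)
```

The ranks only say which category comes first. The gaps between them are not necessarily equal, so averaging the codes should be done with caution.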

Author: HAMMAD ZAHID

I am a machine learning enthusiast.
