## Slides: Distributions of Continuous Data

DISTRIBUTIONS OF CONTINUOUS DATA

ST101 – DR. ARIC LABARR

A random variable is a numerical description of the outcome of an experiment.

Random variables can be either discrete or continuous.

A discrete random variable may assume either a finite number of values or an infinite sequence of values (such as 0, 1, 2, …).

A continuous random variable may assume any numerical value in an interval or collection of intervals.

RANDOM VARIABLES

CONTINUOUS RANDOM VARIABLES

A continuous random variable can assume any value in an interval on the real line or in a collection of intervals on the real line.

The probability that a continuous random variable takes any one particular value is 0, so it is not useful to talk about the probability of a single value.

Instead, we talk about the probability that the random variable takes a value within a given interval.

PROBABILITIES ON INTERVALS
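A minimal sketch of why only intervals carry probability, assuming Python 3.8+ and the standard library's `statistics.NormalDist`: the probability of an interval is the area under the density, computed here as a difference of CDF values, and an "interval" of zero width has area 0. The standard Normal is only an illustrative choice of distribution, and the helper name `prob_between` is hypothetical.

```python
from statistics import NormalDist

def prob_between(dist: NormalDist, a: float, b: float) -> float:
    """P(a <= X <= b) for a continuous X: area under the density from a to b."""
    return dist.cdf(b) - dist.cdf(a)

# Illustrative choice: a standard Normal random variable (mean 0, sd 1)
z = NormalDist(mu=0.0, sigma=1.0)

print(round(prob_between(z, -1.0, 1.0), 4))  # 0.6827: an interval has positive area
print(prob_between(z, 0.5, 0.5))             # 0.0: a single point has zero width
```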

POPULAR CONTINUOUS DISTRIBUTIONS

Uniform

Exponential

Normal

A continuous random variable may assume any numerical value in an interval or collection of intervals.

The probability that a continuous random variable equals any single value is 0, so we instead talk about probabilities of intervals.

SUMMARY

UNIFORM DISTRIBUTION

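A random variable follows a uniform distribution whenever the probability of an interval is proportional to the interval's length.

In other words, every interval of the same length (within the range of possible values) has the same probability.

The probability density function for the uniform distribution on the interval from a to b is:

$$
f(x) = \begin{cases} \dfrac{1}{b-a}, & a \le x \le b \\ 0, & \text{otherwise} \end{cases}
$$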

UNIFORM PROBABILITY DISTRIBUTION
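The uniform density is constant at 1/(b - a) on [a, b] and 0 elsewhere, so an interval's probability is just its overlap with [a, b] times that height. A minimal Python sketch; the function names are illustrative, and the 2-to-12 inputs come from the sales-call example that follows.

```python
def uniform_pdf(x: float, a: float, b: float) -> float:
    """Density of a Uniform(a, b) random variable: 1/(b - a) on [a, b], 0 elsewhere."""
    if a <= x <= b:
        return 1.0 / (b - a)
    return 0.0

def uniform_prob(lo: float, hi: float, a: float, b: float) -> float:
    """P(lo <= X <= hi): overlap of [lo, hi] with [a, b], times the constant height."""
    overlap = max(0.0, min(hi, b) - max(lo, a))
    return overlap / (b - a)

print(uniform_pdf(5, 2, 12))        # 0.1
print(uniform_prob(10, 12, 2, 12))  # 0.2
```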

Assume that sales calls that go into a company are uniformly distributed by the years of experience of the sales staff so that everyone has the same chance of getting a call.

Years of experience range from 2 to 12, so the density is $f(x) = \frac{1}{12-2} = 0.1$ for $2 \le x \le 12$.

EXAMPLE OF UNIFORM DISTRIBUTION

What is the probability that a call is answered by an employee with 10 to 12 years of experience?

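This is the area under the density curve between 10 and 12: a rectangle of width 12 - 10 = 2 and height 0.1, so $P(10 \le X \le 12) = 2 \times 0.1 = 0.2$.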

EXAMPLE OF UNIFORM DISTRIBUTION
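As an informal check of the 0.2 answer (an illustration, not part of the slides), a small Monte Carlo sketch draws many experience levels uniformly between 2 and 12 with Python's `random` module and counts the share that land between 10 and 12. The function name `simulate_share`, the sample size, and the seed are arbitrary choices.

```python
import random

def simulate_share(lo: float, hi: float, a: float, b: float,
                   n: int = 100_000, seed: int = 1) -> float:
    """Fraction of n Uniform(a, b) draws that fall in [lo, hi]."""
    rng = random.Random(seed)  # seeded so the result is reproducible
    hits = sum(1 for _ in range(n) if lo <= rng.uniform(a, b) <= hi)
    return hits / n

est = simulate_share(10, 12, 2, 12)
print(est)  # should land close to the exact answer, 0.2
```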

Expected Value:

$$E(X) = \frac{a+b}{2}$$

Variance:

$$\text{Var}(X) = \frac{(b-a)^2}{12}$$

MEASURES ON UNIFORM DISTRIBUTION
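The standard Uniform(a, b) formulas, expected value (a + b)/2 and variance (b - a)^2/12, translate directly into code. A minimal sketch using the 2-to-12 years-of-experience range from the sales-call example; the function names are illustrative.

```python
def uniform_mean(a: float, b: float) -> float:
    """Expected value of a Uniform(a, b) random variable."""
    return (a + b) / 2

def uniform_variance(a: float, b: float) -> float:
    """Variance of a Uniform(a, b) random variable."""
    return (b - a) ** 2 / 12

a, b = 2, 12
print(uniform_mean(a, b))                        # 7.0
print(round(uniform_variance(a, b), 4))          # 8.3333
print(round(uniform_variance(a, b) ** 0.5, 4))   # standard deviation
```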

Assume that sales calls that go into a company are uniformly distributed by the years of experience of the sales staff so that everyone has the same chance of getting a call.

What is the expected number of years of experience of the person answering a new sales call?

$E(X) = \frac{2 + 12}{2} = 7$ years.

EXAMPLE OF UNIFORM DISTRIBUTION

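A random variable follows a uniform distribution whenever the probability of an interval is proportional to the interval's length.

The probability density function for the uniform distribution on the interval from a to b is:

$$
f(x) = \begin{cases} \dfrac{1}{b-a}, & a \le x \le b \\ 0, & \text{otherwise} \end{cases}
$$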

SUMMARY

NORMAL DISTRIBUTION

The Normal probability distribution is one of the most common and important distributions for describing a continuous random variable.

The Normal distribution is the foundation of statistical inference:

Hypothesis Testing

Confidence Intervals

Regression Analysis

Many measurements in nature and in real-world data are approximately Normal.

IMPORTANCE

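The probability density function for the Normal distribution with mean μ and standard deviation σ is defined as:

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}, \qquad -\infty < x < \infty$$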

PROBABILITY DENSITY FUNCTION
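The Normal density f(x) = 1/(σ√(2π)) · exp(-(x - μ)²/(2σ²)) can be coded term by term. A minimal sketch, assuming Python 3.8+ so the result can be cross-checked against the standard library's `statistics.NormalDist.pdf`; the mean 7.5 and standard deviation 2.5 are borrowed from the years-of-experience example later in the deck.

```python
import math
from statistics import NormalDist

def normal_pdf(x: float, mu: float, sigma: float) -> float:
    """f(x) = 1 / (sigma * sqrt(2*pi)) * exp(-(x - mu)^2 / (2 * sigma^2))"""
    coef = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    return coef * math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))

# Cross-check the hand-written formula against the standard library
for x in (5.0, 7.5, 10.0):
    print(x, round(normal_pdf(x, 7.5, 2.5), 5), round(NormalDist(7.5, 2.5).pdf(x), 5))
```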

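CHARACTERISTICS OF NORMAL DISTRIBUTION

The Normal curve is bell-shaped and symmetric around its mean. Values near the mean are more probable; values far out in either tail are less probable.

The mean can take any value; changing it shifts the curve left or right without changing its shape.

The standard deviation controls the width: a larger standard deviation gives a wider, flatter curve, and a smaller one gives a narrower, taller curve.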

The Normal probability distribution is one of the most common and important distributions for describing a continuous random variable.

The Normal distribution is the foundation of statistical inference.

The Normal distribution is symmetric and bell-shaped, with its location set by the mean and its width set by the standard deviation.

SUMMARY

EMPIRICAL RULE

Probabilities for a Normal random variable are given by areas under its density curve.

The total area under the curve = 1.

Because the Normal distribution is perfectly symmetric around its mean (which equals its median), the area below the mean equals the area above the mean: 0.5 each.

PROBABILITIES
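Both facts, total area 1 and area 0.5 below the mean, can be checked numerically. A minimal sketch that approximates areas under the Normal density with the trapezoid rule; the mean 7.5 and standard deviation 2.5 are borrowed from the later experience example, and cutting the infinite tails off at 10 standard deviations is an assumption (the area beyond that is negligible).

```python
import math

def normal_pdf(x: float, mu: float, sigma: float) -> float:
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def area(f, lo: float, hi: float, n: int = 100_000) -> float:
    """Trapezoid-rule approximation of the area under f between lo and hi."""
    h = (hi - lo) / n
    total = 0.5 * (f(lo) + f(hi))
    for i in range(1, n):
        total += f(lo + i * h)
    return total * h

mu, sigma = 7.5, 2.5
f = lambda x: normal_pdf(x, mu, sigma)
# +/- 10 standard deviations stands in for the infinite tails
whole = area(f, mu - 10 * sigma, mu + 10 * sigma)
below = area(f, mu - 10 * sigma, mu)
print(round(whole, 6), round(below, 6))  # approximately 1.0 and 0.5
```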

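For any Normal distribution, approximately:

68% of values fall within 1 standard deviation of the mean (μ ± σ).

95% of values fall within 2 standard deviations of the mean (μ ± 2σ).

99.7% of values fall within 3 standard deviations of the mean (μ ± 3σ).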

EMPIRICAL RULE
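The 68/95/99.7 percentages are rounded values of one exact quantity: for any Normal random variable, P(μ - kσ ≤ X ≤ μ + kσ) = erf(k/√2). A minimal sketch using `math.erf` from the Python standard library; the function name `within_k_sd` is illustrative.

```python
import math

def within_k_sd(k: float) -> float:
    """P(mu - k*sigma <= X <= mu + k*sigma) for any Normal X; equals erf(k / sqrt(2))."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(k, round(within_k_sd(k), 4))
# 1 0.6827
# 2 0.9545
# 3 0.9973
```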

Assume new employees at a company have previous years of professional experience that follow a Normal distribution where the mean is 7.5 and the standard deviation is 2.5.

What is the probability that a randomly selected new employee has between 5 and 10 years of experience?

Here 5 = 7.5 - 2.5 and 10 = 7.5 + 2.5, so the interval runs from one standard deviation below the mean to one standard deviation above it. By the empirical rule, the probability is approximately 0.68.

EXAMPLE
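To see how close the empirical-rule answer is, a minimal sketch (assuming Python 3.8+ for `statistics.NormalDist`) computes the exact area between 5 and 10 for a Normal distribution with mean 7.5 and standard deviation 2.5.

```python
from statistics import NormalDist

experience = NormalDist(mu=7.5, sigma=2.5)
p = experience.cdf(10) - experience.cdf(5)  # area between 5 and 10
print(round(p, 4))  # 0.6827, close to the empirical rule's 0.68
```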

Assume new employees at a company have previous years of professional experience that follow a Normal distribution where the mean is 7.5 and the standard deviation is 2.5.

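What is the probability that a randomly selected new employee has between 2.5 and 10 years of experience?

Here 2.5 = 7.5 - 2(2.5) is two standard deviations below the mean, and 10 = 7.5 + 2.5 is one standard deviation above it. By symmetry, the area between the mean and two standard deviations below it is 0.95/2 = 0.475, and the area between the mean and one standard deviation above it is 0.68/2 = 0.34. The probability is approximately 0.475 + 0.34 = 0.815.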

EXAMPLE
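The same split-the-interval reasoning can be compared with the exact area. A minimal sketch, assuming Python 3.8+ for `statistics.NormalDist`, that computes both the empirical-rule approximation and the exact probability between 2.5 and 10.

```python
from statistics import NormalDist

experience = NormalDist(mu=7.5, sigma=2.5)

# Empirical rule: half of 95% (mean down to -2 sd) plus half of 68% (mean up to +1 sd)
approx = 0.95 / 2 + 0.68 / 2
exact = experience.cdf(10) - experience.cdf(2.5)

print(round(approx, 3))  # 0.815
print(round(exact, 4))   # 0.8186
```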

The empirical rule (the 68-95-99.7 rule) is good for quick, rough analysis.

It is not suited to exact analysis unless you only need areas at whole-number multiples of the standard deviation from the mean.

What about fractions of standard deviations away from the mean?

We need another way to quickly calculate area under the curve.

SUMMARY

STANDARD SCORES

A random variable having a Normal distribution with a mean of 0 and a standard deviation of 1 is said to have a standard Normal probability distribution.

All Normal distributions can be converted into the standard Normal distribution for ease of computing probabilities under the curve.

Standard Normal probability tables help calculate area under the curve.

CONVERSION OF NORMAL DISTRIBUTIONS

The standard Normal table is an extension of the empirical rule: it gives the area under the standard Normal curve to the left of any z-value given to two decimal places. The row gives z to one decimal place, and the column gives the second decimal.

STANDARD NORMAL TABLE

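| z | .00 | .01 | .02 | .03 | .04 | .05 | .06 | .07 | .08 | .09 |
|---|---|---|---|---|---|---|---|---|---|---|
| ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ |
| 0.5 | .6915 | .6950 | .6985 | .7019 | .7054 | .7088 | .7123 | .7157 | .7190 | .7224 |
| 0.6 | .7257 | .7291 | .7324 | .7357 | .7389 | .7422 | .7454 | .7486 | .7517 | .7549 |
| 0.7 | .7580 | .7611 | .7642 | .7673 | .7704 | .7734 | .7764 | .7794 | .7823 | .7852 |
| 0.8 | .7881 | .7910 | .7939 | .7967 | .7995 | .8023 | .8051 | .8078 | .8106 | .8133 |
| 0.9 | .8159 | .8186 | .8212 | .8238 | .8264 | .8289 | .8315 | .8340 | .8365 | .8389 |
| ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ |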
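Each entry of the standard Normal table is P(Z ≤ z) for z = row + column, so rows can be regenerated (and individual entries checked) in a few lines. A minimal sketch, assuming Python 3.8+ for `statistics.NormalDist`; it handles only non-negative rows, and the function name `table_row` is hypothetical.

```python
from statistics import NormalDist

Z = NormalDist()  # standard Normal: mu = 0, sigma = 1

def table_row(row_start: float) -> list:
    """Left-tail areas P(Z <= row_start + col) for columns .00 through .09 (row_start >= 0)."""
    # f"{...:.4f}" gives "0.6915"; slicing off the leading "0" matches the table's ".6915" style
    return [f"{Z.cdf(round(row_start + c / 100, 2)):.4f}"[1:] for c in range(10)]

for row in (0.5, 0.6, 0.7, 0.8, 0.9):
    print(f"{row:.1f}", " ".join(table_row(row)))
```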

The standard Normal table gives the area to the left of a point. To calculate the area to the right of any point, use the complement rule:

$$P(Z > z) = 1 - P(Z \le z)$$

CALCULATING OPPOSITE PROBABILITIES
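The complement rule is a one-line function once a left-tail (CDF) lookup is available. A minimal sketch, assuming Python 3.8+ with `statistics.NormalDist` standing in for the printed table; the name `right_tail` is illustrative.

```python
from statistics import NormalDist

Z = NormalDist()  # standard Normal

def right_tail(z: float) -> float:
    """P(Z > z) via the complement rule: 1 - P(Z <= z)."""
    return 1.0 - Z.cdf(z)

print(round(right_tail(0.77), 3))  # 0.221
print(right_tail(0.0))             # 0.5, by symmetry
```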

All Normal distributions can be converted into the standard Normal distribution for ease of computing probabilities under the curve. If X is Normal with mean μ and standard deviation σ, then

$$Z = \frac{X - \mu}{\sigma}$$

has a standard Normal distribution. The value z = (x - μ)/σ, called a z-score, counts how many standard deviations x lies above (positive) or below (negative) the mean.

CONVERSION OF NORMAL DISTRIBUTIONS
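A minimal sketch of the conversion z = (x - μ)/σ, assuming Python 3.8+ for `statistics.NormalDist`. It reuses the years-of-experience example (mean 7.5, standard deviation 2.5) and confirms that the area to the left of x under the original curve equals the area to the left of z under the standard Normal curve. The function name `z_score` is illustrative.

```python
from statistics import NormalDist

def z_score(x: float, mu: float, sigma: float) -> float:
    """How many standard deviations x lies above (+) or below (-) the mean."""
    return (x - mu) / sigma

mu, sigma = 7.5, 2.5   # years-of-experience example
z = z_score(10, mu, sigma)
print(z)  # 1.0

# Standardizing does not change the probability
p_original = NormalDist(mu, sigma).cdf(10)
p_standard = NormalDist().cdf(z)
print(abs(p_original - p_standard) < 1e-12)  # True
```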

Z-SCORES

Assume that the daily number of total users follows a Normal distribution. The average daily number of total users is 4,504 with a standard deviation of 1,937. What is the probability that a randomly selected day has more than 6,000 total users?

Convert 6,000 to a z-score:

$$z = \frac{6{,}000 - 4{,}504}{1{,}937} = \frac{1{,}496}{1{,}937} \approx 0.77$$

From the standard Normal table (row 0.7, column .07), $P(Z \le 0.77) = .7794$. Using the complement rule:

$$P(X > 6{,}000) \approx P(Z > 0.77) = 1 - 0.7794 = 0.2206$$

So roughly 22% of days have more than 6,000 total users.

Z-SCORES BIKE DATA EXAMPLE
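The table lookup rounds z to two decimals before finding the area. A minimal sketch, assuming Python 3.8+ for `statistics.NormalDist`, that reproduces the table-based answer and compares it with the probability computed without rounding z.

```python
from statistics import NormalDist

users = NormalDist(mu=4504, sigma=1937)

z = (6000 - 4504) / 1937
table_answer = 1 - 0.7794            # z rounded to 0.77, area read from the table
exact_answer = 1 - users.cdf(6000)   # no rounding of z

print(round(z, 4))             # 0.7723
print(round(table_answer, 4))  # 0.2206
print(round(exact_answer, 4))
```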

Assume that the daily number of total users follows a Normal distribution. The average daily number of total users is 4,504 with a standard deviation of 1,937. Below what number of daily users does a day fall in the bottom 10% of days?

Z-SCORES BIKE DATA EXAMPLE

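| z | .00 | .01 | .02 | .03 | .04 | .05 | .06 | .07 | .08 | .09 |
|---|---|---|---|---|---|---|---|---|---|---|
| ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ |
| -1.4 | .0808 | .0793 | .0778 | .0764 | .0749 | .0735 | .0721 | .0708 | .0694 | .0681 |
| -1.3 | .0968 | .0951 | .0934 | .0918 | .0901 | .0885 | .0869 | .0853 | .0838 | .0823 |
| -1.2 | .1151 | .1131 | .1112 | .1093 | .1075 | .1056 | .1038 | .1020 | .1003 | .0985 |
| -1.1 | .1357 | .1335 | .1314 | .1292 | .1271 | .1251 | .1230 | .1210 | .1190 | .1170 |
| -1.0 | .1587 | .1562 | .1539 | .1515 | .1492 | .1469 | .1446 | .1423 | .1401 | .1379 |
| ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ |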

We need the z-value with an area of 0.10 to its left. The closest table entry is .1003 (row -1.2, column .08; for negative rows the column extends z further from 0), so z ≈ -1.28. Converting back with x = μ + zσ:

$$x = 4{,}504 + (-1.28)(1{,}937) = 4{,}504 - 2{,}479.36 \approx 2{,}025$$

So days with fewer than about 2,025 total users fall in the bottom 10%.

Z-SCORES BIKE DATA EXAMPLE
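Going from a probability back to a value is the inverse of the table lookup. A minimal sketch, assuming Python 3.8+ where `statistics.NormalDist.inv_cdf` returns the value with a given left-tail area; it compares the table-based cutoff (z = -1.28) with the unrounded 10th percentile.

```python
from statistics import NormalDist

users = NormalDist(mu=4504, sigma=1937)

cutoff_table = 4504 + (-1.28) * 1937   # x = mu + z * sigma with z read from the table
cutoff_exact = users.inv_cdf(0.10)     # 10th percentile without rounding z

print(round(cutoff_table, 2))  # 2024.64
print(round(cutoff_exact))
```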

A random variable having a Normal distribution with a mean of 0 and a standard deviation of 1 is said to have a standard Normal probability distribution.

All Normal distributions can be converted into the standard Normal distribution for ease of computing probabilities under the curve.

Standard Normal probability tables help calculate area under the curve.

SUMMARY