TESTING HYPOTHESES WITH DATA

ST101 – DR. ARIC LABARR


A hypothesis test uses data to help evaluate an initial claim about a parameter from the population.



HYPOTHESIS TESTING


I have a coin that you believe is fair to start.

To test if this coin is fair, you ask me to flip the coin repeatedly and record the results.




HYPOTHESIS TESTING THROUGH EXAMPLE

Flip Number   Result   Probability
1             Heads    0.50
2             Heads    0.25
3             Heads    0.125
4             Heads    0.0625
5             Heads    0.03125

Do you still think the coin is fair?
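The probabilities in the coin table can be reproduced with a short Python sketch. It assumes independent flips and a fair coin (0.5 chance of heads on each flip); the function name `prob_all_heads` is only an illustrative choice.

```python
def prob_all_heads(n_flips, p_heads=0.5):
    """Probability that n_flips independent flips all land heads."""
    return p_heads ** n_flips


# Print one row per flip, mirroring the table: flip number, result, probability.
for flip in range(1, 6):
    print(flip, "Heads", prob_all_heads(flip))
```

Each extra head halves the probability, which is why the run of heads becomes harder and harder to explain with a fair coin.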



No longer believe the coin is fair.

HYPOTHESIS TESTING THROUGH EXAMPLE

Flip Number   Result   P-value
1             Heads    0.50
2             Heads    0.25
3             Heads    0.125
4             Heads    0.0625
5             Heads    0.03125

The coin example contains every piece of a hypothesis test:

NULL Hypothesis: the coin is fair.
Test Statistic: the recorded results of the flips.
P-value: the probability of those results if the coin is fair.
Decision on NULL Hypothesis: no longer believe the coin is fair.


According to the Central Limit Theorem (CLT), sample means approximately follow a Normal distribution as long as the sample size is big enough.


BIKE DATA EXAMPLE WITH MEANS




You believe the average daily number of total users is 4,000, but you want to know if there is more than that. 

You collect a sample of 731 days with an average daily number of total users at 4,504 with a standard deviation of 1,937.

What is the probability of seeing a sample average this high if the true average really is 4,000? Less than 0.0001!

Do you still believe your original hypothesis?


BIKE DATA EXAMPLE WITH MEANS


You believe the average daily number of total users is 4,000 (NULL Hypothesis), but you want to know if there is more than that.

You collect a sample of 731 days with an average daily number of total users at 4,504 with a standard deviation of 1,937. (Test Statistic)

What is the probability of seeing a sample average this high if the true average really is 4,000? (P-value)

Do you still believe your original hypothesis? (Decision on NULL Hypothesis)
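As a rough check on the bike numbers, the sketch below measures how many standard errors the sample mean sits above 4,000 and approximates the upper-tail probability. Two assumptions are made here: the standard error is taken as s / sqrt(n), and the Normal curve is used as a stand-in for the t-distribution, which is reasonable with 731 days of data. The variable names are illustrative.

```python
import math

mu_0 = 4000    # average under the null hypothesis
x_bar = 4504   # sample mean
s = 1937       # sample standard deviation
n = 731        # number of days sampled

# Standard error of the sample mean (assumed form: s / sqrt(n)).
standard_error = s / math.sqrt(n)

# Distance between the sample mean and the null value, in standard errors.
t_stat = (x_bar - mu_0) / standard_error

# Upper-tail area of the standard Normal curve beyond t_stat,
# used as an approximation to the t-distribution with 730 degrees of freedom.
p_value_approx = 0.5 * math.erfc(t_stat / math.sqrt(2))

print(round(t_stat, 2))         # 7.03
print(p_value_approx < 0.0001)  # True
```

A sample mean about 7 standard errors above the null value is far out in the tail, which matches the slide's "< 0.0001".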


BIKE DATA EXAMPLE WITH MEANS


A hypothesis test uses data to help evaluate an initial claim about a parameter from the population.

There are 4 main steps to hypothesis testing:

1. State the hypotheses
2. Test statistic
3. P-value
4. Decision on null hypothesis

SUMMARY


NULL AND ALTERNATIVE HYPOTHESES


It is not always obvious how the null and alternative hypotheses should be formulated.

The context of the situation is very important in determining how the hypotheses should be stated.

In some cases it is easier to identify the alternative hypothesis first!

Typically, the alternative is what we are trying to test and want to collect evidence for.


DEVELOPING NULL AND ALTERNATIVE


The null hypothesis is the status quo, or the initial claim about the data.

For example, the average daily number of total users is 4,000: $H_0: \mu = 4000$.

The null hypothesis is about the population parameter of interest, NOT sample statistics.

Parameters are unknown, while statistics are known.

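NULL VS. ALTERNATIVE

One-Sided Tests

$H_0: \mu = \mu_0$ vs. $H_a: \mu > \mu_0$

$H_0: \mu = \mu_0$ vs. $H_a: \mu < \mu_0$

Two-Sided Test

$H_0: \mu = \mu_0$ vs. $H_a: \mu \neq \mu_0$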


TEST STATISTIC



The test statistic summarizes the amount of information provided in the sample.

Imagine this like evidence in a court case.

Test statistics have a common form:

$$\text{Test Statistic} = \frac{\text{Statistic} - \text{Null Value}}{\text{Standard Error}}$$

Statistic: Sample Information
Null Value: Null Hypothesis Information
Standard Error: Estimated Variability from Sampling Distribution of Statistic

TEST STATISTIC


The test statistic summarizes the amount of information provided in the sample.

Sample means need the t-distribution because the population standard deviation is unknown and has to be estimated with the sample standard deviation.

TEST STATISTIC FOR MEANS

$$t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}$$


The test statistic summarizes the amount of information provided in the sample.

Sample proportions use the Normal distribution.

TEST STATISTIC FOR PROPORTIONS

$$z = \frac{\hat{p} - p_0}{\sqrt{p_0 (1 - p_0) / n}}$$


The test statistic summarizes the amount of information provided in the sample.

The test statistic calculation typically requires 3 pieces of information:

Statistic – information obtained from the sample.

Null value – information about the null hypothesis.

Standard error – measure of variability for the sampling distribution of the statistic.

SUMMARY
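The three pieces in this summary can be combined in a few lines of Python. This is a sketch under stated assumptions: the test statistic is taken as (statistic - null value) / standard error, and the two standard-error helpers use the usual textbook forms for a mean and a proportion. All function names are illustrative, and the proportion example uses made-up numbers.

```python
import math


def compute_test_statistic(statistic, null_value, standard_error):
    """How many standard errors the sample statistic is from the null value."""
    return (statistic - null_value) / standard_error


def se_mean(sample_sd, n):
    """Standard error of a sample mean (assumed form: s / sqrt(n))."""
    return sample_sd / math.sqrt(n)


def se_proportion(p_null, n):
    """Standard error of a sample proportion, evaluated at the null value."""
    return math.sqrt(p_null * (1 - p_null) / n)


# Bike data from the slides: mean 4,504, standard deviation 1,937, 731 days.
t_bike = compute_test_statistic(4504, 4000, se_mean(1937, 731))

# Hypothetical proportion example: 60% heads in 100 flips, null value 0.5.
z_coin = compute_test_statistic(0.60, 0.50, se_proportion(0.50, 100))

print(round(t_bike, 2), round(z_coin, 2))
```

Keeping the standard error as a separate input means the same function serves both means and proportions; only the measure of variability changes.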


P-VALUE AND SIGNIFICANCE LEVEL



Once the test statistic has been determined, we can calculate the probability that we got the information we did from our sample, assuming that the null hypothesis is true.

The p-value is the probability of getting a sample like ours, or a sample more extreme, assuming the null hypothesis is true.


P-VALUES
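To make "our sample, or a sample more extreme" concrete, here is a small sketch for coin flips. It assumes the null hypothesis is a fair coin and that "more extreme" means "even more heads" (a one-sided view); the function name is illustrative.

```python
from math import comb


def binomial_upper_p_value(heads, flips, p_null=0.5):
    """P(at least `heads` heads in `flips` flips) when P(heads) = p_null."""
    return sum(
        comb(flips, k) * p_null**k * (1 - p_null) ** (flips - k)
        for k in range(heads, flips + 1)
    )


print(binomial_upper_p_value(5, 5))  # 0.03125 (only 5 heads is this extreme)
print(binomial_upper_p_value(4, 5))  # 0.1875  (4 heads or 5 heads)
```

With 4 heads observed, the p-value adds the chance of 5 heads as well, because 5 heads would be even stronger evidence against a fair coin.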


If the p-value is low, this implies that the sample we obtained from the population is extremely rare IF we assume that the null hypothesis is true.

This leads us to question the validity of the null hypothesis – rejecting the null hypothesis if the p-value is low enough.

How low is low enough?


SIGNIFICANCE LEVEL VS. P-VALUE


 

[Figures not reproduced. Surviving labels: P-value; Values are “far apart” according to p-value; Values are “close together” according to p-value; and, with P-value/2 marked twice, Values are “far apart” according to p-value.]


I have a coin that you believe is fair to start.

To test if this coin is fair, you ask me to flip the coin repeatedly and record the results.

No longer believe the coin is fair, but could it be? YES! A fair coin still lands heads five times in a row about 3% of the time.

HYPOTHESIS TESTING THROUGH EXAMPLE

Flip Number   Result   Probability
1             Heads    0.50
2             Heads    0.25
3             Heads    0.125
4             Heads    0.0625
5             Heads    0.03125

The significance level defines the unlikely values of the sample statistic if the null hypothesis is true.

This area is typically called the rejection region of the sampling distribution.

The significance level is selected before the hypothesis test is even run!

Typical values are 0.01, 0.05, and 0.10.
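The comparison between a p-value and the significance level can be written as a one-line rule. The sketch below assumes the common convention of rejecting when the p-value is below the significance level; the name `decide` and the default of 0.05 are illustrative choices.

```python
def decide(p_value, alpha=0.05):
    """Compare a p-value to a significance level chosen in advance."""
    if not 0 < alpha < 1:
        raise ValueError("alpha must be strictly between 0 and 1")
    if p_value < alpha:
        return "reject the null hypothesis"
    return "fail to reject the null hypothesis"


# Five heads in a row has probability 0.03125 under a fair coin.
print(decide(0.03125, alpha=0.05))
print(decide(0.03125, alpha=0.01))
```

The same five heads lead to different decisions at 0.05 and 0.01, which is why the significance level has to be fixed before looking at the data.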


 


The p-value is the probability we got our sample, or a sample more extreme, under the null hypothesis.

If the p-value is low, this implies that the sample we obtained from the population is extremely rare IF we assume that the null hypothesis is true.

The significance level defines the unlikely values of the sample statistic if the null hypothesis is true.



SUMMARY


HYPOTHESIS TEST FOR MEANS



You believe the average daily number of total users is 4,000, but you want to know if there is more than that so you can decide on orders for future bikes to be added. 

You collect a sample of 731 days with an average daily number of total users at 4,504 with a standard deviation of 1,937.

With a significance level of 0.05, conduct a hypothesis test on this claim.


BIKE DATA EXAMPLE FOR ONE-TAIL HYPOTHESIS TEST


 

$H_0: \mu = 4000$ vs. $H_a: \mu > 4000$

$$t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}} = \frac{4504 - 4000}{1937 / \sqrt{731}} \approx 7.03$$

FINDING P-VALUE

df      One-Tail   0.25    0.20    0.15    0.10    0.05    0.025   0.01    0.005   0.001   0.0005
        Two-Tail   0.50    0.40    0.30    0.20    0.10    0.05    0.02    0.01    0.002   0.001
...
90                 0.677   0.846   1.042   1.291   1.662   1.987   2.368   2.632   3.183   3.402
100                0.677   0.845   1.042   1.290   1.660   1.984   2.364   2.626   3.174   3.390
250                0.675   0.843   1.039   1.285   1.651   1.969   2.341   2.596   3.123   3.330
500                0.675   0.842   1.038   1.283   1.648   1.965   2.334   2.586   3.107   3.310
1000               0.675   0.842   1.037   1.282   1.646   1.962   2.330   2.581   3.098   3.300
...




 

P-value < 0.0005

Because the p-value is below the significance level of 0.05, reject the null hypothesis: there is strong evidence that the average daily number of total users is more than 4,000.

BIKE DATA EXAMPLE FOR ONE-TAIL HYPOTHESIS TEST
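Reading a p-value off the t-table amounts to finding the right-most column whose t-value the test statistic still exceeds. The sketch below copies the One-Tail header and the 500 row from the table; using the 500 row for a sample with 730 degrees of freedom is an assumption made here (the nearest smaller row, which is the cautious choice), and the function name is illustrative.

```python
# One-Tail header row and the df = 500 row, copied from the t-table.
ONE_TAIL_AREAS = [0.25, 0.20, 0.15, 0.10, 0.05, 0.025, 0.01, 0.005, 0.001, 0.0005]
T_ROW_DF_500 = [0.675, 0.842, 1.038, 1.283, 1.648, 1.965, 2.334, 2.586, 3.107, 3.310]


def p_value_upper_bound(t_stat, critical_values, tail_areas):
    """Smallest tabled tail area whose t-value is at or below t_stat.

    Returns None when t_stat is below every tabled value, meaning the
    p-value is larger than the biggest area in the table.
    """
    bound = None
    for critical, area in zip(critical_values, tail_areas):
        if t_stat >= critical:
            bound = area  # areas shrink as the tabled t-values grow
    return bound


# Bike example: t is about 7.03, beyond every value in the row.
print(p_value_upper_bound(7.03, T_ROW_DF_500, ONE_TAIL_AREAS))  # 0.0005
```

A result of 0.0005 means "the one-tail p-value is less than 0.0005", which is all a printed table can say once the statistic runs off its right edge.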

 


You believe the average daily number of total users is 4,000, but you want to know if it is more than that, so you can decide on orders for future bikes to be added, OR less than 4,000, so you can pull stock from the streets and bikes don’t sit unused.

You collect a sample of 731 days with an average daily number of total users at 4,504 with a standard deviation of 1,937.

With a significance level of 0.05, conduct a hypothesis test on this claim.


BIKE DATA EXAMPLE FOR TWO-TAIL HYPOTHESIS TEST


 

$H_0: \mu = 4000$ vs. $H_a: \mu \neq 4000$

$$t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}} = \frac{4504 - 4000}{1937 / \sqrt{731}} \approx 7.03$$




 

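P-value/2 < 0.0005 in each tail, so P-value < 0.001

Because the p-value is below the significance level of 0.05, reject the null hypothesis: there is strong evidence that the average daily number of total users is different from 4,000.

BIKE DATA EXAMPLE FOR TWO-TAIL HYPOTHESIS TEST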
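The only difference between the one-tail and two-tail calculations is which tail area gets counted. The sketch below assumes a Normal approximation to the t-distribution (reasonable with 730 degrees of freedom) and uses illustrative names for the function and its `alternative` argument.

```python
import math


def normal_p_value(z, alternative):
    """Approximate p-value for a standardized test statistic z."""
    upper_tail = 0.5 * math.erfc(z / math.sqrt(2))   # area above z
    lower_tail = 0.5 * math.erfc(-z / math.sqrt(2))  # area below z
    if alternative == "greater":
        return upper_tail
    if alternative == "less":
        return lower_tail
    if alternative == "two-sided":
        # Count the observed tail and its mirror image on the other side.
        return 2 * min(upper_tail, lower_tail)
    raise ValueError("alternative must be 'greater', 'less', or 'two-sided'")


one_tail = normal_p_value(7.03, "greater")
two_tail = normal_p_value(7.03, "two-sided")
print(two_tail == 2 * one_tail)  # True
print(two_tail < 0.001)          # True
```

For the bike data the two-tail p-value is double the one-tail value, yet it is still far below 0.05, so the decision does not change.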

 


ETHICS AROUND INFERENCE WITH DATA



Hypothesis tests depend on sample data.

Therefore, hypothesis tests may be wrong!

There are two types of errors in hypothesis testing – Type I and Type II errors.


ERRORS IN HYPOTHESIS TESTS


TYPE I VS. TYPE II ERRORS

                          TRUTH
CHOICE                    Null is true    Null is false
Do not reject the null    Correct         Type II
Reject the null           Type I          Correct


A Type I error is rejecting the null hypothesis when the null hypothesis was actually true.

In other words, you have a false rejection.

The probability of making a Type I error in a hypothesis test is called the significance level.

Most hypothesis tests are referred to as significance tests because they only control the Type I error.


TYPE I ERROR


A Type II error is failing to reject the null hypothesis when the null hypothesis was actually false.

In other words, you have falsely kept the null hypothesis.

The probability of NOT making a Type II error in a hypothesis test is called the power.

The Type II error is difficult to control.

For a fixed sample size, lowering the chance of one type of error raises the chance of the other.


TYPE II ERROR
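Both error types can be computed exactly for a simple coin rule. The sketch below assumes the decision rule "reject the claim of a fair coin if all five flips land heads"; the unfair coin that lands heads 90% of the time is a hypothetical chosen only for illustration.

```python
def prob_reject(p_heads, n_flips=5):
    """Chance that the rule 'reject if every flip is heads' rejects."""
    return p_heads ** n_flips


# Type I error: the coin really is fair, but the rule rejects anyway.
type_i_error = prob_reject(0.5)

# Power: the coin really is unfair (hypothetical 90% heads) and the rule rejects.
power = prob_reject(0.9)

# Type II error: the coin is unfair, but the rule fails to reject.
type_ii_error = 1 - power

print(type_i_error)             # 0.03125
print(round(power, 3))          # 0.59
print(round(type_ii_error, 3))  # 0.41
```

Even with a heavily biased coin, five flips miss the bias about 41% of the time, which illustrates why the Type II error is hard to control with a small sample.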




What if your sample happened to be drawn only from summer months with clear days?

The average daily number of users would probably be estimated too high.

This could lead to incorrect actions being taken.

Hypothesis tests completely depend on the data they are built from.

Garbage in, garbage out.

Hypothesis test results reveal something, but not everything!


CAREFUL WITH INFERENCE


Hypothesis test results reveal something, but not everything!

People sometimes forget the possibility of errors when making claims from a statistical test.

For example:

“We know that more than 4,000 bikes per day are used on average.”

A more careful claim:

“We have strong evidence that more than 4,000 bikes per day are used on average.”

Remember the analogy of a court case: we sometimes incorrectly claim people are guilty. Be careful about rushing to judgement!

CAREFUL ABOUT JUSTIFICATION


A Type I error is rejecting the null hypothesis when the null hypothesis was actually true.

A Type II error is failing to reject the null hypothesis when the null hypothesis was actually false.

Hypothesis tests completely depend on the data they are built from.

People sometimes forget the possibility of errors when making claims from a statistical test. 


SUMMARY


Last modified: Monday, October 17, 2022, 1:26 PM