Chapter4.TheScientificMethodTheGoldStandardforEstablishingCausality.pptx
The Scientific Method: The Gold Standard for Establishing Causality
Chapter 4
© 2019 McGraw-Hill Education. All rights reserved. Authorized only for instructor use in the classroom. No reproduction or distribution without the prior written consent of McGraw-Hill Education
Learning Objectives
Recall the elements of the scientific method.
Explain how experiments can be used to measure treatment effects.
Execute a hypothesis test concerning a treatment effect using experimental data.
Construct a confidence interval for a treatment effect using experimental data.
Differentiate experimental from nonexperimental data.
Explain why using nonexperimental data presents challenges when trying to measure treatment effects.
The Scientific Method
The scientific method is a process designed to generate knowledge through the collection and analysis of experimental data.
A classic application is in medicine, where researchers run clinical trials to learn the impact of a new drug on patients' health outcomes.
The scientific method is an effective way to establish causality.
The Scientific Method
The scientific method consists of the following six parts:
Ask a question
Do background research
Formulate a hypothesis
Conduct an experiment to test the hypothesis
Analyze the data from the experiment and draw conclusions
Communicate the findings
The Scientific Method Process
The Scientific Method
Step 1: Ask a question. The choice of question is often motivated by interest in a particular outcome.
Step 2: Do background research. This involves learning more about the issues surrounding the posed question. The purpose is to find information that will help identify a possible answer to the question.
The Scientific Method
Step 3: Formulate a hypothesis. This involves proposing a possible answer to the question.
Hypothesis
A proposed idea based on limited evidence that leads to further investigation.
Typically grounded in the background research and involves a positive statement about causality
The Scientific Method
Step 4: Run an experiment
Experiment
A test within a controlled environment designed to examine the validity of a hypothesis
Experimental data
Data that result from an experiment
The Scientific Method
For a hypothesis about causality, the experiment generally involves allocating a binary treatment, or treatment levels, across two or more groups
Treatment
Something that is administered to members of at least one participating group
Treatment effect
The change in the outcome resulting from variation in the treatment
The Scientific Method
Step 5: Analyze the data and draw conclusions.
Compare the measured outcomes between the group that received the treatment and the group that did not
Build a confidence interval for the treatment effect
Is there a causal relationship and how big is it?
Step 6: Communicate the findings. Explain the methodology and findings.
Report the main conclusion, a confidence level, a description of the experiment, the reasoning leading to the conclusion, and a summary of the statistics used
Summaries of Scientific Method for Medicine and Business Examples
The Scientific Method and Causal Inference
A Simple Treatment Framework
The basic goal when running an experiment is to measure a treatment effect
Potential outcomes framework:
Consider a group of subjects who will participate in an experiment. Index each with the letter i, so i = 1 refers to the first subject, i = 2 refers to the second subject, etc.
$\text{Outcome}_i^{T}$ is the outcome realized by subject i if that subject receives the treatment (T)
$\text{Outcome}_i^{NT}$ is the outcome realized by that same subject if it does not receive the treatment (NT)
The treatment effect for subject i is then:
$\text{Treatment Effect}_i = \text{Outcome}_i^{T} - \text{Outcome}_i^{NT}$
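As a minimal sketch of the potential outcomes framework, the Python snippet below stores both potential outcomes for a few subjects and computes each subject's treatment effect as the treated outcome minus the untreated outcome. The `Subject` class and all numbers are hypothetical, chosen only for illustration; in a real experiment only one of the two outcomes is ever observed for a given subject.

```python
from dataclasses import dataclass


@dataclass
class Subject:
    """One subject with both potential outcomes (hypothetical values)."""
    outcome_treated: float    # outcome if the subject receives the treatment (T)
    outcome_untreated: float  # outcome if the subject does not (NT)

    def treatment_effect(self) -> float:
        # Treatment effect = treated outcome minus untreated outcome
        return self.outcome_treated - self.outcome_untreated


# Illustrative numbers only, not data from any real experiment.
subjects = [Subject(12.0, 10.0), Subject(9.0, 9.5), Subject(15.0, 11.0)]
effects = [s.treatment_effect() for s in subjects]
print(effects)  # [2.0, -0.5, 4.0]
```

Note that the effect differs by subject and can even be negative, which is why a single summary such as an average is needed.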
The Scientific Method and Causal Inference
The problem in trying to measure the treatment effect is that a subject cannot be both untreated and treated at the same time
Hence, a single treatment status is chosen at the time of the experiment for any given subject
At least two subjects are needed to observe both an outcome with the treatment and an outcome without the treatment
The treatment effect on one subject may be different from the treatment effect on another subject.
The Scientific Method and Causal Inference
Since we are unable to measure treatment effects for individual subjects, we attempt to estimate the mean treatment effect for the entire population of subjects who may receive the treatment
Average treatment effect (ATE)
The average difference between the treated and untreated outcomes across all subjects in a population
The expected value of the treatment effect for a randomly drawn subject from the population, written as $E[\text{Treatment Effect}_i]$:
$\text{ATE} = E[\text{Treatment Effect}_i] = E[\text{Outcome}_i^{T} - \text{Outcome}_i^{NT}]$
The Scientific Method and Causal Inference
From Experiments to Treatment Effects
$\text{Treated}_i$: equals 1 if subject i receives the treatment and 0 if subject i does not receive the treatment
$\text{Outcome}_i$: the outcome actually experienced by subject i after the experiment
Mean outcome for the treated group: $(\overline{\text{Outcome}} \mid \text{Treated} = 1)$
Mean outcome for the untreated group: $(\overline{\text{Outcome}} \mid \text{Treated} = 0)$
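The link between the observed outcome and the two potential outcomes can be written out explicitly. The following is a sketch in LaTeX, where $N_1$ and $N_0$ denote the number of treated and untreated subjects; the summation notation is an assumption introduced here for illustration.

```latex
% Observed outcome: the treated potential outcome if treated, otherwise the untreated one
\text{Outcome}_i = \text{Treated}_i \cdot \text{Outcome}_i^{T}
                 + (1 - \text{Treated}_i) \cdot \text{Outcome}_i^{NT}

% Group means of the observed outcome
(\overline{\text{Outcome}} \mid \text{Treated} = 1)
  = \frac{1}{N_1} \sum_{i:\,\text{Treated}_i = 1} \text{Outcome}_i
\qquad
(\overline{\text{Outcome}} \mid \text{Treated} = 0)
  = \frac{1}{N_0} \sum_{i:\,\text{Treated}_i = 0} \text{Outcome}_i
```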
The Scientific Method and Causal Inference
When does the difference in the mean outcomes across the treated and untreated groups yield an unbiased estimate of the ATE?
Participants are a random sample of the population
Assignment into the treated group is random
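A small simulation can illustrate why random assignment matters. The sketch below builds a hypothetical population in which both potential outcomes are known (which is never true in practice), assigns the treatment by coin flip, and compares the difference in group means with the true ATE. All distributions and parameters are invented for illustration, and the whole population participates, so the random-sample condition holds trivially.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible

# Hypothetical population: both potential outcomes are known for every subject.
N = 10_000
untreated = [random.gauss(50, 10) for _ in range(N)]
treated = [y + random.gauss(5, 2) for y in untreated]
true_ate = statistics.fmean([t - u for t, u in zip(treated, untreated)])

# Random assignment: each subject is treated with probability 0.5.
assigned = [random.random() < 0.5 for _ in range(N)]

# Only one potential outcome per subject is observed.
observed_treated = [t for t, a in zip(treated, assigned) if a]
observed_untreated = [u for u, a in zip(untreated, assigned) if not a]

difference_in_means = (
    statistics.fmean(observed_treated) - statistics.fmean(observed_untreated)
)
print(f"true ATE: {true_ate:.2f}, difference in means: {difference_in_means:.2f}")
```

With random assignment the two numbers should be close, differing only by sampling noise.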
The Scientific Method and Causal Inference
Why might the mean outcome for the treated differ from the mean outcome for the untreated?
Effect of the treatment on the treated (ETT): the average treatment effect among the subjects who receive the treatment; it is nonzero when the treated group responds to the treatment
If the ETT is nonzero, then even if both groups would have the same mean outcome when not given the treatment, a difference emerges once the treated group receives the treatment
Selection bias: the mean outcome for the treated group would differ from the mean outcome for the untreated group even in the case where neither group receives the treatment
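These two reasons correspond to a standard decomposition of the difference in expected outcomes between the groups. The derivation below is a sketch in LaTeX: it adds and subtracts the untreated outcome of the treated group, a quantity that is never observed.

```latex
\begin{aligned}
&E[\text{Outcome}_i \mid \text{Treated}_i = 1] - E[\text{Outcome}_i \mid \text{Treated}_i = 0] \\
&\quad = E[\text{Outcome}_i^{T} \mid \text{Treated}_i = 1]
       - E[\text{Outcome}_i^{NT} \mid \text{Treated}_i = 0] \\
&\quad = \underbrace{E[\text{Outcome}_i^{T} \mid \text{Treated}_i = 1]
       - E[\text{Outcome}_i^{NT} \mid \text{Treated}_i = 1]}_{\text{ETT}} \\
&\qquad + \underbrace{E[\text{Outcome}_i^{NT} \mid \text{Treated}_i = 1]
       - E[\text{Outcome}_i^{NT} \mid \text{Treated}_i = 0]}_{\text{Selection Bias}}
\end{aligned}
```

When the difference in means is to estimate the ATE, the goal is therefore ETT = ATE and Selection Bias = 0.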
Data Analysis Using the Scientific Method
Hypothesis Testing for the Treatment Effect
For a given experiment with N participants and a single, binary treatment, assume that:
The set of participants is a random sample from the population
The sample size N is large, so that there are at least 30 participants in each of the treated and untreated groups
Assignment of the treatment is random
The average treatment effect is zero (ATE = 0); this is the null hypothesis
Data Analysis Using the Scientific Method
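The difference in the average outcome for the treated and untreated groups is distributed as:
$(\overline{\text{Outcome}} \mid \text{Treated} = 1) - (\overline{\text{Outcome}} \mid \text{Treated} = 0) \sim N\left(0,\ \frac{\sigma_1^2}{N_1} + \frac{\sigma_0^2}{N_0}\right)$
Here the second argument is the variance, $\sigma_1^2$ and $\sigma_0^2$ are the outcome variances for the treated and untreated groups, and $N_1$ and $N_0$ are the group sizes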
This difference will fall within 1.65 (1.96, 2.58) standard deviations of 0 approximately 90% (95%, 99%) of the time
Data Analysis Using the Scientific Method
Using t-stats: If the absolute value of the t-stat is greater than 1.65 (1.96, 2.58), reject the deduced distribution for the difference in sample means. Otherwise, fail to reject. The objective degree of support for this inductive argument is 90% (95%, 99%)
Using p-values: If the p-value of the t-stat is less than 0.10 (0.05, 0.01), reject the deduced distribution for the difference in sample means. Otherwise, fail to reject. The objective degree of support for this inductive argument is 90% (95%, 99%)
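The two decision rules above can be sketched in Python. This sketch assumes the t-stat is the difference in sample means divided by its estimated standard deviation, sqrt(S1²/N1 + S0²/N0), and that the p-value comes from the large-sample normal approximation. The function names and the input numbers are hypothetical.

```python
import math


def t_stat(mean1, mean0, s1, s0, n1, n0):
    """Difference in sample means divided by its estimated standard deviation."""
    std_dev_of_difference = math.sqrt(s1**2 / n1 + s0**2 / n0)
    return (mean1 - mean0) / std_dev_of_difference


def two_sided_p_value(t):
    """P(|Z| > |t|) for a standard normal Z, using the complementary error function."""
    return math.erfc(abs(t) / math.sqrt(2))


# Hypothetical summary statistics, for illustration only.
t = t_stat(mean1=105.0, mean0=100.0, s1=15.0, s0=15.0, n1=50, n0=50)
p = two_sided_p_value(t)

for cutoff, level in [(1.65, 0.10), (1.96, 0.05), (2.58, 0.01)]:
    decision = "reject" if abs(t) > cutoff else "fail to reject"
    print(f"cutoff {cutoff}: {decision} (p-value below {level}: {p < level})")
```

For these inputs the t-stat is 5/3, so the rule rejects at the 1.65 cutoff but not at 1.96 or 2.58, and the p-value rule gives the same three decisions.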
Data Analysis Using the Scientific Method
Transposition: If inductive reasoning leads to a rejection of the distribution for the difference in sample means, reject at least one of the assumptions leading to that distribution. If the sample is large, and there is confidence in the random sample and random treatment assignment, reject the null hypothesis (ATE = 0)
P-Value for T-Stat of 3.466
95% Confidence Interval When ATE = 0
Confidence Interval for the Treatment Effect
Deductive reasoning:
IF…
The set of participants is a random sample from the population
The sample size N is large, so that there are at least 30 participants in each of the treated and untreated groups
Assignment of the treatment is random
Then…
The interval consisting of the difference between the average outcome for the treated and untreated, plus or minus 1.65 (1.96, 2.58) standard deviations for this difference, will contain the average treatment effect approximately 90% (95%, 99%) of the time
Confidence Interval for the Treatment Effect
Inductive reasoning:
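We observe the difference between the average outcome for the treated and untreated, $(\overline{\text{Outcome}} \mid \text{Treated} = 1) - (\overline{\text{Outcome}} \mid \text{Treated} = 0)$, the sample standard deviations for the treated ($S_1$) and untreated ($S_0$), and the number of subjects receiving the treatment ($N_1$) and not receiving the treatment ($N_0$). We conclude the ATE is contained in the interval
$(\overline{\text{Outcome}} \mid \text{Treated} = 1) - (\overline{\text{Outcome}} \mid \text{Treated} = 0) \pm 1.65 \sqrt{\frac{S_1^2}{N_1} + \frac{S_0^2}{N_0}}$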
Confidence Interval for the Treatment Effect
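The objective degree of support for this inductive argument is 90%. If we use the intervals
$(\overline{\text{Outcome}} \mid \text{Treated} = 1) - (\overline{\text{Outcome}} \mid \text{Treated} = 0) \pm 1.96 \sqrt{\frac{S_1^2}{N_1} + \frac{S_0^2}{N_0}}$
$(\overline{\text{Outcome}} \mid \text{Treated} = 1) - (\overline{\text{Outcome}} \mid \text{Treated} = 0) \pm 2.58 \sqrt{\frac{S_1^2}{N_1} + \frac{S_0^2}{N_0}}$
The objective degree of support becomes 95% and 99%, respectively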
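The interval construction can be sketched in a few lines of Python. The function name `ate_confidence_interval` and the summary statistics are hypothetical; the multipliers 1.65, 1.96, and 2.58 are the ones used in this chapter, and the standard deviation of the difference is assumed to be sqrt(S1²/N1 + S0²/N0).

```python
import math

# Multipliers used in this chapter for 90%, 95%, and 99% intervals.
MULTIPLIERS = {90: 1.65, 95: 1.96, 99: 2.58}


def ate_confidence_interval(mean1, mean0, s1, s0, n1, n0, level=95):
    """Return (lower, upper) bounds of the interval for the ATE."""
    difference = mean1 - mean0
    std_dev_of_difference = math.sqrt(s1**2 / n1 + s0**2 / n0)
    margin = MULTIPLIERS[level] * std_dev_of_difference
    return difference - margin, difference + margin


# Hypothetical summary statistics, for illustration only.
for level in (90, 95, 99):
    low, high = ate_confidence_interval(105.0, 100.0, 15.0, 15.0, 50, 50, level)
    print(f"{level}% interval: ({low:.2f}, {high:.2f})")
```

With these inputs the difference is 5 and its standard deviation is 3, so the 90% interval narrowly excludes zero while the 95% and 99% intervals include it.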
Experimental Data vs Nonexperimental Data
Experimental data are well suited to measuring the causal effects of treatments
Most data that are available to businesses are nonexperimental
Nonexperimental data are data that were not produced during an experiment
With nonexperimental data, the analyst can no longer control how the treatment is administered
Treatment is very seldom randomly assigned, which can interfere with estimating the treatment effect
Examples of Nonexperimental Business Treatments and Outcomes
IF THESE WERE EXPERIMENTAL DATA TO BE USED TO MEASURE A TREATMENT EFFECT, THE PRICE WOULD HAVE VARIED RANDOMLY ACROSS THE REGIONS AND TIME.
Panel Data on Price and Sales
Experimental Data vs Nonexperimental Data
Consequences of Using Nonexperimental Data to Estimate Treatment Effects
High likelihood that the treatment is not randomly assigned
If treatment assignment is nonrandom, then we risk the possibility that ETT ≠ ATE, Selection Bias ≠ 0, or both
Comparing the means between the treated and the untreated groups is no longer a proper estimator for the ATE
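To see how nonrandom assignment can distort the comparison of means, consider the following simulation sketch. The population, the constant treatment effect of 5, and the self-selection rule (subjects with above-average untreated outcomes take the treatment) are all hypothetical assumptions made for illustration.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is reproducible

# Hypothetical population with a constant treatment effect, so ATE = ETT = 5.
N = 10_000
TRUE_EFFECT = 5.0
untreated = [random.gauss(50, 10) for _ in range(N)]
treated = [u + TRUE_EFFECT for u in untreated]

# Nonrandom assignment: subjects who would do well anyway select into treatment.
assigned = [u > 50 for u in untreated]

mean_treated_group = statistics.fmean(
    [t for t, a in zip(treated, assigned) if a]
)
mean_untreated_group = statistics.fmean(
    [u for u, a in zip(untreated, assigned) if not a]
)
difference_in_means = mean_treated_group - mean_untreated_group

# Selection bias: gap in untreated outcomes between the two groups.
# It is computable here only because the simulation knows both potential outcomes.
selection_bias = statistics.fmean(
    [u for u, a in zip(untreated, assigned) if a]
) - mean_untreated_group

print(f"difference in means: {difference_in_means:.2f}")
print(f"true effect: {TRUE_EFFECT:.2f}, selection bias: {selection_bias:.2f}")
```

In this setup the difference in means equals the true effect plus the selection bias, so it overstates the treatment effect by a wide margin.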