# Chapter9.PredictionforaDichotomousVariable.pptx

Prediction for a Dichotomous Variable

Chapter 9

© 2019 McGraw-Hill Education. All rights reserved. Authorized only for instructor use in the classroom. No reproduction or distribution without the prior written consent of McGraw-Hill Education

Learning Objectives

Identify a limited dependent variable and its applications

Describe the linear probability model

Identify merits and shortcomings of the linear probability model

Model probit and logit models as determined by the realization of a latent variable

Calculate marginal effects for logit and probit models

Execute estimation of a probit and logit model via maximum likelihood

Identify the merits and shortcomings of the probit and logit models in practice

Limited Dependent Variable

Limited dependent variable

A dependent variable whose range of possible values has consequential constraints

Some constraints include upper and/or lower bounds or the ability to take on only discrete values

In this table, the random variable Spentit represents the amount of money spent buying products online by household i in week t.

Products essentially always have non-negative prices, so this random variable is constrained to be at least zero.

In this table, we see several observations with Spentit values exactly equal to the constraint of zero.

Limited Dependent Variables

Dichotomous (or binary) dependent variable

A limited dependent variable that can take on just two values, typically recorded as 0 and 1

Can measure many different types of outcomes: purchase/don’t purchase, project success/project failure, employed/unemployed, approve/disapprove

Limited Dependent Variables

A linear probability model is regression analysis applied to a dichotomous dependent variable

Widely used model

The act of fitting the equation Purchase = α + βSubFee to the data by solving the moment condition is an application of a linear probability model.
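As a minimal sketch of what "solving the moment condition" can look like, the block below fits Purchase = α + βSubFee by solving the two sample moment conditions (residuals sum to zero, and residuals are uncorrelated with SubFee in the sample). The data are hypothetical and are not the SaferContent data; the names `sub_fee`, `purchase`, `alpha_hat`, and `beta_hat` are illustrative assumptions.

```python
# Hypothetical data (not the SaferContent data): subscription fee and purchase decision.
sub_fee = [10, 12, 15, 18, 20, 22, 25, 28, 30, 35]
purchase = [1, 1, 1, 0, 1, 1, 0, 0, 1, 0]

n = len(sub_fee)
mean_x = sum(sub_fee) / n
mean_y = sum(purchase) / n

# Solving the two sample moment conditions
#   sum(residual) = 0  and  sum(residual * SubFee) = 0
# gives the familiar least-squares formulas for the slope and intercept.
beta_hat = sum((x - mean_x) * (y - mean_y) for x, y in zip(sub_fee, purchase)) / sum(
    (x - mean_x) ** 2 for x in sub_fee
)
alpha_hat = mean_y - beta_hat * mean_x

# Residuals from the fitted line; both moment conditions should hold (up to rounding).
residuals = [y - (alpha_hat + beta_hat * x) for x, y in zip(sub_fee, purchase)]
moment_1 = sum(residuals)
moment_2 = sum(r * x for r, x in zip(residuals, sub_fee))

print(f"alpha_hat = {alpha_hat:.4f}, beta_hat = {beta_hat:.4f}")
print(f"sample moments: {moment_1:.2e}, {moment_2:.2e}")
```

Because the dependent variable is 0 or 1, the fitted slope is read as a change in the probability of purchase per one-unit increase in the fee.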

The Linear Probability Model

Data on Subscription Fees and Purchase Decisions for SaferContent

The following table shows regression estimates that fit the function Purchase = α + βSubFee to the data:

Based on the estimates in the above table, the determining function would be: Purchasei = 1.65 – 0.05 × SubFeei

The Linear Probability Model

If we assume the data-generating process for Y to be:

Yi = α + β1X1i + … + βKXKi + Ui

Y is a dichotomous dependent variable

THEN:

β1, …, βK represent the change in the probability of Y equaling one with a one-unit increase in X1, …, XK (respectively), holding all other Xs constant. For example, we can express β1 as: β1 = Pr(Y = 1|X1 + 1, X2, …, XK) – Pr(Y = 1|X1, X2, …, XK)
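One way to see why the coefficients are changes in probability is sketched below. It relies on an assumption not stated on the slide, namely that the unobservables have conditional mean zero, E[U_i | X_{1i}, …, X_{Ki}] = 0. For a variable that takes only the values 0 and 1, the expected value equals the probability of a 1, so the determining function is itself a probability.

```latex
\begin{aligned}
E[Y_i \mid X_{1i}, \dots, X_{Ki}]
  &= 1 \cdot \Pr(Y_i = 1 \mid X_{1i}, \dots, X_{Ki}) + 0 \cdot \Pr(Y_i = 0 \mid X_{1i}, \dots, X_{Ki}) \\
  &= \Pr(Y_i = 1 \mid X_{1i}, \dots, X_{Ki}) \\[4pt]
E[Y_i \mid X_{1i}, \dots, X_{Ki}]
  &= \alpha + \beta_1 X_{1i} + \dots + \beta_K X_{Ki}
  \quad \text{(using } E[U_i \mid X_{1i}, \dots, X_{Ki}] = 0\text{)} \\[4pt]
\Pr(Y = 1 \mid X_1 + 1, X_2, \dots, X_K) - \Pr(Y = 1 \mid X_1, X_2, \dots, X_K)
  &= \beta_1 (X_1 + 1) - \beta_1 X_1 = \beta_1
\end{aligned}
```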

The Linear Probability Model

Merits

Imposes no restrictions on the associated regression analysis, so all methods discussed earlier (use of dummy variables, selecting controls, instrumental variables, panel data methods) seamlessly apply.

Shortcomings

It ignores the limitation of the dependent variable

It places no restrictions on the range of predicted values of the outcome, so predicted probabilities can fall below 0 or above 1

Merits and Shortcomings of the Linear Probability Model

Limit-violating prediction is a predicted value for a limited dependent variable that does not fall within that variable’s limits

For many applications, limit-violating predictions may not be a problem in practice

One could engineer the Xs in such a way as to preclude predictions of Y outside the 0-1 range
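To make limit-violating predictions concrete, the sketch below evaluates the fitted line reported earlier in the deck, Purchase = 1.65 – 0.05 × SubFee, at a few subscription fees. The fee values in `example_fees` are hypothetical and chosen only for illustration, as are the helper names `lpm_prediction` and `violates_limits`.

```python
# Fitted linear probability model reported earlier in the deck:
#   Purchase_i = 1.65 - 0.05 * SubFee_i
ALPHA_HAT = 1.65
BETA_HAT = -0.05


def lpm_prediction(sub_fee):
    """Predicted probability of purchase from the fitted line."""
    return ALPHA_HAT + BETA_HAT * sub_fee


def violates_limits(prediction):
    """True when a predicted probability falls outside [0, 1]."""
    return prediction < 0.0 or prediction > 1.0


# Hypothetical subscription fees, chosen only for illustration.
example_fees = [5, 15, 25, 35]

for fee in example_fees:
    p = lpm_prediction(fee)
    flag = "limit-violating" if violates_limits(p) else "within limits"
    print(f"SubFee = {fee:>2}: predicted Pr(Purchase = 1) = {p:.2f} ({flag})")
```

With these estimates, the predicted probability stays between 0 and 1 only for fees from about 13 to 33, since 1.65 – 0.05 × 13 = 1 and 1.65 – 0.05 × 33 = 0.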

Merits and Shortcomings of the Linear Probability Model

To overcome the limitations of the linear probability model, probit and logit models are used

The choice between a linear probability model and the alternative models is not obvious

There is no universally “right” model

Probit and Logit Models

The key difference between the linear probability model and the alternative models (logit and probit models) is the connection between the determining function, the unobservables, and the dependent variable.

Purchase = α + βSubFee

Two shortcomings of the linear probability model:

It is hard to believe that the determining function and unobservables always add up to exactly 0 or 1

Predictions about the effect of subscription fee (SubFee) on purchases may be unrealistic

Probit and Logit Models

Rather than setting the dependent variable equal to the sum of the determining function and the unobservables, let the value of the dependent variable depend on this sum but in a coarse way

The sum of the determining function and the unobservables equals a latent variable

A latent variable is a variable that cannot be observed, but information about it can be inferred from other observed variables

Probit and Logit Models

We define the sum of the determining function (α + βSubFeei) and the unobservables (Ui) to be a latent variable

If we call the latent variable Utility, then:

Utilityi = α + βSubFeei + Ui

We assume a purchase occurs if utility is positive (> 0) and a purchase does not occur if utility is not positive (≤ 0)

We can express the purchase decision as:

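$$\text{Purchase}_i = \begin{cases} 1 & \text{if } \text{Utility}_i > 0 \\ 0 & \text{if } \text{Utility}_i \le 0 \end{cases}$$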

Probit and Logit Models

Examples of Dichotomous Dependent Variables Coupled with Latent Variables

Define the latent variable, Y*, as the sum of the determining function and the unobservables:

Y*i = α + β1X1i + … + βKXKi + Ui

Then define the dependent variable, Yi, to be 1 if the latent variable exceeds 0, and 0 otherwise:

$$Y_i = \begin{cases} 1 & \text{if } Y_i^* > 0 \\ 0 & \text{otherwise} \end{cases}$$

Notice we do not need the determining function and unobservables to add up exactly to 0 or 1
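The latent variable setup can be sketched as a small simulation: draw the unobservable, form the latent variable Y*, and record only whether it exceeds 0. The parameter values `ALPHA` and `BETA`, the value of X, and the choice of a standard normal distribution for U are all assumptions made for illustration; the slides have not yet fixed a distribution at this point.

```python
import random

# Hypothetical parameter values, chosen only for illustration.
ALPHA = 0.5
BETA = -0.1


def simulate_outcome(x, rng):
    """Draw U, form the latent variable Y*, and return (Y*, Y)."""
    u = rng.gauss(0.0, 1.0)          # unobservable, assumed standard normal here
    y_star = ALPHA + BETA * x + u    # latent variable: never observed in practice
    y = 1 if y_star > 0 else 0       # observed dichotomous outcome
    return y_star, y


rng = random.Random(0)  # fixed seed so the run is repeatable
x_value = 10.0
draws = [simulate_outcome(x_value, rng) for _ in range(20000)]

share_of_ones = sum(y for _, y in draws) / len(draws)
print(f"Share of simulated outcomes with Y = 1 at X = {x_value}: {share_of_ones:.3f}")
```

The latent values themselves take any real value; only the coarse 0/1 outcome is kept, which is why the determining function and unobservables never need to add up to exactly 0 or 1.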

Probit and Logit Models

The latent variable formulation for Y also prevents unreasonable predictions about the probability of Y equaling 1

Pr(Yi = 1|X1i, …, XKi) = Pr(Y*i > 0|X1i , …, XKi)

This equation states that the probability the outcome (Y) equals 1, given the values for the Xs, is equal to the probability that the latent variable (Y*) is greater than 0, given the values for the Xs

Probit and Logit Models

Probit and Logit Models

Pr(Yi = 1|X1i, …, XKi) = Pr(α + β1X1i + … + βKXKi + Ui > 0|X1i , …, XKi)

Uncertainty about Y is due to uncertainty about U.

Pr(Yi = 1|X1i, …, XKi) = Pr(Ui > ‒ α ‒ β1X1i ‒ … ‒ βKXKi|X1i , …, XKi)

While the determining function is unconstrained, the probability that Y equals 1 is explicitly defined to be a probability in terms of U, and so is constrained to be between 0 and 1

Probit model is a latent variable formulation for a dichotomous dependent variable that assumes a standard normal distribution for the unobservables

The probability that Y equals 1 for given values of the Xs is given by the formula:

Pr(Yi = 1|X1i, …, XKi) = Pr(Ui > ‒ α ‒ β1X1i ‒ … ‒ βKXKi|X1i , …, XKi)

Probit and Logit Models

Probit and Logit Models

Probability Y Equals 1 for Given Xs, Assuming Standard Normal Distribution for U

In the graph, ϕ(U) is the probability density function (pdf) for the standard normal distribution.

Probit and Logit Models

Since we know Ui is a standard normal random variable, we can simplify the expression for Pr(Yi = 1|X1i, …, XKi)

Normal distribution is symmetric around its mean (which is 0 for U)

Pr(Ui > ‒ α ‒ β1X1i ‒ … ‒ βKXKi|X1i , …, XKi) =

Pr(Ui < α + β1X1i + … + βKXKi|X1i , …, XKi)

Define ɸ(.) as the cumulative distribution function (cdf) for a standard normal random variable U, where ɸ(m) = Pr(U < m)

Pr(Yi = 1|X1i, …, XKi) = ɸ(α + β1X1i + … + βKXKi)
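A minimal sketch of this probit probability, using only the Python standard library: the standard normal cdf is computed from the error function, and the probability that Y equals 1 is the cdf evaluated at the determining function. The parameter values and X values are hypothetical, and the function names are illustrative.

```python
import math


def standard_normal_cdf(m):
    """Pr(U < m) for a standard normal U, computed from the error function."""
    return 0.5 * (1.0 + math.erf(m / math.sqrt(2.0)))


def probit_probability(alpha, betas, xs):
    """Pr(Y = 1 | Xs) under the probit model: cdf of the determining function."""
    index = alpha + sum(b * x for b, x in zip(betas, xs))
    return standard_normal_cdf(index)


# Hypothetical parameter values and regressor values, for illustration only.
alpha, betas = 0.5, [-0.1]
for x in (0.0, 5.0, 50.0):
    print(f"X = {x:>4}: Pr(Y = 1) = {probit_probability(alpha, betas, [x]):.4f}")
```

However large or small the determining function becomes, the result stays between 0 and 1, which is the property the linear probability model lacks.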

Probit and Logit Models

Probability Y Equals 1 for Given Xs, Assuming Standard Normal Distribution for U and Using cdf for U

Logit model is a latent variable formulation for a dichotomous dependent variable that assumes a Logistic(0,1) distribution for the unobservables

The logistic distribution generates a simple formula for the probability of Y equaling 1 for a given set of Xs

When we assume that Ui~Logistic(0,1), the probability that Y equals 1 for given values of the Xs can be expressed as:

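$$\Pr(Y_i = 1 \mid X_{1i}, \dots, X_{Ki}) = \frac{e^{\alpha + \beta_1 X_{1i} + \dots + \beta_K X_{Ki}}}{1 + e^{\alpha + \beta_1 X_{1i} + \dots + \beta_K X_{Ki}}}$$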
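The logit probability is the Logistic(0,1) cdf, e^m / (1 + e^m), evaluated at the determining function. The sketch below codes it with the standard library only; the function names and example parameter values are hypothetical.

```python
import math


def logistic_cdf(m):
    """Pr(U < m) for U ~ Logistic(0, 1): e^m / (1 + e^m)."""
    # Written in a numerically stable form that avoids overflow for large |m|.
    if m >= 0:
        return 1.0 / (1.0 + math.exp(-m))
    exp_m = math.exp(m)
    return exp_m / (1.0 + exp_m)


def logit_probability(alpha, betas, xs):
    """Pr(Y = 1 | Xs) under the logit model."""
    index = alpha + sum(b * x for b, x in zip(betas, xs))
    return logistic_cdf(index)


# Hypothetical parameter values and regressor values, for illustration only.
alpha, betas = 0.5, [-0.1]
for x in (0.0, 5.0, 50.0):
    print(f"X = {x:>4}: Pr(Y = 1) = {logit_probability(alpha, betas, [x]):.4f}")
```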

Probit and Logit Models

Marginal effect is the rate of change in the probability of a dichotomous dependent variable equaling 1 with a one-unit increase in an independent variable (holding all other independent variables constant)

For the linear probability model, the βs in the determining function measure marginal effects

Marginal Effects

Marginal Effects

Consider the following general latent variable model:

Y*i = α + β1X1i + … + βKXKi + Ui

The marginal effect of Xj is:

MargEffxj = Pr(Yi = 1|X1i, …, Xji + 1, …, XKi) ‒ Pr(Yi = 1|X1i, …, Xji, …, XKi)

Marginal Effects

For Probit:

MargEffxj = ɸ(α + β1X1i + … + βj(Xji + 1) + … + βKXKi) ‒ ɸ(α + β1X1i + … + βjXji + … + βKXKi)

For Logit:

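$$\text{MargEff}_{X_j} = \frac{e^{\alpha + \beta_1 X_{1i} + \dots + \beta_j (X_{ji} + 1) + \dots + \beta_K X_{Ki}}}{1 + e^{\alpha + \beta_1 X_{1i} + \dots + \beta_j (X_{ji} + 1) + \dots + \beta_K X_{Ki}}} - \frac{e^{\alpha + \beta_1 X_{1i} + \dots + \beta_j X_{ji} + \dots + \beta_K X_{Ki}}}{1 + e^{\alpha + \beta_1 X_{1i} + \dots + \beta_j X_{ji} + \dots + \beta_K X_{Ki}}}$$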
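Both marginal effects are the same calculation with a different cdf: evaluate the probability with Xj raised by one unit, then subtract the probability at the original Xj. The sketch below does this for a model with two Xs; the parameter values, X values, and function names are hypothetical and serve only as an illustration.

```python
import math


def probit_cdf(m):
    """Standard normal cdf."""
    return 0.5 * (1.0 + math.erf(m / math.sqrt(2.0)))


def logit_cdf(m):
    """Logistic(0, 1) cdf."""
    return 1.0 / (1.0 + math.exp(-m))


def marginal_effect(cdf, alpha, betas, xs, j):
    """Change in Pr(Y = 1) when X_j rises by one unit, other Xs held fixed."""
    index_before = alpha + sum(b * x for b, x in zip(betas, xs))
    index_after = index_before + betas[j]  # X_j + 1 adds beta_j to the index
    return cdf(index_after) - cdf(index_before)


# Hypothetical parameters and regressor values (two Xs), for illustration only.
alpha, betas, xs = 0.2, [0.5, -0.3], [1.0, 2.0]
probit_effect = marginal_effect(probit_cdf, alpha, betas, xs, j=0)
logit_effect = marginal_effect(logit_cdf, alpha, betas, xs, j=0)
print(f"probit marginal effect of X1: {probit_effect:.4f}")
print(f"logit marginal effect of X1:  {logit_effect:.4f}")
```

The same parameter values give different marginal effects under the two models because the two cdfs have different shapes; in practice each model would be estimated separately and have its own parameter estimates.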

Marginal Effects

Probit and logit marginal effects generally depend on the magnitude of the change in the independent variable

Probit and logit marginal effects generally differ depending on the level of X from which a change is being considered

Because the marginal effects we measure depend on the starting point of X, there is not an obvious, single number as the marginal effect of X

In practice, it is common to attempt to summarize the marginal effect of X for a probit or logit model using a single number
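The points above can be illustrated with a one-variable probit model. The sketch below shows the effect of a one-unit increase at several starting levels of X, compares a two-unit change with twice the one-unit effect, and then averages the effects over a sample. Averaging over the sample is only one possible single-number summary and is an assumption here, since the slides do not say which summary is used; the parameter values and the sample of X values are hypothetical.

```python
import math


def probit_cdf(m):
    """Standard normal cdf."""
    return 0.5 * (1.0 + math.erf(m / math.sqrt(2.0)))


def effect_of_change(alpha, beta, x_start, change=1.0):
    """Change in Pr(Y = 1) when X moves from x_start to x_start + change."""
    return probit_cdf(alpha + beta * (x_start + change)) - probit_cdf(alpha + beta * x_start)


# Hypothetical probit parameters and sample of X values, for illustration only.
alpha, beta = 1.0, -0.1
sample_x = [2.0, 6.0, 10.0, 14.0, 18.0]

effects = [effect_of_change(alpha, beta, x) for x in sample_x]
for x, me in zip(sample_x, effects):
    print(f"starting X = {x:>4}: effect of a one-unit increase = {me:.4f}")

# The effect of a two-unit change is not simply twice the one-unit effect.
two_unit = effect_of_change(alpha, beta, 2.0, change=2.0)
twice_one_unit = 2 * effect_of_change(alpha, beta, 2.0)
print(f"two-unit change from X = 2: {two_unit:.4f}; twice the one-unit effect: {twice_one_unit:.4f}")

# One possible single-number summary: the average effect across the sample.
average_effect = sum(effects) / len(effects)
print(f"average over the sample = {average_effect:.4f}")
```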

We have taken the parameters (e.g., α, β) as given; in practice, we get estimates for these parameters using the data

For the linear probability model, solve for the parameters using the sample moment equations

Maximum likelihood estimation (MLE) is an approach in which population-level parameters are estimated using the values that make the observed outcomes as likely as possible for a given model

Estimation and Interpretation

Maximum Likelihood Estimation

Consider the following general latent variable model:

Y*i = α + β1X1i + … + βKXKi + Ui

Let Yi be 1 if the latent variable exceeds 0, and 0 otherwise

Assuming a probit model:

Pr(Yi = 1|X1i, …, XKi) = ɸ(α + β1X1i + … + βKXKi)

To get estimates for our parameters, collect a sample of Ys and Xs of size N

Maximum Likelihood Estimation

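For an observation with Yi = 1, the probability of that observation is ɸ(α + β1X1i + … + βKXKi). For an observation with Yi = 0, the probability of this observation is: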

1 ‒ ɸ(α + β1X1i + … + βKXKi)
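Putting the two cases together gives the function that maximum likelihood estimation maximizes. The sketch below assumes the N observations are independent, which the slides do not state explicitly, and introduces z_i as shorthand for the determining function. Here Φ denotes the standard normal cdf, written ɸ in the slide text.

```latex
\begin{aligned}
z_i &= \alpha + \beta_1 X_{1i} + \dots + \beta_K X_{Ki} \\[4pt]
L(\alpha, \beta_1, \dots, \beta_K)
  &= \prod_{i=1}^{N} \Phi(z_i)^{Y_i} \, \bigl[1 - \Phi(z_i)\bigr]^{1 - Y_i} \\[4pt]
\ln L(\alpha, \beta_1, \dots, \beta_K)
  &= \sum_{i=1}^{N} \Bigl\{ Y_i \ln \Phi(z_i) + (1 - Y_i) \ln \bigl[1 - \Phi(z_i)\bigr] \Bigr\}
\end{aligned}
```

The maximum likelihood estimates are the values of α, β1, …, βK that make L (equivalently, ln L) as large as possible for the observed sample.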

Assuming the logit model:

Everything is the same as in the probit example, except the probability formulas
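As a rough sketch of maximum likelihood estimation for the logit case, the block below builds the log-likelihood for a one-variable logit model and maximizes it with Newton's method. Newton's method is a choice made here for illustration, not a method named in the slides, and the data are hypothetical, so the output does not reproduce the SaferContent results. All names (`sub_fee`, `purchase`, `fit_logit`, and so on) are illustrative.

```python
import math

# Hypothetical data (not the SaferContent data): subscription fee and purchase decision.
sub_fee = [10, 12, 15, 18, 20, 22, 25, 28, 30, 35]
purchase = [1, 1, 1, 0, 1, 1, 0, 0, 1, 0]


def logit_prob(alpha, beta, x):
    """Pr(Y = 1 | x) under the logit model."""
    return 1.0 / (1.0 + math.exp(-(alpha + beta * x)))


def log_likelihood(alpha, beta):
    """Sum of log Pr(observed outcome) across observations."""
    total = 0.0
    for x, y in zip(sub_fee, purchase):
        p = logit_prob(alpha, beta, x)
        total += math.log(p) if y == 1 else math.log(1.0 - p)
    return total


def score(alpha, beta):
    """Gradient of the log-likelihood with respect to (alpha, beta)."""
    g_a = sum(y - logit_prob(alpha, beta, x) for x, y in zip(sub_fee, purchase))
    g_b = sum((y - logit_prob(alpha, beta, x)) * x for x, y in zip(sub_fee, purchase))
    return g_a, g_b


def fit_logit(max_iter=50, tol=1e-10):
    """Maximize the log-likelihood with Newton's method, starting from (0, 0)."""
    alpha, beta = 0.0, 0.0
    for _ in range(max_iter):
        g_a, g_b = score(alpha, beta)
        # Entries of the 2x2 matrix sum_i p_i (1 - p_i) [1, x_i; x_i, x_i^2].
        probs = [logit_prob(alpha, beta, x) for x in sub_fee]
        w = [p * (1.0 - p) for p in probs]
        h_aa = sum(w)
        h_ab = sum(wi * x for wi, x in zip(w, sub_fee))
        h_bb = sum(wi * x * x for wi, x in zip(w, sub_fee))
        det = h_aa * h_bb - h_ab * h_ab
        # Newton step: inverse of the 2x2 matrix times the gradient.
        step_a = (h_bb * g_a - h_ab * g_b) / det
        step_b = (h_aa * g_b - h_ab * g_a) / det
        alpha, beta = alpha + step_a, beta + step_b
        if abs(step_a) < tol and abs(step_b) < tol:
            break
    return alpha, beta


alpha_hat, beta_hat = fit_logit()
print(f"alpha_hat = {alpha_hat:.4f}, beta_hat = {beta_hat:.4f}")
print(f"log-likelihood at the estimates = {log_likelihood(alpha_hat, beta_hat):.4f}")
```

Swapping `logit_prob` for the standard normal cdf of the determining function would give the probit log-likelihood, though the gradient and Newton step written here are specific to the logit formula.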

Probit Results for SaferContent Data

Logit Results for SaferContent Data

Merits and Shortcomings

Merits of probit and logit models

Help overcome shortcomings of the linear probability model

The latent variable formulation places no restrictions per se on the relationship between the determining function and unobservables

Both models predict probabilities rather than the actual value (0 or 1) for the dependent variable

Merits and Shortcomings

Shortcomings of probit and logit models

The probabilities implied by the probit and logit models directly depend on the assumption of a normal or logistic distribution for the unobservables

Added complexity of calculating marginal effects, relative to the linear probability model

Use of instrumental variables and fixed effects is less straightforward than in the linear probability model
