Chapter 2: Multiple Regression Analysis
Consider the case where k = 2, i.e.
yˆ = bˆ0 + bˆ1x1 + bˆ2x2. Then
bˆ1 = Σi rˆi1 yi / Σi rˆi1²,
where the rˆi1 are the residuals from the estimated regression
xˆ1 = γˆ0 + γˆ2 x2
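As a numerical check, here is a minimal Python sketch (made-up data, not from the text) that computes bˆ1 by the residual formula above and compares it with the closed-form two-regressor OLS slope:

```python
def simple_ols(x, y):
    """Intercept and slope from regressing y on x (with constant)."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / \
         sum((xi - xbar) ** 2 for xi in x)
    return ybar - b1 * xbar, b1

# Made-up data: y depends on both x1 and x2; x1 and x2 are correlated.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2 = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0]
y  = [3.1, 3.9, 7.2, 7.8, 11.1, 11.9]
n = len(y)

# Step 1: regress x1 on x2 and keep the residuals r_i1.
g0, g2 = simple_ols(x2, x1)
r1 = [a - (g0 + g2 * b) for a, b in zip(x1, x2)]

# Step 2: the partialling-out formula for b1_hat.
b1_partial = sum(r * v for r, v in zip(r1, y)) / sum(r ** 2 for r in r1)

# Cross-check: b1_hat from the two-regressor OLS closed form.
def dev(v):
    m = sum(v) / n
    return [a - m for a in v]
d1, d2, dy = dev(x1), dev(x2), dev(y)
s11 = sum(a * a for a in d1); s22 = sum(a * a for a in d2)
s12 = sum(a * b for a, b in zip(d1, d2))
s1y = sum(a * b for a, b in zip(d1, dy)); s2y = sum(a * b for a, b in zip(d2, dy))
b1_direct = (s1y * s22 - s2y * s12) / (s11 * s22 - s12 ** 2)

print(abs(b1_partial - b1_direct) < 1e-10)  # True: the two routes agree
```

The two routes agree up to floating point, which is exactly the "partialling out" result.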
Σ(yi − ȳ)² is the total sum of squares (SST)
Σ(yˆi − ȳ)² is the explained sum of squares (SSE)
Σuˆi² is the residual sum of squares (SSR)
Then SST = SSE + SSR
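This identity can be checked numerically. A minimal Python sketch with made-up numbers, using a one-regressor OLS fit:

```python
# Made-up data; any OLS fit with an intercept satisfies SST = SSE + SSR.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 2.5, 4.1, 4.4, 6.0]
n = len(x)

xbar, ybar = sum(x) / n, sum(y) / n
b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / \
     sum((xi - xbar) ** 2 for xi in x)
b0 = ybar - b1 * xbar

yhat = [b0 + b1 * xi for xi in x]          # explained part
uhat = [yi - yh for yi, yh in zip(y, yhat)]  # unexplained part

SST = sum((yi - ybar) ** 2 for yi in y)
SSE = sum((yh - ybar) ** 2 for yh in yhat)
SSR = sum(u ** 2 for u in uhat)

print(abs(SST - (SSE + SSR)) < 1e-10)  # True: the decomposition holds
```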
Goodness-of-Fit (continued)
What determines whether a person commits a crime? (The dependent variable is the number of times the man was arrested during 1986, narr86.)
Chapter 2: Multiple Regression Analysis: Estimation
y = b0 + b1x1 + b2x2 + . . . + bkxk + u
1. Estimation
Parallels with Simple Regression
This means only the part of xi1 that is uncorrelated with xi2 is being related to yi, so we're estimating the effect of x1 on y after x2 has been "partialled out"
The population regression line is
E(y | x) = b0 + b1x1 + b2x2 + … + bkxk
Interpreting Multiple Regression
yˆ = bˆ0 + bˆ1x1 + bˆ2x2 + … + bˆkxk, so
Δyˆ = bˆ1Δx1 + bˆ2Δx2 + … + bˆkΔxk
How do we think about how well our sample regression line fits our sample data?
We can compute the fraction of the total sum of squares (SST) that is explained by the model; call this the R-squared of the regression
The Stata commands:
use [path]wage1.dta
(or: insheet using [path]wage1.raw / wage1.txt)
reg wage educ exper tenure
reg lwage educ exper tenure
A “Partialling Out” Interpretation
yˆ = bˆ0 + bˆ1x1 + bˆ2x2 + … + bˆkxk
The above estimated equation is called the OLS regression line or the sample regression function (SRF).
It is the estimated equation, not the real equation; the real equation is the population regression line, which we do not know.
Now, we first regress educ on exper and tenure to partial out the effects of exper and tenure. Then we regress wage on the residuals from that regression. Do we get the same result?
The estimated equations without tenure:
wage = −3.391 + 0.644 educ + 0.070 exper
log(wage) = 0.217 + 0.098 educ + 0.0103 exper
The estimated equations with educ alone:
wage = −0.905 + 0.541 educ
log(wage) = 0.584 + 0.083 educ
so holding x2, …, xk fixed implies that
Δyˆ = bˆ1Δx1, that is, each b has
a ceteris paribus interpretation
An Example (Wooldridge, p76)
The determination of wage (dollars per hour), wage:
educ = 13.575 − 0.0738 exper + 0.048 tenure
wage = 5.896 + 0.599 resid
log(wage) = 1.623 + 0.092 resid
We can see that the coefficient on resid is the same as the coefficient on educ in the first estimated equation, and likewise for the log(wage) equation.
wage = b0 + b1 educ + b2 exper + b3 tenure + u
log(wage) = b0 + b1 educ + b2 exper + b3 tenure + u
The estimated equations are below:
wage = −2.873 + 0.599 educ + 0.022 exper + 0.169 tenure
log(wage) = 0.284 + 0.092 educ + 0.0041 exper + 0.022 tenure
R2 = SSE/SST = 1 – SSR/SST
Goodness-of-Fit (continued)
We can also think of R2 as being equal to the squared correlation coefficient between the actual yi and the values yˆi
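A small Python sketch with made-up data, checking that 1 − SSR/SST equals the squared sample correlation between yi and yˆi:

```python
# Made-up data; fit a simple OLS line with an intercept.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1.2, 2.9, 2.8, 4.5, 5.1, 5.6]
n = len(x)

xbar, ybar = sum(x) / n, sum(y) / n
b1 = sum((a - xbar) * (b - ybar) for a, b in zip(x, y)) / \
     sum((a - xbar) ** 2 for a in x)
b0 = ybar - b1 * xbar
yhat = [b0 + b1 * a for a in x]

# R-squared from the sums of squares.
SST = sum((v - ybar) ** 2 for v in y)
SSR = sum((v - yh) ** 2 for v, yh in zip(y, yhat))
r2_sums = 1 - SSR / SST

# R-squared as the squared sample correlation of y and yhat.
yhbar = sum(yhat) / n
cov = sum((v - ybar) * (yh - yhbar) for v, yh in zip(y, yhat))
r2_corr = cov ** 2 / (sum((v - ybar) ** 2 for v in y) *
                      sum((yh - yhbar) ** 2 for yh in yhat))

print(abs(r2_sums - r2_corr) < 1e-10)  # True: the two definitions agree
```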
“Partialling Out” continued
The previous equation implies that regressing y on x1 and x2 gives the same effect of x1 as regressing y on the residuals from a regression of x1 on x2
We make the zero conditional mean assumption, so now assume that E(u | x1, x2, …, xk) = 0.
Still minimizing the sum of squared residuals, we now have k + 1 first order conditions
Obtaining OLS Estimates
Goodness-of-Fit
We can think of each observation as being made up of an explained part and an unexplained part, yi = yˆi + uˆi. We then define the following:
Simple vs Multiple Reg Estimate
Compare the simple regression y˜ = b˜0 + b˜1x1 with the multiple regression yˆ = bˆ0 + bˆ1x1 + bˆ2x2. Generally, b˜1 ≠ bˆ1 unless: bˆ2 = 0 (i.e. no partial effect of x2), OR x1 and x2 are uncorrelated in the sample.
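Behind this is the standard relation b˜1 = bˆ1 + bˆ2·δ˜1, where δ˜1 is the slope from regressing x2 on x1 (so b˜1 = bˆ1 exactly when bˆ2 = 0 or δ˜1 = 0). A Python sketch with made-up data confirms it:

```python
def simple_slope(x, y):
    """Slope from regressing y on x (with constant)."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    return sum((a - xbar) * (b - ybar) for a, b in zip(x, y)) / \
           sum((a - xbar) ** 2 for a in x)

# Made-up data with correlated regressors.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2 = [1.5, 1.0, 3.5, 3.0, 5.5, 5.0]
y  = [2.0, 2.2, 5.1, 5.0, 8.2, 7.9]
n = len(y)

b1_tilde = simple_slope(x1, y)   # simple regression of y on x1 alone

# Multiple-regression slopes via the two-regressor closed form.
def dev(v):
    m = sum(v) / n
    return [a - m for a in v]
d1, d2, dy = dev(x1), dev(x2), dev(y)
s11 = sum(a * a for a in d1); s22 = sum(a * a for a in d2)
s12 = sum(a * b for a, b in zip(d1, d2))
s1y = sum(a * b for a, b in zip(d1, dy)); s2y = sum(a * b for a, b in zip(d2, dy))
det = s11 * s22 - s12 ** 2
b1_hat = (s1y * s22 - s2y * s12) / det
b2_hat = (s2y * s11 - s1y * s12) / det

delta1 = simple_slope(x1, x2)    # slope from regressing x2 on x1

print(abs(b1_tilde - (b1_hat + b2_hat * delta1)) < 1e-10)  # True
```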
In the general case with k independent variables, we seek the estimates bˆ0, bˆ1, …, bˆk in the equation
yˆ = bˆ0 + bˆ1x1 + bˆ2x2 + … + bˆkxk
that minimize the sum of squared residuals
Σi=1..n (yi − bˆ0 − bˆ1xi1 − … − bˆkxik)².
From the first order conditions we get the k + 1 equations:
Σi=1..n (yi − bˆ0 − bˆ1xi1 − … − bˆkxik) = 0
Σi=1..n xi1(yi − bˆ0 − bˆ1xi1 − … − bˆkxik) = 0
Σi=1..n xi2(yi − bˆ0 − bˆ1xi1 − … − bˆkxik) = 0
⋮
Σi=1..n xik(yi − bˆ0 − bˆ1xi1 − … − bˆkxik) = 0
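These k + 1 conditions are the normal equations (X′X)b = X′y. As an illustrative sketch (made-up data, pure Python), they can be solved directly by Gaussian elimination:

```python
def solve(A, b):
    """Gaussian elimination with partial pivoting for A x = b."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]   # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

# Made-up data built exactly from y = 1 + 2*x1 + 3*x2,
# so OLS should recover those coefficients.
X = [[1.0, 1.0, 2.0],
     [1.0, 2.0, 1.0],
     [1.0, 3.0, 4.0],
     [1.0, 4.0, 3.0]]          # first column is the constant
y = [9.0, 8.0, 19.0, 18.0]

# Build the normal equations (X'X) b = X'y -- the k+1 FOCs above.
k1 = len(X[0])
XtX = [[sum(row[i] * row[j] for row in X) for j in range(k1)] for i in range(k1)]
Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k1)]

b = solve(XtX, Xty)
print([round(v, 6) for v in b])  # -> [1.0, 2.0, 3.0]
```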
Obtaining OLS Estimates, cont.
The wage determinations
The estimated equations are below:
wage = −2.873 + 0.599 educ + 0.022 exper + 0.169 tenure
log(wage) = 0.284 + 0.092 educ + 0.0041 exper + 0.022 tenure
These are not the real equations. The real equation is the
population regression line, which we don't know.
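To illustrate the distinction, one can simulate data from an assumed population line and see that the estimated (sample) line is close to, but not equal to, the population line. A Python sketch, with all numbers assumed for illustration:

```python
import random
random.seed(0)

# Assumed population regression line: E(y | x) = 1.0 + 0.5 x.
b0_true, b1_true = 1.0, 0.5
x = [i / 10 for i in range(100)]
y = [b0_true + b1_true * xi + random.gauss(0, 1) for xi in x]

# OLS estimates from this one sample.
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
b1 = sum((a - xbar) * (b - ybar) for a, b in zip(x, y)) / \
     sum((a - xbar) ** 2 for a in x)
b0 = ybar - b1 * xbar

# Close to the population slope, but not exactly equal to it.
print(abs(b1 - b1_true) < 0.5 and b1 != b1_true)  # True
```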