Econometrics Problem Set 1
11210690112 张婷
The code is simplified in the hard copy.

1 Compute the sample mean of the log of earnings (lnY) for both men and women. Compute 90% and 95% confidence intervals for the population mean of lnY. Formulate a statistical test of whether the means of lnY are different for men and women. Calculate the p-value of the test. Do you reject H0 at the 1% level of significance?

CODE:
library(foreign)
F = read.dta("C:/earningsdata_females.dta")
M = read.dta("C:/earningsdata_males.dta")

# Mean (column 1 holds lnY)
lnF = F[, 1]
lnM = M[, 1]
mean(lnF)
[1] 7.688559
mean(lnM)
[1] 7.94615

# One-sample t-tests
# 90%
t.test(F[1], conf.level = 0.90)
t.test(M[1], conf.level = 0.90)
# 95%
t.test(F[1], conf.level = 0.95)
t.test(M[1], conf.level = 0.95)

# Two-sample test of equal means
t.test(M[1], F[1], conf.level = 0.99)

Welch Two Sample t-test
data: M[1] and F[1]
t = 39.7296, df = 7272.124, p-value < 2.2e-16
alternative hypothesis: true difference in means is not equal to 0
99 percent confidence interval:
 0.2408862 0.2742963
sample estimates:
mean of x mean of y
 7.946150  7.688559

Final answer:
                          Male                    Female
Mean                      7.94615                 7.688559
90% confidence interval   [7.939378, 7.952921]    [7.680317, 7.696801]
95% confidence interval   [7.938081, 7.954219]    [7.678737, 7.698380]

To test whether the means of lnY differ for men and women, run a two-sample t-test.
H0: $E(\ln Y_m) = E(\ln Y_f)$, H1: $E(\ln Y_m) \neq E(\ln Y_f)$
The test statistic is
$$t = \frac{\overline{\ln Y_m} - \overline{\ln Y_f}}{\sqrt{\mathrm{var}(\ln Y_m)/N_m + \mathrm{var}(\ln Y_f)/N_f}} = 39.73$$
and its p-value is below 2.2e-16, which is smaller than 0.01.
We reject H0: the means of lnY for men and women are statistically different at the 1% significance level.

2 Now compute the sample mean for earnings (mean(Y)). Do you find that mean(Y) = exp(mean(lnY))?
Explain.

CODE:
lnF = F[, 1]
lnM = M[, 1]
lnY = c(lnF, lnM)   # lnY pools the log earnings of females and males
Y = exp(lnY)
mean(Y)
[1] 2730.177
mean(lnY)
[1] 7.854298
Y1 = mean(lnY)
exp(Y1)
[1] 2576.787

Final answer:
mean(Y) is not equal to exp(mean(lnY)): mean(Y) = 2730.177, while exp(mean(lnY)) = 2576.787.
mean(Y) is the arithmetic mean of Y and exp(mean(lnY)) is the geometric mean of Y. Because exp is convex, the geometric mean is never larger than the arithmetic mean (Jensen's inequality).

3 Use the data set for males and estimate the regression equation $\ln Y_i = \alpha + \beta_1 S_i + u_i$. Specify your assumptions about the error terms $u_i$: What assumptions are needed for the OLS estimator to be i) unbiased, ii) consistent and iii) asymptotically normally distributed?

CODE:
Male = M        # the male data under the name used from here on
s = Male[, 2]   # column 2 holds schooling
lm_M = lm(Male$ln_y_ ~ 1 + s)
summary(lm_M)

Call:
lm(formula = Male$ln_y_ ~ 1 + s)

Residuals:
     Min       1Q   Median       3Q      Max
-1.62232 -0.18988 -0.03525  0.15989  1.77426

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 7.703346   0.009361  822.88   <2e-16 ***
s           0.047029   0.001652   28.47   <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.2953 on 5857 degrees of freedom
Multiple R-squared: 0.1215, Adjusted R-squared: 0.1214
F-statistic: 810.4 on 1 and 5857 DF, p-value: < 2.2e-16

Final answer:
lnY = 7.703346 + 0.047029 S

Assumptions: the observations $(\ln Y_i, S_i)$ are a random sample and $S_i$ varies in the sample.
(1) Unbiased: the error term has zero mean conditional on schooling, $E(u_i \mid S_i) = 0$.
(2) Consistent: the error term has zero mean and is uncorrelated with $S_i$, i.e. $E(u_i) = 0$ and $\mathrm{Cov}(S_i, u_i) = 0$. This is weaker than (1).
(3) Asymptotically normal: the conditions in (2), plus finite fourth moments of $S_i$ and $u_i$, so that a central limit theorem applies.

4 According to your estimates, what is the conditional expectation $E(\ln Y_i \mid S)$ for men and women, respectively? Calculate the expected log earnings for a man and woman, respectively, with 12 year schooling. Does schooling account for a large fraction of the variance in earnings across individuals? Explain.

CODE:
Female = F        # the female data under the name used from here on
s = Female[, 2]
lm_F = lm(Female$ln_y_ ~ 1 + s)
summary(lm_F)

Call:
lm(formula = Female$ln_y_ ~ 1 + s)

Residuals:
     Min       1Q   Median       3Q      Max
-1.28203 -0.14014 -0.00748  0.12850  1.99665

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 7.449845   0.011325  657.81   <2e-16 ***
Female$s    0.043848   0.001897   23.11   <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.2646 on 3245 degrees of freedom
Multiple R-squared: 0.1413, Adjusted R-squared: 0.1411
F-statistic: 534 on 1 and 3245 DF, p-value: < 2.2e-16

Final answer:
From the output above, the estimated regression function for women is lnYf = 7.449845 + 0.043848 s. Together with Q3, the estimated conditional expectations are
E(lnYm | S) = 7.703346 + 0.047029 S (males)
E(lnYf | S) = 7.449845 + 0.043848 S (females)
If s = 12 years:
E(lnYm) = 7.703346 + 0.047029*(12-7) = 7.9385 (males, using the estimates from Q3)
E(lnYf) = 7.449845 + 0.043848*(12-7) = 7.6691 (females)
Schooling does not account for a large fraction of the variance. The coefficient on s is highly significant, but R-squared is only 0.1215 for men and 0.1413 for women, so schooling explains about 12% and 14% of the variance of log earnings. Most of the variation across individuals comes from other factors.

5 Give a 95% confidence interval for $\beta_1$. What is your estimated expected marginal return to schooling, $\partial E(\ln Y_i \mid S_i) / \partial S_i$? What is the estimated percentage increase in income of one additional year of schooling? Do these calculations both for men and women.

CODE:
alpha = 0.05

# Males
A = summary(lm_M)$coefficients
df = lm_M$df.residual
left = A[, 1] - A[, 2] * qt(1 - alpha/2, df)
right = A[, 1] + A[, 2] * qt(1 - alpha/2, df)
rowname = dimnames(A)[[1]]
colname = c("Estimates", "Left", "Right")
matrix(c(A[, 1], left, right), ncol = 3, dimnames = list(rowname, colname))
             Estimates       Left      Right
(Intercept) 7.70334617 7.68499439 7.72169794
Male$s      0.04702921 0.04379056 0.05026786

# Females
B = summary(lm_F)$coefficients
df = lm_F$df.residual
left = B[, 1] - B[, 2] * qt(1 - alpha/2, df)
right = B[, 1] + B[, 2] * qt(1 - alpha/2, df)
rowname = dimnames(B)[[1]]
colname = c("Estimates", "Left", "Right")
matrix(c(B[, 1], left, right), ncol = 3, dimnames = list(rowname, colname))
             Estimates       Left      Right
(Intercept) 7.44984480 7.42763963 7.47204996
Female$s    0.04384815 0.04012787 0.04756843

Final answer:
The 95% confidence interval for $\beta_1$ is [0.04379056, 0.05026786] for men and [0.04012787, 0.04756843] for women.
The estimated marginal return to schooling is $\partial E(\ln Y_i \mid S_i) / \partial S_i = \hat\beta_1$, that is 0.0470 for men and 0.0438 for women.
Because the dependent variable is in logs, one additional year of schooling raises income by approximately $100 \cdot \hat\beta_1$ percent: about 4.7% for men and 4.4% for women. The exact figures, $100 \cdot (e^{\hat\beta_1} - 1)$, are 4.82% and 4.48%.

6 We want to test whether the returns to schooling are different for men and women. Set up an appropriate test statistic and carry out a formal statistical test.
Discuss the results.

CODE:
New = rbind(data.frame(Male, dummy = 1), data.frame(Female, dummy = 0))
summary(lm(New[[1]] ~ New[[2]] + New[[2]] * New[[34]]))

Call:
lm(formula = New[[1]] ~ New[[2]] + New[[2]] * New[[34]])

Residuals:
     Min       1Q   Median       3Q      Max
-1.62232 -0.17157 -0.02197  0.14736  1.99665

Coefficients:
                   Estimate Std. Error t value Pr(>|t|)
(Intercept)        7.449845   0.012189 611.176   <2e-16 ***
New[[2]]           0.043848   0.002042  21.471   <2e-16 ***
New[[34]]          0.253501   0.015167  16.714   <2e-16 ***
New[[2]]:New[[34]] 0.003181   0.002590   1.228    0.219
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.2847 on 9102 degrees of freedom
Multiple R-squared: 0.2505, Adjusted R-squared: 0.2503
F-statistic: 1014 on 3 and 9102 DF, p-value: < 2.2e-16

Final answer:
First we pool the two data sets and add a dummy variable to get the following function:
$$\ln Y = \alpha + \beta_1 s + \beta_2\,\mathrm{dummy} + \beta_3\,\mathrm{dummy} \cdot s + u$$
Dummy = 1 gives the male data.
Dummy = 0 gives the female data.
So $\beta_1$ is the return to schooling for women and $\beta_1 + \beta_3$ is the return for men.
The null hypothesis is H0: $\beta_3 = 0$, against H1: $\beta_3 \neq 0$.
From R, the estimate of $\beta_3$ is 0.003181 with t = 1.228 and p-value 0.219, which is larger than any conventional significance level (for example 5%).
We cannot reject H0, so there is no statistically significant difference in the returns to schooling between men and women. The dummy itself (0.253501) is highly significant, so the difference between men and women is in the level of log earnings, not in the slope.

7 If the main interest parameter is $\beta_1$, i.e. the marginal returns to schooling, do you see any problems with running a univariate regression? Discuss possible sources of bias of your estimates.

Answer:
From the questions above, we know that a single factor cannot explain all the variation in earnings. In a univariate regression every other determinant of earnings ends up in the error term. If such an omitted factor (experience or ability, for example) is also correlated with schooling, the error term is correlated with s and the OLS estimate of $\beta_1$ is biased (omitted variable bias). To estimate the return to schooling we would therefore have to control for these other factors, which makes the analysis more complex.

8 Now try out for yourself some of the other possible explanatory variables in your data set. Choose one of the data sets. Compare different univariate regressions and compare their R squared. Comment on the results. What does R squared say?
Which of the regressors you try appear to explain most of the variation in the earnings data?

CODE:
For e (experience)

Male
lm_male2 = lm(Male$ln_y_ ~ 1 + Male$e)
summary(lm_male2)
we get
Residual standard error: 0.3148 on 5857 degrees of freedom
Multiple R-squared: 0.002108, Adjusted R-squared: 0.001937
F-statistic: 12.37 on 1 and 5857 DF, p-value: 0.0004395

Female
lm_female2 = lm(Female$ln_y_ ~ 1 + Female$e)
summary(lm_female2)
we get
Residual standard error: 0.2844 on 3245 degrees of freedom
Multiple R-squared: 0.007875, Adjusted R-squared: 0.007569
F-statistic: 25.76 on 1 and 3245 DF, p-value: 4.091e-07

For e^2 (experience squared)

Male
lm_male3 = lm(Male$ln_y_ ~ 1 + Male$e_2)
summary(lm_male3)
we get
Residual standard error: 0.315 on 5857 degrees of freedom
Multiple R-squared: 0.0007927, Adjusted R-squared: 0.0006221
F-statistic: 4.646 on 1 and 5857 DF, p-value: 0.03116

Female
lm_female3 = lm(Female$ln_y_ ~ 1 + Female$e_2)
summary(lm_female3)
we get
Residual standard error: 0.2844 on 3245 degrees of freedom
Multiple R-squared: 0.007814, Adjusted R-squared: 0.007508
F-statistic: 25.56 on 1 and 3245 DF, p-value: 4.534e-07

For public (occupation in public service)

Male
lm_male4 = lm(Male$ln_y_ ~ 1 + Male$public)
summary(lm_male4)
we get
Residual standard error: 0.315 on 5857 degrees of freedom
Multiple R-squared: 0.0007927, Adjusted R-squared: 0.0006221
F-statistic: 4.646 on 1 and 5857 DF, p-value: 0.03116

Female
lm_female4 = lm(Female$ln_y_ ~ 1 + Female$public)
summary(lm_female4)
we get
Residual standard error: 0.2844 on 3245 degrees of freedom
Multiple R-squared: 0.007814, Adjusted R-squared: 0.007508
F-statistic: 25.56 on 1 and 3245 DF, p-value: 4.534e-07
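
The comparison of univariate regressions above could be automated along the following lines. This is only a sketch: the data frame is simulated and is not the problem-set data, so its R-squared values say nothing about the real results. Only the column names (ln_y_, s, e, e_2, public) are taken from the code above; the sample size, coefficients and noise level in the simulation are assumptions made for illustration.

```r
# Simulated stand-in for the male data set. The column names mirror those
# used above, but every value here is made up for illustration.
set.seed(1)
n <- 500
Male <- data.frame(
  s      = sample(0:12, n, replace = TRUE),  # schooling (hypothetical values)
  e      = sample(0:40, n, replace = TRUE),  # experience (hypothetical values)
  public = rbinom(n, 1, 0.3)                 # public-sector dummy (hypothetical)
)
Male$e_2   <- Male$e^2
Male$ln_y_ <- 7.7 + 0.047 * Male$s + 0.002 * Male$e + rnorm(n, sd = 0.3)

# Fit one univariate regression per candidate regressor and keep its R-squared.
regressors <- c("s", "e", "e_2", "public")
r2 <- sapply(regressors, function(x) {
  fit <- lm(reformulate(x, response = "ln_y_"), data = Male)
  summary(fit)$r.squared
})

# Sort so that the regressor explaining the most variation comes first.
print(sort(r2, decreasing = TRUE))

# In a univariate regression, R-squared equals the squared sample correlation
# between the dependent variable and the regressor.
stopifnot(isTRUE(all.equal(unname(r2["s"]), cor(Male$ln_y_, Male$s)^2)))
```

With the real data, replacing the simulated data frame by the Male or Female data frame loaded with read.dta should be the only change needed. The final check illustrates what R-squared measures here: the share of the variance of log earnings that moves together with the single regressor.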