实验:建立ARIMA模型(综合性实验)实验题目:某城市连续14年的月度婴儿出生率数据如下表所示:26.663 23.598 26.931 24.740 25.806 24.364 24.477 23.90123.175 23.227 21.672 21.870 21.439 21.089 23.709 21.66921.752 20.761 23.479 23.824 23.105 23.110 21.759 22.07321.937 20.035 23.590 21.672 22.222 22.123 23.950 23.50422.238 23.142 21.059 21.573 21.548 20.000 22.424 20.61521.761 22.874 24.104 23.748 23.262 22.907 21.519 22.02522.604 20.894 24.677 23.673 25.320 23.583 24.671 24.45424.122 24.252 22.084 22.991 23.287 23.049 25.076 24.03724.430 24.667 26.451 25.618 25.014 25.110 22.964 23.98123.798 22.270 24.775 22.646 23.988 24.737 26.276 25.81625.210 25.199 23.162 24.707 24.364 22.644 25.565 24.06225.431 24.635 27.009 26.606 26.268 26.462 25.246 25.18024.657 23.304 26.982 26.199 27.210 26.122 26.706 26.87826.152 26.379 24.712 25.688 24.990 24.239 26.721 23.47524.767 26.219 28.361 28.599 27.914 27.784 25.693 26.88126.217 24.218 27.914 26.975 28.527 27.139 28.982 28.16928.056 29.136 26.291 26.987 26.589 24.848 27.543 26.89628.878 27.390 28.065 28.141 29.048 28.484 26.634 27.73527.132 24.924 28.963 26.589 27.931 28.009 29.229 28.75928.405 27.945 25.912 26.619 26.076 25.286 27.660 25.95126.398 25.565 28.865 30.000 29.261 29.012 26.992 27.897(1)选择适当模型拟和该序列的发展(2)使用拟合模型预测下一年度该城市月度婴儿出生率实验内容:给出实际问题的非平稳时间序列,要求学生利用R统计软件,对该序列进行分析,通过平稳性检验、差分运算、白噪声检验、拟合ARMA模型,建立ARIMA模型,在此基础上进行预测。
实验要求:处理数据,掌握非平稳时间序列的ARIMA建模方法,并根据具体的实验题目要求完成实验报告,并及时上传到给定的FTP和课程网站。
实验步骤:第一步:编程建立R数据集;第二步:调用plot.ts程序对数据绘制时序图。
第三步:从时序图中利用平稳时间序列的定义判断是否平稳?第四步:若不满足平稳性,则可利用差分运算是否能使序列平稳?重复第三步步骤第五步:根据Box.test纯随机检验结果,利用LB统计量和白噪声特性检验最后处理的时间序列是否为纯随机序列?第六步:在序列判断为平稳非白噪声序列后,求出该观察值序列的样本自相关系数(ACF)和样本偏自相关系数(PACF)的值,选择阶数适当的ARIMA(p,d,q)模型进行拟合,并估计模型中未知参数的值。
第七步:检验模型的有效性。
如果拟合模型通不过检验,转向步骤6,重新选择模型再拟合。
第八步:模型优化。
如果拟合模型通过检验,仍然转向步骤6,充分考虑各种可能建立多个拟合模型,从所有通过检验的拟合模型中选择最优模型。
第九步:利用最优拟合模型,预测下一年度该城市月度婴儿出生率。
ex5.2=ts(scan("ex5.2.txt"), frequency=4)Read 168 itemsplot.ts(ex5.2)从图中看出序列一开始有下降趋势,后面有明显上升趋势,所以序列不平稳。
d12ex5.2 = diff(ex5.2,lag=12)acf(d12ex5.2,48)plot(d12ex5.2)从上面的自相关图中可以看出改做滞后12期差分后为平稳。
Box.test(d12ex5.2, lag=17, type="Ljung-Box")Box-Ljung testdata: d12ex5.2X-squared = 147.9254, df = 17, p-value < 2.2e-16P值小于0.05,可以认为是非白噪声序列。
par(mfrow=c(2,1)); acf(d12ex5.2, 48); pacf(d12ex5.2, 48)ARIMA(0,0,3)、ARIMA(0,0,4)、ARIMA(1,0,3)、ARIMA(1,0,4)四个模型分别进行拟合检验(rec.ols = arima(d12ex5.2,order=c(0,0,3)))Call:arima(x = d12ex5.2, order = c(0, 0, 3))Coefficients:ma1 ma2 ma3 intercept0.7949 0.4480 0.1156 0.2150s.e. 0.0839 0.0832 0.0885 0.1744sigma^2 estimated as 0.8621: log likelihood = -210.12, aic = 430.25rec.pr = predict(rec.ols, n.ahead=5)U = rec.pr$pred + 1.96*rec.pr$seL = rec.pr$pred - 1.96*rec.pr$seminx = min(d12ex5.2,L)maxx = max(d12ex5.2,U)ts.plot(d12ex5.2, rec.pr$pred, ylim=c(minx,maxx))lines(rec.pr$pred, col="red", type="o") lines(U, col="blue", lty="dashed") lines(L, col="blue", lty="dashed")qqnorm(rec.ols$resid)qqline(rec.ols$resid)shapiro.test(rec.ols$resid)Shapiro-Wilk normality testdata: rec.ols$residW = 0.9777, p-value = 0.0125用shapiro检验,发现p值为0.0125,在5%的显著性水平下显著,所以为ARIMA(0,0,3)模型不合理。
(rec.ols = arima(d12ex5.2,order=c(0,0,4)))Call:arima(x = d12ex5.2, order = c(0, 0, 4))Coefficients:ma1 ma2 ma3 ma4 intercept0.8306 0.4943 0.2254 0.2070 0.2041s.e. 0.0902 0.1158 0.0925 0.0889 0.1994sigma^2 estimated as 0.828: log likelihood = -207.07, aic = 426.15rec.pr = predict(rec.ols, n.ahead=5)U = rec.pr$pred + 1.96*rec.pr$seL = rec.pr$pred - 1.96*rec.pr$seminx = min(d12ex5.2,L)maxx = max(d12ex5.2,U)ts.plot(d12ex5.2, rec.pr$pred, ylim=c(minx,maxx))lines(rec.pr$pred, col="red", type="o")lines(U, col="blue", lty="dashed")lines(L, col="blue", lty="dashed")qqnorm(rec.ols$resid)qqline(rec.ols$resid)shapiro.test(rec.ols$resid)Shapiro-Wilk normality testdata: rec.ols$residW = 0.9689, p-value = 0.001363用shapiro检验,发现p值为0.001363,在5%的显著性水平下显著,所以为ARIMA(0,0,4)模型不合理。
(rec.ols = arima(d12ex5.2,order=c(1,0,3)))Call:arima(x = d12ex5.2, order = c(1, 0, 3))Coefficients:ar1 ma1 ma2 ma3 intercept0.9288 -0.1369 -0.2156 -0.1586 0.0240s.e. 0.0669 0.1065 0.0921 0.0879 0.4984sigma^2 estimated as 0.7986: log likelihood = -204.35, aic = 420.7rec.pr = predict(rec.ols, n.ahead=5)U = rec.pr$pred + 1.96*rec.pr$seL = rec.pr$pred - 1.96*rec.pr$seminx = min(d12ex5.2,L)maxx = max(d12ex5.2,U)ts.plot(d12ex5.2, rec.pr$pred, ylim=c(minx,maxx)) lines(rec.pr$pred, col="red", type="o")lines(U, col="blue", lty="dashed")lines(L, col="blue", lty="dashed")qqnorm(rec.ols$resid)qqline(rec.ols$resid)shapiro.test(rec.ols$resid)Shapiro-Wilk normality testdata: rec.ols$residW = 0.9783, p-value = 0.01454(rec.ols = arima(d12ex5.2,order=c(1,0,4)))Call:arima(x = d12ex5.2, order = c(1, 0, 4))Coefficients:ar1 ma1 ma2 ma3 ma4 intercept0.9084 -0.1288 -0.2457 -0.1511 0.1309 0.0493 s.e. 0.0725 0.1078 0.1021 0.0778 0.0960 0.4684sigma^2 estimated as 0.7891: log likelihood = -203.47, aic = 420.94rec.pr = predict(rec.ols, n.ahead=5)U = rec.pr$pred + 1.96*rec.pr$seL = rec.pr$pred - 1.96*rec.pr$seminx = min(d12ex5.2,L)maxx = max(d12ex5.2,U)ts.plot(d12ex5.2, rec.pr$pred, ylim=c(minx,maxx))lines(rec.pr$pred, col="red", type="o")lines(U, col="blue", lty="dashed")lines(L, col="blue", lty="dashed")qqnorm(rec.ols$resid)qqline(rec.ols$resid)shapiro.test(rec.ols$resid)Shapiro-Wilk normality testdata: rec.ols$residW = 0.978, p-value = 0.01341。