House Price Analysis in R
Estimate Std. Error t value Pr(>|t|)
(Intercept) 4525.753 24474.054 0.185 0.8537
Taxes 38.135 6.815 5.596 2.16e-07 ***
F. Transform any variables as necessary. Explain your decisions. If you transformed any
of the variables, make additional visualizations of the relationship between the new
Baths 0.5948543 0.49222235 1.0000000 0.25148095 0.5582533 0.6582247
New 0.3808741 0.04931556 0.2514810 1.00000000 0.4732608 0.3843277
Price 0.8419802 0.39395702 0.5582533 0.47326080 1.0000000 0.8337848
mydata=na.omit(data)
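Before dropping rows it can help to see how many values are actually missing and how many rows na.omit() removes. The sketch below uses a small hypothetical data frame (toy is made up for illustration and is not the original data.txt):

```r
# Hypothetical toy data, NOT the original data set: one missing Taxes value
toy <- data.frame(
  Taxes = c(3104, 1173, NA),
  Price = c(279900, 146500, 237700)
)

n_missing <- colSums(is.na(toy))        # number of NAs in each column
toy_clean <- na.omit(toy)               # drop every row that contains an NA
dropped   <- nrow(toy) - nrow(toy_clean)  # how many rows were removed

print(n_missing)
print(dropped)
```

The same three lines applied to data and mydata would show whether omitting missing values changes the sample size at all.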
B
plot(mydata[,-1])
From the plots above, we can see that size and taxes may affect house price.
C. Using -ggplot- suite
colnames(mydata)
[1] "case" "Taxes" "Beds" "Baths" "New" "Price" "Size"
library(ggplot2)
ggplot(mydata, aes(x = Size, y = Price)) +
  geom_point() +
  geom_smooth()
ggplot(mydata, aes(x = Taxes, y = Price)) +
  geom_point() +
  geom_smooth()
your answer by showing any relevant statistics or graphs
ggplot(mydata, aes(x = Size, y = log(Price))) +
  geom_point() +
  geom_smooth()
ggplot(mydata,aes(x =(Taxes),y =log(Price))) +
D. Do your visualizations show a positive, negative,
or no relationship?
The plots show that both taxes and size have a positive relationship with house price.
E. Is there evidence that you may need to transform any of your variables? Why? Motivate
data = read.table("data.txt", header = TRUE)
head(data)
case Taxes Beds Baths New Price Size
1 1 3104 4 2 0 279900 2048
2 2 1173 2 1 0 146500 912
3 3 3076 4 2 0 237700 1654
Size 0.8187958 0.54478311 0.6582247 0.38432773 0.8337848 1.0000000
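From the correlation matrix we can see which variables are highly correlated: Taxes and Price (0.84), Size and Price (0.83), and Taxes and Size (0.82).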
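The highly correlated pairs can also be pulled out of the matrix programmatically instead of by eye. The sketch below re-enters the correlation matrix printed by cor(data[,-1]) by hand. The 0.8 cutoff and the names cm, threshold and high are assumptions made for illustration, not part of the original analysis:

```r
vars <- c("Taxes", "Beds", "Baths", "New", "Price", "Size")

# Correlation matrix copied from the cor(data[,-1]) output in the text
cm <- matrix(c(
  1.0000000, 0.47392873, 0.5948543, 0.38087410, 0.8419802, 0.8187958,
  0.4739287, 1.00000000, 0.4922224, 0.04931556, 0.3939570, 0.5447831,
  0.5948543, 0.49222235, 1.0000000, 0.25148095, 0.5582533, 0.6582247,
  0.3808741, 0.04931556, 0.2514810, 1.00000000, 0.4732608, 0.3843277,
  0.8419802, 0.39395702, 0.5582533, 0.47326080, 1.0000000, 0.8337848,
  0.8187958, 0.54478311, 0.6582247, 0.38432773, 0.8337848, 1.0000000
), nrow = 6, byrow = TRUE, dimnames = list(vars, vars))

threshold <- 0.8  # hypothetical cutoff for "highly correlated"

# Look only above the diagonal so each pair is reported once
idx <- which(abs(cm) > threshold & upper.tri(cm), arr.ind = TRUE)

high <- data.frame(
  var1 = rownames(cm)[idx[, "row"]],
  var2 = colnames(cm)[idx[, "col"]],
  r    = cm[idx]
)
print(high)
```

With this cutoff the only pairs returned are the three combinations of Taxes, Price and Size. Since Taxes and Size are also strongly correlated with each other, their individual coefficients in a multiple regression should be read with some care.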
H. Fit a multiple regression to the data. Notice that your coefficients are really large, as
summary(lm(Price~.,data=data[,-1]))
Call:
lm(formula = Price ~ ., data = data[, -1])
Residuals:
Min 1Q Median 3Q Max
-182112 -24377 -2046 21306 161870
Coefficients:
variable and the dependent variable
ggplot(mydata, aes(x = Taxes^2, y = log(Price))) +
  geom_point() +
  geom_smooth()
G. Estimate the correlation between any continuous independent variables and the dependent variable.
geom_point(aes( )) +
geom_smooth()
attach(mydata)
cor(Taxes,Price)
[1] 0.8419802
cor( (Taxes)^2, (Price))
[1] 0.856277
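Judging from the shape of the scatter plot, the relationship between taxes and price appears nonlinear, so the Taxes variable can be squared.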
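One caveat if the squared term is later carried into a regression: inside an R model formula, ^ means factor crossing rather than arithmetic, so the square has to be wrapped in I(). A minimal sketch, using only the six rows printed by head(data), so any fitted numbers are illustrative and not the full-sample results (the names six, fit_plain and fit_squared are made up here):

```r
# Six rows as printed by head(data); the full data set is not reproduced here
six <- data.frame(
  Taxes = c(3104, 1173, 3076, 1608, 1454, 2997),
  Price = c(279900, 146500, 237700, 200000, 159900, 499900)
)

# In a formula, Taxes^2 is "Taxes crossed with itself", which reduces to Taxes
fit_plain <- lm(Price ~ Taxes^2, data = six)

# I() protects the expression so ^ is evaluated as ordinary arithmetic
fit_squared <- lm(Price ~ I(Taxes^2), data = six)

print(names(coef(fit_plain)))
print(names(coef(fit_squared)))
```

Outside a formula, as in cor((Taxes)^2, Price) or aes(x = Taxes^2), the ^ operator is plain arithmetic and no I() is needed.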
What do they mean?
cor(data[,-1])
Taxes Beds Baths New Price Size
Taxes 1.0000000 0.47392873 0.5948543 0.38087410 0.8419802 0.8187958
Beds 0.4739287 1.00000000 0.4922224 0.04931556 0.3939570 0.5447831
the dependent variable is measured in dollars. The norm is to rescale such dependent
variables (divide price by 1000), so that the coefficients are smaller.
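The effect of the rescaling can be sketched as follows. This is only an illustration under stated assumptions: it uses the six rows printed by head(data) and a reduced model with two predictors (six observations are too few for the full model), and Price_k, fit_dollars and fit_thousands are hypothetical names:

```r
# Six rows as printed by head(data); the full data set is not reproduced here
six <- data.frame(
  Taxes = c(3104, 1173, 3076, 1608, 1454, 2997),
  Size  = c(2048, 912, 1654, 2068, 1477, 3153),
  Price = c(279900, 146500, 237700, 200000, 159900, 499900)
)

# Model with price in dollars
fit_dollars <- lm(Price ~ Taxes + Size, data = six)

# Same model with price in thousands of dollars
six$Price_k <- six$Price / 1000
fit_thousands <- lm(Price_k ~ Taxes + Size, data = six)

print(coef(fit_dollars))
print(coef(fit_thousands))

# t statistics are unaffected by the change of units
t_dollars   <- summary(fit_dollars)$coefficients[, "t value"]
t_thousands <- summary(fit_thousands)$coefficients[, "t value"]
```

Dividing the dependent variable by 1000 divides every coefficient and standard error by 1000, so the t values, p-values and R-squared stay the same. Only the units of interpretation change, from dollars to thousands of dollars.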
4 4 1608 3 2 0 200000 2068
5 5 1454 3 3 0 159900 1477
6 6 2997 3 2 1 499900 3153
A. Please open the dataset, omit any missing values, and name it mydata.