R in Action Notes

This content is licensed under CC 4.0 BY-SA.

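This post walks through the main plotting methods in R, including bar plots, spinograms, histograms, kernel density plots, box plots, dot plots, bubble plots, and correlograms, as well as OLS regression. It also covers multiple linear regression: checking correlations among variables, simple linear regression, regression diagnostics, backward stepwise regression, and all-subsets regression. Finally, it touches on logistic regression as a generalized linear model. The examples are aimed at both beginners and intermediate R users.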

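par() modifies the current graphical parameters, and the changes stay in effect until the session ends. par(no.readonly=TRUE) returns the current parameters that can be modified, so they can be saved and restored later.

Symbols and lines: pch is the plotting symbol; cex is the symbol size; lty is the line type; lwd is the line width

legend: the legend labels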
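
To make the role of par(no.readonly=TRUE) concrete, here is a minimal sketch that saves the settings, changes lty, lwd, pch and cex, draws a plot with a legend, and then restores the saved settings. The dose and response numbers are made up purely for illustration.

```r
# Save the current modifiable graphical parameters before changing anything
opar <- par(no.readonly = TRUE)

# Change line type, line width, plotting symbol and symbol size
par(lty = 2, lwd = 2, pch = 17, cex = 1.2)

# Made-up illustrative data (not from the original post)
dose  <- c(20, 30, 40, 45, 60)
drugA <- c(16, 20, 27, 40, 60)
drugB <- c(15, 18, 25, 31, 40)

plot(dose, drugA, type = "b", col = "red", ylim = c(0, 60),
     xlab = "Dose", ylab = "Response")
lines(dose, drugB, type = "b", col = "blue")

# The legend labels are passed as a character vector
legend("topleft", legend = c("A", "B"), col = c("red", "blue"),
       lty = 2, lwd = 2, pch = 17)

# Restore the saved parameters so later plots are unaffected
par(opar)
```

Restoring with par(opar) is what keeps one plot's settings from leaking into the next.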

Basic graphs

Bar plots

barplot(height, width = 1, space = NULL,

names.arg = NULL, legend.text = NULL, beside = FALSE,

horiz = FALSE, density = NULL, angle = 45,

col = NULL, border = par("fg"),

main = NULL, sub = NULL, xlab = NULL, ylab = NULL,

xlim = NULL, ylim = NULL, xpd = TRUE, log = "",

axes = TRUE, axisnames = TRUE,

cex.axis = par("cex.axis"), cex.names = par("cex.axis"),

inside = TRUE, plot = TRUE, axis.lty = 0, offset = 0,

add = FALSE, args.legend = NULL, ...)

When height is a matrix, beside=FALSE draws a stacked bar plot and beside=TRUE draws a grouped bar plot.
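
A short sketch of the beside argument, using a cylinders-by-gears table from the built-in mtcars data (the choice of variables is only for illustration). barplot() returns the bar midpoints, which shows the difference in layout: one position per column when stacked, one per cell when grouped.

```r
# Cross-tabulate cylinders (rows) by gears (columns)
counts <- table(mtcars$cyl, mtcars$gear)

# beside = FALSE (the default): one stacked bar per column of the table
stacked_mid <- barplot(counts, beside = FALSE,
                       main = "Stacked bar plot",
                       xlab = "Gears", ylab = "Frequency",
                       col = c("red", "yellow", "green"),
                       legend.text = rownames(counts))

# beside = TRUE: one group of side-by-side bars per column of the table
grouped_mid <- barplot(counts, beside = TRUE,
                       main = "Grouped bar plot",
                       xlab = "Gears", ylab = "Frequency",
                       col = c("red", "yellow", "green"),
                       legend.text = rownames(counts))
```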

Spinograms

library(vcd)

attach(Arthritis)

counts<-table(Treatment,Improved)

spine(counts,main="Spinogram")

Histograms

hist(mtcars$mpg,freq=F,breaks=12,col="red",xlab="Miles per Gallon",main="Histogram, rug plot, density curve")

rug(jitter(mtcars$mpg))

lines(density(mtcars$mpg),col="blue",lwd=2) # density() computes the kernel density estimate

freq

logical; if TRUE, the histogram graphic is a representation of frequencies, the counts component of the result; if FALSE, probability densities, component density, are plotted (so that the histogram has a total area of one). Defaults to TRUE if and only if breaks are equidistant (and probability is not specified).
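
The "total area of one" statement can be checked directly. The sketch below computes the histogram without plotting it and sums bar height times bar width; the variable names are illustrative.

```r
# Compute the histogram for mtcars$mpg without drawing it
h <- hist(mtcars$mpg, breaks = 12, plot = FALSE)

# With freq = FALSE the bar heights are h$density;
# height times width, summed over all bars, gives the total area
area <- sum(h$density * diff(h$breaks))

# With freq = TRUE the bar heights are h$counts, which sum to the sample size
total <- sum(h$counts)

print(area)
print(total)

# Draw the density version with the kernel density estimate on top
hist(mtcars$mpg, freq = FALSE, breaks = 12, col = "red",
     xlab = "Miles per Gallon", main = "Histogram with density curve")
lines(density(mtcars$mpg), col = "blue", lwd = 2)
```

Because both the histogram and density() are on the density scale, the curve and the bars are directly comparable.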

Kernel density plots

par(lwd=2) # double the line width

library(sm)

attach(mtcars)

cyl.f<-factor(cyl,levels=c(4,6,8),labels=c("4cylinders","6cylinders","8cylinders")) # create the grouping factor

sm.density.compare(mpg,cyl,xlab="miles per gallon") # draw one density curve per group

title(main="MPG Distribution by Car Cylinders")

colfill<-c(2:(1+length(levels(cyl.f)))) # fill colors for the legend

legend(locator(1),levels(cyl.f),fill=colfill) # click on the plot to place the legend

detach(mtcars)
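
If the sm package is not available, or the script has to run non-interactively (locator() needs a mouse click), a similar comparison can be sketched with base R's density(). This is a substitute technique, not what sm.density.compare() does internally, and the legend is placed at a fixed corner instead.

```r
# Split mpg by number of cylinders and estimate one density per group
groups <- split(mtcars$mpg, mtcars$cyl)
dens   <- lapply(groups, density)

# Common axis ranges so that all curves fit in one plot
xr   <- range(sapply(dens, function(d) range(d$x)))
ymax <- max(sapply(dens, function(d) max(d$y)))
cols <- 2:4

# Empty plot, then one curve per group
plot(NULL, xlim = xr, ylim = c(0, ymax),
     xlab = "Miles per gallon", ylab = "Density",
     main = "MPG Distribution by Car Cylinders")
for (i in seq_along(dens)) {
  lines(dens[[i]], col = cols[i], lwd = 2)
}

# Fixed legend position instead of locator(1)
legend("topright", legend = paste(names(dens), "cylinders"),
       col = cols, lwd = 2)
```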

Box plots

boxplot(mpg~cyl,data=mtcars,main="Car Mileage Data",xlab="Number of Cylinders",ylab="Miles per Gallon")

mtcars$cyl.f<-factor(mtcars$cyl,levels=c(4,6,8),labels=c("4","6","8"))

mtcars$am.f<-factor(mtcars$am,levels=c(0,1),labels=c("auto","standard"))

boxplot(mpg~am.f*cyl.f,data=mtcars,varwidth=T,col=c("gold","green"),main="MPG Distribution by Auto Type",xlab="Auto Type")

Dot plots

dotchart(x,labels=)

x<-mtcars[order(mtcars$mpg),]

x$cyl<-factor(x$cyl)

x$color[x$cyl==4]<-"red"

x$color[x$cyl==6]<-"blue"

x$color[x$cyl==8]<-"darkgreen"

dotchart(x$mpg,labels=row.names(x),cex=.7,groups=x$cyl,gcolor="black",color=x$color,pch=19,main="Gas Mileage for Car Models",xlab="Miles per Gallon")

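Scatter plots

library(car)

scatterplot(weight~height,data=women,spread=F,lty.smooth=2,pch=19,main="Women Age 30-39",xlab="Height",ylab="Weight") # note: spread and lty.smooth come from older car releases; recent releases appear to configure the smoother through the smooth argument instead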

scatterplotMatrix(x, var.labels=colnames(x),

diagonal=c("density", "boxplot", "histogram", "oned", "qqplot","none"),

adjust=1,nclass,

plot.points=TRUE, smoother=loessLine, smoother.args=list(), smooth,span,

spread =!by.groups, reg.line=lm,

transform=FALSE, family=c("bcPower", "yjPower"),

ellipse=FALSE, levels=c(.5, .95), robust=TRUE,

groups=NULL,by.groups=FALSE,

use=c("complete.obs", "pairwise.complete.obs"),

labels,id.method="mahal", id.n=0, id.cex=1, id.col=palette()[1],

col=if(n.groups == 1) palette()[3:1] else rep(palette(),length=n.groups),

pch=1:n.groups, lwd=1, lty=1,

cex=par("cex"), cex.axis=par("cex.axis"), cex.labels=NULL,

cex.main=par("cex.main"),

legend.plot=length(levels(groups)) > 1,legend.pos=NULL, row1attop=TRUE, ...)

Bubble plots

symbols(x,y,circles=radius)
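
A minimal bubble plot sketch using the built-in mtcars data. The assumption here is that the bubble area, not the radius, should be proportional to the third variable (engine displacement), so the radius is taken as sqrt(value/pi).

```r
# Radius chosen so that the circle AREA is proportional to displacement
r <- sqrt(mtcars$disp / pi)

# inches = 0.30 scales the largest circle to a radius of 0.30 inches
symbols(mtcars$wt, mtcars$mpg, circles = r, inches = 0.30,
        fg = "white", bg = "lightblue",
        main = "Bubble plot: size proportional to displacement",
        xlab = "Weight (1000 lbs)", ylab = "Miles per gallon")

# Label each bubble with the car name
text(mtcars$wt, mtcars$mpg, rownames(mtcars), cex = 0.6)
```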

Mosaic plots

library(vcd)

mosaic(Titanic,shade=T,legend=T)

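mosaic(~Class+Sex+Age+Survived,data=Titanic,shade=T,legend=T) # equivalent to the call above

Missing-value plots

library(mice)

data(sleep,package="VIM") # the mammal sleep data; base R's own sleep data set is a different one

md.pattern(sleep)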

   BodyWgt BrainWgt Pred Exp Danger Sleep Span Gest Dream NonD

42       1        1    1   1      1     1    1    1     1    1  0

 2       1        1    1   1      1     1    0    1     1    1  1

 3       1        1    1   1      1     1    1    0     1    1  1

 9       1        1    1   1      1     1    1    1     0    0  2

 2       1        1    1   1      1     0    1    1     1    0  2

 1       1        1    1   1      1     1    0    0     1    1  2

 2       1        1    1   1      1     0    1    1     0    0  3

 1       1        1    1   1      1     1    0    1     0    0  3

         0        0    0   0      0     4    4    4    12   14 38

library(VIM)

aggr(sleep,prop=F,numbers=T)
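
The counts that md.pattern() and aggr() report can be approximated in base R. The sketch below uses a small made-up data frame (not the sleep data) and encodes each row's pattern as a string of 1 (observed) and 0 (missing), following the same convention as the md.pattern() output above.

```r
# Made-up data frame with a few missing values
df <- data.frame(x = c(1, NA, 3, 4),
                 y = c(NA, NA, 3, 4),
                 z = c(1, 2, 3, 4))

# Number of missing values per variable (the bottom row of md.pattern)
miss_per_col <- colSums(is.na(df))

# Number of missing values per row
miss_per_row <- rowSums(is.na(df))

# Number of fully observed rows
n_complete <- sum(complete.cases(df))

# Count rows by missingness pattern: 1 = observed, 0 = missing
pattern <- apply(is.na(df), 1,
                 function(r) paste(as.integer(!r), collapse = ""))
pattern_counts <- table(pattern)

print(miss_per_col)
print(pattern_counts)
```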

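Descriptive statistics

vars<-c("mpg","hp","wt")

aggregate(mtcars[vars],by=list(am=mtcars$am),mean)

  am      mpg       hp       wt
1  0 17.14737 160.2632 3.768895
2  1 24.39231 126.8462 2.411000

aggregate() only allows single-value functions such as the mean or the standard deviation in each call.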
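
One way to work within the single-value limitation is to call aggregate() once per statistic, or to switch to by() with a function that returns several values. The helper name dstats below is just an illustrative choice.

```r
vars <- c("mpg", "hp", "wt")

# One aggregate() call per single-value statistic
group_means <- aggregate(mtcars[vars], by = list(am = mtcars$am), FUN = mean)
group_sds   <- aggregate(mtcars[vars], by = list(am = mtcars$am), FUN = sd)

print(group_means)
print(group_sds)

# by() accepts a function that returns several values per group
dstats <- function(x) sapply(x, function(v) c(mean = mean(v), sd = sd(v)))
by(mtcars[vars], mtcars$am, dstats)
```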

Contingency tables:

library(vcd) # provides the Arthritis data set

library(gmodels)

CrossTable(Arthritis$Treatment,Arthritis$Improved)

   Cell Contents
|-------------------------|
|                       N |
| Chi-square contribution |
|           N / Row Total |
|           N / Col Total |
|         N / Table Total |
|-------------------------|

Total Observations in Table:  84

                    | Arthritis$Improved
Arthritis$Treatment |      None |      Some |    Marked | Row Total |
--------------------|-----------|-----------|-----------|-----------|
            Placebo |        29 |         7 |         7 |        43 |
                    |     2.616 |     0.004 |     3.752 |           |
                    |     0.674 |     0.163 |     0.163 |     0.512 |
                    |     0.690 |     0.500 |     0.250 |           |
                    |     0.345 |     0.083 |     0.083 |           |
--------------------|-----------|-----------|-----------|-----------|
            Treated |        13 |         7 |        21 |        41 |
                    |     2.744 |     0.004 |     3.935 |           |
                    |     0.317 |     0.171 |     0.512 |     0.488 |
                    |     0.310 |     0.500 |     0.750 |           |
                    |     0.155 |     0.083 |     0.250 |           |
--------------------|-----------|-----------|-----------|-----------|
       Column Total |        42 |        14 |        28 |        84 |
                    |     0.500 |     0.167 |     0.333 |           |
--------------------|-----------|-----------|-----------|-----------|
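
Each cell of the CrossTable() output stacks five numbers. The sketch below reproduces them in base R from the counts printed above, entered by hand so that neither vcd nor gmodels is needed. The level names None, Some and Marked are those of Arthritis$Improved.

```r
# Counts copied from the table above
counts <- matrix(c(29, 7,  7,
                   13, 7, 21),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(Treatment = c("Placebo", "Treated"),
                                 Improved  = c("None", "Some", "Marked")))

row_prop   <- prop.table(counts, 1)  # N / Row Total
col_prop   <- prop.table(counts, 2)  # N / Col Total
table_prop <- prop.table(counts)     # N / Table Total

# Chi-square contribution of each cell: (observed - expected)^2 / expected
chi     <- chisq.test(counts)
contrib <- (counts - chi$expected)^2 / chi$expected

print(round(contrib, 3))
print(round(row_prop, 3))
```

Summing contrib over all cells gives the chi-square statistic stored in chi$statistic.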

OLS regression

lm(formula, data, subset, weights, na.action,

method = "qr", model =TRUE, x = FALSE, y = FALSE, qr = TRUE,

singular.ok = TRUE,contrasts = NULL, offset, ...)

Multiple linear regression

Checking correlations among variables

states<-as.data.frame(state.x77[,c("Murder","Population","Illiteracy","Income","Frost")])

cor(states)

               Murder Population Illiteracy     Income      Frost
Murder      1.0000000  0.3436428  0.7029752 -0.2300776 -0.5388834
Population  0.3436428  1.0000000  0.1076224  0.2082276 -0.3321525
Illiteracy  0.7029752  0.1076224  1.0000000 -0.4370752 -0.6719470
Income     -0.2300776  0.2082276 -0.4370752  1.0000000  0.2262822
Frost      -0.5388834 -0.3321525 -0.6719470  0.2262822  1.0000000

library(car)

scatterplotMatrix(states,spread=F,lty.smooth=2,main="Scatter Plot Matrix")
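
cor() only reports the correlation coefficients. As a small sketch of an actual test, cor.test() from base R checks whether a single correlation differs from zero, and pairs() gives a base-R scatter plot matrix if the car package is not installed. Testing Murder against Illiteracy is just an illustrative pick.

```r
states <- as.data.frame(state.x77[, c("Murder", "Population",
                                      "Illiteracy", "Income", "Frost")])

# Test whether the Murder/Illiteracy correlation differs from zero
ct <- cor.test(states$Murder, states$Illiteracy)
print(ct$estimate)   # the Pearson correlation, as in the cor() matrix
print(ct$p.value)    # p-value of the test
print(ct$conf.int)   # 95% confidence interval for the correlation

# Base-R scatter plot matrix with a smoother in each panel
pairs(states, panel = panel.smooth, main = "Scatter Plot Matrix")
```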

Simple linear regression

lm(formula,data=mydata)

For example:

fit<- lm(weight~height+I(height^2),data=women)

coef(fit) # print the fitted coefficients

 (Intercept)       height  I(height^2)
261.87818358  -7.34831933   0.08306399
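
Beyond coef(), a fitted lm object is usually inspected with summary() and used with predict(). A brief sketch with the same model; the new heights 60, 66 and 72 are arbitrary values chosen for illustration.

```r
fit <- lm(weight ~ height + I(height^2), data = women)

# Coefficient table, residual standard error, R-squared, F statistic
print(summary(fit))

# Predicted weights for a few new heights
new_heights <- data.frame(height = c(60, 66, 72))
pred <- predict(fit, newdata = new_heights)
print(pred)

# Observed points with the fitted quadratic curve
plot(women$height, women$weight, xlab = "Height", ylab = "Weight")
lines(women$height, fitted(fit), col = "blue")
```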

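Regression diagnostics

confint(fit)

                   2.5 %       97.5 %
(Intercept) 206.97913605 316.77723111
height       -9.04276525  -5.65387341
I(height^2)   0.07003547   0.09609252

confint() returns a 95% confidence interval for each coefficient, bounded by the 2.5% and 97.5% limits. If an interval contains 0, the coefficient is not significantly different from zero at the 5% level, so there is no evidence that the term contributes to the model.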
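
For a linear model, these limits are the estimate plus or minus a t quantile times the standard error. The sketch below rebuilds the interval by hand and compares it with confint(); the variable names are illustrative.

```r
fit <- lm(weight ~ height + I(height^2), data = women)

# Estimates and standard errors from the coefficient table
ctab <- coef(summary(fit))
est  <- ctab[, "Estimate"]
se   <- ctab[, "Std. Error"]

# Two-sided 95% interval: 97.5% quantile of t with residual degrees of freedom
tcrit  <- qt(0.975, df = df.residual(fit))
manual <- cbind(lower = est - tcrit * se,
                upper = est + tcrit * se)

print(manual)
print(confint(fit))

# TRUE for coefficients whose interval contains 0
contains_zero <- manual[, "lower"] < 0 & manual[, "upper"] > 0
print(contains_zero)
```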

par(mfrow=c(2,2))

plot(fit)

Backward stepwise regression

library(MASS)

fit1<-lm(Murder~Population+Illiteracy+Income+Frost,data=states)

stepAIC(fit1,direction="backward")

Start:  AIC=97.75
Murder ~ Population + Illiteracy + Income + Frost

             Df Sum of Sq    RSS     AIC
- Frost       1     0.021 289.19  95.753
- Income      1     0.057 289.22  95.759
<none>                    289.17  97.749
- Population  1    39.238 328.41 102.111
- Illiteracy  1   144.264 433.43 115.986

Step:  AIC=95.75
Murder ~ Population + Illiteracy + Income

             Df Sum of Sq    RSS     AIC
- Income      1     0.057 289.25  93.763
<none>                    289.19  95.753
- Population  1    43.658 332.85 100.783
- Illiteracy  1   236.196 525.38 123.605

Step:  AIC=93.76
Murder ~ Population + Illiteracy

             Df Sum of Sq    RSS     AIC
<none>                    289.25  93.763
- Population  1    48.517 337.76  99.516
- Illiteracy  1   299.646 588.89 127.311

Call:
lm(formula = Murder ~ Population + Illiteracy, data = states)

Coefficients:
(Intercept)   Population   Illiteracy
  1.6515497    0.0002242    4.0807366
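
The AIC column in this output can be reproduced by hand. For an lm fit the value used here is n*log(RSS/n) + 2*k, where k is the number of estimated coefficients. The sketch below checks that against extractAIC() and then runs the same backward search with step() from base R's stats package, used here as a stand-in for MASS::stepAIC().

```r
states <- as.data.frame(state.x77[, c("Murder", "Population",
                                      "Illiteracy", "Income", "Frost")])
fit1 <- lm(Murder ~ Population + Illiteracy + Income + Frost, data = states)

# AIC of the full model, computed by hand
n   <- nrow(states)
rss <- sum(residuals(fit1)^2)
k   <- length(coef(fit1))          # intercept plus four slopes
aic_manual  <- n * log(rss / n) + 2 * k
aic_builtin <- extractAIC(fit1)[2]
print(c(manual = aic_manual, builtin = aic_builtin))

# Backward elimination; trace = 0 suppresses the step-by-step printout
final <- step(fit1, direction = "backward", trace = 0)
print(formula(final))
```

A term is dropped whenever removing it lowers the AIC; the search stops when the "<none>" row has the lowest AIC.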

All-subsets regression

library(leaps) # regsubsets() is provided by the leaps package

regsubsets(x=, data=, weights=NULL, nbest=1, nvmax=8, force.in=NULL, force.out=NULL, intercept=TRUE, method=c("exhaustive", "backward", "forward", "seqrep"), really.big=FALSE, ...)

x	design matrix or model formula for full model, or biglm object
data	Optional data frame
y	response vector
weights	weight vector
nbest	number of subsets of each size to record
nvmax	maximum size of subsets to examine
force.in	index to columns of design matrix that should be in all models
force.out	index to columns of design matrix that should be in no models
intercept	Add an intercept?
method	Use exhaustive search, forward selection, backward selection or sequential replacement to search.
really.big	Must be TRUE to perform exhaustive search on more than 50 variables.
object	regsubsets object
all.best	Show all the best subsets or just one of each size
matrix	Show a matrix of the variables in each model or just summary statistics
matrix.logical	With matrix=TRUE, the matrix is logical TRUE/FALSE or string "*"/" "
df	Specify a number of degrees of freedom for the summary statistics. The default is n-1
id	Which model or models (ordered as in the summary output) to return coefficients and variance matrix for
vcov	If TRUE, return the variance-covariance matrix as an attribute
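
To illustrate what an exhaustive search does, the sketch below enumerates every subset of the four predictors in base R and ranks the fits by adjusted R-squared. This is a brute-force stand-in for illustration only: it does not call regsubsets() and is far less efficient than the algorithm in leaps. The ranking criterion is an assumption made for this example.

```r
states <- as.data.frame(state.x77[, c("Murder", "Population",
                                      "Illiteracy", "Income", "Frost")])
predictors <- setdiff(names(states), "Murder")

# Every non-empty subset of the predictors (2^4 - 1 = 15 of them)
subsets <- unlist(
  lapply(seq_along(predictors),
         function(k) combn(predictors, k, simplify = FALSE)),
  recursive = FALSE)

# Fit one model per subset and record its adjusted R-squared
results <- do.call(rbind, lapply(subsets, function(vars) {
  fit <- lm(reformulate(vars, response = "Murder"), data = states)
  data.frame(model = paste(vars, collapse = " + "),
             size  = length(vars),
             adjr2 = summary(fit)$adj.r.squared)
}))

# Best models first
results <- results[order(-results$adjr2), ]
print(head(results, 3))
```

With four predictors there are only 15 candidate models, so brute force is fine; the number of subsets doubles with every added predictor, which is why regsubsets() has the nvmax and really.big arguments.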