reg_norm_trans.txt
jfm 2/24/2012

Regression w/ normal deviates
*********************************************
> x.100 <- c(1:100)
> y.norm <- rnorm(100, 0, 5)
> yy <- abs(x.100 + y.norm)
> plot(x.100, yy)
*********************************************
Now, generate values that violate the equal-variances assumption

> yz <- abs(x.100 + log(x.100)*y.norm)
> plot(x.100, yz)
> # Variance in Y increases w/ larger Y
> # Try log-transforming Y
> yz.trans <- log(yz + 1)
> plot(x.100, yz.trans)
> # Result: too strong: variance in Y now DECREASES w/ larger Y
> # also changed the relationship from linear to curved
> # Instead, try square root transform of Y
> yz.sqrt <- sqrt(yz + 0.5)
> plot(x.100, yz.sqrt)
> # Result: Variance in Y about equal throughout, but pattern still curved
> # Try transforming X
> x.trans <- sqrt(x.100 + 0.5)
> plot(x.trans, yz.sqrt)
> # Better, although still not perfect
> # Try fitting regression line, then adding line to plot
> reg.trans <- lm(yz.sqrt ~ x.trans)
> abline(reg.trans)
> summary(reg.trans)

Call:
lm(formula = yz.sqrt ~ x.trans)

Residuals:
     Min       1Q   Median       3Q      Max
-3.70343 -0.95159  0.02777  1.10129  2.83043

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.42822    0.40860  -1.048    0.297
x.trans      1.02547    0.05722  17.923   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.317 on 98 degrees of freedom
Multiple R-squared:  0.7662,    Adjusted R-squared:  0.7639
F-statistic: 321.2 on 1 and 98 DF,  p-value: < 2.2e-16

> # Now plot/view residuals
> plot(x.trans, reg.trans$residuals)
> # Add reference line at Y=0
> min(x.trans)
[1] 1.224745
> max(x.trans)
[1] 10.02497
> # h = 0 draws a horizontal line at Y=0 across the full X range
> abline(h = 0)
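
*********************************************
The transformation steps above can also be rerun as a script, with a fixed seed so the draws repeat. This is only a sketch, not the original analysis. The seed value, the helper spread_ratio(), and the idea of comparing residual SD in the upper vs. lower half of X are assumptions added for illustration. The numbers it prints will not match the summary output above, because the random deviates differ.

```r
# Reproducible sketch of the workflow above; the seed value is arbitrary.
set.seed(1)

x <- 1:100
noise <- rnorm(100, mean = 0, sd = 5)

# Heteroscedastic response: the noise is scaled by log(x), so spread grows with x
y <- abs(x + log(x) * noise)

# Hypothetical helper (not part of the original session): fit a straight line,
# then return the ratio of residual SD in the upper half of the predictor
# to residual SD in the lower half. Values near 1 suggest roughly equal spread.
spread_ratio <- function(pred, resp) {
  res <- residuals(lm(resp ~ pred))
  upper <- pred > median(pred)
  sd(res[upper]) / sd(res[!upper])
}

# Same transformations and offsets (+1, +0.5) as in the session above
ratios <- c(
  raw       = spread_ratio(x, y),
  log_y     = spread_ratio(x, log(y + 1)),
  sqrt_y    = spread_ratio(x, sqrt(y + 0.5)),
  sqrt_both = spread_ratio(sqrt(x + 0.5), sqrt(y + 0.5))
)
print(round(ratios, 2))

# Residual plot for the final model, with a dashed reference line at zero
fit <- lm(sqrt(y + 0.5) ~ sqrt(x + 0.5))
plot(fitted(fit), residuals(fit), xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)
```

The half-vs-half ratio is a crude numeric companion to eyeballing the plots, not a formal test of equal variances. It also does not separate unequal spread from curvature, since residuals from a straight-line fit to a curved pattern are inflated at the ends, so the plots should still be checked.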