# Stat 481: Project 2

# Summary of Problem

The Professional Bowlers Association maintains records for each of its bowlers. The question is whether three of the bowlers in the dataset have different average bowling scores, and how much variability there is in their scores.

# Data Description

The dataset contains an identifying number for each bowler as well as their scores from various games and the corresponding game numbers. Each bowler bowled 56 games; descriptive statistics for their scores follow.

```
##                 Bowler 7  Bowler 8  Bowler 12
## Minimum         156.0000  144.0000   126.0000
## First Quartile  186.7500  193.7500   179.0000
## Median          206.5000  208.0000   190.5000
## Mean            209.5357  208.4286   194.7500
## Third Quartile  229.2500  221.0000   209.5000
## Maximum         276.0000  264.0000   263.0000
```

```
##                               Bowler 7  Bowler 8  Bowler 12
## Standard Deviation of Score:  29.69418  25.86529   26.53043
```

Histograms and boxplots representing the scores of each bowler are produced by the R code at the end of this report.

# Differences in Average Score

First, we must determine whether the bowlers' mean scores differ. To do this, we regress score on bowler.

```
##
## Call:
## lm(formula = Score ~ Bowler)
##
## Coefficients:
## (Intercept)      Bowler8     Bowler12
##     209.536       -1.107      -14.786
```

Bowler8 and Bowler12 are dummy variables equal to 1 for the corresponding bowler and 0 otherwise, with Bowler 7 as the baseline. To predict the score for Bowler 7, both dummies are 0, so the predicted score is the intercept, 209.536. Likewise, the predicted mean for Bowler 8 is 209.536 - 1.107 = 208.429 and for Bowler 12 is 209.536 - 14.786 = 194.750, matching the sample means in the descriptive statistics table. The corresponding ANOVA table for this model is:

```
## Analysis of Variance Table
##
## Response: Score
##            Df Sum Sq Mean Sq F value   Pr(>F)
## Bowler      2   7596  3798.2  5.0538 0.007409 **
## Residuals 165 124004   751.5
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```
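Using the values in the ANOVA table above, the F statistic can be verified as the ratio of the mean squares:

F = MSTR/MSE = 3798.2/751.5 ≈ 5.05

with degrees of freedom 3 - 1 = 2 for bowlers and 168 - 3 = 165 for the residuals, matching the reported value of 5.0538 up to rounding.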

We can conclude that the bowlers' mean scores are not all equal using the following hypothesis test:

H_{0}: µ_{7} = µ_{8} = µ_{12}

H_{1}: At least one mean is different

P-value = 0.007409 (from the ANOVA table)

The p-value is less than α = 0.05, so reject H_{0}: the three means are not all equal.

Next, we can use a multiple comparison test to determine which specific means are different. In this case, Bonferroni is the best test to use because we know ahead of time that we want to run pairwise comparisons. The results of this test are:

```
##
## Pairwise comparisons using t tests with pooled SD
##
## data:  Score and Bowler
##
##    7     8
## 8  1.000 -
## 12 0.015 0.027
##
## P value adjustment method: bonferroni
```

This table gives the Bonferroni-adjusted p-value for each pairwise hypothesis test. For example, here is the test for a difference between µ_{7} and µ_{8}:

H_{0}: µ_{7} = µ_{8}

H_{1}: µ_{7} ≠ µ_{8}

P-value = 1

The p-value is greater than α = 0.05, so do not reject H_{0}: there is no evidence of a difference between the mean scores of Bowler 7 and Bowler 8.
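For context, the Bonferroni method used by pairwise.t.test multiplies each raw p-value by the number of comparisons (three here) and caps the result at 1. The reported value of 1 for the 7-vs-8 comparison therefore means its raw p-value was at least 1/3, while the adjusted values for the other pairs correspond to raw p-values of 0.015/3 = 0.005 and 0.027/3 = 0.009.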

Similar tests can be conducted for the differences between Bowlers 7 and 12 and between Bowlers 8 and 12. Because the corresponding adjusted p-values from the table above (0.015 and 0.027) are both less than α = 0.05, we reject the null hypothesis in both cases and conclude that both pairs have significantly different mean scores.

# Testing Assumptions

We must test the regression assumptions for the model used above.

### Normality of Residuals

The normal Q-Q plot of the residuals (produced in the R code below) is nearly linear, suggesting normality. To be sure, we can more formally conduct a Shapiro-Wilk hypothesis test.

```
##
## Shapiro-Wilk normality test
##
## data: fit$residual
## W = 0.98914, p-value = 0.2247
```

H_{0}: ε_{i} ∼ Normal

H_{1}: ε_{i} ≁ Normal

P-value = 0.2247

The p-value is greater than α = 0.05, so do not reject H_{0}: the errors follow a normal distribution, and this assumption is met.

### Equal Variance of Residuals

We rely on this assumption later when testing for significant variability, so it is important to verify that it holds. We can do so by conducting Levene's test.

```
## Levene's Test for Homogeneity of Variance (center = median)
##        Df F value Pr(>F)
## group   2  1.7429 0.1782
##       165
```

H_{0}: Population variances are equal

H_{1}: Population variances are not equal

P-value = 0.1782

The p-value is greater than α = 0.05, so do not reject H_{0}: the population variances are equal, and this assumption is met.

# Variability in Bowlers

Next, we can test whether there is evidence of significant variability in scores across bowlers. The linear model above assumes the within-bowler variances are equal; to quantify bowler-to-bowler variability, we treat bowler as a random effect with variance component σ_{τ}^{2} and test whether it is significantly greater than zero.

To conduct this test, we use the random effects model. The F-statistic for the random effects model is the same as in the model above, since it still equals MSTR/MSE, so we can reuse the p-value of 0.007409. The assumption checks are also the same as for the first model.

H_{0}: σ_{τ}^{2} = 0

H_{1}: σ_{τ}^{2} > 0 (a variance cannot be negative)

P-value = 0.007409

The p-value is less than α = 0.05, so reject H_{0}: the bowler-to-bowler variance in mean scores is non-zero.

To estimate the value of σ_{τ}^{2}, one can use the formula σ̂_{τ}^{2} = (MSTR - MSE)/n. With n = 56 games per bowler, this gives (3798.2 - 751.5)/56 ≈ 54.41, so the variance in mean scores across bowlers is about 54.41.
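The estimator σ̂_{τ}^{2} = (MSTR - MSE)/n follows from the expected mean squares of the balanced one-way random effects model:

E(MSE) = σ^{2}

E(MSTR) = σ^{2} + nσ_{τ}^{2}

Setting the observed mean squares equal to their expectations and solving for σ_{τ}^{2} gives σ̂_{τ}^{2} = (MSTR - MSE)/n.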

# Conclusions

Of the three bowlers in the provided dataset, Bowler 7 and Bowler 8 are the only pair without a significant difference in mean scores, which makes sense given how close their sample means are in the descriptive statistics table. Bowlers 7 and 12 and Bowlers 8 and 12 both differ significantly in mean score. The within-bowler variances are assumed equal (supported by Levene's test), and the bowler-to-bowler variance component σ_{τ}^{2} was shown to be non-zero, with an estimated value of about 54.41.

# R Code

*Please see my R code for this project:*

```
dat = read.csv("C:/Users/Britney/Documents/R/STAT 481 Project 2/P2_Dataset2.csv")
dat
library(moments)
#Separating out each individual bowler
bowler7 = dat[c(1:56), c(1:3)]
bowler7
bowler8 = dat[c(57:112), c(1:3)]
bowler8
bowler12 = dat[c(113:168), c(1:3)]
bowler12
#Summary statistics for each bowler
summary(bowler7)
sd(bowler7$Score)
kurtosis(bowler7$Score)
skewness(bowler7$Score)
summary(bowler8)
sd(bowler8$Score)
kurtosis(bowler8$Score)
skewness(bowler8$Score)
summary(bowler12)
sd(bowler12$Score)
kurtosis(bowler12$Score)
skewness(bowler12$Score)
#Histograms and boxplots for each bowler's scores
par(mfrow = c(1, 3))
hist(bowler7$Score, main = "Bowler 7 Histogram", xlab = "Score", xlim = c(100, 300))
hist(bowler8$Score, main = "Bowler 8 Histogram", xlab = "Score", xlim = c(100, 300))
hist(bowler12$Score, main = "Bowler 12 Histogram", xlab = "Score", xlim = c(100, 300))
boxplot(bowler7$Score, main = "Bowler 7 Boxplot", ylab = "Score", ylim = c(100, 300))
boxplot(bowler8$Score, main = "Bowler 8 Boxplot", ylab = "Score", ylim = c(100, 300))
boxplot(bowler12$Score, main = "Bowler 12 Boxplot", ylab = "Score", ylim = c(100, 300))
attach(dat)
#Turn Bowler into a factor instead of an int
Bowler = as.factor(Bowler)
class(Bowler)
#Question 1: Differences in means
#Linear model to predict score from bowler
fit = lm(Score ~ Bowler)
fit
summary(fit)
anova(fit)
#Bonferroni is best since the pairwise comparisons were planned in advance;
#this provides the adjusted p-values
pairwise.t.test(x = Score,
                g = Bowler,
                p.adj = "bonferroni")
#Question 2: Variability
#The random effects F-test uses the same ANOVA table as the model above
anova(fit)
#Checking equal variance assumption
library(car)
leveneTest(fit)
#Checking normality of errors
par(mfrow = c(1, 1))
qqnorm(fit$residual)
qqline(fit$residual)
shapiro.test(fit$residual)
```
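As a supplement (not part of the original code), the variance component estimate reported above can be reproduced directly from the mean squares in the ANOVA table; a minimal sketch using the values printed earlier:

```
#Estimate the bowler-to-bowler variance component:
#sigma_tau^2-hat = (MSTR - MSE) / n, with n = 56 games per bowler
mstr <- 3798.2
mse  <- 751.5
n    <- 56
(mstr - mse) / n   #about 54.41
#With the fitted model available, the same mean squares
#can be pulled from anova(fit)$"Mean Sq"
```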