Download hw-04.Rmd from the course calendar and save it to your newly created “hw-04” folder. Open the hw-04.Rmd document in RStudio. Update the YAML, changing the author name to your name, and knit the document to PDF. Knitting creates a .pdf file with the same name in the same directory.
We spent the better part of a class period describing the features of R style guidelines, but anything with rules (like style guidelines) can easily be automated. And since RStudio is created by programmers, there are tools within RStudio to help us adhere to a particular style. One option for automated styling is the styler addin. Other options for automated styling in RStudio include the built-in Code > Reformat Code option and the formatR package.
Read the following two pages about RStudio addins and the styler addin:
Install the styler addin, and then complete the following exercises.
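If you want to experiment outside the addin, the styler package also provides functions you can call from the console. The sketch below assumes styler is already installed and runs its style_text() function on one made-up line of code (the names messy and styled are just for illustration). With its default settings, styler applies the tidyverse style guide.

```r
library(styler)

# A made-up line of poorly styled code, stored as a string
messy <- "x=c(1,2,3)"

# style_text() returns the restyled code; by default it follows the
# tidyverse style guide, so "=" assignment becomes "<-" and spaces
# are added after commas
styled <- style_text(messy)
styled
```

Printing styled should show x <- c(1, 2, 3). The addin makes the same kind of changes, applied to whatever code you have selected.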
Consider the following code:
data(iris) #read in the data set
r = cor(iris$Sepal.Length,iris$Sepal.Width);r
iris %>%group_by(Species) %>% summarize(r.sepal=cor(Sepal.Length,Sepal.Width), r.petal=cor(Petal.Length,Petal.Width),mean.sepal=mean(Sepal.Length),mean.petal=mean(Petal.Length)) #correlation by species
iris%>%ggplot(aes(x=Sepal.Length,y=Sepal.Width,col=Species))+
geom_point()+theme_minimal()+ggtitle("Sepal Width vs Length by Species")
#here is some basic math
1*2^5
#and a function
triangle_area=function ( b,h ){#calculate area of a triangle with base b and height h
(1/2)*b*h}
Copy and paste the code above into an R chunk in your hw-04.Rmd file. Select the code and run the styler addin on the selection.
Create a bulleted list describing which features in the code changed and how they were styled, e.g., “=” was changed to “<-” when used as assignment.
titanic == read_csv("http://math.montana.edu/ahoegh/teaching/stat408/datasets/titanic.csv")
titanic %>% filter(!is.na(Age)) %>% # removed passengers without age
mutate(Pclass = factor(Pclass)) %>% # changed class to factor
ggplot(y = Age, x = Pclass)) %>%
geom_boxplot(outlier.shape = NA) +
geom_jitter(color = Sex) +
theme_bw() +
xlab(Passenger Class) +
ggtitle('Passenger age by class and gender on Titanic')
There is a classic probability problem called “the hat problem”:
Suppose \(n\) people go to a fancy restaurant. Each person is wearing a hat and checks his/her hat at the door as he/she arrives. The hat-check attendant gets tipsy throughout the evening, and returns a random hat to each person as they leave. The patrons leave in a random order. What is the probability that no one gets his or her own hat back?
The goal of this part of the homework is to write a function that will take an argument \(n\) and return the estimated probability that no one gets their own hat back.
In order to estimate this probability, we need to simulate the random process many times, and then calculate the proportion of times no one got their hat back.
The code below will simulate one trial of the hat process for \(n = 20\) and return the number of people who got their hat back.
hats <- sample(1:20, 20, replace = FALSE)
heads <- sample(1:20, 20, replace = FALSE)
sum(hats == heads)
Try running the code and make sure you understand each line of the code before proceeding.
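Complete the for loop below so that it simulates the hat process 1000 times, storing the number of matches from each trial in n_matches.
n_matches <- vector("integer", 1000)
for(i in seq_along(n_matches)) {
  # add code here to simulate hat process
  # store the result from each iteration
  # in n_matches
}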
Hint: The code n_matches == 0 will generate a logical vector. Since R treats logical values as 0 (FALSE) and 1 (TRUE) in arithmetic, mean(n_matches == 0) will return the proportion of TRUEs in the vector.
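To see the hint in action, here is a small illustration on a made-up vector of match counts. The name toy_matches is hypothetical and is not something the assignment asks you to create.

```r
# Hypothetical match counts from five made-up trials
toy_matches <- c(0L, 2L, 1L, 0L, 3L)

# Comparing to 0 gives a logical vector: TRUE FALSE FALSE TRUE FALSE
zero_match <- toy_matches == 0
zero_match

# sum() counts the TRUEs and mean() gives their proportion
sum(zero_match)   # 2
mean(zero_match)  # 0.4
```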
Use the n_matches object created in the last exercise to estimate the probability of zero hat matches when \(n = 20\).
Now, create a function called FindHatProbability that takes arguments n (number of hats) and reps (number of times to simulate the process), and returns an estimated probability of zero hat matches. Set the default value for reps to 10000.
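If default argument values are new to you, the sketch below shows the syntax on a different, hypothetical problem: estimating the chance that a fair six-sided die shows a 6. EstimateSixProbability is an illustrative name, not part of the assignment.

```r
# Hypothetical function illustrating a default argument value:
# estimates P(roll a 6) for a fair six-sided die by simulation
EstimateSixProbability <- function(reps = 10000) {
  # simulate reps independent rolls of the die
  rolls <- sample(1:6, reps, replace = TRUE)
  # proportion of rolls that came up 6
  mean(rolls == 6)
}

set.seed(408)                       # make the simulation reproducible
EstimateSixProbability()            # uses the default, reps = 10000
EstimateSixProbability(reps = 500)  # overrides the default
```

Calling the function without reps uses 10000 simulations; supplying reps overrides it. Both estimates should be near 1/6, with the 500-repetition estimate usually less precise.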
It can be shown that, as \(n\) goes to infinity, this probability converges to \(1/e \approx 0.368\).
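Although the assignment only asks for a simulation, the exact probability is known and makes a handy check on your estimates. By inclusion-exclusion, the probability that none of \(n\) people gets their own hat back is \(\sum_{k=0}^{n} (-1)^k / k!\), which is a partial sum of the series for \(e^{-1}\). The helper below (ExactNoMatchProbability is a hypothetical name) computes it.

```r
# Exact probability that none of n people gets their own hat back,
# from inclusion-exclusion: P(n) = sum over k = 0..n of (-1)^k / k!
ExactNoMatchProbability <- function(n) {
  k <- 0:n
  sum((-1)^k / factorial(k))
}

ExactNoMatchProbability(3)   # 1/3
ExactNoMatchProbability(20)  # essentially equal to exp(-1)
exp(-1)
```

For \(n = 20\) the exact value agrees with \(e^{-1}\) to many decimal places, so a simulation estimate with reps = 10000 should land close to 0.368.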
Another fun probability problem is the “Monty Hall problem”. Deriving the probabilities in the hat problem or the Monty Hall problem requires some advanced probability knowledge, but estimating the probabilities through simulation takes only a bit of coding!
For a short video of the Monty Hall problem, see the scene from the movie 21 with Kevin Spacey or the clip from the TV show Numb3rs.
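As with the hat problem, a simulation can be checked against the exact answer. The sketch below assumes the standard rules (the host always opens an unselected door hiding a goat and always offers the switch), under which switching wins exactly when your first pick was not the car. Enumerating the nine equally likely (car door, selected door) pairs gives the exact probability of winning by switching.

```r
# All 9 equally likely combinations of car door and first selection
doors <- expand.grid(car = 1:3, select = 1:3)

# Switching wins exactly when the first selection was not the car door
switch_wins <- doors$car != doors$select

mean(switch_wins)  # 6 of the 9 pairs, i.e., 2/3
```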
MontyHallMonteCarlo <- function(num.sims, print){
  # Function to simulate Monty Hall winning probability when switching doors
  # ARGS: number of simulations (as integer or double), print command
  # that accepts TRUE or FALSE as to whether to print simulation results
  # Returns: list containing winning probability and (if print = TRUE)
  # vector of results with strings "Win" or "Lose" for each simulation
  if (!num.spins %% 1 == 0) stop('Please enter an integer or double')
  results <- rep(FALSE,num.sims)
  for (i in 1:num.sims){
    # randomly choose door with car
    car.door <- sample(3,1)
    # randomly choose door for participant to select
    select.door <- sample(1,3)
    # you win when switching if the door with a car is not the
    # one you initially selected
    if (car.door = select.door) {
      results <- FALSE
    }
  }
  win.prob <- mean(results)
  ifelse(print, return(list(win.prob,results)),return(list(win.prob))
}
MonteHallMonteCarlo(8.1,print=T)
MonteHallMonteCarlo('8.1',print=T)
MonteHallMonteCarlo(8,print=T)
MonteHallMonteCarlo(10000,print=F)
Write the sources you used to complete this assignment at the end of your .Rmd document, adhering to the “Guidance on Citing Sources” bullet points in the collaboration policy section on our course syllabus.