#Question 1
You are a data scientist for the state of California Department of Public Health. You’ve been tasked with giving a report to Governor Newsom every morning about the most current COVID-19 conditions at the county level. As it stands, the California Department of Public Health maintains a watch list of counties that are being monitored for worsening coronavirus trends. There are six criteria used to place counties on the watch list: Doing fewer than 150 tests per 100,000 residents daily (over a 7-day average) More than 100 new cases per 100,000 residents over the past 14 days… 25 new cases per 100,000 residents and an 8% test positivity rate 10% or greater increase in COVID-19 hospitalized patients over the past 3 days Fewer than 20% of ICU beds available Fewer than 25% ventilators available Of these 6 conditions, you are in charge of monitoring condition number 2. To do this job well, you should set up a reproducible framework to communicate the following in a way that can be updated every time new data is released: -cumulative cases in the 5 worst counties -total NEW cases in the 5 worst counties -A list of safe counties -A text report describing the total new cases, total cumulative cases and number of safe counties. You should build this analysis in such a way that running it will extract the most current data straight from the NY-Times URL, the state name is a parameter that can be changed allowing this report to be run for other states.
library(tidyverse)
library(knitr)
library(readxl)
library(zoo)
library(ggthemes)
library(scales)
Pop = read_excel("../data/lab-02-materials 2/PopulationEstimates.xls", skip = 2) %>%
select(pop2019 = POP_ESTIMATE_2019, fips = FIPStxt, Area_Name)
housing_prices = read_csv("../data/lab-02-materials 2/landdata-states.csv")
covid_data = read_csv("https://raw.githubusercontent.com/nytimes/covid-19-data/master/us-counties.csv")
#Question 1
library(tidyverse)
library(knitr)
library(readxl)
library(zoo)
library(ggthemes)
library(scales)
q1subset = inner_join(Pop, covid_data,by = 'fips') %>%
filter(state == "California") %>%
group_by(county) %>%
mutate(daily_new_cases = cases - lag(cases)) %>%
ungroup()
mostcases = q1subset %>%
filter(date == max(date)) %>%
slice_max(cases, n = 5) %>%
select(county, cases)
knitr::kable(mostcases, caption = paste("California", 'counties with the most cumulative cases'), col.names = c('County', 'Cases'))
County | Cases |
---|---|
Los Angeles | 253985 |
Riverside | 55073 |
Orange | 52121 |
San Bernardino | 50699 |
San Diego | 42742 |
mostcases_new = q1subset %>%
filter(date == max(date)) %>%
slice_max(daily_new_cases, n = 5) %>%
select(county, daily_new_cases)
knitr::kable(mostcases_new, caption = paste("California", 'counties with the most new cases'), col.names = c('County', 'New Cases'))
County | New Cases |
---|---|
Los Angeles | 809 |
San Diego | 265 |
Orange | 185 |
Fresno | 159 |
San Bernardino | 156 |
library(tidyverse)
last14days = q1subset %>%
filter(date > max(date) - 13) %>%
group_by(county, pop2019) %>%
summarise(daily_new_cases = sum(daily_new_cases, na.rm = TRUE)) %>%
ungroup() %>%
mutate(casePer100000 = daily_new_cases / (pop2019 / 100000)) %>%
filter(casePer100000 <= 100) %>%
select(county, casePer100000)
knitr::kable(last14days, caption = c("Pop Lowest Cases in CA Counties"), col.names = c('County', 'Lowest New Cases per 100000'))
County | Lowest New Cases per 100000 |
---|---|
Alpine | 0.00000 |
Amador | 95.59267 |
Del Norte | 21.57342 |
El Dorado | 34.74329 |
Humboldt | 43.52381 |
Inyo | 94.24026 |
Lake | 91.63483 |
Lassen | 39.25032 |
Mariposa | 11.62588 |
Mendocino | 97.98384 |
Mono | 6.92329 |
Napa | 90.74805 |
Nevada | 36.08842 |
Placer | 78.57826 |
Plumas | 15.95151 |
Riverside | 87.59197 |
San Francisco | 91.65684 |
San Luis Obispo | 85.12562 |
Shasta | 23.32297 |
Sierra | 0.00000 |
Siskiyou | 41.34224 |
Solano | 68.80483 |
Tehama | 79.89675 |
Trinity | 16.28002 |
Tuolumne | 56.90370 |
###Results: Cumulative case counts of Covid-19 are the highest in these counties: Los Angeles, Riverside, Orange, San Bernardino, and San Diego. The counties with the highest number of recent, new cases include Los Angeles, San Bernardino, Fresno, Sacramento, and Orange. According to my data frame of “last14days” there are 12 counties with cases per 100,000 less than 100. These counties can be said to be safe. Based on these findings, it is safe to say that population density and total population play a role in whether or not a county has seen a lot of cases, or none at all.
Question 2 #In this question, we are going to look at the story of 4 states and the impact scale can have on data interprtation. The states include: New York, California, Louisiana, and Florida.
#Your task is to make a faceted bar plot showing the number of daily, new cases at the state level.
fourStates = covid_data %>%
filter(state %in% c("New York", "California", "Louisiana", "Florida")) %>%
group_by(state, date) %>%
summarise(totalcases = sum(cases)) %>%
ungroup() %>%
group_by(state) %>%
mutate(newCases = totalcases - lag(totalcases)) %>%
mutate(roll7 = zoo::rollmean(newCases, 7, fill = NA, allign = "right")) %>%
ungroup()
ggplot(data = fourStates, aes(x = date)) +
geom_col(aes(y = newCases), col = NA, fill = "pink") +
geom_line(aes(y = roll7)) +
ggthemes::theme_economist() +
labs(title = paste("Daily New Cases in NY, CA, LA, FL")) +
facet_wrap(~state) +
theme(plot.background = element_rect(fill = "white"), panel.background = element_rect(fill = "white"),
plot.title = element_text(size = 10)) +
theme(aspect.ratio = .5)
fourStates_pc = covid_data %>%
filter(state %in% c("New York", "California", "Louisiana", "Florida")) %>%
left_join(Pop, by = c("state" = "Area_Name")) %>%
group_by(state, date) %>%
summarise(totalcases = sum(cases), pop2019 = pop2019[1]) %>%
ungroup() %>%
group_by(state) %>%
mutate(newCases = (totalcases - lag(totalcases)) / pop2019) %>%
mutate(roll7 = (zoo::rollmean(newCases, 7, fill = NA, allign = "right"))) %>%
ungroup()
ggplot(data = fourStates_pc, aes(x = date)) +
geom_col(aes(y = newCases), col = NA, fill = "pink") +
geom_line(aes(y = roll7)) +
ggthemes::theme_economist() +
labs(title = paste("Daily New Cases in NY, CA, LA, FL")) +
facet_wrap(~state) +
theme(plot.background = element_rect(fill = "white"), panel.background = element_rect(fill = "white"),
plot.title = element_text(size = 10)) +
theme(aspect.ratio = .5)
###Results: By scaling the covid-19 data, It makes California look a lot better. They all have the same shape in graph when compared with one another, but California can be seen to have the most dramatic change.