#Question 1

You are a data scientist for the state of California Department of Public Health. You’ve been tasked with giving a report to Governor Newsom every morning about the most current COVID-19 conditions at the county level. As it stands, the California Department of Public Health maintains a watch list of counties that are being monitored for worsening coronavirus trends. There are six criteria used to place counties on the watch list: Doing fewer than 150 tests per 100,000 residents daily (over a 7-day average) More than 100 new cases per 100,000 residents over the past 14 days… 25 new cases per 100,000 residents and an 8% test positivity rate 10% or greater increase in COVID-19 hospitalized patients over the past 3 days Fewer than 20% of ICU beds available Fewer than 25% ventilators available Of these 6 conditions, you are in charge of monitoring condition number 2. To do this job well, you should set up a reproducible framework to communicate the following in a way that can be updated every time new data is released: -cumulative cases in the 5 worst counties -total NEW cases in the 5 worst counties -A list of safe counties -A text report describing the total new cases, total cumulative cases and number of safe counties. You should build this analysis in such a way that running it will extract the most current data straight from the NY-Times URL, the state name is a parameter that can be changed allowing this report to be run for other states.

library(tidyverse)
library(knitr)
library(readxl)
library(zoo)
library(ggthemes)
library(scales)
Pop = read_excel("../data/lab-02-materials 2/PopulationEstimates.xls", skip = 2) %>% 
  select(pop2019 = POP_ESTIMATE_2019, fips = FIPStxt, Area_Name)

housing_prices = read_csv("../data/lab-02-materials 2/landdata-states.csv")

covid_data = read_csv("https://raw.githubusercontent.com/nytimes/covid-19-data/master/us-counties.csv")
#Question 1
library(tidyverse)
library(knitr)
library(readxl)
library(zoo)
library(ggthemes)
library(scales)


q1subset = inner_join(Pop, covid_data,by = 'fips') %>%
  filter(state == "California") %>%
  group_by(county) %>%
  mutate(daily_new_cases = cases - lag(cases)) %>%
  ungroup()
mostcases = q1subset %>%
  filter(date == max(date)) %>%
  slice_max(cases, n = 5) %>% 
  select(county, cases)

knitr::kable(mostcases, caption = paste("California", 'counties with the most cumulative cases'), col.names = c('County', 'Cases'))
California counties with the most cumulative cases
County Cases
Los Angeles 253985
Riverside 55073
Orange 52121
San Bernardino 50699
San Diego 42742
mostcases_new = q1subset %>%
  filter(date == max(date)) %>%
  slice_max(daily_new_cases, n = 5) %>% 
  select(county, daily_new_cases)

knitr::kable(mostcases_new, caption = paste("California", 'counties with the most new cases'), col.names = c('County', 'New Cases'))
California counties with the most new cases
County New Cases
Los Angeles 809
San Diego 265
Orange 185
Fresno 159
San Bernardino 156
library(tidyverse)
last14days = q1subset %>% 
  filter(date > max(date) - 13) %>% 
  group_by(county, pop2019) %>% 
  summarise(daily_new_cases = sum(daily_new_cases, na.rm = TRUE)) %>% 
  ungroup() %>% 
  mutate(casePer100000 = daily_new_cases / (pop2019 / 100000)) %>% 
  filter(casePer100000 <= 100) %>% 
  select(county, casePer100000)

knitr::kable(last14days, caption = c("Pop Lowest Cases in CA Counties"), col.names = c('County', 'Lowest New Cases per 100000'))
Pop Lowest Cases in CA Counties
County Lowest New Cases per 100000
Alpine 0.00000
Amador 95.59267
Del Norte 21.57342
El Dorado 34.74329
Humboldt 43.52381
Inyo 94.24026
Lake 91.63483
Lassen 39.25032
Mariposa 11.62588
Mendocino 97.98384
Mono 6.92329
Napa 90.74805
Nevada 36.08842
Placer 78.57826
Plumas 15.95151
Riverside 87.59197
San Francisco 91.65684
San Luis Obispo 85.12562
Shasta 23.32297
Sierra 0.00000
Siskiyou 41.34224
Solano 68.80483
Tehama 79.89675
Trinity 16.28002
Tuolumne 56.90370

###Results: Cumulative case counts of Covid-19 are the highest in these counties: Los Angeles, Riverside, Orange, San Bernardino, and San Diego. The counties with the highest number of recent, new cases include Los Angeles, San Bernardino, Fresno, Sacramento, and Orange. According to my data frame of “last14days” there are 12 counties with cases per 100,000 less than 100. These counties can be said to be safe. Based on these findings, it is safe to say that population density and total population play a role in whether or not a county has seen a lot of cases, or none at all.

Question 2 #In this question, we are going to look at the story of 4 states and the impact scale can have on data interprtation. The states include: New York, California, Louisiana, and Florida.

#Your task is to make a faceted bar plot showing the number of daily, new cases at the state level.

fourStates = covid_data %>%
  filter(state %in% c("New York", "California", "Louisiana", "Florida")) %>%
  group_by(state, date) %>%
  summarise(totalcases = sum(cases)) %>%
  ungroup() %>%
  group_by(state) %>%
  mutate(newCases = totalcases - lag(totalcases)) %>%
  mutate(roll7 = zoo::rollmean(newCases, 7, fill = NA, allign = "right")) %>% 
  ungroup()

ggplot(data = fourStates, aes(x = date)) + 
  geom_col(aes(y = newCases), col = NA, fill = "pink") +
  geom_line(aes(y = roll7)) +
  ggthemes::theme_economist() +
  labs(title = paste("Daily New Cases in NY, CA, LA, FL")) +
  facet_wrap(~state) +
  theme(plot.background = element_rect(fill = "white"), panel.background = element_rect(fill = "white"), 
        plot.title = element_text(size = 10)) +
  theme(aspect.ratio  = .5)

fourStates_pc = covid_data %>%
  filter(state %in% c("New York", "California", "Louisiana", "Florida")) %>%
  left_join(Pop, by = c("state" = "Area_Name")) %>% 
  group_by(state, date) %>%
  summarise(totalcases = sum(cases), pop2019 = pop2019[1]) %>%
  ungroup() %>%
  group_by(state) %>%
  mutate(newCases = (totalcases - lag(totalcases)) / pop2019) %>%
  mutate(roll7 = (zoo::rollmean(newCases, 7, fill = NA, allign = "right"))) %>% 
  ungroup()


ggplot(data = fourStates_pc, aes(x = date)) + 
  geom_col(aes(y = newCases), col = NA, fill = "pink") +
  geom_line(aes(y = roll7)) +
  ggthemes::theme_economist() +
  labs(title = paste("Daily New Cases in NY, CA, LA, FL")) +
  facet_wrap(~state) +
  theme(plot.background = element_rect(fill = "white"), panel.background = element_rect(fill = "white"), 
        plot.title = element_text(size = 10)) +
  theme(aspect.ratio  = .5)

###Results: By scaling the covid-19 data, It makes California look a lot better. They all have the same shape in graph when compared with one another, but California can be seen to have the most dramatic change.