Catégories
Non classé

Does 100 m equal 1 km?

Climbing or not climbing?

Photo : Alexis Martín

In trail running or orienteering people say that if you have to run 100 m of elevation up and down it would take the same time as running flat for 1000 m. We can find a similar old rule of thumb and more recently, some researchers (Davey, Hayes & Norman, 1994 ; Scarf, 2007) added a little science to this vernacular knowledge but with few data points (2 athletes and 300 race results, respectively).

Could we add some modern massive data to check this saying ? Using our dataset scraped from ITRA on 16 949 race results from 2 802 runners, we can fit a basic non linear model to estimate the parameters :

> (model <- results %>% 
>   nls(heures ~ (dist_tot + deniv_tot / k) / v, data = .,
>      start = list(k = 100, v = 8)))

Nonlinear regression model
  model: heures ~ (dist_tot + deniv_tot/k)/v
   data: .
     k      v 
87.239  9.565 
 residual sum-of-squares: 313542

Number of iterations to convergence: 3 
Achieved convergence tolerance: 1.292e-06

> confint(model)

Waiting for profiling to be done...
       2.5%     97.5%
k 81.672165 93.296957
v  9.332102  9.809812

So we see that the average flat speed sustainable over a long period of our sample (which is biased towards elite runners) is around 9.6 km⋅h-1 and that 1 km flat is equivalent to 87 m [82 – 93] of height gain, not far from the old 100 m. Of course these values will vary according the athlete shape, the total race length and profile and many other parameters…

References

Davey, R. C., Hayes, M., & Norman, J. M. (1994). Running uphill : An experimental result and its applications. The Journal of the Operational Research Society, 45(1), 25. https://doi.org/10.2307/2583947

Scarf, P. (2007). Route choice in mountain navigation, Naismith’s rule, and the equivalence of distance and climb. Journal of Sports Sciences, 25(6), 719‑726. https://doi.org/10.1080/02640410600874906

Catégories
Non classé

Playing with Roman numerals

What is the longest year number (yet) written in Roman numerals ?

library(tidyverse)

tibble(y = 1:2020) %>%
  mutate(r = as.roman(y),
         l = str_length(r)) %>%
  slice_max(l)
# A tibble: 1 x 3
      y r                 l
  <int> <roman>       <int>
1  1888 MDCCCLXXXVIII    13

It is year 1888, with 13 characters…

And the largest writable number being 3899, according to the strict rules in R (however some say it’s 3999),

tibble(y = 1:5000) %>%
	mutate(r = as.roman(y),
	       l = str_length(r)) %>%
	filter(lead(is.na(l))) %>% 
  	slice_min(l)
# A tibble: 1 x 3
      y r               l
  <int> <roman>     <int>
1  3899 MMMDCCCXCIX    11

the longest overall year will be year 3888 with 15 characters.

tibble(y = 1:3899) %>%
	mutate(r = as.roman(y),
	       l = str_length(r)) %>%
	slice_max(l)
# A tibble: 1 x 3
      y r                   l
  <int> <roman>         <int>
1  3888 MMMDCCCLXXXVIII    15

Nice pattern :

tibble(y = 1:3899) %>%
	mutate(r = as.roman(y),
	       l = str_length(r)) %>%
	ggplot(aes(y, l)) +
	# geom_col(width = 1) +
	geom_point(alpha = .2) +
	# geom_line(alpha = .5) +
	geom_smooth() +
	labs(title = "Characters length of roman numeral years",
		 x = "year",
		 y = "characters")
Characters length of roman numeral years

And there are only eleven palindromic years :

tibble(y = 1:3899) %>%
	mutate(r = as.character(as.roman(y)),
	       rr = stringi::stri_reverse(r)) %>% 
	filter(r == rr,
	       str_length(r) > 1)
# A tibble: 11 x 3
       y r     rr   
   <int> <chr> <chr>
 1     2 II    II   
 2     3 III   III  
 3    19 XIX   XIX  
 4    20 XX    XX   
 5    30 XXX   XXX  
 6   190 CXC   CXC  
 7   200 CC    CC   
 8   300 CCC   CCC  
 9  1900 MCM   MCM  
10  2000 MM    MM   
11  3000 MMM   MMM  
Catégories
Non classé

Trail running : is it worth starting?

Should I stay or should I go ?

the Clash

https://www.flickr.com/photos/alexismarod/33815546074/

Trail running can be hard (Malliaropoulos, Mertyri and Tsaklis 2015; Hespanhol Junior, van Mechelen and Verhagen 2017; Istvan et al. 2019; Messier et al. 2018). During one, I was wondering if we can avoid the race completely and predict the race time according to distance, elevation change, sex and performance index (so yes, we have to run sometimes to measure it).

We need a dataset to play with. ITRA maintains a quite comprehensive database (International Trail Running Association 2020), but no current dataset is openly available. However we can use the hidden JSON API to scrape a sample. I created a R package {itra} for this purpose ; with it we can get some runners and their results:

NOTE (2021-09) : the ITRA website has changed, so the package doesn’t work any longer.

remotes::install_gitlab("mdagr/itra")
library(itra)
library(tidyverse)
library(glue)
library(hms)
library(lubridate)
library(gghighlight)
library(progress)

A test:

itra_runners("jornet")
#> # A tibble: 5 x 15
#>   id_coureur     id  cote nom   prenom quartz team      M L     XL      XXL
#>        <int>  <int> <int> <chr> <chr>   <int> <chr> <int> <lgl> <lgl> <int>
#> 1       2704 2.70e3   950 JORN… Kilian     20 ""      898 NA    NA      896
#> 2     707322 7.07e5   538 RENU… Arturo      0 ""       NA NA    NA       NA
#> 3    2667939 2.67e6   520 PARE… Samuel      0 ""       NA NA    NA       NA
#> 4    2673933 2.67e6   468 PÉRE… Anna        0 ""       NA NA    NA       NA
#> 5    2673935 2.67e6   468 RIUS… Sergi       0 ""       NA NA    NA       NA
#> # … with 4 more variables: S <int>, XS <int>, XXS <lgl>, palmares <chr>

It works…

Get the data

So, we’re going to build a daframe of single letters and random page index to create our dataset of runners. I’ll manually add some low index to oversample the elite athletes otherwise we will probably miss them entirely.

# helpers -----------------------------------------------------------------

# compute the mode (to find the runner sex from its details ; see below)
stat_mode <- function(x, na.rm = FALSE) {
  if(na.rm) {
    x <- x[!is.na(x)]
  }
  
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}


# sample -------------------------------------------------------------

# progress bar versions of ITRA functions (scraping will be slow, we want to know if it works)
pb_itra_runners <- function(...) {
  pb$tick()
  itra_runners(...)
}

pb_itra_runner_results <- function(...) {
  pb$tick()
  itra_runner_results(...)
}

pb_itra_runner_details <- function(...) {
  pb$tick()
  itra_runner_details(...)
}

# dataset generation from a "random" set of index
#
oversample_high_score <- c(0, 50, 100, 250, 500)
my_sample <- tibble(l = letters,
                 n = c(oversample_high_score, 
                       sample(0:100000,
                              length(letters) - length(oversample_high_score)))) %>% 
  expand(l, n) %>% 
  distinct(l, n) 

We have 676 tuples of (letter, index). We can now start querying the website, we should get 5 runners (by default) for each tuple. But we will have some duplicates because for example at index 0 we will get « Kilian Jornet » for the letters « k », « i », « l »…

I use purr::map2 to iterate along our sample, using slowly to avoid overloading the website and possibly to get NA instead of stopping errors.

You’ll see that the JSON variables have french or english names!

#  querying runners --------

pb <- progress_bar$new(format = "[:bar] :current/:total remaining: :eta",
             total = nrow(my_sample))

my_dataset <- my_sample %>% 
  mutate(runners = map2(l, n, possibly(slowly(pb_itra_runners, rate_delay(.5)), NA)))

# runners list
runners <- my_dataset %>% 
  drop_na(runners) %>% 
  unnest(runners) %>% 
  distinct(id_coureur, .keep_all = TRUE) %>% 
  select(-l, -n)

Here instead of the nominal 3380 runners I get only 2796…

Then we can query for their race results and their sex.

pb <- progress_bar$new(format = "[:bar] :current/:total remaining: :eta",
                       total = nrow(runners))
results <- runners %>% 
  select(id, first_name = prenom, last_name = nom) %>% 
  mutate(runner_results = pmap(., possibly(slowly(pb_itra_runner_results, rate_delay(.5)), NA))) %>% 
  unnest(runner_results)

# performance index by trail category and sex
pb <- progress_bar$new(format = "[:bar] :current/:total remaining: :eta",
             total = nrow(runners))
runners_details <- runners %>% 
  select(id) %>% 
  mutate(details = map(id, possibly(slowly(pb_itra_runner_details, rate_delay(.5)), NA))) %>% 
  unnest(details)

Some preprocessing is necessary. Sex is not a direct property of the runners, we must get it from runners_details. We then join our 3 dataframes and compute some new variables.

Levels come from:

https://itra.run/content/faq-indice-perfomance
# preprocessing ----
runners_sex <- runners_details %>% 
  group_by(id) %>% 
  summarise(sex = stat_mode(sexe, na.rm = TRUE))

results_clean <- results %>% 
  rename(cote_course = cote) %>% 
  left_join(select(runners, -nom, -prenom), by = "id") %>% 
  left_join(runners_sex, by = "id") %>% 
  mutate(dist_tot = as.numeric(dist_tot),
       dt = ymd(dt),
       time = lubridate::hms(temps),
       hours = as.numeric(time, "hours"),
       kme = dist_tot + deniv_tot / 100,
       level = factor(
                  case_when(
                   (cote > 825 & sex == "H") | (cote > 700 & sex == "F") ~ "elite",
                   (cote > 750 & sex == "H") | (cote > 625 & sex == "F") ~ "expert",
                   (cote > 650 & sex == "H") | (cote > 550 & sex == "F") ~ "advanced",
                   (cote > 500 & sex == "H") | (cote > 475 & sex == "F") ~ "strong",
                   (cote > 350 & sex == "H") | (cote > 350 & sex == "F") ~ "intermediate",
                   is.na(sex) ~ NA_character_,
                   TRUE ~ "novice"), 
                 levels = c("novice", "intermediate", "strong", "advanced", "expert", "elite")))
  

Exploratory analysis

Having a quick look at the dataset…

# analysis -----------------------------------------------------------------

# runners' ITRA performance index
runners %>% 
  left_join(runners_sex) %>% 
  ggplot(aes(cote, fill = sex)) +
    geom_histogram(binwidth = 10)
Most athletes have an ITRA score between 300 and 700. We can also see the results of our oversampling with many athletes added above 700. We can notice trail running is sadly a male (H) sport.
# distance ran
results_clean %>% 
  ggplot(aes(dist_tot)) +
    geom_histogram(binwidth = 2)
We can see the common distances at 25, 50, 100 km and 160 km (100 miles)
# times ran
results_clean %>% 
  ggplot(aes(hours)) +
    geom_histogram(binwidth = 1 / 6)
Most athletes run about 4 hours
# by level
results_clean %>% 
  drop_na(level) %>% 
  filter(hours != 0) %>% 
  ggplot(aes(kme, hours, color = level)) +
    geom_point(alpha = .3) +
    gghighlight() +
    coord_cartesian(xlim = c(0, 320), ylim = c(0, 65)) +
    facet_wrap(~ level)
Novices are slow and don’t run long races. Elites are fast from short races to ultra-trail

Prediction

Now, back to my initial question. Can I just predict my race time and avoid a long pain by not running?

I will use a random forest. Since the rows are not independent (we have several results per runner), I use a group kfold on the runner id.

library(caret)
library(ranger)

results_rf <- results_clean %>% 
	drop_na(hours, cote)
	
# grouped CV folds
group_fit_control <- trainControl(
	index = groupKFold(results_rf$id, k = 10),
	method = "cv")

rf_fit <- train(hours ~ dist_tot + deniv_tot + deniv_neg_tot + cote + sex, 
				data = results_rf, 
				method = "rf",
				trControl = group_fit_control)
rf_fit
Random Forest 

18339 samples
    5 predictors

No pre-processing
Resampling: Cross-Validated (10 fold) 
Summary of sample sizes: 16649, 16704, 16338, 16524, 16407, 16223, ... 
Resampling results across tuning parameters:

  mtry  splitrule   RMSE      Rsquared   MAE      
  2     variance    1.822388  0.9472348  0.8747978
  2     extratrees  1.844228  0.9463288  0.8810525
  3     variance    1.827556  0.9469227  0.8801065
  3     extratrees  1.847274  0.9459055  0.8800262
  5     variance    1.873645  0.9440697  0.9021319
  5     extratrees  1.853316  0.9455854  0.8882530

Tuning parameter 'min.node.size' was held constant at a value of 5
RMSE was used to select the optimal model using the smallest value.
The final values used for the model were mtry = 2, splitrule = variance
 and min.node.size = 5.

So with a root mean square error of 1.82 hours, I can’t say that my model is super precise. However in my case, my last 16 km (01:40:00) is pretty well predicted!

predict(rf_fit, newdata = tibble(dist_tot = 16,
								 deniv_tot = 600,
								 deniv_neg_tot = 600,
								 cote = 518,
								 sex = "H")) %>% 
	magrittr::multiply_by(3600) %>% 
	floor() %>% 
	seconds_to_period() %>% 
	hms::hms()
# 01:37:42

So next time I’ll just run the model instead of running the race.

I’ll leave the tuning to the specialists. While writing this post I found another (Fogliato, Oliveira, and Yurko 2020) potential similar use for the {itra} package.

References

  • Fogliato, Riccardo, Natalia L. Oliveira and Ronald Yurko. « TRAP: A Predictive Framework for Trail Running Assessment of Performance ». arXiv:2002.01328 [stat], 12 july 2020. http://arxiv.org/abs/2002.01328.
  • Hespanhol Junior, Luiz Carlos, Willem van Mechelen and Evert Verhagen. 2017. « Health and Economic Burden of Running-Related Injuries in Dutch Trailrunners: A Prospective Cohort Study ». Sports Medicine 47 (2): 367‑77. https://doi.org/10.1007/s40279-016-0551-8.
  • International Trail Running Association. « ITRA », 2020. https://itra.run/.
  • Istvan, Albertus Oosthuizen, Paul Yvonne, Jeremy Ellapen Terry, Barnard Marco, Bongani Qumbu Timothy, Valery Hammill Henriëtte and Lukas Strydom Gert. 2019. « Common ultramarathon trail running injuries and illnesses: A review (2007-2016) ». International Journal of Medicine and Medical Sciences 11 (4): 36‑42. https://doi.org/10.5897/IJMMS2018.1386.
  • Malliaropoulos, Nikolaos, Dimitra Mertyri and Panagiotis Tsaklis. 2015. « Prevalence of Injury in Ultra Trail Running ». Human Movement 16 (2). https://doi.org/10.1515/humo-2015-0026.
  • Messier, Stephen P., David F. Martin, Shannon L. Mihalko, Edward Ip, Paul DeVita, D. Wayne Cannon, Monica Love et al. 2018. « A 2-Year Prospective Cohort Study of Overuse Running Injuries: The Runners and Injury Longitudinal Study (TRAILS) ». The American Journal of Sports Medicine 46 (9): 2211‑21. https://doi.org/10.1177/0363546518773755.
Catégories
Non classé

Using the geofacet package to spatially arrange plots

The {geofacet} package allows to « arrange a sequence of plots of data for different geographical entities into a grid that strives to preserve some of the original geographical orientation of the entities ».

Like the previous post, it’s interesting if you view each entity as a unit and don’t care for its real size or weight, and don’t want to spend too much time manually finding the best grid.

We will again use the same COVID-19 dataset. We manually add the overseas départements once we have found the right grid (by trying different seeds) and adjust Corsica position.

COVID-19 deceased in hospital, by département, for 100 000 inhab.
# packages ----------------------------------------------------------------
library(tidyverse)
library(httr)
library(fs)
library(sf)
library(readxl)
library(janitor)
library(glue)
library(geofacet)
# also install ragg

# sources -----------------------------------------------------------------

# https://www.data.gouv.fr/fr/datasets/donnees-hospitalieres-relatives-a-lepidemie-de-covid-19/
fichier_covid <- "donnees/covid.csv"
url_donnees_covid <- "https://www.data.gouv.fr/fr/datasets/r/63352e38-d353-4b54-bfd1-f1b3ee1cabd7"

# https://www.insee.fr/fr/statistiques/2012713#tableau-TCRD_004_tab1_departements
fichier_pop <- "donnees/pop.xls"
url_donnees_pop <- "https://www.insee.fr/fr/statistiques/fichier/2012713/TCRD_004.xls"

# Adminexpress : à télécharger manuellement
# https://geoservices.ign.fr/documentation/diffusion/telechargement-donnees-libres.html#admin-express
aex <- path_expand("~/Downloads/ADMIN-EXPRESS_2-2__SHP__FRA_2020-02-24/ADMIN-EXPRESS/1_DONNEES_LIVRAISON_2020-02-24")


# config ------------------------------------------------------------------

options(scipen = 999)

force_download <- FALSE # retélécharger même si le fichier existe et a été téléchargé aujourd'hui ?


# téléchargement -------------------------------------------------

if (!dir_exists("donnees")) dir_create("donnees")
if (!dir_exists("resultats")) dir_create("resultats")
if (!dir_exists("resultats/animation_spf")) dir_create("resultats/animation_spf")

if (!file_exists(fichier_covid) |
    file_info(fichier_covid)$modification_time < Sys.Date() |
    force_download) {
  GET(url_donnees_covid,
      progress(),
      write_disk(fichier_covid, overwrite = TRUE)) %>%
    stop_for_status()
}

if (!file_exists(fichier_pop)) {
  GET(url_donnees_pop,
      progress(),
      write_disk(fichier_pop)) %>%
    stop_for_status()
}

covid <- read_csv2(fichier_covid)

pop <- read_xls(fichier_pop, skip = 2) %>%
  clean_names()

# adminexpress prétéléchargé
dep <- read_sf(path(aex, "ADE_2-2_SHP_LAMB93_FR/DEPARTEMENT.shp")) %>%
  clean_names() %>%
  st_set_crs(2154)


# construction de la grille ----------------------------------------

grid_fr <- dep %>%
  select(insee_dep, nom_dep) %>%
  grid_auto(names = "nom_dep", codes = "insee_dep", seed = 4) %>%
  add_row(row = 8,
          col = 1,
          name_nom_dep = "Guadeloupe",
          code_insee_dep = "971") %>%
  add_row(row = 9,
          col = 1,
          name_nom_dep = "Martinique",
          code_insee_dep = "972") %>%
  add_row(row = 10,
          col = 1,
          name_nom_dep = "Guyane",
          code_insee_dep = "973") %>%
  add_row(row = 7,
          col = 13,
          name_nom_dep = "Mayotte",
          code_insee_dep = "976") %>%
  add_row(row = 8,
          col = 13,
          name_nom_dep = "La Réunion",
          code_insee_dep = "974")

grid_fr[grid_fr$code_insee_dep %in% c("2A", "2B"), "col"] <- 13
grid_fr[grid_fr$code_insee_dep %in% c("2A", "2B"), "row"] <- grid_fr[grid_fr$code_insee_dep %in% c("2A", "2B"), "row"] - 1


# graphique -----------------------------------------------------

df <- covid %>%
  filter(sexe == 0) %>%
  rename(deces = dc,
         reanim = rea,
         hospit = hosp) %>%
  left_join(pop,
            by = c("dep" = "x1")) %>%
  mutate(incidence = deces / x2020_p * 100000) %>%
  rename(insee_dep = dep) %>%
  left_join(grid_fr %>%
              select(nom_dep = name_nom_dep,
                     insee_dep = code_insee_dep)) %>%
  drop_na(insee_dep) %>%
  ggplot(aes(jour, incidence)) +
    geom_area() +
    facet_geo(~ nom_dep, grid = grid_fr) +
    labs(title = "Mortalité",
       subtitle = "COVID-19 - France",
       x = "date",
       y = "décès pour\n100 000 hab.",
       caption = glue("http://r.iresmi.net/\ndonnées SPF {Sys.Date()}")) +
    theme_minimal() +
    theme(strip.text = element_text(hjust = 0, size = 7))

ggsave(glue("resultats/covid_fr_mortalite_geofacette_{Sys.Date()}.png"),
       width = 25, height = 20, units = "cm", scaling = .8, res = 300, device = ragg::agg_png)

Catégories
Non classé

Generalized Monty Hall problem

A simulation of the Monty Hall problem outcomes for n doors (k opened) à la Tidyverse…

library(tidyverse)

# sample vectors whether they have one or more elements
resample <- function(x, ...) x[sample.int(length(x), ...)]

monty <- function(doors = 3, monty_opens_doors = 1, n = 10000, seed = 0) {
	set.seed(seed)
	tibble(car = sample(doors, n, replace = TRUE),
	       choice = sample(doors, n, replace = TRUE)) %>% 
	  rowwise() %>% 
	  mutate(monty_chose = list(resample(setdiff(1:doors, c(car, choice)), monty_opens_doors)),
	         win_no_switch = car == choice,
	         win_switch = car == resample(setdiff(1:doors, unlist(c(choice, monty_chose))), 1)) %>% 
	  ungroup() %>% 
	  summarise(win_if_not_switching = sum(win_no_switch) / n() * 100,
	            win_with_switching = sum(win_switch) / n() * 100)
}
> monty() # classic values
# A tibble: 1 x 2
  win_if_not_switching win_with_switching
                 <dbl>              <dbl>
1                 33.4               66.6
> monty(10) # more doors (10), 1 opened
# A tibble: 1 x 2
  win_if_not_switching win_with_switching
                 <dbl>              <dbl>
1                 10.4               11.0
> monty(10, 3) # 10 doors, 3 opened
# A tibble: 1 x 2
  win_if_not_switching win_with_switching
                 <dbl>              <dbl>
1                 10.4               15.2

So, switch…

Catégories
Non classé

Polygons to hexagons

Hexagon tessellation using the great {geogrid} package.

The départements are the second level of administrative government in France. They neither have the same area nor the same population and this heterogeneity provides a few challenges for a fair and accurate map representation (see the post on smoothing).

However if we are just interested in the départements as units, we can use a regular grid for visualization. Since France is often called the hexagon, we could even use an hexagon tiling (a fractal map !)…

Creating the grid and conserving minimal topological relations and the general shape can be time consuming, but thanks to Geogrid it’s quite easy. The geogrid dev page provides nice examples. We will reuse our code of the COVID19 animation. The resulting GIS file is provided below.

# Carto décès COVID 19 hexagones
# France métro. + DOM
# Animation
# DONNEES SPF


# packages ----------------------------------------------------------------
library(tidyverse)
library(httr)
library(fs)
library(sf)
library(readxl)
library(janitor)
library(glue)
library(tmap)
library(grid)
library(classInt)
library(magick)
library(geogrid)


# sources -----------------------------------------------------------------

# https://www.data.gouv.fr/fr/datasets/donnees-hospitalieres-relatives-a-lepidemie-de-covid-19/
fichier_covid <- "donnees/covid.csv"
url_donnees_covid <- "https://www.data.gouv.fr/fr/datasets/r/63352e38-d353-4b54-bfd1-f1b3ee1cabd7"

# https://www.insee.fr/fr/statistiques/2012713#tableau-TCRD_004_tab1_departements
fichier_pop <- "donnees/pop.xls"
url_donnees_pop <- "https://www.insee.fr/fr/statistiques/fichier/2012713/TCRD_004.xls"

# Adminexpress : à télécharger manuellement
# https://geoservices.ign.fr/documentation/diffusion/telechargement-donnees-libres.html#admin-express
aex <- path_expand("~/Downloads/ADMIN-EXPRESS_2-2__SHP__FRA_2020-02-24/ADMIN-EXPRESS/1_DONNEES_LIVRAISON_2020-02-24")


# config ------------------------------------------------------------------

force_download <- FALSE # retélécharger même si le fichier existe et a été téléchargé aujourd'hui ?


# téléchargement ------------------------------------------------------

if (!dir_exists("donnees")) dir_create("donnees")
if (!dir_exists("resultats")) dir_create("resultats")
if (!dir_exists("resultats/animation_spf_hex")) dir_create("resultats/animation_spf_hex")

if (!file_exists(fichier_covid) |
    file_info(fichier_covid)$modification_time < Sys.Date() |
    force_download) {
  GET(url_donnees_covid,
      progress(),
      write_disk(fichier_covid, overwrite = TRUE)) %>%
    stop_for_status()
}

if (!file_exists(fichier_pop)) {
  GET(url_donnees_pop,
      progress(),
      write_disk(fichier_pop)) %>%
    stop_for_status()
}


# données -----------------------------------------------------------------

covid <- read_csv2(fichier_covid)

# adminexpress prétéléchargé
dep <- read_sf(path(aex, "ADE_2-2_SHP_LAMB93_FR/DEPARTEMENT.shp")) %>%
  clean_names() %>%
  mutate(surf_ha = st_area(geometry) * 10000) %>%
  st_set_crs(2154)

# grille hexagonale
dep_cells_hex <- calculate_grid(shape = dep, grid_type = "hexagonal", seed = 3)
dep_hex <- assign_polygons(dep, dep_cells_hex) %>%
  st_set_crs(2154)

# Pour les DOM on duplique et déplace un département existant
d971 <- dep_hex[dep_hex$insee_dep == "29", ]
d971$geometry[[1]] <- d971$geometry[[1]] + st_point(c(0, -150000))
d971$insee_dep <- "971"

d972 <- dep_hex[dep_hex$insee_dep == "29", ]
d972$geometry[[1]] <- d972$geometry[[1]] + st_point(c(0, -250000))
d972$insee_dep <- "972"

d973 <- dep_hex[dep_hex$insee_dep == "29", ]
d973$geometry[[1]] <- d973$geometry[[1]] + st_point(c(0, -350000))
d973$insee_dep <- "973"

d974 <- dep_hex[dep_hex$insee_dep == "2A", ]
d974$geometry[[1]] <- d974$geometry[[1]] + st_point(c(0, 250000))
d974$insee_dep <- "974"

d976 <- dep_hex[dep_hex$insee_dep == "2A", ]
d976$geometry[[1]] <- d976$geometry[[1]] + st_point(c(0, 350000))
d976$insee_dep <- "976"

dep_hex <- rbind(dep_hex, d971, d972, d973, d974, d976)

# population
pop <- read_xls(fichier_pop, skip = 2) %>%
  clean_names()

# lignes de séparation DOM / métropole
encarts <- st_multilinestring(
  list(st_linestring(matrix(c(1100000, 6500000,
                              1100000, 6257000,
                              1240000, 6257000), byrow = TRUE, nrow = 3)),
       st_linestring(matrix(c(230000, 6692000,
                              230000, 6391000), byrow = TRUE, nrow = 2)))) %>%
  st_sfc() %>%
  st_sf(id = 1, geometry = .) %>%
  st_set_crs(2154)

# traitement --------------------------------------------------------------

# jointures des données
creer_df <- function(territoire, date = NULL) {
  territoire %>%
    left_join(pop, by = c("insee_dep" = "x1")) %>%
    left_join(
      covid %>%
        filter(jour == if_else(is.null(date), max(jour), date),
               sexe == 0) %>%
        rename(deces = dc,
               reanim = rea,
               hospit = hosp),
      by = c("insee_dep" = "dep")) %>%
    mutate(incidence = deces / x2020_p * 100000)
}

incidence <- creer_df(dep_hex)

set.seed(1234)
classes <- classIntervals(incidence$incidence, n = 6, style = "kmeans", dataPrecision = 0)$brks

# carto -------------------------------------------------------------------
# décès cate du dernier jour dispo

carte <- tm_layout(title = glue("COVID-19\nFrance\n{max(covid$jour)}"),
                         legend.position = c("left", "bottom"),
                         frame = FALSE) +
  tm_shape(incidence) +
  tm_polygons(col = "incidence", title = "décés\ncumulés pour\n100 000 hab.",
              breaks = classes,
              palette = "viridis",
              legend.reverse = TRUE,
              legend.format = list(text.separator = "à moins de",
                                   digits = 0)) +
  tm_text("insee_dep", size = .8) +
  tm_shape(encarts) +
  tm_lines(lty = 3) +
  tm_credits(glue("http://r.iresmi.net/
                    classif. kmeans
                    données départementales Santé Publique France,
                    INSEE RP 2020, d'après IGN Adminexpress 2020"),
             position = c(.6, 0),
             size = .5)

fichier_carto <- glue("resultats/covid_hex_fr_{max(covid$jour)}.png")

tmap_save(carte, fichier_carto, width = 900, height = 900, scale = .4)


# animation ---------------------------------------------------------------

image_animation <- function(date) {
  message(glue("\n\n{date}\n==========\n"))

  m <- creer_df(dep_hex, date) %>%
    tm_shape() +
    tm_polygons(col = "incidence", title = "décés\ncumulés pour\n100 000 hab.",
                breaks = classes,
                palette = "viridis",
                legend.reverse = TRUE,
                legend.format = list(text.separator = "à moins de",
                                     digits = 0)) +
    tm_text("insee_dep", size = .8) +
    tm_shape(encarts) +
    tm_lines(lty = 3) +
    tm_layout(title = glue("COVID-19\nFrance\n{date}"),
              legend.position = c("left", "bottom"),
              frame = FALSE) +
    tm_credits(glue("http://r.iresmi.net/
                    classif. kmeans
                    données départementales Santé Publique France,
                    INSEE RP 2020, d'après IGN Adminexpress 2020"),
               position = c(.6, 0),
               size = .5)

  tmap_save(m, glue("resultats/animation_spf_hex/covid_fr_{date}.png"),
            width = 800, height = 800, scale = .4,)
}

unique(covid$jour) %>%
  walk(image_animation)

animation <- glue("resultats/deces_covid19_fr_hex_spf_{max(covid$jour)}.gif")

dir_ls("resultats/animation_spf_hex") %>%
  map(image_read) %>%
  image_join() %>%
  image_animate(fps = 2, optimize = TRUE) %>%
  image_write(animation)

COVID decease

The global shape and relations are well rendered. Deformations are quite important for the small départements around Paris, but the map remains legible.

Shift
Catégories
Non classé

Europe COVID-19 death map

COVID-19 deaths in Europe
# Europe COVID-19 deaths animated map
# http://r.iresmi.net/
# data European Centre for Disease Prevention and Control


# packages ----------------------------------------------------------------
library(tidyverse)
library(httr)
library(fs)
library(sf)
library(readxl)
library(janitor)
library(glue)
library(tmap)
library(grid)
library(classInt)
library(magick)
# + btb, raster, fasterize, plyr


# sources -----------------------------------------------------------------

# https://data.europa.eu/euodp/en/data/dataset/covid-19-coronavirus-data
covid_file <- "covid_eu.csv"
covid_url <- "https://opendata.ecdc.europa.eu/covid19/casedistribution/csv"

countries_file <- "ne_50m_admin_0_countries.shp"
countries_url <- "https://www.naturalearthdata.com/http//www.naturalearthdata.com/download/50m/cultural/ne_50m_admin_0_countries.zip"


# config ------------------------------------------------------------------

radius <- 600000 # smoothing radius (m)
pixel <- 100000 # grid resolution (m)

force_download <- FALSE # download even if already downloaded today ?

#' Kernel weighted smoothing with arbitrary bounding area
#'
#' @param df sf object (points)
#' @param field weight field in the df
#' @param bandwidth kernel bandwidth (map units)
#' @param resolution output grid resolution (map units)
#' @param zone sf study zone (polygon)
#' @param out_crs EPSG (should be an equal-area projection)
#'
#' @return a raster object
#' @import btb, raster, fasterize, dplyr, plyr, sf
lissage <- function(df, field, bandwidth, resolution, zone, out_crs = 3035) {
  if (st_crs(zone)$epsg != out_crs) {
    message("reprojecting data...")
    zone <- st_transform(zone, out_crs)
  }

  if (st_crs(df)$epsg != out_crs) {
    message("reprojecting study zone...")
    df <- st_transform(df, out_crs)
  }

  zone_bbox <- st_bbox(zone)

  # grid generation
  message("generating reference grid...")
  zone_xy <- zone %>%
    dplyr::select(geometry) %>%
    st_make_grid(
      cellsize = resolution,
      offset = c(plyr::round_any(zone_bbox[1] - bandwidth, resolution, f = floor),
                 plyr::round_any(zone_bbox[2] - bandwidth, resolution, f = floor)),
      what = "centers") %>%
    st_sf() %>%
    st_join(zone, join = st_intersects, left = FALSE) %>%
    st_coordinates() %>%
    as_tibble() %>%
    dplyr::select(x = X, y = Y)

  # kernel
  message("computing kernel...")
  kernel <- df %>%
    cbind(., st_coordinates(.)) %>%
    st_set_geometry(NULL) %>%
    dplyr::select(x = X, y = Y, field) %>%
    btb::kernelSmoothing(
      dfObservations = .,
      sEPSG = out_crs,
      iCellSize = resolution,
      iBandwidth = bandwidth,
      vQuantiles = NULL,
      dfCentroids = zone_xy
    )

  # rasterization
  message("\nrasterizing...")
  raster::raster(
    xmn = plyr::round_any(zone_bbox[1] - bandwidth, resolution, f = floor),
    ymn = plyr::round_any(zone_bbox[2] - bandwidth, resolution, f = floor),
    xmx = plyr::round_any(zone_bbox[3] + bandwidth, resolution, f = ceiling),
    ymx = plyr::round_any(zone_bbox[4] + bandwidth, resolution, f = ceiling),
    resolution = resolution
  ) %>%
    fasterize::fasterize(kernel, ., field = field)
}


# download data ------------------------------------------------------------

if (!dir_exists("data")) dir_create("data")
if (!dir_exists("results")) dir_create("results")
if (!dir_exists("results/animation_eu")) dir_create("results/animation_eu")

if (!file_exists(path("data", covid_file)) |
    file_info(path("data", covid_file))$modification_time < Sys.Date() |
    force_download) {
  GET(covid_url,
      progress(),
      write_disk(path("data", covid_file), overwrite = TRUE)) %>%
    stop_for_status()
}

if (!file_exists(path("data", countries_file))) {
  dl <- file_temp()

  GET(countries_url,
      progress(),
      write_disk(dl)) %>%
    stop_for_status()

  unzip(dl, exdir = "data")
}


# data --------------------------------------------------------------------

# some countries doesn't have data for the first or latest days ; we fill with latest
# data
covid <- read_csv(path("data", covid_file),
                  col_types = cols(dateRep = col_date(format = "%d/%m/%Y")),
                  na = c("N/A", "")) %>%
  clean_names() %>%
  complete(geo_id, date_rep) %>%
  replace_na(list(deaths = 0)) %>%
  group_by(geo_id) %>%
  arrange(date_rep) %>%
  mutate(deaths_cum = cumsum(deaths)) %>%
  fill(countryterritory_code, countries_and_territories, pop_data2018, continent_exp, .direction = "up") %>%
  ungroup() %>%
  select(-c(day, month, year, cases))

# keep only european countries minus Russia and adding TUR and CYP
# remove overseas territories, reproject in LAEA
countries <- read_sf(path("data", countries_file)) %>%
  clean_names() %>%
  filter(continent == "Europe" & iso_a3_eh != "RUS" | iso_a3_eh %in% c("TUR", "CYP")) %>%
  st_cast("POLYGON") %>%
  st_set_crs(4326) %>%
  st_join(c(xmin = -20, xmax = 35, ymin = 35, ymax = 70) %>%
            st_bbox() %>%
            st_as_sfc() %>%
            st_as_sf() %>%
            st_set_crs(4326),
          left = FALSE) %>%
  group_by(iso_a3_eh) %>%
  summarise(geometry = st_combine(geometry)) %>%
  st_transform(3035)

# pretreatment -----------------------------------------------------------


# mask to generate grid : union all countries
unioned_countries_file <- "data/eu.rds"

if (!file_exists(unioned_countries_file)) {
  unioned_countries <- countries %>%
    st_union() %>%
    st_sf() %>%
    write_rds(unioned_countries_file)
} else {
  unioned_countries <- read_rds(unioned_countries_file)
}

# join countries/data for a specific date
create_df <- function(territory, date = NULL) {
  covid %>%
    filter(date_rep == if_else(is.null(date), max(date_rep), date)) %>%
    right_join(countries,
              by = c("countryterritory_code" = "iso_a3_eh")) %>%
    st_as_sf() %>%
    st_point_on_surface() %>% 
    drop_na(deaths_cum) %>% 
    st_as_sf()
}

covid_geo <- create_df(countries)


# smoothing for last date ---------------------------------------------------

# deaths
d <- covid_geo %>%
  lissage("deaths_cum", radius, pixel, unioned_countries)

# population 
p <- covid_geo %>%
  lissage("pop_data2018", radius, pixel, unioned_countries)

# grid per 100000 inhab
death_pop <- d * 100000 / p


# carto -------------------------------------------------------------------

# classification for last date to be reused in animation
set.seed(1234)
classes <- classIntervals(raster::values(death_pop), n = 6, style = "kmeans", dataPrecision = 0)$brks


# animation ---------------------------------------------------------------

image_animation <- function(date) {
  message(glue("\n\n{date}\n==========\n"))

  m <- create_df(countries, date) %>%
    lissage("deaths_cum", radius, pixel, unioned_countries) %>%
    magrittr::divide_by(p) %>%
    magrittr::multiply_by(100000) %>%
    tm_shape() +
    tm_raster(title = glue("deaths
                         per 100 000 inhab."),
              style = "fixed",
              breaks = classes,
              palette = "viridis",
              legend.format = list(text.separator = "to less than",
                                   digits = 0),
              legend.reverse = TRUE) +
    tm_layout(title = glue("COVID-19 - Europe\ncumulative as of {date}"),
              legend.position = c("right", "top"),
              frame = FALSE) +
    #tm_shape(countries, bbox = death_pop) +
    #tm_borders() +
    tm_credits(glue("http://r.iresmi.net/
                  bisquare kernel smoothing {radius / 1000} km on {pixel / 1000} km grid
                  classif. kmeans, LAEA Europe projection
                  data European Centre for Disease Prevention and Control / map Naturalearth"),
               size = .5,
               position = c(.5, .025))
  
  message("saving map...")
  tmap_save(m, glue("results/animation_eu/covid_eu_{date}.png"),
            width = 800, height = 800, scale = .4,)
}

covid %>% 
  filter(date_rep >= "2020-03-15") %>% 
  pull(date_rep) %>% 
  unique() %>%
  walk(image_animation)

animation <- glue("results/deaths_covid19_eu_{max(covid$date_rep)}.gif")

dir_ls("results/animation_eu") %>%
  map(image_read) %>%
  image_join() %>%
  #image_scale("500x500") %>%
  image_morph(frames = 1) %>%
  image_animate(fps = 2, optimize = TRUE) %>%
  image_write(animation)
Catégories
Non classé

COVID-19 decease animation map

Coronavirus decease in France
# Animation carto décès COVID 19 France
# avec lissage

# packages -----------------------------------------------------------------
library(tidyverse)
library(httr)
library(fs)
library(sf)
library(readxl)
library(janitor)
library(glue)
library(tmap)
library(grid)
library(classInt)
library(magick)
# + btb, raster, fasterize, plyr

# sources -----------------------------------------------------------------

# https://www.data.gouv.fr/fr/datasets/donnees-hospitalieres-relatives-a-lepidemie-de-covid-19/
fichier_covid <- "donnees/covid.csv"
url_donnees_covid <- "https://www.data.gouv.fr/fr/datasets/r/63352e38-d353-4b54-bfd1-f1b3ee1cabd7"

# https://www.insee.fr/fr/statistiques/2012713#tableau-TCRD_004_tab1_departements
fichier_pop <- "donnees/pop.xls"
url_donnees_pop <- "https://www.insee.fr/fr/statistiques/fichier/2012713/TCRD_004.xls"

# Adminexpress : à télécharger manuellement
# https://geoservices.ign.fr/documentation/diffusion/telechargement-donnees-libres.html#admin-express
aex <- path_expand("~/Downloads/ADMIN-EXPRESS_2-2__SHP__FRA_2020-02-24/ADMIN-EXPRESS/1_DONNEES_LIVRAISON_2020-02-24")

# config ------------------------------------------------------------------

rayon <- 100000 # distance de lissage (m)
pixel <- 10000 # résolution grille (m)

force_download <- FALSE # retélécharger même si le fichier existe et a été téléchargé aujourd'hui ?

#' Kernel weighted smoothing with arbitrary bounding area
#'
#' @param df sf object (points)
#' @param field weight field in the df
#' @param bandwidth kernel bandwidth (map units)
#' @param resolution output grid resolution (map units)
#' @param zone sf study zone (polygon)
#' @param out_crs EPSG (should be an equal-area projection)
#'
#' @return a raster object
#' @import btb, raster, fasterize, dplyr, plyr, sf
lissage <- function(df, field, bandwidth, resolution, zone, out_crs = 3035) {
  if (st_crs(zone)$epsg != out_crs) {
    message("reprojecting data...")
    zone <- st_transform(zone, out_crs)
  }
  
  if (st_crs(df)$epsg != out_crs) {
    message("reprojecting study zone...")
    df <- st_transform(df, out_crs)
  }
  
  zone_bbox <- st_bbox(zone)
  
  # grid generation
  message("generating reference grid...")
  zone_xy <- zone %>%
    dplyr::select(geometry) %>%
    st_make_grid(
      cellsize = resolution,
      offset = c(
        plyr::round_any(zone_bbox[1] - bandwidth, resolution, f = floor),
        plyr::round_any(zone_bbox[2] - bandwidth, resolution, f = floor)
      ),
      what = "centers"
    ) %>%
    st_sf() %>%
    st_join(zone, join = st_intersects, left = FALSE) %>%
    st_coordinates() %>%
    as_tibble() %>%
    dplyr::select(x = X, y = Y)
  
  # kernel
  message("computing kernel...")
  kernel <- df %>%
    cbind(., st_coordinates(.)) %>%
    st_set_geometry(NULL) %>%
    dplyr::select(x = X, y = Y, field) %>%
    btb::kernelSmoothing(
      dfObservations = .,
      sEPSG = out_crs,
      iCellSize = resolution,
      iBandwidth = bandwidth,
      vQuantiles = NULL,
      dfCentroids = zone_xy
    )
  
  # rasterization
  message("\nrasterizing...")
  raster::raster(
    xmn = plyr::round_any(zone_bbox[1] - bandwidth, resolution, f = floor),
    ymn = plyr::round_any(zone_bbox[2] - bandwidth, resolution, f = floor),
    xmx = plyr::round_any(zone_bbox[3] + bandwidth, resolution, f = ceiling),
    ymx = plyr::round_any(zone_bbox[4] + bandwidth, resolution, f = ceiling),
    resolution = resolution
  ) %>%
    fasterize::fasterize(kernel, ., field = field)
}


# téléchargement--------------------------------------------------------------

if (!dir_exists("donnees")) dir_create("donnees")
if (!dir_exists("resultats")) dir_create("resultats")
if (!dir_exists("resultats/animation")) dir_create("resultats/animation")

if (!file_exists(fichier_covid) |
    file_info(fichier_covid)$modification_time < Sys.Date() |
    force_download) {
  GET(url_donnees_covid,
      progress(),
      write_disk(fichier_covid, overwrite = TRUE))
}

if (!file_exists(fichier_pop)) {
  GET(url_donnees_pop,
      progress(),
      write_disk(fichier_pop))
}


# données -----------------------------------------------------------------

covid <- read_csv2(fichier_covid)

# adminexpress prétéléchargé
dep <- read_sf(path(aex, "ADE_2-2_SHP_LAMB93_FR/DEPARTEMENT.shp")) %>%
  clean_names() %>%
  st_set_crs(2154)

pop <- read_xls(fichier_pop, skip = 2) %>%
  clean_names()


# prétraitement -----------------------------------------------------------

# contour métropole pour grille de référence
fichier_fr <- "donnees/fr.rds"

if (!file_exists(fichier_fr)) {
  fr <- dep %>%
    st_union() %>%
    st_sf() %>%
    write_rds(fichier_fr)
} else {
  fr <- read_rds(fichier_fr)
}

# jointures des données
creer_df <- function(territoire, date = NULL) {
  territoire %>%
    left_join(pop, by = c("insee_dep" = "x1")) %>%
    left_join(
      covid %>%
        filter(jour == if_else(is.null(date), max(jour), date),
               sexe == 0) %>%
               rename(deces = dc,
                      reanim = rea,
                      hospit = hosp),
      by = c("insee_dep" = "dep")) %>%
    st_point_on_surface()
}

covid_geo_pop <- creer_df(dep)


# lissage -----------------------------------------------------------------
# génération de la dernière grille mortalité
# et création des grilles pour 100000 habitants

# décès métropole 
d <- covid_geo_pop %>%
  lissage("deces", rayon, pixel, fr)


# population métropole et DOM
p <- covid_geo_pop %>%
  lissage("x2020_p", rayon, pixel, fr)

# grilles pour 100000 hab
d100k <- d * 100000 / p


# classification à réutiliser pour les autres cartes
set.seed(1234)
classes <- classIntervals(raster::values(d100k), n = 6, style = "kmeans", dataPrecision = 0)$brks


# animation ---------------------------------------------------------------

image_animation <- function(date) {
  m <- creer_df(dep, date) %>%
    lissage("deces", rayon, pixel, fr) %>%
    magrittr::divide_by(p) %>%
    magrittr::multiply_by(100000) %>%
    tm_shape() +
    tm_raster(title = glue("décès à l'hôpital
                         pour 100 000 hab."),
              style = "fixed",
              breaks = classes,
              palette = "viridis",
              legend.format = list(text.separator = "à moins de",
                                   digits = 0),
              legend.reverse = TRUE) +
    tm_shape(dep) +
    tm_borders() +
    tm_layout(title = glue("COVID-19 - France métropolitaine - cumul au {date}"),
              legend.position = c("left", "bottom"),
              frame = FALSE) +
    tm_credits(glue("http://r.iresmi.net/
                  lissage noyau bisquare {rayon / 1000} km sur grille {pixel / 1000} km
                  classif. kmeans
                  projection LAEA Europe
                  données départementales Santé publique France,
                  INSEE RP 2020, IGN Adminexpress 2020"),
               size = .5,
               position = c(.5, .025))
  
  tmap_save(m, glue("resultats/animation/covid_fr_{date}.png"),
            width = 800, height = 800, scale = .4,)
}

unique(covid$jour) %>%
  walk(image_animation)

animation <- glue("resultats/deces_covid19_fr_{max(covid$jour)}.gif")

dir_ls("resultats/animation") %>%
  map(image_read) %>%
  image_join() %>%
  #image_scale("500x500") %>%
  image_morph(frames = 1) %>%
  image_animate(fps = 2, optimize = TRUE) %>%
  image_write(animation)


Catégories
Non classé

Coronavirus : spatially smoothed decease in France

Coronavirus decease in France

See also the animated map.

From the official data by Santé Publique France, we spatially smooth the decease (produced by SPF at the département scale) and normalize by a similarly smoothed population grid. For that we use the {btb} package.

# Carto décès COVID 19 France
# avec lissage


# sources -----------------------------------------------------------------

# https://www.data.gouv.fr/fr/datasets/donnees-hospitalieres-relatives-a-lepidemie-de-covid-19/
fichier_covid <- "donnees/covid.csv"
url_donnees_covid <- "https://www.data.gouv.fr/fr/datasets/r/63352e38-d353-4b54-bfd1-f1b3ee1cabd7"

# https://www.insee.fr/fr/statistiques/2012713#tableau-TCRD_004_tab1_departements
fichier_pop <- "donnees/pop.xls"
url_donnees_pop <- "https://www.insee.fr/fr/statistiques/fichier/2012713/TCRD_004.xls"

# Adminexpress : à télécharger manuellement
# https://geoservices.ign.fr/documentation/diffusion/telechargement-donnees-libres.html#admin-express
#aex <- "donnees/1_DONNEES_LIVRAISON_2019-03-14/"
aex <- path_expand("~/Downloads/ADMIN-EXPRESS_2-2__SHP__FRA_2020-02-24/ADMIN-EXPRESS/1_DONNEES_LIVRAISON_2020-02-24")

# config ------------------------------------------------------------------
library(tidyverse)
library(httr)
library(fs)
library(sf)
library(readxl)
library(janitor)
library(glue)
library(tmap)
library(grid)
library(classInt)
# + btb, raster, fasterize, plyr

rayon <- 100000 # distance de lissage (m)
pixel <- 10000 # résolution grille (m)

force_download <- TRUE # retélécharger même si le fichier existe et a été téléchargé aujourd'hui ?

#' Kernel weighted smoothing with arbitrary bounding area
#'
#' @param df sf object (points)
#' @param field weight field in the df
#' @param bandwidth kernel bandwidth (map units)
#' @param resolution output grid resolution (map units)
#' @param zone sf study zone (polygon)
#' @param out_crs EPSG (should be an equal-area projection)
#'
#' @return a raster object
#' @import btb, raster, fasterize, dplyr, plyr, sf
lissage <- function(df, field, bandwidth, resolution, zone, out_crs = 3035) {
    if (st_crs(zone)$epsg != out_crs) {
      message("reprojecting data...")
      zone <- st_transform(zone, out_crs)
    }

    if (st_crs(df)$epsg != out_crs) {
      message("reprojecting study zone...")
      df <- st_transform(df, out_crs)
    }

    zone_bbox <- st_bbox(zone)

    # grid generation
    message("generating reference grid...")
    zone_xy <- zone %>%
      dplyr::select(geometry) %>%
      st_make_grid(
        cellsize = resolution,
        offset = c(
          plyr::round_any(zone_bbox[1] - bandwidth, resolution, f = floor),
          plyr::round_any(zone_bbox[2] - bandwidth, resolution, f = floor)
        ),
        what = "centers"
      ) %>%
      st_sf() %>%
      st_join(zone, join = st_intersects, left = FALSE) %>%
      st_coordinates() %>%
      as_tibble() %>%
      dplyr::select(x = X, y = Y)

    # kernel
    message("computing kernel...")
    kernel <- df %>%
      cbind(., st_coordinates(.)) %>%
      st_set_geometry(NULL) %>%
      dplyr::select(x = X, y = Y, field) %>%
      btb::kernelSmoothing(
        dfObservations = .,
        sEPSG = out_crs,
        iCellSize = resolution,
        iBandwidth = bandwidth,
        vQuantiles = NULL,
        dfCentroids = zone_xy
      )

    # rasterization
    message("\nrasterizing...")
    raster::raster(
      xmn = plyr::round_any(zone_bbox[1] - bandwidth, resolution, f = floor),
      ymn = plyr::round_any(zone_bbox[2] - bandwidth, resolution, f = floor),
      xmx = plyr::round_any(zone_bbox[3] + bandwidth, resolution, f = ceiling),
      ymx = plyr::round_any(zone_bbox[4] + bandwidth, resolution, f = ceiling),
      resolution = resolution
    ) %>%
      fasterize::fasterize(kernel, ., field = field)
  }


# téléchargement--------------------------------------------------------------
if (!file_exists(fichier_covid) |
    file_info(fichier_covid)$modification_time < Sys.Date() |
    force_download) {
  GET(url_donnees_covid,
      progress(),
      write_disk(fichier_covid, overwrite = TRUE))
}

if (!file_exists(fichier_pop)) {
  GET(url_donnees_pop,
      progress(),
      write_disk(fichier_pop))
}


# données -----------------------------------------------------------------

covid <- read_csv2(fichier_covid)

# adminexpress prétéléchargé
dep <- read_sf(path(aex, "ADE_2-2_SHP_LAMB93_FR/DEPARTEMENT.shp")) %>%
  clean_names() %>%
  st_set_crs(2154)

dep_971 <- read_sf(path(aex, "ADE_2-2_SHP_RGAF09UTM20_D971/DEPARTEMENT.shp")) %>%
  clean_names() %>%
  st_set_crs(5490)

dep_972 <- read_sf(path(aex, "ADE_2-2_SHP_RGAF09UTM20_D972/DEPARTEMENT.shp")) %>%
  clean_names() %>%
  st_set_crs(5490)

dep_973 <- read_sf(path(aex, "ADE_2-2_SHP_UTM22RGFG95_D973/DEPARTEMENT.shp")) %>%
  clean_names() %>%
  st_set_crs(2972)

dep_974 <- read_sf(path(aex, "ADE_2-2_SHP_RGR92UTM40S_D974/DEPARTEMENT.shp")) %>%
  clean_names() %>%
  st_set_crs(2975)

dep_976 <- read_sf(path(aex, "ADE_2-2_SHP_RGM04UTM38S_D976/DEPARTEMENT.shp")) %>%
  clean_names() %>%
  st_set_crs(4471)

pop <- read_xls(fichier_pop, skip = 2) %>%
  clean_names()


# prétraitement -----------------------------------------------------------

# contour métropole
fr <- dep %>%
  st_union() %>%
  st_sf()

# jointures des données
creer_df <- function(territoire) {
  territoire %>%
    left_join(pop, by = c("insee_dep" = "x1")) %>%
    left_join(
      covid %>%
        filter(jour == max(jour),
               sexe == 0) %>%
        group_by(dep) %>%
        summarise(deces = sum(dc, na.rm = TRUE),
                  reanim = sum(rea, na.rm = TRUE),
                  hospit = sum(hosp, na.rm = TRUE)),
      by = c("insee_dep" = "dep")) %>%
    st_point_on_surface()
}

covid_geo_pop     <- creer_df(dep)
covid_geo_pop_971 <- creer_df(dep_971)
covid_geo_pop_972 <- creer_df(dep_972)
covid_geo_pop_973 <- creer_df(dep_973)
covid_geo_pop_974 <- creer_df(dep_974)
covid_geo_pop_976 <- creer_df(dep_976)

# lissage -----------------------------------------------------------------
# génération des grilles mortalité, hospitalisation et réanimation et population
# et création des grilles pour 100000 habitants

# décès métropole et DOM
d <- covid_geo_pop %>%
  lissage("deces", rayon, pixel, fr)

d_971 <- covid_geo_pop_971 %>%
  lissage("deces", rayon, pixel, dep_971, 5490)

d_972 <- covid_geo_pop_972 %>%
  lissage("deces", rayon, pixel, dep_972, 5490)

d_973 <- covid_geo_pop_973 %>%
  lissage("deces", rayon, pixel, dep_973, 2972)

d_974 <- covid_geo_pop_974 %>%
  lissage("deces", rayon, pixel, dep_974, 2975)

d_976 <- covid_geo_pop_976 %>%
  lissage("deces", rayon, pixel, dep_976, 4471)

# population métropole et DOM
p <- covid_geo_pop %>%
  lissage("x2020_p", rayon, pixel, fr)

p_971 <- covid_geo_pop_971 %>%
  lissage("x2020_p", rayon, pixel, dep_971, 5490)

p_972 <- covid_geo_pop_972 %>%
  lissage("x2020_p", rayon, pixel, dep_972, 5490)

p_973 <- covid_geo_pop_973 %>%
  lissage("x2020_p", rayon, pixel, dep_973, 2972)

p_974 <- covid_geo_pop_974 %>%
  lissage("x2020_p", rayon, pixel, dep_974, 2975)

p_976 <- covid_geo_pop_976 %>%
  lissage("x2020_p", rayon, pixel, dep_976, 4471)

# grilles pour 100000 hab
d100k <- d * 100000 / p
d100k_971 <- d_971 * 100000 / p_971
d100k_972 <- d_972 * 100000 / p_972
d100k_973 <- d_973 * 100000 / p_973
d100k_974 <- d_974 * 100000 / p_974
d100k_976 <- d_976 * 100000 / p_976

# réanimation et hospitalisation métropole uniquement
r <- covid_geo_pop %>%
  lissage("reanim", rayon, pixel, fr)
r100k <- r * 100000 / p

h <- covid_geo_pop %>%
  lissage("hospit", rayon, pixel, fr)
h100k <- h * 100000 / p


# carto -------------------------------------------------------------------

# décès métropole et DOM

# classification à réutiliser pour les 6 cartes
set.seed(1234)
classes <- classIntervals(raster::values(d100k), n = 5, style = "kmeans", dataPrecision = 0)$brks

# métro et DOM
(carte_d <- tm_layout(title = paste("COVID-19 - France - cumul au", max(covid$jour)),
                     legend.position = c("left", "bottom"),
                     frame = FALSE) +
  tm_shape(d100k) +
  tm_raster(title = glue("décès à l'hôpital
                         pour 100 000 hab."),
            style = "fixed",
            breaks = classes,
            palette = "viridis",
            legend.format = list(text.separator = "à moins de",
                                 digits = 0),
            legend.reverse = TRUE) +
  tm_shape(dep) +
  tm_borders() +
  tm_credits(glue("http://r.iresmi.net/
                  lissage noyau bisquare {rayon / 1000} km sur grille {pixel / 1000} km
                  classif. kmeans
                  projections LAEA Europe (métropole) et locales (DOM)
                  données départementales Santé publique France,
                  INSEE RP 2020, IGN Adminexpress 2020"),
             size = .5,
             position = c(.5, .025))
)

tm_971 <- tm_shape(d100k_971, ext = 0.7) +
  tm_raster(style = "fixed",
            breaks = classes,
            palette = "viridis",
            legend.show = FALSE) +
  tm_shape(dep_971) +
  tm_borders() +
  tm_layout(frame = FALSE,
            bg.color = NA)

tm_972 <- tm_shape(d100k_972, ext = 0.7) +
  tm_raster(style = "fixed",
            breaks = classes,
            palette = "viridis",
            legend.show = FALSE) +
  tm_shape(dep_972) +
  tm_borders() +
  tm_layout(frame = FALSE,
            bg.color = NA)

tm_973 <- tm_shape(d100k_973) +
  tm_raster(style = "fixed",
            breaks = classes,
            palette = "viridis",
            legend.show = FALSE) +
  tm_shape(dep_973) +
  tm_borders() +
  tm_layout(frame = FALSE,
            bg.color = NA)

tm_974 <- tm_shape(d100k_974, ext = 0.75) +
  tm_raster(style = "fixed",
            breaks = classes,
            palette = "viridis",
            legend.show = FALSE) +
  tm_shape(dep_974) +
  tm_borders()+
  tm_layout(frame = FALSE,
            bg.color = NA)

tm_976 <- tm_shape(d100k_976, ext = 0.6) +
  tm_raster(style = "fixed",
            breaks = classes,
            palette = "viridis",
            legend.show = FALSE) +
  tm_shape(dep_976) +
  tm_borders()+
  tm_layout(frame = FALSE,
            bg.color = NA)

# assemblage
fichier_carto <- glue("resultats/covid_fr_{max(covid$jour)}.png")

tmap_save(carte_d, fichier_carto, width = 900, height = 900, scale = .4,
          insets_tm = list(tm_971, tm_972, tm_973, tm_974, tm_976),
          insets_vp = list(viewport(x = .1, y = .65, width = .15, height = .15),
                           viewport(x = .1, y = .58, width = .15, height = .15),
                           viewport(x = .15, y = .4, width = .35, height = .45),
                           viewport(x = .9, y = .4, width = .15, height = .15),
                           viewport(x = .9, y = .5, width = .15, height = .15)))

Catégories
Non classé

Mauna Loa CO₂ polar plot

After a classic plot of the Keeling curve (see our former post) used on Wikipedia, we can explore another data visualization. The CO₂ atmospheric concentration, the main cause of the climate warming, is following a seasonal cycle so it could be interesting (or ironic ?) to use a polar plot.

Config and data

We only keep two translations for brevity here…

# Required packages
library(tidyverse)
library(scales)
library(lubridate)

# Translations ------------------------------------------------------------

language <- list(
  en_US = list(
    locale_lc_time = "en_US.UTF-8",
    title = bquote("Monthly mean"~CO[2]~"concentration"),
    caption = paste("Data : P. Tans, NOAA/ESRL (www.esrl.noaa.gov/gmd/ccgg/trends/)\nand R. Keeling, Scripps Institution of Oceanography (scrippsco2.ucsd.edu/). Accessed", Sys.Date()),
    x = "Year",
    y = bquote(CO[2]~"fraction in dry air ("*mu*"mol/mol)"),
    x2 = "Month",
    y2 = bquote(atop(CO[2]~"fraction in dry air ("*mu*"mol/mol)", "Departure from yearly average")),
    title2 = "Seasonal variation"
  ),
  fr_FR = list(
    locale_lc_time = "fr_FR.UTF-8",
    title = bquote("Moyenne mensuelle de la concentration de"~CO[2]),
    caption = paste("données : P. Tans, NOAA/ESRL (www.esrl.noaa.gov/gmd/ccgg/trends/)\net R. Keeling, Scripps Institution of Oceanography (scrippsco2.ucsd.edu/). Accédé le", Sys.Date()),
    x = "année",
    y = bquote("fraction de"~CO[2]~"dans l'air sec ("*mu*"mol/mol)"),
    x2 = "mois",
    y2 = bquote(atop("fraction de"~CO[2]~"dans l'air sec ("*mu*"mol/mol)", "en écart à la moyenne annuelle")),
    title2 = "Variation saisonnière"
  ))

# Data --------------------------------------------------------------------

# https://www.esrl.noaa.gov/gmd/ccgg/trends/
co2ml <- read_delim("ftp://aftp.cmdl.noaa.gov/products/trends/co2/co2_mm_mlo.txt",
                    delim = " ",
                    locale = locale(decimal_mark = "."),
                    na = c("-99.99", "-1"),
                    col_types = "iiddddi",
                    col_names = c("year", "month", "decimal", "co2", "co2_interpol", "co2_trend", "days"),
                    comment = "#",
                    trim_ws = TRUE) %>% 
  group_by(year) %>% 
  mutate(year_mean = mean(co2_interpol, na.rm = TRUE),
         delta = co2_interpol - year_mean,
         vdate = ymd(paste0("2015-", month, "-01"))) %>% 
  ungroup()

Create the plot for each language and save

We use a virtual date to keep the data in the same January-December interval and we add a partial dataframe to smooth the Dec./Jan. transition and build the spiral.

# Polar plot
for (l in names(language)) {
  message(l)
  current <- language[[l]]
  
  # format the date in local names
  Sys.setlocale("LC_TIME", current$locale_lc_time)
  
  p3 <- co2ml %>% 
    filter(vdate == "2015-01-01") %>% 
    mutate(vdate = ymd("2015-12-31"),
           year = year -1) %>% 
    bind_rows(co2ml) %>% 
    ggplot(aes(vdate, co2_interpol, group = year, color = year)) +
      geom_line(size = 1.2) +
      scale_x_date(breaks = pretty_breaks(12), labels = date_format("%b")) +
      scale_color_viridis_c() +
      labs(subtitle = current$title,
           x = "",
           y = current$y,
           color = current$x,
           title = paste("Mauna Loa", min(co2ml$year), "-", max(co2ml$year)),
           caption = current$caption) +
      coord_polar() +
      theme_bw() +
      theme(axis.title.y = element_text(hjust = .85),
            panel.grid.major.y  = element_blank(),
            panel.grid.minor.x = element_blank(),
            panel.border = element_blank(),
            plot.caption = element_text(size = 7))
  
  ggsave(p3, file = paste("co2_mauna_loa_polar", l, Sys.Date(), "wp.svg", sep = "_"), width = 20, height = 20, units = "cm", device = svg)
}
60 years of CO₂ increase

Does this new year look good ? Will the spiral cross its path soon ?