Chapter 17 Commuting time
=file.exists("commute_times.csv") do_eval
Commuting time is often mentioned as a valid reason for leaving one employer for another.
You are asked to calculate the commuting time for the employee working at your company. You decide that hyphotetically they must all arrive in the office at 8.30 am. In the Google you identified that there is google api for doing this, called Distance Matrix API, which works for a matrix of origins and destinations.
The information returned is based on the recommended route between start and end points and consists of rows containing duration and distance values for each pair." To use the Distance Matrix API, you must first activate the API in the Google Cloud Platform Console and obtain the proper authentication credentials. You need to provide your own API key in each request.
The documentation on how to do this is located here: https://developers.google.com/maps/documentation/distance-matrix/intro
A file with 200 fake names and addresses has been put together for you. The task is to add the commuting time next to each staff member. Normally google will charge us one dollar for checking 200 commuting times.
Please note that the departure_time specifies the desired time of departure must be in POSIXct. format and must be in the future (i.e.greater than sys.time()). If no value is specified it defaults to Sys.time().
Please note you can only specify one of arrival_time or departure_time, not both. If both are supplied, departure_time will be used.
library(googleway)
library(tidyverse)
library(lubridate)
# Set up here your api key key=
= 'AIzaSyCOly69PDrlPlM42I378p2lmvNs8I2w'
key
<- dmy_hms("10/09/2018 08:30")
d
<- as.numeric(d)
arrival
<- c("SE3+8UQ,UK")
from <- c("E14+5EU,UK")
to
<- google_distance(origins = from,
test destinations = to,
mode = "walking",
arrival_time = as.POSIXct(arrival, origin = "1970-01-01", tz = "UTC"),
key=key)
$rows$elements[[1]]$distance test
NULL
To make all this simpler, we put together an R function which can be used to analyse many results. Note this calls all modes of transport for every employee making it more costly than just running one mode per employee.
#' Get distance data between two points based on all the travel mode options. Works for many origin points.
#'
#' @param x A vector of origins in address or postcode format
#' @param dest A single destinationin address or postocde format
#' @param arrival_time A POSIXct datetime that folks need to arrive by
#' @param key A google distance API key
#' @param ... Additional options to pass to `google_distance()`
#'
#' @return Data.frame containing (typically) 4 rows per input element
= function(x, dest, arrival_time, key, ...){
google_distance_all
# simple hygeine stuff
= purrr::possibly(
gd ::memoise(
memoise
google_distance)"Fail"
,
)
# Prep dataset
= expand.grid(from=x,
interested_in mode=c("driving", "walking", "bicycling", "transit"),
stringsAsFactors = FALSE)
# Perform google_distance calls for all combos
::map2(interested_in$from,interested_in$mode,
purrr~gd(.x, dest, mode=.y,
arrival_time = arrival_time,
key=key)
%>%
) # Extract relevant section
::map("rows") %>%
purrr::map("elements") %>%
purrr::flatten() %>%
purrr# Simplify the data.frames
::map(unclass) %>%
purrr::map_df(purrr::flatten) %>%
purrr# Add original lookup values
cbind(interested_in)
}
This needs some packages installed to work
if(!require(purrr)) install.packages("purrr")
if(!require(memoise)) install.packages("memoise")
if(!require(googleway)) install.packages("googleway")
We need some information about our google API account, where our office is, and when we’re testing commute times for.
= "E14 5EU"
office = as.POSIXct("2018-12-03 09:00") monday_9am
Then we can use this function to get data for our 200 employees
<- read_csv("https://hranalytics.netlify.com/data/200_staff_members.csv") the_200
= google_distance_all(
results $Postcode,
the_200
office,arrival_time = monday_9am,
key = key
)
write_csv(results, file.path("data"), "commute_times.csv")
results
# A tibble: 800 x 5
text value status from mode
<chr> <dbl> <chr> <chr> <chr>
1 16 mins 963 OK E15 1NS driving
2 57 mins 3396 OK W12 8ET driving
3 41 mins 2469 OK SW1V 1AP driving
4 19 mins 1123 OK SE1 2RE driving
5 37 mins 2206 OK WC1A 2QS driving
6 43 mins 2604 OK N6 5JW driving
7 31 mins 1871 OK EC1N 8PN driving
8 36 mins 2158 OK WC2B 6TE driving
9 31 mins 1841 OK N1 9AZ driving
10 24 mins 1423 OK WC2R 3LD driving
# ... with 790 more rows
Now that we have this data we can answer questions about our employee’s commute times.
Who has the longest commutes by car?
%>%
results filter(mode == "driving") %>%
mutate(hours=value/60^2) %>%
top_n(10, hours) %>%
arrange(desc(hours))
# A tibble: 10 x 6
text value status from mode hours
<chr> <dbl> <chr> <chr> <chr> <dbl>
1 10 hours 46 mins 38745 OK BT49 0BX driving 10.7625
2 10 hours 33 mins 37986 OK BT51 3SQ driving 10.5517
3 6 hours 45 mins 24274 OK G4 0JY driving 6.74278
4 5 hours 19 mins 19145 OK CA1 1BG driving 5.31806
5 4 hours 35 mins 16519 OK PL1 4GP driving 4.58861
6 4 hours 17 mins 15411 OK BB5 3EZ driving 4.28083
7 3 hours 57 mins 14239 OK OL10 2TL driving 3.95528
8 3 hours 30 mins 12585 OK LS23 7BA driving 3.49583
9 3 hours 26 mins 12341 OK LS24 9QW driving 3.42806
10 3 hours 18 mins 11882 OK S10 3TR driving 3.30056
We can look at the distributions overall too.
%>%
results mutate(hours=value/60^2) %>%
ggplot() +
aes(x=hours) +
geom_density() +
scale_x_log10() +
facet_wrap(~mode) +
theme_minimal()