× Need help learning R? Enroll in Applied Epi's intro R course, try our free R tutorials, post in our Community Q&A forum, or ask about our R Help Desk service.

9 Working with dates

Working with dates in R requires more attention than working with other object classes. Below, we offer some tools and example to make this process less painful. Luckily, dates can be wrangled easily with practice, and with a set of helpful packages such as lubridate.

Upon import of raw data, R often interprets dates as character objects - this means they cannot be used for general date operations such as making time series and calculating time intervals. To make matters more difficult, there are many ways a date can be formatted and you must help R know which part of a date represents what (month, day, hour, etc.).

Dates in R are their own class of object - the Date class. It should be noted that there is also a class that stores objects with date and time. Date time objects are formally referred to as POSIXt, POSIXct, and/or POSIXlt classes (the difference isn’t important). These objects are informally referred to as datetime classes.

  • It is important to make R recognize when a column contains dates.
  • Dates are an object class and can be tricky to work with.
  • Here we present several ways to convert date columns to Date class.

9.1 Preparation

Load packages

This code chunk shows the loading of packages required for this page. In this handbook we emphasize p_load() from pacman, which installs the package if necessary and loads it for use. You can also load installed packages with library() from base R. See the page on R basics for more information on R packages.

# Checks if package is installed, installs if necessary, and loads package for current session

pacman::p_load(
  lubridate,  # general package for handling and converting dates  
  parsedate,   # has function to "guess" messy dates
  aweek,      # another option for converting dates to weeks, and weeks to dates
  zoo,        # additional date/time functions
  tidyverse,  # data management and visualization  
  rio)        # data import/export

Import data

We import the dataset of cases from a simulated Ebola epidemic. If you want to download the data to follow along step-by-step, see instruction in the Download handbook and data page. We assume the file is in the working directory so no sub-folders are specified in this file path.

linelist <- import("linelist_cleaned.xlsx")

9.2 Current date

You can get the current “system” date or system datetime of your computer by doing the following with base R.

# get the system date - this is a DATE class
Sys.Date()
## [1] "2023-07-18"
# get the system time - this is a DATETIME class
Sys.time()
## [1] "2023-07-18 15:34:00 CEST"

With the lubridate package these can also be returned with today() and now(), respectively. date() returns the current date and time with weekday and month names.

9.3 Convert to Date

After importing a dataset into R, date column values may look like “1989/12/30”, “05/06/2014”, or “13 Jan 2020”. In these cases, R is likely still treating these values as Character values. R must be told that these values are dates… and what the format of the date is (which part is Day, which is Month, which is Year, etc).

Once told, R converts these values to class Date. In the background, R will store the dates as numbers (the number of days from its “origin” date 1 Jan 1970). You will not interface with the date number often, but this allows for R to treat dates as continuous variables and to allow special operations such as calculating the distance between dates.

By default, values of class Date in R are displayed as YYYY-MM-DD. Later in this section we will discuss how to change the display of date values.

Below we present two approaches to converting a column from character values to class Date.

TIP: You can check the current class of a column with base R function class(), like class(linelist$date_onset).

base R

as.Date() is the standard, base R function to convert an object or column to class Date (note capitalization of “D”).

Use of as.Date() requires that:

  • You specify the existing format of the raw character date or the origin date if supplying dates as numbers (see section on Excel dates)
  • If used on a character column, all date values must have the same exact format (if this is not the case, try parse_date() from the parsedate package)

First, check the class of your column with class() from base R. If you are unsure or confused about the class of your data (e.g. you see “POSIXct”, etc.) it can be easiest to first convert the column to class Character with as.character(), and then convert it to class Date.

Second, within the as.Date() function, use the format = argument to tell R the current format of the character date components - which characters refer to the month, the day, and the year, and how they are separated. If your values are already in one of R’s standard date formats (“YYYY-MM-DD” or “YYYY/MM/DD”) the format = argument is not necessary.

To format =, provide a character string (in quotes) that represents the current date format using the special “strptime” abbreviations below. For example, if your character dates are currently in the format “DD/MM/YYYY”, like “24/04/1968”, then you would use format = "%d/%m/%Y" to convert the values into dates. Putting the format in quotation marks is necessary. And don’t forget any slashes or dashes!

# Convert to class date
linelist <- linelist %>% 
  mutate(date_onset = as.Date(date_of_onset, format = "%d/%m/%Y"))

Most of the strptime abbreviations are listed below. You can see the complete list by running ?strptime.

%d = Day number of month (5, 17, 28, etc.)
%j = Day number of the year (Julian day 001-366)
%a = Abbreviated weekday (Mon, Tue, Wed, etc.)
%A = Full weekday (Monday, Tuesday, etc.) %w = Weekday number (0-6, Sunday is 0)
%u = Weekday number (1-7, Monday is 1)
%W = Week number (00-53, Monday is week start)
%U = Week number (01-53, Sunday is week start)
%m = Month number (e.g. 01, 02, 03, 04)
%b = Abbreviated month (Jan, Feb, etc.)
%B = Full month (January, February, etc.)
%y = 2-digit year (e.g. 89)
%Y = 4-digit year (e.g. 1989)
%h = hours (24-hr clock)
%m = minutes
%s = seconds %z = offset from GMT
%Z = Time zone (character)

TIP: The format = argument of as.Date() is not telling R the format you want the dates to be, but rather how to identify the date parts as they are before you run the command.

TIP: Be sure that in the format = argument you use the date-part separator (e.g. /, -, or space) that is present in your dates.

Once the values are in class Date, R will by default display them in the standard format, which is YYYY-MM-DD.

lubridate

Converting character objects to dates can be made easier by using the lubridate package. This is a tidyverse package designed to make working with dates and times more simple and consistent than in base R. For these reasons, lubridate is often considered the gold-standard package for dates and time, and is recommended whenever working with them.

The lubridate package provides several different helper functions designed to convert character objects to dates in an intuitive, and more lenient way than specifying the format in as.Date(). These functions are specific to the rough date format, but allow for a variety of separators, and synonyms for dates (e.g. 01 vs Jan vs January) - they are named after abbreviations of date formats.

# install/load lubridate 
pacman::p_load(lubridate)

The ymd() function flexibly converts date values supplied as year, then month, then day.

# read date in year-month-day format
ymd("2020-10-11")
## [1] "2020-10-11"
ymd("20201011")
## [1] "2020-10-11"

The mdy() function flexibly converts date values supplied as month, then day, then year.

# read date in month-day-year format
mdy("10/11/2020")
## [1] "2020-10-11"
mdy("Oct 11 20")
## [1] "2020-10-11"

The dmy() function flexibly converts date values supplied as day, then month, then year.

# read date in day-month-year format
dmy("11 10 2020")
## [1] "2020-10-11"
dmy("11 October 2020")
## [1] "2020-10-11"

If using piping, the conversion of a character column to dates with lubridate might look like this:

linelist <- linelist %>%
  mutate(date_onset = lubridate::dmy(date_onset))

Once complete, you can run class() to verify the class of the column

# Check the class of the column
class(linelist$date_onset)  

Once the values are in class Date, R will by default display them in the standard format, which is YYYY-MM-DD.

Note that the above functions work best with 4-digit years. 2-digit years can produce unexpected results, as lubridate attempts to guess the century.

To convert a 2-digit year into a 4-digit year (all in the same century) you can convert to class character and then combine the existing digits with a pre-fix using str_glue() from the stringr package (see Characters and strings). Then convert to date.

two_digit_years <- c("15", "15", "16", "17")
str_glue("20{two_digit_years}")
## 2015
## 2015
## 2016
## 2017

Combine columns

You can use the lubridate functions make_date() and make_datetime() to combine multiple numeric columns into one date column. For example if you have numeric columns onset_day, onset_month, and onset_year in the data frame linelist:

linelist <- linelist %>% 
  mutate(onset_date = make_date(year = onset_year, month = onset_month, day = onset_day))

9.4 Excel dates

In the background, most software store dates as numbers. R stores dates from an origin of 1st January, 1970. Thus, if you run as.numeric(as.Date("1970-01-01)) you will get 0.

Microsoft Excel stores dates with an origin of either December 30, 1899 (Windows) or January 1, 1904 (Mac), depending on your operating system. See this Microsoft guidance for more information.

Excel dates often import into R as these numeric values instead of as characters. If the dataset you imported from Excel shows dates as numbers or characters like “41369”… use as.Date() (or lubridate’s as_date() function) to convert, but instead of supplying a “format” as above, supply the Excel origin date to the argument origin =.

This will not work if the Excel date is stored in R as a character type, so be sure to ensure the number is class Numeric!

NOTE: You should provide the origin date in R’s default date format (“YYYY-MM-DD”).

# An example of providing the Excel 'origin date' when converting Excel number dates
data_cleaned <- data %>% 
  mutate(date_onset = as.numeric(date_onset)) %>%   # ensure class is numeric
  mutate(date_onset = as.Date(date_onset, origin = "1899-12-30")) # convert to date using Excel origin

9.5 Messy dates

The function parse_date() from the parsedate package attempts to read a “messy” date column containing dates in many different formats and convert the dates to a standard format. You can read more online about parse_date().

For example parse_date() would see a vector of the following character dates “03 Jan 2018”, “07/03/1982”, and “08/20/85” and convert them to class Date as: 2018-01-03, 1982-03-07, and 1985-08-20.

parsedate::parse_date(c("03 January 2018",
                        "07/03/1982",
                        "08/20/85"))
## [1] "2018-01-03 UTC" "1982-07-03 UTC" "1985-08-20 UTC"
# An example using parse_date() on the column date_onset
linelist <- linelist %>%      
  mutate(date_onset = parse_date(date_onset))

9.6 Working with date-time class

As previously mentioned, R also supports a datetime class - a column that contains date and time information. As with the Date class, these often need to be converted from character objects to datetime objects.

Convert dates with times

A standard datetime object is formatted with the date first, which is followed by a time component - for example 01 Jan 2020, 16:30. As with dates, there are many ways this can be formatted, and there are numerous levels of precision (hours, minutes, seconds) that can be supplied.

Luckily, lubridate helper functions also exist to help convert these strings to datetime objects. These functions are extensions of the date helper functions, with _h (only hours supplied), _hm (hours and minutes supplied), or _hms (hours, minutes, and seconds supplied) appended to the end (e.g. dmy_hms()). These can be used as shown:

Convert datetime with only hours to datetime object

ymd_h("2020-01-01 16hrs")
## [1] "2020-01-01 16:00:00 UTC"
ymd_h("2020-01-01 4PM")
## [1] "2020-01-01 16:00:00 UTC"

Convert datetime with hours and minutes to datetime object

dmy_hm("01 January 2020 16:20")
## [1] "2020-01-01 16:20:00 UTC"

Convert datetime with hours, minutes, and seconds to datetime object

mdy_hms("01 January 2020, 16:20:40")
## [1] "2020-01-20 16:20:40 UTC"

You can supply time zone but it is ignored. See section later in this page on time zones.

mdy_hms("01 January 2020, 16:20:40 PST")
## [1] "2020-01-20 16:20:40 UTC"

When working with a data frame, time and date columns can be combined to create a datetime column using str_glue() from stringr package and an appropriate lubridate function. See the page on Characters and strings for details on stringr.

In this example, the linelist data frame has a column in format “hours:minutes”. To convert this to a datetime we follow a few steps:

  1. Create a “clean” time of admission column with missing values filled-in with the column median. We do this because lubridate won’t operate on missing values. Combine it with the column date_hospitalisation, and then use the function ymd_hm() to convert.
# packages
pacman::p_load(tidyverse, lubridate, stringr)

# time_admission is a column in hours:minutes
linelist <- linelist %>%
  
  # when time of admission is not given, assign the median admission time
  mutate(
    time_admission_clean = ifelse(
      is.na(time_admission),         # if time is missing
      median(time_admission),        # assign the median
      time_admission                 # if not missing keep as is
  ) %>%
  
    # use str_glue() to combine date and time columns to create one character column
    # and then use ymd_hm() to convert it to datetime
  mutate(
    date_time_of_admission = str_glue("{date_hospitalisation} {time_admission_clean}") %>% 
      ymd_hm()
  )

Convert times alone

If your data contain only a character time (hours and minutes), you can convert and manipulate them as times using strptime() from base R. For example, to get the difference between two of these times:

# raw character times
time1 <- "13:45" 
time2 <- "15:20"

# Times converted to a datetime class
time1_clean <- strptime(time1, format = "%H:%M")
time2_clean <- strptime(time2, format = "%H:%M")

# Difference is of class "difftime" by default, here converted to numeric hours 
as.numeric(time2_clean - time1_clean)   # difference in hours
## [1] 1.583333

Note however that without a date value provided, it assumes the date is today. To combine a string date and a string time together see how to use stringr in the section just above. Read more about strptime() here.

To convert single-digit numbers to double-digits (e.g. to “pad” hours or minutes with leading zeros to achieve 2 digits), see this “Pad length” section of the Characters and strings page.

Extract time

You can extract elements of a time with hour(), minute(), or second() from lubridate.

Here is an example of extracting the hour, and then classifing by part of the day. We begin with the column time_admission, which is class Character in format “HH:MM”. First, the strptime() is used as described above to convert the characters to datetime class. Then, the hour is extracted with hour(), returning a number from 0-24. Finally, a column time_period is created using logic with case_when() to classify rows into Morning/Afternoon/Evening/Night based on their hour of admission.

linelist <- linelist %>%
  mutate(hour_admit = hour(strptime(time_admission, format = "%H:%M"))) %>%
  mutate(time_period = case_when(
    hour_admit > 06 & hour_admit < 12 ~ "Morning",
    hour_admit >= 12 & hour_admit < 17 ~ "Afternoon",
    hour_admit >= 17 & hour_admit < 21 ~ "Evening",
    hour_admit >=21 | hour_admit <= 6 ~ "Night"))

To learn more about case_when() see the page on Cleaning data and core functions.

9.7 Working with dates

lubridate can also be used for a variety of other functions, such as extracting aspects of a date/datetime, performing date arithmetic, or calculating date intervals

Here we define a date to use for the examples:

# create object of class Date
example_date <- ymd("2020-03-01")

Extract date components

You can extract common aspects such as month, day, weekday:

month(example_date)  # month number
## [1] 3
day(example_date)    # day (number) of the month
## [1] 1
wday(example_date)   # day number of the week (1-7)
## [1] 1

You can also extract time components from a datetime object or column. This can be useful if you want to view the distribution of admission times.

example_datetime <- ymd_hm("2020-03-01 14:45")

hour(example_datetime)     # extract hour
minute(example_datetime)   # extract minute
second(example_datetime)   # extract second

There are several options to retrieve weeks. See the section on Epidemiological weeks below.

Note that if you are seeking to display a date a certain way (e.g. “Jan 2020” or “Thursday 20 March” or “Week 20, 1977”) you can do this more flexibly as described in the section on Date display.

Date math

You can add certain numbers of days or weeks using their respective function from lubridate.

# add 3 days to this date
example_date + days(3)
## [1] "2020-03-04"
# add 7 weeks and subtract two days from this date
example_date + weeks(7) - days(2)
## [1] "2020-04-17"

Date intervals

The difference between dates can be calculated by:

  1. Ensure both dates are of class date
  2. Use subtraction to return the “difftime” difference between the two dates
  3. If necessary, convert the result to numeric class to perform subsequent mathematical calculations

Below the interval between two dates is calculated and displayed. You can find intervals by using the subtraction “minus” symbol on values that are class Date. Note, however that the class of the returned value is “difftime” as displayed below, and must be converted to numeric.

# find the interval between this date and Feb 20 2020 
output <- example_date - ymd("2020-02-20")
output    # print
## Time difference of 10 days
class(output)
## [1] "difftime"

To do subsequent operations on a “difftime”, convert it to numeric with as.numeric().

This can all be brought together to work with data - for example:

pacman::p_load(lubridate, tidyverse)   # load packages

linelist <- linelist %>%
  
  # convert date of onset from character to date objects by specifying dmy format
  mutate(date_onset = dmy(date_onset),
         date_hospitalisation = dmy(date_hospitalisation)) %>%
  
  # filter out all cases without onset in march
  filter(month(date_onset) == 3) %>%
    
  # find the difference in days between onset and hospitalisation
  mutate(days_onset_to_hosp = date_hospitalisation - date_of_onset)

In a data frame context, if either of the above dates is missing, the operation will fail for that row. This will result in an NA instead of a numeric value. When using this column for calculations, be sure to set the na.rm = argument to TRUE. For example:

# calculate the median number of days to hospitalisation for all cases where data are available
median(linelist_delay$days_onset_to_hosp, na.rm = T)

9.8 Date display

Once dates are the correct class, you often want them to display differently, for example to display as “Monday 05 January” instead of “2018-01-05”. You may also want to adjust the display in order to then group rows by the date elements displayed - for example to group by month-year.

format()

Adjust date display with the base R function format(). This function accepts a character string (in quotes) specifying the desired output format in the “%” strptime abbreviations (the same syntax as used in as.Date()). Below are most of the common abbreviations.

Note: using format() will convert the values to class Character, so this is generally used towards the end of an analysis or for display purposes only! You can see the complete list by running ?strptime.

%d = Day number of month (5, 17, 28, etc.)
%j = Day number of the year (Julian day 001-366)
%a = Abbreviated weekday (Mon, Tue, Wed, etc.)
%A = Full weekday (Monday, Tuesday, etc.)
%w = Weekday number (0-6, Sunday is 0)
%u = Weekday number (1-7, Monday is 1)
%W = Week number (00-53, Monday is week start)
%U = Week number (01-53, Sunday is week start)
%m = Month number (e.g. 01, 02, 03, 04)
%b = Abbreviated month (Jan, Feb, etc.)
%B = Full month (January, February, etc.)
%y = 2-digit year (e.g. 89)
%Y = 4-digit year (e.g. 1989)
%h = hours (24-hr clock)
%m = minutes
%s = seconds
%z = offset from GMT
%Z = Time zone (character)

An example of formatting today’s date:

# today's date, with formatting
format(Sys.Date(), format = "%d %B %Y")
## [1] "18 July 2023"
# easy way to get full date and time (default formatting)
date()
## [1] "Tue Jul 18 15:34:01 2023"
# formatted combined date, time, and time zone using str_glue() function
str_glue("{format(Sys.Date(), format = '%A, %B %d %Y, %z  %Z, ')}{format(Sys.time(), format = '%H:%M:%S')}")
## Tuesday, July 18 2023, +0000  UTC, 15:34:01
# Using format to display weeks
format(Sys.Date(), "%Y Week %W")
## [1] "2023 Week 29"

Note that if using str_glue(), be aware of that within the expected double quotes ” you should only use single quotes (as above).

Month-Year

To convert a Date column to Month-year format, we suggest you use the function as.yearmon() from the zoo package. This converts the date to class “yearmon” and retains the proper ordering. In contrast, using format(column, "%Y %B") will convert to class Character and will order the values alphabetically (incorrectly).

Below, a new column yearmonth is created from the column date_onset, using the as.yearmon() function. The default (correct) ordering of the resulting values are shown in the table.

# create new column 
test_zoo <- linelist %>% 
     mutate(yearmonth = zoo::as.yearmon(date_onset))

# print table
table(test_zoo$yearmon)
## 
## Apr 2014 May 2014 Jun 2014 Jul 2014 Aug 2014 Sep 2014 Oct 2014 Nov 2014 Dec 2014 Jan 2015 Feb 2015 Mar 2015 Apr 2015 
##        7       64      100      226      528     1070     1112      763      562      431      306      277      186

In contrast, you can see how only using format() does achieve the desired display format, but not the correct ordering.

# create new column
test_format <- linelist %>% 
     mutate(yearmonth = format(date_onset, "%b %Y"))

# print table
table(test_format$yearmon)
## 
## Apr 2014 Apr 2015 Aug 2014 Dec 2014 Feb 2015 Jan 2015 Jul 2014 Jun 2014 Mar 2015 May 2014 Nov 2014 Oct 2014 Sep 2014 
##        7      186      528      562      306      431      226      100      277       64      763     1112     1070

Note: if you are working within a ggplot() and want to adjust how dates are displayed only, it may be sufficient to provide a strptime format to the date_labels = argument in scale_x_date() - you can use "%b %Y" or "%Y %b". See the ggplot tips page.

zoo also offers the function as.yearqtr(), and you can use scale_x_yearmon() when using ggplot().

9.9 Epidemiological weeks

lubridate

See the page on Grouping data for more extensive examples of grouping data by date. Below we briefly describe grouping data by weeks.

We generally recommend using the floor_date() function from lubridate, with the argument unit = "week". This rounds the date down to the “start” of the week, as defined by the argument week_start =. The default week start is 1 (for Mondays) but you can specify any day of the week as the start (e.g. 7 for Sundays). floor_date() is versitile and can be used to round down to other time units by setting unit = to “second”, “minute”, “hour”, “day”, “month”, or “year”.

The returned value is the start date of the week, in Date class. Date class is useful when plotting the data, as it will be easily recognized and ordered correctly by ggplot().

If you are only interested in adjusting dates to display by week in a plot, see the section in this page on Date display. For example when plotting an epicurve you can format the date display by providing the desired strptime “%” nomenclature. For example, use “%Y-%W” or “%Y-%U” to return the year and week number (given Monday or Sunday week start, respectively).

Weekly counts

See the page on Grouping data for a thorough explanation of grouping data with count(), group_by(), and summarise(). A brief example is below.

  1. Create a new ‘week’ column with mutate(), using floor_date() with unit = "week"
  2. Get counts of rows (cases) per week with count(); filter out any cases with missing date
  3. Finish with complete() from tidyr to ensure that all weeks appear in the data - even those with no rows/cases. By default the count values for any “new” rows are NA, but you can make them 0 with the fill = argument, which expects a named list (below, n is the name of the counts column).
# Make aggregated dataset of weekly case counts
weekly_counts <- linelist %>% 
  drop_na(date_onset) %>%             # remove cases missing onset date
  mutate(weekly_cases = floor_date(   # make new column, week of onset
    date_onset,
    unit = "week")) %>%            
  count(weekly_cases) %>%           # group data by week and count rows per group (creates column 'n')
  tidyr::complete(                  # ensure all weeks are present, even those with no cases reported
    weekly_cases = seq.Date(          # re-define the "weekly_cases" column as a complete sequence,
      from = min(weekly_cases),       # from the minimum date
      to = max(weekly_cases),         # to the maxiumum date
      by = "week"),                   # by weeks
    fill = list(n = 0))             # fill-in NAs in the n counts column with 0

Here are the first rows of the resulting data frame:

Epiweek alternatives

Note that lubridate also has functions week(), epiweek(), and isoweek(), each of which has slightly different start dates and other nuances. Generally speaking though, floor_date() should be all that you need. Read the details for these functions by entering ?week into the console or reading the documentation here.

You might consider using the package aweek to set epidemiological weeks. You can read more about it on the RECON website. It has the functions date2week() and week2date() in which you can set the week start day with week_start = "Monday". This package is easiest if you want “week”-style outputs (e.g. “2020-W12”). Another advantage of aweek is that when date2week() is applied to a date column, the returned column (week format) is automatically of class Factor and includes levels for all weeks in the time span (this avoids the extra step of complete() described above). However, aweek does not have the functionality to round dates to other time units such as months, years, etc.

Another alternative for time series which also works well to show a a “week” format (“2020 W12”) is yearweek() from the package tsibble, as demonstrated in the page on Time series and outbreak detection.

9.10 Converting dates/time zones

When data is present in different time time zones, it can often be important to standardise this data in a unified time zone. This can present a further challenge, as the time zone component of data must be coded manually in most cases.

In R, each datetime object has a timezone component. By default, all datetime objects will carry the local time zone for the computer being used - this is generally specific to a location rather than a named timezone, as time zones will often change in locations due to daylight savings time. It is not possible to accurately compensate for time zones without a time component of a date, as the event a date column represents cannot be attributed to a specific time, and therefore time shifts measured in hours cannot be reasonably accounted for.

To deal with time zones, there are a number of helper functions in lubridate that can be used to change the time zone of a datetime object from the local time zone to a different time zone. Time zones are set by attributing a valid tz database time zone to the datetime object. A list of these can be found here - if the location you are using data from is not on this list, nearby large cities in the time zone are available and serve the same purpose.

https://en.wikipedia.org/wiki/List_of_tz_database_time_zones

# assign the current time to a column
time_now <- Sys.time()
time_now
## [1] "2023-07-18 15:34:02 CEST"
# use with_tz() to assign a new timezone to the column, while CHANGING the clock time
time_london_real <- with_tz(time_now, "Europe/London")

# use force_tz() to assign a new timezone to the column, while KEEPING the clock time
time_london_local <- force_tz(time_now, "Europe/London")


# note that as long as the computer that was used to run this code is NOT set to London time,
# there will be a difference in the times 
# (the number of hours difference from the computers time zone to london)
time_london_real - time_london_local
## Time difference of -1 hours

This may seem largely abstract, and is often not needed if the user isn’t working across time zones.

9.11 Lagging and leading calculations

lead() and lag() are functions from the dplyr package which help find previous (lagged) or subsequent (leading) values in a vector - typically a numeric or date vector. This is useful when doing calculations of change/difference between time units.

Let’s say you want to calculate the difference in cases between a current week and the previous one. The data are initially provided in weekly counts as shown below.

When using lag() or lead() the order of rows in the dataframe is very important! - pay attention to whether your dates/numbers are ascending or descending

First, create a new column containing the value of the previous (lagged) week.

  • Control the number of units back/forward with n = (must be a non-negative integer)
  • Use default = to define the value placed in non-existing rows (e.g. the first row for which there is no lagged value). By default this is NA.
  • Use order_by = TRUE if your the rows are not ordered by your reference column
counts <- counts %>% 
  mutate(cases_prev_wk = lag(cases_wk, n = 1))

Next, create a new column which is the difference between the two cases columns:

counts <- counts %>% 
  mutate(cases_prev_wk = lag(cases_wk, n = 1),
         case_diff = cases_wk - cases_prev_wk)

You can read more about lead() and lag() in the documentation here or by entering ?lag in your console.

9.12 Resources

lubridate tidyverse page
lubridate RStudio cheatsheet
R for Data Science page on dates and times
Online tutorial Date formats