
filter_most_recent_data_per_enrollment() keeps only the most recent data for each enrollment when data was collected at multiple data collection stages. The row to keep is chosen by the set of rules described in Details.

Usage

filter_most_recent_data_per_enrollment(data)

Arguments

data

A data frame with multiple rows per enrollment

Value

A data frame with one row per enrollment

Details

An enrollment's most recent data is the data collected at "Project exit". If the enrollment has no "Project exit" row, the most recent data is the data collected at the "Project update" with the latest date_updated (here "Project update" covers both the "Project update" and the "Project annual assessment" values of data_collection_stage). If several "Project update" rows share that latest date_updated, the tie is broken by keeping the one that appears first. Finally, if an enrollment has no "Project update" row, the most recent data is the data collected at "Project start".
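The selection rules above can be sketched with dplyr as follows. This is only an illustration, not the package's actual implementation: sketch_most_recent() and the helper columns .row and .stage_rank are hypothetical names, and the sketch assumes that enrollment_id alone identifies an enrollment and that date_updated is a Date column. The rules do not say what happens when an enrollment has several "Project exit" rows; the sketch simply keeps the one with the latest date_updated.

```r
library(dplyr)

# Hypothetical helper illustrating the documented rules.
# Not the implementation of filter_most_recent_data_per_enrollment().
sketch_most_recent <- function(data) {
  data |>
    mutate(
      # Remember the original row order so ties keep the first appearance.
      .row = row_number(),
      # Lower rank means higher priority: exit, then updates, then start.
      .stage_rank = case_when(
        data_collection_stage == "Project exit" ~ 1L,
        data_collection_stage %in%
          c("Project update", "Project annual assessment") ~ 2L,
        data_collection_stage == "Project start" ~ 3L,
        TRUE ~ 4L
      )
    ) |>
    # Within each enrollment: best stage first, then latest date,
    # then earliest original position.
    arrange(enrollment_id, .stage_rank, desc(date_updated), .row) |>
    # Keep the first (best) row of each enrollment.
    distinct(enrollment_id, .keep_all = TRUE) |>
    select(-.row, -.stage_rank)
}

# Made-up data for illustration only.
example_data <- tibble::tribble(
  ~enrollment_id, ~data_collection_stage, ~date_updated, ~status,
  1000L, "Project start",  "2024-12-31", "A",
  1000L, "Project update", "2023-01-01", "B",
  2000L, "Project start",  "2023-03-01", "C",
  2000L, "Project exit",   "2023-06-01", "D",
  3000L, "Project start",  "2023-02-01", "E"
) |>
  mutate(date_updated = as.Date(date_updated))

sketch_result <- sketch_most_recent(example_data)
```

In this sketch the stage takes precedence over the date: enrollment 1000 keeps its "Project update" row even though the "Project start" row carries a later date_updated.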

The general rules described above do not apply in certain scenarios. For example, Employment data is collected only at "Project start" and "Project exit" by definition, so rows that correspond to a "Project update" show missing data. To address this, users of the function are expected to preprocess the data before calling filter_most_recent_data_per_enrollment() (e.g. by filtering out rows that correspond to the "Project update" data collection stage).
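A minimal sketch of that preprocessing step is shown below. The data frame employment_data, its employed column, and the update_stages vector are made up for illustration. Dropping "Project annual assessment" along with "Project update" is an assumption, based on the statement above that "Project update" covers both stages.

```r
library(dplyr)

# Made-up employment data: the update rows carry no employment information.
employment_data <- tibble::tribble(
  ~enrollment_id, ~data_collection_stage, ~date_updated, ~employed,
  1000L, "Project start",             "2023-01-01", "Yes",
  1000L, "Project update",            "2023-06-01", NA_character_,
  1000L, "Project annual assessment", "2024-01-01", NA_character_,
  2000L, "Project start",             "2023-02-01", "No"
) |>
  mutate(date_updated = as.Date(date_updated))

# Stages treated as "Project update" by the rules in Details.
update_stages <- c("Project update", "Project annual assessment")

# Drop the update rows so only "Project start" and "Project exit" remain.
employment_start_exit <- employment_data |>
  filter(!data_collection_stage %in% update_stages)

# employment_start_exit can then be passed to
# filter_most_recent_data_per_enrollment().
```

Filtering first means that an enrollment with no "Project exit" row falls back to its "Project start" data rather than to an update row with missing values.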

Examples

if (FALSE) { # \dontrun{
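# One enrollment with a "Project start" row and a "Project update" row,
# but no "Project exit" row.
mock_data <- tibble::tribble(
    ~test_row, ~organization_id, ~personal_id, ~enrollment_id, ~data_collection_stage, ~date_updated, ~status,
    1, 1L, 1L, 1000L, "Project start", "2024-12-31", "A",
    2, 1L, 1L, 1000L, "Project update", "2023-01-01", "B"
) |>
    dplyr::mutate(dplyr::across(date_updated, as.Date))

# By the rules in Details, the "Project update" row (test_row 2) should be
# kept, even though the "Project start" row has a later date_updated.
filter_most_recent_data_per_enrollment(mock_data)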
} # }