Filter most recent data per enrollment
filter_most_recent_data_per_enrollment.Rd
filter_most_recent_data_per_enrollment() keeps the most recent data for each enrollment whose data was collected at multiple data collection stages, following the set of rules described in Details.
Details
An enrollment's most recent data is the data collected at "Project exit". If the enrollment has no "Project exit" record, the most recent data is the data collected at the "Project update" with the latest date_updated (here "Project update" covers both the "Project update" and "Project annual assessment" values of data_collection_stage). If several "Project update" records share the same latest date_updated, the tie is broken by selecting the one that appears first in the data. Finally, if the enrollment has no "Project update" record, the most recent data is the data collected at "Project start".
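The precedence described above can be sketched with dplyr. This is only an illustration of the rules, not the package's actual implementation: the helper name sketch_most_recent() is hypothetical, and the sketch assumes that enrollment_id alone identifies an enrollment and that date_updated is already a Date. It also applies the "latest date, then first appearance" tie-break to every stage, which goes beyond what the rules state for "Project exit" and "Project start".

```r
library(dplyr)

# Hypothetical helper illustrating the documented rules; not the package's code.
sketch_most_recent <- function(data) {
  data |>
    mutate(
      # Lower rank wins: exit, then update (incl. annual assessment), then start.
      stage_rank = case_when(
        data_collection_stage == "Project exit" ~ 1L,
        data_collection_stage %in%
          c("Project update", "Project annual assessment") ~ 2L,
        data_collection_stage == "Project start" ~ 3L,
        TRUE ~ 4L
      ),
      # Remember the original position so ties resolve to the first appearance.
      row_order = row_number()
    ) |>
    # Best stage first, then latest date_updated, then earliest original row.
    arrange(enrollment_id, stage_rank, desc(date_updated), row_order) |>
    group_by(enrollment_id) |>
    slice_head(n = 1) |>
    ungroup() |>
    select(-stage_rank, -row_order)
}
```

Sorting once and taking the first row per enrollment keeps the precedence in a single place, at the cost of reordering the output by enrollment_id.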
The general rules above do not apply in certain scenarios. For example, employment data is, by definition, collected only at "Project start" and "Project exit", so entries that correspond to a "Project update" show missing data. To address this, users are expected to preprocess the data accordingly before calling filter_most_recent_data_per_enrollment() (e.g. by filtering out data that corresponds to the "Project update" data collection stage).
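A minimal sketch of that preprocessing step follows. The employment_data object and its employed column are made up for illustration, and dropping "Project annual assessment" along with "Project update" is an assumption based on the grouping described in Details.

```r
library(dplyr)

# Hypothetical employment data: the "Project update" row carries no value.
employment_data <- data.frame(
  enrollment_id = c(1000L, 1000L, 1000L),
  data_collection_stage = c("Project start", "Project update", "Project exit"),
  date_updated = as.Date(c("2023-01-01", "2023-06-01", "2024-01-01")),
  employed = c("Yes", NA, "No")
)

# Drop the stages at which employment data is not collected.
employment_clean <- employment_data |>
  filter(
    !data_collection_stage %in% c("Project update", "Project annual assessment")
  )

# employment_clean can then be passed to
# filter_most_recent_data_per_enrollment().
```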
Examples
if (FALSE) { # \dontrun{
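  mock_data <- tibble::tribble(
    ~test_row, ~organization_id, ~personal_id, ~enrollment_id, ~data_collection_stage, ~date_updated, ~status,
    1, 1L, 1L, 1000L, "Project start", "2024-12-31", "A",
    2, 1L, 1L, 1000L, "Project update", "2023-01-01", "B"
  ) |>
    dplyr::mutate(dplyr::across(date_updated, as.Date))

  # There is no "Project exit" row, so per the rules in Details the
  # "Project update" row (test_row 2) should be kept, even though its
  # date_updated is older than that of "Project start".
  filter_most_recent_data_per_enrollment(mock_data)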
} # }