The {cdrs} Package

Published

Last Updated in February, 2024

1 Introduction

The {cdrs} (pronounced “cedars”) package for R is intended to make your analysis of the DRS data easier and more reproducible. Below, we discuss how to install it and describe some of its key functions. To examine the source code, see the GitHub repository (ktomari/cdrs).

2 Installing and Updating

This package is not yet available on CRAN, the official R package repository. As such, you will need to install it manually using a function from the {devtools} package. The {devtools} package helps R programmers with several tasks, such as writing R packages, but in this case it is used to install the {cdrs} package from GitHub.

tl;dr

If you know what you’re doing, run devtools::install_github("ktomari/cdrs")

2.1 1. Set up devtools

First, make sure you have {devtools} installed. You can run this script to check whether it is installed on your machine; if it is not, the script will download and install it from CRAN. If you know you don’t have it, just run install.packages("devtools").

if (!requireNamespace("devtools", quietly = TRUE)) {
    install.packages("devtools")
    message("devtools has been successfully installed.")
} else {
    message("devtools is already installed.")
}

2.2 2. Install cdrs

Second, install {cdrs} from GitHub.

devtools::install_github("ktomari/cdrs")

# you can then load it like any other package.
library(cdrs)

2.3 3. Update cdrs

As of this writing, the {cdrs} package is being actively developed. Since it is not currently available on CRAN, you will have to check for updates manually. You may do so by checking our News page to see if there have been updates. When updates are available, simply repeat step 2 by running devtools::install_github("ktomari/cdrs") again. This will download and install the package again from GitHub.
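To tell whether a reinstall actually changed anything, it can help to note the installed version before and after updating. The following is a minimal base-R sketch; the helper name installed_version() is hypothetical and is not part of {cdrs} or {devtools}.

```r
# Hypothetical helper (not part of {cdrs}): return the installed version of a
# package as a string, or NA if the package is not installed.
# system.file() returns "" for packages that cannot be found, and it does not
# load the package, so it is safe to call right before reinstalling.
installed_version <- function(pkg) {
  if (nzchar(system.file(package = pkg))) {
    as.character(utils::packageVersion(pkg))
  } else {
    NA_character_
  }
}

# Run this before and after devtools::install_github("ktomari/cdrs")
# and compare the two results.
installed_version("cdrs")
```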

3 Key Features

3.1 Reading DRS Data

You can read the DRS data with cdrs_read(), which takes two arguments:

  1. path_: the path to the zip file or directory which contains the DRS data. This function will automatically unzip files as needed and load them into temporary memory.
  2. relevel_: determines how you want to approach values like "<Decline to answer>". The default setting is "default" (no surprise there!). This will convert all missing values to NA, except for those that express uncertainty, like "<Unsure>". You may also specify it as "none", which will load the DRS data without converting any missing values (see footnote 1). Finally, you may also supply a list of logical values that match the parameters for cdrs_revise(). This last option is more complicated, so we leave that discussion to the documentation, i.e. ?cdrs_read.

drs_data <- cdrs::cdrs_read("Your/Path/Here", 
                            relevel_ = "default")

If you want to see an example of this in action without downloading the public data set, use cdrs_read_example() instead.
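To make the idea behind the "default" releveling concrete, here is a minimal base-R sketch on invented toy responses. It is not the {cdrs} implementation: the helper drop_missing_levels() is hypothetical, and the sketch only illustrates turning "<Decline to answer>" into NA while keeping "<Unsure>" as a real category.

```r
# Toy responses, invented for illustration (not real DRS data).
responses <- factor(c("Yes", "No", "<Decline to answer>", "<Unsure>", "Yes"))

# Hypothetical helper (not part of {cdrs}): set the listed levels to NA
# and remove them from the factor's set of levels.
drop_missing_levels <- function(x, missing_levels) {
  x[x %in% missing_levels] <- NA
  droplevels(x)
}

# "<Decline to answer>" becomes NA; "<Unsure>" is kept as a level.
releveled <- drop_missing_levels(responses, "<Decline to answer>")
table(releveled, useNA = "ifany")
```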

3.2 DRS with {survey}

We rely on the {survey} package for many of the backend calculations of the DRS survey. This package is critical because it allows us to make proper estimations given the unique assumptions and characteristics of survey statistics. You can learn more about how we approach survey statistics in our documentation.

The first step requires us to create a data subset, followed by a “survey design object” which specifies the complex survey design that the DRS uses. We provide the conceptual details of how we specify this object in the Weights and Complex Survey Design page, whereas below, we simply demonstrate its implementation with {cdrs}.

# First, we create a data subset of the variables we're interested in.
# The cdrs_subset() function subsets the variables of interest,
# as well as the Zone and WTFINAL columns which we'll need later.
# Note, you must pass a character vector, e.g. `"Q2"` or `c("Q2", "Q3_1", ...)`
drs_subset <- cdrs::cdrs_subset(drs_data, "Q2")

# Second, we create the survey design object, which can later be used with
# {survey} functions like svytable. The argument `set_fpc` is largely optional,
# as we found no real difference in robustness tests.
svydes_obj <- cdrs::cdrs_design(drs_subset, set_fpc = TRUE)

Given this, we can reproduce the first example in the Weights and Complex Survey Design page. This calculates the weighted proportions of "Q2".

prop <- survey::svymean(
  # x is the formula
  x = ~ Q2,
  design = svydes_obj,
  # drop missing responses before estimating
  na.rm = TRUE)
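To see what a weighted proportion means arithmetically, here is a minimal base-R sketch on invented toy data. It assumes only that WTFINAL holds each respondent's weight. It mirrors the kind of point estimate svymean() returns for a factor (the weight in each category divided by the total weight), but not the standard errors, which depend on the full survey design object.

```r
# Toy data, invented for illustration (not real DRS responses or weights).
toy <- data.frame(
  Q2 = factor(c("Yes", "No", "Yes", NA, "No", "Yes")),
  WTFINAL = c(1.5, 0.5, 2.0, 1.0, 1.0, 1.0)
)

# Drop missing responses, analogous to na.rm = TRUE.
keep <- !is.na(toy$Q2)
total_weight <- sum(toy$WTFINAL[keep])

# Weighted proportion per level: sum of weights in the level / total weight.
weighted_prop <- vapply(
  levels(toy$Q2),
  function(lv) sum(toy$WTFINAL[keep & toy$Q2 == lv]) / total_weight,
  numeric(1)
)

weighted_prop
```

Compare this with the unweighted shares (3 of 5 "Yes" responses, or 0.6): respondents with larger weights pull the estimate toward their answers.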

3.2.1 Other {survey} wrappers

The {cdrs} package also has other “wrappers” for {survey} package functions. These wrappers serve to specify the parameters of certain {survey} functions based on the design of the DRS.

3.2.1.1 2-way Cross Tabulation

Want to see a contingency table? Simply supply the (full) DRS data and the 2 variables you wish to analyze. Note, this approach uses svytotal, which yields population proportion estimates for each crossed factor and their standard errors. You may run stats::ftable() on the results to create a ‘flat’ contingency table.

xt <- cdrs_crosstab(drs_data, c("Q2", "SEX_P"))

ftable(xt)
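For intuition about what a weighted two-way table contains, the following base-R sketch builds one from invented toy data using stats::xtabs() and stats::ftable(). It is not what cdrs_crosstab() does internally and it produces no standard errors; the category values and weights below are assumptions made up for the example.

```r
# Toy data, invented for illustration (not real DRS responses or weights).
toy <- data.frame(
  Q2 = c("Yes", "No", "Yes", "No"),
  SEX_P = c("Female", "Female", "Male", "Male"),
  WTFINAL = c(2, 1, 1.5, 0.5)
)

# Sum the weights within each Q2 x SEX_P cell.
weighted_totals <- xtabs(WTFINAL ~ Q2 + SEX_P, data = toy)

# Convert cell totals to shares of the total weight.
weighted_props <- prop.table(weighted_totals)

# Flatten into a compact contingency table for display.
ftable(weighted_props)
```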

Footnotes

  1. Why would you specify "none" and use cdrs_read()? Why not just load it with read.csv()? Just because we set relevel_ = "none" doesn’t mean cdrs_read() is doing nothing. It will still make sure your data hasn’t been corrupted, and make sure all your ordered factors are correctly ordered.