Appendices
Test Your Knowledge: Solutions
- Log-transform the variable age in data and save the result as age.log.
age.log <- log(data$age)
- Square all values in cesd.1 and save the result as cesd.1.squared within the data set data.
data$cesd.1.squared <- data$cesd.1^2
- Calculate the mean and standard deviation (\(SD\)) of the variable cesd.2. If necessary, use the Internet to find out which function in R calculates the standard deviation of a numeric vector.
mean(data$cesd.2, na.rm=TRUE)
sd(data$cesd.2, na.rm=TRUE)
- Save the calculated mean and standard deviation of cesd.2 in a list element.
list(mean = mean(data$cesd.2, na.rm=TRUE),
     sd = sd(data$cesd.2, na.rm=TRUE))
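Without an assignment, the list is only printed to the console. The following is a minimal sketch of how it could be stored and its elements retrieved afterwards. It assumes a small made-up vector in place of data$cesd.2, and the object name cesd.2.stats is hypothetical:

```r
# Toy values standing in for data$cesd.2 (made up for illustration)
cesd.2 <- c(12, 20, NA, 31, 17)

# Store both statistics in a named list (the name cesd.2.stats is hypothetical)
cesd.2.stats <- list(mean = mean(cesd.2, na.rm = TRUE),
                     sd = sd(cesd.2, na.rm = TRUE))

# Elements can be retrieved by name, either with $ or with [[ ]]
cesd.2.stats$mean
cesd.2.stats[["sd"]]
```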
- Does the variable atsphs.0 in data have the desired class numeric? Try to confirm this using R code.
class(data$atsphs.0)
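If class returns something other than "numeric" (for example "character"), the variable can be converted. The sketch below illustrates this with made-up toy values, not the tutorial data:

```r
# Toy variable stored as text (made-up values, standing in for data$atsphs.0)
atsphs.0 <- c("10", "14", NA, "8")

class(atsphs.0)        # "character"
is.numeric(atsphs.0)   # FALSE

# Convert to numeric; missing values remain NA
atsphs.0 <- as.numeric(atsphs.0)
class(atsphs.0)        # "numeric"
```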
- Create two new variables in data: (1) age.65plus, a logical variable which indicates if a person’s age is 65 or above; and (2) cesd.diff, a variable that contains the difference between cesd.0 and cesd.1 for each patient.
data$age.65plus <- data$age >= 65
data$cesd.diff <- data$cesd.0 - data$cesd.1
- Using the pipe operator, filter the data so that only the records of patients who are male (sex = 0) and part of the intervention group (group = 1) are retained; then, in this subset, select the variable age and calculate its mean.
# The pipe, filter() and pull() are provided by dplyr
library(dplyr)
data %>%
  filter(sex==0, group==1) %>%
  pull(age) %>%
  mean()
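For comparison, the same subset mean can be obtained in base R with logical indexing. This is only a sketch on a made-up toy data frame; the object names toy, male.ig and mean.age are hypothetical, and only the column names follow the tutorial's data set:

```r
# Toy data frame (made-up values; column names follow the tutorial's data set)
toy <- data.frame(age   = c(40, 52, 33, 61, 47),
                  sex   = c(0, 1, 0, 0, 1),
                  group = c(1, 1, 0, 1, 0))

# Keep rows where both conditions hold, then average the age column
male.ig <- toy[toy$sex == 0 & toy$group == 1, ]
mean.age <- mean(male.ig$age)
mean.age
```

Note that, unlike filter, logical indexing returns rows filled with NA when a condition evaluates to NA, so missing values in sex or group would need to be handled separately.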
- In the fifth and sixth row of data, change the value of degree to NA (missing).
data[5:6, "degree"] <- c(NA, NA)
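To confirm that such a replacement worked, the missing values can be counted afterwards. The sketch below uses a made-up toy data frame (the object toy and its values are hypothetical), and assigns a single NA, which R recycles across both rows:

```r
# Toy data frame with eight rows (made-up values)
toy <- data.frame(id = 1:8, degree = c(3, 4, 2, 5, 4, 3, 1, 2))

# Set rows 5 and 6 of degree to missing; a single NA is recycled
toy[5:6, "degree"] <- NA

# Check which rows are now missing, and how many
which(is.na(toy$degree))
sum(is.na(toy$degree))
```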
R Package Information
All code included was tested under R version 4.2.0. The following package versions were used:
dplyr_1.0.10 openxlsx_4.2.5 scales_1.2.0
mice_3.14.0 plot.matrix_1.6.2 skimr_2.1.4
miceadds_3.12-26 psych_2.2.3 stdReg_3.4.1
mitml_0.4-3 purrr_0.3.4 tidyr_1.2.0
mmrm_0.2.2.9013 RefBasedMI_0.1.0 tidyverse_1.3.1
Data & Downloads
The data we use in this tutorial has been uploaded to a GitHub repository, which can be found at github.com/MathiasHarrer/rct-tutorial. The repository has also been permanently archived on Zenodo. We also uploaded an R script that includes all the code we used in this tutorial. Quick download links are provided below.
- rct-tutorial.zip. A zip folder containing the example trial data set (see data.xlsx), as well as the complete tutorial code (see code.R).
- data.xlsx. The original, unimputed example data set, which includes simulated patient data of a randomized controlled trial comparing an Internet-based depression intervention to a waitlist control (\(N=\) 546).
- imp.rda. The imputed data set object generated by the mice function (see Section 3.2.2). After being imported into R, the object has the name imp.
- imp.j2r.rda. The imputed data set object generated by the RefBasedMI function (“jump-to-reference” imputation; see Section 3.2.3). After being imported into R, the object has the name imp.j2r.
- implist.rda. The imp object transformed to a list of imputed data sets (mitml.list). This multiply imputed data format can be used to pool analyses using the testEstimates function (see Section 4.1.3).
- implist.j2r.rda. The imp.j2r object transformed to a list of imputed data sets (mitml.list). This multiply imputed data format can be used to pool analyses using the testEstimates function (see Chapter ).
- code.R. All code used in the tutorial, collected in one script.
- skimReport.R. Code of the skimReport function, which is used in Section 5.3 to generate descriptive tables.
To open .rda files, save them in your analysis folder. Then, open the folder through the Files pane in RStudio and click on the .rda file. This automatically imports the object into your R environment. To see how the data.xlsx Excel file can be imported, see Section 1.3.
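The same import can also be done in code with base R's load function, which restores objects under the names they were saved with. The sketch below is self-contained and uses a temporary file and a made-up object (imp.toy) rather than the tutorial's files:

```r
# Create a made-up object and save it to a temporary .rda file
imp.toy <- data.frame(id = 1:3, cesd.1 = c(20, 15, 28))
path <- file.path(tempdir(), "imp.toy.rda")
save(imp.toy, file = path)

# Remove the object, then restore it from the file
rm(imp.toy)
loaded <- load(path)   # load() returns the names of the restored objects
loaded
exists("imp.toy")
```

For the tutorial files, a call such as load("imp.rda") should therefore create the object imp, provided the file is located in the current working directory.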
Errata & Corrections
Statistical software is constantly evolving, and it is possible that some of the code provided in this tutorial may stop working over time. Errata and/or corrections will be documented here.
If you find an error in the tutorial, feel free to contact Mathias (mathias.harrer@tum.de).
Last updated 2023-06-23.