4 Sample Descriptive Statistics

Set up the data and environment for analysis.²

# rm(list=ls())
load('../data/dcassvy.Rdata')
texcmds[['Rversion']] <- paste(R.version$major, R.version$minor, sep='.')

## Helper functions for analyzing and reporting results
source('R/broom_mi.R')
source('R/report_descriptives.R')
source('R/report_models.R')
source('R/make_predictions.R')
source('R/marginal_effects.R')

Include only rows in which the respondent belongs to one of four mutually exclusive racial groups.

fourraces <- c('white', 'asian', 'black', 'latino')
dcas16svy <- subset(dcas16svy, anarace==TRUE) %>%
    update(raceeth = factor(raceeth, levels=fourraces))
dcas18svy <- subset(dcas18svy, anarace==TRUE) %>%
    update(raceeth = factor(raceeth, levels=fourraces))
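The effect of subsetting and then re-declaring the factor levels can be checked on a small example. This is a minimal sketch in base R: the `toy` data frame and its values are made up for illustration and stand in for the variables stored in the survey designs. The level order follows `fourraces`, so 'white' becomes the first level.

```r
# Toy data (hypothetical), not the DCAS data
fourraces <- c('white', 'asian', 'black', 'latino')
toy <- data.frame(
    raceeth = c('black', 'white', 'other', 'latino', 'asian'),
    anarace = c(TRUE, TRUE, FALSE, TRUE, TRUE)
)

# Keep only rows flagged for the race analysis, then fix the level order
toy4 <- subset(toy, anarace == TRUE)
toy4$raceeth <- factor(toy4$raceeth, levels = fourraces)

levels(toy4$raceeth)   # level order follows fourraces
table(toy4$raceeth)    # counts per group in the toy data
```

Any value not listed in `levels` would be turned into NA by `factor()`, which is why the subset comes first.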

Create tables of descriptive statistics for both DCAS samples, overall and by race. The tables are built with two helper functions saved in report_descriptives.R:

desc: Returns the summary of the multiply-imputed survey mean
make.desc: Returns the table of descriptive statistics

Results are saved in tables/descriptives_dcas16.tex and tables/descriptives_dcas18.tex.
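The source of desc is not reproduced in this chapter, so the following is only a sketch of the general technique behind a "multiply-imputed mean": estimate the mean in each imputed data set, then pool the estimates with Rubin's rules. The function name `pool_rubin` and the input numbers are hypothetical and chosen purely for illustration.

```r
# Pool a point estimate across m imputed data sets with Rubin's rules.
# `est` and `se` are hypothetical per-imputation means and standard errors.
pool_rubin <- function(est, se) {
    m <- length(est)
    qbar <- mean(est)              # pooled point estimate
    w <- mean(se^2)                # within-imputation variance
    b <- var(est)                  # between-imputation variance
    total <- w + (1 + 1 / m) * b   # total variance
    c(estimate = qbar, se = sqrt(total))
}

pooled <- pool_rubin(est = c(0.50, 0.52, 0.48, 0.51, 0.49),
                     se  = c(0.020, 0.021, 0.019, 0.020, 0.020))
pooled
```

The pooled standard error is larger than any single per-imputation standard error because it also carries the variation between imputations.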

Make tables of descriptive statistics, overall and by race, for DCAS 2016 and DCAS 2018.

Count the missing values of education and income in 2016 and 2018 (Table 4.1).

## Number of missing values per column (education first, income second)
mi16 <- colSums(is.na(dcas16[, c("educ", "dem.income.cat4")]))
mi18 <- colSums(is.na(dcas18[, c("educ", "income")]))

tibble(
    year = c("2016", "2018"),
    educ_mi = c(mi16[1], mi18[1]),
    educ_pct = round(c(mi16[1]/nrow(dcas16), mi18[1]/nrow(dcas18))*100, 1),
    inc_mi = c(mi16[2], mi18[2]),
    inc_pct = round(c(mi16[2]/nrow(dcas16), mi18[2]/nrow(dcas18))*100, 1)
)

Table 4.1:
year	educ_mi	educ_pct	inc_mi	inc_pct
2016	25	2	109	8.9
2018	25	2.4	79	7.4
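The arithmetic behind a missingness table like this one can be verified on a small example. A minimal sketch in base R, using a made-up data frame (`toy` and its values are hypothetical): count the missing values per column, then express each count as a percentage of all rows.

```r
# Toy data (hypothetical) with missing values in two columns
toy <- data.frame(
    educ   = c('hs', NA, 'ba', 'ba', NA),
    income = c(NA, 'low', 'high', 'mid', 'mid')
)

n_miss   <- colSums(is.na(toy))                 # missing values per column
pct_miss <- round(n_miss / nrow(toy) * 100, 1)  # as a percentage of rows

data.frame(variable = names(n_miss), n_miss, pct_miss, row.names = NULL)
```

The denominator is the full number of rows, so the percentages describe item missingness among all respondents rather than among any analysis subset.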

## Number of respondents by race

Table 4.2 shows the number of respondents by race in 2016 and 2018.

racecnt18 <- dcas18svy$designs[[1]]$variables %>%
  group_by(raceeth) %>%
  count() %>%
  rename(`2018` = n)

dcas16svy$designs[[1]]$variables %>%
  group_by(raceeth) %>%
  count() %>%
  rename(`2016` = n) %>%
  full_join(racecnt18, by = "raceeth") %>%
  as_huxtable() %>%
  set_caption(
    "Number of respondents by race in DCAS 2016 and DCAS 2018 surveys"
  ) %>%
  set_label("tab:racecount")

Table 4.2: Number of respondents by race in DCAS 2016 and DCAS 2018 surveys
raceeth	2016	2018
white	266	513
asian	186	93
black	105	308
latino	84	75
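The count-then-join pattern used for this table can be illustrated without the survey objects. This is a sketch in base R on made-up vectors (`r16`, `r18` and the column names `n2016`, `n2018` are hypothetical); `merge(..., all = TRUE)` plays the role of a full join, so a group present in only one year would still appear.

```r
# Toy race vectors (hypothetical), one per survey year
races <- c('white', 'asian', 'black', 'latino')
r16 <- factor(c('white', 'white', 'asian', 'black'), levels = races)
r18 <- factor(c('white', 'black', 'black', 'latino', 'latino'), levels = races)

# Count each year separately, then join the counts on the race column
cnt16 <- as.data.frame(table(raceeth = r16), responseName = 'n2016')
cnt18 <- as.data.frame(table(raceeth = r18), responseName = 'n2018')
counts <- merge(cnt16, cnt18, by = 'raceeth', all = TRUE)
counts
```

Declaring the same factor levels in both years keeps zero-count groups in the table instead of silently dropping them.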

4.1 Assign labels to be used throughout analyses

## Race labels and variable levels
racelabs <- c("Asian", "Black", "Latino", "White")
racelevs <- c("all", "asian", "black", "latino", "white")

## Labels for regression tables
regression_labels <- c('Asian','Black','Latinx',
                       'Age','Foreign Born','Male',
                       'Children Present', 'Married',
                       levels(dcas16$dem.educ.attain)[c(1,3:5)],
                       # levels(dcas$dem.income.cat4)[c(1,3:4)],
                       'Home owner',
                       'Years in neighborhood',
                       '10-50 blocks', '>50 blocks')
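The index `c(1, 3:5)` in the education labels keeps the first, third, fourth and fifth factor levels and skips the second. A minimal sketch of that indexing follows; the level names are hypothetical (the real ones come from `dcas16$dem.educ.attain`), and the idea that the skipped level is the omitted reference category is an assumption, not something stated in the source.

```r
# Hypothetical education levels; the real ones come from dcas16$dem.educ.attain
educ <- factor(c('HS', 'BA', 'Less than HS', 'Some college', 'MA+'),
               levels = c('Less than HS', 'HS', 'Some college', 'BA', 'MA+'))

# Drop the second level (assumed here to be the omitted reference category)
educ_labels <- levels(educ)[c(1, 3:5)]
educ_labels
```

Because the labels are picked by position, they depend on the level order of the factor; reordering the levels upstream would change which category is skipped.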

The data loaded here were created in the file analysis/01-variable-data-construction.Rmd. Because multiple imputation is stochastic, regenerating the data by running that file will produce minor differences in the final output.