Chapter 1: Background and Motivation

More than half a century after transformative civil rights laws such as Title VII of the Civil Rights Act of 1964 made discrimination illegal, America is still grappling with its history of racial injustice and the profound ongoing impact of systemic discrimination.

Because the primary responsibility for enforcing anti-discrimination laws lies with individual workers, who must file complaints with their employer or a government agency, employees may be deterred from reporting unequal treatment based on race, color or gender, for example by fear of retaliation or by the difficulty of proving discrimination.

In this capstone project, we examine whether implicit or perceived discrimination exists in the workplace, using data from an established annual government survey to explore whether ethnicity and gender affect the degree of supervisory support reported by employees of the federal government.

In our exploratory analysis of how the degree of supervisory support differs across race and gender, we chose the following two indicators, which employees value highly across organizations:

  1. Supervisor’s support for work-life balance - which plays a significant role in ensuring that employees are able to manage both their work and family responsibilities.

  2. Supervisor’s support for employee development - which is essential for improving productivity and workers’ engagement at work.

For this data science project, we loaded the following packages; the analysis uses both tidyverse and base R approaches.

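# pacman::p_load() installs any missing packages and then loads them all
pacman::p_load(dplyr, tidyverse, lubridate,        # data import and wrangling
               modelr, broom,                       # modelling helpers and tidy model output
               rvest,
               MASS, Hmisc, car, psych,             # polr(), rcorr(), regression diagnostics, reliability
               ggthemes, scales, ggfortify,         # plotting
               jtools, huxtable, interactions,      # regression tables and interaction plots
               DT, ggstance,
               knitr, kableExtra,
               effects, table1)                     # effect plots and the summary statistics table
# MASS is loaded after dplyr, so MASS::select() masks dplyr::select();
# the code below therefore calls dplyr::select() explicitly.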

Chapter 2: Importing the Dataset

The dataset for this project comes from the Office of Personnel Management’s Federal Employee Viewpoint Survey (FEVS). It includes 120 perception questions with responses ranging from 1 = Strongly Disagree to 5 = Strongly Agree. Of the 1,410,610 employees surveyed, 624,800 completed the survey, a response rate of 44.3%.

data <- read_csv("FEVS_2020_PRDF.csv")

In the survey, federal employees were asked to rate their interactions and relationships with their supervisors and how much support they receive from them. Two of the seven questions in this section (Q19 and Q21) serve as our outcome variables.

Q19. My supervisor supports my need to balance work and other life issues.
Q21. Supervisors in my work unit support employee development.

  • 5 - Strongly Agree
  • 4 - Agree
  • 3 - Neither Agree nor Disagree
  • 2 - Disagree
  • 1 - Strongly Disagree

The dataset also contains responses to eight demographic items. Respondents were asked the following questions:

1. Please select the racial category or categories with which you most closely identify.
A. Black or African American
B. White
C. Asian
D. Other Groups Collapsed for Privacy

2. Are you of Hispanic, Latino, or Spanish origin?
A. Yes
B. No

3. Are you an individual with a disability?
A. Yes
B. No

4. What is your age group?
A. Under 40
B. 40 or Older

5. What is your supervisory status?
A. Non-Supervisor/Team Leader
B. Supervisor/Manager/Executive

6. How long have you been with the Federal Government (excluding military service)?
A. Ten years or fewer
B. Eleven to 20 years
C. More than 20 years

7. Are you Male or Female?
A. Male
B. Female

8. What is your US military service status?
A. Military Service
B. No Prior Military Service

Of these eight items, race (DRNO, categorical) serves as our key predictor variable. Five of the remaining items (Hispanic origin, disability, age group, federal tenure and gender; military and supervisory status are excluded) are used as control variables, and the binary gender variable additionally serves as our moderator.

Chapter 3: Tidying and Transforming the Dataset

3A. Preparation of Predictor Variables

We start by checking Cronbach’s alpha for the groups of questions from which we wish to create new predictor variables:

  1. Employee’s ratings of their general work environment
  2. Employee’s ratings of their work satisfaction

Because alpha exceeds the threshold of 0.8 for both groups, we created two composite variables, work_experience and work_satisfaction, each equal to the mean of its items.

We also recoded gender as 0 = Female and 1 = Male, and set White employees as the reference group for race (DRNO) in the discussion of results.
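Cronbach’s alpha compares the sum of the item variances with the variance of the summed scale: alpha = k / (k - 1) * (1 - sum of item variances / variance of the total). The code used for this check is not shown in the report; the sketch below computes the statistic from its definition in base R. Listwise deletion and the simulated 5-point items are assumptions made for illustration, and `cronbach_alpha()` is a hypothetical helper. In practice, `psych::alpha()` (the psych package is loaded above) reports the same statistic along with item-level diagnostics.

```r
# Cronbach's alpha from its definition:
#   alpha = k / (k - 1) * (1 - sum(item variances) / variance(total score))
cronbach_alpha <- function(items) {
  items <- as.matrix(items)
  # Listwise deletion (an assumption; other missing-data rules are possible)
  items <- items[stats::complete.cases(items), , drop = FALSE]
  k <- ncol(items)
  item_var  <- apply(items, 2, stats::var)
  total_var <- stats::var(rowSums(items))
  (k / (k - 1)) * (1 - sum(item_var) / total_var)
}

# Hypothetical data: six 5-point Likert items driven by one shared latent trait
set.seed(2020)
n <- 1000
latent <- rnorm(n)
to_likert <- function(z) pmin(pmax(round(3.5 + z), 1), 5)
items <- sapply(1:6, function(j) to_likert(latent + rnorm(n, sd = 0.6)))
colnames(items) <- paste0("item", 1:6)

alpha_hat <- cronbach_alpha(items)
alpha_hat          # compare against the 0.8 threshold used in the text
```

Items that move together (a shared latent trait plus modest noise) yield a high alpha, which is the situation that justifies averaging them into one composite.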

3B. Preparation of Outcome Variables

To construct the logistic regression models, we also created a binary outcome variable for each survey question of interest (Q19 and Q21). Responses above 3 (Agree or Strongly Agree) are coded 1, indicating that the employee feels supported by their supervisor on work-life balance or employee development; responses of 3 or below (including the neutral midpoint) are coded 0.
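The coding rule above can be checked on a small vector. This is a minimal sketch: `to_agree_binary()` is a hypothetical helper that mirrors the `ifelse(Q19 > 3, 1, 0)` expressions in the cleaning code, and it shows that the neutral midpoint falls into the 0 group while missing responses stay `NA`.

```r
# Collapse a 1-5 agreement item into a binary indicator:
#   4 (Agree) or 5 (Strongly Agree)      -> 1
#   1, 2 or 3 (including the neutral 3)  -> 0
#   missing responses                    -> NA
to_agree_binary <- function(x) ifelse(x > 3, 1, 0)

responses <- c(1, 2, 3, 4, 5, NA)
to_agree_binary(responses)
#> [1]  0  0  0  1  1 NA
```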

data_cleaned <- data %>%
  mutate(
    # Composite scales: row means of the items in each block (missing items ignored)
    work_satisfaction  = rowMeans(dplyr::select(data, Q33, Q34, Q35, Q36, Q37, Q38), na.rm = TRUE),
    work_experience    = rowMeans(dplyr::select(data, Q1, Q2, Q3, Q4, Q5, Q6, Q7, Q8), na.rm = TRUE),
    supervisor_support = rowMeans(dplyr::select(data, Q19, Q20, Q21, Q22, Q23, Q24, Q25), na.rm = TRUE),
    leadership         = rowMeans(dplyr::select(data, Q26, Q27, Q28, Q29, Q30, Q31, Q32), na.rm = TRUE),
    # Demographic indicators
    gender     = ifelse(DSEX == "A", 1, 0),      # 1 = Male, 0 = Female
    ancestry   = ifelse(DHISP == "A", 1, 0),     # 1 = Hispanic, Latino or Spanish origin
    disability = ifelse(DDIS == "A", 1, 0),      # 1 = individual with a disability
    above40    = ifelse(DAGEGRP == "B", 1, 0),   # 1 = 40 or older
    # Reflected transformations of Q19, explored as remedies for its left skew
    Q19_sqrt    = sqrt(max(Q19, na.rm = TRUE) + 1 - Q19),
    Q19_log     = log10(max(Q19, na.rm = TRUE) + 1 - Q19),
    Q19_inverse = 1 / (max(Q19, na.rm = TRUE) + 1 - Q19),
    # Binary outcomes: 1 = Agree/Strongly Agree (> 3), 0 = neutral or disagree
    Q19_binary = ifelse(Q19 > 3, 1, 0),
    Q21_binary = ifelse(Q21 > 3, 1, 0)
  )

# Race as a labelled factor, with White as the reference level
data_cleaned$DRNO <- factor(data_cleaned$DRNO,
                            levels = c("A", "B", "C", "D"),
                            labels = c("Black or African American",
                                       "White",
                                       "Asian",
                                       "Others"))

data_cleaned$DRNO <- relevel(data_cleaned$DRNO, ref = "White")
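Setting White as the reference level determines how R encodes race in the models: with the default treatment contrasts, each non-White group gets its own 0/1 column, and its coefficient is a contrast against White employees. The toy data below are hypothetical (not FEVS responses); the sketch only shows the level order after `relevel()` and the dummy columns that `model.matrix()` builds, whose names match coefficients such as `DRNOBlack or African American` in the model output.

```r
# Hypothetical raw DRNO codes, as they appear in the survey file
drno_raw <- c("A", "B", "C", "D", "B", "B")

toy <- data.frame(
  DRNO = factor(drno_raw,
                levels = c("A", "B", "C", "D"),
                labels = c("Black or African American", "White", "Asian", "Others"))
)
# Move White to the first position so that it becomes the reference level
toy$DRNO <- relevel(toy$DRNO, ref = "White")

levels(toy$DRNO)

# Treatment contrasts: an intercept plus one 0/1 column per non-reference group
colnames(model.matrix(~ DRNO, data = toy))
```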

Chapter 4: Summary Statistics

| | White (N=363512) | Black or African American (N=71909) | Asian (N=28633) | Others (N=32676) | Total (N=496730) |
|---|---|---|---|---|---|
| Supervisor Support for Work-Life Balance (Binary) | | | | | |
| Mean (SD) | 0.882 (0.323) | 0.853 (0.354) | 0.874 (0.332) | 0.802 (0.399) | 0.872 (0.334) |
| Median [Min, Max] | 1.00 [0, 1.00] | 1.00 [0, 1.00] | 1.00 [0, 1.00] | 1.00 [0, 1.00] | 1.00 [0, 1.00] |
| Supervisor Support for Employee Development (Binary) | | | | | |
| Mean (SD) | 0.823 (0.382) | 0.788 (0.409) | 0.827 (0.378) | 0.723 (0.447) | 0.811 (0.391) |
| Median [Min, Max] | 1.00 [0, 1.00] | 1.00 [0, 1.00] | 1.00 [0, 1.00] | 1.00 [0, 1.00] | 1.00 [0, 1.00] |
| Federal Tenure (Years) | | | | | |
| 10 or fewer | 131408 (36.1%) | 25396 (35.3%) | 12925 (45.1%) | 13434 (41.1%) | 183163 (36.9%) |
| Between 11 and 20 | 135865 (37.4%) | 24462 (34.0%) | 10305 (36.0%) | 11767 (36.0%) | 182399 (36.7%) |
| More than 20 | 96239 (26.5%) | 22051 (30.7%) | 5403 (18.9%) | 7475 (22.9%) | 131168 (26.4%) |
| Gender | | | | | |
| Male | 149039 (41.0%) | 43965 (61.1%) | 13120 (45.8%) | 16258 (49.8%) | 222382 (44.8%) |
| Female | 214473 (59.0%) | 27944 (38.9%) | 15513 (54.2%) | 16418 (50.2%) | 274348 (55.2%) |
| Ancestry (Hispanic, Latino, Spanish) (Binary) | | | | | |
| No | 325735 (89.6%) | 69718 (97.0%) | 28464 (99.4%) | 26951 (82.5%) | 450868 (90.8%) |
| Yes | 37777 (10.4%) | 2191 (3.0%) | 169 (0.6%) | 5725 (17.5%) | 45862 (9.2%) |
| Disability (Binary) | | | | | |
| No | 312698 (86.0%) | 58819 (81.8%) | 26973 (94.2%) | 26784 (82.0%) | 425274 (85.6%) |
| Yes | 50814 (14.0%) | 13090 (18.2%) | 1660 (5.8%) | 5892 (18.0%) | 71456 (14.4%) |
| Above 40 Years Old (Binary) | | | | | |
| No | 85621 (23.6%) | 13013 (18.1%) | 7370 (25.7%) | 8470 (25.9%) | 114474 (23.0%) |
| Yes | 277891 (76.4%) | 58896 (81.9%) | 21263 (74.3%) | 24206 (74.1%) | 382256 (77.0%) |
| Work Satisfaction | | | | | |
| Mean (SD) | 3.71 (0.877) | 3.70 (0.887) | 3.78 (0.826) | 3.51 (0.928) | 3.70 (0.881) |
| Median [Min, Max] | 3.83 [1.00, 5.00] | 3.83 [1.00, 5.00] | 3.83 [1.00, 5.00] | 3.67 [1.00, 5.00] | 3.83 [1.00, 5.00] |
| Missing | 120 (0.0%) | 45 (0.1%) | 17 (0.1%) | 17 (0.1%) | 199 (0.0%) |
| Work Experience | | | | | |
| Mean (SD) | 3.90 (0.783) | 3.89 (0.805) | 3.99 (0.731) | 3.74 (0.848) | 3.89 (0.789) |
| Median [Min, Max] | 4.00 [1.00, 5.00] | 4.00 [1.00, 5.00] | 4.00 [1.00, 5.00] | 3.88 [1.00, 5.00] | 4.00 [1.00, 5.00] |
| Missing | 69 (0.0%) | 25 (0.0%) | 6 (0.0%) | 6 (0.0%) | 106 (0.0%) |

 
In addition, the outcome variables (Q19 and Q21) are plotted as histograms to visualize their distributions within each ethnicity group.

 

(Note: The extreme left skew of the outcome variables makes multiple linear regression unsuitable: even after reflecting the outcome variables and applying square-root, log and inverse transformations, the skewness was not adequately corrected. We therefore model the data using logistic regression and proportional odds (ordinal) logistic regression.)
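The reflect-then-transform step in the cleaning code can be illustrated on simulated data. In the sketch below, the response probabilities are invented and `skewness()` is a hypothetical moment-based helper (no small-sample correction). It applies the same square-root, log10 and inverse transformations to the reflected scores and also counts distinct values: because each transformation is one-to-one, a 5-point item still takes only five values afterwards, which is one reason such transformations cannot turn it into an approximately normal continuous outcome.

```r
# Moment-based sample skewness (no small-sample correction)
skewness <- function(x) {
  x <- x[!is.na(x)]
  m <- mean(x)
  mean((x - m)^3) / mean((x - m)^2)^1.5
}

# Hypothetical left-skewed 5-point item: most respondents agree
set.seed(19)
q <- sample(1:5, 5000, replace = TRUE, prob = c(0.03, 0.04, 0.08, 0.35, 0.50))

# Reflect, then transform (mirrors Q19_sqrt, Q19_log and Q19_inverse)
reflected <- max(q) + 1 - q
transforms <- list(
  original = q,
  sqrt     = sqrt(reflected),
  log10    = log10(reflected),
  inverse  = 1 / reflected
)

sapply(transforms, skewness)                        # skewness after each transform
sapply(transforms, function(v) length(unique(v)))   # still only 5 distinct values
```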

Chapter 5: Logistic Regression Model

Before fitting the models, we computed a correlation matrix to check how the variables in the models are related to one another.

data_cleaned %>% 
  dplyr::select(gender, ancestry, disability, above40,
                work_experience, work_satisfaction, Q19, Q21) %>% 
  as.matrix() %>% 
  Hmisc::rcorr() %>%                  # pairwise Pearson correlations with p-values
  broom::tidy() %>%                   # one row per pair of variables
  rename(`Variable 1` = column1,
         `Variable 2` = column2,
         Correlation  = estimate) %>% 
  mutate(Abs_correlation = abs(Correlation)) %>% 
  DT::datatable(options = list(scrollX = TRUE)) %>%   # interactive, sortable table
  DT::formatRound(columns = c("Correlation", "p.value", "Abs_correlation"), 
                  digits = 3)

5A. Model Specification

For this project, we explore two types of regression model. The first part focuses on logistic regression, in which the binary outcomes for supervisor’s support on
(1) work-life balance (Q19_binary) and (2) employee development (Q21_binary) are regressed on race, gender and the control variables.

The second part takes into account the ordinal ratings of the response variables. We therefore use proportional odds logistic regression (POLR), which models the ordinal ratings of supervisor’s support for (1) work-life balance (Q19) and (2) employee development (Q21) as functions of the same predictors.

For each model, we will first show the model equation, followed by the regression output.

Model 1 - Main Logistic Regression Model Effects (Q19)

\[ \begin{aligned} \operatorname{Q19\_binary} &\sim Bernoulli\left(\operatorname{prob}_{\operatorname{Q19\_binary} = \operatorname{1}}= \hat{P}\right) \\ \log\left[ \frac { \hat{P} }{ 1 - \hat{P} } \right] &= \alpha + \beta_{1}(\operatorname{DRNO}_{\operatorname{Black\ or\ African\ American}}) + \beta_{2}(\operatorname{DRNO}_{\operatorname{Asian}}) + \beta_{3}(\operatorname{DRNO}_{\operatorname{Others}})\ + \\ &\quad \beta_{4}(\operatorname{gender}) + \beta_{5}(\operatorname{ancestry}) + \beta_{6}(\operatorname{disability}) + \beta_{7}(\operatorname{above40})\ + \\ &\quad \beta_{8}(\operatorname{DFEDTEN}_{\operatorname{Between\ 11\ and\ 20}}) + \beta_{9}(\operatorname{DFEDTEN}_{\operatorname{More\ than\ 20}}) + \beta_{10}(\operatorname{work\_experience}) + \\ &\quad \beta_{11}(\operatorname{work\_satisfaction}) \end{aligned} \]

model4_logit <- final_cleaned_data %>% 
  glm(Q19_binary ~  DRNO + gender + ancestry + disability + above40 + DFEDTEN + work_experience 
      + work_satisfaction,
      ., family = binomial(link = "logit"))

summary(model4_logit)
## 
## Call:
## glm(formula = Q19_binary ~ DRNO + gender + ancestry + disability + 
##     above40 + DFEDTEN + work_experience + work_satisfaction, 
##     family = binomial(link = "logit"), data = .)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -3.1697   0.1667   0.2939   0.4369   2.5054  
## 
## Coefficients:
##                                Estimate Std. Error  z value Pr(>|z|)    
## (Intercept)                   -3.922664   0.025169 -155.851  < 2e-16 ***
## DRNOBlack or African American -0.316700   0.014018  -22.592  < 2e-16 ***
## DRNOAsian                     -0.391539   0.021487  -18.222  < 2e-16 ***
## DRNOOthers                    -0.405717   0.017687  -22.938  < 2e-16 ***
## gender1                        0.085865   0.010143    8.466  < 2e-16 ***
## ancestry                      -0.445471   0.015777  -28.235  < 2e-16 ***
## disability                    -0.107923   0.013512   -7.987 1.38e-15 ***
## above40                       -0.152872   0.013233  -11.552  < 2e-16 ***
## DFEDTENBetween 11 and 20      -0.030155   0.012132   -2.486   0.0129 *  
## DFEDTENMore than 20           -0.002496   0.014330   -0.174   0.8617    
## work_experience                0.889674   0.009438   94.269  < 2e-16 ***
## work_satisfaction              0.887080   0.008831  100.455  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 379657  on 495528  degrees of freedom
## Residual deviance: 277406  on 495517  degrees of freedom
## AIC: 277430
## 
## Number of Fisher Scoring iterations: 6

Model 2 - Main Logistic Regression Model Effects (Q21)

\[ \begin{aligned} \operatorname{Q21\_binary} &\sim Bernoulli\left(\operatorname{prob}_{\operatorname{Q21\_binary} = \operatorname{1}}= \hat{P}\right) \\ \log\left[ \frac { \hat{P} }{ 1 - \hat{P} } \right] &= \alpha + \beta_{1}(\operatorname{DRNO}_{\operatorname{Black\ or\ African\ American}}) + \beta_{2}(\operatorname{DRNO}_{\operatorname{Asian}}) + \beta_{3}(\operatorname{DRNO}_{\operatorname{Others}})\ + \\ &\quad \beta_{4}(\operatorname{gender}) + \beta_{5}(\operatorname{ancestry}) + \beta_{6}(\operatorname{disability}) + \beta_{7}(\operatorname{above40})\ + \\ &\quad \beta_{8}(\operatorname{DFEDTEN}_{\operatorname{Between\ 11\ and\ 20}}) + \beta_{9}(\operatorname{DFEDTEN}_{\operatorname{More\ than\ 20}}) + \beta_{10}(\operatorname{work\_experience})\ + \\ &\quad \beta_{11}(\operatorname{work\_satisfaction}) \end{aligned} \]

model6_logit <- final_cleaned_data %>% 
  glm(Q21_binary ~  DRNO + gender + ancestry + disability + above40 + DFEDTEN + work_experience 
      + work_satisfaction,
      ., family = binomial(link = "logit"))

summary(model6_logit)
## 
## Call:
## glm(formula = Q21_binary ~ DRNO + gender + ancestry + disability + 
##     above40 + DFEDTEN + work_experience + work_satisfaction, 
##     family = binomial(link = "logit"), data = .)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -3.1966   0.1426   0.3000   0.4855   3.0867  
## 
## Coefficients:
##                                Estimate Std. Error  z value Pr(>|z|)    
## (Intercept)                   -5.932296   0.026931 -220.279  < 2e-16 ***
## DRNOBlack or African American -0.305987   0.012976  -23.581  < 2e-16 ***
## DRNOAsian                     -0.302837   0.019960  -15.172  < 2e-16 ***
## DRNOOthers                    -0.392240   0.016828  -23.309  < 2e-16 ***
## gender1                        0.098436   0.009325   10.556  < 2e-16 ***
## ancestry                      -0.420589   0.014941  -28.149  < 2e-16 ***
## disability                    -0.159582   0.012600  -12.666  < 2e-16 ***
## above40                       -0.296497   0.012267  -24.170  < 2e-16 ***
## DFEDTENBetween 11 and 20      -0.019117   0.011138   -1.716   0.0861 .  
## DFEDTENMore than 20            0.058096   0.013110    4.431 9.36e-06 ***
## work_experience                1.206585   0.009274  130.110  < 2e-16 ***
## work_satisfaction              1.000478   0.008310  120.392  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 479779  on 495528  degrees of freedom
## Residual deviance: 316893  on 495517  degrees of freedom
## AIC: 316917
## 
## Number of Fisher Scoring iterations: 6

 

Beyond the differences in supervisory support for work-life balance and employee development, the literature has documented the persistence of workplace gender segregation in the United States. We therefore include interaction terms between each ethnicity group and the binary gender variable in an attempt to draw further insights from our data.
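The glm() calls for the interaction models write the formula as `DRNO + gender + DRNO*gender`. In R's formula syntax `a * b` already means `a + b + a:b`, so the repeated main effects are absorbed and the model matrix is unchanged. The toy data below are hypothetical (not FEVS responses); the sketch shows the columns the interaction produces: one product term per non-White group, named as in the model output.

```r
# Hypothetical toy data: DRNO with White as the first (reference) level,
# gender stored as a factor with levels "0" (Female) and "1" (Male)
demo <- data.frame(
  DRNO = factor(c("White", "Black or African American", "Asian", "Others", "White", "Asian"),
                levels = c("White", "Black or African American", "Asian", "Others")),
  gender = factor(c(0, 1, 1, 0, 1, 0))
)

# DRNO * gender expands to DRNO + gender + DRNO:gender
cols_star <- colnames(model.matrix(~ DRNO * gender, data = demo))
# Writing the main effects out again adds no new columns
cols_long <- colnames(model.matrix(~ DRNO + gender + DRNO * gender, data = demo))

cols_star
identical(cols_star, cols_long)
```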

Model 3 - Main Logistic Regression Model with Gender Interaction Effects (Q19)

\[ \begin{aligned} \operatorname{Q19\_binary} &\sim Bernoulli\left(\operatorname{prob}_{\operatorname{Q19\_binary} = \operatorname{1}}= \hat{P}\right) \\ \log\left[ \frac { \hat{P} }{ 1 - \hat{P} } \right] &= \alpha + \beta_{1}(\operatorname{DRNO}_{\operatorname{Black\ or\ African\ American}}) + \beta_{2}(\operatorname{DRNO}_{\operatorname{Asian}}) + \beta_{3}(\operatorname{DRNO}_{\operatorname{Others}})\ + \\ &\quad \beta_{4}(\operatorname{gender}_{\operatorname{1}}) + \beta_{5}(\operatorname{ancestry}) + \beta_{6}(\operatorname{disability}) + \beta_{7}(\operatorname{above40})\ + \\ &\quad \beta_{8}(\operatorname{DFEDTEN}_{\operatorname{Between\ 11\ and\ 20}}) + \beta_{9}(\operatorname{DFEDTEN}_{\operatorname{More\ than\ 20}}) + \beta_{10}(\operatorname{work\_experience})\ + \\ &\quad \beta_{11}(\operatorname{work\_satisfaction})\ + \beta_{12}(\operatorname{DRNO}_{\operatorname{Black\ or\ African\ American}} \times \operatorname{gender}_{\operatorname{1}})\ + \\ &\quad \beta_{13}(\operatorname{DRNO}_{\operatorname{Asian}} \times \operatorname{gender}_{\operatorname{1}}) + \beta_{14}(\operatorname{DRNO}_{\operatorname{Others}} \times \operatorname{gender}_{\operatorname{1}}) \end{aligned} \]

model10_logit <- final_cleaned_data %>% 
  glm(Q19_binary ~  DRNO + gender + DRNO*gender + ancestry + disability + above40 + DFEDTEN 
      + work_experience + work_satisfaction,
      ., family = binomial(link = "logit"))

summary(model10_logit)
## 
## Call:
## glm(formula = Q19_binary ~ DRNO + gender + DRNO * gender + ancestry + 
##     disability + above40 + DFEDTEN + work_experience + work_satisfaction, 
##     family = binomial(link = "logit"), data = .)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -3.1692   0.1668   0.2939   0.4369   2.4580  
## 
## Coefficients:
##                                        Estimate Std. Error  z value Pr(>|z|)
## (Intercept)                           -3.920903   0.025540 -153.517  < 2e-16
## DRNOBlack or African American         -0.292063   0.018056  -16.175  < 2e-16
## DRNOAsian                             -0.358796   0.031148  -11.519  < 2e-16
## DRNOOthers                            -0.525128   0.024434  -21.492  < 2e-16
## gender1                                0.078245   0.012122    6.455 1.08e-10
## ancestry                              -0.448058   0.015782  -28.390  < 2e-16
## disability                            -0.108655   0.013516   -8.039 9.05e-16
## above40                               -0.152209   0.013237  -11.499  < 2e-16
## DFEDTENBetween 11 and 20              -0.031006   0.012135   -2.555  0.01062
## DFEDTENMore than 20                   -0.003816   0.014343   -0.266  0.79022
## work_experience                        0.890307   0.009440   94.317  < 2e-16
## work_satisfaction                      0.887473   0.008831  100.491  < 2e-16
## DRNOBlack or African American:gender1 -0.074954   0.028480   -2.632  0.00849
## DRNOAsian:gender1                     -0.065146   0.042646   -1.528  0.12661
## DRNOOthers:gender1                     0.250279   0.035395    7.071 1.54e-12
##                                          
## (Intercept)                           ***
## DRNOBlack or African American         ***
## DRNOAsian                             ***
## DRNOOthers                            ***
## gender1                               ***
## ancestry                              ***
## disability                            ***
## above40                               ***
## DFEDTENBetween 11 and 20              *  
## DFEDTENMore than 20                      
## work_experience                       ***
## work_satisfaction                     ***
## DRNOBlack or African American:gender1 ** 
## DRNOAsian:gender1                        
## DRNOOthers:gender1                    ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 379657  on 495528  degrees of freedom
## Residual deviance: 277339  on 495514  degrees of freedom
## AIC: 277369
## 
## Number of Fisher Scoring iterations: 6

Model 4 - Main Logistic Regression Model with Gender Interaction Effects (Q21)

\[ \begin{aligned} \operatorname{Q21\_binary} &\sim Bernoulli\left(\operatorname{prob}_{\operatorname{Q21\_binary} = \operatorname{1}}= \hat{P}\right) \\ \log\left[ \frac { \hat{P} }{ 1 - \hat{P} } \right] &= \alpha + \beta_{1}(\operatorname{DRNO}_{\operatorname{Black\ or\ African\ American}}) + \beta_{2}(\operatorname{DRNO}_{\operatorname{Asian}}) + \beta_{3}(\operatorname{DRNO}_{\operatorname{Others}})\ + \\ &\quad \beta_{4}(\operatorname{gender}_{\operatorname{1}}) + \beta_{5}(\operatorname{ancestry}) + \beta_{6}(\operatorname{disability}) + \beta_{7}(\operatorname{above40})\ + \\ &\quad \beta_{8}(\operatorname{DFEDTEN}_{\operatorname{Between\ 11\ and\ 20}}) + \beta_{9}(\operatorname{DFEDTEN}_{\operatorname{More\ than\ 20}}) + \beta_{10}(\operatorname{work\_experience}) + \\ &\quad \beta_{11}(\operatorname{work\_satisfaction})\ + \beta_{12}(\operatorname{DRNO}_{\operatorname{Black\ or\ African\ American}} \times \operatorname{gender}_{\operatorname{1}})\ + \\ &\quad \beta_{13}(\operatorname{DRNO}_{\operatorname{Asian}} \times \operatorname{gender}_{\operatorname{1}}) + \beta_{14}(\operatorname{DRNO}_{\operatorname{Others}} \times \operatorname{gender}_{\operatorname{1}}) \end{aligned} \]

model12_logit <- final_cleaned_data %>% 
  glm(Q21_binary ~  DRNO + gender + DRNO*gender + ancestry + disability + above40 + DFEDTEN 
      + work_experience + work_satisfaction,
      ., family = binomial(link = "logit"))

summary(model12_logit)
## 
## Call:
## glm(formula = Q21_binary ~ DRNO + gender + DRNO * gender + ancestry + 
##     disability + above40 + DFEDTEN + work_experience + work_satisfaction, 
##     family = binomial(link = "logit"), data = .)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -3.2005   0.1431   0.2997   0.4850   3.0523  
## 
## Coefficients:
##                                        Estimate Std. Error  z value Pr(>|z|)
## (Intercept)                           -5.920645   0.027199 -217.682  < 2e-16
## DRNOBlack or African American         -0.320582   0.016658  -19.245  < 2e-16
## DRNOAsian                             -0.280925   0.028869   -9.731  < 2e-16
## DRNOOthers                            -0.502941   0.023338  -21.551  < 2e-16
## gender1                                0.077761   0.011055    7.034 2.01e-12
## ancestry                              -0.422987   0.014945  -28.302  < 2e-16
## disability                            -0.161012   0.012603  -12.776  < 2e-16
## above40                               -0.296524   0.012271  -24.165  < 2e-16
## DFEDTENBetween 11 and 20              -0.019171   0.011141   -1.721   0.0853
## DFEDTENMore than 20                    0.058874   0.013122    4.487 7.23e-06
## work_experience                        1.206680   0.009275  130.093  < 2e-16
## work_satisfaction                      1.000578   0.008311  120.397  < 2e-16
## DRNOBlack or African American:gender1  0.027606   0.026455    1.044   0.2967
## DRNOAsian:gender1                     -0.045309   0.039636   -1.143   0.2530
## DRNOOthers:gender1                     0.228625   0.033658    6.793 1.10e-11
##                                          
## (Intercept)                           ***
## DRNOBlack or African American         ***
## DRNOAsian                             ***
## DRNOOthers                            ***
## gender1                               ***
## ancestry                              ***
## disability                            ***
## above40                               ***
## DFEDTENBetween 11 and 20              .  
## DFEDTENMore than 20                   ***
## work_experience                       ***
## work_satisfaction                     ***
## DRNOBlack or African American:gender1    
## DRNOAsian:gender1                        
## DRNOOthers:gender1                    ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 479779  on 495528  degrees of freedom
## Residual deviance: 316843  on 495514  degrees of freedom
## AIC: 316873
## 
## Number of Fisher Scoring iterations: 6

5B. Logistic Regression Results

In this section, we interpret the results of the four models specified in Section 5A, which relate (1) ethnicity and (2) its interaction with gender to the binary outcomes of supervisor’s support on:

  1. work-life balance (Q19_binary) and
  2. employee development (Q21_binary).

| | Model 1: Main Effects, Work-Life Balance | Model 3: With Interactions, Work-Life Balance | Model 2: Main Effects, Employee Development | Model 4: With Interactions, Employee Development |
|---|---|---|---|---|
| Intercept | 0.020 *** [0.019, 0.021], p < 0.001 | 0.020 *** [0.019, 0.021], p < 0.001 | 0.003 *** [0.003, 0.003], p < 0.001 | 0.003 *** [0.003, 0.003], p < 0.001 |
| Black or African American | 0.729 *** [0.709, 0.749], p < 0.001 | 0.747 *** [0.721, 0.774], p < 0.001 | 0.736 *** [0.718, 0.755], p < 0.001 | 0.726 *** [0.702, 0.750], p < 0.001 |
| Asian | 0.676 *** [0.648, 0.705], p < 0.001 | 0.699 *** [0.657, 0.742], p < 0.001 | 0.739 *** [0.710, 0.768], p < 0.001 | 0.755 *** [0.714, 0.799], p < 0.001 |
| Others | 0.666 *** [0.644, 0.690], p < 0.001 | 0.591 *** [0.564, 0.620], p < 0.001 | 0.676 *** [0.654, 0.698], p < 0.001 | 0.605 *** [0.578, 0.633], p < 0.001 |
| Gender (0 = Female, 1 = Male) | 1.090 *** [1.068, 1.112], p < 0.001 | 1.081 *** [1.056, 1.107], p < 0.001 | 1.103 *** [1.083, 1.124], p < 0.001 | 1.081 *** [1.058, 1.105], p < 0.001 |
| Ancestry | 0.641 *** [0.621, 0.661], p < 0.001 | 0.639 *** [0.619, 0.659], p < 0.001 | 0.657 *** [0.638, 0.676], p < 0.001 | 0.655 *** [0.636, 0.675], p < 0.001 |
| Disability | 0.898 *** [0.874, 0.922], p < 0.001 | 0.897 *** [0.874, 0.921], p < 0.001 | 0.853 *** [0.832, 0.874], p < 0.001 | 0.851 *** [0.831, 0.873], p < 0.001 |
| Age above 40 years old (0 = No, 1 = Yes) | 0.858 *** [0.836, 0.881], p < 0.001 | 0.859 *** [0.837, 0.881], p < 0.001 | 0.743 *** [0.726, 0.762], p < 0.001 | 0.743 *** [0.726, 0.761], p < 0.001 |
| Federal Tenure between 11 and 20 years | 0.970 * [0.947, 0.994], p = 0.013 | 0.969 * [0.947, 0.993], p = 0.011 | 0.981 [0.960, 1.003], p = 0.086 | 0.981 [0.960, 1.003], p = 0.085 |
| Federal Tenure more than 20 years | 0.998 [0.970, 1.026], p = 0.862 | 0.996 [0.969, 1.025], p = 0.790 | 1.060 *** [1.033, 1.087], p < 0.001 | 1.061 *** [1.034, 1.088], p < 0.001 |
| Work Experience Rating | 2.434 *** [2.390, 2.480], p < 0.001 | 2.436 *** [2.391, 2.481], p < 0.001 | 3.342 *** [3.282, 3.403], p < 0.001 | 3.342 *** [3.282, 3.404], p < 0.001 |
| Work Satisfaction Rating | 2.428 *** [2.386, 2.470], p < 0.001 | 2.429 *** [2.387, 2.471], p < 0.001 | 2.720 *** [2.676, 2.764], p < 0.001 | 2.720 *** [2.676, 2.765], p < 0.001 |
| Black or African American X Gender | | 0.928 ** [0.877, 0.981], p = 0.008 | | 1.028 [0.976, 1.083], p = 0.297 |
| Asian X Gender | | 0.937 [0.862, 1.019], p = 0.127 | | 0.956 [0.884, 1.033], p = 0.253 |
| Others X Gender | | 1.284 *** [1.198, 1.377], p < 0.001 | | 1.257 *** [1.177, 1.343], p < 0.001 |
| N | 495529 | 495529 | 495529 | 495529 |
| AIC | 277429.956 | 277368.838 | 316916.868 | 316873.448 |
| BIC | 277563.317 | 277535.538 | 317050.229 | 317040.149 |
| Pseudo R2 | 0.348 | 0.349 | 0.452 | 0.452 |

Cells show odds ratios with 95% confidence intervals in brackets. *** p < 0.001; ** p < 0.01; * p < 0.05.

The regression table above reports exponentiated coefficients (odds ratios) for all variables, so we interpret the results on the odds scale rather than on the probability scale.

Based on logistic regression Models 1 and 2, non-White employees (Black or African American, Asian and Others) are less likely than White employees to agree that their supervisors support their need for work-life balance and their development.

In particular, Model 1 shows that Black or African American federal employees have about 27% lower odds than their White colleagues of agreeing that their supervisor supports their need to balance work and other life priorities, holding other variables constant. Asian employees and employees in the Others racial category have larger decreases in odds, of 32% and 33% respectively, relative to White employees.

In Model 2, the odds of Black or African American and Asian federal employees agreeing that supervisors support their development are approximately 26% lower than those of their White colleagues, holding other variables constant. For employees in the Others racial category, the odds are 32% lower than those of their White colleagues.
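The percentages above follow from the log-odds coefficients: the odds ratio is exp(beta) and the percentage change in odds relative to White employees is 100 * (exp(beta) - 1). For example, exp(-0.3167) is about 0.729, a decrease of about 27%. The sketch below applies this to the race coefficients printed for Models 1 and 2 (copied by hand, so it is a check of the arithmetic rather than a refit). With the fitted objects, `broom::tidy(model4_logit, exponentiate = TRUE, conf.int = TRUE)` returns the same odds ratios together with confidence intervals.

```r
# Log-odds coefficients for DRNO, copied from summary(model4_logit) (Q19)
# and summary(model6_logit) (Q21)
b_q19 <- c(Black = -0.316700, Asian = -0.391539, Others = -0.405717)
b_q21 <- c(Black = -0.305987, Asian = -0.302837, Others = -0.392240)

odds_ratio <- function(b) exp(b)               # multiplicative change in the odds
pct_change <- function(b) 100 * (exp(b) - 1)   # % change in the odds vs. White employees

round(rbind(OR_Q19 = odds_ratio(b_q19), OR_Q21 = odds_ratio(b_q21)), 3)
round(rbind(pct_Q19 = pct_change(b_q19), pct_Q21 = pct_change(b_q21)), 1)
```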

When gender interactions are added in Model 3, the odds of White male employees agreeing that their supervisor supports their need to balance work and other life priorities are 8% higher than those of White female employees (OR = 1.081). The interaction odds ratios describe how each group's gender gap differs from this White gap. Among Black or African American employees the gap is significantly smaller (interaction OR = 0.928, p = 0.008), so that Black men and women have roughly equal odds (1.081 × 0.928 ≈ 1.00); among Asian employees the interaction (0.937) is not significant. Among employees in the Others racial category the gap is significantly wider (interaction OR = 1.284): men have about 39% higher odds than women (1.081 × 1.284 ≈ 1.39), the largest gender gap of the four racial groups.

Lastly, in Model 4 the odds of White male employees agreeing that supervisors support their development are again 8% higher than those of White female employees (OR = 1.081). Neither the Black or African American nor the Asian interaction term is significant (ORs of 1.028 and 0.956), so the gender gap in these groups does not differ significantly from the White gap. Among employees in the Others racial category, the interaction OR of 1.257 implies that men have about 36% higher odds than women (1.081 × 1.257 ≈ 1.36), again the widest gender gap of the four racial groups.
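For a racial group g, the male-versus-female odds ratio is exp(beta_gender + beta_g:gender), whereas the interaction odds ratio alone (for example 1.284 for Others) is the ratio of that group's gender gap to the White gender gap. The sketch below applies this to the coefficients printed for Models 3 and 4 (copied by hand). It gives point estimates only; confidence intervals for these sums would also need the coefficient covariance matrix from `vcov()`, which is not reproduced here.

```r
# Coefficients copied from summary(model10_logit) (Q19) and summary(model12_logit) (Q21)
gender_main <- c(Q19 = 0.078245, Q21 = 0.077761)   # male vs. female among White employees
inter_b <- rbind(
  Q19 = c(Black = -0.074954, Asian = -0.065146, Others = 0.250279),
  Q21 = c(Black =  0.027606, Asian = -0.045309, Others = 0.228625)
)

# Male-vs-female odds ratio within each group = exp(gender main effect + interaction)
within_group_or <- exp(sweep(inter_b, 1, gender_main, "+"))
within_group_or <- cbind(White = exp(gender_main), within_group_or)

round(within_group_or, 3)
```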

To compare the magnitudes of the effects of the other independent variables on the outcomes, we also plotted the raw regression coefficients in the following graphs. Estimates above 0 imply higher log-odds (and hence a higher probability) of reporting supervisor support for work-life balance or employee development, while estimates below 0 imply lower log-odds.

In terms of goodness of fit, adding the interaction terms barely changes the explanatory power of the models: the pseudo R-squared is 0.348 for Model 1 versus 0.349 for Model 3, and 0.452 for both Model 2 and Model 4.
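Because Models 3 and 4 nest Models 1 and 2, a likelihood-ratio test is a more formal check than comparing pseudo R-squared values. With the fitted objects this would be `anova(model4_logit, model10_logit, test = "Chisq")`; the sketch below reproduces the calculation from the residual deviances printed in the summaries. Those deviances are rounded to whole numbers, so the statistics are approximate.

```r
# Residual deviances and residual df copied from the summary() output
dev <- list(
  Q19 = c(main = 277406, interaction = 277339, df_main = 495517, df_int = 495514),
  Q21 = c(main = 316893, interaction = 316843, df_main = 495517, df_int = 495514)
)

lr_test <- function(d) {
  stat <- d[["main"]] - d[["interaction"]]   # drop in deviance from adding DRNO:gender
  df   <- d[["df_main"]] - d[["df_int"]]     # number of added interaction terms
  c(LR = stat, df = df, p = pchisq(stat, df, lower.tail = FALSE))
}

t(sapply(dev, lr_test))
```

With nearly half a million observations, even small improvements in fit are statistically significant, so this test complements rather than contradicts the observation that the pseudo R-squared barely changes.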

Because our aim is to investigate how supervisory support for federal employees differs across ethnicity and gender, we keep the interaction terms in our models; AIC and BIC are also slightly lower for Models 3 and 4 than for Models 1 and 2.

To show the patterns of interaction, we visualize the significant interaction effects in the next chapter.

Chapter 6: Visualization of Logistic Regression Models

To visualize the logistic regression results above, we plotted the predicted probabilities for each ethnicity group. In the later section, the predicted probabilities within each ethnicity group are further split by gender.
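A predicted probability is the inverse logit of the linear predictor, p = 1 / (1 + exp(-eta)), evaluated at chosen values of the other covariates. The plotting code is not shown in the report; with a fitted model, `predict(model4_logit, newdata = profiles, type = "response")` returns these probabilities directly, where `profiles` would be a (hypothetical) data frame of covariate values. The sketch below does the calculation by hand from the Model 1 coefficients for one hypothetical profile (female, not Hispanic, no disability, under 40, 10 or fewer years of tenure, work_experience and work_satisfaction both equal to 4). The profile is an assumption chosen for illustration, so the values will not match the figures exactly.

```r
# Model 1 (Q19) coefficients copied from summary(model4_logit)
b <- c(intercept = -3.922664,
       black = -0.316700, asian = -0.391539, others = -0.405717,
       work_experience = 0.889674, work_satisfaction = 0.887080)

# Hypothetical reference profile: all 0/1 indicators at 0 (female, not Hispanic,
# no disability, under 40, tenure of 10 years or fewer) and both ratings at 4
baseline_eta <- b[["intercept"]] + 4 * b[["work_experience"]] + 4 * b[["work_satisfaction"]]

eta <- c(White = baseline_eta,
         `Black or African American` = baseline_eta + b[["black"]],
         Asian  = baseline_eta + b[["asian"]],
         Others = baseline_eta + b[["others"]])

# Inverse logit converts log-odds into predicted probabilities
pred <- setNames(plogis(eta), names(eta))
round(pred, 3)
```

Because every racial coefficient is negative, White employees have the highest predicted probability for any fixed profile; how large the gap looks in probability terms depends on where the profile sits on the logistic curve.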

6A. Ethnicity categorical predictors only

The figures above in sub-section 6A consistently show that White employees have higher predicted probabilities than the other racial groups of receiving supervisor support for both work-life balance and employee development. Differential treatment by race therefore appears to exist in federal workplaces, with employees in the Others racial category at a relatively greater disadvantage.

6B. Ethnicity X Gender

In the second set of logistic regression models, we include gender interaction effects to see how the relationship between ethnicity and supervisor support for work-life balance and employee development changes by gender.

The figure above shows that White male employees have a higher predicted probability than their White female colleagues of receiving supervisor support for work-life balance. The same holds for employees in the Others racial category, with a wider gap in predicted probabilities.

Across all racial groups, White male employees are the most likely to receive support from their supervisors to balance work and other life priorities, and females in the Others racial category are the least likely.

Gender effects are not significant within Black or African American and Asian employees.

We observe a similar pattern when the outcome variable is supervisor support for employee development (Q21_binary). The exception is that male Black or African American employees have a significantly higher predicted probability than their female colleagues of receiving support for their development.

In summary, White male employees have consistently higher predicted probabilities of receiving supervisor support for work-life balance and employee development than males in the other 3 racial groups. No gender difference is observed among Asian employees, whose confidence intervals for males and females overlap. Among non-White employees, females in the Others racial category have notably the lowest predicted probability.

Chapter 7: Proportional Odds Logistic Regression Model (POLR)

Given that the response variables of interest are rated on a 5-point Likert scale, we extend the binary logistic regression models of Chapter 5 by modelling the relationship with ordinal logistic regression.

The scale of the response variables of interest (Q19 and Q21) ranges from: 5 - Strongly Agree, 4 - Agree, 3 - Neither Agree nor Disagree, 2 - Disagree, 1 - Strongly Disagree

7A. Intuition for Proportional Odds Logistic Regression

Ordinal logistic regression assumes proportional odds: each independent variable has the same effect (the same coefficient) at every cut-point of the outcome variable. This assumption can be checked with the Brant test, implemented in the brant R package. The model also assumes the absence of severe multicollinearity, i.e. the independent variables are not highly correlated with one another, which we assess with the Variance Inflation Factor (VIF). These assumptions are addressed in Chapter 11 and the Appendix.
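For intuition about what the VIF measures, the R sketch below computes it by hand: VIF_j = 1 / (1 − R²_j), where R²_j comes from regressing predictor j on all the other predictors. The predictors are simulated (work_satisfaction is made correlated with work_experience on purpose) and vif_manual() is a hypothetical helper; in practice car::vif() does this and, for multi-level factors such as DRNO, reports generalized VIFs, which this numeric-only sketch does not handle.

```r
# VIF_j = 1 / (1 - R^2_j), with R^2_j from regressing predictor j on the others
vif_manual <- function(X) {
  X <- as.data.frame(X)
  sapply(names(X), function(v) {
    others <- setdiff(names(X), v)
    f <- reformulate(others, response = v)
    r2 <- summary(lm(f, data = X))$r.squared
    1 / (1 - r2)
  })
}

set.seed(3)
n <- 1000
work_experience   <- rnorm(n)
work_satisfaction <- 0.6 * work_experience + rnorm(n, sd = 0.8)  # correlated by construction
above40           <- rbinom(n, 1, 0.5)                           # unrelated binary predictor

vifs <- vif_manual(data.frame(work_experience, work_satisfaction, above40))
round(vifs, 2)
```

A VIF of 1 means a predictor is uncorrelated with the others; values well above common rules of thumb (often 5 or 10) would signal problematic multicollinearity.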

Because fitting the models on the full dataset is computationally demanding, we fit them on a 20% random sample of the data.

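set.seed(110721)  # fix the random seed so the 20% sample is reproducible

# Draw a 20% simple random sample of rows (without replacement, the default).
# sample_frac() is superseded by slice_sample(prop = 0.20) in newer dplyr,
# but switching functions may not reproduce the same rows for this seed.
final_cleaned_data_sample <- final_cleaned_data %>% 
  sample_frac(.20)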

7B. Model Specification

As with the logistic regression models in Chapter 5, for each model we first show the model equation for proportional odds logistic regression, followed by its regression output.

Q19 and Q21 are the outcome variables for Model 5 and Model 6 respectively, while all other predictor variables are the same for both models.

The model equation for Model 5 and Model 6 is as follows, where Y is the 1 to 5 rating, j = 1, 2, 3, 4 indexes the cut-points (reported as 1|2, 2|3, 3|4 and 4|5 in the output), and the same coefficients apply at every cut-point:

\[ \begin{aligned} \log\left[ \frac{ P(Y \leq j) }{ 1 - P(Y \leq j) } \right] &= \alpha_{j} - \bigl[ \beta_{1}(\operatorname{DRNO}_{\operatorname{Black\ or\ African\ American}}) + \beta_{2}(\operatorname{DRNO}_{\operatorname{Asian}}) + \beta_{3}(\operatorname{DRNO}_{\operatorname{Others}})\ + \\ &\quad \beta_{4}(\operatorname{gender}) + \beta_{5}(\operatorname{ancestry}) + \beta_{6}(\operatorname{disability}) + \beta_{7}(\operatorname{above40})\ + \\ &\quad \beta_{8}(\operatorname{DFEDTEN}_{\operatorname{Between\ 11\ and\ 20}}) + \beta_{9}(\operatorname{DFEDTEN}_{\operatorname{More\ than\ 20}}) + \beta_{10}(\operatorname{work\_experience})\ + \\ &\quad \beta_{11}(\operatorname{work\_satisfaction}) \bigr], \qquad j = 1, 2, 3, 4 \end{aligned} \]

The minus sign follows the parameterization used by MASS::polr(), so a negative coefficient means lower odds of giving a higher rating, which is how the coefficients are interpreted below.
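As a worked check of how the cut-points and the linear predictor combine, the R sketch below turns the Model 5 intercepts into a probability for each rating, assuming the polr form logit P(Y ≤ j) = α_j − η. The covariate profile (work_experience = 3, work_satisfaction = 3, every other predictor at 0) is hypothetical and chosen only for illustration. The second half shows the proportional odds property: shifting η by the Asian coefficient changes the odds of a higher rating by the same factor at every cut-point.

```r
# Cut-points alpha_j for Model 5 as printed in the output (1|2, 2|3, 3|4, 4|5)
alpha <- c(2.2064, 3.1860, 4.2812, 6.7203)

# Hypothetical profile: work_experience = 3, work_satisfaction = 3,
# every other predictor at 0 (the reference categories)
eta_white <- 1.11114 * 3 + 0.76837 * 3

# Cumulative probabilities P(Y <= j) = logistic(alpha_j - eta)
cum_prob <- plogis(alpha - eta_white)
# Probability of each rating 1..5 = successive differences of the cumulative curve
cat_prob <- diff(c(0, cum_prob, 1))
names(cat_prob) <- 1:5
round(cat_prob, 3)

# Same profile but DRNO = Asian: the linear predictor shifts by -0.46271
eta_asian <- eta_white - 0.46271
odds_higher <- function(a, e) { p <- plogis(a - e); (1 - p) / p }  # odds of Y > j
or_by_cut <- odds_higher(alpha, eta_asian) / odds_higher(alpha, eta_white)
# Proportional odds: the ratio equals exp(-0.46271), about 0.63, at every cut-point
or_by_cut
```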

Model 5 - Proportional Odds Logistic Regression Model - Main Effects only (Q19)

## Call:
## polr(formula = Q19_factor ~ DRNO + gender + ancestry + disability + 
##     above40 + DFEDTEN + work_experience + work_satisfaction, 
##     data = final_cleaned_data_sample, Hess = T)
## 
## Coefficients:
##                                  Value Std. Error  t value
## DRNOBlack or African American -0.26319    0.01946 -13.5233
## DRNOAsian                     -0.46271    0.02853 -16.2157
## DRNOOthers                    -0.28775    0.02657 -10.8295
## gender1                       -0.01342    0.01381  -0.9714
## ancestry                      -0.33118    0.02308 -14.3498
## disability                    -0.02516    0.01945  -1.2936
## above40                       -0.22011    0.01822 -12.0806
## DFEDTENBetween 11 and 20      -0.07496    0.01660  -4.5154
## DFEDTENMore than 20           -0.07814    0.01915  -4.0811
## work_experience                1.11114    0.01498  74.1983
## work_satisfaction              0.76837    0.01327  57.8893
## 
## Intercepts:
##     Value    Std. Error t value 
## 1|2   2.2064   0.0406    54.3250
## 2|3   3.1860   0.0396    80.4671
## 3|4   4.2812   0.0402   106.5428
## 4|5   6.7203   0.0440   152.6055
## 
## Residual Deviance: 175465.67 
## AIC: 175495.67
Model 5 - POLR Regression Coefficients
Value Std. Error t value p value
DRNOBlack or African American -0.2631899 0.0194620 -13.5232587 0.0000000
DRNOAsian -0.4627097 0.0285347 -16.2156671 0.0000000
DRNOOthers -0.2877461 0.0265707 -10.8294660 0.0000000
gender1 -0.0134190 0.0138146 -0.9713577 0.3313702
ancestry -0.3311809 0.0230792 -14.3497575 0.0000000
disability -0.0251645 0.0194530 -1.2936020 0.1958029
above40 -0.2201116 0.0182202 -12.0806041 0.0000000
DFEDTENBetween 11 and 20 -0.0749623 0.0166013 -4.5154464 0.0000063
DFEDTENMore than 20 -0.0781408 0.0191468 -4.0811380 0.0000448
work_experience 1.1111374 0.0149752 74.1983470 0.0000000
work_satisfaction 0.7683657 0.0132730 57.8893371 0.0000000
1|2 2.2063853 0.0406146 54.3249936 0.0000000
2|3 3.1859674 0.0395934 80.4670665 0.0000000
3|4 4.2811604 0.0401826 106.5427517 0.0000000
4|5 6.7203044 0.0440371 152.6054774 0.0000000

The Model 5 coefficient table shows that, for employees who are Black/African American, Asian and of Other races, the log odds of rating supervisors’ support for work-life balance in a higher category are 0.26, 0.46 and 0.29 lower respectively than for their White colleagues, holding the other variables constant. The gender coefficient is not statistically significant (p = 0.33), so across all respondents men and women are about equally likely to give higher or lower ratings of their supervisors.

Model 5 - Odds Ratios with 95% CIs
odds_ratio 2.5 % 97.5 %
DRNOBlack or African American 0.7685959 0.7400658 0.7982285
DRNOAsian 0.6295754 0.5956470 0.6654981
DRNOOthers 0.7499520 0.7122602 0.7897167
gender1 0.9866707 0.9604328 1.0135598
ancestry 0.7180753 0.6866128 0.7510433
disability 0.9751495 0.9389850 1.0127761
above40 0.8024292 0.7742700 0.8315889
DFEDTENBetween 11 and 20 0.9277785 0.8982854 0.9582662
DFEDTENMore than 20 0.9248342 0.8916645 0.9601560
work_experience 3.0378117 2.9500166 3.1283724
work_satisfaction 2.1562395 2.1009141 2.2130137

Expressed as odds ratios in the table above, for employees who are Black/African American, Asian and of Other races, the odds of rating supervisors’ support for work-life balance in a higher category are 23%, 37% and 25% lower respectively than for White employees, holding all other variables constant. Asian employees are therefore the least likely to give a higher rating of their supervisors’ support for work-life balance. Overall, there appears to be no gender difference in the odds of rating supervisors’ support for work-life balance, but this is examined further in the models with gender interaction effects.
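The conversion from log odds to odds ratios and percentage changes is simple arithmetic; the R sketch below reproduces it from the Model 5 race coefficients copied from the table above. It also computes Wald 95% intervals, exp(β ± 1.96·SE). These differ slightly from the intervals in the odds ratio table, which appear to be profile-likelihood intervals (what confint() returns for a polr fit); that reading is an assumption.

```r
# Model 5 race coefficients (log-odds scale) and standard errors from the table above
beta <- c(Black = -0.2631899, Asian = -0.4627097, Others = -0.2877461)
se   <- c(Black =  0.0194620, Asian =  0.0285347, Others =  0.0265707)

odds_ratio <- exp(beta)                       # multiplicative change in the odds
pct_change <- round((odds_ratio - 1) * 100)   # % change relative to White employees

# Wald 95% intervals: exp(beta +/- 1.96 * SE)
z <- qnorm(0.975)
wald <- cbind(lower = exp(beta - z * se), upper = exp(beta + z * se))

odds_ratio
pct_change
round(wald, 4)
```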

Model 6 - Proportional Odds Logistic Regression Model - Main Effects only (Q21)

## Call:
## polr(formula = Q21_factor ~ DRNO + gender + ancestry + disability + 
##     above40 + DFEDTEN + work_experience + work_satisfaction, 
##     data = final_cleaned_data_sample, Hess = T)
## 
## Coefficients:
##                                  Value Std. Error  t value
## DRNOBlack or African American -0.20692    0.01922 -10.7679
## DRNOAsian                     -0.32622    0.02846 -11.4633
## DRNOOthers                    -0.24515    0.02617  -9.3685
## gender1                        0.01024    0.01352   0.7573
## ancestry                      -0.30368    0.02279 -13.3259
## disability                    -0.05601    0.01910  -2.9330
## above40                       -0.26224    0.01781 -14.7267
## DFEDTENBetween 11 and 20      -0.08712    0.01625  -5.3610
## DFEDTENMore than 20           -0.06583    0.01875  -3.5109
## work_experience                1.44967    0.01518  95.4801
## work_satisfaction              0.94038    0.01327  70.8741
## 
## Intercepts:
##     Value    Std. Error t value 
## 1|2   3.8038   0.0412    92.4104
## 2|3   5.1378   0.0413   124.4730
## 3|4   6.6291   0.0435   152.5602
## 4|5   9.2229   0.0485   190.0623
## 
## Residual Deviance: 182612.56 
## AIC: 182642.56
Model 6 - POLR Regression Coefficients
Value Std. Error t value p value
DRNOBlack or African American -0.2069208 0.0192165 -10.7678632 0.0000000
DRNOAsian -0.3262162 0.0284573 -11.4633492 0.0000000
DRNOOthers -0.2451483 0.0261673 -9.3685001 0.0000000
gender1 0.0102391 0.0135207 0.7572956 0.4488728
ancestry -0.3036769 0.0227884 -13.3259333 0.0000000
disability -0.0560092 0.0190966 -2.9329500 0.0033576
above40 -0.2622402 0.0178071 -14.7267140 0.0000000
DFEDTENBetween 11 and 20 -0.0871157 0.0162498 -5.3610303 0.0000001
DFEDTENMore than 20 -0.0658332 0.0187513 -3.5108533 0.0004467
work_experience 1.4496733 0.0151830 95.4801318 0.0000000
work_satisfaction 0.9403800 0.0132683 70.8741148 0.0000000
1|2 3.8037632 0.0411617 92.4103530 0.0000000
2|3 5.1377643 0.0412761 124.4730411 0.0000000
3|4 6.6291067 0.0434524 152.5602400 0.0000000
4|5 9.2229195 0.0485258 190.0622589 0.0000000

The Model 6 coefficient table shows that, for employees who identify as Black or African American, Asian and Others, the log odds of rating supervisors’ support for employee development in a higher category are 0.21, 0.33 and 0.25 lower respectively than for their White colleagues. The log odds do not differ significantly between male and female federal employees (p = 0.45).

Model 6 - Odds Ratios with 95% CIs
odds_ratio 2.5 % 97.5 %
DRNOBlack or African American 0.8130840 0.7830538 0.8442982
DRNOAsian 0.7216491 0.6825780 0.7630796
DRNOOthers 0.7825885 0.7435056 0.8237948
gender1 1.0102917 0.9839102 1.0373900
ancestry 0.7380993 0.7059074 0.7717703
disability 0.9455304 0.9108588 0.9816098
above40 0.7693262 0.7429272 0.7966398
DFEDTENBetween 11 and 20 0.9165711 0.8878381 0.9462294
DFEDTENMore than 20 0.9362871 0.9025010 0.9713344
work_experience 4.2617219 4.1368669 4.3905517
work_satisfaction 2.5609543 2.4953971 2.6284533

In odds-ratio terms, Model 6 shows that for employees who are Black/African American, Asian and of Other races, the odds of rating supervisors’ support for employee development in a higher category are 19%, 28% and 22% lower respectively than for White employees. Asian employees are the least likely to give a higher rating of their supervisors’ support for employee development. Overall, there appears to be no gender difference in rating supervisors’ support for employee development, but this is examined further in the models with gender interaction effects.

 

To explore gender interaction effects in the proportional odds logistic regression model, we add interaction terms between each ethnicity group and gender to the equations used in Chapter 7B.

The model equation for Model 7 and Model 8 is as follows, where Y is the 1 to 5 rating and j = 1, 2, 3, 4 indexes the cut-points:

\[ \begin{aligned} \log\left[ \frac{ P(Y \leq j) }{ 1 - P(Y \leq j) } \right] &= \alpha_{j} - \bigl[ \beta_{1}(\operatorname{DRNO}_{\operatorname{Black\ or\ African\ American}}) + \beta_{2}(\operatorname{DRNO}_{\operatorname{Asian}}) + \beta_{3}(\operatorname{DRNO}_{\operatorname{Others}})\ + \\ &\quad \beta_{4}(\operatorname{gender}) + \beta_{5}(\operatorname{ancestry}) + \beta_{6}(\operatorname{disability}) + \beta_{7}(\operatorname{above40})\ + \\ &\quad \beta_{8}(\operatorname{DFEDTEN}_{\operatorname{Between\ 11\ and\ 20}}) + \beta_{9}(\operatorname{DFEDTEN}_{\operatorname{More\ than\ 20}}) + \beta_{10}(\operatorname{work\_experience})\ + \\ &\quad \beta_{11}(\operatorname{work\_satisfaction}) + \beta_{12}(\operatorname{DRNO}_{\operatorname{Black\ or\ African\ American}} \times \operatorname{gender})\ + \\ &\quad \beta_{13}(\operatorname{DRNO}_{\operatorname{Asian}} \times \operatorname{gender}) + \beta_{14}(\operatorname{DRNO}_{\operatorname{Others}} \times \operatorname{gender}) \bigr], \qquad j = 1, 2, 3, 4 \end{aligned} \]

Model 7 - Proportional Odds Logistic Regression Model - Ethnicity x Gender (Q19)

## Call:
## polr(formula = Q19_factor ~ DRNO + gender + DRNO * gender + ancestry + 
##     disability + above40 + DFEDTEN + work_experience + work_satisfaction, 
##     data = final_cleaned_data_sample, Hess = T)
## 
## Coefficients:
##                                           Value Std. Error   t value
## DRNOBlack or African American         -0.268536    0.02552 -10.52429
## DRNOAsian                             -0.506707    0.04168 -12.15849
## DRNOOthers                            -0.407996    0.03760 -10.84968
## gender1                               -0.036148    0.01627  -2.22195
## ancestry                              -0.333107    0.02308 -14.42997
## disability                            -0.025374    0.01946  -1.30391
## above40                               -0.219406    0.01822 -12.03897
## DFEDTENBetween 11 and 20              -0.075253    0.01661  -4.53185
## DFEDTENMore than 20                   -0.078425    0.01917  -4.09156
## work_experience                        1.111627    0.01498  74.22118
## work_satisfaction                      0.768457    0.01327  57.89484
## DRNOBlack or African American:gender1  0.001118    0.03928   0.02847
## DRNOAsian:gender1                      0.079678    0.05676   1.40368
## DRNOOthers:gender1                     0.237701    0.05302   4.48341
## 
## Intercepts:
##     Value    Std. Error t value 
## 1|2   2.1947   0.0410    53.5543
## 2|3   3.1745   0.0400    79.4267
## 3|4   4.2698   0.0405   105.2970
## 4|5   6.7092   0.0444   151.2243
## 
## Residual Deviance: 175444.09 
## AIC: 175480.09
Model 7 - POLR Regression Coefficients
Value Std. Error t value p value
DRNOBlack or African American -0.2685364 0.0255159 -10.5242931 0.0000000
DRNOAsian -0.5067065 0.0416751 -12.1584898 0.0000000
DRNOOthers -0.4079963 0.0376044 -10.8496849 0.0000000
gender1 -0.0361477 0.0162684 -2.2219550 0.0262863
ancestry -0.3331074 0.0230844 -14.4299684 0.0000000
disability -0.0253741 0.0194600 -1.3039065 0.1922655
above40 -0.2194060 0.0182246 -12.0389719 0.0000000
DFEDTENBetween 11 and 20 -0.0752534 0.0166055 -4.5318488 0.0000058
DFEDTENMore than 20 -0.0784250 0.0191675 -4.0915621 0.0000428
work_experience 1.1116275 0.0149772 74.2211834 0.0000000
work_satisfaction 0.7684567 0.0132733 57.8948396 0.0000000
DRNOBlack or African American:gender1 0.0011182 0.0392778 0.0284686 0.9772884
DRNOAsian:gender1 0.0796778 0.0567634 1.4036829 0.1604133
DRNOOthers:gender1 0.2377014 0.0530180 4.4834072 0.0000073
1|2 2.1947238 0.0409812 53.5543435 0.0000000
2|3 3.1744743 0.0399673 79.4267212 0.0000000
3|4 4.2697891 0.0405499 105.2970389 0.0000000
4|5 6.7092300 0.0443661 151.2242791 0.0000000

As Model 7 shows, the Others × gender interaction is 0.24 on the log-odds scale: the gap between men and women in the log odds of rating supervisors’ support for work-life balance one category higher is 0.24 larger among Others employees than among White employees. Among White employees (the reference group), men’s log odds are 0.04 lower than women’s (p = 0.026). The Black/African American × gender and Asian × gender interactions are not significant, so the gender gaps in these groups do not differ significantly from the gap among White employees.

To evaluate whether the interaction terms are jointly significant, we run Type III likelihood-ratio tests with the Anova() function from the car package. The result shows that the DRNO × gender interaction is significant (LR χ² = 21.58, df = 3, p < 0.001).
Model 7 - Analysis of Deviance Table (Type III tests)
LR Chisq Df Pr(>Chisq)
DRNO 294.917761 3 0.0000000
gender 4.943669 1 0.0261864
ancestry 204.712100 1 0.0000000
disability 1.699112 1 0.1924041
above40 145.830362 1 0.0000000
DFEDTEN 24.501065 2 0.0000048
work_experience 5725.485214 1 0.0000000
work_satisfaction 3393.975827 1 0.0000000
DRNO:gender 21.578390 3 0.0000798
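For the highest-order term, this Type III likelihood-ratio test amounts to comparing the model with and without the interaction. The R sketch below performs that nested comparison by hand with MASS::polr() on simulated data; the data, seed, cut-points and effect sizes are invented for illustration (an Others × male effect is built in so the test has something to detect). With the real models, car::Anova(model, type = "III") is the route the report takes.

```r
library(MASS)  # polr()

set.seed(4)
n <- 3000
races <- c("White", "Black or African American", "Asian", "Others")
sim <- data.frame(
  DRNO   = factor(sample(races, n, replace = TRUE), levels = races),
  gender = factor(sample(c(0, 1), n, replace = TRUE))
)
# Latent-variable simulation: y* = eta + logistic noise, cut into 5 ordered ratings
eta <- -0.3 * (sim$DRNO != "White") +
        0.4 * (sim$DRNO == "Others" & sim$gender == "1")
ystar <- eta + rlogis(n)
sim$Q19_factor <- cut(ystar, breaks = c(-Inf, -2, -1, 0, 1.5, Inf),
                      labels = 1:5, ordered_result = TRUE)

m_main <- polr(Q19_factor ~ DRNO + gender, data = sim, Hess = TRUE)
m_int  <- polr(Q19_factor ~ DRNO * gender, data = sim, Hess = TRUE)

# LR statistic = drop in deviance from adding the DRNO:gender terms
lr_stat <- deviance(m_main) - deviance(m_int)
lr_df   <- m_int$edf - m_main$edf   # number of extra parameters (3 here)
lr_p    <- pchisq(lr_stat, df = lr_df, lower.tail = FALSE)
c(LR = lr_stat, df = lr_df, p = lr_p)
```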
Model 7 - Odds Ratios with 95% CIs
odds_ratio 2.5 % 97.5 %
DRNOBlack or African American 0.7644976 0.7272039 0.8037186
DRNOAsian 0.6024766 0.5556807 0.6537650
DRNOOthers 0.6649813 0.6181954 0.7154407
gender1 0.9644978 0.9341877 0.9957393
ancestry 0.7166932 0.6852720 0.7496071
disability 0.9749451 0.9387744 1.0125808
above40 0.8029957 0.7748100 0.8321706
DFEDTENBetween 11 and 20 0.9275084 0.8978190 0.9581641
DFEDTENMore than 20 0.9245714 0.8904689 0.9599673
work_experience 3.0393008 2.9515095 3.1299646
work_satisfaction 2.1564357 2.1011115 2.2133266
DRNOBlack or African American:gender1 1.0011188 0.9270084 1.0805397
DRNOAsian:gender1 1.0829381 0.9698571 1.2089720
DRNOOthers:gender1 1.2683304 1.1433057 1.4070905

In odds-ratio terms, the Others × gender interaction (OR = 1.27, 95% CI 1.14 to 1.41) means that the male-to-female odds ratio for rating supervisors’ support for work-life balance one category higher is about 1.27 times larger among Others employees than among White employees. Among White employees, men’s odds are about 4% lower than women’s (OR = 0.96). Combining the two, the implied male-to-female odds ratio within the Others category is roughly 0.96 × 1.27 ≈ 1.22. The gender gaps among Black/African American and Asian employees do not differ significantly from the gap among White employees.
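Because gender enters both as a main effect and through the interactions, the male-versus-female difference within a given racial group is the gender main effect plus that group’s interaction coefficient. The R sketch below does this arithmetic with the Model 7 estimates copied from the table above. It assumes, as the text implies, that gender1 denotes male and that White is the reference category. Testing each within-group difference would also require the covariance between the two estimates, e.g. from vcov() on the fitted model, which is not shown in the report.

```r
# Model 7 estimates (log-odds scale) copied from the coefficient table above.
# Assumption: gender1 = male, and White is the reference category.
b_gender <- -0.0361477
b_int <- c("White"                     = 0,
           "Black or African American" = 0.0011182,
           "Asian"                     = 0.0796778,
           "Others"                    = 0.2377014)

# Male vs female log-odds difference within each group = main effect + interaction
within_logodds <- b_gender + b_int
within_or <- exp(within_logodds)   # male-to-female odds ratio within each group
round(within_or, 3)
```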

Model 8 - Proportional Odds Logistic Regression Model - Ethnicity x Gender (Q21)

## Call:
## polr(formula = Q21_factor ~ DRNO + gender + DRNO * gender + ancestry + 
##     disability + above40 + DFEDTEN + work_experience + work_satisfaction, 
##     data = final_cleaned_data_sample, Hess = T)
## 
## Coefficients:
##                                          Value Std. Error t value
## DRNOBlack or African American         -0.25351    0.02514 -10.084
## DRNOAsian                             -0.37884    0.04155  -9.117
## DRNOOthers                            -0.35914    0.03690  -9.733
## gender1                               -0.02622    0.01585  -1.654
## ancestry                              -0.30599    0.02279 -13.424
## disability                            -0.05717    0.01910  -2.993
## above40                               -0.26210    0.01781 -14.715
## DFEDTENBetween 11 and 20              -0.08664    0.01625  -5.331
## DFEDTENMore than 20                   -0.06394    0.01877  -3.406
## work_experience                        1.44973    0.01518  95.478
## work_satisfaction                      0.94034    0.01327  70.870
## DRNOBlack or African American:gender1  0.10148    0.03886   2.611
## DRNOAsian:gender1                      0.09479    0.05665   1.673
## DRNOOthers:gender1                     0.22469    0.05221   4.304
## 
## Intercepts:
##     Value    Std. Error t value 
## 1|2   3.7823   0.0415    91.1879
## 2|3   5.1163   0.0416   123.0306
## 3|4   6.6078   0.0437   151.0664
## 4|5   9.2021   0.0488   188.6746
## 
## Residual Deviance: 182588.40 
## AIC: 182624.40
Model 8 - POLR Regression Coefficients
Value Std. Error t value p value
DRNOBlack or African American -0.2535128 0.0251391 -10.084405 0.0000000
DRNOAsian -0.3788448 0.0415526 -9.117225 0.0000000
DRNOOthers -0.3591365 0.0368975 -9.733346 0.0000000
gender1 -0.0262172 0.0158480 -1.654288 0.0980690
ancestry -0.3059855 0.0227941 -13.423886 0.0000000
disability -0.0571669 0.0191032 -2.992524 0.0027668
above40 -0.2620973 0.0178111 -14.715390 0.0000000
DFEDTENBetween 11 and 20 -0.0866450 0.0162538 -5.330745 0.0000001
DFEDTENMore than 20 -0.0639356 0.0187715 -3.405994 0.0006592
work_experience 1.4497282 0.0151839 95.478066 0.0000000
work_satisfaction 0.9403381 0.0132684 70.870309 0.0000000
DRNOBlack or African American:gender1 0.1014838 0.0388613 2.611435 0.0090163
DRNOAsian:gender1 0.0947903 0.0566522 1.673198 0.0942883
DRNOOthers:gender1 0.2246898 0.0522073 4.303803 0.0000168
1|2 3.7822530 0.0414776 91.187946 0.0000000
2|3 5.1163236 0.0415858 123.030592 0.0000000
3|4 6.6077959 0.0437410 151.066402 0.0000000
4|5 9.2020683 0.0487722 188.674604 0.0000000

As Model 8 shows, the Others × gender interaction is 0.22 on the log-odds scale: the gap between men and women in the log odds of rating supervisors’ support for employee development one category higher is 0.22 larger among Others employees than among White employees. The Black or African American × gender interaction (0.10, t = 2.61, p ≈ 0.009) is also significant, whereas the Asian × gender interaction is not (t = 1.67, p ≈ 0.09). Among White employees, men’s log odds are 0.03 lower than women’s, but this difference is not statistically significant (t = −1.65, p ≈ 0.10).

As for Model 7, we run Type III likelihood-ratio tests with the Anova() function from the car package. The DRNO × gender interaction is again significant (LR χ² = 24.16, df = 3, p < 0.001).

Model 8 - Analysis of Deviance Table (Type III tests)
LR Chisq Df Pr(>Chisq)
DRNO 220.112758 3 0.0000000
gender 2.738216 1 0.0979741
ancestry 178.128484 1 0.0000000
disability 8.936102 1 0.0027959
above40 217.784730 1 0.0000000
DFEDTEN 28.895518 2 0.0000005
work_experience 9696.510782 1 0.0000000
work_satisfaction 5128.300414 1 0.0000000
DRNO:gender 24.159746 3 0.0000231
Model 8 - Odds Ratios with 95% CIs
odds_ratio 2.5 % 97.5 %
DRNOBlack or African American 0.7760698 0.7387783 0.8152724
DRNOAsian 0.6846519 0.6312876 0.7426597
DRNOOthers 0.6982790 0.6496450 0.7506941
gender1 0.9741235 0.9443234 1.0048510
ancestry 0.7363973 0.7042754 0.7700431
disability 0.9444364 0.9097996 0.9804568
above40 0.7694362 0.7430023 0.7967458
DFEDTENBetween 11 and 20 0.9170026 0.8882420 0.9466633
DFEDTENMore than 20 0.9380654 0.9041993 0.9731952
work_experience 4.2619561 4.1371660 4.3908870
work_satisfaction 2.5608470 2.4952205 2.6284359
DRNOBlack or African American:gender1 1.1068120 1.0258429 1.1943653
DRNOAsian:gender1 1.0994282 0.9839471 1.2285032
DRNOOthers:gender1 1.2519343 1.1302048 1.3867879

In the Model 8 odds ratio table above, the Black or African American × gender and Others × gender interactions are about 1.11 and 1.25 respectively: the male-to-female odds ratio for rating supervisors’ support for employee development one category higher is about 11% and 25% larger in these groups than among White employees. Combined with the White gender effect (OR = 0.97), the implied male-to-female odds ratios are roughly 1.08 for Black or African American employees and 1.22 for Others employees. Neither the White gender effect nor the Asian × gender interaction is statistically significant, as both confidence intervals include 1.

Chapter 8: Visualization of Proportional Odds Logistic Regression

In Chapter 8, we calculate predicted probabilities and latent-scale (logit) values for various combinations of the focal predictors (ethnicity group and gender), while holding the other predictors at fixed values.

We will first plot the predicted probabilities of Model 5 and Model 6 in sub-section 8A.

In sub-section 8B, we plot the fitted values across ethnicity groups for males and females separately. These effect displays are created with the “latent” option activated, so the y axis is on the logit scale, which we interpret as a latent, or hidden, continuous scale from which the ordered rating categories are derived.
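The displays in this chapter come from the effects package. As a rough illustration of the underlying calculation (not that package’s exact procedure), the R sketch below fits MASS::polr() on simulated data and uses predict() to get, for each ethnicity × gender cell, the probability of each rating and the most likely rating, with one hypothetical covariate held at its mean. The data and effect sizes are invented.

```r
library(MASS)  # polr() and its predict() method

set.seed(5)
n <- 3000
races <- c("White", "Black or African American", "Asian", "Others")
sim <- data.frame(
  DRNO   = factor(sample(races, n, replace = TRUE), levels = races),
  gender = factor(sample(c(0, 1), n, replace = TRUE)),
  work_satisfaction = rnorm(n)   # hypothetical covariate, held fixed below
)
ystar <- -0.3 * (sim$DRNO != "White") + 0.8 * sim$work_satisfaction + rlogis(n)
sim$Q21_factor <- cut(ystar, breaks = c(-Inf, -2, -1, 0, 1.5, Inf),
                      labels = 1:5, ordered_result = TRUE)

fit <- polr(Q21_factor ~ DRNO * gender + work_satisfaction, data = sim, Hess = TRUE)

grid <- expand.grid(DRNO   = factor(races, levels = races),
                    gender = factor(c(0, 1)))
grid$work_satisfaction <- mean(sim$work_satisfaction)

# One row per DRNO x gender cell, one column per rating category 1..5
probs <- predict(fit, newdata = grid, type = "probs")
# Most likely rating in each cell
grid$predicted_rating <- predict(fit, newdata = grid, type = "class")
cbind(grid[, c("DRNO", "gender", "predicted_rating")], round(probs, 3))
```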

8A. Ethnicity categorical predictors only

Taken together, the effect displays of Model 5 and Model 6 suggest that federal employees are generally most likely to rate their supervisors’ support for work-life balance and development at 4 or higher.

White employees are very likely to give the highest rating of 5 on their supervisors’ support for work-life balance relative to other racial groups, while Black/African Americans and Asian employees are more likely to give a slightly lower rating of 4. As for supervisor’s support on employee development, all racial groups are almost equally likely to give a slightly lower rating of 4.

8B. Ethnicity x Gender

 

As seen from the effect displays of Model 7 and Model 8, the predicted rating by White female employees of their supervisors’ support for work-life balance is 5, which is higher than for females in the other 3 racial groups, whose predicted ratings range from 4 to a borderline 4-5. Females who identify as African American or Others have the lowest predicted rating, at 4.

For male employees, the predicted rating by White males of their supervisors’ support for work-life balance is also 5, while males who are Black or African American or Others fall at a borderline 4-5 with some uncertainty. Asian male employees have the lowest predicted rating, at 4.

As for employee development, the predicted rating for all federal employees is 4. White female employees are far more likely than non-White female employees to rate their supervisors favorably. A similar trend holds for male employees, although their predicted ratings are more alike, with overlapping confidence intervals. Overall, Asian employees are the least likely of all racial groups to rate their supervisors’ support for work-life balance and employee development favorably.

Chapter 9: Conclusion

Our data science project yields two main findings:

  1. The relationships between ethnicity and supervisors’ support (binary outcomes) for work-life balance and employee development differ by gender. In particular, male employees who identify as Black or African American are less likely than female employees to agree that their supervisors support their need to balance work and other life issues and their personal development. This pattern is consistent with the view that “Black males may face a different social reality (including interpersonal relationships at workplace) from their female counterparts”. Such a gender disparity is not observed among Asian federal employees. By contrast, White men are more likely than their White female colleagues to agree that their supervisors support their work-life balance and personal development, and the gap is even larger among employees in the Others racial category. One possible, untested explanation is that respondents in the Others category, the smallest group in the OPM Federal Employee Viewpoint Survey, include high performers in their fields who may be more favored by, or receive more support from, their supervisors.

  2. When the ordinal nature of the ratings is taken into account, the relationship between ethnicity and the degree of supervisor’s support for work-life balance and employee development also differs by gender.

     * Others: the estimated odds of giving a higher rating (e.g., moving from 4 to 5) on supervisor’s support for work-life balance and employee development are higher for males than for females.
     * White: the opposite relationship holds, although the gender difference in odds is very small, at 4%.
     * Black or African American: we observe no significant gender difference in rating supervisor’s support for work-life balance. However, Black male employees have about 10% higher odds of rating their supervisors favorably (e.g., moving from 4 to 5) on support for employee development. This seems to contradict our earlier observation for Black or African American employees based on the binary outcome variables, and it may require further investigation into the assumptions of our ordinal logistic regression model.
     * Asian: no gender differences are observed in rating supervisor’s support for work-life balance and employee development.
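The percentages above compare the odds of a higher rating between genders within one racial group. The Appendix lists DRNO:gender1 terms for polr_model16 and polr_model18, so assume a proportional odds model with a race-by-gender interaction. Under that assumption:

* The within-race gender odds ratio is exp(beta_gender1 + beta_race:gender1).
* For the White reference group it is exp(beta_gender1) alone.
* (OR - 1) x 100 converts an odds ratio into a percentage change in the odds.

The sketch below shows this arithmetic with MASS::polr on simulated data. The data frame `sim`, its columns `race`, `gender` and `rating`, and every number it produces are hypothetical, not our FEVS estimates.

```r
library(MASS)  # polr(): proportional odds logistic regression

set.seed(2021)
n <- 5000

# Hypothetical simulated data; White is the reference level, as in DRNO
sim <- data.frame(
  race   = factor(sample(c("White", "Black", "Asian", "Others"), n, replace = TRUE),
                  levels = c("White", "Black", "Asian", "Others")),
  gender = factor(sample(0:1, n, replace = TRUE))
)
# Latent logistic score, cut into an ordered 1-5 rating
latent <- 0.3 * (sim$gender == "1") + rlogis(n)
sim$rating <- cut(latent, breaks = c(-Inf, -2, -1, 0, 1, Inf),
                  labels = 1:5, ordered_result = TRUE)

fit <- polr(rating ~ race * gender, data = sim, Hess = TRUE)
b <- coef(fit)

# In polr, a positive coefficient raises the odds of a higher rating,
# so exp(beta) is an odds ratio for being above any given cut-point.
# Gender odds ratio (gender 1 vs 0) within each race: the reference race
# uses the main effect only; the other races add their interaction term.
or_by_race <- exp(c(
  White  = b[["gender1"]],
  Black  = b[["gender1"]] + b[["raceBlack:gender1"]],
  Asian  = b[["gender1"]] + b[["raceAsian:gender1"]],
  Others = b[["gender1"]] + b[["raceOthers:gender1"]]
))

# Percentage change in the odds, e.g. an odds ratio of 1.10 is a 10% increase
pct_change <- round((or_by_race - 1) * 100, 1)
pct_change
```

The direction of each odds ratio depends on which gender is coded 1. Applying the same steps to polr_model16 would therefore start from the coefficient names returned by coef(polr_model16).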

Chapter 10: Implications

Despite the anti-discrimination legislation and frameworks in place, our exploratory data analysis reflects a disconnect. On one side is the growing commitment to racial and gender equality; on the other, the lack of improvement in employees’ day-to-day experiences across race and gender. Non-White employees remain more likely than White employees (White males in particular) to receive fewer promotion or development opportunities and/or less of the support they need.

This is consistent with recent work by McKinsey & Company showing that women of color are losing ground to White women and men of color in corporate America. This matters especially in times of disruption that fundamentally change the way people work. Companies need to commit fully to a culture that focuses on employee well-being and on racial and gender equity, so that workers feel more engaged and valued at work. Assessing and rewarding managers when such goals are met could help narrow the discrimination gap and lead to better corporate performance and financial gains.

Chapter 11: Limitations and Future Directions

Most of our predictor variables are binary or categorical, which may capture less information than we would like. Age and years of federal work experience, for example, would be more informative if the survey allowed respondents to enter numeric values.

In addition, greater care is needed when working with observational data. Our exploratory analysis indicates that these variables are correlated with each other, not that they are causally linked. This project could be expanded by combining datasets across years to form panel data. Mixed-effects methods could then separate within-subject from between-subject changes and bring us closer to establishing causal relationships.

Lastly, we need to examine the proportional odds assumption underlying our ordinal logistic regressions, using the Brant test (see Appendix) to check whether parallel regression holds. The omnibus p-values, and the p-values for most individual terms, are below the alpha of 0.05. We will therefore have to explore regression models that relax this assumption to describe the relationship between each pair of outcome groups, which is beyond the scope of this discussion.
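One way to relax the assumption is a partial proportional odds model. In such a model, selected predictors get a separate effect at each cut-point while the rest stay proportional. The sketch below makes two substitutions:

* It uses the ordinal package, which is not among the packages loaded in this project.
* It uses that package's bundled `wine` data rather than the FEVS data.

Within the sketch, `clm()` fits the ordinal model, `nominal_test()` gives per-term likelihood-ratio checks of proportional odds in the same spirit as the Brant test, and the `nominal` argument frees the chosen term. Which FEVS terms to free, for example gender1, remains an untested modeling choice.

```r
# Sketch of a partial proportional odds model; assumes the 'ordinal'
# package is installed (it is not in our pacman::p_load() list)
library(ordinal)  # clm(), nominal_test()
data(wine, package = "ordinal")  # example data: ordered 'rating', 'temp', 'contact'

# Proportional odds model: one slope per predictor across all cut-points
fm_po <- clm(rating ~ temp + contact, data = wine)

# Per-term likelihood-ratio tests of the proportional odds assumption
nominal_test(fm_po)

# Partial proportional odds: 'contact' gets a separate effect at each
# cut-point, while 'temp' keeps a single proportional effect
fm_ppo <- clm(rating ~ temp, nominal = ~ contact, data = wine)

# Likelihood-ratio comparison of the two nested models
anova(fm_po, fm_ppo)
```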

In the future, we could further improve the model with clustering. An algorithm could then search for the optimal number of latent groupings instead of comparing across the four racial categories predefined in the survey.

Chapter 12: References

Philip N. Cohen, “The Persistence of Workplace Gender Segregation in the US,” Sociology Compass 7, no. 11 (2013): pp. 889-899, https://doi.org/10.1111/soc4.12083.

Richard V. Reeves, Sarah Nzau, and Ember Smith, “The Challenges Facing Black Men – and the Case for Action,” Brookings (Brookings Institution, November 19, 2020), https://www.brookings.edu/blog/up-front/2020/11/19/the-challenges-facing-black-men-and-the-case-for-action/.

“Women in the Workplace 2021,” McKinsey & Company (McKinsey & Company, September 27, 2021), https://www.mckinsey.com/featured-insights/diversity-and-inclusion/women-in-the-workplace.

Appendix

1. Brant Test for Ordinal Logistic Regression

library(brant)

brant(polr_model13)
## --------------------------------------------------------
## Test for                            X2  df   probability
## --------------------------------------------------------
## Omnibus                         745.28  33   0
## DRNOBlack or African American    28.65   3   0
## DRNOAsian                        13.53   3   0
## DRNOOthers                       27.60   3   0
## gender1                          93.12   3   0
## ancestry                         35.73   3   0
## disability                       34.88   3   0
## above40                          70.83   3   0
## DFEDTENBetween 11 and 20          1.08   3   0.78
## DFEDTENMore than 20               8.23   3   0.04
## work_experience                 101.20   3   0
## work_satisfaction               151.82   3   0
## --------------------------------------------------------
## 
## H0: Parallel Regression Assumption holds
brant(polr_model15)
## --------------------------------------------------------
## Test for                            X2  df   probability
## --------------------------------------------------------
## Omnibus                         579.87  33   0
## DRNOBlack or African American    26.16   3   0
## DRNOAsian                        23.98   3   0
## DRNOOthers                       19.87   3   0
## gender1                          86.30   3   0
## ancestry                         35.75   3   0
## disability                       39.41   3   0
## above40                          22.42   3   0
## DFEDTENBetween 11 and 20         11.09   3   0.01
## DFEDTENMore than 20              52.23   3   0
## work_experience                 155.39   3   0
## work_satisfaction                89.96   3   0
## --------------------------------------------------------
## 
## H0: Parallel Regression Assumption holds
brant(polr_model16)
## ----------------------------------------------------------------
## Test for                                    X2  df   probability
## ----------------------------------------------------------------
## Omnibus                                 754.74  42   0
## DRNOBlack or African American             8.58   3   0.04
## DRNOAsian                                10.13   3   0.02
## DRNOOthers                               12.29   3   0.01
## gender1                                  72.99   3   0
## ancestry                                 35.51   3   0
## disability                               34.91   3   0
## above40                                  70.99   3   0
## DFEDTENBetween 11 and 20                  1.13   3   0.77
## DFEDTENMore than 20                       8.06   3   0.04
## work_experience                         101.04   3   0
## work_satisfaction                       152.70   3   0
## DRNOBlack or African American:gender1     7.37   3   0.06
## DRNOAsian:gender1                         5.70   3   0.13
## DRNOOthers:gender1                        2.10   3   0.55
## ----------------------------------------------------------------
## 
## H0: Parallel Regression Assumption holds
brant(polr_model18)
## ----------------------------------------------------------------
## Test for                                    X2  df   probability
## ----------------------------------------------------------------
## Omnibus                                 583.64  42   0
## DRNOBlack or African American            15.80   3   0
## DRNOAsian                                16.31   3   0
## DRNOOthers                                9.65   3   0.02
## gender1                                  68.88   3   0
## ancestry                                 36.03   3   0
## disability                               39.52   3   0
## above40                                  22.28   3   0
## DFEDTENBetween 11 and 20                 11.10   3   0.01
## DFEDTENMore than 20                      52.34   3   0
## work_experience                         155.72   3   0
## work_satisfaction                        90.18   3   0
## DRNOBlack or African American:gender1     0.43   3   0.93
## DRNOAsian:gender1                         5.75   3   0.12
## DRNOOthers:gender1                        3.33   3   0.34
## ----------------------------------------------------------------
## 
## H0: Parallel Regression Assumption holds
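The probability column in these tables is rounded to two decimals, so a printed 0 means p < 0.005 rather than exactly zero. As a quick check, the upper-tail chi-square probabilities can be recomputed from the X2 and df columns with base R's pchisq(). The values below are copied from the brant(polr_model13) output.

```r
# X2 and df values copied from the brant(polr_model13) output
x2  <- c(745.28, 1.08, 8.23)  # Omnibus, DFEDTENBetween 11 and 20, DFEDTENMore than 20
dof <- c(33, 3, 3)

# Upper-tail chi-square probabilities, i.e. the 'probability' column
p <- pchisq(x2, df = dof, lower.tail = FALSE)

# The omnibus p-value is vanishingly small; the other two round to
# 0.78 and 0.04, matching the printed table
signif(p, 2)
```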

 

2. Diagnostics of multicollinearity

The variance inflation factor (VIF) provides a formal diagnostic for multicollinearity. The correlation matrix shows that work_experience and work_satisfaction are highly correlated (r = 0.817), which points to a potential multicollinearity problem that needs further investigation. We therefore run multiple linear regressions with the same predictor and outcome variables as our logistic regression models and compute the VIFs using the car package. For multi-level factors such as DRNO and DFEDTEN, car::vif() reports generalized VIFs. Since the VIFs for all variables are below 4, multicollinearity does not appear to be an issue.

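# Linear model with the same outcome (Q19_binary) and predictors as the
# corresponding logistic regression; VIFs depend only on the predictors
model1_lm <- lm(Q19_binary ~ DRNO + gender + ancestry + disability + above40 +
                  DFEDTEN + work_experience + work_satisfaction,
                data = final_cleaned_data)

# (Generalized) variance inflation factors, shown as an interactive table
car::vif(model1_lm) %>%
  DT::datatable()

# Same check for the second outcome (Q21_binary)
model2_lm <- lm(Q21_binary ~ DRNO + gender + ancestry + disability + above40 +
                  DFEDTEN + work_experience + work_satisfaction,
                data = final_cleaned_data)

car::vif(model2_lm) %>%
  DT::datatable()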
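For predictor j, VIF_j = 1 / (1 - R_j^2), where R_j^2 is the R-squared from regressing predictor j on all the other predictors. With only two predictors, R_j^2 is simply the squared pairwise correlation. The reported r = 0.817 therefore implies a VIF of about 3.0 for work_experience and work_satisfaction. Adding further predictors can only raise R_j^2, so roughly 3.0 is a floor for those two variables, assuming the correlation was computed on the same observations used in the regressions. That floor is consistent with the values below 4 reported above. The sketch below computes the figure and checks the formula against car::vif() on simulated data. The variables x1, x2, x3 and y are hypothetical, not survey items.

```r
library(car)  # vif()

# Two-predictor case: regressing one predictor on the other gives R^2 = r^2
r <- 0.817                     # reported correlation (work_experience vs work_satisfaction)
vif_two_pred <- 1 / (1 - r^2)  # about 3.01
vif_two_pred

# Check VIF_j = 1 / (1 - R_j^2) against car::vif() on hypothetical simulated data
set.seed(1)
n  <- 1000
x1 <- rnorm(n)
x2 <- 0.8 * x1 + rnorm(n, sd = 0.6)  # deliberately correlated with x1
x3 <- rnorm(n)
y  <- 0.5 * x1 + 0.3 * x2 - 0.2 * x3 + rnorm(n)
fit <- lm(y ~ x1 + x2 + x3)

# R_j^2 from regressing x1 on the remaining predictors
r2_x1 <- summary(lm(x1 ~ x2 + x3))$r.squared
vif_x1_manual <- 1 / (1 - r2_x1)

# Expected: TRUE (equal up to floating-point tolerance)
all.equal(vif_x1_manual, unname(car::vif(fit)["x1"]))
```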