Perceived Barriers on Data Scientists' Job Satisfaction

Python

Project Overview

This project was part of a course assignment. The objective was to explore the impact of perceived barriers (organizational, support, and technical) on data scientists' job satisfaction.

The dataset was obtained from the "2017 Kaggle Machine Learning & Data Science Survey" and was filtered down to 205 rows and 13 columns.

The dataset and the notebook.

Introduction

In recent years, data and "big data" have become vital resources for organizations, driving customer and economic value (Hartmann et al., 2016). Consequently, there's a high demand for individuals proficient in data technologies at both operational and strategic levels (Lee et al., 2014). This shift has led to the emergence of the data scientist role, often called "the sexiest job of the 21st century" (Davenport & Patil, 2012, 2022).

However, the lack of a clear definition of a data scientist's responsibilities leads to job market mismatches and unmet job expectations, resulting in decreased job satisfaction and higher turnover (Taris et al., 2006; Maden et al., 2016; Yu & Davis, 2019). Additionally, many businesses face "soft" barriers, such as a lack of understanding of how to use analytics effectively, further impacting job satisfaction (LaValle et al., 2011).

A researcher in HRM aims to explore these issues, focusing on the question:

"To what extent do perceived organizational, support, and technical barriers influence a data scientist’s job satisfaction?"

Descriptive Statistics

From the descriptive statistics table, several salient points emerge:
  • CompensationAmount has an extremely wide range, from 4.36 to 350,000, with a large standard deviation. This suggests a highly skewed distribution in which a few very high salaries pull the mean upward, which is common in income data.
  • On average, respondents reported a RemoteWork level of 2.54, with a low standard deviation, indicating fairly consistent remote-work practices across respondents.
  • EmployerSize varies considerably: its standard deviation (4195.24) exceeds its mean (3490.15). For a variable that cannot be negative, this points to a strongly right-skewed, non-normal distribution, likely driven by a few very large employers.
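The skewness checks above can be reproduced with a few lines of standard-library Python. This is a minimal sketch, not the notebook's code: it assumes each variable is available as a plain list of numbers, and the `employer_size` values in the usage example are hypothetical, not taken from the survey.

```python
import statistics


def describe(values):
    """Summary statistics plus simple skewness diagnostics for one numeric variable."""
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation (n - 1 denominator)

    # Adjusted Fisher-Pearson skewness (G1); positive values indicate a long right tail
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    skew = (m3 / m2 ** 1.5) * (n * (n - 1)) ** 0.5 / (n - 2)

    return {
        "n": n,
        "mean": mean,
        "median": statistics.median(values),
        "sd": sd,
        "min": min(values),
        "max": max(values),
        "cv": sd / mean,  # coefficient of variation: > 1 means the SD exceeds the mean
        "skew": skew,
    }


# Hypothetical employer sizes (NOT survey data), chosen to mimic a right-skewed variable
employer_size = [10, 50, 100, 200, 500, 1000, 5000, 10000]
stats = describe(employer_size)
print(f"mean={stats['mean']:.1f} median={stats['median']:.1f} "
      f"cv={stats['cv']:.2f} skew={stats['skew']:.2f}")
```

For a non-negative variable, a coefficient of variation above 1 combined with a mean well above the median is the same pattern described for EmployerSize and CompensationAmount.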

Correlation Analysis

Several notable correlations emerge from the heatmap:
  • There is a moderate positive correlation between Age and Compensation (r = 0.46**). This suggests that older employees tend to earn more, which could reflect career progression and accumulated experience.
  • Tenure is moderately correlated with both Age (r = 0.46**) and EmployerSize (r = 0.39**). Longer tenure is thus associated with being older and with working for a larger employer, which could reflect loyalty or limited mobility among older employees or those at larger companies.
  • There is a significant negative correlation between JobSatisfaction and BarriersSupport (r = −0.26**), which suggests that job satisfaction tends to be lower where support barriers are perceived to be higher.
  • A surprising finding is the set of negative correlations between JobFunction and EmployerSize, which could indicate that specific roles are outsourced or less prevalent in larger organizations.
  • Another notable finding is the lack of a significant correlation between Gender and JobSatisfaction, suggesting that within this dataset, gender is not straightforwardly related to how job satisfaction is perceived or reported.
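The r values and significance markers in the heatmap can be illustrated with a small standard-library sketch. It assumes the heatmap uses the same star thresholds as the regression table's note, and it swaps in the Fisher z-transform, a normal approximation, for the exact t-test on n − 2 degrees of freedom (which `scipy.stats.pearsonr` provides). The `age` and `tenure` lists are hypothetical, not survey data.

```python
import math
import statistics


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length numeric sequences."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def approx_p_value(r, n):
    """Two-sided p-value for H0: rho = 0 via the Fisher z-transform (normal approximation)."""
    z = math.atanh(r) * math.sqrt(n - 3)
    return 2 * (1 - statistics.NormalDist().cdf(abs(z)))


def stars(p):
    """Significance markers following the regression table's note (assumed for the heatmap)."""
    for threshold, mark in ((0.001, "***"), (0.01, "**"), (0.05, "*"), (0.10, "†")):
        if p < threshold:
            return mark
    return ""


# Hypothetical paired observations (NOT survey data)
age = [25, 31, 28, 45, 39, 52, 34, 41]
tenure = [1, 3, 2, 10, 6, 15, 4, 8]
r = pearson_r(age, tenure)
print(f"r = {r:.2f}{stars(approx_p_value(r, len(age)))}")
```

The approximation is adequate for a sample of around 200, but the exact t-based test is preferable for small samples such as the eight-point example here.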

Conceptual Model

Based on the research question, the researcher developed the following conceptual model:
This model summarizes a theory, developed by the researcher, that explains the relationship between perceived organizational barriers, on the one hand, and job satisfaction, on the other. Two hypotheses follow from this theory:
  • Hypothesis 1: As the perceived level of barriers increases, employees' job satisfaction decreases.
  • Hypothesis 2: Job-level features interact with perceived barriers in shaping job satisfaction.
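Written as an equation, the conceptual model corresponds to a moderated regression. The sketch below assumes the specification used in the regression table (the control variables, the three barrier types, and interactions of each barrier with Gender and CompensationAmount); the symbols are introduced here for illustration and do not appear in the original analysis.

```latex
\begin{aligned}
\text{JobSatisfaction}_i ={}& \beta_0 + \sum_{c} \gamma_c\,\text{Control}_{ci}
  + \sum_{b} \beta_b\,\text{Barrier}_{bi} \\
  &+ \sum_{b} \delta_b\,\big(\text{Gender}_i \times \text{Barrier}_{bi}\big)
  + \sum_{b} \theta_b\,\big(\text{Comp}_i \times \text{Barrier}_{bi}\big) + \varepsilon_i,
  \qquad b \in \{\text{Org},\ \text{Support},\ \text{Tech}\} \\[4pt]
\text{H1: } & \beta_b < 0 \\
\text{H2: } & \delta_b \neq 0 \ \text{or} \ \theta_b \neq 0
\end{aligned}
```

Under this reading, Model 1 keeps only the controls, Model 2 adds the \(\beta_b\) terms, and Model 3 adds the \(\delta_b\) and \(\theta_b\) terms.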

Regression Analysis

Regression analysis is used to examine the relationship between a dependent variable and one or more independent variables. It shows the extent to which changes in the independent variables are associated with changes in the dependent variable. The sign, size, and statistical significance of each regression coefficient indicate the direction and strength of that association while holding the other variables constant, which allows conclusions to be drawn about each independent variable. Here, the variables are entered hierarchically: controls first (Model 1), then the three barrier main effects (Model 2), and finally the interaction terms (Model 3).
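The hierarchical entry of predictors can be sketched with NumPy's least-squares solver. This is an illustration under stated assumptions, not the original notebook: the `scale(...)` labels in the table suggest a formula interface such as statsmodels with patsy, whereas this sketch solves OLS directly. All data below are simulated, and the variable names and effect sizes are hypothetical.

```python
import numpy as np


def ols(X, y):
    """OLS with an intercept; returns coefficients, classical standard errors, and R²."""
    X = np.column_stack([np.ones(len(y)), X])  # prepend the constant term
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    n, p = X.shape
    sigma2 = resid @ resid / (n - p)  # residual variance with n - p degrees of freedom
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
    r2 = 1 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
    return beta, se, r2


def zscore(v):
    """Standardize a predictor (mean 0, SD 1), similar in spirit to the scale(...) terms."""
    return (v - v.mean()) / v.std(ddof=1)


# Simulated data (hypothetical): one control, one barrier score, and a 0/1 moderator
rng = np.random.default_rng(42)
n = 205
age = zscore(rng.normal(35, 8, n))
gender = rng.integers(0, 2, n).astype(float)
barrier = rng.normal(3, 1, n)
satisfaction = 3.5 + 0.1 * age - 0.4 * barrier - 0.3 * gender * barrier + rng.normal(0, 1, n)

# Hierarchical entry: controls -> + main effect -> + interaction term
blocks = {
    "Model 1": np.column_stack([age, gender]),
    "Model 2": np.column_stack([age, gender, barrier]),
    "Model 3": np.column_stack([age, gender, barrier, gender * barrier]),
}
results = {name: ols(X, satisfaction) for name, X in blocks.items()}
for name, (beta, se, r2) in results.items():
    print(name, "R² =", round(r2, 3), "last coef =", round(beta[-1], 2), f"(SE {se[-1]:.2f})")
```

Because each model nests the previous one, R² can only stay the same or rise from step to step; whether a rise is meaningful is a separate question.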
## OLS Regression Results

| #    | Variable                                                | Model 1             | Model 2             | Model 3             |
|------|---------------------------------------------------------|---------------------|---------------------|---------------------|
|      | **Step 1: controls**                                    |                     |                     |                     |
|      | Constant                                                | 3.34*** (0.31)      | 4.21*** (0.55)      | 2.89† (1.62)        |
| 1    | scale(Age)                                              | 0.04 (0.08)         | 0.04 (0.08)         | 0.04 (0.08)         |
| 2    | Gender                                                  | -0.26 (0.22)        | -0.29 (0.22)        | 1.01 (1.67)         |
| 3    | scale(EmployerSize)                                     | -0.06 (0.07)        | -0.04 (0.07)        | -0.06 (0.07)        |
| 4    | EmploymentDummy                                         | -0.10 (0.34)        | -0.25 (0.33)        | -0.26 (0.34)        |
| 5    | JobFunctionDummy01                                      | -0.52* (0.23)       | -0.47* (0.22)       | -0.45* (0.22)       |
| 6    | JobFunctionDummy02                                      | -0.07 (0.15)        | -0.08 (0.15)        | -0.05 (0.15)        |
| 7    | scale(CompensationAmount)                               | 0.02 (0.08)         | 0.00 (0.08)         | -0.59 (0.50)        |
| 8    | RemoteWork                                              | 0.10 (0.09)         | 0.05 (0.08)         | 0.06 (0.09)         |
| 9    | scale(Tenure)                                           | 0.05 (0.08)         | 0.04 (0.08)         | 0.04 (0.08)         |
|      | **Step 2: main effects**                                |                     |                     |                     |
| 10   | BarriersOrganizational                                  |                     | -0.01 (0.11)        | -0.29 (0.38)        |
| 11   | BarriersSupport                                         |                     | -0.41*** (0.11)     | 0.28 (0.40)         |
| 12   | BarriersTechnical                                       |                     | 0.17 (0.11)         | 0.19 (0.46)         |
|      | **Step 3: interaction effects**                         |                     |                     |                     |
| 13   | 7 x 10 (CompensationAmount x BarriersOrganizational)    |                     |                     | 0.07 (0.14)         |
| 14   | 7 x 11 (CompensationAmount x BarriersSupport)           |                     |                     | 0.04 (0.11)         |
| 15   | 7 x 12 (CompensationAmount x BarriersTechnical)         |                     |                     | 0.10 (0.14)         |
| 16   | 2 x 10 (Gender x BarriersOrganizational)                |                     |                     | 0.31 (0.40)         |
| 17   | 2 x 11 (Gender x BarriersSupport)                       |                     |                     | -0.73† (0.42)       |
| 18   | 2 x 12 (Gender x BarriersTechnical)                     |                     |                     | -0.03 (0.47)        |
|      | R-squared                                               | 0.06                | 0.13                | 0.15                |
|      | F-statistic                                             | 1.274               | 2.312**             | 1.88*               |

**Note:** 
- \*\*\* p < .001; \*\* p < .01; \* p < .05; † p < .10
- Coefficients are reported with standard errors in parentheses; predictors wrapped in scale() were standardized before estimation.

Based on these results, the two hypotheses can be evaluated as follows:
  • First, the regression analysis uses n = 205 observations, a moderately sized sample. This allows some confidence in the generalizability of the results, but the sample is small enough that statistical power is a concern, particularly for detecting interaction effects, given that Model 3 estimates 18 predictors plus a constant.
  • Second, model fit can be assessed with R-squared (R²) and the F-statistic. R² increases from Model 1 (0.06) to Model 2 (0.13) to Model 3 (0.15), so each step explains somewhat more of the variance in job satisfaction. The gains are unequal, however: adding the main effects raises R² by 0.07, whereas adding the interaction terms raises it by only 0.02, which suggests that the interactions add little explanatory power beyond the main effects already captured in Model 2. The overall F-statistics show that Model 2 (F = 2.312, p < .01) and Model 3 (F = 1.88, p < .05) explain significantly more variance than an intercept-only model, while Model 1 (F = 1.274) does not. Because the overall F does not test the change between nested models, whether each step is a significant improvement would require an incremental F-test on the R² change.
  • Lastly, the results allow some conclusions about the hypotheses. In the full model, Model 3, Hypothesis 1 is not supported. Although BarriersOrganizational has a negative coefficient, it is not statistically significant (p > .05), so the model provides no evidence of a direct relationship between organizational barriers and job satisfaction. Similarly, the coefficients for BarriersSupport and BarriersTechnical are non-significant once the interaction terms are included. Note that in a model with interactions, each barrier's main-effect coefficient is its slope when the moderators it interacts with equal zero, so it is not directly comparable to the corresponding coefficient in Model 2. Hypothesis 2 is not supported by Model 3 either. None of the interaction terms (compensation and perceived barriers, denoted '7 x 10', '7 x 11', and '7 x 12'; gender and perceived barriers, denoted '2 x 10', '2 x 11', and '2 x 12') is significant at the conventional 0.05 level. This model therefore offers no evidence that the relationship between perceived barriers and job satisfaction is moderated by the job-level features (Gender, CompensationAmount) included in these interaction terms. The one exception is the interaction between Gender and BarriersSupport (2 x 11), which shows a marginally significant negative effect (p < .10). This hints at a possible moderating effect of gender on how support barriers relate to job satisfaction, but the evidence is too weak to confirm the hypothesis.
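Two of the points above can be checked with simple arithmetic on the reported values. The sketch below, written for illustration in plain Python, computes (a) overall and incremental F-statistics from R², assuming n = 205 and 9, 12, and 18 predictors in Models 1 to 3 as counted from the table, and (b) the conditional slope of BarriersSupport implied by the Model 3 estimates. Because R² is rounded to two decimals, the F-values are only approximate. The slope calculation assumes that Gender is coded 0/1 and that the '7 x 11' term uses standardized compensation; neither coding is documented.

```python
def overall_f(r2, n, k):
    """F-statistic for H0: all k slope coefficients are zero."""
    return (r2 / k) / ((1 - r2) / (n - k - 1))


def incremental_f(r2_reduced, r2_full, n, k_reduced, k_full):
    """F-statistic for the R² change when k_full - k_reduced predictors are added."""
    q = k_full - k_reduced
    return ((r2_full - r2_reduced) / q) / ((1 - r2_full) / (n - k_full - 1))


def conditional_slope(b_main, interactions, moderators):
    """Slope of a focal predictor in a model with two-way interactions.

    b_main       -- focal coefficient (its slope when every moderator equals 0)
    interactions -- {moderator: coefficient of focal x moderator}
    moderators   -- {moderator: value at which the slope is evaluated}
    """
    return b_main + sum(b * moderators[m] for m, b in interactions.items())


N = 205
R2 = {"Model 1": 0.06, "Model 2": 0.13, "Model 3": 0.15}  # rounded values from the table
K = {"Model 1": 9, "Model 2": 12, "Model 3": 18}          # predictors, excluding the constant

for name in R2:
    print(f"{name}: overall F ≈ {overall_f(R2[name], N, K[name]):.2f}")
print(f"Step 2 vs 1: ΔF ≈ {incremental_f(R2['Model 1'], R2['Model 2'], N, 9, 12):.2f} on (3, 192) df")
print(f"Step 3 vs 2: ΔF ≈ {incremental_f(R2['Model 2'], R2['Model 3'], N, 12, 18):.2f} on (6, 186) df")

# Model 3 point estimates: BarriersSupport main effect and its '2 x 11' and '7 x 11' terms
b_support = 0.28
interactions = {"Gender": -0.73, "Compensation_z": 0.04}
for gender in (0, 1):  # ASSUMPTION: Gender coded 0/1; which group is 1 is not documented
    slope = conditional_slope(b_support, interactions, {"Gender": gender, "Compensation_z": 0.0})
    print(f"Gender = {gender}, mean compensation: BarriersSupport slope ≈ {slope:.2f}")
```

With these inputs, the overall F for Model 2 comes out near 2.39, close to the reported 2.312 given the rounding. The incremental F is about 5.15 for the main effects but below 1 (about 0.73) for the interactions, and an F below 1 cannot reach significance at the .05 level. Exact p-values can be obtained with `scipy.stats.f.sf(F, dfn, dfd)`. The conditional slopes (about 0.28 for Gender = 0 and -0.45 for Gender = 1) are point estimates only. Judging their significance would require the coefficient covariance matrix, which the table does not report.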

Conclusion

In conclusion, the results from Model 3 do not conclusively support either Hypothesis 1 or Hypothesis 2. However, it is important to acknowledge that in Model 2, BarriersSupport had a statistically significant negative effect on job satisfaction (-0.41, p < .001). Considering the evidence across all models, there is therefore partial support for Hypothesis 1, namely that perceived barriers can negatively affect job satisfaction, although this support weakens when interaction terms are introduced in Model 3.

References

  • Davenport, T. H., & Patil, D. J. (2012). Data Scientist: The Sexiest Job of the 21st Century. Harvard Business Review, 90(10), 70-76.
  • Davenport, T. H., & Patil, D. J. (2022). Is Data Scientist Still the Sexiest Job of the 21st Century? Harvard Business Review. Retrieved from https://hbr.org/2022/07/is-datascientist-still-the-sexiest-job-of-the-21st-century.
  • Hartmann, P. M., Zaki, M., Feldmann, N., & Neely, A. (2016). Capturing value from big data – A taxonomy of data-driven business models used by start-up firms. International Journal of Operations & Production Management, 36, 1382–1406.
  • LaValle, S., Lesser, E., Shockley, R., Hopkins, M. S., & Kruschwitz, N. (2011). Big Data, Analytics and the Path From Insights to Value. MIT Sloan Management Review, 52(2), 20-32.
  • Lee, Y., Madnick, S., Wang, R., Wang, F., & Zhang, H. (2014). A cubic framework for the chief data officer: Succeeding in a world of big data. MIS Quarterly Executive, 13(1), 1-13.
  • Taris, T. W., Feij, J. A., & Capel, S. (2006). Great Expectations – and What Comes of it: The Effects of Unmet Expectations on Work Motivation and Outcomes Among Newcomers. International Journal of Selection and Assessment, 14(3), 256-268. doi:10.1111/j.1468-2389.2006.00350.x
  • Yu, K. Y. T., & Davis, H. M. (2019). Integrating job search behavior into the study of job seekers' employer knowledge and organisational attraction. The International Journal of Human Resource Management, 30(9), 1448-1476. doi:10.1080/09585192.2017.1288152