Case Studies

HRPA

Human Resources Predictive Analytics

Artificial Intelligence

Challenge

A multinational company with more than 10,000 employees has been experiencing twice as many exits as its competitors for several years (>10% vs. ~5%).

The company's need is twofold:

  • Understand who is most likely to leave
  • Understand why

The company's HR department is tasked with providing these answers to management and identifying feasible retention strategies, but their success has been marginal. Moreover, the outcome of each strategy can only be evaluated at the end of the year, thus making the trial-and-error-repeat procedure too time-consuming (and money, as a result).

Hence, our proposal to apply Machine Learning to the data the company already had to create a model that could simultaneously provide high-quality interpretable forecasts

Solution

Given the complexity of the problem and the noisiness of the data, we realized that a single estimator, however powerful, would not be able to provide us with state-of-the-art results. We therefore decided to tackle the challenge using a more sophisticated approach called Ensemble Learning.

Ensemble Learning involves combining multiple estimators into a “meta-estimator” with the highest predictive power. Well-known examples of meta-estimators are random forests or gradient boosting. However, Ensemble Learning itself is a definition that includes a number of techniques, among which we have chosen to implement the most powerful and yet unexplored one: Stacked Generalization.

This leads to an enormous level of complexity. In fact, building a successful stacked ensemble model requires careful choice:

  • The individual estimators;
  • The hyperparameters of each estimator;
  • The architecture used to combine them.

Stacked Generalization differs from all other Ensemble Learning techniques because it is the only one that makes use of heterogeneous estimators. While random forests and gradient boosting algorithms use decision trees and neural networks use neurons, a stacked generalization algorithm can combine dozens of different algorithms, which could in turn also be meta-estimators.

This corresponds to finding the best combination of an unknown number of selected factors within a potentially infinite amount of options.

So how does one solve such a complex problem?
The solution is called AutoML.

AutoML stands for Automatic Machine Learning and corresponds to training an Artificial Intelligence to automatically select, optimize and organize Machine Learning patterns as a human would. In fact, the AI behind AutoML chooses a model, applies it to data, and evaluates the quality of its outputs. As more and more models are tested, the AI begins to understand what works with the data and what does not. This awareness makes it capable of making increasingly precise inferences that lead to better and better models. In addition, this can be run in a distributed environment, such as cloud architecture, to perform asynchronous model selection.

Benefit

By combining Stacked Generalization with AutoML via a cloud-based solution that harnessed the power of distributed computing, we were able to create a Stacked Generalization model that produced interpretable and surprisingly good results. In fact, despite lacking any kind of exogenous information, our model was able to correctly identify 90 percent of the people who left the company with 50 percent accuracy; in particular, of the two candidates identified to leave the company by our model, one actually left the company within the year.

Although the COVID-19 pandemic had a negative impact on the quality of our model's predictions (detection rate dropped to 66% and accuracy to 33%), the company's Deputy Chief HR Officer congratulated us for performing better than his entire division.

For more information about our solution, please contact us at info@ennova-research.com.