Mar 29, 2020
Predicting Support Ticket Escalations Using Machine Learning: Feature Engineering and Selection
Customer Support, B2B support, customer escalations, machine learning, support ticket escalations
In my previous post, I drew parallels between support organizations and biological systems, focusing on the complex interactions of multiple entities across time. The metaphors we invoked have played a large role in directing our approach to the knotty problem of support ticket escalations. However, good predictive models need to be provided not just with raw data, but with well-engineered features that capture the essence of the problem at hand. These features need to be extracted, filtered, and combined in a variety of ways before being fed into models; systematically thinking about these issues leads to the design of a machine learning pipeline.
In this post, I’m going to focus on the feature extraction and feature selection sections of the machine learning pipeline.
Our models are trained on a feature set representing various aspects of interaction in support tickets between the customer and support engineers. The basic idea here is to monitor the progress of a support ticket, while placing it in context. The way we capture context is by looking at a case through multiple axes – analyzing how a case evolves through time (we call these features “dynamic”), alongside the prior history of the customer’s relationship with the software company (we call these features “pre-contextual”) – combined with text-based information that we extract from ticket and comment bodies using natural language processing.
An example of a feature that is both pre-contextual and dynamic in nature is the ratio of case age to median case resolution time. Median case resolution time is the median time to resolution across all cases filed by the customer over a given time window that were resolved successfully without escalation. If the current case has been open longer than this baseline, that may be a sign it is not moving towards resolution as quickly as the customer has come to expect. In this sense, pre-contextual features capture various aspects of baseline expectations for a given customer.
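A minimal sketch of this feature in pure Python (function and parameter names are my own for illustration, not the production implementation):

```python
from datetime import datetime, timedelta
from statistics import median

def case_age_ratio(opened_at, now, past_resolution_times):
    """Ratio of the current case's age to the customer's median
    resolution time over past, successfully resolved cases.
    A value above 1.0 suggests the case is running long
    relative to this customer's historical baseline."""
    if not past_resolution_times:
        return None  # no baseline yet for a brand-new customer
    age = (now - opened_at).total_seconds()
    baseline = median(t.total_seconds() for t in past_resolution_times)
    return age / baseline

# A 4-day-old case against a customer whose past cases took a
# median of 4 days to resolve yields a ratio of exactly 1.0:
ratio = case_age_ratio(
    opened_at=datetime(2020, 3, 1),
    now=datetime(2020, 3, 5),
    past_resolution_times=[timedelta(days=2), timedelta(days=4), timedelta(days=6)],
)
```

Returning `None` rather than a sentinel number keeps the "no history" case explicit, so a downstream imputation step can decide how to handle new customers.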
Another example of a feature axis is looking at comment streams, both inbound (from customer to support) and outbound (from support to customer). Comment streams are a rich source of analysis that yield useful features: for instance, measuring if the wait time has increased steadily over time is a good behavioral proxy for growing frustration on the customer’s end. Support cases tend to evolve in bursts, with state changes manifested in shifting priorities – capturing this information is useful and helps build better models.
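One way to quantify "wait time increasing steadily" is a least-squares slope over the sequence of response wait times. The sketch below is illustrative (the names and the choice of a linear trend are my assumptions, not necessarily what our pipeline uses):

```python
def wait_time_trend(wait_times):
    """Least-squares slope of successive response wait times (e.g. in
    hours). A positive slope means each reply is taking longer than
    the last -- a behavioral proxy for growing customer frustration."""
    n = len(wait_times)
    if n < 2:
        return 0.0  # no trend can be estimated from a single wait
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(wait_times) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, wait_times))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var

# Waits of 1, 2, 3, 4 hours between replies -> slope of 1.0 hour/reply
slope = wait_time_trend([1.0, 2.0, 3.0, 4.0])
```

The same function can be applied separately to inbound and outbound comment streams, yielding two distinct features from the same raw comment data.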
Finally, we leverage state-of-the-art natural language processing techniques to extract technical content from comment text and infer case complexity. We boost our text analytics with more complex features gleaned using deep learning techniques to capture the sentiment and emotion embedded in the comment stream. These features have proven very useful in discerning between cases with high emotional content expressed via comments and those that are low in emotion but express dissatisfaction in other behavioral ways, for example by posting ticket comments in rapid succession.
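The "rapid succession" signal mentioned above can be captured without any NLP at all, straight from comment timestamps. Here is one possible formulation (the window and threshold values are illustrative assumptions):

```python
from datetime import datetime, timedelta

def comment_burst_count(timestamps, window=timedelta(minutes=30), min_burst=3):
    """Count bursts of commenting activity: runs of at least
    `min_burst` comments where each comment follows the previous one
    within `window`. Rapid-fire commenting can signal dissatisfaction
    even when the comment text itself reads as low-emotion."""
    ts = sorted(timestamps)
    bursts, run = 0, 1
    for prev, cur in zip(ts, ts[1:]):
        if cur - prev <= window:
            run += 1
        else:
            if run >= min_burst:
                bursts += 1
            run = 1  # gap too large: start a new run
    if run >= min_burst:
        bursts += 1
    return bursts

# Three comments within 30-minute gaps form one burst; the fourth
# comment, hours later, does not extend it:
n = comment_burst_count([
    datetime(2020, 3, 1, 9, 0),
    datetime(2020, 3, 1, 9, 10),
    datetime(2020, 3, 1, 9, 20),
    datetime(2020, 3, 1, 12, 0),
])
```

A behavioral feature like this complements the sentiment features: the two together help separate "loudly upset" from "quietly escalating" customers.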
Automated feature engineering and feature selection
A typical next stage in the machine learning pipeline is to perform feature selection. In other words, the features we extract above need to be analyzed, both in isolation as well as in combinations, in order to determine their usefulness towards the prediction problem.
One important meta-design element here is thinking about feature combinations. Though some features play a big role individually in predicting the dependent variable, sometimes combinations of features are also quite useful. Towards this end, we leverage automated machine learning tools such as TPOT and Featuretools to augment our human-driven ML pipeline construction. TPOT provides a framework to determine an optimized configuration of pre-processing, feature selection, model selection and hyper-parameter optimization, while Featuretools is an automated feature engineering framework that enables the combinatorial creation of features on the basis of entity relationships. Both libraries are useful and a worthwhile addition to a data science practitioner’s toolbox.
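To make the idea of combinatorial feature creation concrete, here is a hand-rolled sketch of the kind of pairwise combinations that a tool like Featuretools automates at scale (via deep feature synthesis over entity relationships). The function and feature names are mine, chosen for illustration:

```python
from itertools import combinations

def pairwise_features(row):
    """Given a dict of base numeric features for one ticket, add
    pairwise products and ratios. Individually weak features can
    become predictive in combination -- the motivation for automated
    feature engineering frameworks."""
    out = dict(row)
    for (a, va), (b, vb) in combinations(row.items(), 2):
        out[f"{a}_x_{b}"] = va * vb
        if vb != 0:
            out[f"{a}_over_{b}"] = va / vb  # skip ratio when divisor is zero
    return out

# Two base features yield two extra combined features:
feats = pairwise_features({"age_ratio": 2.0, "wait_slope": 4.0})
```

In practice the automated tools go much further (aggregations across related entities, transform primitives, and so on), but the combinatorial explosion they manage is the same one sketched here, which is exactly why the feature selection step that follows matters.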
To sum up, the engineered features that go into the model training and scoring sections of the pipeline are sourced from a variety of conceptual axes. Together they capture a mix of behavioral patterns and past history, inferring an implicit trajectory for each individual support ticket as it meanders through a support organization’s case management system.
In my next post, I’ll talk about the modeling approach we employed in bringing the support ticket escalation prediction system to life.