Productionalizing a data science solution after weeks or months of hard work is undoubtedly the most fun and satisfying part of a project. Models do not exist for their own sake; they exist to make a positive change in the business. So, models that are not productionalized are a problem half solved – they have not realized their true value. Productionalizing models involves not only testing and implementation, but also a plan for monitoring and updating the analytics as time goes on. We’ll walk through these in a moment and see how the methods we employ will allow us to get the maximum benefit from our investment of time and effort.
First, let’s review briefly where we’ve been. In Part 1 of our series on Data Science Methods, we discussed CRISP-DM, a data project methodology that is now in common use across industries. We looked at the reasons insurers pursue data science at the first step, Project Design. In Part 2 we looked at Building a Data Set and Exploratory Data Analysis. In Part 3, we covered what is involved in Building a Solution, including setting up the data in the right way in order to validate the solution.
Now, we are ready for the launch phase. Just like NASA, data scientists need green lights across the board, only launching when they are perfectly ready and when they have addressed virtually every concern.
Test and Implement
Once an analytic model has been built and shown to perform well in the ‘lab’, it’s time to deploy it into the wild: a real live production environment. Many companies are hesitant to simply flip a switch to move their business processes from one approach to a new one. They prefer to take a more cautious approach and implement a new solution in steps or phases. Often they choose either an A/B test-and-control approach or a phased geographic deployment. In an A/B test approach, the business results of the new analytic solution are compared to those of the solution that has been used in the past. For example, 50% of the leads in a marketing campaign are randomly allocated to the new approach while the other 50% are allocated to the old approach. If the results from the new solution are superior, then it is fully implemented and the old solution is removed. In a phased geographic deployment, if results in one region of the country look promising, then the solution can be rolled out nationwide.
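As a rough illustration of the A/B approach, the sketch below randomly splits leads between the new and old approaches and then compares conversion rates with a two-proportion z-test. The function names, the fixed seed, and the choice of a z-test are assumptions made for this example; they are one common way to run such a comparison, not a prescribed method.

```python
import math
import random


def assign_arm(lead_ids, new_share=0.5, seed=42):
    """Randomly allocate each lead to the 'new' or 'old' approach.

    new_share is the fraction routed to the new approach (50% by default).
    The seed is fixed only so the illustration is reproducible.
    """
    rng = random.Random(seed)
    return {
        lead_id: ("new" if rng.random() < new_share else "old")
        for lead_id in lead_ids
    }


def two_proportion_z(conv_new, n_new, conv_old, n_old):
    """Return (z, one-sided p-value) for the hypothesis that the new
    approach converts at a higher rate than the old one."""
    p_new = conv_new / n_new
    p_old = conv_old / n_old
    pooled = (conv_new + conv_old) / (n_new + n_old)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_new + 1 / n_old))
    if se == 0:
        raise ValueError("no variation in outcomes; the test is undefined")
    z = (p_new - p_old) / se
    # Upper-tail probability of the standard normal distribution.
    p_value = 0.5 * math.erfc(z / math.sqrt(2))
    return z, p_value


# Hypothetical campaign: 120 of 1,000 leads convert under the new approach,
# 100 of 1,000 under the old one.
z, p_value = two_proportion_z(120, 1000, 100, 1000)
print(f"z = {z:.2f}, one-sided p = {p_value:.3f}")
```

In practice, the decision to retire the old solution would rest on the business metric the project was designed to move, and on a sample size agreed before the test starts.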
Depending on the computing platform, the code base of the analytic solution may be automatically dropped into existing business processes. Scores may be generated live or in batch, depending upon the need. Marketing, for instance, would be a good candidate to receive batch processed results. The data project may have been designed to pre-select good candidates for insurance who are also likely respondents. The results would return an entire prospect group within the data pool.
Live results meet a completely different set of objectives, often those that are required for real-time decisions. Giving a broker a real-time indication of our appetite to quote a particular piece of business would be a common use of real-time scoring.
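The difference between batch and live scoring can be sketched with a single scoring function wrapped two ways. Everything here is hypothetical: the coefficients, the field names (`age`, `prior_policies`), and the 0.5 threshold are invented for illustration and do not come from any real model.

```python
import math

# Hypothetical coefficients, for illustration only.
COEFFICIENTS = {"intercept": -2.0, "age": 0.03, "prior_policies": 0.4}


def score(record):
    """Return a probability-like score for one prospect (logistic form)."""
    linear = (
        COEFFICIENTS["intercept"]
        + COEFFICIENTS["age"] * record["age"]
        + COEFFICIENTS["prior_policies"] * record["prior_policies"]
    )
    return 1 / (1 + math.exp(-linear))


def score_batch(records, threshold=0.5):
    """Batch mode: score a whole prospect pool and return the ids
    of the prospects at or above the threshold."""
    return [r["id"] for r in records if score(r) >= threshold]


def score_live(record, threshold=0.5):
    """Live mode: score a single submission and return an immediate
    indication, as a broker-facing service might."""
    s = score(record)
    return {"id": record["id"], "score": s, "appetite": s >= threshold}


prospects = [
    {"id": "a", "age": 50, "prior_policies": 3},
    {"id": "b", "age": 20, "prior_policies": 0},
]
print(score_batch(prospects))     # selected prospect group for a campaign
print(score_live(prospects[1]))   # real-time indication for one submission
```

Keeping one scoring function behind both entry points is a design choice that helps the batch and live paths stay consistent with each other.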
Sometimes moving a model to production requires additional coding. This occurs, for example, when a model is built and proven in R, but the deployed version has to be implemented in C for performance or platform reasons. The code has to be translated into the new language. Checks must then be performed to confirm that the input variables and final scores match the original model and that the correct values are passed to end users.
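One way to run such checks is to score the same records with both the original and the translated model, then compare the outputs record by record. The sketch below assumes both sets of scores have already been exported as dictionaries keyed by record id; the function name and the tolerance are illustrative assumptions.

```python
import math


def find_score_mismatches(reference, ported, abs_tol=1e-6):
    """Compare scores from the original model (reference) with scores
    from the translated model (ported), both keyed by record id.

    Returns a list of (record_id, problem) tuples; an empty list means
    the two outputs agree within abs_tol.
    """
    problems = []
    for record_id in sorted(set(reference) | set(ported)):
        if record_id not in ported:
            problems.append((record_id, "missing from ported output"))
        elif record_id not in reference:
            problems.append((record_id, "missing from reference output"))
        elif not math.isclose(
            reference[record_id], ported[record_id], rel_tol=0.0, abs_tol=abs_tol
        ):
            problems.append((record_id, "score differs"))
    return problems


# Hypothetical exported scores from the two implementations.
reference_scores = {"a": 0.5, "b": 0.25, "c": 0.9}
ported_scores = {"a": 0.5000000001, "b": 0.26, "d": 0.1}
for record_id, problem in find_score_mismatches(reference_scores, ported_scores):
    print(record_id, problem)
```

A small absolute tolerance allows for harmless floating-point differences between languages while still catching a mistranslated formula or a dropped record.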
Monitor and Update
Some data projects are “one time only.” Once the data has answered the question, the business can pursue strategies that act on that answer. Others, however, are designed for long-term use and re-use. These can be very valuable over their periods of use, but special considerations must be taken into account when the plan is to reuse the analytic components of a data project. If a model’s performance starts to change over time, you want to manage that change as it happens. Monitoring and updating will help the project hold its business value, as opposed to letting its value decrease over time as variables and circumstances change. Effective monitoring is insurance for data science models.
For example, a model designed to repeatedly identify “good” candidates for a particular life product may give excellent results at the outset. As the economy changes, or demographics change, credit scoring may exclude good candidates. As health data exchanges improve, new data streams may be better indicators of overall health. Algorithms or data sets may need to be adapted. Minor tweaks may be needed, or a whole new project may prove to be the best option if business conditions have drastically changed. Monitoring the intended business results compared to results at the outset and results over time will allow insurers to identify analysis features that no longer provide the most valid results.
Monitoring is important enough that it must go beyond running periodic reports and relying on hunches that the models have not lost effectiveness. Monitoring needs its own plan. How often will reports run? What are the criteria we can use to validate that the model is still working? Which indicators will tell us that the model is beginning to fail? These criteria are identified by both the data scientists and the business users who are in touch with the business strategy. Depending upon the project and previous experience, data scientists may even know intuitively which components within the method are likely to slide out of balance. They can create criteria to monitor those areas more closely.
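One indicator that is often used for this kind of monitoring is the Population Stability Index (PSI), which measures how far the current distribution of scores or of an input variable has drifted from the distribution seen when the model was built. The sketch below is a minimal example; the 0.10 and 0.25 cut-offs are a widely quoted rule of thumb rather than anything prescribed here, and a real monitoring plan would set its own thresholds with the business users.

```python
import math


def psi(expected_counts, actual_counts, eps=1e-6):
    """Population Stability Index across matching bins.

    expected_counts: records per bin at model build time.
    actual_counts:   records per bin in the current period.
    eps floors each share so that empty bins do not cause log(0).
    """
    if len(expected_counts) != len(actual_counts):
        raise ValueError("both inputs must use the same bins")
    expected_total = sum(expected_counts)
    actual_total = sum(actual_counts)
    total = 0.0
    for expected, actual in zip(expected_counts, actual_counts):
        e_share = max(expected / expected_total, eps)
        a_share = max(actual / actual_total, eps)
        total += (a_share - e_share) * math.log(a_share / e_share)
    return total


def drift_status(psi_value, warn=0.10, alert=0.25):
    """Map a PSI value to a status; the thresholds are assumptions."""
    if psi_value >= alert:
        return "alert"
    if psi_value >= warn:
        return "warn"
    return "ok"


# Hypothetical score-band counts at build time and in the latest month.
build_time = [200, 300, 300, 200]
latest = [150, 250, 350, 250]
value = psi(build_time, latest)
print(f"PSI = {value:.3f}, status = {drift_status(value)}")
```

A check like this can run on the same schedule as the monitoring reports, so that drift in a key variable is flagged before business results visibly decline.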
Updating the model breathes new life into the original work. Depending upon what may be happening to the overall solution, the data scientist will know whether a small tweak to a formula is called for or an entirely new solution needs to be rebuilt based upon new data models. An update saves as much of the original time investment as possible without jeopardizing the results.
Though the methodology may seem complicated and there seem to be many steps, the results are what matter. Insurance data science continually fuels the business with answers of competitive and operational value. It captures accurate images of reality and allows users to make the best decisions. As data streams grow in availability and use, insurance data science will be poised to make the most of them.