Why should an insurer employ data science? How does data science differ from any other business analytics that might be happening within the organization? What will it look like to bring data science methodology into the organization?
In nearly every engagement, Majesco’s Data Science Team fields questions as foundational as these, as well as questions about the details of specific business needs. Business leaders are smart to do their due diligence by asking IF data science will be valuable to the organization and HOW valuable it might be. Though both of those questions may be best answered in a face-to-face meeting, an overview of Majesco’s Data Science Project Lifecycle can give those considering the benefits of data science a feel for how it operates. In this first of four blog posts, we will touch briefly on the history of data mining methodology and then look at what an insurer can expect when first engaging in the data science process. Throughout, we’re going to keep our eyes on the one focus of all of our efforts:
The goal of most data science is to apply the proper analysis to the right sets of data to provide answers. The proper analysis is just as important as the question that an insurer is attempting to answer. After all, if we are in pursuit of meaningful business insights, then we certainly don’t want to come to the wrong conclusions. There is nothing worse than moving your business full speed ahead in the wrong direction based upon faulty analysis. Today’s analysis benefits from a thoughtfully constructed data project methodology.
As data mining was on the rise in the 1990s, it became apparent that there were a thousand ways a data scientist might pursue answers to business questions. Some of those methods were useful and good, and some were suspect; they couldn’t truly be called methods. To keep data scientists and their clients from arriving at the wrong conclusions, a methodology needed to be introduced. A defined yet flexible process would not only assist in managing a specific project scope, but it would also work toward verifying conclusions by building in pre-tests and post-project monitoring against expected results. In 1996, the Cross-Industry Standard Process for Data Mining (CRISP-DM) was introduced, the first step in the standardization of data mining projects. Though CRISP-DM was a general data project methodology, insurance had a hand in its development. Dutch insurer OHRA was one of the four sponsoring organizations to co-launch the standardization initiative.
Introducing Majesco’s Data Science Project Lifecycle
CRISP-DM has proven itself to be a strong foundation in the world of data science. In the last 20 years, even though the number of available data streams has skyrocketed and the tools and technology of analysis have improved, the overall methodology is still solid. Majesco uses a variant of CRISP-DM, honed over many years of experience in multiple industries.
Pursuing the right questions — finding the business nugget in the data mine
Before data mining project methodologies were introduced, one issue companies had was a lack of substantial focus on obtainable goals. Projects didn’t always have a success definition that would help the business in the end. Research could be vague and methods could be transient.
Research needs focus, so the key ingredient in Majesco’s Data Science methodology is business need. The insurer has a problem that they wish to solve. They have a question that has no readily apparent answer. If an insurer hasn’t utilized data scientists in the past, this is a frequent point of entry. It is also one of the greatest differentiators between traditional in-house data analysis and project-based data science methodology. Instead of tracking trends, data science methodology is focused on finding clear answers to defined questions. Normally these issues are more difficult to solve and represent a greater business risk, making it easy to justify seeking outside assistance.
Project Design — First Meeting and First Steps
Phase 1 of the Data Science Project Lifecycle is Project Design. In the Project Design phase, Majesco is listening and learning about the business problem or problems that the client is ready to address. For example, a P&C insurer might be wondering why loyalty is lowest in the three states in which they have the highest claims: Florida, Georgia and Texas. Is this an anomaly, or is there a correlation between the two statistics? A predictive model could be built to estimate the likelihood of attrition. The model score could then be used to determine what actions to take to reward and keep a good customer, or perhaps what actions could be taken to remove frequent or high-risk claimants from the books.
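To make the "anomaly or correlation" question concrete, here is a minimal sketch of a first-pass check. It assumes state-level claim frequency and retention figures are available. The numbers, the state list beyond Florida, Georgia and Texas, and the `pearson` helper are all hypothetical and invented for illustration. They are not Majesco's method or real insurer data.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient for two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences of at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Hypothetical, made-up figures for illustration only (not real insurer data):
# (claims per 100 policies, one-year retention rate) by state.
state_stats = {
    "FL": (14.0, 0.78),
    "GA": (12.5, 0.80),
    "TX": (13.0, 0.79),
    "OH": (8.0, 0.88),
    "WA": (7.5, 0.90),
    "CO": (9.0, 0.86),
}

claims = [c for c, _ in state_stats.values()]
retention = [r for _, r in state_stats.values()]

r = pearson(claims, retention)
print(f"correlation between claim frequency and retention: {r:.2f}")
```

A strongly negative coefficient on figures like these would suggest the pattern is worth investigating, but it would not explain it. A handful of state-level points cannot separate claims experience from pricing, competition or regional effects, which is one reason a policy-level predictive model is the natural next step.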
As the insurer unpacks their background and pain points, Majesco takes notes and asks questions. Does the customer have access to all of the data that is needed for analysis? Should the project be segmented in such a way that it provides for detailed analysis at multiple levels? For example, the insurer may need to run the same type of claims analysis across personal auto, commercial vehicle, individual home and business property. These would represent segmented claims models under the same project.
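The idea of segmented models under one project can be sketched as a single shared analysis applied to each line of business separately. The record layout, the segment names and the `summarize` and `segment_analysis` helpers below are assumptions made for illustration, and the claim amounts are invented. A real project would substitute its own data and a far richer analysis.

```python
from collections import defaultdict

# Hypothetical claim records: (line_of_business, paid_amount). Invented values.
claim_records = [
    ("personal_auto", 1200.0),
    ("personal_auto", 800.0),
    ("commercial_vehicle", 5400.0),
    ("individual_home", 3000.0),
    ("individual_home", 4500.0),
    ("business_property", 9800.0),
]


def summarize(amounts):
    """The shared analysis run on every segment: claim count and average severity."""
    return {"count": len(amounts), "avg_severity": sum(amounts) / len(amounts)}


def segment_analysis(records):
    """Group records by line of business, then apply the same analysis to each group."""
    by_segment = defaultdict(list)
    for segment, amount in records:
        by_segment[segment].append(amount)
    return {segment: summarize(amounts) for segment, amounts in by_segment.items()}


results = segment_analysis(claim_records)
for segment, summary in results.items():
    print(segment, summary)
```

Keeping the analysis in one function and the segmentation in another means a new line of business can be added to the project without changing how each segment is evaluated.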
Majesco provides assumptions, definitions, possible solutions and a picture of the risks involved for the project, sorting out areas where segmented analysis may be needed. Majesco’s team also collects some information to assist in creating a cost-benefit analysis for the project.
As a part of the Project Design meetings, Majesco brings specifics to the table. The team identifies the analytic techniques that will be used and discusses the features that the analysis can utilize. At the end of the Project Design phase, everyone knows which answers they are seeking and the questions that will be used to frame those answers. They have a clear understanding of the data that is available for their use and an outline of the full project.
With the clarity to move forward, the client and Majesco move into a closer examination of the data that will be used. In Part 2, we will look at the two-step data preparation process that is essential to building an effective solution. We will also look at how the proliferation of data sources is supplying insurers with greater analytic opportunities than ever.