Powered by the maturing of Big Data technologies, the data revolution is at our doorstep. It is widely said that data is the new oil and the most valuable resource. Technology visionaries also say that AI is the new electricity, in that it will touch and transform every industry and every aspect of our lives. We are witnessing intense competition among tech giants such as Google, Amazon, Uber, and Microsoft to scoop up the talent in this space with mind-boggling offers. How, then, are the rest of us going to prepare ourselves for this imminent paradigm shift?
What to Expect
Data Science and Machine Learning will have an impact across functions in the organization. The most significant impact may be in customer-facing functions, including marketing and customer support. They can lead to better targeted campaigns, better lead generation and qualification, reduced customer churn, enhanced customer loyalty and profitability, and proactive fraud and anomaly detection.
The impact may manifest as:
Answer Questions: The richness and speed of data exploration mean that most business questions can now be answered from the data. The interactivity and the visualization capabilities also mean that each answer may spark more questions to be explored, questions that otherwise wouldn’t even have surfaced.
Anomaly Detection: Machine Learning algorithms can identify suspicious patterns of events. This may include fraudulent customer behavior, system issues, or operational issues.
Predict and Change Customer Behavior: Machine Learning driven prediction models can predict likely customer behaviors, their propensity scores, as well as their likely reaction to offers and incentives.
Recommendations: Machine Learning algorithms can also be used to recommend actions, which is a special case of predicting and changing customer behavior.
Segment Customers: Machine Learning algorithms can figure out what characteristics are relevant to cluster customers into segments and automatically segment them.
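To make the anomaly detection item above a little more concrete, here is a minimal sketch in Python. It is only an illustration under my own assumptions: the function name `find_anomalies`, the threshold of 3.5, and the sample transaction amounts are all hypothetical, and a production system would more likely use a dedicated Machine Learning model. The sketch flags values that sit far from the median, measured in units of the median absolute deviation (a "modified z-score").

```python
from statistics import median


def find_anomalies(values, threshold=3.5):
    """Return the indices of values whose modified z-score exceeds threshold.

    Hypothetical helper for illustration; the threshold is an assumed default.
    """
    med = median(values)
    # Median absolute deviation (MAD): a measure of spread that a few
    # extreme values barely move, unlike the standard deviation.
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return []  # no spread to measure against, so nothing is flagged
    # The 0.6745 factor makes the score comparable to a standard z-score
    # when the data is roughly normally distributed.
    return [
        i for i, v in enumerate(values)
        if abs(0.6745 * (v - med) / mad) > threshold
    ]


# Made-up daily transaction amounts for a single customer.
daily_amounts = [52, 48, 50, 47, 53, 49, 51, 950, 50, 46]
suspicious_days = find_anomalies(daily_amounts)
print(suspicious_days)
```

With this sample data, only the day with the 950 transaction is flagged. The median-based score is used here because a single large outlier would inflate an ordinary mean and standard deviation and could hide itself.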
Charting a Strategy
As with anything, most of us should adopt a crawl, walk, run strategy for embracing the data science revolution.
Crawl: The primary goal of the crawl phase should be to understand the data we have, where it comes from, what the gaps are, and what it is useful for. Use this phase as preparation for the next phases and to get some basic insight into the data.
Walk: With the data science effort on firmer ground, the analysis should become deeper and more exploratory in nature. The goal of this analysis should be to answer business questions as well as to hunt for further avenues to explore.
In addition to analysis, this phase should start exploring simple, easily accessible Machine Learning techniques such as forecasting, causal impact analysis, and anomaly detection.
Run: As the data science team matures, Machine Learning may be used for the full range described in the “What to Expect” section. With widely available packages and frameworks such as TensorFlow, even modest data science teams can expect to leverage the power of Machine Learning. Actual mileage may vary, but the outcome, in combination with the results from the previous phase, should be able to justify the investment.
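As an example of how simple the walk-phase techniques can be, forecasting can start with something as small as the sketch below. This is my own illustrative assumption, not a recommendation of a specific method: it uses simple exponential smoothing, and the function name `exponential_smoothing_forecast`, the smoothing factor `alpha`, and the sign-up numbers are hypothetical.

```python
def exponential_smoothing_forecast(series, alpha=0.5):
    """Return a one-step-ahead forecast using simple exponential smoothing.

    Hypothetical helper for illustration. alpha close to 1 trusts recent
    observations more; alpha close to 0 smooths more heavily.
    """
    if not series:
        raise ValueError("series must contain at least one observation")
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in the range (0, 1]")
    level = series[0]  # start from the first observation
    for observation in series[1:]:
        # Blend each new observation with the running level.
        level = alpha * observation + (1 - alpha) * level
    return level


# Made-up weekly sign-up counts.
weekly_signups = [100, 110, 120, 130]
next_week = exponential_smoothing_forecast(weekly_signups, alpha=0.5)
print(next_week)
```

Note that this basic form lags behind a steady trend, as it does with the sample data here, so a team would typically move to trend-aware or seasonal models once the simple version has proven its value.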
Where to Start
Data Collection
If data is the new oil, the first thing you have to do is secure access to it. While some data is available for purchase, the most important data for your business is your own first-party data; other data may be used to supplement it but cannot be used as a replacement.
In the new world driven by big data, the nature of the data that can be, and should be, captured has significantly changed. While technologies such as Apache Hadoop are widely available, there is significant intellectual property in how to correctly leverage these technologies. Coming from the enterprise applications world, where one of the big values that software vendors bring to the table is the complexity of the schema, I think of this as the counterpart of getting the “data schema” right for the big data world.
Assemble the Team
With a strategy for capturing data in place, the next thing CIOs need to address is putting together the data science team.
The Whiz Kids: I recommend starting with the whiz kids. No, I am not suggesting that you get into a talent war with Google or Uber. I am suggesting that you start with business people who not only know the right questions to ask, but are also inquisitive and intellectually agile, and can keep asking the next level of questions at each stage of analysis.
The Data Dorks: Next, you need the data dorks. These are the people who understand the data from a business perspective and are able to shape and reshape it, as needed, for the different analyses and prediction algorithms. They can also look at an analysis and have an intuition about what it is saying and whether it makes sense. The ideal people to evolve into this role are your business analysts.
The Mathematical Programmers: Finally, you do need some members of the team who can actually program the analysis and the models using technologies such as R and Python. It is ideal if you can hire some engineers trained in data science. If you find it difficult to compete for that resource pool, pick some of your analytics engineers, or those with SQL-type skills, and have them pick up the skills needed, such as R or Python.
The last aspect I want to address in this article is that with great power comes great responsibility. With big data, security and governance become all the more important. There are two important aspects:
Personally Identifiable Information (PII): Ideally, all PII in your data at rest is already encrypted, so using the data for further analysis is safe. Otherwise, the first step in the data science process should be to remove and/or obfuscate all personally identifiable information and other data requiring privacy.
Data Leakage: Data Science involves continuous data wrangling and saving and exporting of data at various intermediate states. Further, many of the technologies lend themselves to the temptation to work with locally saved copies of the data. Therefore, unless a governance structure is put in place from the outset, there may be an issue with data floating all over the place. I recommend that you require that all analysis be done on well-protected servers and that any data copies be stored in well-defined server locations.
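The "remove and/or obfuscate" step for PII can be sketched roughly as follows. This is a minimal illustration under my own assumptions, not a complete privacy solution: the field names in `PII_FIELDS`, the function name `pseudonymize`, and the sample record are hypothetical, and the secret key would need to live in a properly managed secret store. The sketch replaces each PII value with a keyed hash, so the same customer still maps to the same token for analysis while the raw value is no longer present.

```python
import hashlib
import hmac

# Hypothetical set of fields treated as PII; adjust to your own data.
PII_FIELDS = frozenset({"name", "email", "phone"})


def pseudonymize(record, secret_key, pii_fields=PII_FIELDS):
    """Return a copy of record with PII values replaced by keyed hash tokens.

    Hypothetical helper for illustration. The same input and key always
    produce the same token, so joins and counts on the token still work.
    """
    cleaned = {}
    for field, value in record.items():
        if field in pii_fields:
            digest = hmac.new(
                secret_key, str(value).encode("utf-8"), hashlib.sha256
            )
            cleaned[field] = digest.hexdigest()
        else:
            cleaned[field] = value  # non-PII values pass through unchanged
    return cleaned


# Placeholder key for illustration only; never hard-code a real key.
key = b"replace-with-a-managed-secret"
customer = {
    "name": "Jane Doe",
    "email": "jane@example.com",
    "plan": "premium",
    "monthly_spend": 42.5,
}
safe_customer = pseudonymize(customer, key)
print(safe_customer)
```

A keyed hash is used rather than a plain hash because common values such as email addresses are easy to guess and re-hash; without the key, the tokens cannot be reproduced that way. Fields that are not needed for analysis at all are better dropped than tokenized.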
The Big Data driven revolution is knocking on our door. With a little planning and foresight, we have the choice of embracing it and using it to our competitive advantage. The time to start planning and investing in your data science projects is now.