Wednesday, August 28, 2013

Predictive Analytics Automation


There is much discussion about whether Predictive Analytics (PA) can be automated or not. This is a false dichotomy.

Predictive Analytics is a strange beast - it needs to be ‘learned by learning’ and ‘learned by doing’, BOTH! That is due to the interconnected nature of the field. To be a successful hyper-specialist in “left nostril” diseases, one needs to have done Anatomy, Physiology and Biochemistry in med school. Similarly, for PA, learning-by-learning (which takes at least 6 years of grad school) is not a step you can skip; you cannot go directly to learning-by-doing and hope to become a true curer of business diseases!

In PA, learning-by-doing can be an even steeper curve. As I have noted before in my blogs, PA skills have to be rounded out with mathematical inventiveness and ingenuity applied repeatedly in a specific business vertical. These are the hallmarks of an uber data scientist. Clearly, an uber data scientist as described above cannot be bottled and passed around. Don’t even think of “automating” all the things that an uber data scientist does. So what do we do about “scaling”? Are there support pieces we can automate to scale the solution?

A comparison to a programming environment such as MATLAB is appropriate. MATLAB supplies you with all kinds of toolboxes. Similarly, in PA, many basic operations can be automated: clustering, learning, classification, etc. But, as with MATLAB, you also need an environment where these toolboxes can be fine-tuned with inventiveness appropriate to the business vertical, mixed and matched, and augmented with additional one-off solutions to address the overall business problem at hand. Otherwise, the solution will fall short (or flat!).
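To make the "toolboxes plus glue" idea concrete, here is a minimal sketch in Python. Everything in it is an illustrative assumption rather than a real PA product: `standardize` and `clip_outliers` stand in for stock toolbox functions, and `drop_returns` stands in for a one-off step that only makes sense in a particular (here, retail) vertical.

```python
from statistics import mean, pstdev
from typing import Callable, List

# A "step" is any function that maps a list of numbers to a list of numbers.
Step = Callable[[List[float]], List[float]]


def standardize(values: List[float]) -> List[float]:
    """Stock 'toolbox' step (hypothetical): rescale to zero mean, unit variance."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def clip_outliers(limit: float) -> Step:
    """Stock 'toolbox' step (hypothetical) with a parameter an associate can tune."""
    def step(values: List[float]) -> List[float]:
        return [max(-limit, min(limit, v)) for v in values]
    return step


def drop_returns(values: List[float]) -> List[float]:
    """One-off 'glue' step (hypothetical): in this made-up retail example,
    negative sales are product returns and are excluded."""
    return [v for v in values if v >= 0]


def run_pipeline(steps: List[Step], values: List[float]) -> List[float]:
    """Apply each step in order; the overall design is the mix of steps chosen."""
    for step in steps:
        values = step(values)
    return values


# Mix and match: custom glue first, then tuned toolbox steps.
pipeline = [drop_returns, standardize, clip_outliers(1.0)]
result = run_pipeline(pipeline, [10.0, -5.0, 20.0, 30.0])
```

The toolbox steps are generic and reusable; what cannot be automated is deciding that `drop_returns` belongs in the pipeline at all, and where.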

So, part of PA can be automated. PA toolboxes can be fine-tuned by data science associates, and the overall solution can be conceived and put together from these toolboxes (with added “glue”) by the uber data scientist.

Note that everything I have talked about here refers to PA solution development. Once the overall solution is developed, “production runs” by customer personnel and visualizations by executives can be mostly automated, with a data scientist looking over their shoulders: data can change on a dime, and someone has to watch for the integrity of the data and for non-stationarity problems! Production is where the solution needs to scale, and it can.
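As a rough illustration of the kind of ongoing audit meant here, the sketch below flags a production batch whose mean has drifted away from the data the solution was developed on. The function name `mean_shift_alert` and the default threshold of 3 standard errors are assumptions made for this example. A shift in the mean is only one simple symptom of non-stationarity, so a check like this supports the data scientist's judgment rather than replacing it.

```python
from math import sqrt
from statistics import mean, stdev
from typing import Sequence


def mean_shift_alert(
    baseline: Sequence[float],
    batch: Sequence[float],
    z_threshold: float = 3.0,
) -> bool:
    """Return True if the batch mean is more than z_threshold standard
    errors away from the baseline mean (a simple z-test style check)."""
    if len(baseline) < 2 or len(batch) == 0:
        raise ValueError("need at least 2 baseline values and 1 batch value")

    base_mean = mean(baseline)
    base_sd = stdev(baseline)  # sample standard deviation of the baseline

    if base_sd == 0:
        # Baseline never varied, so any difference in the mean is a change.
        return mean(batch) != base_mean

    standard_error = base_sd / sqrt(len(batch))
    z = abs(mean(batch) - base_mean) / standard_error
    return z > z_threshold


# Values seen during solution development (made-up numbers).
baseline = [10.0, 11.0, 9.0, 10.0, 10.0, 11.0, 9.0, 10.0]

stable = mean_shift_alert(baseline, [10.0, 9.0, 11.0, 10.0])
drifted = mean_shift_alert(baseline, [14.0, 15.0, 13.0, 14.0])
```

In an automated production run, an alert like this would be a prompt for a data scientist to look at the data, not a trigger for automatic retraining.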

In summary, PA solution development will require manual work by uber data scientists supported by data science associates. Automated toolboxes for basic PA functions will help speed up the process. Once the overall solution is manually cobbled together, production runs can be automated, along with some amount of ongoing data science audit of the process and results.



Dr. PG Madhavan developed his expertise in analytics as an EECS Professor, Computational Neuroscience researcher, Bell Labs MTS, Microsoft Architect and startup CEO. Overall, he has extensive experience of 20+ years in leadership roles at major corporations such as Microsoft, Lucent, AT&T and Rockwell as well as four startups including Zaplah Corp as Founder and CEO. He is continually engaged hands-on in the development of advanced Analytics algorithms and all aspects of innovation (12 issued US patents with deep interest in adaptive systems and social networks).
