Dr. PG Madhavan
developed his expertise in analytics as an
EECS Professor, Computational Neuroscience researcher, Bell Labs MTS, Microsoft
Architect and startup CEO. He has 20+ years of experience in leadership roles at major corporations such as Microsoft, Lucent, AT&T and Rockwell, and at startups including Zaplah Corp (Founder and CEO), Global Logic, Solavei and Aldata. He is continually engaged hands-on in the development of advanced analytics algorithms and in all aspects of innovation (12 issued US patents, with a deep interest in adaptive systems and social networks).
Big Data is big business but the “open secret”
in this business is that what the paying client really wants is Predictive
Analytics (“what should my business do next to improve?!”). To be explicit, Big Data is all about
the technology for storage and manipulation of lots of data (terabytes to
petabytes) as well as new types of data such as graph and unstructured
data. This is an extremely important precursor to doing anything useful with
data – 10 or 20 years ago, we had to settle for small amounts of representative data or simulated data. The outcomes were clever new data analysis *techniques* rather than useful answers!
The next step is to make sense of the large amounts of data at your disposal (now that you have Hadoop, NoSQL and graph databases). This is where visualization and Descriptive Analytics come in. They provide historical “snapshots” with graphs and charts – another important precursor to answering the question, “What should my business do next to improve?”
In 2013, Big Data is maturing, though many technology challenges remain open; Descriptive Analytics is in its pre-teen years. Predictive
Analytics is in its infancy, nurtured by its stronger siblings, Big
Data and Descriptive Analytics!
Predictive Analytics (PA):
In a business context, Predictive Analytics (PA) attempts to answer the question, “What should my business do next to improve?”
Prediction is an old preoccupation of mankind! What comes next – the next meal, the next rain, the next mate? Techniques have changed little from the Stone Age until recently: see what has happened before, notice the cyclical nature and any new information, and combine them to forecast what may come next.
Take tomorrow’s temperature for example:
1. Predict it as today’s temperature.
2. Predict it as the average of Tuesday’s (today’s) temperature for the past month.
3. Predict it as the average of Tuesday’s (today’s) temperature in the month of August for the past 50 years.
4. Look at the pattern of today’s, yesterday’s and the day before yesterday’s temperatures; go back in the weather record and find “triples” that match (by your criterion of a match). Collect the next day’s temperature after each matching pattern in the record and average them as your prediction of tomorrow’s temperature.
5. And so on . . .
As you can imagine (and can easily demonstrate to
yourself), predictions 1 through 5 get better and better. If your business is shipping fresh flowers, you may be able to use just this simple Predictive Analytics method to lower your cost of shipping! Simple PA, but it answers your “What next?” question. By the way, this PA technique is not as naïve as it looks; complexity-theory “quants” used extended forms of it in financial engineering on Wall Street.
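To make method 4 concrete, here is a minimal sketch in Python. The data, the triple length and the matching tolerance are all illustrative assumptions – the method above leaves the “criteria of match” up to you.

```python
# A minimal sketch of method 4: predict tomorrow's temperature by matching
# today's trailing "triple" of temperatures against the historical record.
# All names, the tolerance and the fallback rule are illustrative choices.
import numpy as np

def predict_next(history, pattern_len=3, tol=1.0):
    history = np.asarray(history, dtype=float)
    pattern = history[-pattern_len:]          # the last `pattern_len` days
    followers = []
    # Scan the record for past patterns that match within `tol` degrees.
    for i in range(len(history) - pattern_len):
        window = history[i:i + pattern_len]
        if np.max(np.abs(window - pattern)) <= tol:
            followers.append(history[i + pattern_len])  # the day that followed
    # Average the "next days" of all matches; if none, fall back to method 1.
    return float(np.mean(followers)) if followers else float(history[-1])

# Toy example: 60 days of noisy, weakly cyclical temperatures
rng = np.random.default_rng(0)
temps = 20 + 5 * np.sin(np.arange(60) / 7.0) + rng.normal(0, 0.5, 60)
print(predict_next(temps))
```

Tightening `tol` trades fewer matches for closer ones – exactly the matching criterion the method leaves open.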
So one feature of PA is clear: there is a time element – historical data and a future state. Historical data can be limited in time as in
the first method of temperature prediction or as extensive as in the fourth
method. Descriptive Analytics can be
of use here for sure – looking at the trends in the temperature data plot, one can
heuristically see where it is headed
and act accordingly (to ship flowers or not tomorrow). However, PA incorporates time as an essential
element in quantitative ways.
Quantitative Methods in PA:
My intent here is not to provide a catalog of statistical
and probabilistic methods and when and where to use them. Hire a stats or math
or physics or engineering Ph.D. and they will know them backwards and forwards.
Applying them to Big Data in business, however, requires much more inventiveness and ingenuity – let me explain.
PA has an essential time element to it. That makes
prediction possible but life difficult! There is a notion called
“non-stationarity”. While reading Taleb’s or Silver’s books, I was puzzled to find that this word is hardly mentioned (not once in Taleb’s books, if I am not mistaken). One reason may be that those books would have ended up 10 pages long instead of the actual many hundreds of pages!
Non-stationarity is a rigorous concept, but for our purposes think of it as “changing behavior”. I do not look, act and think the same as I did 30 years ago – there is an underlying similarity for sure, but equally surely, the specifics have changed. Global warming may be a steady linear change, but it will systematically affect tree-ring width data over centuries. Some other changes are cyclical – at various times, they are statistically the same, but at other times, they are different. Systems more non-stationary than this will be all over the place! Thus,
historical and future behavior and resultant data have variability that
constrains our ability to predict. Not all is lost – weather can be predicted
pretty accurately for up to a week now but not into the next many months (this
may be an unsolvable issue per complexity theory). Every system, including
business “systems”, has its window of predictability; finding the
predictability window and finding historical data that can help us predict
within that window is an art.
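One hedged way to hunt for that window: track simple statistics over a rolling window and watch them drift. A drifting mean or variance is the signature of “changing behavior”; the window length and the drifting series below are assumptions made purely for illustration.

```python
# Rolling mean and standard deviation as a crude non-stationarity check:
# if these drift over time, only a limited window of history is trustworthy.
import numpy as np

def rolling_stats(x, window=100):
    x = np.asarray(x, dtype=float)
    means = np.array([x[i:i + window].mean()
                      for i in range(len(x) - window + 1)])
    stds = np.array([x[i:i + window].std()
                     for i in range(len(x) - window + 1)])
    return means, stds

# Toy series whose mean drifts slowly upward (think tree-ring widths
# under a steady warming trend)
rng = np.random.default_rng(1)
series = 0.01 * np.arange(1000) + rng.normal(0, 1, 1000)
m, s = rolling_stats(series)
print("rolling mean drifts from", round(m[0], 2), "to", round(m[-1], 2))
```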
I do not want this blog to be a litany of problems but
there are two more that need our attention. Heteroscedasticity is the next fly in the ointment! This is also a
formal rigorous statistical concept but we will talk about it as “variability
upon variability”. Business data that we
want to study are definitely variable and if they vary in a “well-behaved” way,
we can handle them well. But if the variability itself varies, which is often the case in naturally occurring data, we are constrained in what we can hope to accomplish with that data. As with non-stationary data, we have to chunk them, transform variables, etc., to make them behave.
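A small sketch of the “transform variables” remedy, assuming the common case where the noise scales with the level of the series – a log transform then roughly equalizes the variability across levels.

```python
# Two regimes of the same multiplicative noise process: raw variability
# scales with the level, but is roughly equal after a log transform.
import numpy as np

rng = np.random.default_rng(2)
low = 10 * rng.lognormal(0, 0.3, 500)      # small-level regime
high = 1000 * rng.lognormal(0, 0.3, 500)   # large-level regime

print("raw std:", round(low.std(), 1), "vs", round(high.std(), 1))
print("log std:", round(np.log(low).std(), 3),
      "vs", round(np.log(high).std(), 3))   # now comparable
```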
The third issue is that of “noise”. Noise is not just the hiss you hear when you listen to an AM radio
station. The best definition for “noise” is the data that you do not want. Desirable data is “signal” and anything
undesirable is “noise”. In engineered systems such as a communication channel, these are clearly identifiable entities – “signal” is what you sent out at one end, and anything you pick up at the receiver in addition to the signal is “noise”. In a business case, “unwanted” data or “noise” are not so
clearly identifiable and separable. Think of “Relative Spend” data among
various brands of beer in a chain of stores; or sentiment analysis results for
those brands. Depending on the objective of the analysis, the signal we are looking for may be
“purchase propensity” of each shopper (so that we can individually target them
with discount offers). Relative Spends and Likes are not pure measures of
“purchase propensity” – I may have bought Brand X beer because my friend asked me to pick some up for him, which has nothing to do with my purchase propensity for that brand! Purchases like that will pollute my Relative Spend data. How do you separate this noise from the data? There may also be pure noise – incorrect data entry, data corruption and similar errors that affect all data uniformly.
Fortunately, there is a framework from engineering that
provides a comprehensive approach to containing these issues while providing
powerful analytics solutions.
Model-based Analytics (MBA):
Model-based and model-free methods have a long history in
many areas of Engineering, Science and Mathematics. I take an Engineering
approach below.
Data that you have can be taken “as is” or can be
considered to be generated by an underlying model. “Model” is a very
utilitarian concept; it is like a “map” of the world. You can have a street map
or a satellite map – you use them for different purposes. If you need to see
the terrain, you look for a topographic map. A map is never fully accurate in
itself – it serves a purpose. As the old saw goes, if a world map has to be
accurate, the map will have to be as large as the world – what is the point of
a map then?
Let us call “model-free” methods today’s “Data Analytics” (DA) to distinguish them from “Model-based Analytics” (MBA). DA analyzes measured data to aid business decisions and predictions. MBA attempts to model the system that generated the measured data.
Model-based methods form the basis of innumerable
quantitative techniques in Engineering. Theory and practice have shown that
data analysis approaches (similar to periodogram spectrum estimation) are
robust but not powerful, while model-based methods (similar to AR spectrum estimation) are powerful but not robust (an incorrect model order, for example, can lead to misleading results – it takes an expert practitioner).
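The contrast is easy to demonstrate. The sketch below estimates the spectrum of a noisy sinusoid both ways, with scipy’s periodogram and a Yule-Walker AR fit from statsmodels; the signal, noise level and AR order are illustrative choices, not a recipe.

```python
# Model-free (periodogram) vs model-based (AR / Yule-Walker) spectrum
# estimation on the same data. Try a wrong `order` to see the "not robust"
# caveat: a poor model order can smear or split the spectral peak.
import numpy as np
from scipy.signal import periodogram
from statsmodels.regression.linear_model import yule_walker

rng = np.random.default_rng(3)
n = 512
t = np.arange(n)
x = np.sin(2 * np.pi * 0.1 * t) + rng.normal(0, 1, n)  # tone at 0.1 cyc/sample

f, pxx = periodogram(x)          # robust but noisy, model-free estimate

order = 8                        # model order: the expert's choice
rho, sigma = yule_walker(x, order=order)
freqs = np.linspace(0, 0.5, 256)
denom = np.abs(1 - sum(rho[k] * np.exp(-2j * np.pi * freqs * (k + 1))
                       for k in range(order))) ** 2
ar_spec = sigma ** 2 / denom     # smooth, powerful, model-based estimate

print("periodogram peak at f =", round(float(f[np.argmax(pxx)]), 3))
print("AR(%d) peak at f =" % order, round(float(freqs[np.argmax(ar_spec)]), 3))
```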
MBA goes beyond data slicing/dicing and heuristics. In our previous “brands of beer in a store chain” example, the model-based approach hypothesizes that there is a system, either explicit or implicit, behind the scenes generating customer purchase behaviors and purchase propensities. From
measured data, MBA identifies the key attributes of the underlying hidden
system (to understand commerce business quantitatively) and provides ways to
regulate system outputs (to produce desirable business outcomes).
MBA does not solve but alleviates the three pain points
in Predictive Analytics quantitative methods: (1) Non-stationarity, (2)
Heteroscedasticity and (3) Noise.
MBA – Personal Commerce Example:
I do not have an MBA (the university kind) nor have I
taken a Marketing course. So here is a layman’s take on Personal Commerce. My
main objective is to show a practical example of model-based predictive analytics within the MBA framework.
Personal Commerce has 2 objectives: (1) Customer
acquisition and (2) Customer retention. There are 3 ways to accomplish these 2
tasks: (1) Marketing, (2) Merchandizing and (3) Affinity programs.
Let us take “Merchandizing” (the business of bringing the
product to the shopper). An online example is the “recommendation engine”: when you log into Amazon, their PA software brings products from multiple product categories (books, electronics, etc.) to your tablet screen. An offline example is a brick-and-mortar store that arranges beer brands on its physical shelves to entice shoppers to buy a particular brand (assume that the store has a promotion agreement with the brand and hence a business reason to do so). This type of merchandizing is called “assortment optimization”. Note
that both Assortment Optimization and Recommendation Engine are general
concepts that have many variations in their applications. The MBA approach
below applies to the 2 Personal Commerce objectives and the 3 programs to
accomplish them.
Assortment Optimization:
As a practical example of MBA, let us explore Assortment Optimization. From the various data sources available to you, construct a model of the shopper groups with their beer affinity as the dependent variable. Then construct a model of a specific store with the shopper groups as the dependent variable. Once these two models are in hand, combine them to obtain the optimal shelf assortment for beer at that particular store, so that store revenue is maximized.
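The actual models are not spelled out in this post (see below), so the following is purely a hypothetical sketch of the structure: one matrix stands in for the shopper-group model, one vector for the store model, and a matrix product combines them into a per-store brand ranking. Every number is made up.

```python
# Hypothetical illustration of combining two models for assortment
# optimization. None of this reflects the author's actual (proprietary) models.
import numpy as np

# Model 1 (made up): rows = shopper groups, columns = beer brands;
# entries are the groups' affinities for each brand.
group_brand_affinity = np.array([
    [0.8, 0.1, 0.1],   # group A
    [0.2, 0.6, 0.2],   # group B
    [0.1, 0.2, 0.7],   # group C
])

# Model 2 (made up): this store's shopper mix over the same three groups.
store_group_mix = np.array([0.5, 0.3, 0.2])

# Combine: expected brand affinity at this store, then rank for the shelf.
store_brand_score = store_group_mix @ group_brand_affinity
shelf_order = np.argsort(store_brand_score)[::-1]    # best brand first
print("brand scores:", store_brand_score)
print("shelf order (brand indices):", shelf_order)
```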
Clearly, I have not explained the details of the
construction of these models and how they can be combined to give you the
optimal product assortment. That is not the focus of this blog – it is to show
that such an approach will allow you to contain the 3 banes of PA quantitative
methods and hence get powerful results. In my actual use case, we achieved a “size of the prize” (in industry parlance, the potential peak increase in revenue) greater than with any current merchandizing Big Data method!
(1) Non-stationarity: As we discussed earlier, different systems vary over time in their own ways. If you always use the past 52 weeks of data, that may be appropriate for some products but not others. For example, in certain cases of Fashion or FMCG, non-homogeneity can be minimized by selecting shorter durations – but not so short that you do not have enough representative data!
(2) Heteroscedasticity: There is a fundamental concept here again of choosing just enough data (even if you have tons in your Big Data store!) to address the business question you have – but not too much. When you have selected just enough data, you may also escape severe heteroscedasticity. If not, variable transformations (such as the log transformation) may have to be adopted.
(3) Noise: As we noted, noise is unwanted data. Consider the previous Merchandizing case, but where you try to analyze two product categories together, say, Beer and Soap. Since the fundamental purchase-propensity driving forces are most likely different for these two product categories, one will act as noise to the other – deal with them separately. In addition, some eigen-decomposition pre-processing may allow you to separate “signal from noise” (a small sketch follows this list).
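For the eigen-decomposition idea in item (3), here is a minimal sketch using ordinary PCA via the SVD. To be clear, this is a generic pre-processing technique on simulated data, not the author’s method.

```python
# Eigen-decomposition pre-processing: keep the dominant principal
# component(s) as "signal" and discard the small trailing ones as "noise".
import numpy as np

rng = np.random.default_rng(4)
# Simulated data: 200 shoppers x 6 spend features driven by one latent factor
latent = rng.normal(0, 1, (200, 1))
signal = latent @ rng.normal(0, 1, (1, 6))     # rank-1 "purchase propensity"
data = signal + rng.normal(0, 0.3, (200, 6))   # plus measurement noise

centered = data - data.mean(axis=0)
u, s, vt = np.linalg.svd(centered, full_matrices=False)
k = 1                                          # components kept as "signal"
denoised = (u[:, :k] * s[:k]) @ vt[:k, :]

print("variance captured by top component:",
      round(float(s[0] ** 2 / np.sum(s ** 2)), 3))
```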
Many of you will find this discussion inadequate – partly because these are trade secrets and partly because there are no magic formulas. Each business problem is different and calls for ingenuity and insight particular to that data set and business question.
I have only scratched the surface of Model-based Analytics here. The sub-disciplines of Electrical Engineering such as Digital Signal Processing, Control Theory and Systems Theory are replete with frameworks and solutions, developed over the last two decades or so, for deploying model-based solutions and extending them to closed-loop systems. Thus, we go beyond predictions to actions, with the results fed back into the model to refine its predictions. The next five years will see a great increase in the efficacy of Predictive Analytics solutions with the incorporation of more model-based approaches.
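As a toy illustration of such a closed loop, the sketch below runs a predict–act–observe–update cycle with an LMS-style parameter update; the one-parameter “system” and the learning rate are invented purely for illustration.

```python
# Closed-loop sketch: predict, act, observe the outcome, feed the error
# back to refine the model. The "system" here is a single unknown gain.
import numpy as np

rng = np.random.default_rng(5)
true_effect = 2.0     # the system's actual response to our action (unknown)
w = 0.0               # the model's current estimate of that response
lr = 0.1              # learning rate of the feedback update

for step in range(300):
    action = rng.uniform(0, 1)            # e.g., size of a discount offer
    predicted = w * action                # model predicts the outcome
    observed = true_effect * action + rng.normal(0, 0.1)  # system responds
    error = observed - predicted          # prediction error
    w += lr * error * action              # LMS-style feedback into the model

print("learned effect:", round(w, 2), "(true effect:", true_effect, ")")
```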
Post Script: Other major “flies in the ointment” are non-linearity and non-normality; I am of the opinion that practical and efficient methods to battle these issues are still not available (I know that the thousands of authors of books and papers in these fields will disagree with me!). So, the approach I take is that non-linearity and non-normality issues are minor in most cases, and MBA techniques will work adequately; in the few cases where I cannot make any headway, I reckon that these issues are so extreme that I have a currently intractable problem!