Analytics Made Simple
PG Madhavan, Ph.D.
Chief Algorist,
Syzen Analytics, Inc.
Seattle, WA, USA
Brief bio: PG developed his expertise as an EECS
Professor, Computational Neuroscience researcher, Bell Labs MTS, Microsoft
Architect and leader of multiple startups. He has over 100 publications &
platform presentations to Sales, Marketing, Product, Standards and Research
groups as well as 12 issued US patents. Major Contributions:
· Computational Neuroscience of Hippocampal Place Cell phenomenon related to the subject matter of 2014 Nobel Prize in Medicine.
· Random Field Theory estimation methods, relationship to systems theory and industry applications.
· Systems Analytics bringing model-based methods into current analytics practice.
· Four startups with two as Founder.
Four hard topics in Analytics are explained in plain English in this article:
1. Machine Learning.
2. Why is Predictive Analytics important to business?
3. Prediction – the other dismal science?
4. Future of Analytics.
Machine Learning in plain English
If someone asks you, “What is ML?”, what will be your
conceptual, non-technical answer?
Mine is . . . ML
is “cluster”, “classify” and “convert”. I use these words in their English language sense and not as techniques. What do I mean by
that?
Cluster: Structure in the data is information – find the structure.
Classify: Transform structure into a Mathematical form.
Convert: Convert into insight/action.
Do this by Learning – meaning, use the ability to generalize from experience.
This captures the essence of ML for me. From my
experience, I find that –
· Convert: best done by a “paired” (Data Scientist + Domain Expert) combo.
· Classify: there is a grab bag of tools and techniques that the Data Scientist can exploit on their own. You can see my attempt at unifying this bag of tricks here – “Unifying Machine Learning to create breakthrough perspectives”.
· Cluster: I am not referring to specific clustering *algorithms* here. This step is where the Data Scientist works to sense, identify and extract structure or patterns or features in the data which are the bearers of information!
“Cluster” is the hardest part – data do not tell you where they hide their
structure. Finding patterns is an “art” where inspiration,
skill, experience, knowledge of inter-related theories, etc. play a major part.
In an algorithm I am currently developing, it turned out (after *months* of
slicing and dicing the data) that rendering the data into “phasors” (or complex
variables) revealed the hidden structure “by itself”!
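To make the phasor idea concrete, here is a minimal sketch. It is an illustrative assumption and not the algorithm referred to above: hourly readings are turned into complex numbers whose angle encodes the time of day, and averaging them exposes a daily rhythm that is hard to see in the raw list. The function name `daily_phasor` and the sample data are hypothetical.

```python
import cmath
import math

def daily_phasor(readings):
    """Average (hour, value) readings as phasors on a 24-hour clock.

    Returns (strength, peak_hour). Strength is near 0 when values are
    spread evenly over the day and approaches 1 as they concentrate
    around a single hour.
    """
    total = sum(value for _, value in readings)
    # Each reading becomes value * exp(j * angle), with the angle set by the hour.
    resultant = sum(value * cmath.exp(2j * math.pi * hour / 24)
                    for hour, value in readings)
    strength = abs(resultant) / total
    peak_hour = (cmath.phase(resultant) % (2 * math.pi)) * 24 / (2 * math.pi)
    return strength, peak_hour

# Hypothetical data: activity concentrated around 6 p.m., plus one morning reading.
evening = [(17, 5.0), (18, 9.0), (19, 5.0), (6, 1.0)]
# Hypothetical data: the same activity level at every hour of the day.
flat = [(h, 1.0) for h in range(24)]

print(daily_phasor(evening))
print(daily_phasor(flat)[0])
```

In the complex plane the evening data collapse to one long arrow pointing at 18:00, while the flat data cancel out to almost nothing. The structure shows up in the representation rather than in any clever processing afterwards.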
If you are able to get at the most descriptive and discriminatory
features
at the “Cluster” stage, the rest of the steps will (almost) fall into place
and provide the most robust solution! If not, you may still succeed, but you
will work many times harder to Classify and Convert and end up with non-optimal
answers.
It should be clear that my comments apply only to the first-time
development of an algorithm for a new business problem; once an end-to-end
algorithm is in place, of course, the Cluster-Classify-Convert steps can be automated for repeated application to similar data sets. But for first-time ML algorithm
solution development, automation cannot replace art!
Why is Predictive Analytics important to business?
A prerequisite for performing at a high level in
business is the ability to understand and manage complexity. Managing complex
systems properly requires a ton of data at the right time. BIG Data provide
us with the data we need; to put these data to work so that we can reach high
levels of complexity while still managing it, we have to anticipate
what is about to happen and react when it happens in a closed-loop manner.
Predictive Analytics will allow us to push our “system” to the edge (without
“falling over”) in a managed fashion. This is why businesses embrace Predictive
Analytics: to manage businesses at a high level of performance at the edge of complexity overload.
Prediction – the other dismal science?
An insightful person once said, “Prediction is like
driving your car forward by looking only at the rearview mirror!” If the road
is dead-straight, you are good . . . UNLESS
there is a stalled vehicle ahead in the middle of the road.
We should consider short-term and long-term prediction
separately. Long-term prediction is nearly a lost cause. In the 1980s and 1990s,
chaos and complexity theorists showed us that things can spin out of control
even when we have perfect past and present information (predicting weather
beyond 3 weeks is a major challenge, if not impossible). Even earlier,
stochastic process theory told us that “non-stationarity”, where statistics
evolve (slowly or quickly), can render longer-term predictions unreliable.
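The sensitivity that chaos theorists pointed to can be seen in a few lines. The sketch below uses the logistic map, a textbook chaotic system chosen here purely as a stand-in (it is not a weather model), and counts how many steps it takes for two almost identical starting points to drift visibly apart. The function names and default values are illustrative assumptions.

```python
def logistic_step(x, r=4.0):
    """One step of the logistic map, which behaves chaotically for r = 4."""
    return r * x * (1.0 - x)

def steps_until_divergence(x0, nudge=1e-10, threshold=0.1, max_steps=200):
    """Count steps until two nearly identical starting points differ by `threshold`."""
    a, b = x0, x0 + nudge
    for step in range(1, max_steps + 1):
        a, b = logistic_step(a), logistic_step(b)
        if abs(a - b) > threshold:
            return step
    return None  # the two trajectories stayed close for the whole run

print(steps_until_divergence(0.2))
```

The rule is fully known and the starting points agree to ten decimal places, yet the two paths separate after a few dozen steps. Better measurements only postpone the divergence; they do not remove it.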
If the underlying systems do not evolve quickly or
suddenly, there is some hope. For causal systems (in Systems Theory, this means that
no future information of any kind is available in the current state of the
system), where “the car is driven forward strictly by using the rearview
mirror”, outcomes are predictable in the sense that, as long as the “road is
straight” or “curves only gently”, we can be somewhat confident in predicting a
few steps ahead. This may be quite useful in some Data Science applications
(such as in Fintech).
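A minimal sketch of “driving by the rearview mirror”: fit a one-parameter rule to past values only and roll it forward a few steps. The first-order autoregressive model and the names `fit_ar1` and `forecast` are assumptions made for illustration; real applications would use richer models.

```python
def fit_ar1(series):
    """Least-squares estimate of a in x[t] = a * x[t-1], using past data only."""
    num = sum(prev * cur for prev, cur in zip(series, series[1:]))
    den = sum(prev * prev for prev in series[:-1])
    return num / den

def forecast(series, steps):
    """Predict the next `steps` values by repeatedly applying the fitted rule."""
    a = fit_ar1(series)
    out, x = [], series[-1]
    for _ in range(steps):
        x = a * x
        out.append(x)
    return out

# Hypothetical history: a quantity that shrinks by 10% per period.
history = [100.0 * 0.9 ** t for t in range(20)]
print(forecast(history, 3))
```

As long as the system keeps following the same gentle curve, the forecast stays on track. If the rule changes abruptly after the last observation, nothing in the history can reveal it, which is the stalled vehicle in the road.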
Another type of prediction involves not the actual path
of future events (or the “state space trajectories” in the parlance) but the
occurrence of a “black swan” or an “X-event” (for an elegant in-depth
discussion, see John Casti, “X-Events: Complexity
Overload and the Collapse of Everything”, 2013). For that matter, ANY
unwanted event can be good to know about in advance – consider unwanted
destructive vibrations (called “chatter”) in machine tools, as an example;
early warning may be possible and very useful in saving expensive work pieces
(“Instantaneous Scale of
Fluctuation Using Kalman-TFD and Applications in Machine Tool Monitoring”).
We find that sometimes the underlying system does undergo some pre-event
changes (such as approaching “complexity overload”, “state-space volume
inflation”, “increase in degrees of freedom”, etc.) which may be detectable and
trackable. However, there is NO escaping False Positives (and the associated
waste of resources preparing for an event that does not come) or False
Negatives (and being blind-sided after we were told it was not going to happen).
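The trade-off between the two kinds of error can be shown with a simple threshold alarm. This is a sketch with made-up numbers: `scores` stands for any early-warning indicator and `events` records whether the unwanted event actually followed. Both lists and the function name are hypothetical.

```python
def warning_errors(scores, events, threshold):
    """Count false positives and false negatives for a threshold alarm.

    scores: early-warning indicator per period (higher = more alarming)
    events: True where the unwanted event actually followed
    """
    false_pos = sum(1 for s, e in zip(scores, events) if s >= threshold and not e)
    false_neg = sum(1 for s, e in zip(scores, events) if s < threshold and e)
    return false_pos, false_neg

# Hypothetical indicator values and outcomes for eight periods.
scores = [0.1, 0.4, 0.35, 0.8, 0.2, 0.6, 0.9, 0.3]
events = [False, False, True, True, False, False, True, False]

for threshold in (0.3, 0.5, 0.7):
    print(threshold, warning_errors(scores, events, threshold))
```

Lowering the threshold catches every event but raises more false alarms; raising it silences the false alarms but misses an event. With these numbers no threshold drives both counts to zero, so the choice comes down to which mistake is cheaper for the business.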
At Syzen Analytics, Inc., we use an explicit systems
theory approach to Analytics. In our SYSTEMS Analytics formulation (“Future of
Analytics – a definitive Roadmap”), the parameters of the system and their
variation over time are tracked adaptively in real time, which tells
us how far into the future we can predict safely – if the parameters evolve
slowly or cyclically, we have higher confidence in our predictive analytics
solutions.
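One standard way to track a system parameter as it changes is recursive least squares with a forgetting factor. The sketch below is a generic textbook illustration under assumed conditions (a first-order system with a known input and no noise) and is not a description of the Syzen formulation. The function names, the simulated system and all numeric values are hypothetical.

```python
def track_parameter(y, u, forgetting=0.9):
    """Track a in y[t] = a * y[t-1] + u[t] with recursive least squares.

    A forgetting factor below 1 discounts old samples so the estimate
    can follow a parameter that changes over time.
    """
    a_hat, p = 0.0, 100.0   # initial estimate and its (large) uncertainty
    estimates = []
    for t in range(1, len(y)):
        phi = y[t - 1]                        # regressor: previous output
        target = y[t] - u[t]                  # part of y[t] explained by a
        gain = p * phi / (forgetting + phi * p * phi)
        a_hat += gain * (target - a_hat * phi)
        p = (p - gain * phi * p) / forgetting
        estimates.append(a_hat)
    return estimates

def simulate(n=400, change_at=200):
    """Hypothetical system whose parameter drops from 0.9 to 0.5 midway."""
    u = [1.0 if (t // 5) % 2 == 0 else -1.0 for t in range(n)]  # square-wave input
    y = [0.0]
    for t in range(1, n):
        a = 0.9 if t < change_at else 0.5
        y.append(a * y[-1] + u[t])
    return y, u

y, u = simulate()
est = track_parameter(y, u)
print(round(est[198], 3), round(est[-1], 3))
```

Watching how fast the tracked estimate moves gives a rough sense of how long a prediction built on it can be trusted: a steady estimate supports a longer horizon, while a jump like the one halfway through this simulation is a signal to shorten it.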
Wanting to know the future has always been a human
preoccupation – we see that you cannot truly know the future but in some cases,
predictions to some extent are possible . . . surrounded by many caveats; more
of “excuses” than definitive answers. Sounds a lot like a dismal science!
Future of Analytics – Spatio-temporal data
As businesses push to higher levels of performance,
higher fidelity models are going to be necessary to produce more accurate and
hence valuable predictions and recommendations for business operations.
ALL data are spatio-temporal! From the simplest to more complex levels -
· Data can be considered isolated at the simplest level – a “snap shot”.
· Then we realize that data exist in a “social” network with mutual interactions.
· In reality, data exist in *embedded* forms in “influence” networks of one type or the other which are distributed in time and space – a “video”!
The spatial extent of data (distance) can be folded into time
if we assume a certain information diffusion speed. Graph-theoretic methods do
not account for the time dimension. For accurate analysis, there is no escaping
Dynamics over Time, meaning the use of differential
(or difference) equations . . . and Systems Theory!
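Folding distance into time can be sketched with a single difference equation. In the example below, a node responds to a distant source after a delay equal to distance divided by an assumed diffusion speed. The function name, the weights and the numbers are all hypothetical and only illustrate the idea.

```python
def simulate_influence(source, distance, speed, weight=0.5, memory=0.8):
    """Difference equation for a node influenced by a distant source.

    Distance is folded into time as a delay of distance / speed steps:
        x[t] = memory * x[t-1] + weight * source[t - delay]
    """
    delay = round(distance / speed)
    x, prev = [], 0.0
    for t in range(len(source)):
        delayed = source[t - delay] if t >= delay else 0.0
        prev = memory * prev + weight * delayed
        x.append(prev)
    return x

# Hypothetical impulse at the source at t = 0; the node sits 6 units away
# and information is assumed to travel 2 units per time step.
source = [1.0] + [0.0] * 9
print(simulate_influence(source, distance=6, speed=2))
```

The node stays silent for three steps, reacts when the impulse arrives, and then fades according to its own memory. A static graph would record only that the two are connected; the difference equation also says when and for how long the influence is felt.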
Systems Theory + Analytics = “SYSTEMS Analytics”! A few example business applications are
shown above. As you can see, it spans most of the current Analytics use cases
and many more promising ones when network graphs and the spatio-temporal nature of
data are fully incorporated in the coming years – basic theories and some
algorithms are already in hand. For specific technologies, see –
From the simple explanation of ML, the power and limitations
of prediction and the promising Analytics technology roadmap ahead, it is clear
that Data Science is indeed a rich area to mine, one that can create an even bigger
impact on business performance in the coming years.
PG Madhavan