Wednesday, January 27, 2016

Analytics Made Simple

PG Madhavan, Ph.D.
Chief Algorist, Syzen Analytics, Inc.
Seattle, WA, USA

Brief bio: PG developed his expertise as an EECS Professor, Computational Neuroscience researcher, Bell Labs MTS, Microsoft Architect and multiple-startup leader. He has over 100 publications & platform presentations to Sales, Marketing, Product, Standards and Research groups, as well as 12 issued US patents. Major Contributions:
·       Computational Neuroscience of Hippocampal Place Cell phenomenon related to the subject matter of 2014 Nobel Prize in Medicine.
·       Random Field Theory estimation methods, relationship to systems theory and industry applications.
·       Systems Analytics bringing model-based methods into current analytics practice.
·       Four startups with two as Founder.
 

Four hard topics in Analytics are explained in plain English in this article:
1.       Machine Learning.
2.       Why is Predictive Analytics important to business?
3.       Prediction – the other dismal science?
4.       Future of Analytics.

Machine Learning in plain English

If someone asks you, “What is ML?”, what will be your conceptual, non-technical answer?

Mine is . . . ML is “cluster”, “classify” and “convert”. I use these words in their English language sense and not as techniques. What do I mean by that?

Cluster: Structure in the data is information – find the structure.
Classify: Transform structure into a Mathematical form.
Convert: Convert into insight/ action.
Do this by Learning – meaning, use the ability to generalize from experience.

This captures the essence of ML for me. From my experience, I find that –
·       Convert: best done by a “paired” (Data Scientist + Domain Expert) combo.
·       Classify: there is a grab bag of tools and techniques that the Data Scientist can exploit on one’s own. You can see my attempt at unifying this bag of tricks here – “Unifying Machine Learning to create breakthrough perspectives”.
·       Cluster: I am not referring to specific clustering *algorithms* here. This step is where the Data Scientist works to sense, identify and extract the structure, patterns or features in the data that are the bearers of information! (A minimal sketch of all three steps follows this list.)
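To make the three steps concrete, here is a minimal sketch on toy, synthetic data. The library calls, the two-blob data and the "flag for follow-up" action are all hypothetical stand-ins; in practice, as discussed next, the hard work is in finding the right features.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
data = np.vstack([rng.normal(0, 1, (100, 2)), rng.normal(5, 1, (100, 2))])

# Cluster: find the structure hiding in the data.
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(data)

# Classify: capture that structure in a mathematical form (a decision rule).
model = LogisticRegression().fit(data, labels)

# Convert: turn the rule into an insight/action on a new observation.
new_point = np.array([[4.8, 5.2]])
action = "flag for follow-up" if model.predict(new_point)[0] == labels[-1] else "ignore"
print(action)
```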

“Cluster” is the hardest part – the data do not tell you where they hide their structure. Finding patterns is an “art” in which inspiration, skill, experience, knowledge of inter-related theories, etc. play a major part. In algorithm work I am currently doing, it turned out (after *months* of slicing and dicing the data) that rendering the data as “phasors” (complex variables) revealed the hidden structure “by itself”!
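As a toy illustration of the phasor idea (synthetic data, not the actual problem referred to above): points on a noisy ring look unstructured coordinate by coordinate, but rendered as complex phasors the structure shows up immediately.

```python
import numpy as np

rng = np.random.default_rng(0)
theta = rng.uniform(0, 2 * np.pi, 500)
radius = 1.0 + 0.05 * rng.standard_normal(500)   # a noisy ring
x, y = radius * np.cos(theta), radius * np.sin(theta)

z = x + 1j * y                # render the data as phasors
print(np.std(x), np.std(y))   # wide spread in each raw coordinate
print(np.std(np.abs(z)))      # tiny spread in magnitude: the hidden structure
```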

If you are able to get at the most descriptive and discriminative features at the “Cluster” stage, the remaining steps will (almost) fall into place and yield a robust solution! If not, you may still succeed, but you will work many times harder to Classify and Convert and end up with non-optimal answers.

To be clear, my comments apply only to the first-time development of an algorithm for a new business problem; once an end-to-end algorithm is in place, the Cluster-Classify-Convert steps can of course be automated for repeated application to similar data sets. But for first-time ML algorithm development, automation cannot replace art!

Why is Predictive Analytics important to business?

A prerequisite for performance at a high level in business is the ability to understand and manage complexity. Managing complex systems properly requires a great deal of data at the right time. BIG Data provides the data we need; to put these data to work – to reach the high levels of complexity required while still managing it – we have to anticipate what is about to happen and react when it happens, in a closed-loop manner. Predictive Analytics allows us to push our “system” to the edge (without “falling over”) in a managed fashion. This is why businesses embrace Predictive Analytics: to run businesses at a high level of performance, at the edge of complexity overload.

Prediction – the other dismal science?

An insightful person once said, “Prediction is like driving your car forward by looking only at the rearview mirror!”. If the road is dead-straight, you are good . . . UNLESS there is a stalled vehicle ahead in the middle of the road.

We should consider short-term and long-term prediction separately. Long-term prediction is nearly a lost cause. In the ’80s and ’90s, chaos and complexity theorists showed us that things can spin out of control even when we have perfect past and present information (predicting weather beyond 3 weeks is a major challenge, if not impossible). Even earlier, stochastic process theory told us that “non-stationarity”, where statistics evolve (slowly or quickly), can render longer-term predictions unreliable.

If the underlying system does not evolve quickly or suddenly, there is some hope. For causal systems (in Systems Theory, systems whose current state contains no information of any kind about the future), where “the car is driven forward strictly by using the rearview mirror”, outcomes are predictable in the sense that, as long as the “road is straight” or “curves only gently”, we can be somewhat confident in predicting a few steps ahead. This may be quite useful in some Data Science applications (such as Fintech).
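A minimal sketch of this kind of rearview-mirror prediction, assuming a slowly varying (here, stationary) series and a simple AR(2) model fit by least squares; everything below is synthetic.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 400
x = np.zeros(n)
for t in range(2, n):
    # A gently "curving road": a stable second-order system plus noise.
    x[t] = 1.6 * x[t - 1] - 0.7 * x[t - 2] + 0.1 * rng.standard_normal()

# "Drive by the rearview mirror": regress x[t] on its own past.
X = np.column_stack([x[1:-1], x[:-2]])   # lags 1 and 2
y = x[2:]
a = np.linalg.lstsq(X, y, rcond=None)[0]

x_next = a[0] * x[-1] + a[1] * x[-2]     # one step ahead
print("AR coefficients:", a, "next-step prediction:", x_next)
```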

Another type of prediction involves not the actual path of future events (or the “state space trajectories”, in the parlance) but the occurrence of a “black swan” or an “X-event” (for an elegant in-depth discussion, see John Casti, “X-Events: Complexity Overload and the Collapse of Everything”, 2013). For that matter, ANY unwanted event can be good to know about in advance – consider unwanted destructive vibrations (called “chatter”) in machine tools, as an example; early warning may be possible and very useful in saving expensive workpieces (“Instantaneous Scale of Fluctuation Using Kalman-TFD and Applications in Machine Tool Monitoring”). We find that the underlying system sometimes undergoes pre-event changes (such as approaching “complexity overload”, “state-space volume inflation”, “increase in degrees of freedom”, etc.) which may be detectable and trackable. However, there is NO escaping False Positives (with the associated waste of resources preparing for an event that never comes) or False Negatives (being blind-sided by an event we were told was not going to happen).
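Here is a deliberately crude early-warning sketch in that spirit (not a real chatter detector): track a rolling variance as a stand-in for “state-space volume inflation” and flag when it inflates past a threshold, accepting that false alarms come with the territory.

```python
import numpy as np

rng = np.random.default_rng(2)
quiet = rng.standard_normal(300)
chatter = 4.0 * rng.standard_normal(100)         # the unwanted event
signal = np.concatenate([quiet, chatter])

window, threshold = 50, 2.0
baseline = signal[:window].var()
for t in range(window, len(signal)):
    if signal[t - window:t].var() > threshold * baseline:
        print(f"warning raised at sample {t}")   # False Positives/Negatives
        break                                    # trade off via the threshold
```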

At Syzen Analytics, Inc., we use an explicit systems theory approach to Analytics. In our SYSTEMS Analytics formulation (“Future of Analytics – a definitive Roadmap”), the parameters of the system and their variation over time are tracked adaptively in real time, which tells us how far into the future we can predict safely – if the parameters evolve slowly or cyclically, we have higher confidence in our predictive analytics solutions.
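To convey the flavor of such adaptive parameter tracking (a generic LMS-style recursion on synthetic data, not Syzen’s actual formulation): the slower and smoother the tracked parameter drifts, the longer the horizon over which prediction can be trusted.

```python
import numpy as np

rng = np.random.default_rng(3)
n, mu = 1000, 0.05
a_true = 0.5 + 0.4 * np.sin(np.linspace(0, 2 * np.pi, n))  # slowly evolving
x = rng.standard_normal(n)
y = a_true * x + 0.05 * rng.standard_normal(n)

a_hat, estimates = 0.0, []
for t in range(n):
    e = y[t] - a_hat * x[t]          # prediction error
    a_hat += mu * e * x[t]           # adaptive (LMS) parameter update
    estimates.append(a_hat)

drift = np.abs(np.diff(estimates[-100:])).mean()
print("recent parameter drift per step:", drift)  # small => predict further ahead
```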

Wanting to know the future has always been a human preoccupation. We see that you cannot truly know the future, but that in some cases predictions are possible to some extent . . . surrounded by many caveats that sound more like “excuses” than definitive answers. Sounds a lot like a dismal science!

Future of Analytics – Spatio-temporal data

As businesses push to higher levels of performance, higher fidelity models are going to be necessary to produce more accurate and hence valuable predictions and recommendations for business operations.

ALL data are spatio-temporal! From the simplest level to more complex ones –
·       Data can be considered isolated at the simplest level – a “snap shot”.
·       Then we realize that data exist in a “social” network with mutual interactions.
·       In reality, data exist in *embedded* forms in “influence” networks of one type or the other which are distributed in time and space – a “video”!

The spatial extent of data (distance) can be folded into time if we assume a certain information diffusion speed. Graph-theoretic methods do not account for the time dimension. For accurate analysis, there is no escaping Dynamics over Time – meaning the use of differential (or difference) equations . . . and Systems Theory!
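The folding step itself is back-of-the-envelope arithmetic; the diffusion speed and distance below are made-up numbers, purely for illustration.

```python
# Fold spatial separation into an equivalent time lag, assuming a fixed
# information diffusion speed (both values are hypothetical).
diffusion_speed_km_per_day = 50.0
distance_km = 200.0
equivalent_lag_days = distance_km / diffusion_speed_km_per_day
print(equivalent_lag_days)  # treat these two nodes as 4 days "apart" in time
```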


Systems Theory + Analytics = “SYSTEMS Analytics”! A few example business applications are shown above. As you can see, it spans most of the current Analytics use cases and many more promising ones when network graphs and spatio-temporal nature of data are fully incorporated in the coming years – basic theories and some algorithms are already in hand. For specific technologies, see –
·       For a full 30-minute discourse, see the YouTube video “Future of Analytics – a definitive roadmap”.

From the simple explanation of ML, the power and limitations of prediction, and the promising Analytics technology roadmap ahead, it is clear that Data Science is indeed a rich area to mine – one that can create an even bigger impact on business performance in the coming years.

PG Madhavan



 

Sunday, October 25, 2015

Unifying Machine Learning to create breakthrough perspectives



Machine Learning – a unifying perspective & new paths

PG Madhavan, Ph.D.
Chairman, Syzen Analytics, Inc., Seattle, WA, USA
pgmad@syzenanalytics.com

Dr. PG Madhavan is the Founder of Syzen Analytics, Inc. He developed his expertise in Analytics as an EECS Professor, Computational Neuroscience researcher, Bell Labs MTS, Microsoft Architect and startup CEO. PG has been involved in four startups with two as Founder.
Major Original Contributions:
·       Computational Neuroscience of Hippocampal Place Cell phenomenon related to the subject matter of 2014 Nobel Prize in Medicine.
·       Random Field Theory estimation methods, relationship to systems theory and industry applications.
·       Early Bluetooth, Wi-Fi, 2.5G/EDGE and Ultra-wideband wireless technology standards and products.
·       Currently developing Systems Analytics bringing model-based methods into current Analytics practice.
PG has 12 issued US patents and over 100 publications & platform presentations to Sales, Marketing, Product, Industry Standards and Research groups. More at www.linkedin.com/in/pgmad

Pedro Domingos in his new book, “The Master Algorithm”, has done us a huge favor. As is true of any emerging technology field, Machine Learning (ML) is a “bag of tricks” today; it takes a while for a unifying framework to emerge. Then, one can see various aspects of ML as special cases of a general theory rather than a grab-bag of tools and techniques.

Pedro has taken a great early step toward such unification. He has collected all major ML initiatives into a taxonomy that makes sense – five schools of thought: the evolutionaries, connectionists, symbolists, Bayesians, and analogizers. However, I believe this does not go far enough in unifying ML thought . . .

From the early days of “ML”, I see Pattern Recognition and Classification as a better unifying perspective. In particular, the classic 1973 textbook of Duda & Hart, “Pattern Classification and Scene Analysis”, is my starting point!

Duda & Hart’s approach in simple terms is as follows. Given labelled samples, obtain a class description consisting of either a distance metric (Euclidean, intra-class, etc.) or a probability density function and then derive a decision rule (Maximum A-posteriori Probability, Bayes, etc.) from the description. The decision rule specifies a decision boundary in feature space among classes.
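A minimal sketch of that recipe in one dimension, using Gaussian class densities estimated from labelled samples and a MAP decision rule (the data and numbers are illustrative):

```python
import numpy as np

rng = np.random.default_rng(4)
class0 = rng.normal(0.0, 1.0, 200)   # labelled samples, class 0
class1 = rng.normal(3.0, 1.0, 100)   # labelled samples, class 1
prior0, prior1 = 200 / 300, 100 / 300

def gaussian(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

def map_classify(x):
    # Maximum A-posteriori Probability: density estimate times class prior.
    p0 = gaussian(x, class0.mean(), class0.std()) * prior0
    p1 = gaussian(x, class1.mean(), class1.std()) * prior1
    return int(p1 > p0)   # the decision boundary is where the posteriors cross

print(map_classify(0.5), map_classify(2.5))   # 0, then 1
```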

Alternatively, the decision surface can be derived directly from labelled samples; it is then called a “Discriminant Function”, the perceptron being an example. Most if not all current ML techniques can then be seen as dueling methods to derive Discriminant Functions!

Discriminant Functions can be linear or nonlinear (neural network with back-propagation, deep learning, support vector machines, kernel PCA, etc.) and outputs can be binary, integer or real valued. Various learning algorithms can be seen as belonging to the family of iterative/ recursive/ adaptive learning algorithms (Least Mean Square being a great old standby!) that update the parameters of the Discriminant Function as new data arrive.
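A sketch of that family: a linear Discriminant Function whose parameters are updated by the Least Mean Square recursion as each labelled sample arrives (synthetic two-cluster data):

```python
import numpy as np

rng = np.random.default_rng(5)
X = np.vstack([rng.normal(-1, 0.5, (100, 2)), rng.normal(1, 0.5, (100, 2))])
d = np.concatenate([-np.ones(100), np.ones(100)])   # desired signal: -1 / +1

w, b, mu = np.zeros(2), 0.0, 0.01
for x, target in zip(X, d):
    y = w @ x + b          # linear discriminant output
    e = target - y         # error against the desired signal
    w += mu * e * x        # LMS update of the parameters
    b += mu * e

print("discriminant weights:", w, "bias:", b)
```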

In the discussion above, features were considered “static” and not context-sensitive (identifying a word within a sentence is an example of a context-sensitive task). Context-sensitivity, or Dynamics, can be added to improve classification by incorporating Markov models (or Hidden Markov Models, for tractable computations). The Markov model is a special case of the State Space Models that are well studied in Systems Theory.
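A toy flavor of adding context (a greedy decode with made-up transition and per-symbol scores; a proper HMM would use Viterbi or forward-backward instead):

```python
import numpy as np

states = ["noun", "verb"]
trans = np.array([[0.3, 0.7],            # P(next state | current = noun)
                  [0.8, 0.2]])           # P(next state | current = verb)
static_scores = np.array([[0.6, 0.4],    # context-free classifier outputs,
                          [0.5, 0.5],    # one row per word in the sentence
                          [0.4, 0.6]])

path = [int(np.argmax(static_scores[0]))]
for t in range(1, len(static_scores)):
    context = trans[path[-1]] * static_scores[t]   # fold in the Markov context
    path.append(int(np.argmax(context)))
print([states[s] for s in path])   # context resolves the ambiguous middle word
```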

Setting aside Supervised Classification, where labelled samples (or “desired signals”) are available, what can we do when there is no supervision? This is the realm of the much harder Unsupervised Learning, which is very useful in transforming basic features into more and more meaningful ones. One usually brings in some overall desirable property to guide unsupervised learning. From the domain of “blind processing” (Radar signal processing, for example), Mutual Information among classes can be minimized as a learning process, in the belief that the “best” classification happens when the classes have the least overlapping information (better “efficiency” of representation).
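For concreteness, here is a rough histogram estimate of the quantity being minimized – mutual information between two “channels” – and exactly the kind of entropy-related estimate that, as noted below, is hard to get right:

```python
import numpy as np

def mutual_information(a, b, bins=10):
    # Crude plug-in estimate: I(A;B) = sum p(a,b) * log(p(a,b) / (p(a)p(b)))
    joint, _, _ = np.histogram2d(a, b, bins=bins)
    p = joint / joint.sum()
    px = p.sum(axis=1, keepdims=True)
    py = p.sum(axis=0, keepdims=True)
    mask = p > 0
    return float((p[mask] * np.log(p[mask] / (px @ py)[mask])).sum())

rng = np.random.default_rng(7)
a = rng.standard_normal(5000)
print(mutual_information(a, a + 0.1 * rng.standard_normal(5000)))  # large overlap
print(mutual_information(a, rng.standard_normal(5000)))            # near zero
```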

Instead of entropy-related quantities that are hard to estimate, it is likely that Scale of Fluctuation which is related to “order” and “state space volume” may be a quantity to optimize for a new unsupervised learning process. (For more information on Scale of Fluctuation, refer to my papers, “Instantaneous Scale of Fluctuation Using Kalman-TFD and Applications in Machine Tool Monitoring”, 1997 & “Kalman Filtering and time-frequency distribution of random signals”, 1996).

In all of the existing ML bags of tricks, we are still staying at the surface level! We are modeling the attributes or data DIRECTLY. What if we went one level deeper? Model the SYSTEM that generates the data! Syzen Analytics, Inc. takes such an explicit approach in what we call “SYSTEMS Analytics”, which has already demonstrated significant value in business applications.

In Syzen’s retail commerce application, our Systems Analytics approach hypothesizes that there is a system, explicit or implicit, behind the scenes generating customer purchase behaviors and purchase propensities. These “one-level-deeper” system model parameters can be more effective for pattern recognition and classification than the data the model generates! There is a long history of model parameters providing better estimates (in power spectrum analysis, for example). The Scale of Fluctuation mentioned earlier seems to have the additional desirable property of quantifying “coupling” among deeper-level model parameters.
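A sketch of “going one level deeper” on a time series (synthetic data; the AR(2) model is a stand-in for whatever system model fits the domain): fit a small model per series and hand its parameters, not the raw values, to the pattern recognizer.

```python
import numpy as np

def ar2_coefficients(series):
    # Least-squares fit of x[t] = a1*x[t-1] + a2*x[t-2]; (a1, a2) become
    # the "one-level-deeper" features describing the generating system.
    X = np.column_stack([series[1:-1], series[:-2]])
    return np.linalg.lstsq(X, series[2:], rcond=None)[0]

rng = np.random.default_rng(6)
series = rng.standard_normal(300)
for t in range(2, 300):
    series[t] += 1.2 * series[t - 1] - 0.5 * series[t - 2]

print("deeper-level features:", ar2_coefficients(series))  # ~ [1.2, -0.5]
```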

Context-sensitive dynamics are a very good avenue to exploit. The dynamics could be over any independent variable (time always comes to mind first, but it is only one of the possibilities). As I noted in my recent blog (“SYSTEMS Analytics – the next big thing in Big Data & Analytics”), “Extensions to Systems Analytics in the future will be inspired by the insight that in reality, data exist in *embedded* forms in preference and influence networks which are distributed in time and space” AND in other independent dimensions (shopper preference, for example).

Let me pull all of the notions discussed so far into a diagram.




Once the patterns have been recognized and the classes identified, the resulting classes can be used for all sorts of applications such as Recommendation Engines, Language Translation, Fraud Detection and many others. The approach I outline above lets you stay unified all the way to the application development stage. In doing so, it also points out new paths ahead for ML!

Some readers will have noticed an undertow of dichotomies while reading this “opinion piece”: Theoretic vs Heuristic; Formal vs Ad hoc; Mathematics vs AI; Electrical Engineering vs Computer Science academic departmental affiliations! I am firmly in the former camps. However, as an engineer, I am personally happy to start with heuristic solutions but quickly put them on firm mathematical foundations before the “gotchas” and unintended consequences of ad hoc methods catch up with me.

The unification of ML proposed here opens up a multilane highway – join the journey and create more breakthroughs with us or on your own!

In this blog, I have not provided many references – web search will get you most; Pedro Domingos’ “The Master Algorithm” book is an excellent source of ML-related literature. For the newer and less familiar work, please contact me directly.