Sunday, October 25, 2015

Unifying Machine Learning to create breakthrough perspectives



Machine Learning – a unifying perspective & new paths

PG Madhavan, Ph.D.
Chairman, Syzen Analytics, Inc., Seattle, WA, USA
pgmad@syzenanalytics.com

Dr. PG Madhavan is the Founder of Syzen Analytics, Inc. He developed his expertise in Analytics as an EECS Professor, Computational Neuroscience researcher, Bell Labs MTS, Microsoft Architect and startup CEO. PG has been involved in four startups with two as Founder.
Major Original Contributions:
·       Computational Neuroscience of the Hippocampal Place Cell phenomenon, related to the subject of the 2014 Nobel Prize in Physiology or Medicine.
·       Random Field Theory estimation methods, relationship to systems theory and industry applications.
·       Early Bluetooth, Wi-Fi, 2.5G/EDGE and Ultra-wideband wireless technology standards and products.
·       Currently developing Systems Analytics bringing model-based methods into current Analytics practice.
PG has 12 issued US patents and over 100 publications & platform presentations to Sales, Marketing, Product, Industry Standards and Research groups. More at www.linkedin.com/in/pgmad

Pedro Domingos in his new book, “The Master Algorithm”, has done us a huge favor. As is true of any emerging technology field, Machine Learning (ML) is a “bag of tricks” today; it takes a while for a unifying framework to emerge. Then, one can see various aspects of ML as special cases of a general theory rather than a grab-bag of tools and techniques.

Pedro has taken a great early step toward such unification. He has organized all major ML initiatives into a sensible taxonomy of five schools of thought: the evolutionaries, connectionists, symbolists, Bayesians, and analogizers. However, I believe this does not go far enough in unifying ML thought . . .

From the early days of “ML”, I see Pattern Recognition and Classification as a better unifying perspective. In particular, the classic textbook of Duda & Hart, “Pattern Classification & Scene Analysis”, published in 1973 is my starting point!

Duda & Hart’s approach in simple terms is as follows. Given labelled samples, obtain a class description consisting of either a distance metric (Euclidean, intra-class, etc.) or a probability density function and then derive a decision rule (Maximum A-posteriori Probability, Bayes, etc.) from the description. The decision rule specifies a decision boundary in feature space among classes.
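As a minimal sketch of this recipe, assume one-dimensional features and a Gaussian probability density per class (both simplifying assumptions; the text allows any distance metric or density). The class description is then a prior, mean and variance per label, and the MAP decision rule picks the label with the largest log prior plus log likelihood. The function names and sample data are hypothetical.

```python
import math
from collections import defaultdict


def fit_gaussian_classes(samples):
    """Estimate a 1-D Gaussian class description (prior, mean, variance)
    for each label from (feature, label) pairs."""
    by_class = defaultdict(list)
    for x, label in samples:
        by_class[label].append(x)
    n_total = len(samples)
    model = {}
    for label, xs in by_class.items():
        mean = sum(xs) / len(xs)
        var = sum((x - mean) ** 2 for x in xs) / len(xs)
        model[label] = (len(xs) / n_total, mean, max(var, 1e-9))  # guard zero variance
    return model


def map_decide(model, x):
    """MAP decision rule: choose the class maximising log prior + log likelihood."""
    def log_posterior(params):
        prior, mean, var = params
        log_lik = -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)
        return math.log(prior) + log_lik
    return max(model, key=lambda label: log_posterior(model[label]))


# Hypothetical labelled samples for two classes, "a" and "b".
samples = [(1.0, "a"), (1.2, "a"), (0.8, "a"), (3.0, "b"), (3.3, "b"), (2.7, "b")]
model = fit_gaussian_classes(samples)
decision_low = map_decide(model, 1.1)   # -> "a"
decision_high = map_decide(model, 2.9)  # -> "b"
```

The decision boundary is wherever the two log-posteriors are equal; with unequal class variances it need not sit midway between the class means.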

Alternatively, the decision surface can be derived directly from labelled samples; the function that defines it is then called a “Discriminant Function”, the perceptron being an example. Seen this way, most if not all current ML techniques are dueling methods for deriving Discriminant Functions!
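The perceptron can be sketched in a few lines. This is the classic Rosenblatt update for labels in {+1, -1}, assuming linearly separable data so that training stops once a full pass produces no mistakes; the function names and the toy AND-style data are illustrative.

```python
def train_perceptron(samples, epochs=100, lr=1.0):
    """Learn a linear discriminant g(x) = w.x + b directly from labelled
    samples; labels must be +1 or -1."""
    dim = len(samples[0][0])
    w = [0.0] * dim
    b = 0.0
    for _ in range(epochs):
        mistakes = 0
        for x, y in samples:
            g = sum(wi * xi for wi, xi in zip(w, x)) + b
            if y * g <= 0:  # misclassified or on the boundary: move the surface
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
                mistakes += 1
        if mistakes == 0:  # every sample is on the correct side
            break
    return w, b


def classify(w, b, x):
    """Sign of the discriminant function."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1


and_samples = [((0, 0), -1), ((0, 1), -1), ((1, 0), -1), ((1, 1), 1)]
w, b = train_perceptron(and_samples)
```

No class densities are estimated at any point; the learned (w, b) is the decision surface itself.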

Discriminant Functions can be linear or nonlinear (neural networks with back-propagation, deep learning, support vector machines, kernel PCA, etc.), and their outputs can be binary, integer or real valued. Various learning algorithms can be seen as belonging to the family of iterative/recursive/adaptive learning algorithms (Least Mean Square being a great old standby!) that update the parameters of the Discriminant Function as new data arrive.
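Least Mean Square is the textbook example of such adaptive updating: each new sample nudges the weights by mu * e * x, where e is the error between the desired signal d and the current linear output. The sketch assumes a linear discriminant, a hand-picked step size mu, and a hypothetical noiseless "true" weight vector used only to generate a data stream.

```python
import random


def lms_update(w, x, d, mu):
    """One Least Mean Square step: w <- w + mu * e * x, with e = d - w.x."""
    y = sum(wi * xi for wi, xi in zip(w, x))
    e = d - y
    return [wi + mu * e * xi for wi, xi in zip(w, x)], e


rng = random.Random(0)
true_w = [2.0, -1.0]  # hypothetical "unknown" weights that generate the desired signal
w = [0.0, 0.0]
for _ in range(5000):  # data arrive one sample at a time
    x = [rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)]
    d = sum(t * xi for t, xi in zip(true_w, x))  # desired signal
    w, _ = lms_update(w, x, d, mu=0.1)
```

The step size trades tracking speed against stability; too large a mu makes the updates diverge.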

In the discussion above, features were considered “static” and not context-sensitive (identifying a word within a sentence is an example where context matters). Context-sensitivity, or dynamics, can be added to improve classification by incorporating Markov models (or Hidden Markov Models for tractable computations). A Markov model is a special case of the State Space Models that are well studied in Systems Theory.
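To make the Markov-model idea concrete, the forward algorithm below computes the probability of an observation sequence under a small discrete Hidden Markov Model. In a classifier, one such model per class could be scored and combined with class priors through a decision rule like the one above. The parameter layout (start, trans, emit as nested lists) and the numbers are illustrative assumptions.

```python
def hmm_forward(obs, start, trans, emit):
    """Forward algorithm: P(observation sequence) under a discrete HMM.
    start[i] = P(s_0 = i); trans[i][j] = P(s_{t+1} = j | s_t = i);
    emit[i][o] = P(o | s = i)."""
    n = len(start)
    alpha = [start[i] * emit[i][obs[0]] for i in range(n)]
    for o in obs[1:]:
        # alpha[j] accumulates the probability of all state paths ending in j.
        alpha = [sum(alpha[i] * trans[i][j] for i in range(n)) * emit[j][o]
                 for j in range(n)]
    return sum(alpha)


start = [0.6, 0.4]
trans = [[0.7, 0.3], [0.4, 0.6]]
emit = [[0.9, 0.1], [0.2, 0.8]]
p = hmm_forward([0, 1], start, trans, emit)
```

Because the hidden state carries context from one observation to the next, the same observation can be scored differently depending on what preceded it.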

Setting aside Supervised Classification, where labelled samples (or “desired signals”) are available, what can we do when there is no supervision? This is the realm of the much harder Unsupervised Learning, which is very useful for transforming basic features into more and more meaningful ones. One usually brings in some overall desirable property to guide unsupervised learning. Borrowing from the domain of “blind processing” (Radar signal processing, for example), the Mutual Information among classes can be minimized as a learning criterion, in the belief that the “best” classification happens when the classes share the least overlapping information (better “efficiency” in representation).
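Such a criterion needs an estimate of Mutual Information. For discrete variables, the simplest plug-in (histogram) estimate is sketched below; it is only the measurement step, not a full learning algorithm. For continuous features, reliable estimates become much harder, which is part of the motivation for the next paragraph.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Plug-in (histogram) estimate of I(X;Y) in bits from paired discrete samples."""
    n = len(xs)
    pxy = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    mi = 0.0
    for (x, y), count in pxy.items():
        p = count / n
        mi += p * math.log2(p / ((px[x] / n) * (py[y] / n)))
    return mi


same = mutual_information([0, 1, 0, 1], [0, 1, 0, 1])   # Y copies X: 1 bit
indep = mutual_information([0, 0, 1, 1], [0, 1, 0, 1])  # independent: 0 bits
```

An unsupervised learner following this idea would adjust its transformation of the features so that this quantity, computed between its outputs, goes down.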

Instead of entropy-related quantities, which are hard to estimate, the Scale of Fluctuation, which is related to “order” and “state space volume”, may be a suitable quantity to optimize in a new unsupervised learning process. (For more information on Scale of Fluctuation, refer to my papers “Instantaneous Scale of Fluctuation Using Kalman-TFD and Applications in Machine Tool Monitoring”, 1997, and “Kalman Filtering and time-frequency distribution of random signals”, 1996.)
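For readers unfamiliar with the quantity: in random field theory (following Vanmarcke), the scale of fluctuation of a stationary process is commonly defined as the integral of its correlation function over all lags, θ = ∫ρ(τ)dτ, which in discrete time is Δ·(1 + 2·Σ_{k≥1} ρ(k)). The sketch below uses that standard definition with a plain sample autocorrelation and a hand-picked maximum lag; it is not the instantaneous Kalman-TFD estimator from the papers cited above.

```python
import random


def sample_acf(x, max_lag):
    """Biased sample autocorrelation rho[0..max_lag] of a 1-D series."""
    n = len(x)
    mean = sum(x) / n
    d = [v - mean for v in x]
    acov = [sum(d[t] * d[t + k] for t in range(n - k)) / n for k in range(max_lag + 1)]
    return [c / acov[0] for c in acov]


def scale_of_fluctuation(rho, dt=1.0):
    """Discrete-time scale of fluctuation: dt * (1 + 2 * sum_{k>=1} rho[k]).
    Truncating at max_lag assumes the correlation has died out by then."""
    return dt * (rho[0] + 2.0 * sum(rho[1:]))


# An AR(1) process x[t] = a*x[t-1] + noise has theta = (1 + a) / (1 - a).
rng = random.Random(1)
a, x, series = 0.5, 0.0, []
for _ in range(50000):
    x = a * x + rng.gauss(0.0, 1.0)
    series.append(x)
theta_hat = scale_of_fluctuation(sample_acf(series, max_lag=15))
```

Strongly correlated ("ordered") signals have a large θ, while white noise has θ equal to one sample spacing.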

In all of the existing ML bags of tricks, we are still staying at the surface level! We are modeling the attributes or data DIRECTLY. What if we went one level deeper and modeled the SYSTEM that generates the data? Syzen Analytics, Inc. takes such an explicit approach in what we call “SYSTEMS Analytics”, which has already demonstrated significant value in business applications.

In Syzen’s retail commerce application, our Systems Analytics approach hypothesizes that there is a system, either explicit or implicit, behind the scenes generating customer purchase behaviors and purchase propensities. These ‘one-level-deeper’ system model parameters can be more effective for pattern recognition and classification than the data that the model generates! There is a long history of model parameters providing better estimates (in power spectrum analysis, for example). The Scale of Fluctuation mentioned earlier seems to have another desirable property: it quantifies “coupling” among deeper-level model parameters.
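The power-spectrum example refers to parametric spectral estimation: instead of working with raw samples, one fits a small autoregressive (AR) model and uses its few coefficients, and the spectrum they imply, as the description of the data. The sketch below fits AR coefficients with the standard Levinson-Durbin recursion on the Yule-Walker equations. It only illustrates the general "model parameters as features" idea; it is not Syzen's Systems Analytics method, and the model order, series and names are assumptions.

```python
import math
import random


def autocovariance(x, max_lag):
    """Biased sample autocovariance r[0..max_lag]."""
    n = len(x)
    mean = sum(x) / n
    d = [v - mean for v in x]
    return [sum(d[t] * d[t + k] for t in range(n - k)) / n for k in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the Yule-Walker equations for x[t] = sum_k a[k] x[t-1-k] + e[t].
    Returns (AR coefficients a, innovation variance)."""
    a, err = [], r[0]
    for m in range(1, order + 1):
        acc = r[m] - sum(a[k] * r[m - 1 - k] for k in range(m - 1))
        kappa = acc / err  # reflection coefficient for order m
        a = [a[k] - kappa * a[m - 2 - k] for k in range(m - 1)] + [kappa]
        err *= 1.0 - kappa * kappa
    return a, err


def ar_psd(a, noise_var, omega):
    """AR power spectrum (up to a constant): noise_var / |1 - sum_k a_k e^{-i omega k}|^2."""
    re = 1.0 - sum(ak * math.cos(omega * (k + 1)) for k, ak in enumerate(a))
    im = sum(ak * math.sin(omega * (k + 1)) for k, ak in enumerate(a))
    return noise_var / (re * re + im * im)


# Two AR coefficients and one noise variance summarise a 20000-sample series.
rng = random.Random(2)
x, series = 0.0, []
for _ in range(20000):
    x = 0.8 * x + rng.gauss(0.0, 1.0)
    series.append(x)
coeffs, noise_var = levinson_durbin(autocovariance(series, 2), order=2)
```

The handful of fitted parameters, rather than the thousands of samples, is what a downstream classifier would consume.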

Context-sensitive dynamics are a very good avenue to exploit. The dynamics could be over any independent variable (time always comes to mind first, but it is only one of the possibilities). As I noted in my recent blog (“SYSTEMS Analytics – the next big thing in Big Data & Analytics”), “Extensions to Systems Analytics in the future will be inspired by the insight that in reality, data exist in *embedded* forms in preference and influence networks which are distributed in time and space”, AND in other independent dimensions (shopper preference, for example).

Let me pull all of the notions discussed so far into a diagram.




Once the patterns have been recognized and classes identified, the resulting classes can be used for all sorts of applications such as Recommendation Engines, Language Translation, Fraud Detection and many others. The framework outlined above lets you take a unified approach all the way up to the application development stage. In doing so, it also points out new paths ahead for ML!

Some readers will have noticed an undertow of dichotomies while reading this “opinion piece”: Theoretic vs Heuristic; Formal vs Ad hoc; Mathematics vs AI; Electrical Engineering vs Computer Science academic departmental affiliations! I am firmly in the former camps. However, as an engineer, I am personally happy to start with heuristic solutions, but I quickly put them on firm mathematical foundations before the “gotchas” and unintended consequences of ad hoc methods catch up with me.

The unification of ML proposed here opens up a multilane highway – join the journey and create more breakthroughs with us or on your own!

In this blog, I have not provided many references – web search will get you most; Pedro Domingos’ “The Master Algorithm” book is an excellent source of ML-related literature. For the newer and less familiar work, please contact me directly.



Sunday, May 17, 2015

“IA not AI” in Retail Commerce – Enhanced Tanpin Kanri



Highlights: Enhanced Tanpin Kanri is a specific example of Intelligence Augmentation. Store staff's local knowledge and engagement with their shoppers cannot be replaced; but Big Data & Analytics can provide a significant leg-up in their difficult job of hypothesis-generation by providing data-driven predictions that they can safely rely on and improve incrementally. In a highly competitive low-margin business such as fast moving consumer goods retail, pioneering use of IA in their operations will determine the winners.

Dr. PG Madhavan is the Founder and Chairman of Syzen Analytics, Inc. He developed his expertise in analytics as an EECS Professor, Computational Neuroscience researcher, Bell Labs MTS, Microsoft Architect and startup CEO. He has been involved in four startups with two as Founder. He has over 100 publications & platform presentations to Sales, Marketing, Product, Industry Standards and Research groups and 12 issued US patents. He conceived and leads the development of SYSTEMS Analytics and is continually engaged hands-on in the development of advanced Analytics algorithms.

Artificial Intelligence or “AI” is the technology of the future; it has been so for the past 50 years . . . and it continues to be today! Intelligence Augmentation or “IA” has been around for as long. IA as a paradigm for value creation by computers was demonstrated by Douglas Engelbart during his 1968 “Mother of all Demos” (and in his 1962 report, “Augmenting Human Intellect: A Conceptual Framework”). While Engelbart’s working demo featured the mouse, networking, hypertext, etc. (long before they were in day-to-day use), IA has increased massively in scope over the last five-plus years. Now, Big Data & Analytics can truly augment human intelligence beyond anything Engelbart could have imagined.


IA is contrasted with Artificial Intelligence, which in its early days was the realm of theorem proving, chess playing, expert systems and neural networks. AI has made large strides, but its full scope is yet to be realized. IA, on the other hand, can be put to significant use and benefit today. Such is the story of Enhanced Tanpin Kanri in retail commerce.


As a keen observer of the demand chain portion of the retail ecosystem for the past 3 years or so, I am struck by the inefficiencies in large portions of Retail. Shoppers not finding what they want on the shelf is considered a BIG problem in the industry (the so-called “OOS” or Out-Of-Stock problem), to the tune of $170 Billion per year! The following diagram illustrates the portion of the Retail ecosystem on which we will focus in this blog.


The left half represents the current dominant model. The “Push” model drives manufacturers’ FMCG (fast-moving consumer goods from “brands” such as Procter & Gamble, Kraft, Unilever and others) into the supply chain, with store shelves as the final destination. Retailers moving toward the right half of the picture tend to be more agile and mature in their practices and seek a competitive edge through differentiation: they find that by focusing their store operations on satisfying the LOCAL customer, making available what she prefers on the store shelves, they can win. The poster-child of this revolution is 7-Eleven, the ubiquitous store at virtually every street corner around the globe!

7-Eleven’s use of the “Pull” model since the early 2000s has been very successful, as captured in a 2011 Harvard Business School case study (HBS Case Study: 9-506-002 REV: FEBRUARY 23, 2011). The study states that “Toshifumi Suzuki, Chairman and CEO of Seven and I Holdings Co., was widely credited as the mastermind behind Seven-Eleven Japan’s rise” and goes on to say that ‘Suzuki’s emphasis on fresh merchandise, innovative inventory management techniques, and numerous technological improvements guided Seven-Eleven Japan’s rapid growth. At the core of these lay Tanpin Kanri, Suzuki’s signature management framework’.

The proof is in the pudding: “Tanpin Kanri has yielded merchandising decisions that has decreased inventory levels, while increasing margins and daily store sales” since the 2000s, the HBS case study points out. So what exactly is Tanpin Kanri?

Tanpin Kanri or "management by single product," is an approach to merchandising pioneered by 7-Eleven in Japan that considers demand on a store-by-store and product-by-product basis. Essentially, it empowers store-level retail clerks to tweak suggested assortments and order quantities based on their own educated hypotheses . . . 

You can tell that Tanpin Kanri lies well to the right in the Retail Ecosystem diagram above. To call out some features:
·       PULL model
·       For a buyers’ market
·       Focus on how to satisfy the customer
·       Item planning and supply driven by retailer and customer
·       Symbolic of Consumer Initiative

I believe that it is simply a matter of time before Tanpin Kanri variants dominate the Retail demand chain model. As shoppers clamor even more for their preferences to be made available, Retailers will evolve incrementally.

So, where does the PULL Model provide the most bang for the buck today?


Wherever Product Density (number of products to be stocked per unit area) is high, “customer pull” will help prioritize which products to stock. Today’s Tanpin Kanri at 7-Eleven Japan incorporates “customer pull” via super-diligent store staff manually making the choices.

Here are the “hypothesis testing” steps that 7-Eleven store staff go through in operationalizing Tanpin Kanri. Based on frequent interactions and personal relationships with the shoppers at a store, the staff form hypotheses about the shoppers’ needs, wants and dislikes. Based on such information, store staff formulate hypotheses about what to carry (or not) on their store shelves (“merchandising”). Sales during the following days and weeks allow them to ascertain whether their hypotheses should be rejected; this continuous iteration goes on over time to “track” shopper preferences. According to the HBS case study, the Tanpin Kanri methodology has clearly been highly successful for 7-Eleven.



IA based on Big Data & Analytics can play a major role in Tanpin Kanri. In any scientific methodology, coming up with meaningful hypotheses is the HARD part! In the Tanpin Kanri case, it is the personal relationships, diligence and intelligence of the store staff that help generate the hypotheses. This is the super-important human value-add that no AI can fully replace! However, we can AUGMENT hypothesis-driven Tanpin Kanri with a data-driven precursor that enhances staff intelligence by providing predictions they can build on to formulate their hypotheses.

IA happens in the data-driven precursor step. 7-Eleven has vast amounts of transaction and customer data in its data warehouse. These data can be “data-mined” to find shopper preferences at a particular store, which can form the basis of what to carry, and how much of it, on the store shelf. The data-driven predictions then become the “foundation” on which the store staff add their own “deltas” based on the shopper quirks they have surmised through their all-important personal relationships.

Syzen Analytics, Inc. has accomplished IA integration using Machine Learning and a new development in Analytics called “Systems Analytics”. A naive “prediction” of SKU shares is simply the same sales as last year (see the multi-colored bar chart in the middle of the picture below); in other words, historical sales are the “information-neutral” prediction. But surely, we can do better than that with Systems Analytics.
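The “information-neutral” baseline is simple enough to state in code: each SKU’s predicted share of next period’s sales is its share of the same period last year. The SKU names and unit counts below are made up; a real baseline would be computed per store and per week from T-Log data.

```python
def last_year_share_forecast(last_year_units):
    """Naive baseline: predicted share of each SKU = its share of last year's units."""
    total = sum(last_year_units.values())
    if total == 0:
        raise ValueError("no sales recorded last year")
    return {sku: units / total for sku, units in last_year_units.items()}


# Hypothetical units sold by one store in the same week last year.
baseline = last_year_share_forecast({"SKU-A": 30, "SKU-B": 50, "SKU-C": 20})
```

Any model-based prediction has to beat this baseline to add value.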

Syzen is able to provide SKU-by-SKU, store-by-store and week-by-week predictions using the standard T-Log data that every Retailer has in its data archives. A typical prediction from Syzen’s ROG-0 SaaS product for a single SKU at a particular store looks like this.


·       The purple, uneven “picket fence” (the lowest bar chart) shows the weekly predictions. It is obtained by combining different “masks”.
·       The papaya-colored mask is the new and most significant one. Its values are predicted from appropriate past intervals of T-Log data digested via Systems Analytics and updated adaptively.
·       The numbered masks in the middle account for events that the store manager knows will happen next year, such as a local festival or a rock concert in the nearby park.
·       The MANUAL part of Tanpin Kanri now only involves the store staff making small daily adjustments to the SKU “facings” in the purple bar chart based on local shopper “gossip”!

Convenience stores are drawn to the Enhanced Tanpin Kanri method because of the maturity of operations they already possess. With their more agile supply chains and desire to differentiate their stores for the local clientele, these “high-density” Retailers have shown Syzen a lot of enthusiasm for our predictive solution, which makes Tanpin Kanri more scalable by reducing its dependency on super-diligent store staff. Advances in Systems Analytics and other quantitative methods will refine products such as Syzen’s ROG-0 SaaS in the future to sharpen shopper-preference based product assortment predictions.

Enhanced Tanpin Kanri is a specific example of Intelligence Augmentation. Store staff's knowledge of local happenings and engagement with their store shoppers cannot be replaced; but Big Data & Analytics can provide a significant leg-up in their difficult job of hypothesis-generation by providing data-driven predictions that they can safely rely on and improve incrementally. In a highly competitive low-margin business such as fast moving consumer goods retail, pioneering use of IA in their operations will determine the winners.

Syzen website: www.syzenanalytics.com


Monday, April 20, 2015

“Tanpin Kanri” is the next “Kaizen”!


Does this spell the death of Retail Category Management as we know it?!

Highlights: Tanpin Kanri will be the next big business process/philosophy export out of Japan after Toyota’s “kaizen”. Why? “Tanpin Kanri has yielded merchandising decisions that has decreased inventory levels, while increasing margins and daily store sales”, says an HBS case study. Syzen is finding a lot of enthusiasm among “high product density” Retailers for our predictive analytics SaaS solution that makes Tanpin Kanri scalable.


As a keen observer of the demand chain portion of the retail ecosystem for the past 3 years or so, I am struck by the inefficiencies in large portions of Retail. Shoppers not finding what they want on the shelf is considered a BIG problem in the industry (the so-called “OOS” or Out-Of-Stock problem), to the tune of $170 Billion per year! There are many culprits; let us consider the systemic ones.


The following diagram illustrates the portion of the Retail ecosystem on which we will focus in this blog.
The left half represents the current dominant model. The “Push” model drives manufacturers’ FMCG (fast-moving consumer goods from “brands” such as Procter & Gamble, Kraft, Unilever and others) into the supply chain, with store shelves as the final destination. Retailers moving toward the right half of the picture tend to be more agile and mature in their practices and seek a competitive edge through differentiation: they find that by focusing their store operations on satisfying the LOCAL customer, making available what she prefers on the store shelves, they can win. The poster-child of this revolution is 7-Eleven, the ubiquitous store at virtually every street corner around the globe!

7-Eleven’s use of the “Pull” model since the early 2000s has been very successful, as captured in a 2011 Harvard Business School case study (HBS Case Study: 9-506-002 REV: FEBRUARY 23, 2011). The study states that “Toshifumi Suzuki, Chairman and CEO of Seven and I Holdings Co., was widely credited as the mastermind behind Seven-Eleven Japan’s rise” and goes on to say that ‘Suzuki’s emphasis on fresh merchandise, innovative inventory management techniques, and numerous technological improvements guided Seven-Eleven Japan’s rapid growth. At the core of these lay Tanpin Kanri, Suzuki’s signature management framework’.

The proof is in the pudding: “Tanpin Kanri has yielded merchandising decisions that has decreased inventory levels, while increasing margins and daily store sales” since the 2000s, the HBS case study points out. So what exactly is Tanpin Kanri?

Tanpin Kanri or "management by single product," is an approach to merchandising pioneered by 7-Eleven in Japan that considers demand on a store-by-store and product-by-product basis. Essentially, it empowers store-level retail clerks to tweak suggested assortments and order quantities based on their own educated hypotheses . . . 

You can tell that Tanpin Kanri lies well to the right in the Retail Ecosystem diagram above. To call out some features:
·       PULL model
·       For a buyers’ market
·       Focus on how to satisfy the customer
·       Item planning and supply driven by retailer and customer
·       Symbolic of Consumer Initiative

Looking at this list, you may be flabbergasted to learn that this is NOT how all of Retail works! With all the emphasis on satisfying the customer, and ours being a “buyers’ market” in virtually every case, you may wonder why any model other than varying shades of Tanpin Kanri would exist in the 2010s. That is exactly why I claim that there is an “inevitability” about Tanpin Kanri, and I conclude that Tanpin Kanri will be the next big business process/philosophy export out of Japan after Toyota’s “kaizen”.

The Push model exists virtually everywhere in US grocery retail: supermarkets, hypermarkets and smaller store chains. Under “Category Management”, a joint decision process between CPG (consumer packaged goods) manufacturers and retailers determines what will be on the store shelves for the next 6 or 12 months. Often, all the stores in a chain (some chains have as many as 3000 stores) will have the same “planogram” (a visual representation of product SKUs on the shelves) irrespective of the local clientele! This is obviously simple to implement, but why has Category Management not evolved to reflect customer preferences explicitly?

A Federal Trade Commission study may hint at the underlying issue: “The retailer and supplier also typically discuss funds – slotting, promotional, co-op advertising, or other introductory allowances or discounts – some of which would lower the retailer’s per unit purchase cost for an initial period of time”. This exchange of funds can short-change customers, since they are not directly part of the equation! CPGs are hyper-wealthy compared to razor-thin-margin Retailers; the fact that CPGs can directly improve a Retailer’s bottom line with these funds will distort commerce through the joint decision process of Category Management.

I believe that it is simply a matter of time before Tanpin Kanri variants dominate the Retail demand chain model. If another FTC study finds the exchange of funds between CPGs and Retailers to be collaboration of an "unsavory" nature, the transition to Tanpin Kanri will be rapid. If not, as shoppers clamor even more for their preferences to be made available, Retailers will evolve incrementally, kicking and screaming; ultimately, we the shoppers pay them more money than CPGs do!

So, where does Tanpin Kanri provide the most bang for the buck today?
Wherever Product Density (number of products to be stocked per unit area) is high, “customer pull” will help prioritize which products to stock. Today’s Tanpin Kanri at 7-Eleven Japan incorporates “customer pull” via super-diligent store staff manually making the choices. Surely, in these days of Big Data and Analytics, there must be a way to give the store staff a helping hand in predicting “customer pull”, or as we call it, “Shopper Preferences” . . .

Syzen Analytics, Inc. has accomplished exactly that using Machine Learning and a new development in Analytics called “Systems Analytics”. Syzen is able to provide SKU-by-SKU, store-by-store and week-by-week predictions using the standard T-Log data that every Retailer has in its data archives. A typical prediction from Syzen’s ROG-0 SaaS product for a single SKU at a particular store looks like this.



·       The purple, uneven “picket fence” shows the weekly predictions. It is obtained by combining different “masks”.
·       The papaya-colored mask is the new and most significant one. Its values are predicted from appropriate past intervals of T-Log data digested via Systems Analytics and updated adaptively.
·       The numbered masks in the middle account for events that the store manager knows will happen next year, such as a local festival or a rock concert in the nearby park.
·       The MANUAL part of Tanpin Kanri now only involves the store staff making small daily adjustments to the SKU “facings” based on local shopper “gossip”!

As we discussed, many retailers are NOT ready to embrace periodic refinement to reflect shopper preferences, especially if it involves forgoing lucrative fees from CPGs. However, the “old timers” can still use the outputs above for old-fashioned assortment optimization: average the heights of the purple bars and use the average as the SKU volume they order every week. Our results are store-specific, but they can average each SKU over all stores in the chain and generate one Planogram. Of course, they will be throwing away a lot of value (and revenue) by not responding to customer preferences on a store-by-store and SKU-by-SKU basis!
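The “old-fashioned” use described above amounts to two averages: over weeks within a store, then over stores in the chain. A minimal sketch, assuming the predictions are available as a nested dictionary of weekly units per store and SKU (a hypothetical layout, not ROG-0’s actual output format):

```python
from statistics import mean


def weekly_order_volume(weekly_predictions):
    """Average a store's weekly SKU predictions into one fixed weekly order volume."""
    return mean(weekly_predictions)


def chain_planogram(predictions):
    """predictions[store][sku] -> list of weekly predicted units.
    Returns one chain-wide weekly volume per SKU (average over stores)."""
    per_sku = {}
    for skus in predictions.values():
        for sku, weeks in skus.items():
            per_sku.setdefault(sku, []).append(weekly_order_volume(weeks))
    return {sku: mean(volumes) for sku, volumes in per_sku.items()}


plan = chain_planogram({
    "store-1": {"milk": [10, 14, 12]},  # hypothetical weekly predictions
    "store-2": {"milk": [4, 6, 8]},
})
```

Collapsing everything to one number per SKU is exactly where the store-by-store responsiveness is lost: here a store that needs about 12 units a week and one that needs about 6 both receive 9.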

Convenience stores are drawn to this “predictive” Tanpin Kanri method because of the maturity of operations they already possess. With their more agile supply chains and desire to differentiate their stores for the local clientele, “high-density” Retailers are showing Syzen a lot of enthusiasm for our predictive solution, which makes Tanpin Kanri more scalable by reducing its dependency on super-diligent store staff. Advances in Systems Analytics and other quantitative methods will refine products such as Syzen’s ROG-0 SaaS in the future to sharpen shopper-preference based product assortment predictions.

Syzen website: www.syzenanalytics.com

Monday, March 23, 2015

Doing a Startup is like doing a Ph.D.?


“You will flip rapidly from a day in which you are euphorically convinced you are going to own the world, to a day in which doom seems only weeks away and you feel completely ruined, and back again. Over and over and over.”

Which Ph.D. candidate has NOT felt this way during the long 4+ year journey?!

Marc Andreessen (“MarcA”) goes on to say, “Second, in a startup, absolutely nothing happens unless you make it happen.” In a Ph.D., nobody forces you: not your supervisor (until after 4 years!), not your friends, not anybody. You get it done yourself, out of some inner drive. And the long hours and the intense, unwavering focus come from the same inner drive! One difference is that your march to a dissertation is not much affected by things like stock market crashes, terrorist attacks and natural disasters; on the other hand, your galloping startup can grind to a halt if any of these happen at an inopportune time.

So, what lessons are transferrable?
·       The ability to work long hours with unflinching focus: in my opinion, this is really the main thing you learn while doing a Ph.D.! The domain knowledge you acquire or create during your Ph.D. has a short shelf life, lasting only until other young upstarts come along and revolutionize your field.
·       Carefully crafted presentations and write-ups. All acronyms expanded on their first occurrence. All figure axes labelled. Consistency of terms and definitions throughout a presentation. And many such excellent documentation habits.
·       Persistence: it is the hand-maiden of innovation and creativity, and the Ph.D. journey teaches this lesson in spades. This can be a plus or a minus; startups sometimes require sudden “pivots” to survive, and persistence, unless of the enlightened variety, can then be a burden.

When you have a Ph.D., at some time or the other, you knew more about some topic than anyone else in this world! Don’t carry over this “know-it-all-ness” to a startup – it will usually create friction!

On balance, there are more positive things to transfer than negative ones. Mr. Ph.D. will have to engage co-founders and co-workers with the humility born of the realization that a startup is first and foremost a business, and one with many facets beyond his thesis topic. Ms. Ph.D.’s focus, hard work and persistence will lead her to have strong opinions and the willingness to fight for them; this is a great positive signal to all that she is deeply involved in and cares deeply about the startup. If co-workers can embrace this level of engagement without questioning motives, tremendous value-add will naturally follow!

PG Madhavan, Ph.D.
CAO & Founder
Syzen Analytics, Inc.
Bellevue, WA
+1-425-440-1487