Analytics

Outliers, Self-Importance, and Shared Understanding

I recently attended another overview of machine learning and analytics, this time at my business school’s reunion, delivered by one of the most prominent thought leaders in the world of analytics. Nothing he said was wrong, and I agreed with much of the sage advice he had to offer. There is only so much that can be communicated in a 45-minute session, but I was still somewhat disappointed. The session was designed for executive-level business leaders who do not work directly in the field of analytics. Here were the two big pieces of advice:

  • Don’t lose the outliers by engaging in too much aggregation. The outliers are where all the good stuff is.

  • Place analytics within the heart of the business of the organization. It’s the most important thing. 

Fine and good. It’d be hard for me to argue against the potential usefulness of true outliers. However, I do think that the usefulness of outliers is highly dependent on the situation. It’s midway down my “most important” list. You might say it’s a bit of an outlier when too much aggregation rubs out the most meaningful insights.  
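To make the aggregation point concrete, here is a minimal Python sketch. The regions, the order totals, and the "more than twice the median" rule are all made up for illustration; they are assumptions of mine, not anything the speaker presented.

```python
from statistics import mean, median

# Hypothetical daily order totals for two regions (invented numbers).
daily_orders = {
    "north": [100, 102, 98, 101, 99],
    "south": [100, 101, 99, 100, 400],  # one very unusual day
}

def flag_outliers(values, factor=2.0):
    """Return values greater than `factor` times the median.

    A deliberately crude rule, assumed here only for illustration.
    """
    m = median(values)
    return [v for v in values if v > factor * m]

# Fully aggregated view: one number for everything.
all_days = [v for days in daily_orders.values() for v in days]
overall_mean = mean(all_days)

# Disaggregated view: look inside each region before averaging.
outliers_by_region = {
    region: flag_outliers(days) for region, days in daily_orders.items()
}

print("overall mean:", overall_mean)
print("outliers by region:", outliers_by_region)
```

The single overall average looks only mildly elevated, while the per-region view shows that one day in one region accounts for the whole difference. Whether that day is an insight or a data-entry error is exactly the situational judgment described above.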

I also agree with placing analytics at the center of the business. However, I freely recognize that this is what every functional business discipline advises: finance professors recommend that financial analysis be placed at the center of all business decisions and practices, human resources professors advise that there is nothing more important than world-class personnel practices, business operations professors argue that more money flows through operations than any other part of the business and that operational effectiveness largely determines the strength and future of the corporation, and so on. It’s not wrong to advise that analytics be placed at the center of everything; it’s just a teeny bit myopic.

Here’s a seldom-heard perspective I’d love to hear from an analytics thought leader addressing executives. In fact, I’d likely put this at the top of my personal “most important” list.

The hardest work of analytics is fostering a shared understanding of organizational position, performance, and priorities.  

It’s a group thing, not an individual thing. Designing analytic dashboards is more like putting together a great restaurant, with a brilliant kitchen, a strong staff, and a fantastic atmosphere, than it is like writing a novel or painting a picture. Restaurants need a team of people working together and are almost always enjoyed by people dining in groups, not alone. Great restaurants are about shared experiences, not solitary impressions. When executives share an understanding of the evidence, coherent decision making can take place. At that point, the analytics system is supporting the organization, not driving differences in perspective. But great dashboards don’t happen by accident, and they don’t arise naturally from a single great artist working alone. It’s a shared thing, not an individual thing.

The Four Realms of Analytics

The four realms of analytics (descriptive, predictive, prescriptive, and diagnostic) can be organized along two dimensions: one running from rules-based to probability-based, and one of time (past and future). This simple two-by-two matrix offers a powerful framework for organizing and describing the differences between analytical processes. While the four realms are often cited, the distinctions between them are rarely presented without considerable confusion. Rather than relying on dictionary definitions and unspecified connotations, this simple framework is offered as a way to communicate different types of analytics to lay audiences.

Oracle Data Mining vs. SAS is like Support Vector Machines vs. Neural Nets

I’m part of a small group of mathematics enthusiasts in Kansas City who meet about once a month on Saturday mornings to drink coffee and discuss mathematics. This past weekend it was my turn to give a presentation to the rest of the group, and I chose to speak on the mathematical foundations of the Support Vector Machine algorithm in Oracle Data Mining. While I wasn’t surprised that some in the group had a better handle on Vapnik-Chervonenkis theory than I and gently “guided” me a few times, I was somewhat surprised at their positive reaction to my characterization of the “Oracle” approach to data mining in contrast with the “SAS” approach.