Smart New Choices for Oracle Data Mining (part 1)

I was at the Oracle Data Mining hands-on lab yesterday at Oracle OpenWorld (the room was packed to capacity), and one particularly sharp participant zoomed through the material and started exploring on his own. He posed the following question: What’s the difference between the Classification node and the Prediction Query node (new in SQL Developer 4.0 when used with the Oracle 12c Database)? Both require a target value and a case id, and both result in classification models, so when do I use each? Here’s the way Charlie Berger (Senior Director, Product Management, Data and Advanced Analytics) handled the question.


The Prediction Query node builds a classification model “on the fly”. This is useful when there are many “natural” partitions in the data that would benefit from separate classification models, and when data sets are dynamic and change on a regular basis (and thus may benefit from adjustments to a classification model). Picture a wide, many-dimensioned data set. Picture also a field that naturally creates partitions or “buckets”. Country could be such a field for a business that is building classification models for its customers. Sales dynamics could vary widely between countries, so it wouldn’t necessarily make sense to build one classification model; individual models for different countries would fit better. However, building, tuning, and managing more than a hundred different classification models may impose a significant cost on the business (and the analytics staff!).


A better solution is to employ a more automatic prediction process that leverages the natural differences between countries and responds to dynamic changes in sales patterns. The Prediction Query node makes smart assumptions about automatic data preparation, sampling, algorithm settings, and so on, which reduces the need to make choices for each model and generates result sets much more quickly. In contrast, the Classification node lets analysts fine-tune their model by making all kinds of selections regarding the details of algorithms and data handling, and it makes the resulting model a persistent object in the database.
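
To make the partitioning idea concrete, here is a minimal Python sketch. It is not how Oracle implements the Prediction Query node. The toy data, the "channel" feature, and the trivial majority-vote classifier are all assumptions chosen for illustration. It only shows why one throwaway model per country can give different answers than a single model built over all countries.

```python
from collections import Counter, defaultdict

# Hypothetical toy data (an assumption, not from Oracle): (country, channel, bought)
ROWS = [
    ("US", "web", "yes"), ("US", "web", "yes"),
    ("US", "store", "no"), ("US", "store", "no"),
    ("JP", "web", "no"), ("JP", "web", "no"), ("JP", "web", "no"),
    ("JP", "store", "yes"), ("JP", "store", "yes"), ("JP", "store", "yes"),
]

# New customers to score: (country, channel)
NEW_ROWS = [("US", "web"), ("US", "store"), ("JP", "web"), ("JP", "store")]


def fit_majority(rows):
    """Trivial stand-in classifier: most common target per channel value."""
    counts = defaultdict(Counter)
    for _country, channel, bought in rows:
        counts[channel][bought] += 1
    return {channel: c.most_common(1)[0][0] for channel, c in counts.items()}


def predict_global(rows, new_rows):
    """One model for all countries."""
    model = fit_majority(rows)
    return [model[channel] for _country, channel in new_rows]


def predict_partitioned(rows, new_rows):
    """One model per country, built on the fly and discarded after scoring."""
    by_country = defaultdict(list)
    for row in rows:
        by_country[row[0]].append(row)
    models = {c: fit_majority(part) for c, part in by_country.items()}
    return [models[country][channel] for country, channel in new_rows]


print(predict_global(ROWS, NEW_ROWS))       # ['no', 'yes', 'no', 'yes']
print(predict_partitioned(ROWS, NEW_ROWS))  # ['yes', 'no', 'no', 'yes']
```

In this toy data the global model gets both US customers wrong because the larger JP partition dominates the vote, while the per-country models follow each market's own pattern. Nothing is persisted: the per-country models exist only for the duration of the scoring call, which mirrors the "on the fly" behavior described above.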


In other words, these two approaches are perfectly complementary. Rather than prioritize the addition of “features” to their already extremely capable classification algorithms, Oracle added the ability to produce business insights even faster and to improve predictive power by leveraging the natural partitions in the data. Brilliant. This is exactly *not* what most data geeks would do (they’d want more granular control over the individual classification algorithms), but it is what most savvy executives would want: actionable evidence that generates business insights faster.