Oracle Data Mining

Major Update Releases for Oracle’s R Technologies

March 2014 turned out to be a banner month for the team working on Oracle R Enterprise, Oracle R Distribution, and Oracle R Advanced Analytics for Hadoop. See the Oracle R Technologies page.

Last week Oracle R Enterprise 1.4 was released. This is an important release with key functionality for the analytics community, for Oracle database administrators, and for enterprise CIOs. Something for everyone!

Data Scientists, Get Ready to Geek Out

Perhaps the feature of most interest to the professional R programming community is the support for R 3.0.1 (both open source R and Oracle’s newest Oracle R Distribution). This means a *ton* of additional packages and newer package releases are available for use in embedded R processes on Oracle Database servers. The step up to R 3.0.1 is a major milestone. New in-database algorithms include

  • ore.odmNMF (Non-Negative Matrix Factorization algorithm),
  • ore.odmOC (Oracle’s proprietary Orthogonal Partitioning Cluster algorithm),
  • ore.odmAssocRules (Apriori algorithm for building market basket analyses and sophisticated recommendation engines).
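To give a feel for what an Apriori-style market basket analysis does, here is a minimal pure-Python sketch. It is not Oracle's implementation and does not show the ore.odmAssocRules interface; the function names `frequent_itemsets` and `association_rules` and the sample baskets are made up for illustration.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Return {itemset: count} for itemsets whose support >= min_support.

    Support is the fraction of baskets that contain the itemset.
    """
    baskets = [frozenset(t) for t in transactions]
    n = len(baskets)
    # Level 1 candidates: every individual item seen in any basket.
    candidates = {frozenset([item]) for b in baskets for item in b}
    frequent = {}
    k = 1
    while candidates:
        # Count the baskets that contain each candidate itemset.
        counts = {c: sum(1 for b in baskets if c <= b) for c in candidates}
        level = {c: cnt for c, cnt in counts.items() if cnt / n >= min_support}
        frequent.update(level)
        # Join: merge frequent k-itemsets into (k+1)-item candidates.
        k += 1
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune: every k-item subset of a candidate must itself be frequent.
        candidates = {
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent


def association_rules(frequent, n_baskets, min_confidence):
    """Return (lhs, rhs, support, confidence) tuples meeting min_confidence."""
    found = []
    for itemset, count in frequent.items():
        for size in range(1, len(itemset)):
            for lhs in map(frozenset, combinations(itemset, size)):
                # confidence = P(rhs | lhs) = count(itemset) / count(lhs)
                confidence = count / frequent[lhs]
                if confidence >= min_confidence:
                    found.append(
                        (lhs, itemset - lhs, count / n_baskets, confidence)
                    )
    return found


# Hypothetical sample data, not taken from any Oracle demo schema.
baskets = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]
itemsets = frequent_itemsets(baskets, min_support=0.6)
for lhs, rhs, support, confidence in association_rules(itemsets, len(baskets), 0.8):
    print(sorted(lhs), "->", sorted(rhs), support, confidence)
```

The in-database version matters precisely because this kind of repeated counting gets expensive on large transaction tables; the sketch only shows the join, prune, and count cycle on a toy list.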

Additionally, ore.neural (neural nets), ore.glm (generalized linear models), and ore.esm (exponential smoothing for time series analyses) all saw major extensions and upgraded capabilities, along with new support for Principal Component Analysis, ANOVA, and factor analysis on database data sets.
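For readers who have not met exponential smoothing before, the simplest form of the idea behind a function like ore.esm fits in a few lines. The sketch below shows single exponential smoothing only; it is an illustration in plain Python, not the ore.esm interface, and the name `simple_exponential_smoothing` and the sample series are assumptions of mine.

```python
def simple_exponential_smoothing(series, alpha):
    """Smooth a series with s[t] = alpha * x[t] + (1 - alpha) * s[t-1].

    alpha near 1 tracks recent values closely; alpha near 0 smooths heavily.
    The first smoothed value is initialized to the first observation.
    """
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    if not series:
        return []
    smoothed = [series[0]]
    for x in series[1:]:
        smoothed.append(alpha * x + (1 - alpha) * smoothed[-1])
    return smoothed


# Hypothetical monthly values, for illustration only.
print(simple_exponential_smoothing([10, 20, 30], alpha=0.5))
```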

To see the blazing new speeds for analytics, check out the Oracle R Distribution 3.0.1 Benchmarks.

Oracle DBAs Get to Specify Degree of Parallelism for ORE Functions

Traditional open source R was developed with single-threaded processes as the rule, so users simply accepted long run times, along with data set sizes capped by the amount of memory on their machines. Enterprise data sets demand faster processing and far larger data spaces.

Enterprise CIOs See the Integrated Future of Designed Data Warehouses, NoSQL Big Data, and Multi-Tenant Solutions

Compromise, strategic tradeoffs, lock-in, and local optimizations have dominated the world of CIOs working to develop a coherent vision for their organizations regarding descriptive, predictive, and prescriptive analytics. The combination of Oracle’s R technologies with Oracle Database 12c, Oracle NoSQL 3.0, Oracle’s Engineered Systems, and Oracle’s Big Data strategy means that CIOs can confidently set forth a defined analytics path for their organizations without becoming hostage to a proprietary analytics culture and system. Oracle R Enterprise 1.4 strengthens Oracle Database as a world-class, high-performance computing environment for advanced statistical functions and parallel processing within a secure and managed system.

Smart New Choices for Oracle Data Mining (part 2)

Data visualization is near and dear to my heart (as it is for the majority of folks engaged in advanced analytics). The new Graph node in Oracle Data Miner, the GUI extension for SQL Developer 4.0, adds important visualization capabilities directly in the workflow interface where analysts graphically build data mining workflows. Analysts can now output a standard graph or chart at any point in the workflow.

Once again, simplicity and efficiency guided the Oracle product management development team in deciding what capabilities to add to the interface. The chart types cover the fundamental options for visualizing data: Line Chart, Bar Chart, Scatter Plot, Histogram, and Box Plot. While this is by no means a comprehensive list, it does allow an analyst to make a chart in a matter of minutes. The interface is clean and simple, and again demonstrates a “utilitarian” emphasis on achieving fast results rather than a bloated interface with every bell and whistle that takes extensive time to learn. The inclusion of the Box Plot graph, which visually shows the distribution of data within a set, is particularly welcome.
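As a reminder of what a box plot actually encodes, the numbers behind one can be computed in a few lines of Python. This is a sketch under my own assumptions: it uses the common Tukey convention (whiskers at the most extreme points within 1.5 × IQR of the quartiles), which may differ from how the Graph node draws its boxes, and the function name `box_plot_summary` is hypothetical.

```python
import statistics


def box_plot_summary(values):
    """Return the quartiles, whisker ends, and outliers a box plot would show."""
    data = sorted(values)
    # Quartiles via linear interpolation over the observed range.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    inside = [v for v in data if low_fence <= v <= high_fence]
    return {
        "lower_whisker": inside[0],   # smallest value inside the fences
        "q1": q1,
        "median": median,
        "q3": q3,
        "upper_whisker": inside[-1],  # largest value inside the fences
        "outliers": [v for v in data if v < low_fence or v > high_fence],
    }


# Hypothetical attribute values with one obvious outlier.
print(box_plot_summary([1, 2, 3, 4, 5, 6, 7, 8, 9, 100]))
```

A single glance at those six pieces tells you the center, the spread, and whether a handful of extreme values are distorting the picture, which is why the chart type earns its place on a short list.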

Let’s say, for example, that you are doing a clustering analysis and discover an interesting correlation between two of the parameters that is important in creating your cluster definitions, and you want to highlight it directly in a graph. Sure, you could use the built-in explore capabilities for the clusters, which have their own graphing features (perhaps how you discovered it in the first place), but there are all kinds of other information included in those graphs and, short of taking a screenshot, it would be hard to export it. Instead, you can simply attach an Apply node and your original data source node to your cluster node (remember, you need a data flow node to create an export table or graph) and create a simple scatter plot that highlights the relationship (graph the two important attributes on the X and Y axes and group by cluster number) in a matter of just a few minutes. This graph can then be exported or copied to the clipboard, and it can remain in the workflow and be updated just as any other object would be.

While there were already some very powerful visualization capabilities built into the Explore Data node and within each of the Model nodes themselves, this new Graph node further extends Oracle Data Miner as a powerful, everyday tool for in-database data mining. Furthermore, it completely lives up to Oracle Data Mining’s strategic vision of extending ODM with powerful, smart capabilities that focus on delivering value quickly rather than including every knob and dial under the sun.

Oracle Data Mining vs. SAS is like Support Vector Machines vs. Neural Nets

I’m part of a small group of mathematics enthusiasts in Kansas City who meet about once a month on Saturday mornings to drink coffee and discuss mathematics. This past weekend it was my turn to give a presentation to the rest of the group, and I chose to speak on the mathematical foundations of the Support Vector Machine algorithm in Oracle Data Mining. While I wasn’t surprised that some in the group had a better handle on Vapnik-Chervonenkis theory than I did and gently “guided” me a few times, I was somewhat surprised at their positive reaction to my characterization of the “Oracle” approach to data mining in contrast with the “SAS” approach.