Machine Learning and Cognitive Systems, Part 3: A ML Vendor Landscape - DoT


Post Top Ad

Your Ad Spot

June 9, 2014

Machine Learning and Cognitive Systems, Part 3: A ML Vendor Landscape

In parts One and Two of this series I gave a little explanation about what Machine Learning is and some of its potential benefits, uses, and challenges within the scope of Business Intelligence and Analytics.

In this installment of the series, and the last devoted to machine learning before we step into cognitive systems, I will attempt to provide a general overview of the Machine Learning (ML) market landscape, describing some, yes, only some, of the vendors and software products that are using ML for performing Analytics and Intelligence, so, here a brief market landscape overview.

Machine learning: a common guest with no invitation

It is quite surprising to find Machine Learning has a vast presence in many of today’s modern analytics applications. Its use is driven by:

  • The increasing need to crunch data that is more complex and more voluminous, at greater speed and with more accuracy—I mean really big data
  • The need to solve increasingly business problems that require methods out of conventional data analysis.

An increasing number of traditional and new software providers, forced by specific market needs to radically evolve their existing solutions or moved by the pure spirit of innovation, have followed the path of incorporating new data analytics techniques to their analytics offering stack, both explicitly, or simply hidden within white curtains.

For software providers that already offer advanced analytics tools such as data mining, incorporating machine learning functionality into their existing capabilities stack is an opportunity to evolve their current solutions and take analytics to the next level.

So, it is quite possible that if you are using an advanced business analytics application, especially for Big Data, you are already using some machine learning technology, whether you know it or not.

The machine learning software landscape, in brief 

One of the interesting aspects of this seemingly new need for dealing with increasingly large and complex sets of information is that many of the machine learning techniques originally used within pure research labs have already gained entrance to the business world, via their incorporation within analytics offerings. New vendors often may incorporate machine learning as the core of their analytics offering, or just as another of the functional features available in their stack.

Taking this into consideration, we can find a great deal of software products that offer machine learning functionality, to different degrees. Consider the following products, loosely grouped by type:

From the lab to the business

In this group we can find a number of products, most of them based on an open-source licensing model that can help organizations to test machine learning and maybe take their first steps.


A collection of machine learning algorithms written in Java that can be applied directly over a dataset, or can be called from a custom Java-coded program, Weka is one of the most popular machine learning tools used in research and academia. It is written under the GNU General Public License, so it can be downloaded and used freely, as long as you comply with the GNU license terms.

Because of its popularity, a lot of information is available about the use of and development with Weka. It still can prove to be challenging for some users not familiar with machine learning, but it’s quite good for those who want to uncover explore the bits and bytes of using machine learning analysis on large datasets.


Probably the most popular language and environment for statistical computing and graphics, R is a GNU project that comprises a wide variety of statistical and graphical techniques with a high degree of scalability. No wonder that R is one of the most widely used statistical tools used by students.

The way the R project is designed to work is by having a core or based system set of statistical features and functions that can be extended with a large set of function libraries provided within the Comprehensive R Archive Network (CRAN).

Within the CRAN library, it is possible to download the necessary functions for multivariate analysis, data mining, and machine learning. But it is fair to assume that it takes a bit of effort to put machine learning to work with R.

Note: R is also of special interest owing to its increasing popularity and adoption via a commercial offering for R called Revolution Analytics, an offering I discuss below.


Jubatus is an online distributed machine learning framework. It is distributed under GNU Lesser General Public License  version 2.1, which makes Jubatus another good option for the learning, trial, and—why not—exploitation of machine learning techniques within a reduced budget.

The framework can be installed in different flavors of Linux, such as Red Hat, Ubuntu, and others, as well as within the Mac OS X. Jubatus includes client libraries for C++, Python, Ruby, and Java. Some of its functional features include a list of machine learning libraries for applying different techniques such as graph mining, anomaly detection, clustering, classification, regression, recommendation, etc.

Apache Mahout

Mahout is Apache’s machine learning algorithm library. Distributed under a commercially friendly Apache software license, Mahout comprises a core set of algorithms for clustering, classification and collaborative filtering that can be implemented on distributed systems.

Mahout supports three basic types of algorithms or use cases to enable recommendation, clustering and classification tasks.

One interesting aspect of Mahout is its goal to build a strong community for the development of new and fresh machine learning algorithms.

Apache Spark

Spark is Apache Hadoop’s general engine for processing large-scale data sets. The Spark engine is also an open source engine that enables users to generate applications in Java, Scala, or Python.

Just like the rest of the Hadoop family, Spark is designed to deal with large amounts of data, both structured and unstructured. The Spark design supports cyclic data flow and in-memory computing, making it ideal for processing large data sets at high speed.

In this scenario, one of the engine’s main components is the MLlib, which is Spark’s machine learning library. The library works using the Spark engine to perform faster than MapReduce and can operate in conjunction with NumPy, Python’s core scientific computing package, giving MLlib a great deal of flexibility to design new applications in these languages.

Some of the algorithms included within MLlib are:

  • K-means clustering with K-means|| initialization
  • L1- and L2-regularized linear regression
  • L1- and L2-regularized logistic regression
  • Alternating least squares collaborative filtering, with explicit ratings or implicit feedback
  • Naïve-Bayes multinomial classification
  • Stochastic gradient descent

While this set of applications gives users hands-on machine learning, at no cost, they can still be somewhat challenging when it comes to putting these applications to work. Many of them require special skills in the art of machine learning or in Java or MapReduce to fully develop a business solution.

Still, these applications can enable new teams to start working on machine learning and experienced ones to develop complex solutions for both small and big data. 
(post-ads) Machine learning by the existing players

As we mentioned earlier in this series, the evolution of Business Intelligence is demanding an increasing incorporation of machine learning techniques into existing BI and Analytics tools.

A number of popular enterprise software applications have already expanded their functional coverage to include machine learning—a useful ally—within their stacks.

Here are just a couple of the vast number of software vendors that have added machine learning either to their core functionality or as an additional feature-product of their stack.


It is no secret that IBM is betting strong in the areas of advanced analytics and cognitive computing, especially with Watson, IBM’s cognitive computing initiative and an offering which we will examine in the cognitive computing part of this series. IBM can potentially enable users to develop machine learning analytics approaches via its SPSS product stack, which incorporates the ability to develop some specific machine learning algorithms via the SPSS Modeler.


Indubitably SAS is one of the key players in the advanced analytics arena, with a solid platform for performing mining and predictive analysis, for both general and industry vertical purposes. It has incorporated key machine language techniques to be adopted for different uses. Several ML techniques can be found within SAS’ vast analytics platform, from SAS Enterprise and Tex Miner products to its SAS High-Performance Optimization offering.

An interesting fact to consider is SAS’ ability to provide industry and line-of-business approaches for many of its software offerings, encapsulating functionality with prepackaged vertical functionality.

Embedded machine learning

Significantly, machine learning techniques are reaching the core of many of the existing powerhouses as well as the newcomers in the data warehouse and Big Data spaces. Via its incorporation as embedded technologies within their database technologies, some analytic and data warehouse providers have now incorporated machine learning techniques, to varying degrees, to their database structures. 


The New York-based company, a provider of Big Data and discovery software solutions, offers a set of what it calls in-database analytics in which a set of analytics capabilities is built right into 1010Data’s database management engine. Machine learning is included along with a set of in-database analytics such as clustering, forecasting, optimization, and others.


Among its multiple offerings for enterprise data warehouse and Big Data environments, Teradata offers the Teradata Warehouse Miner, an application that packages a set of data profiling and mining functions that includes machine learning algorithms alongside predictive and mining ones. The Warehouse Miner is able to perform analysis directly in the database without undergoing a data movement operation, which ease the process of data preparation. 


SAP HANA, which may be SAP’s most important technology initiative ever, will now support almost all (if not actually all) of SAP’s analytics initiatives, and its advanced analytics portfolio is not the exception.

Within HANA, SAP originally launched SAP HANA Advanced Analytics, in which a number of functions for performing mining and prediction take place. Under this set of solutions it is possible to find a set of specific algorithms for performing machine learning operations.

Additionally, SAP has expand its reach into predictive analysis and machine learning via the SAP InfiniteInsight predictive analytics and mining suite, a product developed by KXEN, which SAP recently acquired.

Revolution Analytics

As mentioned previously, the open source R language is becoming one of the most important resources for statistics and mining available in the market. Revolution Analytics, a company founded in 2007, has been able to foster the work done by the huge R community and at the same time develop a commercial offering for exploiting R benefits, giving R more power and performance resources via technology that enables the use of R for enterprise data intensive applications.

Revolution R Enterprise is Revolution Analytics’ main offering and contains the wide range of libraries provided by R enriched with major technology improvements for enabling the construction of enterprise-ready analytics applications. The application is available for download both as workstation and server versions as well as on demand via the AWS Marketplace.

The new breed of advanced analytics

The advent and hype of Big Data has also become a sweet spot for innovation in many areas of the data management spectrum, especially in the area of providing analytics for large volumes of complex data.

A new wave of fresh and innovative software providers is emerging with solutions that enable businesses to perform advanced analytics over Big Data and using machine learning as a key component or enabler for this analysis.

A couple of interesting aspects of these solutions:

  1. Their unique approach to providing specific solutions to complex problems, especially adapted for business environments, combining flexibility and ease of use to make it possible for business users with a certain degree of statistical and mathematical preparation to address complex problems in the business.
  2. Many have them have already, at least partially, configured and prepared specific solutions for common business problems within line-of-business and industries via templates or predefined models, easing the preparation, development, and deployment process.

Here is a sampling of some of these vendors and their solutions:


Being that Skytree’s tagline is “The Machine Learning Company,” it’s pretty obvious that the company has machine learning in its veins. Skytree has entered the Big Data Analytics space with a machine learning platform for performing mining, prediction, and recommendations with, according to Skytree, an enterprise-grade machine learning offering.

Skytree Server is its main offering. A Hadoop-ready machine learning platform with high-performance analytics capabilities, it can also connect to diverse data streams and can compute real-time queries, enabling high-performance analytics services for churn prediction, fraud detection, and lead scoring, among others.

Skytree also offers a series of plug-ins to be connected to the Skytree Server Foundation to improve Skytree’s existing capabilities with specific and more advanced machine learning models and techniques.


If you Google BigML, you will find that “BigML is Machine Learning for everyone.”

The company, founded in 2011 in Corvallis, Oregon, offers a cloud-based large-scale machine learning platform centered on business usability and at highly competitive costs by providing advanced analytics via a subscription-based offering.

The application enables users to prepare complete analytics solutions for a wide range of analysis scenarios, from collecting the data and designing the model to creating special analytics ensembles.

Since it is a cloud-based platform, users can start using BigML services via a number of subscription-based and/or dedicated options. An attractive approach for those organizations trying to make the best of advanced analytics with less use of technical and monetary resources.

Yottamine Analytics

Founded in 2009 by Dr. David Huang, Yottamine has taken Dr. Huang contributions to the theory of machine learning to practice and reflected it within the Yottamine Predictive Service (YPS).

YPS is an on-demand advanced analytics solution based on the use of web services, which allows users to build, deploy, and develop advanced big data analytics solutions.

As an on-demand solution it offers a series of subscription models based on clusters and nodes, with payment based on the usage of the service in terms of node hours—a pretty interesting quota approach. 

Machine learning is pervasive

Of course, this is just a sample of the many advanced analytics offerings that exist. Others are emerging. They use machine learning techniques to different degrees and for many different purposes, specific or general. New companies such as BuildingIQ, emcien, BayNote,  Recommind, and others are taking advantage of the use of machine learning to provide unique offerings in a wide range of industry and business sectors.

So what?

One of the interesting effects of companies dealing with increasing volumes of data and, of course, increasing problems to solve is that techniques such as Machine Learning and other Artificial Intelligence and Cognitive Computing methods are gaining terrain in the business world.

Companies and information workers are being forced to learn about these new disciplines and use them to find ways to improve analysis accuracy, the ability to react and decide, and prediction, encouraging the rise of what some call the data science discipline.

Many of the obscure tools for advanced analytics traditionally used in the science lab or at pure research centers are now surprisingly popular within many business organizations—not just within their research and development departments, but within all their lines of business.

But on the other hand, new software is increasingly able not only to help in the decision-making process, but also to be proactive in reproducing and automatically improving complex analysis models, recommendations, complex scenario analysis to enable early detection and prediction and, potentially, data-based decisions. 

Whether measuring social media campaign effectiveness, effectively predicting sales, detecting fraud, or performing churn analysis, these tools are remaking the way data analysis is done within many organizations.

But this might be just the beginning of a major revolution in the way software serves and interacts with humans. An increasing number of Artificial Intelligence disciplines, of which machine learning is a part, are rapidly evolving and reaching mainstream spaces in the business software world in the form of next-generation cognitive computing systems.

Offerings such as Watson from IBM might be the instigators of a new breed of solutions that go well beyond what we have so far experienced with regard to computers and the analysis process, So, I dare you to stay tuned for my next installment on Cognitive Systems and walk with me to discovery these new offerings.

1 comment:

  1. Machine learning is a promising area of ​​knowledge in which many students want to work. but not many students who have a predisposition to programming and have an education connected to programming can comprehend machine learning, not to mention that ordinary people, and of ourse humanitarians like professional writers who work for that presents the service that offers cheap essay writing cheap essay writing - cannot even understand the essence of machine learning. A person engaged in machine learning can make a lot of money, but not every person can complete this complex programming.


Post Top Ad

Your Ad Spot