
Machine Learning and Cognitive Systems, Part 3: An ML Vendor Landscape

In parts One and Two of this series I gave a little explanation about what Machine Learning is and some of its potential benefits, uses, and challenges within the scope of Business Intelligence and Analytics.

In this installment, the last devoted to machine learning before we step into cognitive systems, I will attempt to provide a general overview of the Machine Learning (ML) market landscape, describing some (yes, only some) of the vendors and software products that use ML for analytics and intelligence.

Machine learning: a common guest with no invitation

It is quite surprising to find Machine Learning has a vast presence in many of today’s modern analytics applications. Its use is driven by:

  • The increasing need to crunch data that is more complex and more voluminous, at greater speed and with more accuracy—I mean really big data
  • The need to solve increasingly complex business problems that require methods beyond conventional data analysis.

An increasing number of traditional and new software providers, forced by specific market needs to radically evolve their existing solutions or moved by the pure spirit of innovation, have followed the path of incorporating new data analytics techniques into their analytics offering stack, either explicitly or simply hidden behind the curtain.

For software providers that already offer advanced analytics tools such as data mining, incorporating machine learning functionality into their existing capabilities stack is an opportunity to evolve their current solutions and take analytics to the next level.

So, it is quite possible that if you are using an advanced business analytics application, especially for Big Data, you are already using some machine learning technology, whether you know it or not.

The machine learning software landscape, in brief 

One of the interesting aspects of this seemingly new need for dealing with increasingly large and complex sets of information is that many of the machine learning techniques originally used within pure research labs have already gained entrance to the business world, via their incorporation within analytics offerings. New vendors often incorporate machine learning as the core of their analytics offering, or as just another of the functional features available in their stack.

Taking this into consideration, we can find a great many software products that offer machine learning functionality, to different degrees. Consider the following products, loosely grouped by type:

From the lab to the business

In this group we can find a number of products, most of them based on an open-source licensing model, that can help organizations test machine learning and maybe take their first steps.


Weka

A collection of machine learning algorithms written in Java that can be applied directly over a dataset, or can be called from a custom Java-coded program, Weka is one of the most popular machine learning tools used in research and academia. It is distributed under the GNU General Public License, so it can be downloaded and used freely, as long as you comply with the GNU license terms.

Because of its popularity, a lot of information is available about the use of and development with Weka. It can still prove challenging for some users not familiar with machine learning, but it’s quite good for those who want to explore the bits and bytes of using machine learning analysis on large datasets.


R

Probably the most popular language and environment for statistical computing and graphics, R is a GNU project that comprises a wide variety of statistical and graphical techniques with a high degree of extensibility. No wonder that R is one of the statistical tools most widely used by students.

The R project is designed around a core, or base, system of statistical features and functions that can be extended with a large set of function libraries provided within the Comprehensive R Archive Network (CRAN).

Within the CRAN library, it is possible to download the necessary functions for multivariate analysis, data mining, and machine learning. But it is fair to assume that it takes a bit of effort to put machine learning to work with R.

Note: R is also of special interest owing to its increasing popularity and adoption via Revolution Analytics, a company with a commercial offering for R that I discuss below.


Jubatus

Jubatus is an online distributed machine learning framework. It is distributed under the GNU Lesser General Public License version 2.1, which makes Jubatus another good option for the learning, trial, and, why not, exploitation of machine learning techniques within a reduced budget.

The framework can be installed on different flavors of Linux, such as Red Hat, Ubuntu, and others, as well as on Mac OS X. Jubatus includes client libraries for C++, Python, Ruby, and Java. Its functional features include machine learning libraries for applying different techniques such as graph mining, anomaly detection, clustering, classification, regression, and recommendation.

Apache Mahout

Mahout is Apache’s machine learning algorithm library. Distributed under a commercially friendly Apache software license, Mahout comprises a core set of algorithms for clustering, classification and collaborative filtering that can be implemented on distributed systems.

Mahout supports three basic use cases: recommendation, clustering, and classification.
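To make the recommendation use case a little more concrete, here is a minimal sketch of user-based collaborative filtering in plain Python. It is not Mahout code and does not use Mahout’s API; the toy ratings, the `cosine` and `recommend` function names, and the choice of cosine similarity are all assumptions made purely for illustration.

```python
from math import sqrt

# Hypothetical toy ratings (user -> {item: rating}); not real data.
ratings = {
    "ana":   {"A": 5.0, "B": 4.0, "C": 1.0},
    "ben":   {"A": 4.0, "B": 5.0},
    "carla": {"B": 1.0, "C": 5.0, "D": 4.0},
    "dev":   {"A": 5.0, "D": 1.0},
}


def cosine(u, v):
    """Cosine similarity between two sparse rating vectors stored as dicts."""
    shared = set(u) & set(v)
    if not shared:
        return 0.0
    dot = sum(u[i] * v[i] for i in shared)
    norm_u = sqrt(sum(x * x for x in u.values()))
    norm_v = sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v)


def recommend(user, ratings, top_n=2):
    """Rank items the user has not rated by similarity-weighted average rating."""
    scores, weights = {}, {}
    for other, their_ratings in ratings.items():
        if other == user:
            continue
        sim = cosine(ratings[user], their_ratings)
        if sim <= 0.0:
            continue  # ignore users with nothing in common
        for item, rating in their_ratings.items():
            if item in ratings[user]:
                continue  # only score items the user has not seen
            scores[item] = scores.get(item, 0.0) + sim * rating
            weights[item] = weights.get(item, 0.0) + sim
    ranked = sorted(((scores[i] / weights[i], i) for i in scores), reverse=True)
    return [item for _, item in ranked[:top_n]]


print(recommend("ben", ratings))
```

A library such as Mahout addresses the same idea at a much larger scale, where the similarity computations have to be spread over a distributed system rather than held in a single dictionary.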

One interesting aspect of Mahout is its goal to build a strong community for the development of new and fresh machine learning algorithms.

Apache Spark

Spark is Apache’s general engine for processing large-scale data sets, and it can run on Hadoop clusters. It is also open source and enables users to write applications in Java, Scala, or Python.

Just like the rest of the Hadoop family, Spark is designed to deal with large amounts of data, both structured and unstructured. The Spark design supports cyclic data flow and in-memory computing, making it ideal for processing large data sets at high speed.

In this scenario, one of the engine’s main components is MLlib, Spark’s machine learning library. The library uses the Spark engine to perform faster than MapReduce and can operate in conjunction with NumPy, Python’s core scientific computing package, which gives developers a great deal of flexibility when designing new applications in the supported languages.

Some of the algorithms included within MLlib are:

  • K-means clustering with K-means|| initialization
  • L1- and L2-regularized linear regression
  • L1- and L2-regularized logistic regression
  • Alternating least squares collaborative filtering, with explicit ratings or implicit feedback
  • Naïve-Bayes multinomial classification
  • Stochastic gradient descent
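Two of the items above, L2-regularized logistic regression and stochastic gradient descent, fit together naturally, so a small sketch may help show what they mean. The code below is plain single-machine Python, not MLlib’s API or implementation; the function names, hyperparameter values, and toy data are assumptions chosen only for illustration.

```python
import math
import random


def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def train_logreg_sgd(rows, labels, l2=0.01, lr=0.2, epochs=500, seed=0):
    """Fit weights and bias by SGD on the L2-regularized log loss."""
    rng = random.Random(seed)
    w = [0.0] * len(rows[0])
    b = 0.0
    order = list(range(len(rows)))
    for _ in range(epochs):
        rng.shuffle(order)  # visit the samples in a new random order each epoch
        for i in order:
            x, y = rows[i], labels[i]
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)
            err = p - y  # gradient of the log loss w.r.t. the linear score
            # l2 * wj is the gradient of the penalty (l2 / 2) * ||w||^2.
            w = [wj - lr * (err * xj + l2 * wj) for wj, xj in zip(w, x)]
            b -= lr * err  # the bias is left unregularized
    return w, b


def predict(w, b, x):
    """Return the predicted class (0 or 1) for one feature vector."""
    return 1 if sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= 0.5 else 0


# Hypothetical, linearly separable toy data (not from any real dataset).
X = [[0.0, 0.2], [0.3, 0.1], [0.2, 0.4], [0.9, 0.8], [0.8, 1.0], [1.0, 0.7]]
y = [0, 0, 0, 1, 1, 1]
w, b = train_logreg_sgd(X, y)
print([predict(w, b, x) for x in X])
```

Swapping the `l2 * wj` term for a term based on the sign of each weight would give a rough L1-regularized variant; a distributed library has to solve the additional problem of coordinating these updates across many machines.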

While this set of applications gives users hands-on machine learning at no cost, putting them to work can still be somewhat challenging. Many of them require special skills in the art of machine learning, or in Java or MapReduce, to fully develop a business solution.

Still, these applications can enable new teams to start working on machine learning and experienced ones to develop complex solutions for both small and big data.

Machine learning by the existing players

As we mentioned earlier in this series, the evolution of Business Intelligence is demanding an increasing incorporation of machine learning techniques into existing BI and Analytics tools.

A number of popular enterprise software applications have already expanded their functional coverage to include machine learning—a useful ally—within their stacks.

Here are just a couple of the vast number of software vendors that have added machine learning either to their core functionality or as an additional feature-product of their stack.


IBM

It is no secret that IBM is betting heavily on the areas of advanced analytics and cognitive computing, especially with Watson, IBM’s cognitive computing initiative and an offering which we will examine in the cognitive computing part of this series. IBM can potentially enable users to develop machine learning analytics approaches via its SPSS product stack, which incorporates the ability to develop some specific machine learning algorithms via the SPSS Modeler.


SAS

Indubitably SAS is one of the key players in the advanced analytics arena, with a solid platform for performing mining and predictive analysis, for both general and industry vertical purposes. It has incorporated key machine learning techniques to be adopted for different uses. Several ML techniques can be found within SAS’ vast analytics platform, from the SAS Enterprise Miner and Text Miner products to its SAS High-Performance Optimization offering.

An interesting fact to consider is SAS’ ability to provide industry and line-of-business approaches for many of its software offerings, encapsulating their capabilities as prepackaged vertical functionality.

Embedded machine learning

Significantly, machine learning techniques are reaching the core of many of the existing powerhouses as well as the newcomers in the data warehouse and Big Data spaces. Some analytics and data warehouse providers have now embedded machine learning techniques, to varying degrees, directly within their database technologies.


1010Data

This New York-based company, a provider of Big Data and discovery software solutions, offers what it calls in-database analytics, in which a set of analytics capabilities is built right into 1010Data’s database management engine. Machine learning is included along with other in-database analytics such as clustering, forecasting, and optimization.


Teradata

Among its multiple offerings for enterprise data warehouse and Big Data environments, Teradata offers the Teradata Warehouse Miner, an application that packages a set of data profiling and mining functions that includes machine learning algorithms alongside predictive and mining ones. The Warehouse Miner is able to perform analysis directly in the database without undergoing a data movement operation, which eases the process of data preparation.
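The appeal of analyzing data where it lives can be shown with a small sketch. Here SQLite, from Python’s standard library, stands in for a warehouse engine; this is not the interface of Teradata Warehouse Miner or any other vendor product, and the `sales` table and its values are hypothetical.

```python
import sqlite3

# In-memory SQLite database standing in for a data warehouse (an assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("north", 10.0), ("north", 14.0),
     ("south", 3.0), ("south", 5.0), ("south", 7.0)],
)

# In-database profiling: the engine aggregates, and only one summary row
# per region leaves the database.
in_db = conn.execute(
    "SELECT region, COUNT(*), AVG(amount), MIN(amount), MAX(amount) "
    "FROM sales GROUP BY region ORDER BY region"
).fetchall()

# Client-side alternative: every row is moved out before any analysis starts.
rows = conn.execute("SELECT region, amount FROM sales").fetchall()
by_region = {}
for region, amount in rows:
    by_region.setdefault(region, []).append(amount)
client_side = [
    (region, len(vals), sum(vals) / len(vals), min(vals), max(vals))
    for region, vals in sorted(by_region.items())
]

print(len(in_db), "summary rows returned versus", len(rows), "raw rows moved")
```

Both paths produce the same profile, but the first transfers a handful of summary rows while the second transfers the whole table, which is the cost that in-database approaches aim to avoid.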


SAP

SAP HANA, which may be SAP’s most important technology initiative ever, will now support almost all (if not actually all) of SAP’s analytics initiatives, and its advanced analytics portfolio is no exception.

Within HANA, SAP originally launched SAP HANA Advanced Analytics, in which a number of functions for performing mining and prediction take place. Under this set of solutions it is possible to find a set of specific algorithms for performing machine learning operations.

Additionally, SAP has expanded its reach into predictive analysis and machine learning via the SAP InfiniteInsight predictive analytics and mining suite, a product developed by KXEN, which SAP recently acquired.

Revolution Analytics

As mentioned previously, the open source R language is becoming one of the most important resources for statistics and mining available in the market. Revolution Analytics, a company founded in 2007, has been able to foster the work done by the huge R community and at the same time develop a commercial offering for exploiting R benefits, giving R more power and performance resources via technology that enables the use of R for enterprise data intensive applications.

Revolution R Enterprise is Revolution Analytics’ main offering and contains the wide range of libraries provided by R enriched with major technology improvements for enabling the construction of enterprise-ready analytics applications. The application is available for download both as workstation and server versions as well as on demand via the AWS Marketplace.

The new breed of advanced analytics

The advent and hype of Big Data has also become a sweet spot for innovation in many areas of the data management spectrum, especially in the area of providing analytics for large volumes of complex data.

A new wave of fresh and innovative software providers is emerging with solutions that enable businesses to perform advanced analytics over Big Data, using machine learning as a key component or enabler for this analysis.

A couple of interesting aspects of these solutions:

  1. Their unique approach to providing specific solutions to complex problems, especially adapted for business environments, combining flexibility and ease of use to make it possible for business users with a certain degree of statistical and mathematical preparation to address complex problems in the business.
  2. Many of them have already, at least partially, configured and prepared specific solutions for common business problems within lines of business and industries via templates or predefined models, easing the preparation, development, and deployment process.

Here is a sampling of some of these vendors and their solutions:


Skytree

Given that Skytree’s tagline is “The Machine Learning Company,” it’s pretty obvious that the company has machine learning in its veins. Skytree has entered the Big Data Analytics space with what it describes as an enterprise-grade machine learning platform for performing mining, prediction, and recommendations.

Skytree Server is its main offering. A Hadoop-ready machine learning platform with high-performance analytics capabilities, it can also connect to diverse data streams and can compute real-time queries, enabling high-performance analytics services for churn prediction, fraud detection, and lead scoring, among others.

Skytree also offers a series of plug-ins to be connected to the Skytree Server Foundation to improve Skytree’s existing capabilities with specific and more advanced machine learning models and techniques.


BigML

If you Google BigML, you will find that “BigML is Machine Learning for everyone.”

The company, founded in 2011 in Corvallis, Oregon, offers a cloud-based, large-scale machine learning platform centered on business usability, delivering advanced analytics at highly competitive cost through a subscription-based offering.

The application enables users to prepare complete analytics solutions for a wide range of analysis scenarios, from collecting the data and designing the model to creating special analytics ensembles.

Since it is a cloud-based platform, users can start using BigML services via a number of subscription-based and/or dedicated options, an attractive approach for organizations trying to make the best of advanced analytics with fewer technical and monetary resources.

Yottamine Analytics

Founded in 2009 by Dr. David Huang, Yottamine has taken Dr. Huang’s contributions to the theory of machine learning into practice and reflected them within the Yottamine Predictive Service (YPS).

YPS is an on-demand advanced analytics solution based on the use of web services, which allows users to build, deploy, and develop advanced big data analytics solutions.

As an on-demand solution it offers a series of subscription models based on clusters and nodes, with payment based on the usage of the service in terms of node hours—a pretty interesting quota approach. 

Machine learning is pervasive

Of course, this is just a sample of the many advanced analytics offerings that exist. Others are emerging. They use machine learning techniques to different degrees and for many different purposes, specific or general. New companies such as BuildingIQ, emcien, BayNote, Recommind, and others are taking advantage of the use of machine learning to provide unique offerings in a wide range of industry and business sectors.

So what?

One of the interesting effects of companies dealing with increasing volumes of data and, of course, increasing problems to solve is that techniques such as Machine Learning and other Artificial Intelligence and Cognitive Computing methods are gaining ground in the business world.

Companies and information workers are being forced to learn about these new disciplines and use them to find ways to improve analysis accuracy, the ability to react and decide, and prediction, encouraging the rise of what some call the data science discipline.

Many of the obscure tools for advanced analytics traditionally used in the science lab or at pure research centers are now surprisingly popular within many business organizations—not just within their research and development departments, but within all their lines of business.

On the other hand, new software is increasingly able not only to help in the decision-making process, but also to be proactive: reproducing and automatically improving complex analysis models, recommendations, and scenario analyses to enable early detection and prediction and, potentially, data-based decisions.

Whether measuring social media campaign effectiveness, effectively predicting sales, detecting fraud, or performing churn analysis, these tools are remaking the way data analysis is done within many organizations.

But this might be just the beginning of a major revolution in the way software serves and interacts with humans. An increasing number of Artificial Intelligence disciplines, of which machine learning is a part, are rapidly evolving and reaching mainstream spaces in the business software world in the form of next-generation cognitive computing systems.

Offerings such as Watson from IBM might be the instigators of a new breed of solutions that go well beyond what we have so far experienced with regard to computers and the analysis process. So, I dare you to stay tuned for my next installment on Cognitive Systems and walk with me to discover these new offerings.

