
Cloudera Analyst Event: Facing a New Data Management Era

I have to say that I attended this year’s Cloudera analyst event in San Francisco with a mix of excitement, expectation, and a grain of salt.

My excitement and expectation were fuelled by all that has been said about Cloudera and its close competitors in the last couple of years, and also by the fact that I am currently focusing my own research on big data and “New Data Platforms”. As for the grain of salt: when it comes to events hosted by vendors, I always recommend treating their statements with some caution, because the information is naturally likely to be biased.

However, in the end, the event turned out to be an enriching learning experience, full of surprises and discoveries. I learnt a lot about a company that is certainly contributing in a big way to the transformation of the enterprise software industry.

The event certainly fulfilled many of my “want-to-know-more” expectations about Cloudera and its product stack, the path the company has taken, and its view of the enterprise data management market.

It looks like Cloudera is leading and paving the way for a new generation of enterprise data management software platforms.

So, let me share with you a brief summary of, and some comments on, Cloudera’s 2017 industry analyst gathering.

OK, Machine Learning and Data Science are Hot Today

One of the themes of the event was Cloudera’s keen interest and immersion into Machine Learning and Data Science. Just a few days before the event, the company made two important announcements:

The first one was about the beta release of Cloudera Data Science Workbench (Figure 1), the company’s new self-service environment for data science on top of Cloudera Enterprise. This new offering comes directly from the smart acquisition of machine learning and data science startup,

Figure 1. Screencap of Cloudera's Data Science Workbench (Courtesy of Cloudera)

This product allows data scientists to develop in some of the most popular open source languages (R, Python and Scala) with native Apache Spark and Apache Hadoop integration, which in turn speeds up project deployment, from exploration to production.

In this regard, Charles Zedlewski, senior vice president of Products at Cloudera, said:

“Cloudera is focused on improving the user experience for data science and engineering teams, in particular those who want to scale their analytics using Spark for data processing and machine learning. The acquisition of and its team provided a strong foundation, and Data Science Workbench now puts self-service data science at scale within reach for our customers.”

One key approach Cloudera takes with the Data Science Workbench is that it aims to enable data scientists to work in a truly open space that can expand its reach to use, for example, deep learning frameworks such as TensorFlow, Microsoft Cognitive Toolkit, MXNet or BigDL, but within a secure and contained environment.

This is certainly a new offering with huge potential for Cloudera, not only to increase its customer base but also to reaffirm and grow its presence among existing customers, who can now expand their use of the Cloudera platform without needing to look for third-party options to develop on top of it.

The second announcement was the launch of Cloudera Solution Gallery (Figure 2), which lets Cloudera showcase its large partner base (more than 2,800 partners globally) and a storefront of more than 100 solutions.

This news should not be taken lightly, as it shows Cloudera’s ability to start building a complete ecosystem around its robust set of products, which in my view is a defining trait of companies that want to become a de facto industry standard.

Figure 2. Cloudera Solution Gallery (Courtesy of Cloudera)

Cloudera: Way More than Hadoop

During an intensive two-day event filled with presentations, briefings and interviews with Cloudera’s executives and customers, a persistent message prevailed. While the company recognizes its origin as a provider of a commercial distribution of Hadoop, it is now making it clear that its current offering has expanded way beyond the Hadoop realm to become a full-fledged open source data platform. Hadoop is certainly at the core of Cloudera as the main data engine but, with support for 25 open source projects, its platform can currently offer much more than Hadoop’s distributed storage capabilities.

This is reflected in Cloudera’s offerings, whether the full-fledged Cloudera Enterprise Data Hub, its comprehensive platform, or one of Cloudera’s special configurations.

Cloudera’s executives made it clear that the company’s strategy is to make sure it can provide, via open source offerings, efficient enterprise-ready data management solutions.

However, don’t be surprised if the message from Cloudera changes over time, especially if the company wants to target larger organizations, which most of the time rely on providers that can center their IT services on the business and are not necessarily tied to any particular technology.

Cloudera is redefining itself so it can reposition its offering as a complete data management platform. This is a logical step considering that Cloudera wants to take a bigger piece of the large enterprise market, even though the company’s CEO stated that they “do not want to replace the Netezzas and Oracle’s of the world”.

Based on all this, it is clear to me that Cloudera will eventually end up competing head-on in specific segments of the data management market, especially with IBM, through its IBM BigInsights, and with Teradata, whose multiple products have left, and keep leaving, a very strong footprint in the data warehouse market. Whether we like it or not, big data players such as Cloudera seem destined to enter the big fight.

The Future, Cloudera and IoT

During the event I also had a chance to attend a couple of sessions specifically devoted to showing Cloudera’s deployments in the context of IoT projects. One thing worth noting is that, even though Cloudera has some really good stories to tell about IoT, the company seems to be in no hurry to jump on this bandwagon.

Perhaps it’s better to let this market mature and become consistent enough before devoting larger technical investments to it. It is always very important to know when and how to invest in an emerging market.

However, we should be very well aware that Cloudera, and the rest of the big data players, will be vital for the growth and evolution of the IoT market.

Figure 3. Cloudera Architecture for IoT (Courtesy of Cloudera)

It’s Hard to Grow Gracefully

Today it’s very hard, if not impossible, to deny that Hadoop is deeply embedded in the enterprise data management ecosystem of almost every industry. Cloudera’s analyst event was yet another confirmation. Large companies are now increasingly using some of Cloudera’s different options and configurations for mission-critical functions.

So for Cloudera, the nub of the issue now is not how to get to the top, but how to stay there, evolve and leave its footprint.

Cloudera has been very smart and strategic in getting to this position, yet it seems to have reached a place where the going will get even tougher. From this point on, convincing companies to open the big wallet will take much more than a solid technical justification.

At the time of writing this post, I learnt that Cloudera has filed to go public and will trade on the New York Stock Exchange, and as an article in Fortune mentions:

“Cloudera faces tough competition in the data analytics market and cites in its filing several high-profile rivals, including Amazon Web Services, Google, Microsoft, Hewlett Packard Enterprise, and Oracle.”

It also mentions the case of Hortonworks, which:

“went public in late 2014 with its shares trading at nearly $28 during its height in April 2015. However, Hortonworks’ shares have dropped over 60% to $9.90 on Friday as the company has struggled to be profitable.”

In my opinion, for Cloudera to succeed in taking this critical step, it will have to show that it is more than well prepared in business, technical and strategic terms, and also ready for the unexpected, because only then will it be able to grow gracefully and line up to play big, with the big guys.

Always keep in mind that, as Benjamin Franklin said:

Without continual growth and progress, such words as improvement,
achievement, and success have no meaning.


