
Hortonworks’s New Vision for Connected Data Platforms

Courtesy of Hortonworks
On March 1, I had the opportunity to attend this year’s Hortonworks Analyst Summit in San Francisco, where Hortonworks announced several product enhancements, new product versions, and a redefined strategy going forward.

Hortonworks seems to be making a serious attempt to take over the data management space while maintaining its commitment to open source, and especially to the Apache Software Foundation. As Hortonworks keeps gaining momentum, it is also consolidating its corporate strategy and bringing a new balance to its message, one that combines technology and business.

By reinforcing alliances, and at the same time moving further towards the business mainstream with more concise messaging around enterprise readiness, Hortonworks is declaring itself ready to win the battle for the big data management space.

The big question is whether the company’s strategy will be effective enough to achieve this goal, especially in a market that is already overpopulated and fiercely defended by big software providers.

Digesting Hortonworks’s Announcements
The announcements at the Hortonworks Analyst Summit included news on both the product and partner fronts. With regard to products, Hortonworks announced new versions of both its Hortonworks Data Platform (HDP) and Hortonworks DataFlow (HDF) offerings.

HDP—New Release, New Cycle
Alongside specific features to improve performance and reinforce ease of use, the latest release, HDP 2.4 (figure 1), includes Apache Spark 1.6, the latest generation of Apache’s large-scale data processing framework, along with Apache Ambari 2.2, the Apache project for making Hadoop management easier and more efficient.

The inclusion of Ambari seems to be key to providing a solid, centralized management and monitoring tool for Hadoop clusters.

Figure 1. Hortonworks emphasizes enterprise readiness for its HDP version
(Image courtesy of Hortonworks)

Another key announcement with regard to HDP is a new release cycle, which aims to provide users with a consistent product built on a stable core. Under the new cycle, core HDP services such as HDFS, YARN, and MapReduce, as well as Apache ZooKeeper, will be released yearly to align with a version of Apache Hadoop compatible with the “ODPi Core,” currently at version 2.7.1. This should provide standardization and ensure a stable software base for mission-critical workloads.

On the flip side, the extended services that run on top of the Hadoop core, including Spark, Hive, HBase, Ambari, and others, will be released continually throughout the year so that these projects stay up to date.

Last but not least, HDP’s new version also comes with the new SmartSense 1.2, Hortonworks’s issue resolution application, featuring automatic scheduling and uploading, as well as over 250 new recommendations and guidelines.


Growing NiFi to an Enterprise Level
Along with HDP, Hortonworks also announced version 1.2 of HDF, Hortonworks’s offering for managing data in motion by collecting, manipulating, and curating data in real time. The new version includes new streaming analytics capabilities for Apache NiFi, which powers HDF at its core, and support for Apache Storm and Apache Kafka (figure 2).

Another noteworthy addition to HDF is support for integration with Kerberos, which will ease the management of centralized authentication across the platform and other applications. According to Hortonworks, HDF 1.2 will be available to customers in Q1 of 2016.

Figure 2. Improved security and control added to Hortonworks’s new HDF version
(Image courtesy of Hortonworks)

Hortonworks Adds New Partners to its List
The third announcement from Hortonworks at the conference was a partnership with Hewlett Packard Labs, the central research organization of Hewlett Packard Enterprise (HPE).

The collaboration is a joint effort to enhance the performance and capabilities of Apache Spark. According to Hortonworks and HPE, it will focus mainly on the development and analysis of a new class of analytic workloads that benefit from using large pools of shared memory.

Says Scott Gnau, Hortonworks’s chief technology officer, with regard to the collaboration agreement:

This collaboration indicates our mutual support of and commitment to the growing Spark community and its solutions. We will continue to focus on the integration of Spark into broad data architectures supported by Apache YARN as well as enhancements for performance and functionality and better access points for applications like Apache Zeppelin.

According to both companies, this collaboration has already generated interesting results for Spark, including more efficient memory usage as well as faster sorting and in-memory computations.

The results of this collaboration will be delivered as new technology contributions to the Apache Spark community, and should thus benefit this important piece of the Apache Hadoop ecosystem.

Commenting on the new collaboration, Martin Fink, executive vice president and chief technology officer of HPE and board member of Hortonworks, said:

We’re hoping to enable the Spark community to derive insight more rapidly from much larger data sets without having to change a single line of code. We’re very pleased to be able to work with Hortonworks to broaden the range of challenges that Spark can address.

Additionally, Hortonworks signed a partnership with Impetus Technologies, Inc., another solution provider based on open source technology. The agreement includes collaboration around StreamAnalytix™, an application that provides tools for rapid, low-code development of real-time analytics applications using Storm and Spark. Both companies aim to give organizations that use HDF and StreamAnalytix together a complete and stable platform for the efficient development and delivery of real-time analytics applications.

But The Real News Is …
Hortonworks is rapidly evolving its vision of data management and integration, and this was, in my opinion, the biggest news of the analyst event. Hortonworks’s strategy is to integrate the management of both data at rest (data residing in HDP) and data in motion (data that HDF collects and curates in real time), since being able to manage both can power actionable intelligence. It is in this context that Hortonworks is working to increase integration between the two platforms.

Hortonworks is now taking a new go-to-market approach to increase the quality and enterprise readiness of its platforms. Along with working to ensure that ease of use removes barriers to end-user adoption, the company is changing its marketing message. Now the Hadoop-based company sees the need to take a step further and convince businesses that open source does more than just do the job; it is in fact becoming the quintessential tool for any important data management initiative, and, of course, Hortonworks is the best vendor for the job. Along these lines, Hortonworks is taking steps to provide Spark with enterprise-ready governance, security, and operations to ensure readiness for rapid enterprise integration. This is to be achieved through the inclusion of Apache Ambari and other Apache projects.

One additional yet important aspect of this strategy is Hortonworks’s work on enterprise readiness, especially regarding issue tracking (figure 3), monitoring for mission-critical workloads, and security reinforcement.

Figure 3. SmartSense 1.2 includes more than 250 recommendations
(Image courtesy of Hortonworks)

It will be interesting to see how this new strategy works for Hortonworks, especially within the big data market, where competition is fierce and where many other vendors, including important partners of Hortonworks, are pushing hard to get a piece of the pie.

Taking its data management strategy to a new level is indeed bringing many opportunities for Hortonworks, but these are not without challenges as the company moves into the broader enterprise side of the data management industry.

What do you think about Hortonworks’s new strategy in data management? If you have any comments, please drop me a line below and I’ll respond as soon as I can.

(Originally published)

