
Intelligent Automation for DevOps: An Interview with Rocana’s CTO & Co-Founder Eric Sammer


Recently, Rocana, a big data and analytics software company specializing in solutions that bring visibility to IT & DevOps, announced the release of its data platform, Rocana Ops.

In this context, we had the chance to interview Eric Sammer, CTO and Co-Founder of Rocana, who kindly agreed to share his insights on the company, its software offering, and the details of the new version.

Eric has served as a Senior Engineer and Architect at several large-scale, data-driven organizations, including Experian and Conductor. Most recently, he served as an Engineering Manager at Cloudera, where he was responsible for working with hundreds of partners to develop robust solutions and integrate them tightly with Cloudera's Enterprise Data Hub.

He is deeply entrenched in the open source community and is driven to solve difficult scaling and processing problems. Passionate about challenging assumptions and showing large, complex enterprises new ways to solve large, complex IT infrastructure challenges, Eric now leads Rocana’s product development and company direction as CTO.

Eric is the author of Hadoop Operations, published by O'Reilly Media, and a frequent speaker on technology and techniques for large-scale data processing, integration, and system management.

Hi Eric. What was the motivation behind founding Rocana the company, and developing Rocana Ops the product?

Rocana was founded directly in response to the growing sophistication of the infrastructure and technology that runs the modern business, and the challenges companies have in understanding those systems. Whether it's visibility into health and performance, investigating specific issues, or holistically understanding the impact infrastructure health and well-being have on the business, many businesses are struggling with the complexity of their environments.

These issues have been exacerbated by trends in cloud computing, hybrid environments, microservices, and data-driven products and features such as product recommendations, real-time inventory visibility, and customer account self-management that rely on data from, and about, the infrastructure and the business. There are more data sources, of more varied types, producing finer-grained data faster than ever before.

Meanwhile, the existing solutions to understand and manage these environments are not keeping pace. All of them focus on interesting, but limited, slices of the problem - just log search, just dashboards of metrics, just the last 30 minutes of network flow data, only security events - making it almost impossible to understand what’s happening. These tools tend to think of each piece of infrastructure as a special case rather than the data warehousing and advanced analytics problem it is.

Outside of core IT, it’s natural to source feeds of data from many different places, cleanse and normalize that data, and bring it into a central governed repository where it can be analyzed, visualized, or used to augment other applications.

We want to extend that thinking into infrastructure, network, cloud, database, platform, and application management to better run the business, while at the same time opening up new opportunities to bring operational and business data together. That means all data, from every data source, in real time, with full retention, on an open platform, with advanced analytics to make sense of that data.

How would you describe what Rocana Ops is?

Rocana Ops is a data warehouse for event-oriented data. That includes log events, infrastructure and application metrics, business transactions, IoT events, security events, or anything else with a time stamp. It includes the collection, transformation and normalization, storage, query, analytics, visualization, and management of all event-oriented data in a single open system that scales horizontally on cost-effective hardware or cloud platforms.

A typical deployment of Rocana Ops for our customers will take in anywhere from 10 to 100TB of new data every day, retaining it for years. Each event captured by the system is typically available for query in less than one second, and is always online and “query-able,” thanks to a fully parallelized storage and query platform.

Rocana is placed in a very interesting segment of the IT industry. What, in your view, are the differences between the common business analytics user and the IT user when it comes to using a data management and analytics solution? Different needs? Different mindsets? Goals?

I think the first thing to consider when talking about business analytics - meaning both custom-built and off-the-shelf BI suites - and IT focused solutions is that there has historically been very little cross-pollination of ideas between them. Business users tend to think about customized views on top of shared repositories, and building data pipelines to feed those repositories.

There tends to be a focus on reusing data assets and pipelines, lineage concerns, governance, and lifecycle management. IT users, on the other hand, think about everything from collection through analytics for each data source as a silo: network performance, application logs, host and process-level performance, and so on each have dedicated collection, storage, and analytics glued together in a tightly coupled package.

Unlike their business counterparts, IT users have very well known data sources and formats (relatively speaking) and analytics they want to perform. So in some ways, IT analytics have a more constrained problem space, but less integration. This is Conway’s Law in serious effect: the notion that software tends to mimic the organizational structures in which it’s developed or designed. These silos lead to target fixation.

IT users can wind up focusing on making sure the operating system is healthy, for example, while the business service it supports is unhealthy. Many tools tend to reinforce that kind of thinking, and it carries over into diagnostics and troubleshooting, where the effect is even worse. Again, we’re talking in generic terms here, but business users tend to have a holistic focus on issues relevant to the business rather than on limited slices.

We want to open that visibility to the IT side of the house, and hopefully even bring those worlds together.

What are the major pains of IT Ops, and how does Rocana help solve them?

Ops is really a combination of both horizontally and vertically focused groups. Some teams are tasked with building and/or running a complete vertical service like an airline check-in and boarding pass management system. Other teams are focused on providing horizontal services such as data center infrastructure, with limited knowledge of or visibility into what those tens of thousands of boxes do.

Let’s say customers can’t check in and get their boarding passes on their mobile devices. The application ops team finds that a subset of application servers keep losing connections to the database servers holding reservations, but there’s no apparent reason why, and nothing has changed. Meanwhile, the networking team may be swapping out some bad optics in a flaky switch, thinking that traffic is being properly routed over another link. Connecting these two dots within a large organization can be maddeningly time-consuming, if it happens at all, leading to some of the high-profile outages we see in the news.

Our focus is really on providing a shared view over all systems under management. Each team still has their focused view on their part of the infrastructure in Rocana Ops, but in this example, the application ops team could also trace the failing connections through to link state changes on switches and correlate that with traffic changes in network flow patterns.
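To make the correlation step in this example concrete, here is a minimal Python sketch of a time-window join between two event streams: application-side connection failures and switch link-state changes. The `Event` type, its field names, the 30-second window, and the host names are hypothetical illustrations, not Rocana Ops APIs or schemas.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Tuple


@dataclass(frozen=True)
class Event:
    """Hypothetical minimal event: when, where, and what happened."""
    ts: datetime
    source: str  # e.g. "app-07" or "switch-3"
    kind: str    # e.g. "db_connection_lost" or "link_down"


def correlate(failures: List[Event], link_changes: List[Event],
              window: timedelta = timedelta(seconds=30)) -> List[Tuple[Event, Event]]:
    """Pair each failure with every link-state change that happened
    at most `window` before it (the window size is an assumption)."""
    pairs = []
    for failure in failures:
        for change in link_changes:
            lag = failure.ts - change.ts
            if timedelta(0) <= lag <= window:
                pairs.append((failure, change))
    return pairs
```

The nested loop is quadratic, which is fine for illustration; over large volumes one would more likely sort both streams by timestamp and merge them, or push the join down into a query engine.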

Could you describe Rocana’s main architecture?

Following the data flow through Rocana Ops, data is first collected by one of the included data collection methods. These include native syslog, file and directory tailing, NetFlow and IPFIX, Windows event logs, application and host metrics collection, and native APIs for popular programming languages, as well as REST.

As data is collected, basic parsing is performed turning all data into semi-structured events that can be easily correlated regardless of their source. These events flow into an event data bus forming a real-time stream of the cleansed, normalized events. All of the customer-configurable and extensible transformation, model building and application (for features like anomaly detection), complex event processing, triggering, alerting, and other data services are real time stream-oriented services.
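As an illustration of what basic parsing into semi-structured events can look like, the Python sketch below normalizes a syslog-style line and a host metric into one common event shape, so they can be ordered and correlated on a single timeline. The `NormalizedEvent` fields, the simplified line format, and the parser names are assumptions for illustration; they are not taken from Rocana's actual event schema.

```python
import re
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any, Dict


@dataclass
class NormalizedEvent:
    """Hypothetical common shape shared by every event, whatever its source."""
    ts: datetime                # timezone-aware, so events from any source compare
    host: str
    source_type: str            # "syslog", "metric", ...
    body: str
    attributes: Dict[str, Any] = field(default_factory=dict)


# Assumed simplified line format: "<ISO-8601 timestamp> <host> <program>: <message>"
SYSLOG_RE = re.compile(r"^(\S+) (\S+) ([^:]+): (.*)$")


def parse_syslog(line: str) -> NormalizedEvent:
    match = SYSLOG_RE.match(line)
    if match is None:
        raise ValueError(f"unparseable line: {line!r}")
    ts, host, program, message = match.groups()
    return NormalizedEvent(datetime.fromisoformat(ts), host, "syslog",
                           message, {"program": program})


def parse_metric(host: str, name: str, value: float, epoch_s: float) -> NormalizedEvent:
    # Metrics arrive as epoch seconds; convert to an aware UTC datetime.
    return NormalizedEvent(datetime.fromtimestamp(epoch_s, tz=timezone.utc),
                           host, "metric", f"{name}={value}",
                           {"metric": name, "value": value})
```

Once both kinds of input share one shape, a downstream stream service can treat them uniformly, for instance by sorting on `ts` or grouping on `host`.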

Rocana's General Architecture (Courtesy of Rocana)

A number of representations of the data are stored in highly optimized data systems for natural language search, query, analysis, and visualization in the Rocana Ops application. Under the hood, Rocana Ops is built on top of a number of popular open source systems, in open formats, that may be used for other applications and systems making lock-in a non-issue for customers.

Every part of Rocana’s architecture - but notably the collection, processing, storage, and query systems - is a parallelized, scale-out system, with no single point of failure.

What are the basic or general requirements needed for a typical Rocana deployment?

Rocana Ops is really designed for large deployments as mentioned earlier - 10s to 100s of terabytes per day.

Typically, customers start with a half-rack of 10 data warehouse nodes, each with 2 x 8+ core CPUs, 12 x 4 or 8TB SATA II drives, 128 to 256GB of RAM, and 10Gb networking (typical models are the HP DL380 G9 or Dell R730xd), or the cloud equivalent (Amazon d2.4xlarge or d2.8xlarge).

A deployment this size easily handles in excess of a few terabytes per day of data coming into the system from tens to hundreds of thousands of sources.
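A quick back-of-envelope calculation, sketched in Python, shows how the hardware above translates into raw capacity and retention. Everything beyond the node and drive counts quoted above is an assumption: 3x replication (the HDFS default), 25% free-space headroom, and a configurable compression ratio. Real figures depend on the storage formats and settings actually deployed.

```python
def raw_capacity_tb(nodes: int, drives_per_node: int, drive_tb: float) -> float:
    """Total raw disk across all data nodes, in TB."""
    return nodes * drives_per_node * drive_tb


def retention_days(raw_tb: float, daily_ingest_tb: float, replication: int = 3,
                   compression_ratio: float = 1.0, headroom: float = 0.25) -> float:
    """Days of data that fit on disk under the (assumed) overheads.

    replication:       copies kept of each block (assumed HDFS-style default of 3)
    compression_ratio: raw bytes per stored byte (assumed; 1.0 means no compression)
    headroom:          fraction of disk kept free for operations (assumed)
    """
    usable_tb = raw_tb * (1.0 - headroom) / replication       # room for unique data
    stored_tb_per_day = daily_ingest_tb / compression_ratio   # on-disk footprint per day
    return usable_tb / stored_tb_per_day
```

For example, 10 nodes with 12 x 8TB drives give 960TB raw; at an assumed 3TB/day of ingest with no compression that works out to 80 days of retention, and about 400 days at an assumed 5:1 compression ratio. Longer retention targets therefore translate directly into more nodes.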

As customers onboard more data sources or want to retain more data, they begin adding nodes to the system. We have a stellar customer success team that helps customers plan, deploy, and service Rocana Ops, so customers don’t need to worry about finding “unicorn” staff.

What, then, are the key functional differentiators of Rocana?

Customers pick Rocana for a few reasons: scale, openness, advanced data management features, and cost. We’ve talked a lot about scale already, but openness is equally critical.

Enterprises, frankly, are done with being locked into proprietary formats and vendors holding their data hostage. Once you’re collecting all of this data in one place, customers often want to use Rocana Ops to provide real time streams to other systems without going through expensive translations or extractions.

Another major draw is the advanced data management features that other systems lack, such as record-level role-based access control, data lifecycle management, encryption, and auditing facilities. When your log events potentially contain personally identifiable information (PII) or other sensitive data, these features are critical.
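Record-level role-based access control means visibility is decided per event rather than per index or data source. One common way to do this is to label each event and check the label against the caller's roles; a minimal Python sketch of that idea follows. The `acl_tag` field, the role names, and the grant table are hypothetical, and a production system would enforce the check inside the storage or query layer rather than filtering results in application code.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Set


@dataclass(frozen=True)
class Event:
    body: str
    acl_tag: str  # hypothetical per-record label, e.g. "network" or "payments-pii"


# Hypothetical grant table: role -> set of record labels that role may read.
ROLE_GRANTS: Dict[str, Set[str]] = {
    "netops": {"network"},
    "payments-sre": {"network", "payments-pii"},
}


def visible_events(events: Iterable[Event], roles: Iterable[str]) -> List[Event]:
    """Return only the events whose label is granted to at least one role.
    Unknown roles grant nothing (deny by default)."""
    allowed: Set[str] = set().union(*(ROLE_GRANTS.get(r, set()) for r in roles))
    return [e for e in events if e.acl_tag in allowed]
```

Denying by default, so that unknown roles or untagged labels see nothing, is the conservative choice when events may carry PII.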

Finally, operating at scale is both a technology and an economic issue. Rocana Ops’ licensing model is based on users rather than on nodes or the volume of data captured by the system, freeing customers to think about how best to solve problems rather than perform license math.

Recently, you released Rocana Ops 2.0. Could you talk about this release’s new capabilities?

Rocana Ops 2.0 is really exciting for us.

We’ve added Rocana Reflex, which incorporates complex event processing and orchestration features allowing customers to perform actions in response to patterns in the data. Actions can be almost anything you can think of including REST API calls to services and sending alerts.
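Reflex's internals aren't described here, but the general complex-event-processing pattern of "if N matching events arrive within a time window, run an action" can be sketched in Python as below. `ThresholdRule`, its parameters, and the callback-style action are hypothetical; in a real deployment the action might be a REST call or an alert, while here it is any Python callable. The sketch assumes events for a given key arrive in timestamp order.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta
from typing import Any, Callable, Deque, Dict, List


class ThresholdRule:
    """Fire `action` when `threshold` matching events for the same key
    arrive within `window` (hypothetical rule shape)."""

    def __init__(self, predicate: Callable[[Any], bool], threshold: int,
                 window: timedelta,
                 action: Callable[[str, List[datetime]], None]) -> None:
        self.predicate = predicate
        self.threshold = threshold
        self.window = window
        self.action = action
        # key -> timestamps of recent matching events, oldest first
        self._recent: Dict[str, Deque[datetime]] = defaultdict(deque)

    def on_event(self, key: str, ts: datetime, event: Any) -> bool:
        """Feed one event; return True if this event caused the rule to fire."""
        if not self.predicate(event):
            return False
        recent = self._recent[key]
        recent.append(ts)
        # Drop matches that have slid out of the time window.
        while recent and ts - recent[0] > self.window:
            recent.popleft()
        if len(recent) >= self.threshold:
            self.action(key, list(recent))
            recent.clear()  # reset so the same burst does not fire repeatedly
            return True
        return False
```

Clearing the window after firing is one simple de-duplication choice; alternatives include a cooldown period or firing only on state transitions.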

Reflex is paired with a first responder experience designed to help ops teams to quickly triage alerts and anomalies, understand potential causes, collaborate with one another, and spot patterns in the data.

One of the major challenges customers face in deploying dynamic next-generation platforms is operational support, so 2.0 includes first-class support for Pivotal Cloud Foundry instrumentation and visibility. Those are just a small sample of what we’ve done. It’s really a huge release!

How does Rocana interact with the open source community, especially the Apache Hadoop project?

Open source is core to what we do at Rocana, and it’s one of the reasons we’re able to do a lot of what we do in Rocana Ops.

We’re committed to collaborating with the community whenever possible. We’ve open sourced parts of Rocana Ops where we believe there’s a benefit to the community (like Osso, a modern standard for event-oriented data). As we build with projects like Apache Hadoop, Kafka, Spark, Impala, and Lucene, we look closely at places where we can contribute features, insight, feedback, testing, and (most often) fixes.

The vast majority of our engineers, customer success, and sales engineers come from an open source background, so we know how to wear multiple hats.

Foremost is always our customers’ success, but it’s absolutely critical to help advance the community where we are uniquely positioned to do so. This is an exciting space for us, and I think you’ll see us doing some interesting work with the community in the future.

Finally, what, in your opinion, is the best and geekiest song ever?

Now you’re speaking my language; I studied music theory.
Lateralus by Tool, for the way it plays with the Fibonacci sequence and other math without it being gimmicky or unnatural.
A close second goes to Aphex Twin’s Equation, but I won’t ruin that for you.


