Intelligent Automation for DevOps: An Interview with Rocana’s CTO & Co-Founder Eric Sammer
January 16, 2017
Recently, Rocana, a big data and analytics software company specialized in developing solutions that bring visibility to IT and DevOps teams, announced a new release of its data platform, Rocana Ops.
It is in this context that we had the chance to conduct an excellent interview with Eric Sammer, CTO and co-founder of Rocana, who kindly agreed to provide us with insights into the company, its software offering, and details of the new version.
Eric has served as a Senior Engineer and Architect at several large-scale, data-driven organizations, including Experian and Conductor. Most recently, he served as an Engineering Manager at Cloudera,
where he was responsible for working with hundreds of partners to develop robust solutions and integrate them tightly with Cloudera's Enterprise Data Hub.
He is deeply entrenched in the open source community and has an ambition for solving difficult scaling and processing problems. Passionate about challenging assumptions and showing large, complex enterprises new ways to solve large, complex IT infrastructure challenges, Eric now leads Rocana's product development and company direction as CTO.
Eric is also the author of Hadoop Operations, published by O'Reilly Media, and is a frequent speaker on technology and techniques for large-scale data processing, integration, and system management.
Hi Eric, so, what was the motivation behind founding Rocana, the company, and developing Rocana Ops the product?
Rocana was founded directly in response to the growing sophistication of the infrastructure and technology that runs the modern business, and the challenges companies have in understanding those systems. Whether it's visibility into health and performance, investigating specific issues, or holistically understanding the impact infrastructure health and well-being have on the business, many businesses are struggling with the complexity of their environments.
These issues have been exacerbated by trends in cloud computing, hybrid environments, microservices, and data-driven products and features such as product recommendations, real-time inventory visibility, and customer account self-management that rely on data from, and about, the infrastructure and the business. There are a greater number of more varied data sources, producing finer-grained data, faster than ever before.
Meanwhile, the existing solutions to understand and manage these environments are not keeping pace. All of them focus on interesting, but limited, slices of the problem - just log search, just dashboards of metrics, just the last 30 minutes of network flow data, only security events - making it almost impossible to understand what’s happening. These tools tend to think of each piece of infrastructure as a special case rather than the data warehousing and advanced analytics problem it is.
Outside of core IT, it’s natural to source feeds of data from many different places, cleanse and normalize that data, and bring it into a central governed repository where it can be analyzed, visualized, or used to augment other applications.
We want to extend that thinking into infrastructure, network, cloud, database, platform, and application management to better run the business, while at the same time opening up new opportunities to bring operational and business data together. That means all of the data, from every data source, in real time, with full retention, on an open platform, with advanced analytics to make sense of that data.
How would you describe what Rocana Ops is?
Rocana Ops is a data warehouse for event-oriented data. That includes log events, infrastructure and application metrics, business transactions, IoT events, security events, or anything else with a time stamp. It includes the collection, transformation and normalization, storage, query, analytics, visualization, and management of all event-oriented data in a single open system that scales horizontally on cost-effective hardware or cloud platforms.
A normal deployment of Rocana Ops for our customers will take in anywhere from 10 to 100TB of new data every day, retaining it for years. Each event captured by the system is typically available for query in less than one second, and is always online and query-able thanks to a fully parallelized storage and query platform.
Rocana is placed in a very interesting segment of the IT industry. What are, in your view, the differences between the common business analytics user and the IT user regarding the use of a data management and analytics solution? Different needs? Different mindsets? Goals?
I think the first thing to consider when talking about business analytics - meaning both custom-built and off-the-shelf BI suites - and IT focused solutions is that there has historically been very little cross-pollination of ideas between them. Business users tend to think about customized views on top of shared repositories, and building data pipelines to feed those repositories.
There tends to be a focus on reusing data assets and pipelines, lineage concerns, governance, and lifecycle management. IT users on the other hand, think about collection through analytics for each data source as a silo: network performance, application logs, host and process-level performance, and so on each have dedicated collection, storage, and analytics glued together in a tightly coupled package.
Unlike their business counterparts, IT users have very well-known data sources and formats (relatively speaking) and analytics they want to perform. So in some ways, IT analytics have a more constrained problem space, but less integration. This is Conway's Law in serious effect: the notion that software tends to mimic the organizational structures in which it's developed or designed. These silos lead to target fixation.
IT users can wind up focusing on making sure the operating system is healthy, for example, while the business service it supports is unhealthy. Many tools tend to reinforce that kind of thinking. That extends to diagnostics and troubleshooting which is even worse. Again, we’re talking in generic terms here, but the business users tend to have a holistic focus on an issue relevant to the business rather than limited slices.
We want to open that visibility to the IT side of the house, and hopefully even bring those worlds together.
What are the major pains of IT Ops, and how does Rocana help solve them?
Ops is really a combination of both horizontally and vertically focused groups. Some teams are tasked with building and/or running a complete vertical service like an airline check-in and boarding pass management system. Other teams are focused on providing horizontal services such as data center infrastructure with limited knowledge or visibility into what those tens of thousands of boxes do.
Let’s say customers can’t check-in and get their boarding passes on their mobile devices. The application ops team finds that a subset of application servers keep losing connections to database servers holding reservations, but there’s no reason why, and nothing has changed. Meanwhile, the networking team may be swapping out some bad optics in a switch that has been flaky thinking that traffic is being properly routed over another link. Connecting these two dots within a large organization can be maddeningly time consuming - if it even happens at all - leading to some of the high profile outages we see in the news.
Our focus is really on providing a shared view over all systems under management. Each team still has their focused view on their part of the infrastructure in Rocana Ops, but in this example, the application ops team could also trace the failing connections through to link state changes on switches and correlate that with traffic changes in network flow patterns.
Could you describe Rocana’s main architecture?
Following the data flow through Rocana Ops, data is first collected by one of the included data collection methods. These include native syslog, file and directory tailing, netflow and IPFIX, Windows event log, application and host metrics collection, and native APIs for popular programming languages, as well as REST.
As data is collected, basic parsing is performed turning all data into semi-structured events that can be easily correlated regardless of their source. These events flow into an event data bus forming a real-time stream of the cleansed, normalized events. All of the customer-configurable and extensible transformation, model building and application (for features like anomaly detection), complex event processing, triggering, alerting, and other data services are real time stream-oriented services.
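To make the idea of normalized, semi-structured events a bit more concrete, here is a hypothetical sketch of what such an event could look like after parsing; the field names are purely illustrative and are not Rocana Ops' actual schema.

```python
# Hypothetical example of a normalized, semi-structured event after parsing.
# Field names are illustrative only; this is not Rocana Ops' actual schema.
normalized_event = {
    "ts": "2017-01-16T08:30:12.417Z",   # every event carries a timestamp
    "source": "syslog",                  # which collection method produced it
    "host": "app-server-17",
    "service": "checkin-api",
    "severity": "ERROR",
    "message": "connection to reservations database lost",
    "attributes": {                      # source-specific fields kept as key/value pairs
        "db_host": "reservations-db-03",
        "retry_count": 4,
    },
}
```

Because every event shares the same envelope (timestamp, source, host, free-form attributes), events from very different sources can be correlated with the same queries.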
Rocana's General Architecture (Courtesy of Rocana)
A number of representations of the data are stored in highly optimized data systems for natural language search, query, analysis, and visualization in the Rocana Ops application. Under the hood, Rocana Ops is built on top of a number of popular open source systems, in open formats, that may be used for other applications and systems making lock-in a non-issue for customers.
Every part of Rocana’s architecture - but notably the collection, processing, storage, and query systems - is a parallelized, scale-out system, with no single point of failure.
What are the basic or general requirements needed for a typical Rocana deployment?
Rocana Ops is really designed for large deployments as mentioned earlier - 10s to 100s of terabytes per day.
Typically, customers start with a half-rack (10 nodes), each with 2 x 8+ core CPUs, 12 x 4 or 8TB SATA II drives, 128 to 256GB of RAM, and a 10Gb network (typical models are the HP DL380 G9 or Dell R730xd), or the cloud equivalent (Amazon d2.4xl or d2.8xl) for the data warehouse nodes.
A deployment this size easily handles in excess of a few terabytes per day of data coming into the system from tens to hundreds of thousands of sources.
As customers onboard more data sources or want to retain more data, they begin adding nodes to the system. We have a stellar customer success team that helps customers plan, deploy, and service Rocana Ops, so customers don’t need to worry about finding “unicorn” staff.
What are then, the key functional differentiators of Rocana?
Customers pick Rocana for a few reasons: scale, openness, advanced data management features, and cost. We’ve talked a lot about scale already, but openness is equally critical.
Enterprises, frankly, are done with being locked into proprietary formats and vendors holding their data hostage. Once you’re collecting all of this data in one place, customers often want to use Rocana Ops to provide real time streams to other systems without going through expensive translations or extractions.
Another major draw is the absence of advanced data management features in other systems such as record-level role-based access control, data lifecycle management, encryption, and auditing facilities. When your log events potentially contain personally identifiable information (PII) or other sensitive data, this is critical.
Finally, operating at scale is both a technology and economic issue. Rocana Ops’ licensing model is based on users rather than nodes or data captured by the system freeing customers to think about how best to solve problems rather than perform license math.
Recently, you've released Rocana Ops 2.0. Could you talk about this release's new capabilities?
Rocana Ops 2.0 is really exciting for us.
We’ve added Rocana Reflex, which incorporates complex event processing and orchestration features allowing customers to perform actions in response to patterns in the data. Actions can be almost anything you can think of including REST API calls to services and sending alerts.
Reflex is paired with a first responder experience designed to help ops teams to quickly triage alerts and anomalies, understand potential causes, collaborate with one another, and spot patterns in the data.
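The interview doesn't detail Reflex's rule syntax, but conceptually a complex-event-processing rule pairs a pattern over the event stream with an action. The following Python sketch is purely illustrative of that idea; the threshold, field names, and REST endpoint are hypothetical, and this is not Reflex's API.

```python
# Illustrative-only sketch of a CEP-style rule: "if N error events about
# database connections occur within a time window, call a REST endpoint".
# The endpoint, threshold, and event fields are hypothetical.
import time
import requests

WINDOW_SECONDS = 60
THRESHOLD = 5

def check_and_react(events):
    """events: list of dicts with 'ts' (epoch seconds), 'severity', 'message'."""
    now = time.time()
    recent_failures = [
        e for e in events
        if e["severity"] == "ERROR"
        and "connection" in e["message"]
        and now - e["ts"] <= WINDOW_SECONDS
    ]
    if len(recent_failures) >= THRESHOLD:
        # Action: notify a (hypothetical) incident-management REST endpoint.
        requests.post(
            "https://ops.example.com/alerts",
            json={"rule": "db-connection-failures",
                  "count": len(recent_failures)},
        )
```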
One of the major challenges customers face in deploying dynamic next-generation platforms is operational support, so 2.0 includes first-class support for Pivotal CloudFoundry instrumentation and visibility. Those are just a small sample of what we’ve done. It’s really a huge release!
How does Rocana interact with the open source community, especially the Apache Hadoop project?
Open source is core to what we do at Rocana, and it’s one of the reasons we’re able to do a lot of what we do in Rocana Ops.
We’re committed to collaborating with the community whenever possible. We’ve open sourced parts of Rocana Ops where we believe there’s a benefit to the community (like
Osso - A modern standard for event-oriented data
). As we build with projects like
, we look closely at places where we can contribute features, insight, feedback, testing, and (most often) fixes.
The vast majority of our engineers, customer success, and sales engineers come from an open source background, so we know how to wear multiple hats.
Foremost is always our customers’ success, but it’s absolutely critical to help advance the community along where we are uniquely positioned to help. This is an exciting space for us, and I think you’ll see us doing some interesting work with the community in the future.
Finally, what is in your opinion the best and geekiest song ever?
Now you’re speaking my language; I studied music theory.
Lateralus by Tool, for the way it plays with the Fibonacci sequence and other math without it being gimmicky or unnatural.
A close second goes to Aphex Twin’s Equation, but I won’t ruin that for you.
DrivenBI Helps Companies Drive Analytics to the Next Level
December 21, 2016
Privately held company DrivenBI was formed in 2006 by a group of seasoned experts and investors in the business intelligence (BI) market in Taiwan and the United States. Currently based in Pasadena, California, the company has been steadily growing in the ten years since, gaining more than 400 customers in both the English and Chinese markets.
Led by founder and CEO Ben Tai (previously VP of global services with the former BusinessObjects, now part of SAP), DrivenBI would be considered part of what I call a new generation of BI and analytics solutions that is changing the analytics market panorama, especially in the realm of cloud computing.
A couple of weeks ago, I had the opportunity to speak with DrivenBI’s team and to have a briefing and demonstration, most of it in regards to their current analytics offerings and the company’s business strategy and industry perspective, all of which I will share with you here.
How DrivenBI Drives BI
DrivenBI’s portfolio is anchored by SRK, DrivenBI’s native cloud self-service BI platform and collaboration hub.
SRK provides a foundation for sourcing and collecting data in real time within a collaborative environment. Being a cloud platform, SRK can combine the benefits of a reduced IT footprint with a wide range of capabilities for efficient data management.
The SRK native cloud-centralized self-service BI solution offers many features, including:
the ability to blend and work with structured and unstructured data using industry-standard data formats and protocols;
a centralized control architecture providing security and data consistency across the platform;
a set of collaboration features to encourage team communication and speed decision making; and agile reporting and a well-established data processing logic.
SRK’s collaborative environment featuring data and information sharing between users within a centralized setting allows users to maintain control over every aspect and step of the BI and analytics process (figure 1).
Figure 1. DrivenBI’s SRK self-driven and collaborative platform (courtesy of DrivenBI)
DrivenBI: Driving Value throughout Industries, Lines of Business, and Business Roles
One important aspect of the philosophy embraced by DrivenBI has to do with its design approach, providing, within the same platform, valuable services across the multiple functional areas of an organization, including lines of business such as finance and marketing, inventory control, and resource management, as well as across industries such as fashion, gaming, e-commerce, and insurance.
Another element that makes DrivenBI an appealing offering is its strategic partnerships, specifically with Microsoft Azure and Salesforce.com. DrivenBI has the ability to integrate with both powerhouse cloud offerings.
I had the opportunity to play around a bit with DrivenBI’s platform, and I was impressed with the ease of use and intuitive experience in all stages of the data analytics process, especially for dynamic reporting and dashboard creation (figure 2).
Figure 2. DrivenBI’s SRK dashboard (courtesy of DrivenBI)
Other relevant benefits of the DrivenBI platform that I observed include:
elimination/automation of some heavy manual processes;
analysis and collaboration capabilities, particularly relevant for companies with organizational and geographically distributed operations, such as widespread locations, plants, and global customers;
support for multiple system data sources, including structured operational data, unstructured social media sources, and others.
As showcased in its business-centered approach and design, DrivenBI is one of a new generation of BI and analytics offerings that reduce the need for IT intervention compared with traditional peer solutions. These new-generation solutions are offered through cloud delivery, a method that seems to suit analytics and BI offerings and their holistic take on data collection well. As an alternative to expensive IT-centric BI tools, the DrivenBI cloud platform is also useful for replacing or minimizing the use of complex spreadsheets and difficult analytics processes.
DrivenBI’s Agile Analytics
My experience with DrivenBI was far more than “interesting.” DrivenBI is a BI software solution that is well designed and built, intuitive, and offers a fast learning curve. Its well-made architecture makes the solution easy to use and versatile. Its approach—no spreadsheets, no programming, no data warehouse—is well-suited to those organizations that truly need agile analytics solutions. Still, I wonder how this approach fits with large BI deployments that require robust data services, especially in the realms of merging traditional analytics with big data and Internet of Things (IoT) strategies.
To sample what DrivenBI has to offer, I recommend checking out its SRK demo.
(Originally published on
Yep, I’m Writing a Book on Modern Data Management Platforms
December 06, 2016
Over the past couple of years, I have spent lots of time talking with vendors, users, consultants, and other analysts, as well as plenty of people from the data management community about the wave of new technologies and continued efforts aimed at finding the best software solutions to address the increasing number of issues associated with managing enterprise data. In this way, I have gathered much insight on ways to exploit the potential value of enterprise data through efficient analysis for the purpose of "gathering important knowledge that informs better decisions."
Many enterprises have had much success in deriving value from data analysis, but a more significant number of these efforts have failed to achieve much, if any, useful results. And yet other users are still struggling with finding the right software solution for their business data analysis needs, perhaps confused by the myriad solutions emerging nearly every single day.
It is precisely in this context that I’ve decided to launch this new endeavor and write a book that offers a practical perspective on those new data platform deployments that have been successful, as well as practical use cases and plausible design blueprints for your organization or data management project. The information, insight, and guidance that I will provide is based on lessons I’ve learned through research projects and other efforts examining robust and solid data management platform solutions for many organizations.
In the following months, I will be working hard to deliver a book that serves as a practical guide for the implementation of a successful modern data management platform.
The resources for this project will require crowdfunding efforts, and here is where your collaboration will be extremely valuable.
There are several ways in which you can participate:
Participating in our Data Management Platforms survey (to obtain a nice discount right off the bat)
Pre-ordering the book (soon, I’ll provide you with details on how to pre-order your copy, but in the meantime, you can show your interest by signing up at the link below)
Providing us with information about your own successful enterprise use case, which we may use in the book
To let us know which of these options best fits with your spirit of collaboration, and to receive the latest updates on this book, as well as other interesting news, you just need to sign up to our email list. Needless to say, the information you provide will be kept confidential and used only for the purpose of developing this book.
In the meantime, I’d like to leave you with a brief synopsis of the contents of this book, with more details to come in the near future:
New Data Management Platforms
Discovering Architecture Blueprints
About the Book
What Is This Book About?
This book is the result of a comprehensive study into the improvement, expansion, and modernization of different types of architectures, solutions, and platforms to address the need for better and more effective ways of dealing with increasing and more complex volumes of data.
In conducting his research for the book, the author has made every effort to analyze in detail a number of successful modern data management deployments as well as the different types of solutions proposed by software providers, with the aim of providing guidance and establishing practical blueprints for the adoption and/or modernization of existing data management platforms.
These new platforms have the capability of expanding the ability of enterprises to manage new data sources—from ingestion to exposure—more accurately and efficiently, and with increased speed.
The book is the result of extensive research conducted by the author examining a wide number of real-world, modern data management use cases and the plethora of software solutions offered by various software providers that have been deployed to address them. Taking a software vendor‒agnostic viewpoint, the book analyzes what companies in different business areas and industries have done to achieve success in this endeavor, and infers general architecture footprints that may be useful to those enterprises looking to deploy a new data management platform or improve an already existing one.
Disrupting the data market: Interview with EXASOL’s CEO Aaron Auld
November 14, 2016
Processing data fast and efficiently has become a never-ending race. With companies' increasing need to consume data comes a never-ending "need for speed" in processing it and, consequently, the emergence of a new generation of database software solutions to fulfill this demand for high-performance data processing.
These new database management systems incorporate novel technology to provide faster and more efficient access to, and processing of, large volumes of data.
EXASOL is one of these disruptive "new" database solutions. Headquartered in Nuremberg, Germany, and with offices around the globe, EXASOL has worked hard to bring a fresh, new approach to the data analytics market via the offering of a world-class database solution.
In this interview, we took the opportunity to chat with EXASOL’s Aaron Auld about the company and its innovative database solution.
Aaron Auld is the Chief Executive Officer as well as the Chairman of the Board at EXASOL, positions he has held since July 2013. He was made a board member in 2009.
As CEO and Chairman, Aaron is responsible for the strategic direction and execution of the company, as well as growing the business internationally.
Aaron embarked on his career back in 1996 at MAN Technologie AG, where he worked on large industrial projects and M&A transactions in the aerospace sector. Subsequently, he worked for the law firm Eckner-Bähr & Colleagues in the field of corporate law.
After that, the native Brit joined Océ Printing Systems GmbH as legal counsel for sales, software, R&D and IT. He then moved to Océ Holding Germany and took over the global software business as head of corporate counsel. Aaron was also involved in the IPO (Prime Standard) of Primion Technology AG in a legal capacity, and led investment management and investor relations.
Aaron studied law at the Universities of Munich and St. Gallen. Passionate about nature, Aaron likes nothing more than to relax by walking or sailing and is interested in politics and history.
So, what is EXASOL and what is the story behind it?
EXASOL is a technology vendor that develops a high-performance in-memory analytic database that was built from the ground up to analyze large volumes of data extremely fast and with a high degree of flexibility.
The company was founded back in the early 2000's in Nuremberg, Germany, and went to market with the first version of the analytic database in 2008.
Now in its sixth generation, EXASOL continues to develop and market the in-memory analytic database working with organizations across the globe to help them derive business insight from their data that helps them to drive their businesses forward.
How does the database work? Could you tell us some of the main features?
We have always focused on delivering an analytic database with ultra-fast, massively scalable analytic performance. The database combines in-memory, columnar storage and massively parallel processing technologies to provide unrivaled performance, flexibility and scalability.
The database is tuning-free and therefore helps to reduce the total cost of ownership while enabling users to solve analytical tasks instead of having to cope with technical limits and constraints.
With the recently-announced version 6, the database now offers a data virtualization and data integration framework which allows users to connect to more data sources than ever before.
Also, alongside the languages supported out of the box, users can integrate the analytics programming language of their choice and use it for in-database analytics.
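As a rough illustration of what querying such a database from an application can look like, here is a minimal sketch assuming the pyexasol Python driver and a reachable cluster; the DSN, credentials, schema, and table are hypothetical and are not taken from the interview.

```python
# Minimal sketch, assuming the pyexasol driver ("pip install pyexasol") and a
# reachable EXASOL cluster; DSN, credentials, and the table are hypothetical.
import pyexasol

conn = pyexasol.connect(dsn="exasol-node1:8563", user="sys", password="***")

# A typical analytic query: the in-memory, columnar, MPP engine is built to
# scan and aggregate large fact tables like this one quickly.
stmt = conn.execute(
    "SELECT product_id, SUM(amount) AS revenue "
    "FROM retail.sales "
    "GROUP BY product_id "
    "ORDER BY revenue DESC LIMIT 10"
)
for row in stmt:
    print(row)

conn.close()
```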
Especially today, speed of data processing is important. I’ve read EXASOL has taken some benchmarks in this regard. Could you tell us more about it?
One of the only truly independent sets of benchmark tests available is offered by the Transaction Processing Performance Council (TPC). A few years ago we decided to take part in the TPC-H benchmark, and ever since we have topped the tables not only in terms of performance (i.e., analytic speed) but also in terms of price/performance (i.e., cost aligned with speed) when analyzing data volumes ranging from 100GB right up to 100TB. No other database vendor comes close.
The information is available online.
One of the features of EXASOL is that, if I'm not mistaken, it is deployed on commodity hardware. How does EXASOL's design guarantee optimal performance and reliability?
Offering flexible deployment models in terms of how businesses can benefit from EXASOL has always been important to us at EXASOL.
Years ago, the concept of the data warehouse appliance was talked about as the optimum deployment model, but in most cases it meant that vendors were forcing users to run their database on bespoke hardware that could not then be repurposed for any other task. Things have changed since: while the appliance model is still offered, ours is and always has been one that uses commodity hardware.
Of course, users are free to download our software and install it on their own hardware too.
It all makes for a more open and transparent framework where there is no vendor lock-in, and for users that can only be a good thing. What’s more, because the hardware and chip vendors are always innovating, when a new processor or server is released, users only stand to benefit as they will see yet even faster performance when they run EXASOL on that new technology.
We recently discussed this in a promotional video for Intel.
On the topic of price, is EXASOL intended only for large organizations? What about medium and small ones that need fast data processing?
We work with organizations both large and small. The common denominator is always that they have an issue with their data analytics or incumbent database technology and that they just cannot get answers to their analytic queries fast enough.
Price-wise, our analytic database is extremely competitively priced, and we allow organizations of all shapes and sizes to use our database software on terms that best fit their own requirements, be that via a perpetual license model, a subscription model, or a bring-your-own-license (BYOL) model, whether on-premises or in the cloud.
What would be a minimal configuration example? Server, user licensing etc.?
Users can get started today with the EXASOL Free Small Business Edition. It is a single-node only edition of the database software and users can pin up to 200GB of data into RAM.
We believe this is a very compelling advantage for businesses that want to get started with EXASOL.
Later, when data volumes grow and when businesses want to make use of advanced features such as in-database analytics or data virtualization, users can then upgrade to the EXASOL Enterprise Cluster Edition which offers much more in terms of functionality.
Regarding big data requirements, could you tell us some of the possibilities to integrate or connect EXASOL with big data sources/repositories such as Hadoop and others?
EXASOL can be easily integrated into every IT infrastructure. It is SQL-compliant, is compatible with leading BI and ETL products, and provides the most flexible Hadoop connector on the market.
Furthermore, through an extensive data virtualization and integration framework, users can now analyze data from more sources more easily and faster than ever before.
Recently, the company announced that EXASOL is now available on Amazon. Could you tell us a bit more about the news? EXASOL is also available on Azure, right?
As more and more organizations deploy their applications and systems in the cloud, it's important that we allow them to use EXASOL in the cloud, too. As a result, we are now available on Amazon Web Services as well as Microsoft Azure. What's more, we continue to offer our own cloud and hosting environment.
Finally, on a more personal topic. Being a Scot who lives in Germany, would you go for a German beer or a Scottish whisky?
That’s an easy one. First enjoy a nice German beer (ideally, one from a Munich brewery) before dinner, then round the evening off with by savoring a nice Scottish whisky. The best of both worlds.
Logging challenges for containerized applications: Interview with Eduardo Silva
November 02, 2016
Next week, another edition of the CloudNativeCon conference will take place in the great city of Seattle. One of the key topics in this edition has to do with containers, a software technology that is enabling and easing the development and deployment of applications by encapsulating them so they can be deployed through a single, simple process.
In this installment, we took the opportunity to chat with Eduardo Silva a bit about containers and his upcoming session, "Logging for Containers," which will take place during the conference.
Eduardo Silva is a principal open source developer at Treasure Data Inc., where he currently leads efforts to make the logging ecosystem friendlier for embedded systems, containers, and cloud services.
He also directs the Monkey Project organization, which is behind open source projects such as the Monkey HTTP Server.
A well-known speaker, Eduardo has spoken at events across South America and, more recently, at events in the US, Asia, and Europe.
Thanks so much for your time Eduardo!
What is a container and how is it applied specifically in Linux?
When deploying applications, it is always desirable to have full control over the given resources, and ideally we would like the application to be isolated as much as possible. Containers are the concept of packaging an application with its entire runtime environment in an isolated way.
To accomplish this at the operating system level, Linux provides us with two features that make it possible to implement the concept of containers:
cgroups (control groups) allow us to limit resource usage for one or more processes, so you can define how much CPU or memory a program might use when running.
Namespaces, on the other hand, allow us to define restricted access to specific resources such as mount points, network devices, and IPC, among others.
In short, if you like programming, you can implement your own containers with a few system calls. Since this could be tedious work from an operability perspective, there are libraries and services that abstract away the details and let you focus on what really matters: deployment and monitoring.
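As a rough illustration of that point, here is a minimal Python sketch (my own, assuming a Linux host and root privileges) that uses the unshare(2) and sethostname(2) system calls via libc to drop a shell into new UTS and PID namespaces, which is a tiny slice of what container runtimes automate.

```python
# Minimal sketch (assumes Linux and root privileges): put a shell into new
# UTS and PID namespaces using raw system calls.
import ctypes
import os

CLONE_NEWUTS = 0x04000000   # new hostname/domainname namespace
CLONE_NEWPID = 0x20000000   # new PID namespace (applies to children)

libc = ctypes.CDLL("libc.so.6", use_errno=True)

if libc.unshare(CLONE_NEWUTS | CLONE_NEWPID) != 0:
    raise OSError(ctypes.get_errno(), "unshare failed (are you root?)")

pid = os.fork()              # the child becomes PID 1 in the new PID namespace
if pid == 0:
    hostname = b"mini-container"
    libc.sethostname(hostname, len(hostname))   # only visible inside the namespace
    os.execvp("bash", ["bash"])                 # an isolated interactive shell
else:
    os.waitpid(pid, 0)
```

Engines like Docker layer image management, networking, and cgroup limits on top of exactly these primitives.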
So, what is the difference between a Linux Container and, for example a virtual machine?
A container aims to be a granular unit of an application and its dependencies; it's one process or a group of processes. A virtual machine runs a whole operating system, which, as you might guess, is a bit heavier.
So, if we ought to define some advantages of containers versus virtualization, could you tell us a couple of advantages and disadvantages of both?
There are many differences, pros and cons. Taking into account our cloud environment, when you need to deploy applications at scale (and many times just on demand), containers are the best choice: deploying a container takes just a small fraction of a second, while deploying a virtual machine may take several seconds and a bunch of resources that will most likely be wasted.
Due to the opportunities it brings, there are some container projects and solutions out there such as LXC, LXD or LXCFS. Could you share with us what is the difference between them? Do you have one you consider your main choice and why?
Having the technology to implement containers is the first step, but as I said before, not everybody wants to play with system calls; instead, different technologies exist to create and manage containers.
LXC and LXD provide the next level of abstraction to manage containers, while LXCFS is a user-space file system for containers (it works on top of FUSE).
Since I don't play with containers at a low level, I don't have a strong preference.
And what about solutions such as Docker, CoreOS or Vagrant? Any take on them?
Docker is the big player nowadays; it provides good security and mechanisms to manage and deploy containers.
CoreOS has a prominent container engine called Rocket (rkt). I have not used it, but it looks promising in terms of design and implementation, and orchestration services like Kubernetes are already providing support for it.
You are also working on a quite interesting project called Fluent-Bit. What is the project about?
I will give you a bit of context. I'm part of the open source engineering team at Treasure Data; our primary focus is to solve data collection and data delivery for a wide range of use cases and integrations. That is what Fluentd exists for. It's a very successful project that is nowadays solving logging challenges in hundreds of thousands of systems, and we are very proud of it.
A year ago we decided to dig into the embedded Linux space, and as you might know, the capacity of these devices in terms of CPU, memory, and storage is usually more restricted than that of a common server machine.
Fluentd is really good, but it also has its technical requirements: it's written in a mix of Ruby and C, and having Ruby on most embedded Linux systems can be a real challenge or a blocker. That's why a new solution was born: Fluent Bit.
Fluent Bit is a data collector and log shipper written 100% in C. It has a strong focus on Linux, but it also works on BSD-based systems, including OSX/macOS. Its architecture has been designed to be very lightweight and to provide high performance from collection to distribution.
Some of its features are:
Input / Output plugins
Event driven (async I/O operations)
Although it was initially conceived for embedded Linux, it has evolved, gaining features that make it cloud friendly without losing its performance and lightweight goals.
If you are interested in collecting data and delivering it somewhere, Fluent Bit allows you to do that through its built-in plugins, some of which are:
Forward: a protocol on top of TCP to get data from Fluentd or Docker containers.
Head: read the initial chunks of bytes from a file.
Health: check whether a remote TCP server is healthy.
kmsg: read kernel log messages.
CPU: collect CPU usage metrics, globally and per core.
Mem: memory usage of the system or of a specific running process.
TCP: listen for JSON messages over TCP.
Treasure Data (our cloud analytics platform).
NATS Messaging Server.
So as you can see, with Fluent Bit it would be easy to aggregate Docker logs into a central backend, monitor your current OS resource usage, or collect JSON data over the network (TCP) and send it to your own HTTP endpoint, as in the sketch below.
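For instance, a minimal sketch of that last case might look like the following, assuming a Fluent Bit instance running with its TCP input plugin on localhost port 5170; the port and event fields are assumptions, so adjust them to match your own configuration.

```python
# Minimal sketch: send a JSON record to a Fluent Bit TCP input. Assumes
# Fluent Bit is listening on localhost:5170 (adjust host/port to your config).
import json
import socket
import time

event = {
    "message": "disk usage above threshold",
    "host": "web-01",
    "timestamp": time.time(),
}

with socket.create_connection(("127.0.0.1", 5170)) as sock:
    # Send one JSON message per line, a typical setup for a TCP input.
    sock.sendall((json.dumps(event) + "\n").encode("utf-8"))
```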
The use cases are many, and this is a very exciting tool, not just from an end-user perspective but also from a technical implementation point of view.
The project is moving forward pretty quickly and getting exceptional new features, such as support for writing your own plugins in Golang (yes, C -> Go). Isn't that neat?
You will be presenting at CNCF event CloudNativeCon & KubeCon in November. Can you share with us a bit of what you will be presenting about in your session?
I will share our experience with Logging in critical environments and dig into common pains and best practices that can be applied to different scenarios.
It will cover everything about logging in the scope of (but not limited to) containers, microservices, distributed logging, aggregation patterns, Kubernetes, and open source solutions for logging, plus demos.
I'd say that everyone who is a sysadmin, DevOps engineer, or developer will definitely benefit from the content of this session; logging is, and is required, everywhere.
Finally, on a personal note. Which do you consider to be the geekiest songs of this century?
That's a difficult question!
I am not an expert on geek music, but I would vouch for "Spybreak!" by Propellerheads (featured in The Matrix).