Logging challenges for containerized applications: Interview with Eduardo Silva

Next week, another edition of the CloudNativeCon conference will take place in the great city of Seattle. One of the key topics of this edition is containers, a software technology that eases the development and deployment of applications by encapsulating them so they can be deployed with a single, simple process.

In this installment, we took the opportunity to chat with Eduardo Silva about containers and his upcoming session, Logging for Containers, which will take place during the conference.

Eduardo Silva is a principal open source developer at Treasure Data Inc., where he currently leads the efforts to make the logging ecosystem friendlier for embedded, container and cloud services.

He also directs the Monkey Project organization, which is behind the open source projects Monkey HTTP Server and Duda I/O.

A well-known speaker, Eduardo has spoken at events across South America and at recent Linux Foundation events in the US, Asia and Europe.

Thanks so much for your time, Eduardo!

What is a container and how is it applied specifically in Linux?

When deploying applications, it is always desirable to have full control over the resources given to them, and we usually want each application to be isolated as much as possible. A container is the concept of packaging an application with its entire runtime environment in an isolated way.
To accomplish this at the operating system level, Linux provides two features that make it possible to implement containers: cgroups and namespaces.

  • cgroups (control groups) allow us to limit the resource usage of one or more processes, so you can define how much CPU or memory a program (or set of programs) may use when running.
  • namespaces (associated with users and groups), on the other hand, allow us to define restricted access to specific resources such as mount points, network devices and IPC, among others.

In short, if you like programming, you can implement your own containers with a few system calls. Since this can be tedious work from an operability perspective, there are libraries and services that abstract away the details and let you focus on what really matters: deployment and monitoring.
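
A minimal sketch of where these two features surface on a running Linux system, assuming Python 3 and a mounted /proc file system: the kernel lists a process's namespaces under /proc/<pid>/ns and its cgroup membership in /proc/<pid>/cgroup. The helper names parse_cgroup_line and list_namespaces are hypothetical, chosen only for this illustration, and the code only inspects existing isolation; it does not create a container.

```python
import os


def parse_cgroup_line(line):
    """Split one /proc/<pid>/cgroup line into (hierarchy_id, controllers, path).

    Each line has the form "hierarchy-ID:controller-list:cgroup-path".
    On cgroup v2 the controller list is empty, e.g. "0::/user.slice".
    """
    hierarchy_id, controllers, path = line.rstrip("\n").split(":", 2)
    # An empty controller list becomes an empty Python list.
    names = [c for c in controllers.split(",") if c]
    return int(hierarchy_id), names, path


def list_namespaces(pid="self"):
    """Return {namespace_name: identifier} for a process.

    Returns an empty dict when /proc/<pid>/ns is not available
    (for example on a non-Linux system).
    """
    ns_dir = f"/proc/{pid}/ns"
    try:
        entries = sorted(os.listdir(ns_dir))
    except OSError:
        return {}
    result = {}
    for name in entries:
        try:
            # Each entry is a symlink whose target names the namespace
            # type and an identifier.
            result[name] = os.readlink(os.path.join(ns_dir, name))
        except OSError:
            continue  # skip entries we are not allowed to read
    return result
```

Two processes that show the same identifier for a given entry share that namespace, while a containerized process will usually show different identifiers from the host for the namespaces it was given.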

So, what is the difference between a Linux container and, for example, a virtual machine?

A container aims to be a granular unit of an application and its dependencies; it is one process or a group of processes. A virtual machine runs a whole operating system which, as you might guess, is a bit heavier.

So, if we were to compare containers and virtualization, could you tell us a couple of advantages and disadvantages of each?

There are many differences, with pros and cons on each side. Taking into account our cloud environment, where you need to deploy applications at scale (and many times just on demand), containers are the best choice: deploying a container takes just a small fraction of a second, while deploying a virtual machine may take a few seconds and a bunch of resources that will most likely be wasted.

Due to the opportunities they bring, there are several container projects and solutions out there, such as LXC, LXD or LXCFS. Could you share with us what the differences between them are? Is there one you consider your main choice, and why?

Having the technology to implement containers is the first step but, as I said before, not everybody wants to play with system calls; instead, different technologies exist to create and manage containers. LXC and LXD provide the next level of abstraction for managing containers, while LXCFS is a user-space file system for containers (it works on top of FUSE).
Since I don't play with containers at a low level, I don't have a strong preference.

And what about solutions such as Docker, CoreOS or Vagrant? Any take on them?

Docker is the big player nowadays; it provides good security and mechanisms to manage and deploy containers. CoreOS has a prominent container engine called Rocket (rkt). I have not used it, but it looks promising in terms of design and implementation, and orchestration services like Kubernetes already provide support for it.

You are also working on quite an interesting project called Fluent Bit. What is the project about?

I will give you a bit of context. I'm part of the open source engineering team at Treasure Data. Our primary focus in the team is to solve data collection and data delivery for a wide range of use cases and integrations, and Fluentd exists to accomplish this. It's a very successful project that is nowadays solving logging challenges in hundreds of thousands of systems, and we are very proud of it.
A year ago we decided to dig into the embedded Linux space and, as you might know, the capacity of these devices in terms of CPU, memory and storage is likely to be more restricted than that of a common server machine.
Fluentd is really good, but it also has its technical requirements: it's written in a mix of Ruby and C, and having Ruby on most embedded Linux systems can be a real challenge or a blocker. That's why a new solution was born: Fluent Bit.
Fluent Bit is a data collector and log shipper written 100% in C. It has a strong focus on Linux, but it also works on BSD-based systems, including OS X/macOS. Its architecture has been designed to be very lightweight and to provide high performance from collection to distribution.
Some of its features are:

  • Input / Output plugins
  • Event driven (async I/O operations)
  • Built-in Metrics
  • Security: SSL/TLS
  • Routing
  • Buffering
  • Fluentd Integration

Although it was initially conceived for embedded Linux, it has evolved, gaining features that make it cloud friendly without giving up its performance and lightweight goals.
If you are interested in collecting data and delivering it somewhere, Fluent Bit allows you to do that through its built-in plugins. Some of them are:

  • Input
    • Forward: protocol on top of TCP; gets data from Fluentd or Docker containers.
    • Head: read the initial chunk of bytes from a file.
    • Health: check the health of a remote TCP server.
    • kmsg: read kernel log messages.
    • CPU: collect CPU usage metrics, globally and per core.
    • Mem: memory usage of the system or of a specific running process.
    • TCP: listen for JSON messages over TCP.
  • Output
    • Elasticsearch database
    • Treasure Data (our cloud analytics platform)
    • NATS Messaging Server
    • HTTP end-point

So as you can see, with Fluent Bit it is easy to aggregate Docker logs into Elasticsearch, monitor your current OS resource usage, or collect JSON data over the network (TCP) and send it to your own HTTP end-point.
The use cases are many, and this is a very exciting tool, not just from an end-user perspective but also from a technical implementation point of view.
The project is moving forward pretty quickly and getting exceptional new features, such as support for writing your own plugins in Golang! (Yes, C -> Go.) Isn't it neat?
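
As a rough illustration of the "collect JSON data over the network (TCP)" case mentioned above, the sketch below shows how a client might hand records to a TCP listener that expects JSON. It assumes one JSON object per line and a listener on 127.0.0.1, port 5170; both are assumptions made for this example and should be checked against the configuration of the input actually in use. The function names encode_record and send_records are hypothetical.

```python
import json
import socket


def encode_record(record):
    """Serialize a dict as one compact JSON object followed by a newline.

    Newline-delimited JSON is an assumption of this sketch.
    """
    return (json.dumps(record, separators=(",", ":")) + "\n").encode("utf-8")


def send_records(records, host="127.0.0.1", port=5170, timeout=2.0):
    """Open one TCP connection, write every record, and return the bytes sent.

    The host and port defaults are assumptions; point them at wherever
    the collector's TCP input is actually listening.
    """
    payload = b"".join(encode_record(r) for r in records)
    with socket.create_connection((host, port), timeout=timeout) as conn:
        conn.sendall(payload)
    return len(payload)
```

With a listener running, a call such as send_records([{"service": "web", "status": 200}]) would deliver a single record; from there, the collector's output side decides where the data ends up, for example an HTTP end-point.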

You will be presenting at the CNCF event CloudNativeCon & KubeCon in November. Can you share with us a bit about what you will be presenting in your session?

I will share our experience with logging in critical environments and dig into common pains and best practices that can be applied to different scenarios.
It will cover everything about logging in the scope of (but not limited to) containers, microservices, distributed logging, aggregation patterns, Kubernetes and open source solutions for logging, plus demos.
I'd say that everyone who's a sysadmin, DevOps engineer or developer will definitely benefit from the content of this session; logging is everywhere, and it is required everywhere.

Finally, on a personal note, which do you consider to be the geekiest songs of this century?

That's a difficult question!
I am not an expert on geek music, but I would vouch for Spybreak by Propellerheads (from The Matrix).


