Saturday, 17 December 2016

Part 3: Build your own monitoring system using Riemann, Graphite, Collectd.

In the previous article, part 2, we discussed Graphite integration with Riemann. In this article I will give an overview of collectd and some advanced stream processing options in Riemann.

Collectd Overview

Collectd is a daemon that gathers metrics from various sources, e.g. the operating system, applications, log files and external devices, and stores this information or makes it available over the network.

Collectd itself is a big topic and there are a lot of things you can achieve with it, but here I will discuss only the area of our interest. We will tell collectd to send the metrics it collects to the Graphite server. Pretty amazing, right?

Collectd installation and Plugin concept

Install collectd using the package that matches your Linux distribution (see https://collectd.org/).

For my case the steps are:

1) sudo apt-get install collectd
2) sudo service collectd start

Done. Your collectd is installed and running.
Now let's take a look at a very important config file for collectd.
In my case it is located at /etc/collectd/collectd.conf
If you open this file you can see a list of plugins and the configuration related to each plugin.

Collectd is built around the concept of plugins. Different plugins are needed to fetch different types of metrics and to carry out other monitoring activities.

For example, the cpu plugin fetches CPU-related information about the system on which collectd is running. We will see the outcome of this plugin very soon in Graphite.

By now you must have figured out that if we want collectd to forward these metrics to Graphite then there must be some plugin for that. Oh yeah, your guess is right. We do have a plugin for it: write_graphite.

Two things are needed here. First, we load the write_graphite plugin; second, we provide the configuration for that plugin. The host name of the Graphite server is localhost because it is installed on the same VM. All collectd graphs will be rendered in Graphite under the prefix name set in this configuration. After adding the write_graphite configuration, save the file and restart the collectd service.
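
To make the flow a bit more concrete: whatever sends metrics to Graphite ends up delivering each sample to Graphite's Carbon listener as one line of text in the form "metric-path value timestamp". Below is a minimal Java sketch of that plaintext format. It is not collectd's actual implementation. The class name GraphiteLine, the prefix "collectd." and the metric path are made up for illustration, and port 2003 is assumed to be the plaintext port of your Carbon instance.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: builds and sends one sample in Carbon's plaintext format.
final class GraphiteLine {

    // Formats "<prefix><path> <value> <epochSeconds>\n", one sample per line.
    static String format(String prefix, String path, double value, long epochSeconds) {
        return prefix + path + " " + value + " " + epochSeconds + "\n";
    }

    // Opens a TCP connection and writes a single line (assumes Carbon listens on host:port).
    static void send(String host, int port, String line) throws IOException {
        try (Socket socket = new Socket(host, port);
             OutputStream out = socket.getOutputStream()) {
            out.write(line.getBytes(StandardCharsets.UTF_8));
            out.flush();
        }
    }

    public static void main(String[] args) throws IOException {
        long now = System.currentTimeMillis() / 1000L;
        // Prefix and path are illustrative values only.
        String line = format("collectd.", "myhost.cpu-0.cpu-idle", 97.5, now);
        System.out.print(line);
        // Uncomment when a Graphite/Carbon instance is running locally:
        // send("localhost", 2003, line);
    }
}
```

The prefix is simply the first part of the metric path, which is why everything collectd sends shows up under one top-level folder in Graphite.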


Below is the outcome on the Graphite dashboard for collectd:

Here I am ending my discussion of collectd and moving on to the last and most important section, Riemann stream processing.

Riemann stream processing examples

I will show you a few stream processing examples in Riemann.

1) Send email based on service status

Below is the configuration for sending mail through Gmail. You can do something similar for your own SMTP server.
Add this configuration to your riemann.config file and restart Riemann.
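 ; Mailer settings for Gmail; adjust host, port and credentials for your SMTP provider.
 (def email (mailer {:host "smtp.gmail.com"
             :port 465
             :ssl true
             :tls true
             :user "myaccount@gmail.com"
             :pass "mypassword"
             :from "myaccount@gmail.com"}))
 ; Mail every event whose state is "critical" to the given address.
 (streams
   (where (state "critical")
    (email "xyz@gmail.com")))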

Two things are happening here.
1) Declaring the email configuration. This may vary depending on the SMTP provider.
2) Defining a stream rule such that if the state of any service is critical, a mail is sent to the given email id.

Let's send a "critical" state from Java code for our "fridge" service created in part 1.
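 RiemannClient c = RiemannClient.tcp("localhost", 5555);
 c.connect();
 c.event().
     service("fridge").
     state("critical").   // matches the (where (state "critical") ...) rule
     metric(10).
     tags("appliance", "cold").
     send().
     deref(5000, java.util.concurrent.TimeUnit.MILLISECONDS);
 c.close();   // release the connection once the event is acknowledged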

Let's see the mail received at xyz@gmail.com:

This is the default mail template used by Riemann. You can change the format and details of the email. I am leaving that part as an exercise for you.


2) Email the exception
Add the stream processing rule below to your riemann.config file.
 (streams  
   (where (service "exception-alert")  
    (email "xyz@gmail.com")))  

Let's send an exception from Java code:
 RiemannClient c = RiemannClient.tcp("localhost", 5555);
 c.connect();
 try {
   // some business logic
   throw new NullPointerException("NullPointer exception in your system..Somebody will be in trouble!!! ");
 } catch (Exception e) {
   c.event().
       service("exception-alert").     // matched by the (where (service "exception-alert") ...) rule
       state(e.getLocalizedMessage()). // you can send full stacktrace also
       tags("error", "exception", "failure").
       send().
       deref(5000, java.util.concurrent.TimeUnit.MILLISECONDS);
 }
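
The comment in the catch block mentions that the full stack trace can be sent too. One way to get it as text, sketched here with only the standard library, is to print the exception into a StringWriter. The helper name StackTraces is hypothetical, and where you put the resulting string in the event (state, or a description field if your client version offers one) is left to you.

```java
import java.io.PrintWriter;
import java.io.StringWriter;

// Hypothetical helper: renders a Throwable's full stack trace as a String.
final class StackTraces {

    static String toText(Throwable t) {
        StringWriter buffer = new StringWriter();
        try (PrintWriter writer = new PrintWriter(buffer)) {
            t.printStackTrace(writer);   // same text that printStackTrace() sends to stderr
        }
        return buffer.toString();
    }

    public static void main(String[] args) {
        try {
            throw new NullPointerException("NullPointer exception in your system");
        } catch (Exception e) {
            System.out.println(toText(e));
        }
    }
}
```

Keep in mind that a full stack trace makes the mail much longer than a one-line message.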

Let's see the mail received at xyz@gmail.com:

What else can you do?
1) Send an email alert if some VM/service is down.
2) Filter and process the stream depending on hostname, service name, metric value, service state, tag values etc., and perform some actions based on that.
3) Set threshold values for the metrics received and perform some actions if a threshold is crossed, e.g. VM CPU is very high (above 95%), or some business-specific constraint is violated.
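
As an illustration of point 3, the threshold check can also be sketched on the client side: the application decides the state before sending the event, and a rule like the (where (state "critical") ...) stream from example 1 reacts to it. This is only a sketch. The class name CpuState, the 80% warning level and the state names "ok" and "warning" are my assumptions; the 95% level and "critical" come from the text above.

```java
// Hypothetical helper: maps a CPU usage percentage to an event state.
final class CpuState {

    static final double WARNING_PERCENT = 80.0;   // assumed value, tune for your system
    static final double CRITICAL_PERCENT = 95.0;  // "above 95%" from the example above

    static String forUsage(double cpuPercent) {
        if (cpuPercent > CRITICAL_PERCENT) return "critical";
        if (cpuPercent > WARNING_PERCENT) return "warning";
        return "ok";
    }

    public static void main(String[] args) {
        for (double usage : new double[] {42.0, 88.5, 97.3}) {
            System.out.println(usage + "% -> " + forUsage(usage));
        }
    }
}
```

The returned string is what you would pass to state(...) when building the event. The same check can of course be written as a stream rule inside riemann.config instead.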

These are just a few examples. Check out the Riemann links I have posted at the end of the article.

Below is the updated architecture diagram:

The collectd daemon sends all generic system-related metrics to Graphite.


In this three-article series I have just scratched the surface of this area. There are a thousand different things you can think of and achieve with this monitoring framework.

Below are useful links for the different types of config, plugins and integrations with other systems that you can set up with Riemann, Graphite and collectd. I have explained only about 3% of it all. You can add the rest as per your needs and the use case of your system.

Riemann:
http://riemann.io/clients.html
http://riemann.io/howto.html

Graphite:
https://graphiteapp.org/#integrations
http://graphite.readthedocs.io/en/latest/tools.html
http://grafana.org/

Collectd:
https://collectd.org/
https://collectd.org/wiki/index.php/Plugin

This is the last article of this series.
Hope you enjoyed it!

Please post your comments and questions!

Part 2: Build your own monitoring system using Riemann, Graphite, Collectd.

In the previous article, part 1, we discussed Riemann installation and basic event sending from a Java application. In this article we will see Riemann integration with Graphite. First let's discuss why we need Graphite.

1) In Riemann, events are kept only until their TTL (time to live) expires. We need something that stores the events for the longer term, so that later we can look at the statistics and get an idea of how the system behaved around failures or errors.
2) Riemann keeps no long-term history, and the riemann-dash board is stateless too. There are ways to store the definitions of the dashboards you create, but they will still show live data only.

Graphite, on the other hand, has 2 great capabilities:
1) Storage
2) An easy and powerful dashboard UI.

Let's start with Graphite installation.

Graphite Installation

The Graphite installation documentation describes 4 different ways of installing Graphite and the other components it needs.
I am using the 4th way: Installing From Synthesize

Synthesize provides a script which automates the installation of all the dependencies and components needed for Graphite. But the Synthesize installation method is available for Ubuntu 14.04 only. If you are using some other version or flavour of Linux then you should go with one of the other ways.

Installation steps for my case:
$ cd synthesize
$ sudo ./install
That's it! Done!

Open the Graphite dashboard in the browser:

Let's integrate Graphite with Riemann.

Riemann Graphite Integration

Open the riemann.config file. In my case it is located at /etc/riemann/riemann.config

The part I have highlighted in red is the newly added config for Graphite.
First I have provided the location of the Graphite VM. In my case it is the same machine, so I am using localhost.
The next thing is the stream processing rules. You can specify which services you want to render in Graphite. Here I am declaring both the "fridge" and "jvm.nonheap.memory" services to be rendered on the Graphite dashboard. We created these services in part 1.

As you can see, Graphite stores the metrics, so you can choose the time/date range to display. One more thing you should observe is that Graphite creates a new folder level for each "." present in the service name. Here you can see the folder structure for jvm.nonheap.memory. So you can name and organise your metrics accordingly.
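
The mapping from a dotted service name to tree levels can be pictured with a few lines of Java. This is just an illustration of the naming rule, not Graphite code, and the class name MetricPath is hypothetical.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: shows how a dotted metric name breaks into tree levels.
final class MetricPath {

    // Splits on literal dots: the leading elements are folders, the last one is the metric itself.
    static List<String> levels(String metricName) {
        return Arrays.asList(metricName.split("\\."));
    }

    public static void main(String[] args) {
        System.out.println(levels("jvm.nonheap.memory")); // prints [jvm, nonheap, memory]
        System.out.println(levels("fridge"));             // prints [fridge]
    }
}
```

So a naming scheme like application.component.metric gives you a tidy tree in the Graphite UI without any extra configuration.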


What you can do next

Grafana is the next thing you can add to your framework. In simple words, Grafana is a dashboard which operates on the data stored in Graphite. So Graphite will still be there, but Grafana can use Graphite's data and provide much better and more advanced dashboard options.
Explore more on this here: http://docs.grafana.org/

Below is the updated architecture diagram:

So Riemann processes the events and pushes the metric data associated with them to Graphite for storage. Graphite stores it and displays it on the dashboard. Grafana can use the data held in Graphite for further rendering.

That's it for now...

Part 3 is the next article in this series.

Please post your comments and questions!

Part 1: Build your own monitoring system using Riemann, Graphite, Collectd.

In this 3-article series on building your own monitoring system for your application, I will give a basic idea of the different tools and technologies you can use, and I will demonstrate how they communicate with each other. I will also explain "what the next step will be" or "what else you can add to this monitoring framework".

The main components we will discuss are:
1) Riemann
2) Graphite
3) Collectd

I will also give an overview of some surrounding tools and plugins that we can attach to the above main components.

For this whole exercise I am using Ubuntu 14.04.5 LTS. You can use any other Linux distribution of your choice. I will show the installation steps for Ubuntu, but for other Linux distributions the steps are not much different or more difficult. Many user guides are available on the internet. Windows users, I feel sorry for you, as Riemann and Graphite are not supported on Windows as of now.

Let's start with Riemann.

Riemann

As per http://riemann.io/, what is Riemann: "A network event stream processing system, in Clojure"

That is the theory. Let me break down the words and explain them from a developer's point of view.

Network: A server accessible on the network. (True: when you start Riemann, it opens a server port and listens on that port for events. But what is an event?)

Event: Something that happens in your application and has some data or metrics associated with it, which can be stored and analysed.
E.g. user response time for a DB-related operation, time to complete an ETL process, number of times an operation was performed, memory or CPU related metrics...

Stream processing: With Network and Event together, you now have a flow of events coming into Riemann. So you are definitely going to do something with that stream of events. We will write rules for processing the events.

Clojure: The event processing rules in Riemann have to be written as Clojure code in the riemann.config file.

Let's start with installation.

Riemann Installation

I will explain it very briefly as it is very simple!

1) Download the Riemann package of your flavour (.deb, .rpm, .tar) from http://riemann.io/
2) Install it. In my case I just had to install the .deb file and it was done, Riemann installed.
3) Let's download some utilities and the dashboard for Riemann. Ruby is needed for that.
    Run the commands below to install the dashboard and utilities.
     - sudo apt install ruby
     - sudo gem install riemann-client riemann-tools riemann-dash

The installation part is done for Riemann.

Starting Riemann and the Riemann dashboard

Let's start Riemann and riemann-dash.

1)  service riemann start  OR riemann OR riemann /etc/riemann/riemann.config
      You can start Riemann using any one of the above commands.

So you can see that the Riemann server has started listening on port 5555. This is the default port. You can override that setting in the riemann.config file.

2) Run command: riemann-dash

So the Riemann dashboard has started on port 4567. Let's open it in the browser.

Now press "Ctrl" and click on the area I have marked with a red circle in the above image.
Once it is selected, press "e" and a popup window appears. Fill in the values of that popup the same way I did in the image below.

"true" in the query section means display all the events coming into Riemann.
Once you click on "Apply", you can see some system events being sent to Riemann.

Now the next thing we are going to do is very important and interesting.

Riemann-Clients

We have the Riemann server up and running. Now we want to send our own user-defined events from our application to Riemann for processing. On the http://riemann.io/clients.html page you can see that Riemann already has client libraries for C, C++, C#, Clojure, Elixir, Erlang, Go, Java, Lua, Node.js, OCaml, Perl, Python, Ruby and Scala, and it also supports many other tools, programs and plugins which can be integrated with Riemann.

I am going to use riemann-java-client for this purpose and will send my own user-defined events from my Java program.

Pom file structure:
 <?xml version="1.0" encoding="UTF-8"?>  
 <project xmlns="http://maven.apache.org/POM/4.0.0"  
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"  
      xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">  
   <modelVersion>4.0.0</modelVersion>  
   <groupId>com.ali</groupId>  
   <artifactId>riemann</artifactId>  
   <version>1.0-SNAPSHOT</version>  
   <repositories>
     <repository>
       <id>clojars.org</id>
       <url>https://clojars.org/repo</url>
     </repository>
   </repositories>
   <dependencies>  
     <dependency>  
       <groupId>io.riemann</groupId>  
       <artifactId>riemann-java-client</artifactId>  
       <version>0.4.2</version>  
     </dependency>  
   </dependencies>  
 </project>  


I have copied the example Java class below from the riemann-java-client git homepage, with some modifications.
 import io.riemann.riemann.client.RiemannClient;  
   
 import java.io.IOException;  
   
   
 public class BasicEventTest {  
   public static void main(String... args) throws IOException, InterruptedException {  
     RiemannClient c = RiemannClient.tcp("localhost", 5555); // creating connection object 
     c.connect();  // connecting to riemann server
     int temperature = 1;  
     while (true) {  
       if (temperature > 10) temperature = 1;  
       Thread.sleep(1000);  
       c.event().                // creating event
           service("fridge").  
           state("running").  
           metric(temperature++).  
           tags("appliance", "cold").  
           send().              // sending event
           deref(5000, java.util.concurrent.TimeUnit.MILLISECONDS);  
     }  
   }  
 }  
   

Let me explain what we are doing here:

1) Creating a Riemann client which will connect to the specified Riemann server.
2) Creating an event, giving it a service name, state, metric value and tags for extra metadata, and sending it to the Riemann server.

Once you run this program, you can see the event on the Riemann dashboard.

You can see that the temperature we are sending as the metric value is associated with the service name "fridge". Once you click on the fridge service, it shows more details about it down below.
In our case the temperature value changes every second because our Java program sends a new metric value every 1 second.
The ttl value is 60 here, so each event will be present in Riemann for 60 seconds only. You can configure it in the riemann.config file.
Remember this point, as it is the main reason we need Graphite. In part 2 I will explain the need for and the benefits of Graphite.

This service and metric were just dummies. But you can think of any useful events and metrics inside your application and send them to Riemann for processing.
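
One such useful metric is the time an operation takes, like the DB or ETL timings mentioned earlier. Here is a minimal sketch using nothing more than System.nanoTime(). The class name Timed and the simulated workload are made up for illustration.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical helper: measures how long a task takes, in milliseconds.
final class Timed {

    static long millis(Runnable task) {
        long start = System.nanoTime();
        task.run();
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
    }

    public static void main(String[] args) {
        long elapsed = millis(() -> {
            try {
                Thread.sleep(50);   // stand-in for a DB call or an ETL step
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        System.out.println("operation took " + elapsed + " ms");
    }
}
```

The resulting number is what you would pass as the metric value of an event, under a service name of your choice.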

Let's see one more example of sending events from a Java application:
 import io.riemann.riemann.client.RiemannClient;  
   
 import java.io.IOException;  
 import java.lang.management.ManagementFactory;  
 import java.lang.management.MemoryMXBean;  
   
 public class BasicEventTest {  
   public static void main(String... args) throws IOException, InterruptedException {  
     RiemannClient c = RiemannClient.tcp("localhost", 5555);  
     c.connect();  
     while (true) {  
       Thread.sleep(1500);  
       MemoryMXBean memoryMXBean = ManagementFactory.getMemoryMXBean();  
       c.event().  
           service("jvm.nonheap.memory").  
           // state("running").   state is not needed here  
           metric(memoryMXBean.getNonHeapMemoryUsage().getUsed() / 1024).  
           tags("jvm nonheap used memory").  
           send().  
           deref(5000, java.util.concurrent.TimeUnit.MILLISECONDS);  
     }  
   }  
 }  
   

In this example we read the JVM's used non-heap memory (in KB) through the Memory MXBean and send its value to Riemann.

Similarly, we can send all JVM-related metrics such as JVM CPU, thread count, heap memory usage etc. using MXBeans.
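
As a rough sketch of that idea, the snippet below reads a few more values from the platform MXBeans that ship with the JDK. The class name JvmMetrics and the service-style key names are my own choices, and the system load average is used here only as a simple CPU-related reading.

```java
import java.lang.management.ManagementFactory;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: collects a few JVM metrics through the platform MXBeans.
final class JvmMetrics {

    // Keys are service-style names (made up here); values are the current readings.
    static Map<String, Double> snapshot() {
        Map<String, Double> metrics = new LinkedHashMap<>();
        // Used heap memory in KB.
        metrics.put("jvm.heap.memory",
                ManagementFactory.getMemoryMXBean().getHeapMemoryUsage().getUsed() / 1024.0);
        // Number of live threads, daemon and non-daemon.
        metrics.put("jvm.thread.count",
                (double) ManagementFactory.getThreadMXBean().getThreadCount());
        // System load average over the last minute; negative if the platform cannot provide it.
        metrics.put("system.load.average",
                ManagementFactory.getOperatingSystemMXBean().getSystemLoadAverage());
        return metrics;
    }

    public static void main(String[] args) {
        snapshot().forEach((service, value) -> System.out.println(service + " = " + value));
    }
}
```

Each entry of the map could then be sent as one event, with the key as the service name and the value as the metric, in the same loop style as the examples above.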

In this article we have discussed basic event sending from a Riemann client to the Riemann server and its basic display on the Riemann dashboard.

In part 2 we will see Riemann integration with Graphite and also discuss why it is needed.
In part 3 we will see some advanced stream processing examples in Riemann and a basic overview of collectd.

The diagram below depicts the architecture covered so far in this series.
After each article of this series I will update this diagram with the newly learned things.

That's it for now...

Part 2 is the next article in this series.

Please post your comments and questions!