Saturday 17 December 2016

Part3: Build your own monitoring system using Riemann,Graphite,Collectd.

In previous article , part2, we have discussed about Graphite integration with riemann. In this article I will give overview of Collectd and some advanced stream processing options in riemann.

Collectd Overview

Collectd is a daemon and gathers metrics from various sources, e.g. the operating system, applications, logfiles and external devices, and stores this information or makes it available over the network.

Collectd itself is a big item to discuss and there are lot of things you can achieve with it.But here I will discuss only the area of our interest. What we will do is we will tell collectd to send the metrics collected by it to the graphite server!!!!!!!!! pretty amazing right???

Collectd installation and Plugin concept

Download collectd from this link according to the flavour of your Linux distribution.

For my case steps are:

1) sudo apt-get install collectd 
2) service start collectd

Done. Your collectd is installed and running.
Now lets take a look at very important config file related to collectd.
In my case it is located at /etc/collectd/collectd.conf
If you will open this file you can observer some list of plugins and configuration related to each plugin.

In collectd we have concept of plugins. We need different types of plugin to fetch different types of metrics and doing monitoring activity.
  
Above is plugin to fetch cpu related info of the system on which collectd is running. We will se outcome of this plugin very soon on Graphite.

Now you must have figured out that if we want collectd to forward this metrics to graphite than we must be having some plugin for that. Ohhhh yeah!!!! your guess is right. We do have a plugin for it.





















two things we are doing here. First we are defining write_graphite plugin and second we are providing config for that plugin. The host name of graphite server is localhost as it is installed in same VM. All collectd related graphs will be rendered in graphite under prefix name we have set here. After adding graphite related plugin save the file and restart the collectd service.


Below is the outcome on Graphite dashboard for collectd:

























Here I am stopping my discussion for collectd and moving towards last and important section, riemann stream processing.

Riemann stream processing examples

I will show you few stream processing examples in riemann.

1) Send email based on service status

Below is the configuration for sending mail from Gmail. You can do similar things for your SMTP.
Add this configuration in your riemann.config file and restart riemann.
 (def email (mailer {:host "smtp.gmail.com"  
             :port 465  
             :ssl true  
             :tls true  
             :user "myaccout@gmail.com"  
             :pass "mypassword"  
             :from "myaccout@gmail.com"}))  
 (streams  
   (where (state "critical")  
    (email "xyz@gmail.com")))  

two things we are doing here.
1) Declaring email related configuration. This could be vary depending on the SMTP provider.
2) I am defining one stream rule such that if state of any service is critical than send out mail to some email id.

Lets send "critical" state from java code for our "fridge" service created in part1.
 RiemannClient c = RiemannClient.tcp("localhost", 5555);  
     c.connect();  
     c.event().  
         service("fridge").  
         state("critical").  
         metric(10).  
         tags("appliance", "cold").  
         send().  
         deref(5000, java.util.concurrent.TimeUnit.MILLISECONDS);  

Lets see the received mail in xyz@gmail.com













This the default mail template used by riemann. You can change the format and details of email. I am leaving that part for your assignment.


2) Email the exception 
Add  below stream processing rule in your riemann.config file.
 (streams  
   (where (service "exception-alert")  
    (email "xyz@gmail.com")))  

Lets send some exception from java code:
 RiemannClient c = RiemannClient.tcp("localhost", 5555);  
     c.connect();  
     try {  
       // some business logic  
       throw new NullPointerException("NullPointer exception in your system..Somebody will be in trouble!!! ");  
     } catch (Exception e) {  
       c.event().  
           service("exception-alert").  
           state(e.getLocalizedMessage()). // you can send full stacktrace also  
           tags("error", "exception", "failure").  
           send().  
           deref(5000, java.util.concurrent.TimeUnit.MILLISECONDS);  
     }  

Lets see the received mail in xyz@gmail.com:












what else you can do ?
1) Send email alert if some VM/service is down.
2) Filter and process stream depending on hostname,service name, metric value, service state, tag values etc... and perform some actions based on that.
3) You can set threshold values for metrics received and perform some actions if threshold value is crossed. e.g: VM cpu is very high, above 95%, some business specific constraint value is violated...

These are just few examples I have given. Check out the link I have posted at the end of article for riemann.

Below is the updated architecture diagram:



























Collectd daemon will send all system related generic metrics to Graphite.


In this three series of article I have just scratched the surface for this area. There are thousand different things and possibilities you can think and achieve with this monitoring framework.

Below are the useful links for different types of config,plugins and integration of other systems you can do with riemann,Graphite and collectd. I have just explained 3% of entire.Rest things you can add as per your need and use case of system.

Riemann:
http://riemann.io/clients.html
http://riemann.io/howto.html

Graphite:
https://graphiteapp.org/#integrations
http://graphite.readthedocs.io/en/latest/tools.html
http://grafana.org/

Collectd:
https://collectd.org/
https://collectd.org/wiki/index.php/Plugin

This is the last article of this series.
Hope you have enjoyed!!!

Please post your comments and doubts!!!

1 comment:

  1. Is there anyway collectd can send metrics data to graphite over TLS1.2?

    ReplyDelete