Grafana MySQL and InfluxDB annotations

What are annotations and why are they useful?

Annotations are a way to add a marker over a graph (panel). Depending on where you store the annotations, you have the possibility to mark both the start and the end of an event. These datasources support $start and $end in Grafana:

  1. Grafana internal datastore
  2. Elasticsearch
  3. MySQL

Why are they useful?

They are a simple way of highlighting and visualizing an action over a graph. For example: somebody runs a script against some element of the infrastructure. You can add tags and text: in the tags a datacenter, a server, or a change ID; in the text a description and a hyperlink. This is very useful when you have an incident caused by a change in the infrastructure, because you will see on a metric (for example, successful logins) when the change started and finished. Obviously, you need a good source of truth.
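Annotations don't have to be created by hand: a script (for example, your change pipeline) can push them over Grafana's HTTP annotations API. A minimal sketch, assuming a recent Grafana at localhost:3000 and a hypothetical API key; the tag and text values are made up for the example:

```python
import json
import time
import urllib.request

def region_annotation(text, tags, start_ms, end_ms):
    """Build the JSON body for a region annotation (start and end)."""
    return {
        "time": start_ms,     # start, epoch milliseconds
        "timeEnd": end_ms,    # end, epoch milliseconds
        "tags": tags,         # e.g. a datacenter, server or change ID
        "text": text,         # description, may contain a hyperlink
    }

def post_annotation(payload, base_url="http://localhost:3000", api_key="CHANGEME"):
    """POST the annotation to Grafana (needs a running Grafana and a real API key)."""
    req = urllib.request.Request(
        base_url + "/api/annotations",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json",
                 "Authorization": "Bearer " + api_key},
    )
    return urllib.request.urlopen(req)

# Mark a change that started a minute ago and just finished:
now = int(time.time() * 1000)
payload = region_annotation("change CHG0012345 finished", ["dc1", "deploy"],
                            now - 60000, now)
print(payload["tags"])
```

In a change pipeline you would call post_annotation(payload) when the change ends, so the region shows up on every panel that queries those tags.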

I’ve created a docker-compose environment and pushed it to GitHub:

https://github.com/sitereliabilityengineer/grafana_annotations_demo

If you check the file docker-compose.yml.all you will see that we have three Docker containers: Grafana, MySQL and InfluxDB.

To install everything, execute the bash script bootstap.sh. It will ask you for a password (to connect to the databases) and it will substitute the password in all the needed files. It takes 4-7 minutes depending on your environment. It also provisions the datasources and two dashboards. If you want to know how the datasources and dashboards are provisioned, take a look at $PROJECT/grafana/dashboard.yml, datasource.yml, both .json files and the Dockerfile.

The previous script will also populate both databases with some example datapoints and annotations, using these two scripts: populate_influxdb.py and populate_mysql.py

Grafana Access:

URL: http://localhost:3000

The first time you access Grafana you will need to log in with admin/admin (user/password) and then change the password.

There are two provisioned dashboards:

  1. MySQL Dashboard
  2. InfluxDB Dashboard

To stop everything, execute docker-compose -f docker-compose.yml.all down at the $PROJECT level.

If you want to delete everything, execute removeAll.sh

This last script won’t delete the data inside $PROJECT/influxdb/data; you will need to do that manually.

Please let me know if you have any questions.

Networking Problem: I can’t connect to your service (tcp) failed: Connection timed out

Imagine that a friend is trying to connect to one of your services, and he mentions that when he tries to connect, it eventually fails with “(tcp) failed: Connection timed out”.

The first thing I do is check whether I can connect to the service myself. Then I check that the service is working properly, and if it is, I go and check the firewall…

Wow, everything is open in iptables; anybody should be able to connect to that service. But I still have to tell my friend something, because he told me that he doesn’t have any rule that could block the connections either.

First of all, I’m going to try to reproduce the problem.

I open a port listening on a specific IP:

nc -l 127.0.0.2 3000

Then I start sniffing:

tcpdump -vvv -s0 -i lo -w lo.pcap

With netcat I also try to connect to the service:

nc -v -z 127.0.0.2 3000
nc: connect to 127.0.0.2 port 3000 (tcp) failed: Connection timed out

And now I open the .pcap with Wireshark.

In this capture, 127.0.0.1 is my FRIEND’s IP and 127.0.0.2 is the service on port 3000.

Here we can see how my FRIEND/CLIENT sends me a SYN, but when I answer with the SYN,ACK, the client sends me a retransmission of the SYN. This is where the loop starts, because I have to send him a SYN,ACK again.
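The client-side error is itself a clue: a reachable port with nothing listening answers the SYN with a RST, and the client sees “Connection refused” immediately; a SYN or SYN,ACK silently dropped by a firewall gives no answer at all and ends, much later, in “Connection timed out”. A small Python sketch of that check (only the “refused” case can be reproduced without a firewall rule):

```python
import socket

def try_connect(host, port, timeout=2):
    """Attempt a TCP connect and report how it fails (if it does)."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.settimeout(timeout)
    try:
        s.connect((host, port))
        return "connected"
    except ConnectionRefusedError:
        # The host answered our SYN with a RST: reachable, nothing listening.
        return "refused"
    except OSError as exc:
        # No answer at all (SYN or SYN,ACK dropped by a firewall) ends here
        # with "timed out"; socket.timeout is a subclass of OSError.
        return "failed: %s" % exc
    finally:
        s.close()

# Find a local port that is almost certainly closed: bind an ephemeral
# port, note its number, then release it before connecting to it.
probe = socket.socket()
probe.bind(("127.0.0.1", 0))
closed_port = probe.getsockname()[1]
probe.close()

print(try_connect("127.0.0.1", closed_port))
```

A “refused” against the same host and port rules out a dropped packet; a long wait followed by “timed out” is exactly the symptom my friend reported.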

My first thought: the origin is blocking the incoming SYN,ACK.

So I ask my friend for his firewall rules to confirm it.

Changing the limits on the fly

I had never done this before, but now that I know it works, I will do it more often.

To show how it works, I changed the hard limit on the number of processes (nproc) for the user koteo to 10 in limits.conf.

With the prlimit command I display the limits of the first PID of the user koteo, and they match the values in limits.conf.

Now I execute bash until I get the error “Cannot fork”.

I execute prlimit with the parameter --nproc=1024:1024 (soft:hard) and the parameter --pid $pid (we get $pid from pgrep). We have just changed the soft and hard limits to 1024.

Now I can execute the bash command again after the “Cannot fork” error.

Here I show that we actually have more than 10 processes for the koteo user.

If I’m not wrong, this started working with the prlimit() syscall introduced in kernel 2.6.36.
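The same syscall is exposed in Python (3.4+, Linux only) as resource.prlimit, so a script can read or change another process's limits too. A minimal sketch against the calling process itself (pid 0), which only re-applies the current values and therefore needs no special privileges:

```python
import resource

# pid 0 means "the calling process"; with enough privileges (or the
# same uid) any other pid can be passed here, like prlimit --pid does.
soft, hard = resource.prlimit(0, resource.RLIMIT_NPROC)
print("nproc soft=%d hard=%d" % (soft, hard))

# Re-apply the same pair. Setting values at or below the current hard
# limit is always allowed; raising the hard limit of another process
# would require CAP_SYS_RESOURCE, just as with the prlimit command.
resource.prlimit(0, resource.RLIMIT_NPROC, (soft, hard))
```

This is handy when the process you need to fix is a long-running daemon you would rather not restart just to pick up a new limits.conf.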

I know that in a few kernel versions echo -n “Max processes=SOFT_L:HARD_L” > /proc/$PID/limits works, but not in the one that I have. It displays the error:

`write(2, ": Invalid argument", 18) = 18`

Query your webservices with a simple python script

Hello all!!

I usually need to check webservices. I check the HTTP code and the time it takes to give me back the result of the SOAP query. It only works over HTTPS.

To execute the code:

python ./query_webservice.py -file /tmp/file.xml -host ws.example.com -context /context/ws -soapv v1.1

You will need a valid file.xml, and you must specify the SOAP version.

The output result:

HTTP_CODE: 200 HEALTH: OK
Exec_Total_Time: 35 ms

#!/usr/bin/env python
#-*- coding: utf-8 -*-
# Koldo Oteo - (koteo [at] sitereliabilityengineer.io)
# December 18th 2017
import sys
import time
import argparse
import http.client
import xml.dom.minidom

### Parse arguments
parser = argparse.ArgumentParser(description='Example:  ./query_webservice.py -file /tmp/file.xml \
                                 -host hostname.domain -context /context -soapv v1.1')
parser.add_argument('-file', action='store', dest='xml',
                    help='XML file name')
parser.add_argument('-host', action='store', dest='host',
                    help='Webservice host')
parser.add_argument('-context', action='store', dest='context',
                    help='Webservice context')
parser.add_argument('-soapv', action='store', dest='soapv',
                    help='SOAP version: v1.1 or v1.2')
# Print parser help when called without arguments
if len(sys.argv) == 1:
    parser.print_help()
    sys.exit(1)
param = parser.parse_args()
###

### FUNCTION TO read the XML file
def read_xml():
    with open(param.xml, 'r') as f:
        return f.read()
###

### FUNCTION TO POST THE XML TO THE WEBSERVICE
def post_xml(xmlmsg):
    """HTTPS XML POST request"""
    if param.soapv == "v1.2":
        content_type = "application/soap+xml"
    elif param.soapv == "v1.1":
        content_type = "text/xml"
    else:
        sys.exit("Unknown SOAP version: %s (expected v1.1 or v1.2)" % param.soapv)
    headers = {"Content-type": content_type,
               "Content-Length": "%d" % len(xmlmsg),
               "charset": "utf-8",
               "SOAPAction": "",
               "User-Agent": "PythonSOAPClient"}
    conn = http.client.HTTPSConnection(param.host)
    # Send the request line, the headers and the XML body in one call
    conn.request("POST", param.context, xmlmsg, headers)
    response = conn.getresponse()
    print("HTTP_CODE: %s  HEALTH: %s" % (response.status, response.reason))
    data = response.read()
    # Uncomment to pretty-print the SOAP response:
    #print(xml.dom.minidom.parseString(data).toprettyxml())
    conn.close()
###

# READ XML FILE
xmlmsg = read_xml()

# GET EXECUTION TOTAL TIME AND POST XML
start_time = time.time()
post_xml(xmlmsg)
print("Exec_Total_Time: %s ms" % int(round((time.time() - start_time) * 1000)))

Some tips for finding performance issues in Linux

Sometimes we have trouble with processes that demand a lot of IO. There's a great tool for that: iotop.

I executed fio to generate some read stress:

mkdir -p /tmp/data ; fio --runtime=300 --time_based --name=random-read --rw=randread --size=128m --directory=/tmp/data

As you can see, fio is at the top of iotop's view and IO is at 99%, displaying the DISK READ in K/s.

If you don't have iotop, you can do it with a little script. It's not going to give you all the information that iotop shows, but it is still useful.

Basically, this script shows the processes in "D" state. Processes that are waiting for I/O are commonly in an "uninterruptible sleep" state, shown as "D"; given this information, we can simply find the processes that are constantly in a wait state.

cd /proc ; for pid in [0-9]* ; do awk '$1 == "State:" && $2 == "D" {print "The process with PID: '${pid}' is in D state"}' $pid/status ; done
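The same check can be sketched in Python, walking /proc and reading the State: line of each process (Linux only; processes can vanish mid-scan, so read errors are skipped):

```python
import os

def d_state_pids():
    """Return the PIDs whose state in /proc/<pid>/status is D
    (uninterruptible sleep, usually waiting for I/O)."""
    pids = []
    for pid in os.listdir("/proc"):
        if not pid.isdigit():
            continue
        try:
            with open("/proc/%s/status" % pid) as f:
                for line in f:
                    if line.startswith("State:"):
                        if line.split()[1] == "D":
                            pids.append(int(pid))
                        break
        except OSError:
            # The process exited mid-scan or is not readable: skip it.
            continue
    return pids

print(d_state_pids())
```

Run it a few times in a row: a PID that shows up on every pass is the one stuck waiting for I/O.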

You can see some detailed information with iostat.

iostat -xdy 2 5 (x = display extended statistics, d = display the device utilization, y = omit the first report with statistics since system boot)

We see that %util is very high. It's very useful to look at r/s and w/s (in this case the problem is the reads).

Sometimes we can reach the limit of "open files".

Error: “Too many open files (24)”

You can see the total number of open files in the system with a simple shell script or with lsof:

# for pid in /proc/[0-9]* ; do echo $(ls $pid/fd | wc -l) ; done | sort -n | awk '{ SUM += $1 } END { print SUM }'

# lsof -Xn -a -d ^mem -d ^cwd -d ^rtd -d ^txt -d ^DEL | wc -l

If you need to display the info for only one user, pass the argument -u $USER to lsof.
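The shell one-liner above can also be sketched in Python, summing the entries of every /proc/<pid>/fd directory (Linux only; as non-root you can't read other users' fd directories, so those are skipped and the count covers only your own processes):

```python
import glob
import os

def total_open_fds():
    """Sum open file descriptors across all processes by counting the
    entries of each /proc/<pid>/fd directory."""
    total = 0
    for fd_dir in glob.glob("/proc/[0-9]*/fd"):
        try:
            total += len(os.listdir(fd_dir))
        except OSError:
            # Process gone, or not ours and we are not root: skip it.
            continue
    return total

print("open fds:", total_open_fds())
```

Comparing this number against the limits below tells you how close the system (or a user) is to "Too many open files".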

If you’ve gone over the user limit, you will need to change it:

* For a single user: ulimit -Hn $NEW_LIMIT (add it to $HOME/.profile), or change the nofile entry in /etc/security/limits.conf.

* Or, if you need to make this change globally for all users, edit /etc/sysctl.conf and modify the value of fs.file-max = $NEW_GLOBAL_LIMIT.