Grafana MySQL and InfluxDB annotations

What are annotations and why are they useful?

An annotation is a way to add a marker over a graph (panel). Depending on where you store the annotations, you can also mark both the start and the end of an event. These datasources support $start and $end in Grafana:

  1. Grafana internal datastore
  2. Elasticsearch
  3. MySQL

Why are they useful?

They are a simple way to highlight and visualize an action on a graph, for example somebody running a script against some element of the infrastructure. You can add tags and text. In the tags you can put a datacenter, a server, or a change ID, and in the text you can add a description and a hyperlink. This is very useful when an incident is caused by a change in the infrastructure, because you can see when the change started and finished on top of a metric (for example: successful logins). Obviously, you need a good source of truth.
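As a rough sketch of what such an annotation looks like when stored in Grafana's internal datastore, the snippet below builds a JSON body with a start, an end, tags and text. The field names (time, timeEnd, tags, text) follow Grafana's annotations HTTP API, which takes epoch milliseconds. The change window, tag values and link are hypothetical and only there for illustration.

```python
import json
from datetime import datetime, timezone


def to_epoch_ms(moment: datetime) -> int:
    """Convert a timezone-aware datetime to epoch milliseconds."""
    return int(moment.timestamp() * 1000)


def build_annotation(start, end, tags, text):
    """Build the JSON body for a region annotation (start and end)."""
    return {
        "time": to_epoch_ms(start),
        "timeEnd": to_epoch_ms(end),
        "tags": list(tags),
        "text": text,
    }


# Hypothetical change window, tags and link, for illustration only.
start = datetime(2020, 1, 1, 10, 0, tzinfo=timezone.utc)
end = datetime(2020, 1, 1, 10, 30, tzinfo=timezone.utc)
payload = build_annotation(
    start,
    end,
    tags=["dc1", "web01", "CHG-0001"],
    text='Kernel upgrade <a href="https://example.com/CHG-0001">CHG-0001</a>',
)
print(json.dumps(payload, indent=2))
```

A body like this could be sent with a POST to /api/annotations on the Grafana instance. Without a dashboard or panel reference it is stored as an organization-wide annotation that dashboards can then filter by tag.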

I’ve created a docker-compose environment and pushed it to GitHub:

https://github.com/sitereliabilityengineer/grafana_annotations_demo

If you check the file docker-compose.yml.all you will see that we have 3 Docker containers: Grafana, MySQL and InfluxDB.

To install everything, execute the bash script bootstap.sh. It will ask you for a password (used to connect to the databases) and replace it in all the files that need it. It takes 4-7 minutes, depending on your environment. It also provisions the datasources and two dashboards. If you want to know how the datasources and dashboards are provisioned, take a look at $PROJECT/grafana/dashboard.yml, datasource.yml, both .json files and the Dockerfile.

The script also populates both databases with some example datapoints and annotations, using these two scripts: populate_influxdb.py and populate_mysql.py

Grafana Access:

URL: http://localhost:3000

The first time you access Grafana you will need to log in with admin/admin (user/password) and then change the password.

There are two provisioned dashboards:

An example of both Dashboards:

MySQL Dashboard
InfluxDB Dashboard

To stop everything, execute docker-compose -f docker-compose.yml.all down at the PROJECT level.

If you want to delete everything, execute: removeAll.sh

This last script won’t delete the data inside $PROJECT/influxdb/data, so you will need to do that manually.

Please let me know if you have any questions.

Your site is blocked from a region and you don't know why? Maybe SNI related?

The other day at work, customers reported issues to us. They told us that they couldn't access their web instance. In all cases, the requests originated from the same country.

The first thing I asked for was a screenshot of the error, but it wasn't much help.

We also asked them to run telnet against the instance on port 443.

To my surprise, telnet worked well, but then...

With curl it didn't work. That pointed to something related to SSL.

We decided to sniff the traffic on the client side, on the load balancer, etc.

We could see RSTs right after the client's Client Hello. That suggests a problem with the handshake, maybe the ciphers, etc.

I don't have the load balancer .pcap files here, but as far as I remember, the load balancers also received RSTs. So what is sending the RSTs?

Let's examine the Client Hello inside the .pcap file.

Let's scroll down to the Server Name Indication extension. After the Server Name length field comes the server name itself (I deleted it in this case).

What is the Server Name Indication extension? Server Name Indication, often abbreviated SNI, is an extension to TLS that allows multiple hostnames to be served over HTTPS from the same IP address.

Let's try changing the request with curl or openssl so that the SNI is not sent.

Here you can see that it worked. Why? Let's open the new .pcap file.

In this new .pcap we can't see the SNI extension, and we can't see any RSTs either.
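To make the difference between the two captures concrete, here is a minimal sketch using Python's standard ssl module. It generates a Client Hello in memory (nothing is sent over the network) with and without a server name, and checks whether the hostname appears in clear text in the bytes, which is what a filtering device in the path can match on. The hostname blocked.example is a placeholder, not the real site.

```python
import ssl


def client_hello(server_hostname=None):
    """Return the raw bytes of the first TLS flight (the Client Hello).

    The handshake runs against in-memory buffers and stops as soon as
    it needs a reply from a server, so no packet leaves the machine.
    """
    ctx = ssl.create_default_context()
    if server_hostname is None:
        # Hostname verification needs a name, so disable it for the no-SNI case.
        ctx.check_hostname = False
    incoming, outgoing = ssl.MemoryBIO(), ssl.MemoryBIO()
    tls = ctx.wrap_bio(incoming, outgoing, server_hostname=server_hostname)
    try:
        tls.do_handshake()
    except ssl.SSLWantReadError:
        pass  # expected: there is no server to answer
    return outgoing.read()


name = b"blocked.example"  # hypothetical hostname
with_sni = client_hello(name.decode())
without_sni = client_hello(None)
print("hostname visible with SNI:   ", name in with_sni)
print("hostname visible without SNI:", name in without_sni)
```

The first Client Hello carries the hostname unencrypted inside the SNI extension, while the second one has no such extension, which matches what the two .pcap files show.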

Of course, if we had more than one site on that IP address this would be a mess, because without SNI we would get the first loaded certificate. In this case we only wanted to confirm that the problem was SNI filtering.

Here is a nice website for checking a site from different agents, in this case agents in China and other regions, to see the differences.

https://www.websitepulse.com/tools/china-firewall-test

SNI filtering is used very often by Internet providers to block access to torrent sites and the like. In this case it might be related to the Great Firewall (China).

New life in Dublin

I've been living in Dublin for 1 month!! This year alone I've lived in 3 different countries (Spain, the U.K., Ireland).

I'm very excited about this challenge. It's something everybody should do once in their life, but it's also very stressful...

I'm in a new role (SRE).

Hello Dublin!

This is a photo I shot a few days ago...

 

Networking Problem: I can’t connect to your service (tcp) failed: Connection timed out

Imagine that a friend is trying to connect to one of your services and he mentions that every attempt ends with “(tcp) failed: Connection timed out”.

The first thing I do is check whether I can connect to the service myself. Then I check whether the service is working properly. If it is, I go and check the firewall...

Wow, everything is open in iptables, so anybody can connect to that service. But I need to get back to my friend and tell him something, because he told me that he doesn’t have any rule that could block the connections.

First of all, I’m going to try to reproduce the problem.

I open the port, listening on a given IP:

nc -l 127.0.0.2 3000

 

Then I start sniffing:

tcpdump -vvv -s0 -i lo -w lo.pcap

 

With netcat I also try to connect to the service:

nc -v -z 127.0.0.2 3000
nc: connect to 127.0.0.2 port 3000 (tcp) failed: Connection timed out
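The same check that nc -v -z performs can be sketched in Python, which makes it easy to tell a timeout apart from a refused connection. This is only an illustration: the probe function and the throwaway local listener are hypothetical helpers, not part of the original test. A "refused" result means an RST came back, while a "timeout" means no usable reply arrived, which is the case described in this post.

```python
import socket


def probe(host, port, timeout=3.0):
    """Classify a TCP connection attempt as open, refused, or timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        return "refused"  # an RST came back: host reachable, port closed
    except (socket.timeout, TimeoutError):
        return "timeout"  # no usable reply: packets dropped somewhere


# Throwaway listener on an ephemeral local port, for demonstration only.
listener = socket.socket()
listener.bind(("127.0.0.1", 0))
listener.listen(1)
port = listener.getsockname()[1]

first = probe("127.0.0.1", port)   # something is listening
listener.close()
second = probe("127.0.0.1", port)  # nothing is listening any more
print(first, second)
```

Running probe against the service from the client's side and getting "timeout" instead of "refused" is a hint that packets are being dropped silently rather than rejected.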

 

And now I open the .pcap with Wireshark.

127.0.0.1 is my FRIEND'S IP and 127.0.0.2 is the service on port 3000.

Here we can see how my FRIEND/CLIENT sends me a SYN, but when I answer with the SYN,ACK the client sends me a retransmission of the SYN. This is where the loop starts, because I have to send him a SYN,ACK again.

My first thought: the origin is blocking the incoming SYN,ACK.

 

So I ask my friend for the RULES, and here they are:

 

Changing the limits on the fly

I had never done this before, but now that I know it works, I will do it more often.

To show how it works, I set the hard limit on the number of processes (nproc) for the user koteo to 10 in limits.conf.

With the prlimit command I display the limits of the first PID of user “koteo”, which match the limits in limits.conf.

Now I keep executing bash until I get the error “Cannot fork”.

I execute prlimit with the parameter --nproc=1024:1024 (soft:hard) and the parameter --pid $pid (we get $pid from pgrep). This changes both the soft and the hard limit to 1024, as you can see at the bottom of the next screenshot.
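The prlimit command is a front end to the prlimit() system call, which Python also exposes on Linux as resource.prlimit. The sketch below reads and sets RLIMIT_NPROC for a PID in the same way. To stay harmless it targets its own PID and writes back the values it just read, whereas the post uses a PID from pgrep and 1024:1024. Raising a hard limit, or touching another user's process, needs root privileges.

```python
import os
import resource


def get_nproc_limits(pid):
    """Return (soft, hard) RLIMIT_NPROC of a process, like `prlimit --pid`."""
    return resource.prlimit(pid, resource.RLIMIT_NPROC)


def set_nproc_limits(pid, soft, hard):
    """Change RLIMIT_NPROC of a running process and return the new values.

    Equivalent in spirit to `prlimit --nproc=soft:hard --pid pid`.
    """
    resource.prlimit(pid, resource.RLIMIT_NPROC, (soft, hard))
    return get_nproc_limits(pid)


pid = os.getpid()  # assumption: our own PID instead of one found with pgrep
soft, hard = get_nproc_limits(pid)
print("before:", soft, hard)
after = set_nproc_limits(pid, soft, hard)  # same values, so nothing changes
print("after: ", after)
```

A value of resource.RLIM_INFINITY (shown as -1) means the limit is unlimited.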

Now I can execute the bash command again, after the earlier “Cannot fork” error.

Here I show that we now actually have more than 10 processes for the koteo user.

 

If I’m not wrong, this works with kernels 2.6.36+, where the prlimit() system call was added.

I know that in a few kernel versions echo -n "Max processes=SOFT_L:HARD_L" > /proc/$PID/limits works, but not in the one that I have. It displays the error:

`write(2, ": Invalid argument", 18: Invalid argument) = 18`