Networking Problem: I can’t connect to your service (tcp) failed: Connection timed out

Imagine that a friend is trying to connect to one of your services, and he mentions that every attempt ends with a “(tcp) failed: Connection timed out”.

The first thing I do is check whether I can connect to the service myself, then whether the service is working properly, and if everything looks fine, I go and check the firewall…

Wow, everything is open in iptables, anybody can connect to that service, but I still have to give my friend an answer! And he told me he doesn’t have any rule that could block the connections either.

First of all, I’m going to try to simulate this problem.

I open the port, listening on the chosen IP (127.0.0.2):

nc -l 127.0.0.2 3000
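For the client to actually time out on loopback, something has to drop the returning SYN,ACK (otherwise the connection just succeeds). A rule like the following would do it (this is my assumption of the missing step; any rule that discards the SYN,ACK works):

# hypothetical rule for the simulation: drop the SYN,ACK coming back from 127.0.0.2:3000
iptables -A INPUT -i lo -p tcp -s 127.0.0.2 --sport 3000 --tcp-flags SYN,ACK SYN,ACK -j DROP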

Then I start sniffing:

tcpdump -vvv -s0 -i lo -w lo.pcap

With netcat I also try to connect to the service:

nc -v -z 127.0.0.2 3000
nc: connect to 127.0.0.2 port 3000 (tcp) failed: Connection timed out

And now I open the .pcap with Wireshark.

127.0.0.1 is my FRIEND’S IP and 127.0.0.2 is the service on port 3000.

Here we can see how my FRIEND/CLIENT sends me a SYN, but when I answer with the SYN,ACK, the client sends me a retransmission of the SYN, and this is where the loop starts, because I have to answer with a SYN,ACK again.
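You can see the same loop without opening Wireshark, reading the capture back from the command line:

tcpdump -nn -r lo.pcap 'tcp port 3000'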

The first thing I think: the origin (the client side) is blocking the incoming SYN,ACK.

So I ask my friend for the RULES and, sure enough, one of them is dropping the incoming SYN,ACK on his side.

Changing the limits on the fly

I had never done this before, but now that I know it works, I will do it more often.

To show how it works, I changed the hard limit on the number of processes (nproc) for the user koteo to 10 in limits.conf.

With the prlimit command I display the limits of the first PID of the user “koteo”, which match the values from limits.conf.

Now I spawn bash processes until I get the error “Cannot fork”.

I execute prlimit with the parameter --nproc=1024:1024 (soft:hard) and the parameter --pid $pid (we get $pid from pgrep). That changes the soft and hard limits to 1024, as you can see at the bottom of the next screenshot.
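Putting the whole thing together, the session looks roughly like this (the values are the ones from the test above):

# entry used in /etc/security/limits.conf for the test:
#   koteo  hard  nproc  10
pid=$(pgrep -u koteo | head -1)
prlimit --pid $pid --nproc            # display the current soft/hard nproc limits
prlimit --pid $pid --nproc=1024:1024  # change both limits to 1024, on the fly
pgrep -cu koteo                       # count koteo's processes afterwards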

Now, after the “Cannot fork” error, I can execute the bash command again.

Here I show that we actually have more than 10 processes for the koteo user.

If I’m not wrong, this started working with kernels 2.6.32+.

I know that on a few kernel versions `echo -n "Max processes=SOFT_L:HARD_L" > /proc/$PID/limits` works, but not on the one I have. It displays the error:

`write(2, “: Invalid argument”, 18: Invalid argument) = 18`

Query your webservices with a simple python script

Hello all!!

I usually need to check webservices: I check the HTTP code and the time it takes to give me back the result of the SOAP query. It only works over SSL (HTTPS).

To execute the code:

python ./query_webservice.py -file /tmp/file.xml -host ws.example.com -context /context/ws -soapv v1.1

You will need a valid file.xml and you must specify the SOAP version.
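If you don’t have one at hand, a minimal SOAP 1.1 envelope for file.xml looks roughly like this (the body payload depends entirely on your webservice, so this is just a skeleton):

<?xml version="1.0" encoding="utf-8"?>
<soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/">
  <soapenv:Body>
    <!-- your service-specific request goes here -->
  </soapenv:Body>
</soapenv:Envelope>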

The output result:

HTTP_CODE: 200 HEALTH: OK
Exec_Total_Time: 35 ms

#!/usr/bin/env python
#-*- coding: utf-8 -*-
# Koldo Oteo - (koteo [at] sitereliabilityengineer.io)
# December 18th 2017
import sys, time
import argparse
import httplib
import xml.dom.minidom

### Parse arguments
parser = argparse.ArgumentParser(description='Example:  ./query_webservice.py -file /tmp/file.xml \
                                 -host hostname.domain -context /context -soapv v1.1')
parser.add_argument('-file', action='store', dest='xml', required=True,
                    help='xml File Name')
parser.add_argument('-host', action='store', dest='host', required=True,
                    help='Webservice host')
parser.add_argument('-context', action='store', dest='context', required=True,
                    help='Webservice context')
parser.add_argument('-soapv', action='store', dest='soapv', required=True,
                    choices=['v1.1', 'v1.2'], help='SOAP version: v1.1 or v1.2')
# Print Parser Help
if len(sys.argv) == 1:
    parser.print_help()
    sys.exit(1)
param = parser.parse_args()
###

### FUNCTION TO READ XML FILE
def read_xml():
    """Return the contents of the xml file passed with -file"""
    with open(param.xml, 'r') as f:
        return f.read()

###

### FUNCTION TO POST XML TO WEBSERVICE
def post_xml(xmlmsg):
    """POST the xml message to the webservice over HTTPS"""
    # Pick the Content-type according to the SOAP version
    if param.soapv == "v1.2":
        content_type = "application/soap+xml"
    else:  # v1.1
        content_type = "text/xml"
    headers = {"Content-type": content_type,
               "Content-Length": "%d" % len(xmlmsg),
               "charset": "utf-8",
               "SOAPAction": "",
               "User-Agent": "PythonSOAPClient"}
    conn = httplib.HTTPSConnection(param.host)
    # Send the request with the xml payload as the body
    conn.request("POST", param.context, xmlmsg, headers)
    response = conn.getresponse()
    print "HTTP_CODE: %s  HEALTH: %s" % (response.status, response.reason)
    # Read the body (used by the commented pretty-printer below)
    data = response.read()
    #resultxml = xml.dom.minidom.parseString(data)
    #print (resultxml.toprettyxml())
    conn.close()

###
# READ XML FILE
xmlmsg = read_xml()

# GET EXECUTION TOTAL TIME AND POST XML
start_time = time.time()
post_xml(xmlmsg)
print("Exec_Total_Time: %s ms" % int(round((time.time() - start_time) * 1000)))


###

Some tips for finding performance issues in Linux

Sometimes we have trouble with processes that demand a lot of I/O. There’s a great tool for that: iotop.

I executed fio to generate some read stress:

mkdir -p /tmp/data ; fio --runtime=300 --time_based --name=random-read --rw=randread --size=128m --directory=/tmp/data

As you can see, fio is at the top of iotop’s view, the IO column is at 99%, and the DISK READ column shows the K/s.

If you don’t have iotop, you can do it with a little script. It’s not going to give you all the information that iotop shows, but it’s really useful.

Basically, this script shows the processes in “D” state. Processes waiting for I/O are commonly in “uninterruptible sleep” (“D”); with this information we can simply find the processes that are constantly in a wait state.

cd /proc ; for pid in [0-9]* ; do awk '$1 == "State:" && $2 == "D" {print "The process with PID: '$pid' is in D state"}' $pid/status ; done

You can see more detailed information with iostat.

iostat -xdy 2 5 (every 2 seconds, 5 reports; -x: display extended statistics, -d: display device utilization, -y: omit the first report, which contains statistics since system boot)

We see that %util is very high. It’s very useful to look at r/s and w/s (in this case the problem is the reads).

Sometimes we can reach the “open files” limit.

Error: “Too many open files (24)”

You can see the total number of open files on the system with a simple shell script or with lsof:

# for pid in /proc/[0-9]* ; do echo $(ls $pid/fd | wc -l) ; done | sort -n | awk '{ SUM += $1 } END { print SUM }'

# lsof -Xn -a -d ^mem -d ^cwd -d ^rtd -d ^txt -d ^DEL | wc -l

If you need to display the info for only one user, pass the argument -u $USER to lsof.
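For example, counting only the files opened by one user (koteo is just an example username):

# lsof -Xn -a -d ^mem -d ^cwd -d ^rtd -d ^txt -d ^DEL -u koteo | wc -l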

If you’ve exceeded the user limit, you will need to raise it:

* Changing the limit for the user: edit your .profile and add or change ulimit -Hn $NEW_LIMIT there, or set it in limits.conf.

* Or, if you need to make the change globally for all users, edit /etc/sysctl.conf and modify the value of fs.file-max = $NEW_GLOBAL_LIMIT.
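A concrete example (the numbers are placeholders I picked, not recommendations):

ulimit -Hn 65536                  # per user, in $HOME/.profile
# koteo  hard  nofile  65536     # equivalent entry in /etc/security/limits.conf
# fs.file-max = 2097152          # global value, in /etc/sysctl.conf
sysctl -p                         # reload /etc/sysctl.conf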

Where has all my disk space gone? (linux)

Sometimes I receive Nagios alerts about high usage of a filesystem. The first thing I do is a df -h, and after that a du -csh /directory (the directory I suspect could be guilty).

My surprise comes after the du: it tells me /directory is innocent!!! Let me show you an example.

With df we see 17 GB used.

The du displays 7.9 GB used! Something strange is going on.

I’m a bit absent-minded, so I start to du -csh /directory1, then /usr/local/directory200, then /directory3000, until I remember!! Maybe a file was deleted without being truncated first, and its file descriptor is still open??? Ohh, let’s see…

So I execute lsof -X | grep "(deleted)\|COMMAND" | more (grepping for COMMAND too keeps the header line) and I see a lot of stuff…

Now I see there are lots of deleted files using space. They are “deleted”, but the fd is still open. For example, the file .out00029 is using 54 MB. lsof displays the size in bytes, but we can do 56952801 / (1024.0 * 1024.0) ≈ 54 MB to convert.

Now I go to /proc/29937/fd (29937 is the PID of the process that has the fd open) and I do an ls:

This file has two file descriptors, still in use by PID 29937 (it’s a WebLogic server). In the lsof output you can see fds 1w and 2w, and in the ls you can see 2 and 1, just after the time. The w means the file descriptor is open for writing.
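The check is simply an ls over the process’ fd directory:

# ls -l /proc/29937/fd | grep '(deleted)'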

One trick to free that space is to truncate the fd directly. I do it this way, but there are other ways to do it:

# cd /proc/29937/fd
# :> 1   (truncate file descriptor 1)
# :> 2   (truncate file descriptor 2)
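Another way to do the same, without cd’ing into /proc (truncate follows the fd symlink to the deleted file):

# truncate -s 0 /proc/29937/fd/1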

I’ve created a .py that displays deleted files that are still in use (fd open), orders them by size, and shows the location of the fd so you can truncate it. Take care!! You have to know what you are doing, because you could wipe something important.

Here is a screenshot of the output of my script. Apologies for my bad code, I’m not a rock star programmer, but I try to do it ‘my’ best way.

And here is the location of the .py on my GitHub:

https://github.com/SiteReliabilityEngineering/sre/blob/master/deleted_files_fd_open
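If you only want the basic idea, a rough one-liner equivalent (my sketch, not the actual .py from the repo) would be:

# lsof -nP | awk '/\(deleted\)/ && $4 ~ /^[0-9]+[rwu]$/ {print $7, "/proc/" $2 "/fd/" substr($4, 1, length($4)-1)}' | sort -rn | head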

I hope it helps!!!