Sometimes I receive Nagios alerts reporting high usage on a filesystem. The first thing I do is run df -h, and after that du -csh /directory on whichever directory I suspect could be guilty.
My surprise came after the du: it told me /directory was innocent! Let me show you an example.
With df we see 17 GB used.
The du displays only 7.9 GB used! Something strange is happening.
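To put a number on that gap, here is a minimal Python sketch of the same comparison. The helper names du_bytes and df_used_bytes are my own, and it assumes Linux, where st_blocks is counted in 512-byte units. It only approximates what du and df report.

```python
import os
import shutil


def du_bytes(root):
    """Sum the allocated bytes under root, counting each inode once (roughly what du does)."""
    seen = set()
    total = 0

    def add(path):
        nonlocal total
        try:
            st = os.lstat(path)  # lstat: do not follow symlinks
        except OSError:
            return  # entry vanished or is not readable
        key = (st.st_dev, st.st_ino)
        if key not in seen:  # hard links share an inode, count it once
            seen.add(key)
            total += st.st_blocks * 512  # assumption: 512-byte units, as on Linux

    add(root)
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            add(os.path.join(dirpath, name))
    return total


def df_used_bytes(path):
    """Used bytes on the filesystem that holds path (the 'Used' column of df)."""
    return shutil.disk_usage(path).used
```

Calling df_used_bytes(mountpoint) - du_bytes(mountpoint) as root gives the space that du cannot find. Note that this sketch walks into nested mount points, so the difference is only meaningful when the mount point has no other filesystems mounted below it.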
I'm a bit absent-minded, so I start running du -csh /directory1, then /usr/local/directory200, then /directory3000, until I remember! Maybe a file was deleted without being truncated first, and its file descriptor is still open? Let's see...
So I execute 'lsof -X | grep "(deleted)\|COMMAND" | more' and I see a lot of stuff...
Now I see there are lots of deleted files using space. They are "deleted", but their fds are still open. For example, the file .out00029 is using 54 MB. lsof displays the size in bytes, but we can compute "56952801 / (1024.0 * 1024.0) ≈ 54.3" to get MB.
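The same conversion as a tiny Python helper, in case you want to apply it to many sizes at once. The function name bytes_to_mb is just an illustration.

```python
def bytes_to_mb(size_bytes):
    """Convert a byte count (as printed by lsof) to MB, using 1 MB = 1024 * 1024 bytes."""
    return size_bytes / (1024.0 * 1024.0)


print(f"{bytes_to_mb(56952801):.1f} MB")  # prints "54.3 MB"
```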
Now I go to /proc/29937/fd (29937 is the PID of the process that has the fd open) and run ls:
This file has two file descriptors, both still in use by PID 29937 (it's a WebLogic server). In the lsof output you can see fds 1w and 2w, and in the ls output you can see 2 and 1, just after the timestamp. The w means that the file descriptor is open for writing.
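If you want to check that access mode without lsof, the kernel exposes the open flags of every descriptor in /proc/<pid>/fdinfo/<fd>. A small sketch, assuming a Linux /proc; the helper name fd_access_mode is made up for this example:

```python
import os


def fd_access_mode(pid, fd):
    """Return 'r', 'w' or 'u' (read and write), the same letters lsof appends to the FD column."""
    with open(f"/proc/{pid}/fdinfo/{fd}") as info:
        for line in info:
            if line.startswith("flags:"):
                flags = int(line.split()[1], 8)  # the kernel prints the flags in octal
                mode = flags & os.O_ACCMODE      # keep only the access-mode bits
                letters = {os.O_RDONLY: "r", os.O_WRONLY: "w", os.O_RDWR: "u"}
                return letters.get(mode, "?")
    raise ValueError("no flags line found in fdinfo")
```

For the example above, fd_access_mode(29937, 1) should return "w" if it is run as root or as the owner of the process.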
One trick to free that space is to truncate the file through its fd directly. I do it this way, but there are other ways to do it:
# cd /proc/29937/fd and then # :> 1 (to truncate file descriptor 1) or # :> 2 (to truncate fd 2)
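For reference, the shell's ":> file" simply opens the file with O_TRUNC and closes it again. A rough Python equivalent could look like this (truncate_fd is a hypothetical helper name, and it needs the same permissions as the shell version):

```python
import os


def truncate_fd(pid, fd):
    """Truncate the file behind /proc/<pid>/fd/<fd> and return its size before truncation."""
    path = f"/proc/{pid}/fd/{fd}"
    size_before = os.stat(path).st_size  # stat follows the /proc link to the open inode
    # Opening with O_TRUNC cuts the file to zero length, like ":> file" in the shell
    handle = os.open(path, os.O_WRONLY | os.O_TRUNC)
    os.close(handle)
    return size_before
```

With the PID from the example, truncate_fd(29937, 1) would do the same as ":> 1". The process keeps its descriptor open and keeps writing, so if it did not open the file in append mode, the reported file size can jump back up on the next write even though the old blocks stay freed.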
I've created a .py that displays deleted files that are still in use (fd open), orders them by size and shows the location of each fd so you can truncate it. Take care! You have to know what you are doing, because you could destroy something important.
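To give an idea of how such a tool can work, here is a minimal sketch of the idea. It is not the script itself, just an illustration written under my own assumptions: it walks /proc on Linux, relies on the " (deleted)" suffix that the kernel adds to the link target, and uses a function name (find_deleted_open_files) invented for this example.

```python
import os


def find_deleted_open_files(proc_root="/proc"):
    """Return (size_bytes, fd_path, target) for every open fd whose file was deleted, biggest first."""
    results = []
    for pid in filter(str.isdigit, os.listdir(proc_root)):
        fd_dir = os.path.join(proc_root, pid, "fd")
        try:
            fds = os.listdir(fd_dir)
        except OSError:
            continue  # process exited, or we lack permission to look
        for fd in fds:
            fd_path = os.path.join(fd_dir, fd)
            try:
                target = os.readlink(fd_path)
                if not target.endswith(" (deleted)"):
                    continue
                # stat follows the /proc link to the still-open inode
                results.append((os.stat(fd_path).st_size, fd_path, target))
            except OSError:
                continue  # fd was closed while we were scanning
    return sorted(results, reverse=True)
```

Run it as root to see every process, then loop over the result and print each size next to its fd_path; that path is the one you would truncate. A file held open through several fds (like 1 and 2 above) shows up once per fd, so do not add the sizes up blindly.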
Here is a screenshot of the output of my script. Apologies for my bad code; I'm not a rock-star programmer, but I try to do it 'my' best way.
And here is the location of the .py on my GitHub:
I hope it helps!