FINDING WHAT IS EATING INODES ON A REALLY FULL SYSTEM
The solution is the following command, which we will also break down below. First we cd into the mountpoint of the culprit filesystem and run the command:
cd /c
find . -type f | cut -d "/" -f 2 | sort | uniq -c | sort -n
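Stage by stage, that pipeline does the following (same command, just annotated):
find . -type f |    # list every file under the current directory (paths look like ./dir/file)
  cut -d "/" -f 2 | # keep field 2, i.e. the top-level directory name
  sort |            # group identical directory names together
  uniq -c |         # collapse each group into "count name"
  sort -n           # order by count, so the biggest offender ends up last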
NOTE: if you have other filesystems mounted within /c (in this case), then include the -xdev option, or honestly just unmount them so they don't interfere. If you can't, this will help:
find . -xdev -type f | cut -d "/" -f 2 | sort | uniq -c | sort -n
You can use -xdev with all of the following examples as well; I will not show it again, just remember it goes between the "." and the "-type" arguments of the find command.
It finds the inode hogs like magic, but it can be too much for a system that has a lot of inodes.
If you have 137 million inodes, that is roughly 137 million files. If each full filename averages 64 bytes (an overestimate for most cases, but close enough for an order-of-magnitude estimate), then you are pushing about 8 GB of stream through every pipe |. That is a lot of memory. So it is better to process these one step at a time, as shown below, or pump the output of the "find . -type f" command to a remote system over ssh with compression and a fast cipher.
ONE BY ONE METHOD
####################
Before starting, enable as much swap as you can find; that is best, because this will be intense on the system: swapon /dev/md1
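If you do not have a spare swap partition like /dev/md1, a swap file works too. A minimal sketch (the 2G size and the /swapfile path are just assumptions, size it to your system):
fallocate -l 2G /swapfile   # reserve 2 GB (or: dd if=/dev/zero of=/swapfile bs=1M count=2048)
chmod 600 /swapfile         # swap files must not be world readable
mkswap /swapfile            # write the swap signature
swapon /swapfile            # enable it; verify with swapon -s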
cd /c
find . -type f > /tmp/step1
cat /tmp/step1 | cut -d "/" -f 2 > /tmp/step2
cat /tmp/step2 | sort > /tmp/step3
cat /tmp/step3 | uniq -c > /tmp/step4
cat /tmp/step4 | sort -n > /tmp/final.out
Or, for the last command, do the following so you can see the output on screen as well:
cat /tmp/step4 | sort -n | tee /tmp/final.out
Note: The tee command does exactly what it sounds like: it makes a T. It takes input from stdin (the output of the left side of the pipe) and writes it both to the screen (stdout) and to a file. That is useful because we don't want to just output to the screen (what if the screen explodes?); we need it saved to a nice file as well.
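A trivial illustration of tee:
echo "hello" | tee /tmp/demo.txt   # "hello" prints to the screen...
cat /tmp/demo.txt                  # ...and was saved to the file as well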
Then view the output of final.out:
cat /tmp/final.out
Disable the swap when you are done: swapoff /dev/md1
THROUGH ANOTHER SYSTEM VIA SSH
##################################
If the system has so many files that processing them will just be too much for your small system, then pump the output of find . -type f through ssh into another system (hopefully a much more powerful one).
The powerful system in this case has an ssh server on port 22 locally as usual, but through the firewall I use port 54321 (and forward that to the ssh machine's port 22). The public/WAN address is forge.somedomain.com. Note that if you forward port 22 straight to your ssh server, or your ssh server has a WAN IP on its interface, then you can drop the "-p 54321" part.
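If you will be doing this more than once, an entry in ~/.ssh/config on the damaged system saves you typing -p every time. A sketch using the host from this example:
# in ~/.ssh/config
Host forge.somedomain.com
    Port 54321
After that, a plain ssh forge.somedomain.com connects on port 54321 automatically.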
ON THE SYSTEM DAMAGED WITH FILES:
The following command is generic.
GENERIC FORM: find . -type f | ssh [remote-host] -p [ssh-port] "cat - > [remote-file]"
find . -type f | ssh forge.somedomain.com -p 54321 "cat - > /tmp/step1.txt"
The following command adds compression (-C) and faster encryption than the default (-c arcfour,blowfish-cbc):
GENERIC FORM: find . -type f | ssh [remote-host] -c arcfour,blowfish-cbc -C -p [ssh-port] "cat - > [remote-file]"
find . -type f | ssh forge.somedomain.com -c arcfour,blowfish-cbc -C -p 54321 "cat - > /tmp/step1.txt"
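Note that newer OpenSSH releases have removed arcfour and blowfish-cbc entirely. If your build rejects them, the same idea works with a modern fast cipher (assuming your OpenSSH supports it):
find . -type f | ssh forge.somedomain.com -c chacha20-poly1305@openssh.com -C -p 54321 "cat - > /tmp/step1.txt"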
Then log in to the remote host and do the following.
ON THE MORE POWERFUL SYSTEM:
Then you have 2 options. I recommend the latter, since this is a powerful machine:
cat /tmp/step1.txt | cut -d "/" -f 2 > /tmp/step2
cat /tmp/step2 | sort > /tmp/step3
cat /tmp/step3 | uniq -c > /tmp/step4
cat /tmp/step4 | sort -n > /tmp/final.out
Or, if you want, you can just run the following command and wait for the output:
cat /tmp/step1.txt | cut -d "/" -f 2 | sort | uniq -c | sort -n | tee /tmp/final.out
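One more trick if even the powerful machine groans under the sort: GNU sort can be given a bigger memory buffer and pointed at a big scratch directory for its temp files (both flags are standard coreutils; the /scratch path is just an assumption):
cat /tmp/step1.txt | cut -d "/" -f 2 | sort -S 50% -T /scratch | uniq -c | sort -n | tee /tmp/final.out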
===SOME CALCULATIONS===
Before you do all this, it is nice to do some quick estimating and quick calculations. This will show you how to calculate the minimum time the ssh command will take, assuming you skip compression of course. With compression it is hard to estimate the minimum, since the compression can be really strong.
Do the following command:
df -i
And see how many used inodes there are. Each used inode is a directory or a file, most likely a file.
(I just realized something magnificent, but I won't mention it till the end.)
Also, on an ext filesystem, you can run:
dumpe2fs -h /dev/c/c
and read the used inode count from the superblock summary (-h keeps the output short; used inodes = Inode count minus Free inodes). Let's make some assumptions.
I am going to assume each full filename is about this long on average: 64 characters, which is 64 bytes:
/c/data/whocares/isuredo/why/because/isawesome/but you might.txt
So, with 64 bytes per file:
[USED INODES] * 64 bytes = step1 file size
This gives you an idea of how big the stream we are working with is.
The step1.txt file will be pretty big.
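If you want the shell to do the multiplication for you, this sketch pulls the IUsed column straight out of df -i (assuming /c is the filesystem in question):
USED=$(df -i /c | awk 'NR==2 {print $3}')   # IUsed is the third column
echo "$USED inodes * 64 bytes = roughly $(( USED * 64 / 1000000 )) MB of stream"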
Example:
137 million files
64 bytes per file
That is about 8 GB for the step1 file; step2 and step3 still have 137 million lines (just shorter ones), and after the uniq command it gets much smaller.
The point is, you can calculate the approximate minimum completion time of the ssh command by taking the transfer speed of the ssh stream and doing some division. That is for the generic ssh command without compression; with stream compression it will take less time.
Example:
Let's say I know my transfer goes at 1 MB/sec, with 137 million inodes ~ 137 million files ~ an 8 GB stream. That will take about:
8 GB divided by 1 MB/sec
Or go to wolframalpha.com (bookmark it; it is the best calculator for IT-type calculations)
And type in the search box: (137000000 * 64 bytes) / (1 Megabyte/second)
The result is about 2.4 hours.
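The same arithmetic in plain shell, if you are offline:
echo $(( 137000000 * 64 / 1000000 ))   # 8768 seconds at 1 MB/sec
# 8768 / 3600 is about 2.4 hours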
Now of course with compression it can be faster.
The real question here is: how did I get that 1 MB/sec transfer speed?
===TO GET THE TRANSFER SPEED OF THE SSH COPY===
Go to the remote-host and do the following:
apt-get update
apt-get install iftop
ifconfig
Run ifconfig a couple of times and notice which interface has an increasing packet count; that is your main interface. Let's assume it is eth0.
iftop -nNBP -i eth0, or simply iftop -i eth0, works (-n for no hostname lookups, -N for no port name lookups, -B for BYTES output instead of the default BITS, and -P to show port numbers)
That should give you an idea of how fast the ssh (port 22) command is receiving the stream, which you can use for your order-of-magnitude time-to-completion estimate.
Here is a quick guide on how to understand it:
http://www.thegeekstuff.com/2008/12/iftop-guide-display-network-interface-bandwidth-usage-on-linux/
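Another option, if you would rather measure the pipe itself than the interface: the pv tool sits in the middle of a pipeline and prints live throughput (apt-get install pv; run this on the damaged system):
find . -type f | pv | ssh forge.somedomain.com -p 54321 "cat - > /tmp/step1.txt"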
===WHAT I REALIZED EARLIER THAT I WILL SAY NOW===
Up to this point every article said inodes fill up because of too many files; well, directories use them up too. Sure, not as much, but it is possible. It is safe to assume almost every directory has a file here or there, and it is not likely that over 50% of the inodes are directories. But if they are, these commands will not show it, and they would need to be redone using "find . -type d", plus a few other changes; see the sketch below.
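If you suspect directories, a quick sketch along the same lines: drop the -type f so both files and directories get counted, or count directories alone:
find . | cut -d "/" -f 2 | sort | uniq -c | sort -n            # files AND directories
find . -type d | cut -d "/" -f 2 | sort | uniq -c | sort -n    # directories only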
Anyhow, I doubt that directories would ever be the culprit, unless there is some sort of virus doing that.