FIND MOST FRAGMENTED FILES – mostfragged – using filefrag to analyze file fragmentation

The scripts you're looking for are at the bottom (and also in the gist link in the 9/5/2022 update); the rest is explanation.

UPDATES TO MOSTFRAGGED SCRIPT

UPDATE 9/5/2022: 7 years later, I implemented this on my ReadyNAS (yes, I still have one). I tweaked some of the scripts, corrected minor errors, and added plotting ability via matplotlib.

  • All of the scripts are here: https://gist.github.com/bhbmaster/e6fd83df91e60d627f272e16b8745c8c
  • Fragmentation plot output from my own NAS (should be updated at least twice daily as of 2022-09-05): plots and table 

UPDATE 5/14/2015: added ReadyNAS version script at the very bottom. Also added fragmentation percentage to the STATS files & run output. Check out my own personal NAS's fragmentation being plotted every day; I did that using these bash and python scripts (read through the article and check out "Example 2 plotting fragmentation"). Also uploaded the scripts to /scripts so that you can "wget" them: mostfragged (use on any server that has the filefrag command – analyzes every file) and mostfragged-for-readynas-os6 (use on ReadyNAS OS6 devices – analyzes every file but skips snapshots)

UPDATE 10/13/2014: clearing up outputs and added FOOTER

UPDATE 10/10/2014: clearing up outputs

UPDATE 10/9/2014: verbose and silent find & less bogus error output. Also added all sorts of new outputs and better output comments. Also added the ability to dump to a different directory with a 2nd argument.

UPDATE 8/21/2014: added to the script a way to skip analyzing snapshot folders (just read the "Pick a find" comment section in the script)

UPDATE 8/20/2014: works with files that have spaces. Dumping all results to the same folder & better usability.

UPDATE 5/27/2014: filename outputs are better now in the script (they make more sense)

REQUIREMENTS FOR SCRIPT: filefrag, awk, sed, echo, cat (the typical), readlink

Each file can be fragmented. A non-fragmented file has 1 extent – just the one extent that contains the whole file. An extent is a start and stop location on the storage media where the filesystem lives; everything between the start and stop location is data that belongs to the file.
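You can check any single file's extent count with filefrag directly. A quick hedged example (the path is made up, and the exact output wording varies a little between e2fsprogs versions):

# count the extents of one file
filefrag /data/somefile.mkv
# example output: /data/somefile.mkv: 3 extents found

# -v additionally prints the full extent table (logical & physical offsets of each extent)
filefrag -v /data/somefile.mkv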

Some filesystems fragment more than others, like the COW (copy-on-write) filesystems, which tend to not overwrite old data in place, so new data ends up all over the place. Ex: BTRFS and ZFS. With BTRFS you can mark files that you want to stay non-fragmented with the NODATACOW attribute (check out this article: NODATACOW BTRFS). This is recommended for files that require fast IO (like vmdks, VM disk files, etc.)
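Here is a minimal sketch of setting NODATACOW with chattr (the paths are made up). Note the +C attribute only takes effect on new, empty files, so set it on the directory before creating the files; new files inside then inherit it:

# make a directory for VM disks and disable COW on it
mkdir -p /data/vms
chattr +C /data/vms
lsattr -d /data/vms      # the 'C' flag should now show

# files created inside afterwards inherit NODATACOW
touch /data/vms/disk.img
lsattr /data/vms/disk.img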

NOTE: If you have a BTRFS filesystem that has a lot of fragmentation in a folder and you need to unfragment it, try uncow.py (here is my script that runs uncow.py on every file & folder in the current dir, fixing fragmentation: UNCOW) <- this is like a makeshift defragmenter that works file by file (a regular defragmenter would work block by block, but this works file by file)
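Aside from uncow.py, btrfs-progs also ships its own file-by-file defragmenter; this is a different tool than UNCOW, shown here only as a hedged alternative (the path is made up):

# recursively defragment a folder with the built-in btrfs defragmenter
btrfs filesystem defragment -r -v /data/fragged-folder

(Caveat: on snapshotted volumes, defragmenting can break reflink/snapshot sharing and increase used space.)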

==========
EXAMPLE
==========

To understand extents, let's look at this kitty.txt file.

Examine the file:
# cat kitty.txt
This cat of mine likes soup

# filefrag kitty.txt

Let's say it was 1 extent
It could be like this:
* First and only extent: This cat of mine likes soup
The first and only extent could be at the end of the disk (or wherever). The contents of the entire file would be read from about the same location, so theoretically the drive spindle could pick up all the data in one fell swoop.

# filefrag kitty.txt

Let's say it was 2 extents
It could be like this:
* First extent: This cat of
* Second extent: mine likes soup
The first extent could be at the end of the disk, and second extent at the front of the disk. This will cause the disk spindle to go back and forth.

Thus the kitty.txt file with 2 extents is more fragmented. In short:

More fragmented = more extents

Density of fragmentation measures how many fragments (extents) there are per megabyte on average. That's what the below scripts compute.
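For example, a 100 MiB file in 400 extents has 400/100 = 4 e/MiB, while the same-size file in 2 extents has only 0.02 e/MiB. Here is that arithmetic as a throwaway awk one-liner (the numbers are made up):

# bytes and extents -> extents per MiB (104857600 bytes = 100 MiB)
echo "104857600 400" | awk '{print $2/($1/1024/1024) " e/MiB"}'
# prints: 4 e/MiB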

Let's find out which files have the most extents per megabyte (i.e., which are the most fragmented):

====================
INTERACTIVE EXAMPLE
====================

Unfortunately this interactive example doesn't work well on files with spaces in the name. However, the script at the bottom works great with files that have spaces (the issue was fixed in the 8/20/2014 update)

# Let's say all of my data is located at /data
LOCATION=/data

# DUMP ALL RESULTS HERE
cd /root

# DUMP ALL OF THE INFORMATION RELATING TO EXTENTS PER FILE - WITH "FIND" WE ITERATE THRU EACH FILE AND RUN "FILEFRAG"
find $LOCATION -type f -printf "%s - " -exec filefrag {} \; > /root/extents.txt

# SORT BY SIZE AND EXTENTS (we need the extents one)
cat extents.txt | sort -nk1 > sortbysize.txt
cat extents.txt | sort -nk4 > sortbyextents.txt

# CAT sortbyextents.txt OR sortbysize.txt - DOESNT MATTER WHICH ONE (same lines, different order)
cat sortbyextents.txt | awk '{print "B: " $1, " e: " $4, " B/e: " ($1/$4), ": " $3}'
# Save the output:
cat sortbyextents.txt | awk '{print "B: " $1, " e: " $4, " B/e: " ($1/$4), ": " $3}' > bytes-per-extent.txt

cat sortbyextents.txt | awk '{print "M: " $1/1024/1024, " e: " $4, " e/M: " ($4/($1/1024/1024)), ": " $3}' > extents-per-mb.txt

# THE 6TH COLUMN IS E/M WHICH IS THE EXTENTS PER MEGABYTE, LETS SORT BY IT
cat extents-per-mb.txt | sort -nk6

# OUTPUT OF THAT (I SNIPPED SOME OF THE IN-BETWEEN STUFF TO SHOW YOU THE MOST AND LEAST E/M):

# LEGEND: M is size in megabytes (base 2, mebibytes MiB), e is filefrag extents, and e/M is extents per megabyte (base 2, mebibytes MiB)

M: 15214.2 e: 5971 e/M: 0.392462 : /Volume_2/Volume_2/run/video/Camera8/full/2014-03-15.dat: read speed test at 70MB/s
M: 15032.8 e: 6902 e/M: 0.459128 : /Volume_1/Volume_1/run/video/Camera2/full/2014-02-24.dat:
M: 14601.5 e: 6786 e/M: 0.464747 : /Volume_2/Volume_2/run/video/Camera5/full/2014-03-17.dat:
M: 15035.8 e: 7043 e/M: 0.468414 : /Volume_1/Volume_1/run/video/Camera4/full/2014-02-24.dat:
M: 14640.8 e: 6881 e/M: 0.469989 : /Volume_2/Volume_2/run/video/Camera6/full/2014-03-17.dat:
M: 15031.8 e: 7180 e/M: 0.477653 : /Volume_1/Volume_1/run/video/Camera3/full/2014-02-24.dat:
M: 15032 e: 7233 e/M: 0.481172 : /Volume_1/Volume_1/run/video/Camera9/full/2014-02-24.dat:
...snip....
M: 583.268 e: 8941 e/M: 15.3291 : /Volume_2/Volume_2/run/video/Camera8/mini/2014-02-12.dat:
M: 580.004 e: 8908 e/M: 15.3585 : /Volume_2/Volume_2/run/video/Camera8/mini/2014-02-22.dat:
M: 149.335 e: 2338 e/M: 15.6561 : /Volume_2/Volume_2/run/video/Camera8/mini/2014-03-18.dat:
M: 544.418 e: 8535 e/M: 15.6773 : /Volume_2/Volume_2/run/video/Camera8/mini/2014-03-01.dat:
M: 578.828 e: 9318 e/M: 16.0981 : /Volume_2/Volume_2/run/video/Camera8/mini/2014-02-25.dat:
M: 579.623 e: 9337 e/M: 16.1087 : /Volume_2/Volume_2/run/video/Camera8/mini/2014-02-21.dat:
M: 571.801 e: 9875 e/M: 17.27 : /Volume_2/Volume_2/run/video/Camera8/mini/2014-02-11.dat:
M: 612.831 e: 15596 e/M: 25.4491 : /Volume_2/Volume_2/run/video/Camera8/mini/2014-03-09.dat: 20 MB/s
M: 8.21735 e: 1865 e/M: 226.959 : /Volume_1/Volume_1/run/video/Camera2/mini/2014-03-17.dat:
M: 8.3009 e: 1895 e/M: 228.289 : /Volume_1/Volume_1/run/video/Camera9/mini/2014-03-17.dat:
M: 8.84311 e: 2091 e/M: 236.455 : /Volume_1/Volume_1/run/video/Camera9/mini/2014-03-15.dat:
M: 7.42319 e: 1772 e/M: 238.711 : /Volume_2/Volume_2/run/video/Camera7/mini/2014-03-14.dat: read speed test at 70KB/s

====================================
ONE-LINER FRAGMENTATION ANALYZERS
====================================

These are taken from the script below; I just connected some of the commands with pipes. There was no point in tying the script below together with pipes, because with sort in the pipeline you have to wait for sort to complete before seeing any output. Since we are not sorting here, we get a live, line-by-line update.

Just set the PATH1 variable below to the folder you want to analyze.

NOTE: Pick one of the red sections below (each shows different results), then pick one of the 3 one-liners (edit the custom one) to choose which files are read (i.e., which directories are skipped from scanning). This is done with a modified find command that is set to skip certain phrases in the pathname (such as /snapshot/). Why 3 different find variations? Because the name of your snapshot folders could differ: a default one that doesn't skip anything, a ReadyNAS one that's custom-made for OS6, and a custom one in case you want to use it on your own system (most likely you don't follow the same snapshot naming as the ReadyNAS). For more info on the different find command variations, read the comment section in the script at the bottom of the page; the comment section is titled "Pick a find".

FILEFRAG ORGANIZED BETTER, order of output (per line): bytes, extents, filename

# DEFAULT find command (USE THIS IF NOT SURE):
PATH1=/data; find $PATH1 -type f -printf "%s - " -exec filefrag {} \; | sed -n 's/^\([0-9]*\) - \(.*\): \([0-9]*\) extent[s]* found$/\1 bytes \3 extents - \2/p'

# readynas os6 ABSOLUTE find command (use this to skip certain snapshot directories):
PATH1=/data; find `readlink -f $PATH1` -type f -not -iwholename "/*/._share/*/.snapshot/*" -not -iwholename "/*/*/.snapshots/*/snapshot/*" -not -iwholename "/*/*/snapshot/*" -not -iwholename "/*/._snap-*/*" -printf "%s - " -exec filefrag {} \; | sed -n 's/^\([0-9]*\) - \(.*\): \([0-9]*\) extent[s]* found$/\1 bytes \3 extents - \2/p'

# CUSTOM find command (use this to skip certain snapshot directories - replace snapshotties with your custom snapshot word):
PATH1=/data; find `readlink -f $PATH1` -type f -not -iwholename "*/snapshotties/*" -printf "%s - " -exec filefrag {} \; | sed -n 's/^\([0-9]*\) - \(.*\): \([0-9]*\) extent[s]* found$/\1 bytes \3 extents - \2/p'

FILEFRAG WITH EXTENTS/MEGABYTE (binary), order of output (per line): bytes, extents, extents/megabyte(binary), filename

# DEFAULT find command (USE THIS IF NOT SURE):
PATH1=/data; find $PATH1 -type f -printf "%s - " -exec filefrag {} \; | sed -n 's/^\([0-9]*\) - \(.*\): \([0-9]*\) extent[s]* found$/\1 bytes \3 extents - \2/p' | awk '{FIRST=$1; THIRD=$3; $1=$2=$3=$4=$5=""; gsub(/^[ \t]+/,"",$0); SIXPLUS=$0; print FIRST/1024/1024 " MB " THIRD " extents " (THIRD/(FIRST/1024/1024)) " e/MB - " SIXPLUS}'

# readynas os6 ABSOLUTE find command (use this to skip certain snapshot directories):
PATH1=/data; find `readlink -f $PATH1` -type f -not -iwholename "/*/._share/*/.snapshot/*" -not -iwholename "/*/*/.snapshots/*/snapshot/*" -not -iwholename "/*/*/snapshot/*" -not -iwholename "/*/._snap-*/*" -printf "%s - " -exec filefrag {} \; | sed -n 's/^\([0-9]*\) - \(.*\): \([0-9]*\) extent[s]* found$/\1 bytes \3 extents - \2/p' | awk '{FIRST=$1; THIRD=$3; $1=$2=$3=$4=$5=""; gsub(/^[ \t]+/,"",$0); SIXPLUS=$0; print FIRST/1024/1024 " MB " THIRD " extents " (THIRD/(FIRST/1024/1024)) " e/MB - " SIXPLUS}'

# CUSTOM find command (use this to skip certain snapshot directories - replace snapshotties with your custom snapshot word):
PATH1=/data; find `readlink -f $PATH1` -type f -not -iwholename "*/snapshotties/*" -printf "%s - " -exec filefrag {} \; | sed -n 's/^\([0-9]*\) - \(.*\): \([0-9]*\) extent[s]* found$/\1 bytes \3 extents - \2/p' | awk '{FIRST=$1; THIRD=$3; $1=$2=$3=$4=$5=""; gsub(/^[ \t]+/,"",$0); SIXPLUS=$0; print FIRST/1024/1024 " MB " THIRD " extents " (THIRD/(FIRST/1024/1024)) " e/MB - " SIXPLUS}'

FILEFRAG WITH BYTES/EXTENT, order of output (per line): bytes, extents, bytes/extent, filename

# DEFAULT find command (USE THIS IF NOT SURE):
PATH1=/data; find $PATH1 -type f -printf "%s - " -exec filefrag {} \; | sed -n 's/^\([0-9]*\) - \(.*\): \([0-9]*\) extent[s]* found$/\1 bytes \3 extents - \2/p' | awk '{FIRST=$1; THIRD=$3; $1=$2=$3=$4=$5=""; gsub(/^[ \t]+/,"",$0); SIXPLUS=$0; print FIRST " Bytes " THIRD " extents " FIRST/THIRD " B/e - " SIXPLUS}'

# readynas os6 ABSOLUTE find command (use this to skip certain snapshot directories):
PATH1=/data; find `readlink -f $PATH1` -type f -not -iwholename "/*/._share/*/.snapshot/*" -not -iwholename "/*/*/.snapshots/*/snapshot/*" -not -iwholename "/*/*/snapshot/*" -not -iwholename "/*/._snap-*/*" -printf "%s - " -exec filefrag {} \; | sed -n 's/^\([0-9]*\) - \(.*\): \([0-9]*\) extent[s]* found$/\1 bytes \3 extents - \2/p' | awk '{FIRST=$1; THIRD=$3; $1=$2=$3=$4=$5=""; gsub(/^[ \t]+/,"",$0); SIXPLUS=$0; print FIRST " Bytes " THIRD " extents " FIRST/THIRD " B/e - " SIXPLUS}'

# CUSTOM find command (use this to skip certain snapshot directories - replace snapshotties with your custom snapshot word):
PATH1=/data; find `readlink -f $PATH1` -type f -not -iwholename "*/snapshotties/*" -printf "%s - " -exec filefrag {} \; | sed -n 's/^\([0-9]*\) - \(.*\): \([0-9]*\) extent[s]* found$/\1 bytes \3 extents - \2/p' | awk '{FIRST=$1; THIRD=$3; $1=$2=$3=$4=$5=""; gsub(/^[ \t]+/,"",$0); SIXPLUS=$0; print FIRST " Bytes " THIRD " extents " FIRST/THIRD " B/e - " SIXPLUS}'
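If you do want a sorted summary out of any of these one-liners, redirect the output to a file and sort afterwards (the temp filename here is arbitrary). For example, with the e/MB variant, the ratio lands in the 5th column:

# save the e/MB output, then show the 20 most fragmented files (biggest e/MB last)
PATH1=/data; find $PATH1 -type f -printf "%s - " -exec filefrag {} \; | sed -n 's/^\([0-9]*\) - \(.*\): \([0-9]*\) extent[s]* found$/\1 bytes \3 extents - \2/p' | awk '{FIRST=$1; THIRD=$3; $1=$2=$3=$4=$5=""; gsub(/^[ \t]+/,"",$0); SIXPLUS=$0; print FIRST/1024/1024 " MB " THIRD " extents " (THIRD/(FIRST/1024/1024)) " e/MB - " SIXPLUS}' > /tmp/emb.txt
sort -nk5 /tmp/emb.txt | tail -20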

===================================================
THE SCRIPT – GENERAL – APPLIES TO ALL LINUX SERVERS  <—— Outdated script. View GIST link at the top for the latest script.
===================================================

Here is a script that can analyze the fragmentation in any folder and dump the results into another. It classifies any file with more than 1 extent as fragmented. Please check out the informative comments at the top of the script to see how to run it and how to understand the output (it writes several files that can be used for analysis; the most informative output files have "STATS" in the file name).

Note that this script looks through any folder you ask it to. It does so with the default find command (which simply enumerates all the files) – default because it's a plain find without any filters to skip files or folders. There are 3 more find commands that follow it for specific use cases (with different filters), such as the ones for ReadyNAS OS 6 (you wouldn't want to enumerate files in snapshot folders, so they filter out the possible snapshot folder names). Likewise, you can use those 4 find commands as a basis for your own. In the end, just make sure only 1 find command is active (the other ones should be commented out). The ReadyNAS OS6 find command is the uncommented one in the script in the section below this one.

Here is a summary of the 4 find commands:

  • Default find command: enumerates all files (no filters). This is the find command we use in the mostfragged.sh script; it is uncommented and the 3 below are commented out (as we can only have 1 active find command)
  • Relative ReadyNAS OS6 find command: enumerates all files but skips snapshots by using relative filters. More details in the section below.
  • Absolute ReadyNAS OS6 find command: enumerates all files but skips snapshots by using absolute filters. More details in the section below. This is the find command that we use in the ReadyNAS OS6 mostfragged script in the section below.
  • Custom find command: use this find command to make your own filter, in case you don't want to enumerate all of the files & want to skip certain files or folders in the target folder. You can use this and the above commands as a guide/template for making your own. Just don't forget that in the end you can only have 1 active find command.

So why do we use find at all? The find command enumerates the requested files, and the results are piped file by file into a program called "filefrag", which outputs the name of each file and its number of extents. We use the number of extents in the fragmentation calculation.
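To make that concrete, here is what one line looks like before and after the sed reorganizing step used throughout the script (the file name is made up):

# a raw line produced by: find ... -printf "%s - " -exec filefrag {} \;
echo '1048576 - /data/My Movie.mkv: 12 extents found' | sed -n 's/^\([0-9]*\) - \(.*\): \([0-9]*\) extent[s]* found$/\1 bytes \3 extents - \2/p'
# prints: 1048576 bytes 12 extents - /data/My Movie.mkv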

Make the file mostfragged.sh with this content. You can also download mostfragged.sh like so: wget http://www.infotinks.com/scripts/mostfragged.sh Here is the content of the script:

#!/bin/bash
#
# Filename: mostfragged.sh
#
# What this does:
# Runs an analysis on files by looking at their sizes and number of extents (using the filefrag tool), then it organizes the data to help find which files are most fragmented (extents per mb)
#
# Instructions:
# 1. Copy paste contents into a script called mostfragged.sh (or whatever name you want)
# 2. Read through the "Pick a Find" section below (and follow its steps if you want to skip reading certain folders, like snapshot folders)
# 3. Save the script
# 4. Run the script: mostfragged.sh <directory to analyze>
# 5. Wait for results that will get dumped to current directory
# 6. Read the MF-FINAL results for the best info, and the MF-0000# for other information
#
# Extra Info:
# /tmp files are generated and final files are generated into current directory (where mostfragged.sh is ran from - not where mostfragged.sh is located)
#
# How to read data:
# MB is size in megabytes (base 2, mebibytes MiB), e is filefrag extents, and e/MB is extents per megabytes (base 2, mebibytes MiB)
#
# Usage: $0 <directory to analyze> [optional directory to dump files to]
# Example 1: $0 /data
# Example 2: $0 /data results1
# Example 2 would dump data to `pwd`/results1
# Example 3: $0 /data /tmp/results
# Example 3 would dump data to /tmp/results
#
# Example output of a couple of the analysis files:
# cat MF-EXTRA--_data_Main_Downloads--average-no01.txt
# Average Number of Extents: 121.417
# Average Filesize in Bytes: 5.03206e+07
# Average Extents Per Byte: 2.41287e-06
# Average Bytes per Extent: 414444
# Average KiB per Extent: 404.731
# Average MiB per Extent: 0.395245
# Average GiB per Extent: 0.000385981
# Total Number of Files: 187
# Total Size of files: 9.40996e+09 Bytes, 9.18941e+06 KiB, 8974.04 MiB, 8.76371 GiB, 0.00855831 TiB
# Total Number of Extents: 22705
#
# Also added percent fragmentation based on the number of files (number of fragmented files / total number of files).
# Using the same simple formula, calculated percent fragmentation based on the number of extents (number of fragmented extents / total number of extents)
# Using the same formula for percent fragmentation based on the number of bytes. This will let us know how many bytes (storage bytes) are actually fragmented.
# The 3 fragmentation percentage numbers seem reasonable,
# so I also made an average fragmentation percentage which is the average of the 3 numbers.
# Here is the output in the STATS files (both of the STATS files have the same #s for fragmentation percentage)
# Example output in STATS file (also this output is shown at the end of the mostfragged run on the screen)
# * % File Fragmentation: 42.7665
# * % Extent Fragmentation: 49.3532
# * % Size Fragmentation: 49.8358
# * % Average Fragmentation: 47.3185
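# Worked example (made-up numbers): 100 fragmented files out of 250 total files -> 100/250*100 = 40% file fragmentation
# (the same fragmented/total formula applies to the extent and byte counts)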
#
# Release Notes:
# UPDATE 8-21-2014: added support to skip snapshot folders for READYNAS OS6 scanning (just need to comment out default find, and uncomment the correct find). Read section below called pick a find.
# UPDATE 8-20-2014: fixed files with spaces (see #### comments below, ## for style change). dumping all results to same folder & better usability (check for argument, etc...)
# UPDATE 5-27-2014: filenames output are better now in script (make more sense)
# UPDATE 10-9-2014: PART1: changed normal find to verbose (With tee and nl), kept "SILENT" find but in comments below each find, also added "error to trash" on first readlink to minimize error output
# UPDATE 10-9-2014: PART2: added EXTRA info output (alot more files), added ability to dump to another directory (made showusage function), better output on screen (increases time to completion with all this)
# UPDATE 10-10-2014: clearing up outputs
# UPDATE 10-13-2014: minor adjustments in outputs - added FOOTER to stats files
# UPDATE 5-14-2015: added new STATS files with % fragmentation calculations
#
####################
# HELPER FUNCTIONS #
####################
#
function showusage {
    echo "Usage: $0 <directory to analyze> [optional directory to dump files to]";
    echo "Example 1: $0 /data";
    echo "Example 2: $0 /data results1";
    echo "Example 2 would dump data to `pwd`/results1"
    echo "Example 3: $0 /data /tmp/results";
    echo "Example 3 would dump data to /tmp/results"
}
#
###################
# THE MAIN SCRIPT #
###################
#
# --- set location of scan (To absolute path of given folder) ---
LOCATION=`readlink -f "$1" 2> /dev/null`;
DDIR="$2";
if [ -z ${LOCATION} ]; then
    echo "ERROR: Missing argument";
    showusage;
    exit 1;
fi;
if [ ! -d ${LOCATION} ]; then
    echo "ERROR: directory doesn't exist"
    showusage;
    exit 1;
fi;
if [ -z $DDIR ]; then
    DPREFIX=`pwd`;
    echo "Dumping output files here: $DPREFIX";
else
    mkdir -p $DDIR 2> /dev/null;
    DPREFIX=`readlink -f $DDIR`;
    echo "Dumping Output files to: $DPREFIX";
fi;
D81S=`date +s%s-d%D-t%T | tr /: -`
D81=`date +d%Y-%m-%d-t%T | tr /: -`
D81Ss=$(date +"%s")

# --- set file prefix (to a string that includes the full file path that can make do as a filename) ---
PATH2STRING=$(echo ${LOCATION} | sed -e 's/[^A-Za-z0-9._-]/_/g')
pref1x="MF-00001--${D81}--${PATH2STRING}"
pref11x="MF-00002--${D81}--${PATH2STRING}"
pref2x="MF-00003--${D81}--${PATH2STRING}"
pref3x="MF-00004--${D81}--${PATH2STRING}"
pref4x="MF-00005--${D81}--${PATH2STRING}"
pref5x="MF-FINAL--${D81}--${PATH2STRING}"
pref6x="MF-FINAL--${D81}--${PATH2STRING}"
# filenames
file1="${DPREFIX}/${pref1x}--extents-and-sizes.txt"    # old filename: /tmp/extents-${D81}.txt
file11="${DPREFIX}/${pref11x}--extents-and-sizes-organized.txt" # this is like step 1.5 (to fix files with spaces)
file2="${DPREFIX}/${pref2x}--SORTED-by-extents.txt"    # old filename: /tmp/sortbyextents-${D81}.txt
file3="${DPREFIX}/${pref3x}--bytes-per-extent.txt"    # old filename: /tmp/bytes-per-extent-${D81}.txt
file4="${DPREFIX}/${pref4x}--extents-per-mb.txt"    # old filename: /tmp/extents-per-mb-${D81}.txt
file5="${DPREFIX}/${pref5x}--extents-per-mb-SORTED.txt"    # old filename: `pwd`/mostfragged-extents-per-mb-${D81}.txt
file6="${DPREFIX}/${pref6x}--bytes-per-extent-SORTED.txt"    # old filename: `pwd`/mostfragged-bytes-per-extent-${D81}.txt
# extra files
new0="${DPREFIX}/MF-EXTRA--${PATH2STRING}--0-extent-files.txt"
new1="${DPREFIX}/MF-EXTRA--${PATH2STRING}--1-extent-files.txt"
new2="${DPREFIX}/MF-EXTRA--${PATH2STRING}--all-fragmented-files-more-than-1-extent.txt"
new3="${DPREFIX}/MF-EXTRA--${PATH2STRING}--100-plus-extent-files.txt"
new3s="${DPREFIX}/MF-EXTRA--${PATH2STRING}--100-plus-SORTED-extent-files.txt"
new4="${DPREFIX}/MF-EXTRA--${PATH2STRING}--1000-plus-extent-files.txt"
new4s="${DPREFIX}/MF-EXTRA--${PATH2STRING}--1000-plus-SORTED-extent-files.txt"
new5="${DPREFIX}/MF-EXTRA--${PATH2STRING}--STATS-of-all-fragmented-files-more-than-1-extent.txt"
new5a="${DPREFIX}/MF-EXTRA--${PATH2STRING}--STATS-of-all-files.txt"

# --------- work-begins ---
echo "-----------START INFO:---------------------"
echo "START INFO: Starting fragmentation analysis on: ${LOCATION}"
echo "START INFO: This might take a while (seconds to hours - depending on amount of data & fragmentation)"
echo "START INFO: Dumping data to current directory: `pwd`"
echo "-----------START:--------------------------"
# ttt is the total number of steps (its a constant)
ttt=17
echo "* [`date +%D-%T`][`date +%s`] STARTING 1/${ttt}: Analyzing $LOCATION for sizes and number of extents"

################################################
################################################
# ==============Pick a find:===================#
# Scanning with find will go through the entire
# directory tree, it will even scan the millions
# of copies that could exist due to snapshots
# The default find will scan everything.
# If you want to skip snapshot folders then pick
# one of the below finds (other then the default
# one). If on a Readynas I recommend using the
# ABSOLUTE FIND instead of the RELATIVE FIND.
# The RELATIVE FIND will skip all folders with
# the words snapshot/.snapshot/.snapshots (Even
# if the are customer made). The ABSOLUTE FIND
# will make sure that they are the NETGEAR made
# snapshot folder by looking into how deep into
# the filesystem the folders are as well. If
# Not on a READYNAS and your snapshots dont
# adhere to the folder notation: snapshot
# or .snapshot or .snapshots then you can make
# your own CUSTOM FIND
################################################
################################################
# **************** Default Find ************** # - comment this out if on a ReadyNAS (or else this script will go on forever through all of the snapshots)
find $LOCATION -type f -printf "%s - " -exec filefrag {} \; | tee "${file1}" | nl
################################################
################################################
# **************** RELATIVE FIND to skip snapshots for Readynas OS6 ************** #
# - since this is relative, this will skip all directories named snapshot/.snapshots/.snapshot (even if they are the customer's own dirs)
# find $LOCATION -type f -not -iwholename "*/.snapshot/*" -not -iwholename "*/.snapshots/*" -not -iwholename "*/snapshot/*" -not -iwholename "*/._snap-*/*" -printf "%s - " -exec filefrag {} \; | tee "${file1}" | nl
################################################
################################################
# **************** ABSOLUTE FIND to skip snapshots for Readynas OS6 ************** # <----------------------------- pick this one for readynas os6 best results
# - since this is absolute this will skip only netgear named snapshot/.snapshots/.snapshot folders
# find $LOCATION -type f -not -iwholename "/*/._share/*/.snapshot/*" -not -iwholename "/*/*/.snapshots/*/snapshot/*" -not -iwholename "/*/*/snapshot/*" -not -iwholename "/*/._snap-*/*" -printf "%s - " -exec filefrag {} \; | tee "${file1}" | nl
################################################
################################################
# **************** CUSTOM FIND for Snapshots ************** #
# - change the below snaps to whatever you call your snapshot folder (so if you call them snapshotties, then put in "*/snapshotties/*")
# find $LOCATION -type f -not -iwholename "*/snaps/*" -printf "%s - " -exec filefrag {} \; | tee "${file1}" | nl
################################################
################################################
#### added below to fix files with spaces

echo "* [`date +%D-%T`][`date +%s`] 2/${ttt}: Organizing each line for readability"
cat ${file1} | sed -n 's/^\([0-9]*\) - \(.*\): \([0-9]*\) extent[s]* found$/\1 bytes \3 extents - \2/p' > "${file11}"

echo "* [`date +%D-%T`][`date +%s`] 3/${ttt}: Finding none 0 and none 1 extent files (for main analysis)"
cat ${file11} | grep -v " 1 extents " | grep -v " 0 extents " > "${new2}"

echo "* [`date +%D-%T`][`date +%s`] 4/${ttt}: Putting all 0 extent files to a file (for record keeping, and curiousity)"
cat ${file11} | grep " 0 extents " > "${new0}"

echo "* [`date +%D-%T`][`date +%s`] 5/${ttt}: Putting all 1 extent files to a file (for record keeping, and curiousity)"
cat ${file11} | grep " 1 extents " > "${new1}"

############ ----------- old awk (squished output for stats) - start ------------ #################

###### AWKSTRING='BEGIN{i=0;x=0;s=0;}{i=i+1;x=x+$3;s=s+$1;}END{isum=i;xsum=x;xav=xsum/isum;ssum=s;sav=ssum/isum;xps=xsum/ssum;spx=ssum/xsum; print "Stats of fragmentation (Everything is Average & Totaled per file)"; print "Average Number of Extents: " xav " (less is best)"; print "Average Filesize: " sav " Bytes, " sav/1024 " KiB, " sav/1024/1024 " MiB, " sav/1024/1024/1024 " GiB, " sav/1024/1024/1024/1024 " TiB"; print "Average Extents Per Byte: " xps " (this value does not mean much, bigger is better)"; print "Average Size of an Extent: " spx " Bytes, " spx/1024 " KiB, " spx/1024/1024 " MiB, " spx/1024/1024/1024 " GiB, " spx/1024/1024/1024/1024 " TiB (bigger is better)"; print "Total Number of Files: " isum; print "Total Size of files: " ssum " Bytes, " ssum/1024 " KiB, " ssum/1024/1024 " MiB, " ssum/1024/1024/1024 " GiB, " ssum/1024/1024/1024/1024 " TiB"; print "Total Number of Extents: " xsum " (less is best)"}'

###### echo "* [`date +%D-%T`][`date +%s`] 6/${ttt}: Calculating Average Extents in files more than 1 extent"
###### cat ${new2} | awk "$AWKSTRING"  > "${new5}"

###### echo "* [`date +%D-%T`][`date +%s`] 7/${ttt}: Calculating Average Extents from all files (0 and 1 extent files included)"
###### cat ${file11} | awk "$AWKSTRING" > "${new5a}"
# (these stay commented out along with AWKSTRING above - running them with AWKSTRING unset would just write empty output files)

############ ----------- old awk (squished output for stats) - end ------------ #################

############ ----------- testing zone start ------------ #################

# Uncomment above ###### and comment below
# I like below output better :-) update 2014-10-13

TESTAWKSTRING='BEGIN{i=0;x=0;s=0;}{i=i+1;x=x+$3;s=s+$1;}END{isum=i;xsum=x;xav=xsum/isum;ssum=s;sav=ssum/isum;xps=xsum/ssum;spx=ssum/xsum; print "Stats of fragmentation (Everything is Average & Totaled per file)"; print "Average Number of Extents: " xav " (less is best)"; print "Average Filesize: " sav " Bytes"; print "Average Filesize: " sav/1024 " KiB"; print "Average Filesize: " sav/1024/1024 " MiB"; print "Average Filesize: " sav/1024/1024/1024 " GiB"; print "Average Filesize: " sav/1024/1024/1024/1024 " TiB"; print "Average Extents Per Byte: " xps " (this value does not mean much, bigger is better)"; print "Average Size of an Extent: " spx " Bytes (bigger is better)"; print "Average Size of an Extent: " spx/1024 " KiB (bigger is better)"; print "Average Size of an Extent: " spx/1024/1024 " MiB (bigger is better)"; print "Average Size of an Extent: " spx/1024/1024/1024 " GiB (bigger is better)"; print "Average Size of an Extent: " spx/1024/1024/1024/1024 " TiB (bigger is better)"; print "Total Number of Files: " isum; print "Total Size of files: " ssum " Bytes"; print "Total Size of files: " ssum/1024 " KiB"; print "Total Size of files: " ssum/1024/1024 " MiB"; print "Total Size of files: " ssum/1024/1024/1024 " GiB"; print "Total Size of files: " ssum/1024/1024/1024/1024 " TiB"; print "Total Number of Extents: " xsum " (less is best)"}'

echo "* [`date +%D-%T`][`date +%s`] 6/${ttt}: Calculating Average Extents in files more than 1 extent"
cat ${new2} | awk "$TESTAWKSTRING"  > "${new5}"

echo "* [`date +%D-%T`][`date +%s`] 7/${ttt}: Calculating Average Extents from all files (0 and 1 extents includes)"
cat ${file11} | awk "$TESTAWKSTRING" > "${new5a}"

############ ----------- testing zone end ------------ #################

FOOTER="FOOTER `date +%s` sec | `date` | $0 $1 $2 | ${LOCATION} | Saving to | ${DPREFIX}"
echo $FOOTER >> "${new5}"
echo $FOOTER >> "${new5a}"

echo "* [`date +%D-%T`][`date +%s`] 8/${ttt}: Finding all files with more than 100 extents"
cat ${new2} | awk '{EXTENTS=$3;THRESHOLD=100;if(EXTENTS>=THRESHOLD){print $0;}}' > "${new3}"

echo "* [`date +%D-%T`][`date +%s`] 9/${ttt}: Finding all files with more than 1000 extents"
cat ${new2} | awk '{EXTENTS=$3;THRESHOLD=1000;if(EXTENTS>=THRESHOLD){print $0;}}' > "${new4}"

echo "* [`date +%D-%T`][`date +%s`] 10/${ttt}: Sorting main output to make ${file2}"
#### changed below from file1 to file11 and -nk4 to -nk3 to fix spaces issue
# COMMENTED WITH "new" CHANGES: to not count 0 and 1 extents file in main calculation
# COMMENTED WITH "new" CHANGES: cat ${file11} | sort -nk3 > "${file2}"
cat ${new2} | sort -nk3 > "${file2}"
# COMMENTED WITH "new" CHANGES: to not do sorting again just using file2 output which is already sorted

echo "* [`date +%D-%T`][`date +%s`] 11/${ttt}: Sorting 100 extent output (not actually running sort alg again for CPU and time saving)"
cat ${file2} | awk '{EXTENTS=$3;THRESHOLD=100;if(EXTENTS>=THRESHOLD){print $0;}}' > "${new3s}"

echo "* [`date +%D-%T`][`date +%s`] 12/${ttt}: Sorting 1000 extent output (not actually running sort alg again for CPU and time saving)"
cat ${file2} | awk '{EXTENTS=$3;THRESHOLD=1000;if(EXTENTS>=THRESHOLD){print $0;}}' > "${new4s}"

echo "* [`date +%D-%T`][`date +%s`] 13/${ttt}: Starting Calculating BYTES PER EXTENT"
#### for files with space: $1 stays, $4 becomes $3, $3 to $6 (need to make it $6+)
#### ---- interesting awk trick (to print nth column to end) start comment ---- ####
# How the file name is printed: because of spaces it can span more than just the 6th column (6th column and onward). Note that $1=$2=$3=$4=$5="", so $0 becomes the 6th column plus the rest, but that leaves leading spaces which we remove with gsub(match this,change to this,on this variable)
#### ---- interesting awk trick (to print nth column to end) end comment ---- ####
cat ${file2} | awk '{SIZE=$1; EXTENTS=$3; $1=$2=$3=$4=$5=""; gsub(/^[ \t]+/,"",$0); FILENAME=$0; print SIZE " Bytes, " EXTENTS " extents, " SIZE/EXTENTS " B/e - " FILENAME}' > "${file3}"

echo "* [`date +%D-%T`][`date +%s`] 14/${ttt}: Starting Calculating EXTENTS PER MB"
#### for files with space: $1 stays, $4 becomes $3, $3 to $6 (need to make it $6+)
cat ${file2} | awk '{SIZE=$1; EXTENTS=$3; $1=$2=$3=$4=$5=""; gsub(/^[ \t]+/,"",$0); FILENAME=$0; print SIZE/1024/1024 " MiB, " EXTENTS " extents, " (EXTENTS/(SIZE/1024/1024)) " Extents/MiB - " FILENAME}' > "${file4}"

echo "* [`date +%D-%T`][`date +%s`] 15/${ttt}: Sorting EXTENTS PER MB output"
# THE 5TH COLUMN IS E/M WHICH IS THE EXTENTS PER MEGABYTE, LETS SORT BY IT
cat ${file4} | sort -nk5 > "${file5}"

echo "* [`date +%D-%T`][`date +%s`] 16/${ttt}: Sorting BYTES PER EXTENT output"
# THE 5TH COLUMN BELOW IS B/E LETS SORT BY IT
cat ${file3} | sort -nk5 > "${file6}"

echo "* [`date +%D-%T`][`date +%s`] 17/${ttt}: DONE"

# ----------------------- calculating new fragmentation numbers - start -----------------#
# number of extents/files/bytes all files or fragmented files (have _F for frag)
# fragmented files have more than 1 extent
# new5 is frag files, new5a is all files
# Total Number of Files: 246409
# Total Size of files: 9.03173e+12 Bytes
# Total Number of Extents: 353389 (less is best)
N_FILES=`cat "${new5a}" | grep 'Total Number of Files' | cut -f2 -d: | awk '{print $1;}'`
N_FILES_F=`cat "${new5}" | grep 'Total Number of Files' | cut -f2 -d: | awk '{print $1;}'`
N_EXT=`cat "${new5a}" | grep 'Total Number of Extents' | cut -f2 -d: | awk '{print $1;}'`
N_EXT_F=`cat "${new5}" | grep 'Total Number of Extents' | cut -f2 -d: | awk '{print $1;}'`
N_BYTE=`cat "${new5a}" | grep 'Total Size of files' | grep 'Bytes' | cut -f2 -d: | awk '{print $1;}'`
N_BYTE_F=`cat "${new5}" | grep 'Total Size of files' | grep 'Bytes' | cut -f2 -d: | awk '{print $1;}'`
echo "$N_FILES $N_FILES_F $N_EXT $N_EXT_F $N_BYTE $N_BYTE_F"
# calculation: percent = fragmented / total * 100 (matches the formula described in the header comments)
N_FILE_FRAG_PERCENT=`echo "${N_FILES_F} ${N_FILES}" | awk '{print (($1/$2)*100);}'`
N_EXT_FRAG_PERCENT=`echo "${N_EXT_F} ${N_EXT}" | awk '{print (($1/$2)*100);}'`
N_BYTE_FRAG_PERCENT=`echo "${N_BYTE_F} ${N_BYTE}" | awk '{print (($1/$2)*100);}'`
N_AVERAGE_PERCENT=`echo "${N_EXT_FRAG_PERCENT} ${N_FILE_FRAG_PERCENT} ${N_BYTE_FRAG_PERCENT}" | awk '{print ($1+$2+$3)/3;}'`
# add same info to both files (at same time with tee -a and >>)
echo "-----------PERCENT FRAGMENTATION:--------------"
echo "* % File Fragmentation: ${N_FILE_FRAG_PERCENT}"
echo "* % File Fragmentation: ${N_FILE_FRAG_PERCENT}"  | tee -a "${new5a}" >> "${new5}"
echo "* % Extent Fragmentation: ${N_EXT_FRAG_PERCENT}"
echo "* % Extent Fragmentation: ${N_EXT_FRAG_PERCENT}" | tee -a "${new5a}" >> "${new5}"
echo "* % Size Fragmentation: ${N_BYTE_FRAG_PERCENT}"
echo "* % Size Fragmentation: ${N_BYTE_FRAG_PERCENT}"  | tee -a "${new5a}" >> "${new5}"
echo "* % Average Fragmentation: ${N_AVERAGE_PERCENT}" 
echo "* % Average Fragmentation: ${N_AVERAGE_PERCENT}" | tee -a "${new5a}" >> "${new5}"
# ----------------------- calculating new fragmentation numbers - end -----------------#

echo "-----------ANALYSIS COMPLETE:--------------"
D81F=`date +s%s-d%D-t%T | tr /: -`
D81Fs=$(date +"%s")
Tdiff=$(($D81Fs-$D81Ss))
echo "* DURATION: $(($Tdiff / 60)) minutes and $(($Tdiff % 60)) seconds = $Tdiff seconds total"
echo "* START TIME: ${D81S}"
echo "* END TIME: ${D81F}"
# ----------------------- log these times to stats - start------------------------#
# --- to 5 --- #
echo "FOOTER | RUN DURATION | $(($Tdiff / 60)) | minutes | $(($Tdiff % 60)) seconds | $Tdiff | seconds total" >> "${new5}"
echo "FOOTER | RUN START TIME | ${D81S}" >> "${new5}"
echo "FOOTER | RUN END TIME | ${D81F}" >> "${new5}"
# --- to 5a --- #
echo "FOOTER | RUN DURATION | $(($Tdiff / 60)) | minutes | $(($Tdiff % 60)) seconds | $Tdiff | seconds total" >> "${new5a}"
echo "FOOTER | RUN START TIME | ${D81S}" >> "${new5a}"
echo "FOOTER | RUN END TIME | ${D81F}" >> "${new5a}"
# ----------------------- log these times to stats - end ------------------------#
echo "-----------NONE IMPORTANT RESULTS:---------"
echo "* The following files are optional and inbetween information (you can choose to delete them with these command):"
echo "rm ${file1}"
echo "rm ${file11}"
echo "rm ${file2}"
echo "rm ${file3}"
echo "rm ${file4}"
echo "-----------EXTRA CURIOUS RESULTS:----------"
echo "* Interersting extra curious results:"
echo "- ${new0}"
echo "- ${new1}"
echo "- ${new2}"
echo "- ${new3}"
echo "- ${new3s}"
echo "- ${new4}"
echo "- ${new4s}"
echo "- ${new5}"
echo "- ${new5a}"
echo "-----------IMPORTANT RESULTS:--------------"
echo "* The MF-FINAL files should be used for analysis:"
echo "(1) ${file5}"
echo "(2) ${file6}"
echo "-----------THE END-------------------------"

===========================
THE SCRIPT – READYNAS OS6 <—— Outdated script. View GIST link at the top for the latest script.
===========================

This script will work on all ReadyNAS OS6 units, as it will enumerate all of the files in the specified folder (point the folder at a share or volume). This specific script differs from the top script in that the default find command is commented out (the default find command enumerates every file without a filter) & instead this one uses a modified ReadyNAS find command that enumerates all files except BTRFS snapshot files. This ReadyNAS OS6 mostfragged variation enumerates all files in the folder but skips snapshot folders (using find filters to skip any possible type of snapshot folder name). Note that there are 2 ReadyNAS find commands, the relative and the absolute. The relative one filters out snapshots based on relative snapshot folder paths. The absolute one filters out snapshots based on absolute snapshot folder paths. We use (and uncomment) the absolute ReadyNAS find command. The relative one is there for my own knowledge (it was a less efficient method). The problem with the relative one is that if a share or subfolder actually contains a folder with one of the catch words in the filter (such as "snapshot"), then that folder is skipped even though it does not contain the BTRFS snapshots that we want skipped.

The only difference between this script and the one in the section above is that this script comments out the default find command and uncomments the absolute find command (see comments). The other find commands remain commented out.

You can also download the mostfragged-rnos6.sh script right here. Or from linux: wget http://www.infotinks.com/scripts/mostfragged-rn6.sh Here is the content of this script:

#!/bin/bash
#
# Filename: mostfragged-rn6.sh (ReadyNAS OS6 variation of mostfragged.sh)
#
# What this does:
# Runs an analysis on files by looking at their sizes and number of extents (using the filefrag tool), then it organizes the data to help find which files are most fragmented (extents per mb)
#
# Instructions:
# 1. Copy paste contents into a script called mostfragged.sh (or whatever name you want)
# 2. Read through the "Pick a Find" section below (and follow its steps if you want to skip reading certain folders, like snapshot folders)
# 3. Save the script
# 4. Run the script: mostfragged.sh <directory to analyze>
# 5. Wait for results that will get dumped to current directory
# 6. Read the MF-FINAL results for the best info, and the MF-0000# for other information
#
# Extra Info:
# /tmp files are generated and final files are generated into current directory (where mostfragged.sh is ran from - not where mostfragged.sh is located)
#
# How to read data:
# MB is size in megabytes (base 2, mebibytes MiB), e is filefrag extents, and e/MB is extents per megabytes (base 2, mebibytes MiB)
#
# Usage: $0 <directory to analyze> [optional directory to dump files to]
# Example 1: $0 /data
# Example 2: $0 /data results1
# Example 2 would dump data to `pwd`/results1
# Example 3: $0 /data /tmp/results
# Example 3 would dump data to /tmp/results
#
# Example output of a couple of the analysis files:
# cat MF-EXTRA--_data_Main_Downloads--average-no01.txt
# Average Number of Extents: 121.417
# Average Filesize in Bytes: 5.03206e+07
# Average Extents Per Byte: 2.41287e-06
# Average Bytes per Extent: 414444
# Average KiB per Extent: 404.731
# Average MiB per Extent: 0.395245
# Average GiB per Extent: 0.000385981
# Total Number of Files: 187
# Total Size of files: 9.40996e+09 Bytes, 9.18941e+06 KiB, 8974.04 MiB, 8.76371 GiB, 0.00855831 TiB
# Total Number of Extents: 22705
#
# Also added percent fragmentation based on the number of files (number of fragmented files / total number of files).
# Using the same simple formula, calculated percent fragmentation based on the number of extents (number of fragmented extents / total number of extents)
# Using the same formula for percent fragmentation based on the number of bytes. This will let us know how many bytes (storage bytes) are actually fragmented.
# The 3 fragmentation percentage numbers seem reasonable,
# so I also made an average fragmentation percentage which is the average of the 3 numbers.
# Here is the output in the STATS files (both of the STATS files have the same #s for fragmentation percentage)
# Example output in STATS file (also this output is shown at the end of the mostfragged run on the screen)
# * % File Fragmentation: 42.7665
# * % Extent Fragmentation: 49.3532
# * % Size Fragmentation: 49.8358
# * % Average Fragmentation: 47.3185
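# Worked example (made-up numbers): 100 fragmented files out of 250 total files -> 100/250*100 = 40% file fragmentation
# (the same fragmented/total formula applies to the extent and byte counts)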
#
# Release Notes:
# UPDATE 8-21-2014: added support to skip snapshot folders for READYNAS OS6 scanning (just need to comment out default find, and uncomment the correct find). Read section below called pick a find.
# UPDATE 8-20-2014: fixed files with spaces (see #### comments below, ## for style change). dumping all results to same folder & better usability (check for argument, etc...)
# UPDATE 5-27-2014: filenames output are better now in script (make more sense)
# UPDATE 10-9-2014: PART1: changed normal find to verbose (With tee and nl), kept "SILENT" find but in comments below each find, also added "error to trash" on first readlink to minimize error output
# UPDATE 10-9-2014: PART2: added EXTRA info output (alot more files), added ability to dump to another directory (made showusage function), better output on screen (increases time to completion with all this)
# UPDATE 10-10-2014: clearing up outputs
# UPDATE 10-13-2014: minor adjustments in outputs - added FOOTER to stats files
# UPDATE 5-14-2015: added new STATS files with % fragmentation calculations
#
####################
# HELPER FUNCTIONS #
####################
#
function showusage {
    echo "Usage: $0 <directory to analyze> [optional directory to dump files to]";
    echo "Example 1: $0 /data";
    echo "Example 2: $0 /data results1";
    echo "Example 2 would dump data to `pwd`/results1"
    echo "Example 3: $0 /data /tmp/results";
    echo "Example 3 would dump data to /tmp/results"
}
#
###################
# THE MAIN SCRIPT #
###################
#
# --- set location of scan (To absolute path of given folder) ---
LOCATION=`readlink -f "$1" 2> /dev/null`;
DDIR="$2";
if [ -z ${LOCATION} ]; then
    echo "ERROR: Missing argument";
    showusage;
    exit 1;
fi;
if [ ! -d ${LOCATION} ]; then
    echo "ERROR: directory doesn't exist"
    showusage;
    exit 1;
fi;
if [ -z $DDIR ]; then
    DPREFIX=`pwd`;
    echo "Dumping output files here: $DPREFIX";
else
    mkdir -p $DDIR 2> /dev/null;
    DPREFIX=`readlink -f $DDIR`;
    echo "Dumping Output files to: $DPREFIX";
fi;
D81S=`date +s%s-d%D-t%T | tr /: -`
D81=`date +d%Y-%m-%d-t%T | tr /: -`
D81Ss=$(date +"%s")

# --- set file prefix (to a string that includes the full file path that can make do as a filename) ---
PATH2STRING=$(echo ${LOCATION} | sed -e 's/[^A-Za-z0-9._-]/_/g')
pref1x="MF-00001--${D81}--${PATH2STRING}"
pref11x="MF-00002--${D81}--${PATH2STRING}"
pref2x="MF-00003--${D81}--${PATH2STRING}"
pref3x="MF-00004--${D81}--${PATH2STRING}"
pref4x="MF-00005--${D81}--${PATH2STRING}"
pref5x="MF-FINAL--${D81}--${PATH2STRING}"
pref6x="MF-FINAL--${D81}--${PATH2STRING}"
# filenames
file1="${DPREFIX}/${pref1x}--extents-and-sizes.txt"    # old filename: /tmp/extents-${D81}.txt
file11="${DPREFIX}/${pref11x}--extents-and-sizes-organized.txt" # this is like step 1.5 (to fix files with spaces)
file2="${DPREFIX}/${pref2x}--SORTED-by-extents.txt"    # old filename: /tmp/sortbyextents-${D81}.txt
file3="${DPREFIX}/${pref3x}--bytes-per-extent.txt"    # old filename: /tmp/bytes-per-extent-${D81}.txt
file4="${DPREFIX}/${pref4x}--extents-per-mb.txt"    # old filename: /tmp/extents-per-mb-${D81}.txt
file5="${DPREFIX}/${pref5x}--extents-per-mb-SORTED.txt"    # old filename: `pwd`/mostfragged-extents-per-mb-${D81}.txt
file6="${DPREFIX}/${pref6x}--bytes-per-extent-SORTED.txt"    # old filename: `pwd`/mostfragged-bytes-per-extent-${D81}.txt
# extra files
new0="${DPREFIX}/MF-EXTRA--${PATH2STRING}--0-extent-files.txt"
new1="${DPREFIX}/MF-EXTRA--${PATH2STRING}--1-extent-files.txt"
new2="${DPREFIX}/MF-EXTRA--${PATH2STRING}--all-fragmented-files-more-than-1-extent.txt"
new3="${DPREFIX}/MF-EXTRA--${PATH2STRING}--100-plus-extent-files.txt"
new3s="${DPREFIX}/MF-EXTRA--${PATH2STRING}--100-plus-SORTED-extent-files.txt"
new4="${DPREFIX}/MF-EXTRA--${PATH2STRING}--1000-plus-extent-files.txt"
new4s="${DPREFIX}/MF-EXTRA--${PATH2STRING}--1000-plus-SORTED-extent-files.txt"
new5="${DPREFIX}/MF-EXTRA--${PATH2STRING}--STATS-of-all-fragmented-files-more-than-1-extent.txt"
new5a="${DPREFIX}/MF-EXTRA--${PATH2STRING}--STATS-of-all-files.txt"

# --------- work-begins ---
echo "-----------START INFO:---------------------"
echo "START INFO: Starting fragmentation analysis on: ${LOCATION}"
echo "START INFO: This might take a while (seconds to hours - depending on amount of data & fragmentation)"
echo "START INFO: Dumping data to current directory: `pwd`"
echo "-----------START:--------------------------"
# ttt is the total number of steps (its a constant)
ttt=17
echo "* [`date +%D-%T`][`date +%s`] STARTING 1/${ttt}: Analyzing $LOCATION for sizes and number of extents"

################################################
################################################
# ==============Pick a find:===================#
# Scanning with find will go through the entire
# directory tree, it will even scan the millions
# of copies that could exist due to snapshots
# The default find will scan everything.
# If you want to skip snapshot folders then pick
# one of the below finds (other then the default
# one). If on a Readynas I recommend using the
# ABSOLUTE FIND instead of the RELATIVE FIND.
# The RELATIVE FIND will skip all folders with
# the words snapshot/.snapshot/.snapshots (Even
# if the are customer made). The ABSOLUTE FIND
# will make sure that they are the NETGEAR made
# snapshot folder by looking into how deep into
# the filesystem the folders are as well. If
# Not on a READYNAS and your snapshots dont
# adhere to the folder notation: snapshot
# or .snapshot or .snapshots then you can make
# your own CUSTOM FIND
################################################
################################################
# **************** Default Find ************** # - comment this out if on a ReadyNAS (or else this script will go on forever through all of the snapshots) - it IS commented out here, since this is the ReadyNAS version
# find $LOCATION -type f -printf "%s - " -exec filefrag {} \; | tee "${file1}" | nl
################################################
################################################
# **************** RELATIVE FIND to skip snapshots for Readynas OS6 ************** #
# - since this is relative, this will skip all directories named snapshot/.snapshots/.snapshot (even if they are the customer's own dirs)
# find $LOCATION -type f -not -iwholename "*/.snapshot/*" -not -iwholename "*/.snapshots/*" -not -iwholename "*/snapshot/*" -not -iwholename "*/._snap-*/*" -printf "%s - " -exec filefrag {} \; | tee "${file1}" | nl
################################################
################################################
# **************** ABSOLUTE FIND to skip snapshots for Readynas OS6 ************** # <----------------------------- pick this one for readynas os6 best results
# - since this is absolute this will skip only netgear named snapshot/.snapshots/.snapshot folders
find $LOCATION -type f -not -iwholename "/*/._share/*/.snapshot/*" -not -iwholename "/*/*/.snapshots/*/snapshot/*" -not -iwholename "/*/*/snapshot/*" -not -iwholename "/*/._snap-*/*" -printf "%s - " -exec filefrag {} \; | tee "${file1}" | nl
################################################
################################################
# **************** CUSTOM FIND for Snapshots ************** #
# - change the below snaps to whatever you call your snapshot folder (so if you call them snapshotties, then put in "*/snapshotties/*")
# find $LOCATION -type f -not -iwholename "*/snaps/*" -printf "%s - " -exec filefrag {} \; | tee "${file1}" | nl
################################################
################################################
#### added below to fix files with spaces

echo "* [`date +%D-%T`][`date +%s`] 2/${ttt}: Organizing each line for readability"
cat ${file1} | sed -n 's/^\([0-9]*\) - \(.*\): \([0-9]*\) extent[s]* found$/\1 bytes \3 extents - \2/p' > "${file11}"

echo "* [`date +%D-%T`][`date +%s`] 3/${ttt}: Finding none 0 and none 1 extent files (for main analysis)"
cat ${file11} | grep -v " 1 extents " | grep -v " 0 extents " > "${new2}"

echo "* [`date +%D-%T`][`date +%s`] 4/${ttt}: Putting all 0 extent files to a file (for record keeping, and curiousity)"
cat ${file11} | grep " 0 extents " > "${new0}"

echo "* [`date +%D-%T`][`date +%s`] 5/${ttt}: Putting all 1 extent files to a file (for record keeping, and curiousity)"
cat ${file11} | grep " 1 extents " > "${new1}"

############ ----------- old awk (squished output for stats) - start ------------ #################

###### AWKSTRING='BEGIN{i=0;x=0;s=0;}{i=i+1;x=x+$3;s=s+$1;}END{isum=i;xsum=x;xav=xsum/isum;ssum=s;sav=ssum/isum;xps=xsum/ssum;spx=ssum/xsum; print "Stats of fragmentation (Everything is Average & Totaled per file)"; print "Average Number of Extents: " xav " (less is best)"; print "Average Filesize: " sav " Bytes, " sav/1024 " KiB, " sav/1024/1024 " MiB, " sav/1024/1024/1024 " GiB, " sav/1024/1024/1024/1024 " TiB"; print "Average Extents Per Byte: " xps " (this value does not mean much, bigger is better)"; print "Average Size of an Extent: " spx " Bytes, " spx/1024 " KiB, " spx/1024/1024 " MiB, " spx/1024/1024/1024 " GiB, " spx/1024/1024/1024/1024 " TiB (bigger is better)"; print "Total Number of Files: " isum; print "Total Size of files: " ssum " Bytes, " ssum/1024 " KiB, " ssum/1024/1024 " MiB, " ssum/1024/1024/1024 " GiB, " ssum/1024/1024/1024/1024 " TiB"; print "Total Number of Extents: " xsum " (less is best)"}'

###### echo "* [`date +%D-%T`][`date +%s`] 6/${ttt}: Calculating Average Extents in files more than 1 extent"
###### cat ${new2} | awk "$AWKSTRING"  > "${new5}"

###### echo "* [`date +%D-%T`][`date +%s`] 7/${ttt}: Calculating Average Extents from all files (0 and 1 extent files included)"
###### cat ${file11} | awk "$AWKSTRING" > "${new5a}"
# (these stay commented out along with AWKSTRING above - running them with AWKSTRING unset would just write empty output files)

############ ----------- old awk (squished output for stats) - end ------------ #################

############ ----------- testing zone start ------------ #################

# Uncomment above ###### and comment below
# I like below output better :-) update 2014-10-13

TESTAWKSTRING='BEGIN{i=0;x=0;s=0;}{i=i+1;x=x+$3;s=s+$1;}END{isum=i;xsum=x;xav=xsum/isum;ssum=s;sav=ssum/isum;xps=xsum/ssum;spx=ssum/xsum; print "Stats of fragmentation (Everything is Average & Totaled per file)"; print "Average Number of Extents: " xav " (less is best)"; print "Average Filesize: " sav " Bytes"; print "Average Filesize: " sav/1024 " KiB"; print "Average Filesize: " sav/1024/1024 " MiB"; print "Average Filesize: " sav/1024/1024/1024 " GiB"; print "Average Filesize: " sav/1024/1024/1024/1024 " TiB"; print "Average Extents Per Byte: " xps " (this value does not mean much, bigger is better)"; print "Average Size of an Extent: " spx " Bytes (bigger is better)"; print "Average Size of an Extent: " spx/1024 " KiB (bigger is better)"; print "Average Size of an Extent: " spx/1024/1024 " MiB (bigger is better)"; print "Average Size of an Extent: " spx/1024/1024/1024 " GiB (bigger is better)"; print "Average Size of an Extent: " spx/1024/1024/1024/1024 " TiB (bigger is better)"; print "Total Number of Files: " isum; print "Total Size of files: " ssum " Bytes"; print "Total Size of files: " ssum/1024 " KiB"; print "Total Size of files: " ssum/1024/1024 " MiB"; print "Total Size of files: " ssum/1024/1024/1024 " GiB"; print "Total Size of files: " ssum/1024/1024/1024/1024 " TiB"; print "Total Number of Extents: " xsum " (less is best)"}'

echo "* [`date +%D-%T`][`date +%s`] 6/${ttt}: Calculating Average Extents in files more than 1 extent"
cat ${new2} | awk "$TESTAWKSTRING"  > "${new5}"

echo "* [`date +%D-%T`][`date +%s`] 7/${ttt}: Calculating Average Extents from all files (0 and 1 extents includes)"
cat ${file11} | awk "$TESTAWKSTRING" > "${new5a}"

############ ----------- testing zone end ------------ #################

FOOTER="FOOTER `date +%s` sec | `date` | $0 $1 $2 | ${LOCATION} | Saving to | ${DPREFIX}"
echo $FOOTER >> "${new5}"
echo $FOOTER >> "${new5a}"

echo "* [`date +%D-%T`][`date +%s`] 8/${ttt}: Finding all files with more than 100 extents"
cat ${new2} | awk '{EXTENTS=$3;THRESHOLD=100;if(EXTENTS>=THRESHOLD){print $0;}}' > "${new3}"

echo "* [`date +%D-%T`][`date +%s`] 9/${ttt}: Finding all files with more than 1000 extents"
cat ${new2} | awk '{EXTENTS=$3;THRESHOLD=1000;if(EXTENTS>=THRESHOLD){print $0;}}' > "${new4}"

echo "* [`date +%D-%T`][`date +%s`] 10/${ttt}: Sorting main output to make ${file2}"
#### changed below from file1 to file11 and -nk4 to -nk3 to fix spaces issue
# COMMENTED WITH "new" CHANGES: to not count 0 and 1 extents file in main calculation
# COMMENTED WITH "new" CHANGES: cat ${file11} | sort -nk3 > "${file2}"
cat ${new2} | sort -nk3 > "${file2}"
# COMMENTED WITH "new" CHANGES: to not do sorting again just using file2 output which is already sorted

echo "* [`date +%D-%T`][`date +%s`] 11/${ttt}: Sorting 100 extent output (not actually running sort alg again for CPU and time saving)"
cat ${file2} | awk '{EXTENTS=$3;THRESHOLD=100;if(EXTENTS>=THRESHOLD){print $0;}}' > "${new3s}"

echo "* [`date +%D-%T`][`date +%s`] 12/${ttt}: Sorting 1000 extent output (not actually running sort alg again for CPU and time saving)"
cat ${file2} | awk '{EXTENTS=$3;THRESHOLD=1000;if(EXTENTS>=THRESHOLD){print $0;}}' > "${new4s}"

echo "* [`date +%D-%T`][`date +%s`] 13/${ttt}: Starting Calculating BYTES PER EXTENT"
#### for files with space: $1 stays, $4 becomes $3, $3 to $6 (need to make it $6+)
#### ---- interesting awk trick (to print nth column to end) start comment ---- ####
# How printed file name (because of spaces it can span more than just the 6th column, but it could be 6th column +): Note that $1=$2=$3=$4=$5="", so that $0 is the 6th column plus the rest, but that leaves left trailing spaces which we remove with gsub(match this,change to this,on this variable)
#### ---- interesting awk trick (to print nth column to end) end comment ---- ####
cat ${file2} | awk '{SIZE=$1; EXTENTS=$3; $1=$2=$3=$4=$5=""; gsub(/^[ \t]+/,"",$0); FILENAME=$0; print SIZE " Bytes, " EXTENTS " extents, " SIZE/EXTENTS " B/e - " FILENAME}' > "${file3}"
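#### a quick hypothetical demo of the trick above (illustrative only, not part of the run):
####   echo "10 a b c d my file name.txt" | awk '{$1=$2=$3=$4=$5=""; gsub(/^[ \t]+/,"",$0); print $0}'
#### prints "my file name.txt" (note: assigning to a field makes awk rebuild $0 with single spaces, so runs of spaces inside a name collapse)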

echo "* [`date +%D-%T`][`date +%s`] 14/${ttt}: Starting Calculating EXTENTS PER MB"
#### for files with space: $1 stays, $4 becomes $3, $3 to $6 (need to make it $6+)
cat ${file2} | awk '{SIZE=$1; EXTENTS=$3; $1=$2=$3=$4=$5=""; gsub(/^[ \t]+/,"",$0); FILENAME=$0; print SIZE/1024/1024 " MiB, " EXTENTS " extents, " (EXTENTS/(SIZE/1024/1024)) " Extents/MiB - " FILENAME}' > "${file4}"

echo "* [`date +%D-%T`][`date +%s`] 15/${ttt}: Sorting EXTENTS PER MB output"
# THE 5TH COLUMN IS E/M WHICH IS THE EXTENTS PER MEGABYTE, LETS SORT BY IT
cat ${file4} | sort -nk5 > "${file5}"

echo "* [`date +%D-%T`][`date +%s`] 16/${ttt}: Sorting BYTES PER EXTENT output"
# THE 5TH COLUMN BELOW IS B/E (BYTES PER EXTENT), LET'S SORT BY IT
cat ${file3} | sort -nk5 > "${file6}"

echo "* [`date +%D-%T`][`date +%s`] 17/${ttt}: DONE"

# ----------------------- calculating new fragmentation numbers - start -----------------#
# number of extents/files/bytes for all files vs fragmented files (_F suffix = fragmented)
# fragmented files have more than 1 extent; that is what new5 and new5a summarize
# new5 has the stats for fragmented files, new5a has the stats for all files
# sample lines parsed below:
# Total Number of Files: 246409
# Total Size of files: 9.03173e+12 Bytes
# Total Number of Extents: 353389 (less is best)
N_FILES=`cat "${new5a}" | grep 'Total Number of Files' | cut -f2 -d: | awk '{print $1;}'`
N_FILES_F=`cat "${new5}" | grep 'Total Number of Files' | cut -f2 -d: | awk '{print $1;}'`
N_EXT=`cat "${new5a}" | grep 'Total Number of Extents' | cut -f2 -d: | awk '{print $1;}'`
N_EXT_F=`cat "${new5}" | grep 'Total Number of Extents' | cut -f2 -d: | awk '{print $1;}'`
N_BYTE=`cat "${new5a}" | grep 'Total Size of files' | grep 'Bytes' | cut -f2 -d: | awk '{print $1;}'`
N_BYTE_F=`cat "${new5}" | grep 'Total Size of files' | grep 'Bytes' | cut -f2 -d: | awk '{print $1;}'`
echo "$N_FILES $N_FILES_F $N_EXT $N_EXT_F $N_BYTE $N_BYTE_F"
# calculation: percent = fragmented / total * 100
# (the totals in ${new5a} already include the fragmented files, so the denominator is just the all-files total)
N_FILE_FRAG_PERCENT=`echo "${N_FILES_F} ${N_FILES}" | awk '{print ($1/$2)*100;}'`
N_EXT_FRAG_PERCENT=`echo "${N_EXT_F} ${N_EXT}" | awk '{print ($1/$2)*100;}'`
N_BYTE_FRAG_PERCENT=`echo "${N_BYTE_F} ${N_BYTE}" | awk '{print ($1/$2)*100;}'`
N_AVERAGE_PERCENT=`echo "${N_EXT_FRAG_PERCENT} ${N_FILE_FRAG_PERCENT} ${N_BYTE_FRAG_PERCENT}" | awk '{print ($1+$2+$3)/3;}'`
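# hypothetical worked example: 50000 fragmented files out of 246409 total -> (50000/246409)*100 = ~20.3 % file fragmentation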
# append the same info to both stats files and echo it to the terminal (tee -a appends to both files, stdout passes through to the screen)
echo "-----------PERCENT FRAGMENTATION:--------------"
echo "* % File Fragmentation: ${N_FILE_FRAG_PERCENT}"    | tee -a "${new5a}" "${new5}"
echo "* % Extent Fragmentation: ${N_EXT_FRAG_PERCENT}"   | tee -a "${new5a}" "${new5}"
echo "* % Size Fragmentation: ${N_BYTE_FRAG_PERCENT}"    | tee -a "${new5a}" "${new5}"
echo "* % Average Fragmentation: ${N_AVERAGE_PERCENT}"   | tee -a "${new5a}" "${new5}"
# ----------------------- calculating new fragmentation numbers - end -----------------#

echo "-----------ANALYSIS COMPLETE:--------------"
D81F=`date +s%s-d%D-t%T | tr /: -`
D81Fs=$(date +"%s")
Tdiff=$(($D81Fs-$D81Ss))
echo "* DURATION: $(($Tdiff / 60)) minutes and $(($Tdiff % 60)) seconds = $Tdiff seconds total"
echo "* START TIME: ${D81S}"
echo "* END TIME: ${D81F}"
# ----------------------- log these times to stats - start------------------------#
# --- to 5 --- #
echo "FOOTER | RUN DURATION | $(($Tdiff / 60)) | minutes | $(($Tdiff % 60)) seconds | $Tdiff | seconds total" >> "${new5}"
echo "FOOTER | RUN START TIME | ${D81S}" >> "${new5}"
echo "FOOTER | RUN END TIME | ${D81F}" >> "${new5}"
# --- to 5a --- #
echo "FOOTER | RUN DURATION | $(($Tdiff / 60)) | minutes | $(($Tdiff % 60)) seconds | $Tdiff | seconds total" >> "${new5a}"
echo "FOOTER | RUN START TIME | ${D81S}" >> "${new5a}"
echo "FOOTER | RUN END TIME | ${D81F}" >> "${new5a}"
# ----------------------- log these times to stats - end ------------------------#
echo "-----------NONE IMPORTANT RESULTS:---------"
echo "* The following files are optional and inbetween information (you can choose to delete them with these command):"
echo "rm ${file1}"
echo "rm ${file11}"
echo "rm ${file2}"
echo "rm ${file3}"
echo "rm ${file4}"
echo "-----------EXTRA CURIOUS RESULTS:----------"
echo "* Interersting extra curious results:"
echo "- ${new0}"
echo "- ${new1}"
echo "- ${new2}"
echo "- ${new3}"
echo "- ${new3s}"
echo "- ${new4}"
echo "- ${new4s}"
echo "- ${new5}"
echo "- ${new5a}"
echo "-----------IMPORTANT RESULTS:--------------"
echo "* The MF-FINAL files should be used for analysis:"
echo "(1) ${file5}"
echo "(2) ${file6}"
echo "-----------THE END-------------------------"

The end.

3 thoughts on “FIND MOST FRAGMENTED FILES – mostfragged – using filefrag to analyze file fragmentation”

  1. Here is how I monitor my fragmentation
    ########################################

    Just made an interesting crontab that runs every 6 hours. It basically runs mostfragged (with max ionice and max nice so it doesn't hog resources) every 6 hours and saves the output to the /root/docs/frag/ folder. I only keep the stats files and the sorted lists of the most fragmented files, i.e. the 100+ and 1000+ extent outputs.

    NOTE: running this will degrade performance and might crash the system. Make sure you have a backup of your data. Also you will need to edit your scripts to match whatever directory paths you're using (remember to edit mostfragged.sh and “pick the find”, or else the default find might endlessly look through your millions of snapshot folders).

    Here is the cron entry (this goes in the root user's crontab). “sudo -i” then “crontab -e” to edit; “crontab -l” to check it. This entry runs the script every 6 hours (at 00:30, 06:30, 12:30, 18:30):

    30 0,6,12,18 * * * /root/scripts/frag/cron.sh

    The crontab entry above (minute 30 of hours 0, 6, 12 and 18, every day) runs cron.sh at those intervals.
    And here is cron.sh (which runs mostfragged; note I had to set the PATH variable so that certain commands could be found, otherwise I got bad output in the files, like NaNs and 0s):

    #!/bin/bash
    VOL=/data/
    D82=$(date +s%s_d%Y-%m-%d_t%H-%M-%S)
    SAVE=/root/docs/frag/$D82
    export PATH="/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"
    logger "Starting mostfragged $SAVE"
    /usr/bin/nice -n 19 /usr/bin/ionice -c2 -n7 /root/scripts/frag/mostfragged.sh "$VOL" "$SAVE"
    # delete all output files besides the 4 important files (the STATS and plus-SORTED outputs)
    # (pipe find into while-read instead of a for loop so filenames with spaces survive)
    find "$SAVE" -type f | egrep -v "STATS|plus-SORTED" | while read -r i; do
    ls -lisah "$i" >> "$SAVE/deleted.txt"
    rm -f "$i"
    done
    logger "Done mostfragged $SAVE"

  2. View my update at the top of the page from 09-05-2022 for latest script (GIST link) and an example of the fragmentation analysis plots (it’s actually from my own NAS and should be updated twice daily).
