This is a working program that I use to scour through lots of text for keywords. I put the keywords in the terms file, one per line (keywords or key phrases, so they can contain spaces). Each line launches a recursive grep through whatever folder I need to search for those phrases. The best part is that every search/grep is launched at the same time, so you don't have to wait for one to finish before the next begins.
This is based on: this article
The main script, which has everything, is (it contains the other scripts inside it; they are just commented out, for redundancy). It needs the terms file next to it (which you edit to include your terms). The other helper scripts are given as well (also as comments in <folder or file to search thru>, which launches the script that makes the result files).
Each result file is named like this: _allSS_<term looked for>_<folder or file name given>.txt – you get one result file for every line in terms that holds a valid term.
Valid terms files are like:
this is a phrase1 that has term4
phrase2 has a space as well
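To make the idea concrete, here is a minimal sketch of the core loop: read the terms file one line at a time and start one background grep per line. The function name launch_searches, its two arguments, the -F flag (treat the phrase literally) and the final wait are my assumptions for illustration; the script printed further down does the same job by changing IFS and looping over `cat terms`, and it does not wait for the greps.

```shell
#!/bin/bash
# Sketch only: launch_searches and its arguments are hypothetical names.
# Result files follow the _allSS_<term>_<target>.txt naming from the post.
launch_searches() {
    local terms_file=$1 target=$2
    local suffix term name
    suffix=${target%/}          # drop an optional trailing slash
    suffix=${suffix//\//-}      # '/' cannot be part of a file name
    while IFS= read -r term || [ -n "$term" ]; do
        [ -z "$term" ] && continue      # skip empty lines
        name=${term// /-}               # spaces become dashes in the file name
        # -n line numbers, -i ignore case, -r recurse, -F literal phrase
        grep -nirF -- "$term" "$target" > "_allSS_${name}_${suffix}.txt" 2>&1 &
    done < "$terms_file"
    wait    # assumption: block here until every background grep is done
}
```

Called as `launch_searches terms this`, it should leave one result file per term in the current directory. As with the real script, the target should not contain the current directory, or grep will end up reading its own result files.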
Then use any of the 3 monitoring scripts to monitor progress:
./ – this shows the processes and also the last few lines of each result file (live as terms/phrases are being found)
./ – this shows the processes and also the number of lines found in each result file (number of times each term was found, also live info)
./ – this just shows the process information that you get in the monitor1 and monitor2 scripts (it also has system load and memory)
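If watch is not available, the line-count view can be approximated with a small function and a loop. This is only a sketch: hit_counts is a hypothetical helper, and it counts every line in each result file, whatever wrote it.

```shell
#!/bin/bash
# Sketch only: hit_counts is a hypothetical helper, not one of the posted scripts.
# Prints one "<line count>  <file>" row per result file in the current directory.
hit_counts() {
    local f n
    for f in _allSS_*.txt; do
        [ -e "$f" ] || continue          # glob matched nothing yet
        n=$(wc -l < "$f")
        printf '%8d  %s\n' "$((n))" "$f" # $((n)) strips any padding from wc
    done
}

# Live view, refreshed every 2 seconds (Ctrl-C to stop):
#   while true; do clear; date; hit_counts; sleep 2; done
```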
When you're done you can concatenate all of your results into one file with ./
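A sketch of the concatenation step, assuming the same _allSS_ prefix and the _allTOGETHER_.txt destination used in the source below. The function name concat_results is hypothetical. Unlike the posted one-liner it empties the destination first, so running it twice does not double the file, and it uses a glob instead of parsing ls.

```shell
#!/bin/bash
# Sketch only: concat_results is a hypothetical name.
# Joins every _allSS_* result file into one file, with a small header per file.
concat_results() {
    local dst=${1:-_allTOGETHER_.txt} f
    : > "$dst"                           # start empty on every run
    for f in _allSS_*; do
        [ -f "$f" ] || continue          # glob matched nothing
        printf 'TERM: %s\n###################\n' "$f" >> "$dst"
        cat "$f" >> "$dst"
        echo >> "$dst"                   # blank line between files
    done
}
```

The originals are left in place; the destination name does not start with _allSS_, so it is never read back into itself.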
Move them to a folder (that will be made) – simple mkdir and mv script: ./ <folder name>
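The move step is small enough to sketch in a few lines. move_results is a hypothetical name; compared with the posted script, this version quotes the folder name (so it may contain spaces), uses mkdir -p, and moves everything starting with _all, which is an assumption based on the description that the combined file should go along too.

```shell
#!/bin/bash
# Sketch only: move_results is a hypothetical name.
# Moves all _all* result files into a (new) folder and copies terms there.
move_results() {
    local dest=$1 f
    [ -n "$dest" ] || { echo "usage: move_results <folder name>" >&2; return 1; }
    mkdir -p -- "$dest" || return 1
    for f in _all*; do
        [ -f "$f" ] && mv -- "$f" "$dest"/
    done
    cp -- terms "$dest"/     # only copied, so terms stays for the next round
}
```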
Let's say you launched a whole group of grep commands and they are already making result files. Need to clean up? Run ./
./print.script just outputs what you see below (completely optional) – it formats the output with separators and gives statistics about each file (word count, line count, etc.)
./backup.script is my own script that I run after I modify any of the source code – makes it easy to bundle everything up
NOTE: the program below has every script. It's the main script, and it has all of the optional scripts commented out (for redundancy).
NOTE: added (and nice and high-priority variations of it), which runs X term searches at a time; when it is done with those terms it moves on to the next X terms, until all of them are done. The default is 10 terms at a time (10 is the default in if the number of terms per job is not specified).
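The batching arithmetic can be sketched on its own. batch_ranges is a hypothetical helper: given the total number of terms and the batch size (default 10, as above), it prints the job number plus the first and last line of the terms file for each job. It uses ceiling division, which is my choice for the sketch, so a final partial batch is covered exactly once.

```shell
#!/bin/bash
# Sketch only: batch_ranges is a hypothetical helper.
# Usage: batch_ranges <total terms> [terms per job]
# Prints "<job> <first line> <last line>" for each job.
batch_ranges() {
    local total_terms=$1 per_job=${2:-10}
    local jobs=$(( (total_terms + per_job - 1) / per_job ))   # ceiling division
    local i start end
    for (( i = 1; i <= jobs; i++ )); do
        start=$(( (i - 1) * per_job + 1 ))
        end=$(( i * per_job ))
        (( end > total_terms )) && end=$total_terms           # clamp the last job
        echo "$i $start $end"
    done
}

# Each range can then be cut out of the terms file, for example:
#   sed -n "${start},${end}p" terms-main > terms
```

For 25 terms and 10 per job this gives three jobs: 1-10, 11-20 and 21-25.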
TERMSEARCH SOURCE CODE ####################### ####################### SOURCE CODE PRINTED USING print.script ON Wed Jan 22 18:57:11 PST 2014 Printing Code From the following files: terms backup.script print.script *************************************************** *************************************************** SOURCE CODE OF FILE: --------------------------------------------------- Date of code: Wed Jan 22 18:57:11 PST 2014 # of lines: 8 # of bytes: 127 # of chars: 127 # of words: 20 Longer line length: 23 *************************************************** *************************************************** #!/bin/bash # last update: 1/3/2014 killall -9 grep rm -rf _all* # smartlook addition rm -rf terms-main-* rm -rf terms-work-* *************************************************** *************************************************** SOURCE CODE OF FILE: --------------------------------------------------- Date of code: Wed Jan 22 18:57:12 PST 2014 # of lines: 227 # of bytes: 10725 # of chars: 10725 # of words: 1767 Longer line length: 239 *************************************************** *************************************************** #!/bin/bash # last update 1/9/2014 # as per: # make sure to have a terms file called "terms" in the a directory before, 1 above, the directory you want to search - just like in this layout # /some/path/search/this/<everything here> # /some/path/search/ - will look for the keywords listed in the "terms" file and only thru everything inside the folder "this" (/some/path/search/this/) # note the folder that needs to be specified has to not search thru the current working directory, so either search something 1 level deep from the current working directory, or some other branch in the filesystem tree # the search location is specified with the variabler RELDIR1, in the above example I would set RELDIR1="this" or RELDIR1="/some/path/search/this" - note i didnt put the slash / at end (its optional) # also note it can be 
relative or absolute path # in program below (the runnable part not commented out - aka the main part of the look program) the only part that is supposed to be edited (again thats the RELDIR1 variable)... # is set to a folder that I have been looking through 12-11-2013, thats the folders actual relative path. # It sits right next to the and terms file (so terms file and and results files and the serach directory are all in the same folder # the folder that they are all in doesnt matter - just like the part /some/path/search/ doesnt matter in the whole example - unless of course I gave the full path to RELDIR1). Just like in this example # /some/path/search/terms - note term file has keyword per line (no spaces per keyword, also realize that search is case insensitive) # /some/path/search/<results go here> # /some/path/search/ - optional used to just monitor the long operation - shows the end of each result file on the go (as terms are being searched thru) # /some/path/search/ - optional used to just monitor the long operation - shows the line count in each result file on the go # /some/path/search/ - optional used to just monitor the long operation - just shows the processes and memory load (this info included in both of the above monitor scripts) # /some/path/search/ - optional, cleans (kills all grep commands - even ones not started by, also deletes all files that start with _all aka the result files) - good for starting fresh if messed something up # /some/path/search/ - optional, concats all _all files in current directory - only run once as the file will grow exponentially everytime its run - so make sure a cleanup is run first - it still keeps originals # /some/path/search/ - optional, this moves all of the results (all files with _all*) and copies terms file into a new directory (directory is made as well) # if you do this alot, i recommend doing the runs like this: edit terms, run clean up, run look, run monitor (immediately after or slightly after look 
script), then run the together script if you want it all together, then the move script. # you will notice nice and priority scripts: # ./ <path>: runs look and all greps with nice of -19 so high priority good for short ops # ./ <path>: runs look and all grep with nice of 19 so low priority good for long ops # ./ if job already running can change prioritys to high if taking too long (note might impact cpu alot) # ./ changes all jobs to nice 0 (as if was run without nice or prioty so its like running app with just ./ # ./ if job already running can change prioritys to super low (Aka nice) (good for long ops) ################################ # THE LOOK PROGRAM # # # # - the heart of the program # # # # # ################################ # # check if argument is there # if not show usage # check if argument is a directory or a file # if not show usage # # usage function usage123 () { ME123=`basename $0` echo "termsearch - by kostia - 2014" echo "Your using ./ - for you it has the name of ./$ME123" echo "Usage:" echo "./$ME123 <directory or file to look thru>" echo "RESULTS: It will output files with name _allSS_<term>_<filename or directory given>" echo "REQUIREMENT: The terms it will look thru are listed SINGLE LINE at a time in a \'terms\' file that sits in the same folder as this file" echo "NOTE: each term needs to be seperated with a new line" echo "--DO NOT HAVE SPACES LIKE term1 term2 term3--" echo "Example of terms file:" echo "# cat terms" echo "term1" echo "term2" echo "* After running this, all search tasks - ran with grep - will be in the background" echo "* You can monitor your operation with MONITOR1 or MONITOR2 script" echo "* You can clean up everything with the CLEAN UP script - stops all grep operations and deletes all result files" echo "* You can concat all of the results together into 1 file - still keeping the originals - into one file with TOGETHER script" echo "* Finally you can move all of your results and together file - if you made one - 
into a folder using the move script" exit 1 } # main code if [ $# -eq 0 ] # 0 args then usage123 fi if [ $# -gt 1 ] # more then 1 arg then usage123 fi if [ -z "$1" ] # no argument in first arg then usage123 fi if [ -d "$1" ] ; then # if directory echo "Will look thru DIRECTORY: $1" else if [ -f "$1" ] ; then echo "Will look thru FILE: $1" else usage123 fi fi if [ -f "terms" ] ; then logger "termsearch thru $RELDIR1 for `wc -l terms` term[s]" else echo "ERROR: \'terms\' file is missing - it has to be in this directory and nowhere else" echo "More info on terms below" echo "---" usage123 fi # cant have file names with /, or else thats a directory - this doesnt affect filenames DIR123=${1%/} # change / to - for filename FILESUF123=`echo ${DIR123} | tr / -` echo "**** the file suf is kostia__${FILESUF123}__kostia ****" # IFS controls for loop new line char usually its space, but for now its new line # This way terms can have spaces in each line SAVEIFS=$IFS IFS=$(echo -en "\n\b") RELDIR1=${DIR123}; for i in `cat terms`; do echo "======================="; date; echo "Looking for: ${i}"; echo "======================="; TERM4FILE=`echo -n ${i} | tr ' ' -` (time (grep -nir ${i} ${RELDIR1}) >& _allSS_${TERM4FILE}_${FILESUF123}.txt &); echo "----------------------"; echo "Started PID: $!"; (ps awfuxx | egrep 'grep|USER' | nl;); echo; done; echo "##############################"; echo "FINAL JOBS:"; (ps awfuxx | egrep 'grep|USER' | nl;); echo echo "Looking for `wc -l terms` at the same time!" 
IFS=$SAVEIFS exit 0 # the monitor scripts # the are below # the clean up script # (copy whole section to new file and remove only the first hash mark) ##!/bin/bash #killall -9 grep #rm -rf _all* # the put together script - puts all results together to 1 file # (copy whole section to new file and remove only the first has mark) ##!/bin/bash #DST1="_allTOGETHER_.txt"; for i in `ls | grep _allSS`; do echo "CONCATTING FILE: $i >> $DST1"; echo -e "TERM: $i\n###################" >> $DST1; cat $i >> $DST1; echo >> $DST1; done; # The move script, this will move all of the results (including the one if you used together script) but not the terms file(terms file is just copied - all into one directory with the given name. You can use relative or absolute path. # example ./ search-nodeleaf - then all of the # this is the only script that needs an input arguments ##!/bin/bash ## USE: ./ <path-to-move-to> ##$ last update: 01-05-2014 #mkdir $1 #mv _allSS_* $1 ## note only copy terms, so terms files stays for next round #cp terms $1 ############################## # UPDATES ON MONITOR SCRIPTS # ############################## # just remove 1 hash mark to make em work #MONITOR SCRIPT: ##!/bin/bash ## last update: 1/7/2014 #watch 'top -c -b -n1 | head -n5; echo "PROCESSES:"; echo "==========="; (ps awfuxx | egrep grep | egrep -v egrep;); echo; echo "RESULTS:"; echo "==========="; tail _all*;' ## if you want to do with while loop: ## while true; do clear; "=====`date`======"; echo "PROCESSES:"; echo "==========="; (ps awfuxx | egrep grep;); echo; echo "RESULTS:"; echo "==========="; tail _all*; sleep 1; done; # #MONITOR SCRIPT: ##!/bin/bash ## last update 1/7/2014 #watch 'top -c -b -n1 | head -n5; echo "PROCESSES:"; echo "==========="; (ps awfuxx | egrep grep | egrep -v egrep;); echo; echo "RESULTS (number of lines):"; echo "=========================="; wc -l _allS*;' ## if you wann do with while loop: ## while true; do clear; "=====`date`======"; echo "PROCESSES:"; echo 
"==========="; (ps awfuxx | egrep grep;); echo; echo "RESULTS (number of lines):"; echo "=========================="; wc -l _allS*; sleep 1; done; #MONITOR SCRIPT: ##!/bin/bash ## last update 1/7/2014 #watch 'top -c -b -n1 | head -n5; echo "PROCESSES:"; echo "==========="; (ps awfuxx | egrep grep | egrep -v egrep;);' ## if you wann do with while loop: ## while true; do clear; "=====`date`======"; echo "PROCESSES:"; echo "==========="; (ps awfuxx | egrep grep;); echo; echo "RESULTS (number of lines):"; echo "=========================="; wc -l _allS*; sleep 1; done; ##iSCRIPT FOR CPU CONTROL: ############################################### ##!/bin/bash ## update: 1/9/2014 ## usage: ./ <path> ## just run look with more priority #nice -n -19 ./ ${1} ##SCRIPT FOR CPU CONTROL: ############################################### ##!/bin/bash ## update: 1/9/2014 ## usage: ./ <path> ## just run look with nice #nice -n 19 ./ ${1} ##SCRIPT FOR CPU CONTROL: ############################################### ##!/bin/bash ## update: 1/9/2014 ## make nice of the program more cpu ## prioritize #for i in `ps ax | egrep grep | egrep -v egrep | awk '{print $1}'`; do renice -n -19 -p $i; done; ##SCRIPT FOR CPU CONTROL: ############################################### ##!/bin/bash ## update: 1/9/2014 ## this put nice back to 0 default so it runs as if was run by ./look and not ./nicelook* #for i in `ps ax | egrep grep | egrep -v egrep | awk '{print $1}'`; do renice -n 0 -p $i; done; ##SCRIPT FOR CPU CONTROL: ############################################### ##!/bin/bash ## update: 1/9/2014 ## make nice of the program less cpu ## good for huge ops #for i in `ps ax | egrep grep | egrep -v egrep | awk '{print $1}'`; do renice -n 19 -p $i; done; *************************************************** *************************************************** SOURCE CODE OF FILE: --------------------------------------------------- Date of code: Wed Jan 22 18:57:12 PST 2014 # of lines: 5 # of bytes: 427 # of 
chars: 427
# of words: 67
Longer line length: 181
***************************************************
***************************************************
#!/bin/bash
# last update: 1/7/2014
watch 'top -c -b -n1 | head -n5; echo "PROCESSES:"; echo "==========="; (ps awfuxx | egrep grep | egrep -v egrep;); echo; echo "RESULTS:"; echo "==========="; tail _all*;'
# if you want to do with while loop:
# while true; do clear; "=====`date`======"; echo "PROCESSES:"; echo "==========="; (ps awfuxx | egrep grep;); echo; echo "RESULTS:"; echo "==========="; tail _all*; sleep 1; done;
***************************************************
***************************************************
SOURCE CODE OF FILE:
---------------------------------------------------
Date of code: Wed Jan 22 18:57:12 PST 2014
# of lines: 5
# of bytes: 494
# of chars: 494
# of words: 74
Longer line length: 216
***************************************************
***************************************************
#!/bin/bash
# last update 1/7/2014
watch 'top -c -b -n1 | head -n5; echo "PROCESSES:"; echo "==========="; (ps awfuxx | egrep grep | egrep -v egrep;); echo; echo "RESULTS (number of lines):"; echo "=========================="; wc -l _allS*;'
# if you wann do with while loop:
# while true; do clear; "=====`date`======"; echo "PROCESSES:"; echo "==========="; (ps awfuxx | egrep grep;); echo; echo "RESULTS (number of lines):"; echo "=========================="; wc -l _allS*; sleep 1; done;
***************************************************
***************************************************
SOURCE CODE OF FILE:
---------------------------------------------------
Date of code: Wed Jan 22 18:57:12 PST 2014
# of lines: 5
# of bytes: 403
# of chars: 403
# of words: 63
Longer line length: 216
***************************************************
***************************************************
#!/bin/bash
# last update 1/7/2014
watch 'top -c -b -n1 | head -n5; echo "PROCESSES:"; echo "==========="; (ps awfuxx | egrep grep | egrep -v egrep;);'
# if you wann do with while loop:
# while true; do clear; "=====`date`======"; echo "PROCESSES:"; echo "==========="; (ps awfuxx | egrep grep;); echo; echo "RESULTS (number of lines):"; echo "=========================="; wc -l _allS*; sleep 1; done;
***************************************************
***************************************************
SOURCE CODE OF FILE:
---------------------------------------------------
Date of code: Wed Jan 22 18:57:12 PST 2014
# of lines: 8
# of bytes: 172
# of chars: 172
# of words: 29
Longer line length: 59
***************************************************
***************************************************
#!/bin/bash
# USE: ./ <path-to-move-to>
#$ last update: 01-05-2014
mkdir $1
mv _allSS_* $1
# note only copy terms, so terms files stays for next round
cp terms $1
***************************************************
***************************************************
SOURCE CODE OF FILE:
---------------------------------------------------
Date of code: Wed Jan 22 18:57:12 PST 2014
# of lines: 3
# of bytes: 218
# of chars: 218
# of words: 34
Longer line length: 182
***************************************************
***************************************************
#!/bin/bash
# last update 1/3/2014
DST1="_allTOGETHER_.txt"; for i in `ls | grep _allSS`; do echo "CONCATTING FILE: $i >> $DST1"; echo -e "TERM: $i\n###################" >> $DST1; cat $i >> $DST1; echo >> $DST1; done;
***************************************************
***************************************************
SOURCE CODE OF FILE:
---------------------------------------------------
Date of code: Wed Jan 22 18:57:12 PST 2014
# of lines: 5
# of bytes: 136
# of chars: 136
# of words: 20
Longer line length: 42
***************************************************
***************************************************
#!/bin/bash
# update: 1/9/2014
# usage: ./ <path>
# just run look with more priority
nice -n -19 ./ ${1}
***************************************************
***************************************************
SOURCE CODE OF FILE:
---------------------------------------------------
Date of code: Wed Jan 22 18:57:12 PST 2014
# of lines: 5
# of bytes: 113
# of chars: 113
# of words: 19
Longer line length: 29
***************************************************
***************************************************
#!/bin/bash
# update: 1/9/2014
# usage: ./ <path>
# just run look with nice
nice -n 19 ./ ${1}
***************************************************
***************************************************
SOURCE CODE OF FILE:
---------------------------------------------------
Date of code: Wed Jan 22 18:57:12 PST 2014
# of lines: 5
# of bytes: 177
# of chars: 177
# of words: 37
Longer line length: 96
***************************************************
***************************************************
#!/bin/bash
# update: 1/9/2014
# make nice of the program more cpu
# prioritize
for i in `ps ax | egrep grep | egrep -v egrep | awk '{print $1}'`; do renice -n -19 -p $i; done;
***************************************************
***************************************************
SOURCE CODE OF FILE:
---------------------------------------------------
Date of code: Wed Jan 22 18:57:12 PST 2014
# of lines: 4
# of bytes: 215
# of chars: 215
# of words: 47
Longer line length: 94
***************************************************
***************************************************
#!/bin/bash
# update: 1/9/2014
# this put nice back to 0 default so it runs as if was run by ./look and not ./nicelook*
for i in `ps ax | egrep grep | egrep -v egrep | awk '{print $1}'`; do renice -n 0 -p $i; done;
***************************************************
***************************************************
SOURCE CODE OF FILE:
---------------------------------------------------
Date of code: Wed Jan 22 18:57:12 PST 2014
# of lines: 5
# of bytes: 183
# of chars: 183
# of words: 40
Longer line length: 95 *************************************************** *************************************************** #!/bin/bash # update: 1/9/2014 # make nice of the program less cpu # good for huge ops for i in `ps ax | egrep grep | egrep -v egrep | awk '{print $1}'`; do renice -n 19 -p $i; done; *************************************************** *************************************************** SOURCE CODE OF FILE: --------------------------------------------------- Date of code: Wed Jan 22 18:57:12 PST 2014 # of lines: 192 # of bytes: 7260 # of chars: 7260 # of words: 990 Longer line length: 112 *************************************************** *************************************************** #!/bin/bash # ./ <number of terms per jobs> # variations: just change the "look" variable # ./ <number of terms per jobs> # ./ <number of terms per jobs> # look="" # look="" # look="" # if no specified then set to 10 # AS JOB MIGHT BE LONG LETS CLEAN UP BY CATCHING CONTROL-C WITH TRAP trap ctrl_c INT function ctrl_c() { echo "~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~" echo "CLEAN UP TRIGGERED DUE TO:" echo "CONTROL-C" echo "~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~" echo "CLEANING UP - putting terms file as it was - and killing all greps" echo "RECOMMENDED: delete _allSS files that were cancelled - didnt delete just incase you needed them" echo "RECOMMENDED CONTINUED: ./ does just that" killall -9 grep mv -vf ${tmainfile} ${tfile} rm -vf ${ntfile} rm -vf "${tfile}-work-"* time2=`date +ymd%Y-%m-%d-t%T | tr : -` echo "START TIME OF FULL - CANCELED - OPERATION OF $N TOTAL TERM[S]: ${time0}" echo "END TIME OF FULL - CANCELED - OPERATION OF $N TOTAL TERM[S]: ${time2}" exit 2 } # PRINT USAGE usage123 () { ME123=`basename $0` echo "termsearch - by kostia - 2014" echo "Usage: $ME123 <number of terms per jobs> <folder or file to look thru>" echo "Usage: $ME123 <folder or file to look thru>" echo "When no <number of terms per jobs> specified it will default to 0" exit 1 } # CHECK 
PROGRESS SCRIPT check_process() { #echo -n "" [ "$1" = "" ] && return 0 # if process name is nothing then fail, this will be grep so it will never fail [ `pgrep -n $1` ] && return 1 || return 0 # if process exists then 1 if doesnt then 0 } # MAIN SCRIPT # CHECK IF NO ARGS if [ $# -eq 0 ] # 0 args then usage123 fi if [ $# -gt 2 ] # more then 2 arg then usage123 fi if [ -z "$1" ] # no argument in first arg then usage123 fi if [ $# -eq 1 ] # if 1 argument it should be folder or file Usage: $ME123 <folder or file to look thru> then if [ -d "$1" ] ; then # if directory echo "Will look thru DIRECTORY: $1" else if [ -f "$1" ] ; then echo "Will look thru FILE: $1" else usage123 fi fi n=10; FOLDER=$1 echo "Defauling to 10 terms per job" echo "Setting n=10" fi if [ $# -eq 2 ] # if 2 Usage: $ME123 <number of terms per jobs> <folder or file to look thru> then n=$1 FOLDER=$2 if [ -d "$2" ] ; then # if directory echo "Will look thru DIRECTORY: $2" else if [ -f "$2" ] ; then echo "Will look thru FILE: $2" else usage123 fi fi fi # SETTING IMPORTANT VARIABLES look="" # look="" # look="" searchgrep="grep" time0=`date +ymd%Y-%m-%d-t%T | tr : -` echo "Folder/File that we are term searching thru: ${FOLDER}" echo "##############################################################################################" echo "############################### START: $time0 ###############################" echo "##############################################################################################" tfile="terms" # COPY TERMS FILE TO TERMS-CURRENT (WE WILL WORK OFF TERMS CURRENT) tmainfile="${tfile}-main-${time0}" echo "Main File Was Copied To: ${tmainfile}" cp -vf ${tfile} ${tmainfile} # MAKE NTFILE - TEMPORARY - TO SHOW WHAT TERM # BEING COPIED (SIMPLE HACK) ntfile="${tmainfile}-numbered" cat ${tmainfile} | nl > ${ntfile} # N total number of lines # n is the number of terms per job, given as 1st parameter N=`wc -l ${tmainfile} | awk '{print $1;}'` #n=$1 total=$((N/n)) echo "Number of terms: 
${N}, Terms Per Job: ${n}, Number Of Jobs: ${total}" for i in `seq 1 $((total+1))`; do echo "-------------------------------------------------------------------------------------------------------" echo "------------------------------ STARTING ${i}/${total} --------------------------------------" echo "-------------------------------------------------------------------------------------------------------" date; #echo "* n=$n N=$N i=$i total=$total" START=$((i*n-n+1)) END=$((START+n-1)) if [ ${i} -eq ${total} ]; then END=${N} fi len=$((END-START+1)) echo "* n=$n, N=$N, i=$i, total=$total, START=$START, END=$END, len=$len" ###### echo "* Job number ${i}: iterating thru ${START} -> ${END}" # USE NTFILE TO TELL US WHAT TERMS and THEIR NUMBER ARE BEING WORKED ON echo "%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%" cat ${ntfile} | awk "{S=$START;E=$END;if(NR>=S && NR<=E){print \$0;}}" | nl echo "%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%" time1=`date +ymd%Y-%m-%d-t%T | tr : -` curterms="${tfile}-work-${START}-to-${END}-${time1}" # EXTRACT TERMS TO WORK ON FOR THIS JOB TO CURRENT-TERMS FILE ##### echo "Current Terms File: ${curterms}" cat ${tmainfile} | awk "{S=$START;E=$END;if(NR>=S && NR<=E){print \$0;}}" > ${curterms} ###### echo "- NOTE: Overwriting ${curterms} over ${tfile} and will work on it" ###### echo "- NOTE: Don't worry original ${tfile} is still at ${tmainfile}" cp -vf ${curterms} ${tfile} ##### echo "Looking thru this:" # echo "~~~~~~~~~~~ numbered current terms ~~~~~~~~~" # cat ${tfile} | nl # echo "~~~~~~~~~~~ ~~~~~~~~~~~~~~~~~~~~~~ ~~~~~~~~~" ##### echo "Launching ${look} on ${n} terms" # LAUNCHING MAIN LOOK THAT RUNS IN BACKGROUND - NOTE LOOK OPERATES ON terms FILE # # # killing all other greps ##### echo "killing all greps before looking" killall -9 grep echo "Launching: ./${look} ${FOLDER}" ./${look} ${FOLDER} # sleep 1 # # echo 
"*************************************************************" echo "**** WAITING FOR COMPLETION ${i} - CHECKING EVERY 5 sec *****" echo "*************************************************************" # MONITORING UNTIL LOOK IS DONE while [ 1 ]; do ##### ts="`date +ymd%Y-%m-%d-t%T` " ts="X" echo -n "${ts}" check_process ${searchgrep} # WHEN LOOK IS DONE LET US KNOW AND BREAK LOOP [ $? -eq 0 ] && echo -e "\n*** Job ${i} - Complete - GREPS Finished ***" && break sleep 5 done ##### echo "**** FINISHED ${i} - MOVING TO NEXT JOB *****" # FOR CLEAN UP OF CURTERMS UNHASH THIS: # rm -f ${curterms} time1a=`date +ymd%Y-%m-%d-t%T | tr : -` echo "STARTED THIS SUB-JOB OF $n TERMS ON: ${time1}" echo "ENDED THIS SUB-JOB OF $n TERMS ON: ${time1a}" echo "-----------------------------------------------------------------------------------------------------" echo "------------------------------ ENDING ${i}/${total} --------------------------------------" echo "-----------------------------------------------------------------------------------------------------" done # WHEN DONE PUT TERMS FILE BACK AS THEY WERE echo "CLEANING UP - putting terms file as it was" mv -vf ${tmainfile} ${tfile} rm -vf ${ntfile} rm -vf "${tfile}-work-"* time2=`date +ymd%Y-%m-%d-t%T | tr : -` echo "START TIME OF FULL OPERATION OF $N TOTAL TERM[S]: ${time0}" echo "END TIME OF FULL OPERATION OF $N TOTAL TERM[S]: ${time2}" echo "############################################################################################" echo "############################### END: $time2 ###############################" echo "############################################################################################" *************************************************** *************************************************** SOURCE CODE OF FILE: --------------------------------------------------- Date of code: Wed Jan 22 18:57:12 PST 2014 # of lines: 192 # of bytes: 7260 # of chars: 7260 # of words: 990 Longer line length: 112 
*************************************************** *************************************************** #!/bin/bash # ./ <number of terms per jobs> # variations: just change the "look" variable # ./ <number of terms per jobs> # ./ <number of terms per jobs> # look="" # look="" # look="" # if no specified then set to 10 # AS JOB MIGHT BE LONG LETS CLEAN UP BY CATCHING CONTROL-C WITH TRAP trap ctrl_c INT function ctrl_c() { echo "~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~" echo "CLEAN UP TRIGGERED DUE TO:" echo "CONTROL-C" echo "~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~" echo "CLEANING UP - putting terms file as it was - and killing all greps" echo "RECOMMENDED: delete _allSS files that were cancelled - didnt delete just incase you needed them" echo "RECOMMENDED CONTINUED: ./ does just that" killall -9 grep mv -vf ${tmainfile} ${tfile} rm -vf ${ntfile} rm -vf "${tfile}-work-"* time2=`date +ymd%Y-%m-%d-t%T | tr : -` echo "START TIME OF FULL - CANCELED - OPERATION OF $N TOTAL TERM[S]: ${time0}" echo "END TIME OF FULL - CANCELED - OPERATION OF $N TOTAL TERM[S]: ${time2}" exit 2 } # PRINT USAGE usage123 () { ME123=`basename $0` echo "termsearch - by kostia - 2014" echo "Usage: $ME123 <number of terms per jobs> <folder or file to look thru>" echo "Usage: $ME123 <folder or file to look thru>" echo "When no <number of terms per jobs> specified it will default to 0" exit 1 } # CHECK PROGRESS SCRIPT check_process() { #echo -n "" [ "$1" = "" ] && return 0 # if process name is nothing then fail, this will be grep so it will never fail [ `pgrep -n $1` ] && return 1 || return 0 # if process exists then 1 if doesnt then 0 } # MAIN SCRIPT # CHECK IF NO ARGS if [ $# -eq 0 ] # 0 args then usage123 fi if [ $# -gt 2 ] # more then 2 arg then usage123 fi if [ -z "$1" ] # no argument in first arg then usage123 fi if [ $# -eq 1 ] # if 1 argument it should be folder or file Usage: $ME123 <folder or file to look thru> then if [ -d "$1" ] ; then # if directory echo "Will look thru DIRECTORY: $1" else if [ -f "$1" ] 
; then echo "Will look thru FILE: $1" else usage123 fi fi n=10; FOLDER=$1 echo "Defauling to 10 terms per job" echo "Setting n=10" fi if [ $# -eq 2 ] # if 2 Usage: $ME123 <number of terms per jobs> <folder or file to look thru> then n=$1 FOLDER=$2 if [ -d "$2" ] ; then # if directory echo "Will look thru DIRECTORY: $2" else if [ -f "$2" ] ; then echo "Will look thru FILE: $2" else usage123 fi fi fi # SETTING IMPORTANT VARIABLES # look="" look="" # look="" searchgrep="grep" time0=`date +ymd%Y-%m-%d-t%T | tr : -` echo "Folder/File that we are term searching thru: ${FOLDER}" echo "##############################################################################################" echo "############################### START: $time0 ###############################" echo "##############################################################################################" tfile="terms" # COPY TERMS FILE TO TERMS-CURRENT (WE WILL WORK OFF TERMS CURRENT) tmainfile="${tfile}-main-${time0}" echo "Main File Was Copied To: ${tmainfile}" cp -vf ${tfile} ${tmainfile} # MAKE NTFILE - TEMPORARY - TO SHOW WHAT TERM # BEING COPIED (SIMPLE HACK) ntfile="${tmainfile}-numbered" cat ${tmainfile} | nl > ${ntfile} # N total number of lines # n is the number of terms per job, given as 1st parameter N=`wc -l ${tmainfile} | awk '{print $1;}'` #n=$1 total=$((N/n)) echo "Number of terms: ${N}, Terms Per Job: ${n}, Number Of Jobs: ${total}" for i in `seq 1 $((total+1))`; do echo "-------------------------------------------------------------------------------------------------------" echo "------------------------------ STARTING ${i}/${total} --------------------------------------" echo "-------------------------------------------------------------------------------------------------------" date; #echo "* n=$n N=$N i=$i total=$total" START=$((i*n-n+1)) END=$((START+n-1)) if [ ${i} -eq ${total} ]; then END=${N} fi len=$((END-START+1)) echo "* n=$n, N=$N, i=$i, total=$total, START=$START, END=$END, len=$len" 
    ###### echo "* Job number ${i}: iterating thru ${START} -> ${END}"
    # USE NTFILE TO TELL US WHAT TERMS and THEIR NUMBER ARE BEING WORKED ON
    echo "%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%"
    cat ${ntfile} | awk "{S=$START;E=$END;if(NR>=S && NR<=E){print \$0;}}" | nl
    echo "%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%"
    time1=`date +ymd%Y-%m-%d-t%T | tr : -`
    curterms="${tfile}-work-${START}-to-${END}-${time1}"
    # EXTRACT TERMS TO WORK ON FOR THIS JOB TO CURRENT-TERMS FILE
    ##### echo "Current Terms File: ${curterms}"
    cat ${tmainfile} | awk "{S=$START;E=$END;if(NR>=S && NR<=E){print \$0;}}" > ${curterms}
    ###### echo "- NOTE: Overwriting ${curterms} over ${tfile} and will work on it"
    ###### echo "- NOTE: Don't worry original ${tfile} is still at ${tmainfile}"
    cp -vf ${curterms} ${tfile}
    ##### echo "Looking thru this:"
    # echo "~~~~~~~~~~~ numbered current terms ~~~~~~~~~"
    # cat ${tfile} | nl
    # echo "~~~~~~~~~~~ ~~~~~~~~~~~~~~~~~~~~~~ ~~~~~~~~~"
    ##### echo "Launching ${look} on ${n} terms"
    # LAUNCHING MAIN LOOK THAT RUNS IN BACKGROUND - NOTE LOOK OPERATES ON terms FILE
    #
    #
    # killing all other greps
    ##### echo "killing all greps before looking"
    killall -9 grep
    echo "Launching: ./${look} ${FOLDER}"
    ./${look} ${FOLDER}
    # sleep 1
    #
    #
    echo "*************************************************************"
    echo "**** WAITING FOR COMPLETION ${i} - CHECKING EVERY 5 sec *****"
    echo "*************************************************************"
    # MONITORING UNTIL LOOK IS DONE
    while [ 1 ]; do
        ##### ts="`date +ymd%Y-%m-%d-t%T` "
        ts="X"
        echo -n "${ts}"
        check_process ${searchgrep}
        # WHEN LOOK IS DONE LET US KNOW AND BREAK LOOP
        [ $? -eq 0 ] && echo -e "\n*** Job ${i} - Complete - GREPS Finished ***" && break
        sleep 5
    done
    ##### echo "**** FINISHED ${i} - MOVING TO NEXT JOB *****"
    # FOR CLEAN UP OF CURTERMS UNHASH THIS:
    # rm -f ${curterms}
    time1a=`date +ymd%Y-%m-%d-t%T | tr : -`
    echo "STARTED THIS SUB-JOB OF $n TERMS ON: ${time1}"
    echo "ENDED THIS SUB-JOB OF $n TERMS ON: ${time1a}"
    echo "-----------------------------------------------------------------------------------------------------"
    echo "------------------------------ ENDING ${i}/${total} --------------------------------------"
    echo "-----------------------------------------------------------------------------------------------------"
done
# WHEN DONE PUT TERMS FILE BACK AS THEY WERE
echo "CLEANING UP - putting terms file as it was"
mv -vf ${tmainfile} ${tfile}
rm -vf ${ntfile}
rm -vf "${tfile}-work-"*
time2=`date +ymd%Y-%m-%d-t%T | tr : -`
echo "START TIME OF FULL OPERATION OF $N TOTAL TERM[S]: ${time0}"
echo "END TIME OF FULL OPERATION OF $N TOTAL TERM[S]: ${time2}"
echo "############################################################################################"
echo "############################### END: $time2 ###############################"
echo "############################################################################################"



***************************************************
***************************************************
SOURCE CODE OF FILE:
---------------------------------------------------
Date of code: Wed Jan 22 18:57:12 PST 2014
# of lines: 192
# of bytes: 7260
# of chars: 7260
# of words: 990
Longer line length: 112
***************************************************
***************************************************

#!/bin/bash
# ./ <number of terms per jobs>
# variations: just change the "look" variable
# ./ <number of terms per jobs>
# ./ <number of terms per jobs>
# look=""
# look=""
# look=""
# if no specified then set to 10
# AS JOB MIGHT BE LONG LETS CLEAN UP BY CATCHING CONTROL-C WITH TRAP
trap ctrl_c INT
function ctrl_c() {
    echo "~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~"
    echo "CLEAN UP TRIGGERED DUE TO:"
    echo "CONTROL-C"
    echo "~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~"
    echo "CLEANING UP - putting terms file as it was - and killing all greps"
    echo "RECOMMENDED: delete _allSS files that were cancelled - they were not deleted in case you need them"
    echo "RECOMMENDED CONTINUED: ./ does just that"
    killall -9 grep
    mv -vf ${tmainfile} ${tfile}
    rm -vf ${ntfile}
    rm -vf "${tfile}-work-"*
    time2=`date +ymd%Y-%m-%d-t%T | tr : -`
    echo "START TIME OF FULL - CANCELED - OPERATION OF $N TOTAL TERM[S]: ${time0}"
    echo "END TIME OF FULL - CANCELED - OPERATION OF $N TOTAL TERM[S]: ${time2}"
    exit 2
}
# PRINT USAGE
usage123 () {
    ME123=`basename $0`
    echo "termsearch - by kostia - 2014"
    echo "Usage: $ME123 <number of terms per jobs> <folder or file to look thru>"
    echo "Usage: $ME123 <folder or file to look thru>"
    echo "When no <number of terms per jobs> specified it will default to 10"
    exit 1
}
# CHECK PROGRESS SCRIPT
check_process() {
    #echo -n ""
    [ "$1" = "" ] && return 0 # if process name is nothing then fail, this will be grep so it will never fail
    [ `pgrep -n $1` ] && return 1 || return 0 # if process exists then 1, if it doesn't then 0
}
# MAIN SCRIPT
# CHECK IF NO ARGS
if [ $# -eq 0 ] # 0 args
then
    usage123
fi
if [ $# -gt 2 ] # more than 2 args
then
    usage123
fi
if [ -z "$1" ] # no argument in first arg
then
    usage123
fi
if [ $# -eq 1 ] # if 1 argument it should be folder or file Usage: $ME123 <folder or file to look thru>
then
    if [ -d "$1" ] ; then # if directory
        echo "Will look thru DIRECTORY: $1"
    else
        if [ -f "$1" ] ; then
            echo "Will look thru FILE: $1"
        else
            usage123
        fi
    fi
    n=10;
    FOLDER=$1
    echo "Defaulting to 10 terms per job"
    echo "Setting n=10"
fi
if [ $# -eq 2 ] # if 2 Usage: $ME123 <number of terms per jobs> <folder or file to look thru>
then
    n=$1
    FOLDER=$2
    if [ -d "$2" ] ; then # if directory
        echo "Will look thru DIRECTORY: $2"
    else
        if [ -f "$2" ] ; then
            echo "Will look thru FILE: $2"
        else
            usage123
        fi
    fi
fi
# SETTING IMPORTANT VARIABLES
# look=""
# look=""
look=""
searchgrep="grep"
time0=`date +ymd%Y-%m-%d-t%T | tr : -`
echo "Folder/File that we are term searching thru: ${FOLDER}"
echo "##############################################################################################"
echo "############################### START: $time0 ###############################"
echo "##############################################################################################"
tfile="terms"
# COPY TERMS FILE TO TERMS-CURRENT (WE WILL WORK OFF TERMS CURRENT)
tmainfile="${tfile}-main-${time0}"
echo "Main File Was Copied To: ${tmainfile}"
cp -vf ${tfile} ${tmainfile}
# MAKE NTFILE - TEMPORARY - TO SHOW WHAT TERM # BEING COPIED (SIMPLE HACK)
ntfile="${tmainfile}-numbered"
cat ${tmainfile} | nl > ${ntfile}
# N total number of lines
# n is the number of terms per job, given as 1st parameter
N=`wc -l ${tmainfile} | awk '{print $1;}'`
#n=$1
total=$((N/n))
echo "Number of terms: ${N}, Terms Per Job: ${n}, Number Of Jobs: ${total}"
for i in `seq 1 $((total+1))`; do
    echo "-------------------------------------------------------------------------------------------------------"
    echo "------------------------------ STARTING ${i}/${total} --------------------------------------"
    echo "-------------------------------------------------------------------------------------------------------"
    date;
    #echo "* n=$n N=$N i=$i total=$total"
    START=$((i*n-n+1))
    END=$((START+n-1))
    if [ ${i} -eq ${total} ]; then
        END=${N}
    fi
    len=$((END-START+1))
    echo "* n=$n, N=$N, i=$i, total=$total, START=$START, END=$END, len=$len"
    ###### echo "* Job number ${i}: iterating thru ${START} -> ${END}"
    # USE NTFILE TO TELL US WHAT TERMS and THEIR NUMBER ARE BEING WORKED ON
    echo "%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%"
    cat ${ntfile} | awk "{S=$START;E=$END;if(NR>=S && NR<=E){print \$0;}}" | nl
    echo "%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%"
    time1=`date +ymd%Y-%m-%d-t%T | tr : -`
    curterms="${tfile}-work-${START}-to-${END}-${time1}"
    # EXTRACT TERMS TO WORK ON FOR THIS JOB TO CURRENT-TERMS FILE
    ##### echo "Current Terms File: ${curterms}"
    cat ${tmainfile} | awk "{S=$START;E=$END;if(NR>=S && NR<=E){print \$0;}}" > ${curterms}
    ###### echo "- NOTE: Overwriting ${curterms} over ${tfile} and will work on it"
    ###### echo "- NOTE: Don't worry original ${tfile} is still at ${tmainfile}"
    cp -vf ${curterms} ${tfile}
    ##### echo "Looking thru this:"
    # echo "~~~~~~~~~~~ numbered current terms ~~~~~~~~~"
    # cat ${tfile} | nl
    # echo "~~~~~~~~~~~ ~~~~~~~~~~~~~~~~~~~~~~ ~~~~~~~~~"
    ##### echo "Launching ${look} on ${n} terms"
    # LAUNCHING MAIN LOOK THAT RUNS IN BACKGROUND - NOTE LOOK OPERATES ON terms FILE
    #
    #
    # killing all other greps
    ##### echo "killing all greps before looking"
    killall -9 grep
    echo "Launching: ./${look} ${FOLDER}"
    ./${look} ${FOLDER}
    # sleep 1
    #
    #
    echo "*************************************************************"
    echo "**** WAITING FOR COMPLETION ${i} - CHECKING EVERY 5 sec *****"
    echo "*************************************************************"
    # MONITORING UNTIL LOOK IS DONE
    while [ 1 ]; do
        ##### ts="`date +ymd%Y-%m-%d-t%T` "
        ts="X"
        echo -n "${ts}"
        check_process ${searchgrep}
        # WHEN LOOK IS DONE LET US KNOW AND BREAK LOOP
        [ $? -eq 0 ] && echo -e "\n*** Job ${i} - Complete - GREPS Finished ***" && break
        sleep 5
    done
    ##### echo "**** FINISHED ${i} - MOVING TO NEXT JOB *****"
    # FOR CLEAN UP OF CURTERMS UNHASH THIS:
    # rm -f ${curterms}
    time1a=`date +ymd%Y-%m-%d-t%T | tr : -`
    echo "STARTED THIS SUB-JOB OF $n TERMS ON: ${time1}"
    echo "ENDED THIS SUB-JOB OF $n TERMS ON: ${time1a}"
    echo "-----------------------------------------------------------------------------------------------------"
    echo "------------------------------ ENDING ${i}/${total} --------------------------------------"
    echo "-----------------------------------------------------------------------------------------------------"
done
# WHEN DONE PUT TERMS FILE BACK AS THEY WERE
echo "CLEANING UP - putting terms file as it was"
mv -vf ${tmainfile} ${tfile}
rm -vf ${ntfile}
rm -vf "${tfile}-work-"*
time2=`date +ymd%Y-%m-%d-t%T | tr : -`
echo "START TIME OF FULL OPERATION OF $N TOTAL TERM[S]: ${time0}"
echo "END TIME OF FULL OPERATION OF $N TOTAL TERM[S]: ${time2}"
echo "############################################################################################"
echo "############################### END: $time2 ###############################"
echo "############################################################################################"



***************************************************
***************************************************
SOURCE CODE OF FILE: terms
---------------------------------------------------
Date of code: Wed Jan 22 18:57:12 PST 2014
# of lines: 3 terms
# of bytes: 18 terms
# of chars: 18 terms
# of words: 3 terms
Longer line length: 5 terms
***************************************************
***************************************************

term1
term2
term3



***************************************************
***************************************************
SOURCE CODE OF FILE: backup.script
---------------------------------------------------
Date of code: Wed Jan 22 18:57:12 PST 2014
# of lines: 20 backup.script
# of bytes: 1056 backup.script
# of chars: 1056 backup.script
# of words: 122 backup.script
Longer line length: 115 backup.script
***************************************************
***************************************************

#!/bin/bash
SCRIPTS=""
SCRIPTS="$SCRIPTS"
SCRIPTS="$SCRIPTS"
echo "* REMOVING OLD TGZ FILE"
rm -f termsearch.tgz
echo "* RENAMING TERMS TO .tempterms123"
mv terms .tempterms123
echo "* MAKING SHOWCASE terms FILE WITH TERMS: term1 THRU term3"
echo -e "term1\nterm2\nterm3" > terms
echo "* GRABBING SOURCE CODE"
./print.script > termsearch-all-code.txt
echo "* TAR GZIPPING EVERY SCRIPT AND terms FILE INTO termsearch.tgz"
tar -zcvf termsearch.tgz termsearch-all-code.txt ${SCRIPTS} terms backup.script print.script
echo "* COPYING termsearch.tgz AND ALL SOURCE CODE termsearch-all-code.txt TO WEB SERVER /var/www"
cp termsearch.tgz termsearch-all-code.txt /var/www/
echo "* RENAME TEMP TERMS BACK TO ORIGINAL terms FILE - REMOVING SHOWCASE terms FILE AND RESTORING YOUR terms FILE"
mv .tempterms123 terms
echo "* DONE!"
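backup.script parks the real terms file as .tempterms123 while it builds the archive and only moves it back in its last step. If the run stops before that step (Ctrl-C, for example), the showcase terms file is left in place. Below is a minimal sketch of one way to guard against that with a bash `trap ... EXIT`. The function names `restore_terms` and `swap_in_showcase_terms` are made up for this illustration and are not part of the original scripts.

```shell
#!/bin/bash
# Sketch only: restore_terms and swap_in_showcase_terms are hypothetical helpers,
# not part of the original termsearch scripts.

# Put the real terms file back if it is still parked under the temp name.
restore_terms() {
    if [ -f .tempterms123 ]; then
        mv -f .tempterms123 terms
    fi
}

# Park the real terms file and write the three showcase terms in its place.
swap_in_showcase_terms() {
    mv terms .tempterms123
    # From here on, restore the real file whenever this shell exits.
    trap restore_terms EXIT
    # Turn Ctrl-C into a normal exit so the EXIT trap above still runs.
    trap 'exit 130' INT
    printf 'term1\nterm2\nterm3\n' > terms
}
```

With something like this, backup.script could call `swap_in_showcase_terms` instead of its own mv/echo pair and drop the final `mv .tempterms123 terms`, since the trap would do the restore on exit.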
***************************************************
***************************************************
SOURCE CODE OF FILE: print.script
---------------------------------------------------
Date of code: Wed Jan 22 18:57:12 PST 2014
# of lines: 36 print.script
# of bytes: 1199 print.script
# of chars: 1199 print.script
# of words: 128 print.script
Longer line length: 130 print.script
***************************************************
***************************************************

#!/bin/bash
# update 1/9/2014
SCRIPTS=""
#cant include
SCRIPTS="$SCRIPTS"
SCRIPTS="$SCRIPTS"
echo "TERMSEARCH SOURCE CODE"
echo "#######################"
echo "#######################"
echo
echo "SOURCE CODE PRINTED USING `basename $0` ON `date`"
echo "Printing Code From the following files:"
echo *sh terms backup.script print.script
echo
echo
echo
for i in ${SCRIPTS} terms backup.script print.script
do
    echo "***************************************************"
    echo "***************************************************"
    echo "SOURCE CODE OF FILE: $i"
    echo "---------------------------------------------------"
    echo "Date of code: `date`"
    echo "# of lines: `wc -l $i`"
    echo "# of bytes: `wc -c $i`"
    echo "# of chars: `wc -m $i`"
    echo "# of words: `wc -w $i`"
    echo "Longer line length: `wc -L $i`"
    echo "***************************************************"
    echo "***************************************************"
    echo
    cat $i
    echo
    echo
    echo
done
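A note on the job slicing in the main termsearch script above: it computes `total=$((N/n))`, loops from 1 to `total+1`, and stretches END to N when `i` equals `total`. When N is not a multiple of n, that appears to make the last two passes overlap. For example, N=25 and n=10 gives ranges 1-10, 11-25 and then 21-30, so terms 21-25 are searched twice. Below is a minimal sketch of the same START/END arithmetic with the job count rounded up instead. `job_ranges` is a hypothetical helper name, not something the original scripts define.

```shell
#!/bin/bash
# Sketch only: job_ranges is a hypothetical helper, not part of termsearch.

# Print one "START END" pair per job for N terms split into jobs of n terms.
job_ranges() {
    local N=$1 n=$2
    # n must be a positive number, otherwise the division below fails.
    [ "$n" -gt 0 ] || return 1
    local total=$(( (N + n - 1) / n ))   # number of jobs, rounded up
    local i START END
    for (( i = 1; i <= total; i++ )); do
        START=$(( (i - 1) * n + 1 ))
        END=$(( i * n ))
        # The last job may hold fewer than n terms.
        if [ "$END" -gt "$N" ]; then
            END=$N
        fi
        echo "$START $END"
    done
}
```

Calling `job_ranges 25 10` prints `1 10`, `11 20` and `21 25`, one pair per line. One caveat if this is wired into the script: `wc -l` counts newline characters, so a terms file whose last line has no trailing newline reports one term fewer than it really has.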