Thursday, January 26, 2012

Martin Pechanec's slay - process killer in the console

Martin Pechanec's slay is the easiest way to kill processes from the Linux console without manually going through the usual ps ux | grep foo followed by kill -9 666 boilerplate. The script was inspired by the very similar slay command in the QNX operating system.
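For comparison, the manual routine that slay replaces looks roughly like the sketch below. The sleep 31337 process is only a stand-in victim chosen for this illustration, and the variable name victim_pid is likewise made up; it assumes a procps-style ps whose ux listing has the PID in the second column.

```shell
# Start a throwaway background process to play the victim (illustrative only).
sleep 31337 &
sleep 1   # give the background process a moment to start

# Step 1: find its PID by hand. Writing the pattern as '[s]leep' stops grep
# from matching its own command line, and the trailing $ anchors the match
# to the end of the ps line.
ps ux | grep '[s]leep 31337$'

# Step 2: copy the PID out of column 2 and kill it. awk does the copying here;
# by hand you would read the number off the screen and type it in.
victim_pid=$(ps ux | grep '[s]leep 31337$' | awk '{print $2; exit}')
kill -9 "$victim_pid"
```

With slay, both steps collapse into a single command plus a y/n answer.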

To kill some kate instances with slay, you simply type
% slay kate

and slay will list every process whose ps listing line contains "kate", then prompt you, one at a time, to kill each of them:

> slay: Process list:
UID PID PPID C STIME TTY TIME CMD
ubuntu 5834 5481 2 11:33 pts/90 00:00:00 kate
ubuntu 5839 5481 2 11:33 pts/90 00:00:00 kate
ubuntu 5843 32672 0 11:33 ? 00:00:00 kio_file [kdeinit] file /tmp/ksocket-ubuntu/klauncherN2EMKb.slave-socket /tmp/ksocket-ubuntu/kateliTBzb.slave-socket
ubuntu 5844 32672 0 11:33 ? 00:00:00 kio_file [kdeinit] file /tmp/ksocket-ubuntu/klauncherN2EMKb.slave-socket /tmp/ksocket-ubuntu/kateb8tyxa.slave-socket

> slay: Terminate:
ubuntu 5834 5481 11:33 pts/90 00:00:00 kate ? (y/n)[n]


Alternatively, if you know you want to smash every process with "kate" anywhere in its ps listing line, you can supply the -f switch, which kills them all without prompting:

% slay kate -f
> slay: Process list:
UID PID PPID C STIME TTY TIME CMD
ubuntu 5834 5481 0 11:33 pts/90 00:00:00 kate
ubuntu 5839 5481 0 11:33 pts/90 00:00:00 kate
ubuntu 5843 32672 0 11:33 ? 00:00:00 kio_file [kdeinit] file /tmp/ksocket-ubuntu/klauncherN2EMKb.slave-socket /tmp/ksocket-ubuntu/kateliTBzb.slave-socket



To use this awesome script, put the following in a file at $HOME/bin/slay and then put $HOME/bin: at the front of your $PATH variable:

#!/bin/ksh -
#-----------------------------------------------------------------------------
#
# FILE: slay
# AUTHOR: Martin Pechanec, 03-SEPT-2002
#
# $Header:$
# $Log:$
#
#-----------------------------------------------------------------------------
#
# DESCRIPTION
# slay - interactively kill by process name by sending
# SIGKILL (kill -9) or (kill -s KILL) to the process(es)
# selected
#
# USAGE:
# slay [-a|-c] [-s] [-f|-i] [-h] []
#
# If no name or substring is specified then just the process
# information is printed as in ps command.
# It works on the current user processes run on all terminals.
# The information printed out about the processes is in long
# form by default (ps -f). By default the process termination
# is interactive.
# The options:
# -a .. all processes (default current user's only)
# -c .. current terminal for current user (default all terminals
# for current user).
# -s .. short items of the process list (default long)
# -g (-sg) .. same as -s, but long args (default long)
# -f .. force process termination (non-interactive)
# -i .. info only, do not terminate anything
# -h .. help
#
# The is a space separated list
# of strings and they are matched against the 'ps' listing
# line by line (see -s option). Not only the process name
# is matched, but the whole process information. So, for example,
# to terminate all the processes run on a tty pts/2 by a current
# user, run:
# slay pts/2
#
# Other examples:
# slay -s 1543 .. terminate the process with PID 1543
# slay 1543 .. as above, but also terminate all its children
# slay /usr/ .. terminate all the current user processes
# which include /usr/ in the process location path
#
# January 20, 2004: Update to recognize Linux. Changed kill -s KILL
# to kill -9, since Linux did not want to end the program by the
# former command. Replaced the eval echo by better filtering command.
#
#-----------------------------------------------------------------------------
#

# Uncomment the set -v line for debugging
# set -v

function vPrintUsage
{
echo "Usage: slay [-a|-c] [-s[a]] [-f|-i] [-h] []";
echo " -a[ll] .. all processes (default current user's only)";
echo " -c[urrent] .. current terminal for current user (default all terms)";
echo " -s[hort] .. short items of the process list (default long)";
echo " -s[ar]g[s] (-g) .. short items but full args (default long)";
echo " -f[orce] .. force (non-interactive)";
echo " -i[nfo] .. info only, do not terminate anything";
echo " -h[elp] .. help";
echo " Running plain slay will return all processes running for a current user.";
}

# -- Check the echo command for flags to force it not to print
# newline at the end of the printed line.
if [[ "`echo 'ble_ble\c'`" = "ble_ble\c" ]] ;
then
EFLAG="-n"
ENDER=""
else
EFLAG=""
ENDER="\c"
fi
ECHO="echo ${EFLAG}"
UNAME=`uname`

# -- Set Unix type flags
integer LINUX_FLAG
let LINUX_FLAG=0

# -- Set Unix type
if [[ ${UNAME} = "HP-UX" ]] ;
then
if [[ ${UNIX95:-"__Unix95NotSet__"} = "__Unix95NotSet__" ]] ;
then
echo "slay: WARNING: UNIX95 variable undefined .. exporting UNIX95=true"
export UNIX95="true"
fi

if [[ -z ${UNIX95} ]] ;
then
echo "slay: WARNING: UNIX95 variable empty .. exporting UNIX95=true"
export UNIX95="true"
fi
else
if [[ ${UNAME} = "Linux" ]] ;
then
let LINUX_FLAG=1
fi
fi

# -- Check the existence of the dash parameter
# and set inputs to local variables. The dash option
# can be positioned anywhere in the line.
# PS_PARAM initial value is all processes for current user only
# on all terminals (-u ${USER})

SLAY_PID=$$
PROCESS_NAME_LIST=""
integer ALL_FLAG
integer CURRENT_FLAG
integer FORCE_FLAG
integer INFO_FLAG

PS_EXEC="ps"
PS_PARAM="-u ${USER}"
PS_SHORT_PARAM="-f"
PS_O_USER_PARAM=""
AWK_FIELD='$2'
AWK_FILTER='$2, $3'

ARGS_PS_SHORT_PARAM="-o user,pid,args"

let ALL_FLAG=0
let CURRENT_FLAG=0
let FORCE_FLAG=0
let INFO_FLAG=0
let SHORT_FLAG=0

# Parse input arguments
while [[ $# -gt 0 ]] ;
do
ARGUMENT=$1
case ${ARGUMENT} in
-a|-all|--all)
if [[ 0 -ne ${CURRENT_FLAG} ]] ;
then
vPrintUsage
return 0
fi
PS_PARAM="-e"
PS_O_USER_PARAM="-o user="
let ALL_FLAG=1
shift
continue
;;

-c|-current|--current)
if [[ 0 -ne ${ALL_FLAG} ]] ;
then
vPrintUsage
return 0
fi
PS_PARAM=""
let CURRENT_FLAG=1
shift
continue
;;

-s|-short|--short)
PS_SHORT_PARAM="-o user,pid,ppid,stime,tty,time,fname"
AWK_FIELD='$2'
AWK_FILTER='$2, $3'
let SHORT_FLAG=1
shift
continue
;;

-g|-sg|-sargs|-shortargs|--shortargs)
PS_SHORT_PARAM="-o user,pid,args"
AWK_FIELD='$2'
AWK_FILTER='$2, $3'
let SHORT_FLAG=1
shift
continue
;;


-f|-force|--force)
if [[ 0 -ne ${INFO_FLAG} ]] ;
then
vPrintUsage
return 0
fi
let FORCE_FLAG=1
shift
continue
;;

-i|-info|--info)
if [[ 0 -ne ${FORCE_FLAG} ]] ;
then
vPrintUsage
return 0
fi
let INFO_FLAG=1
shift
continue
;;

-h|-help|--help)
vPrintUsage
return 0
;;

-*)
vPrintUsage
return 1
;;

*)
;;
esac

# No dash parameter .. store the
PROCESS_NAME_LIST="${PROCESS_NAME_LIST} ${ARGUMENT}"
shift

done

# -- Parameters and input process substring list set
# If there is no name list, then just list the names
# of the processes according to the -a flag.
# If the -a is not specified, then list processes
# for current user only. If -s is not specified, then
# list long listings.

# ps options:
# -e .. every running process
# -f .. full listing
# -u ${USER} .. current user on all terminals
#
# All users setting implies all terminals.
# ps -e .. short item, all users --> all terminals
# ps -e -f .. long item, all users --> all terminals
# ps .. short item, current user, current terminal
# ps -f .. long item, current user, current terminal
# ps -f -u ${USER} .. long item, current user, all terminals
# ps -u ${USER} .. short item, current user, all terminals
#
# The ps command line will look like this:
# ps ${PS_PARAM} ${PS_SHORT_PARAM}

# Check whether there are processes to be slayed
if [[ -z ${PROCESS_NAME_LIST} ]] ;
then

# No process names to process .. just print the processes info.
# Remove the new lines from the list .. second line!
PROCESS_ID_LIST=`${PS_EXEC} ${PS_PARAM} ${PS_SHORT_PARAM} | awk "{print ${AWK_FILTER}}" | grep -v ${SLAY_PID} | awk '{print $1}' | grep -v PID`
PROCESS_ID_LIST=`echo ${PROCESS_ID_LIST} | sed 's/^[ ][ ]*//'`

# This has to be there since HP does not allow more than 32 items in the list
if [[ 0 -ne ${ALL_FLAG} ]] ;
then
${PS_EXEC} ${PS_PARAM} ${PS_SHORT_PARAM}
else
${PS_EXEC} ${PS_SHORT_PARAM} -p """$PROCESS_ID_LIST"""
fi
return 0
fi

# -- Some processes were listed .. create a list of all
# the process id's for processes to be slayed
PROCESS_ID_LIST=""
for PROCESS_NAME_ITEM in ${PROCESS_NAME_LIST} ;
do
ITEM_PROCESS_ID_LIST=`${PS_EXEC} ${PS_PARAM} ${PS_SHORT_PARAM} | grep ${PROCESS_NAME_ITEM} | awk "{print ${AWK_FILTER}}" | grep -v ${SLAY_PID} | awk '{print $1}' | grep -v PID`
ITEM_PROCESS_ID_LIST=`echo ${ITEM_PROCESS_ID_LIST} | sed 's/^[ ][ ]*//'`
PROCESS_ID_LIST="${PROCESS_ID_LIST} ${ITEM_PROCESS_ID_LIST}"
done
PROCESS_ID_LIST=`echo ${PROCESS_ID_LIST} | sed 's/^[ ][ ]*//'`

# -- Check the process id list
if [[ -z ${PROCESS_ID_LIST} ]] ;
then

# If the long process list was used (default) and the process
# was not found, try the short with long arguments
if [[ 0 -eq ${SHORT_FLAG} ]] ;
then
PROCESS_ID_LIST=""
for PROCESS_NAME_ITEM in ${PROCESS_NAME_LIST} ;
do
ITEM_PROCESS_ID_LIST=`${PS_EXEC} ${PS_PARAM} ${ARGS_PS_SHORT_PARAM} | grep ${PROCESS_NAME_ITEM} | awk "{print ${AWK_FILTER}}" | grep -v ${SLAY_PID} | awk '{print $1}' | grep -v PID`
ITEM_PROCESS_ID_LIST=`echo ${ITEM_PROCESS_ID_LIST} | sed 's/^[ ][ ]*//'`
PROCESS_ID_LIST="${PROCESS_ID_LIST} ${ITEM_PROCESS_ID_LIST}"
done
PROCESS_ID_LIST=`echo ${PROCESS_ID_LIST} | sed 's/^[ ][ ]*//'`
fi

# Check the new process ID list again
if [[ -z ${PROCESS_ID_LIST} ]] ;
then
echo "slay: No such process(es)."
return 0
fi
fi

# -- Filter the process id list. Linux must not have the triple quotes
if [[ 0 -ne ${LINUX_FLAG} ]] ;
then
PROCESS_ID_LIST=`${PS_EXEC} ${PS_SHORT_PARAM} -p ${PROCESS_ID_LIST} | awk "{print ${AWK_FILTER}}" | awk '{print $1}' | grep -v PID`
else
PROCESS_ID_LIST=`${PS_EXEC} ${PS_SHORT_PARAM} -p """${PROCESS_ID_LIST}""" | awk "{print ${AWK_FILTER}}" | awk '{print $1}' | grep -v PID`
fi
PROCESS_ID_LIST=`echo ${PROCESS_ID_LIST} | sed 's/^[ ][ ]*//'`

# -- Check the filtered process id list
if [[ -z ${PROCESS_ID_LIST} ]] ;
then
echo "slay: No such filtered process(es)."
return 0
fi

# -- Some processes do exist .. print the processes info any case
# Print the header only if there will be some action
if [[ 0 -eq INFO_FLAG ]] ;
then
echo "> slay: Process list:"
fi
${PS_EXEC} ${PS_SHORT_PARAM} -p """${PROCESS_ID_LIST}"""

# Check whether the information only .. if so, then stop here
if [[ 0 -ne INFO_FLAG ]] ;
then
return 0
fi

# Echo process termination loop
if [[ 0 -eq FORCE_FLAG ]] ;
then
echo ""
echo "> slay: Terminate:"
fi

for PROCESS_ID_ITEM in ${PROCESS_ID_LIST} ;
do
if [[ 0 -ne FORCE_FLAG ]] ;
then
kill -9 ${PROCESS_ID_ITEM}
else
PROCESS_ID_STRING=`${PS_EXEC} ${PS_O_USER_PARAM} -o user= -o pid= -o ppid= -o stime= -o tty= -o time= -o args= -p ${PROCESS_ID_ITEM}`

if [[ ! -z ${PROCESS_ID_STRING} ]] ;
then
${ECHO} "${PROCESS_ID_STRING} ? (y/n)[n] ${ENDER}"
read CHECK
if [[ "${CHECK}" = "y" || "${CHECK}" = "Y" ]] ;
then
kill -9 ${PROCESS_ID_ITEM}
fi
fi
fi
done

return 0

#
#-----------------------------------------------------------------------------
#
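Once the script text above has been saved, the installation steps might look like the following sketch. It assumes the file was saved as $HOME/bin/slay, as described earlier, and it only changes PATH for the current shell. Note that the script's first line points at /bin/ksh, so a Korn shell has to be installed at that path for it to run.

```shell
# Create the personal bin directory if it does not exist yet.
mkdir -p "$HOME/bin"

# Make the script executable (skipped quietly if the file is not there yet).
if [ -f "$HOME/bin/slay" ]; then
    chmod +x "$HOME/bin/slay"
fi

# Put $HOME/bin at the front of PATH, unless it is already listed somewhere.
case ":$PATH:" in
    *":$HOME/bin:"*) ;;
    *) PATH="$HOME/bin:$PATH"; export PATH ;;
esac
```

To make the PATH change permanent, the same export line can go into your shell's startup file.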

Tuesday, January 3, 2012

Detect duplicate files in Linux or cygwin

While backing up pictures and movies, I wanted to detect duplicate files and delete them. I found this post, but it doesn't run efficiently on large file sets, since it md5sum's every file regardless of whether any other file has the same size. A stat command is much quicker than computing an md5sum, since stat only reads the file's metadata, whereas md5sum has to read the entire file off the disk. Thus, I only want to md5sum files whose length matches that of some other file, since only files of equal length can possibly be duplicates.

We come to the following "one-liner", which only md5sum's the files that have identical length to some other file in the system:
find . -type f -exec stat --printf='%32s ' {} \; -print |\
    sort -rn |\
    uniq -d -w32 --all-repeated=separate |\
    awk '(NF > 0){
        system( \
            "md5sum \"`echo \"" $0 "\"|\
            sed -r \"s/^ *[0-9]+ *//\" `\" |\
            cut -c 1-32 | tr -d [:space:] " );
        printf " %32s %s ", $1, $2 ;\
        for (i = 3; i <= NF; i++) printf "%s ", $i; \
        printf "\n";\
    }' |\
    sort -r |\
    uniq -d -w65 --all-repeated=separate  |\
    awk '{for (i = 3; i <= NF; i++) printf "%s ", $i;print "";}'
 # find/stat: print the size of every file, to find duplicated file sizes
 # sort -rn: put duplicate file sizes together (for uniq)
 # uniq -w32: print all files whose size equals that of some other file
 # awk: for the remaining files, insert the md5sum in front of the file size,
 #      doing the md5sum while allowing for spaces in the file names
 #      (more than 1 consecutive space in a file name is not allowed),
 #      then print the file size with a fixed width for future comparisons, and the file name
 # sort -r: re-sort to catch duplicates of _different_ files with _identical_ sizes
 # uniq -w65: compare the file size and md5sum for every file and print multiples
 # awk: get rid of the file size and md5sum and simply print the duplicate file names by themselves
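The same size-first, hash-second idea can also be sketched as a small shell function that copes with any number of spaces in file names. This is an illustrative variant, not the command above: the function name find_dupes is made up here, it assumes GNU stat, md5sum and uniq, and it assumes no file name contains a tab, newline or backslash.

```shell
find_dupes() {
    # List "size<TAB>path" for every regular file under the given directory.
    find "${1:-.}" -type f -exec stat --printf='%s\t%n\n' {} + |
    # Keep only the paths whose size is shared with at least one other file.
    awk -F '\t' '
        { size[NR] = $1; path[NR] = substr($0, length($1) + 2); count[$1]++ }
        END { for (i = 1; i <= NR; i++) if (count[size[i]] > 1) print path[i] }
    ' |
    # Hash only those candidates; read -r keeps spaces in names intact.
    while IFS= read -r file; do
        md5sum "$file"
    done |
    # Sort by hash, keep groups of identical hashes, then strip the hash column.
    LC_ALL=C sort |
    uniq -w32 --all-repeated=separate |
    cut -c35-
}
```

Calling find_dupes with a directory prints each group of identical files, with a blank line between groups. Files that merely share a size but differ in content are hashed and then dropped by uniq.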


If you're analyzing a ton of files, you may want to see progress happening. By its nature, sort can't produce any output until it has received all of its input lines. We can intercept and log the intermediate output with tee, so you can watch progress:

find . -type f -exec stat --printf='%32s ' {} \; -print |\
    tee find_stat.log |\
    sort -rn |\
    uniq -d -w32 --all-repeated=separate |\
    awk '(NF > 0){
        system( \
            "md5sum \"`echo \"" $0 "\"|\
            sed -r \"s/^ *[0-9]+ *//\" | tee -a f.log`\" |\
            cut -c 1-32 | tr -d [:space:] " );
        printf " %32s %s ", $1, $2 ;\
        for (i = 3; i <= NF; i++) printf "%s ", $i; \
        printf "\n";\
    }' |\
    tee md5sum_generation.log |\
    sort -r |\
    uniq -d -w65 --all-repeated=separate  |\
    awk '{for (i = 3; i <= NF; i++) printf "%s ", $i;print "";}' | \
    tee repeated_files.log


Then, in a separate window from the above command, you can run tail:
tail -f find_stat.log
or
tail -f md5sum_generation.log