Tuesday, October 16, 2012

Create Playlist for Bible MP3's

I bought a copy of the English Standard Version Bible in MP3 form. Unfortunately, once you load it onto your MP3 player or smartphone, you have a hard time skipping around. I like to read a chapter from the Old Testament, a chapter from the New Testament, a chapter from the "wisdom literature", and a chapter from Psalms. Doing this, skipping around, and remembering which chapter you're on is painful.

That is... until you create your own MP3 playlists! To do the following, you need Cygwin, the Mac OS terminal, or a PC with Linux installed.

  1. Start with file names like:
    [...]
    Bible - ESV 0934.mp3
    Bible - ESV 0935.mp3
    Bible - ESV 0936.mp3
    Bible - ESV 0937.mp3
    Bible - ESV 0938.mp3
    Bible - ESV 0939.mp3
    Bible - ESV 0940.mp3
    Bible - ESV 0941.mp3
    [...]
    
    You determine which ranges of files you want for your custom playlist. For me, that was
    • Old Testament 1-441,688-941 (without "wisdom literature" or Psalms)
    • New Testament 942-1205
    • Wisdom Literature 442-483,637-687
    • Psalms 484-636
  2. Now I create the M3U header for a given range. Let's say we're doing the Old Testament range:
    echo "#EXTM3U" > old_testament.m3u
  3. Construct a sed command for each range like:
    /BEGINNING_NUMBER/,/ENDING_NUMBER/p;
    
    Then, concatenate all these sed commands together.

    e.g., for the Old Testament range above, you would create these sed commands:

    /0001/,/0441/p; /0688/,/0941/p;
  4. Now, stick that filter into the following command after navigating to the ESV Study Bible MP3 directory:
    find . -type f | \
        sort -u | \
        sed -n -e 's/^\.\///; YOUR_SED_COMMANDS_HERE'\
        >> old_testament.m3u
    
    e.g. For the Old Testament example:
    find . -type f | \
        sort -u | \
        sed -n -e 's/^\.\///; /0001/,/0441/p; /0688/,/0941/p;'\
        >> old_testament.m3u
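
The same range selection can also be sketched in Python, for anyone who would rather not build the sed filter by hand. This is only a rough equivalent, not a replacement for the steps above: the function name playlist_lines is made up for illustration, and it assumes every track name ends in a four-digit number followed by .mp3, as in the listing in step 1.

```python
import re


def playlist_lines(filenames, ranges):
    """Return M3U lines for the files whose track number is in any (first, last) range.

    Assumption: track names end in a four-digit number plus ".mp3",
    e.g. "Bible - ESV 0934.mp3". Anything else is skipped.
    """
    lines = ["#EXTM3U"]
    for name in sorted(set(filenames)):
        match = re.search(r"(\d{4})\.mp3$", name)
        if match is None:
            continue  # not one of the numbered tracks
        number = int(match.group(1))
        if any(first <= number <= last for first, last in ranges):
            lines.append(name)
    return lines
```

To write the Old Testament playlist, pass os.listdir(".") as filenames and [(1, 441), (688, 941)] as ranges, then write the returned lines, one per line, to old_testament.m3u.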
    
Stick that .m3u onto your MP3 player or smartphone and you're off to the races.

For Android install MortPlayer and it will help you keep track of where you are in each of the playlists.

Creating one List

If you want to combine all the lists into one mega list that automatically switches between your subranges, use a Python script like the following:
class RangeItr:
  """Endlessly cycle through the numbers in a tuple of inclusive (first, last) ranges."""
  def __init__(self,ranges):
    self.ranges_ = ranges
    self.outer_range_idx_ = 0
    self.inner_range_val_ = ranges[0][0]-1
  def __iter__(self):
    return self
  def __next__(self):
    cur_range = self.ranges_[self.outer_range_idx_]
    self.inner_range_val_+=1
    if self.inner_range_val_ > cur_range[1]:
      # Past the end of this range: move on to the next one, wrapping around.
      self.outer_range_idx_= (self.outer_range_idx_+1)%len(self.ranges_)
      cur_range = self.ranges_[self.outer_range_idx_]
      self.inner_range_val_ = cur_range[0]
    return self.inner_range_val_
  next = __next__  # Python 2 name for the same method


ranges =( RangeItr( ( (1,441),(688,941) ) ),      # Old Testament
          RangeItr( ( (942,1205) ,) ),            # New Testament
          RangeItr( ( (442,483),(637,687) ) ),    # Wisdom Literature
          RangeItr( ( (484,636), ) )              # Psalms
        )
r = 0
f_num = 0
f = None
for i in range(0,10000):
    # Start a new playlist file every 200 entries.
    if i %200 == 0:
        if f is not None:
            f.close()
        f_num += 1
        f = open('bible_%02d.m3u'%f_num,'w')
        f.write('#EXTM3U\n')
    f.write("Bible - ESV %04d.mp3\n"%next(ranges[r]))
    r = (r + 1 ) % len(ranges)
f.close()
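
To see what order the mega list comes out in, here is a small sketch that produces the same rotation with itertools instead of a hand-written iterator. The names cycling_numbers, interleaved and GROUPS are invented for this example; the ranges are the four groups listed earlier.

```python
from itertools import chain, cycle, islice


def cycling_numbers(ranges):
    """Yield every number in the inclusive (first, last) ranges, over and over."""
    return cycle(chain.from_iterable(range(first, last + 1) for first, last in ranges))


def interleaved(groups):
    """Take one number from each group in turn, forever."""
    streams = [cycling_numbers(ranges) for ranges in groups]
    return chain.from_iterable(zip(*streams))


GROUPS = [
    [(1, 441), (688, 941)],    # Old Testament
    [(942, 1205)],             # New Testament
    [(442, 483), (637, 687)],  # Wisdom Literature
    [(484, 636)],              # Psalms
]

# The first eight track numbers: one from each group, then round again.
first_eight = list(islice(interleaved(GROUPS), 8))
# first_eight == [1, 942, 442, 484, 2, 943, 443, 485]
```

Each group wraps around independently when it runs out, so the shorter groups simply start over while the longer ones keep going.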

Friday, September 14, 2012

How to find the time of the last access of any file under a directory in Linux.

Here is a one-liner to find the time of the last access of any file under a directory in Linux:
date -d @`find PATH_TO_DIR -type f -exec stat -c %X '{}' \; | \
  perl -ne 'use List::Util qw[min max]; BEGIN {$max=0;} chomp; $max = max($max,$_); END { print $max ."\n";}'`
Here's a bash script to do this on each file or directory underneath a specified directory:
#!/bin/bash
# use like:
#   least_used_nodes.bash [PATH_TO_DIR]
#
# If PATH_TO_DIR is omitted, the current directory is used.
#
# leaf nodes with spaces in them are not supported

dir="."
if [ "$#" -ne 0 ] ; then
    dir=$1
fi

declare -a access_times
for n in `\ls -1 $dir | egrep -v '^\.{1,2}\$'`  ; do
    echo "Analyzing $n"
    d=`find $dir/$n -type f -exec stat -c %X '{}' \; |\
      perl -ne 'use List::Util qw[min max]; BEGIN {$max=0;} chomp; $max = max($max,$_); END { print $max ."\n";}'`
    #echo $d
    access_times[$d]="${access_times[$d]} $n"
done
echo ""
echo ""
#echo ${access_times[@]}

echo "Printing access times youngest to oldest"
for d in `echo ${!access_times[@]} | sed -r 's/\\s+/\\n/g;' | sort -r` ; do
    #echo $d
    list=${access_times[$d]}
    for n in $list ; do
        echo `date -d @$d +"%Y.%m.%d %kh%Mm"` $n
    done
done
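
For comparison, here is a rough Python sketch of the same idea. It is not a port of the script above: the function names latest_access and access_report are made up for this example, it skips symbolic links (an assumption, chosen to mirror find -type f), and it returns the raw timestamps rather than printing formatted dates.

```python
import os


def latest_access(path):
    """Return the newest access time (seconds since the epoch) of any regular
    file under path, or 0 if there are none. Symbolic links are skipped."""
    if os.path.islink(path):
        return 0
    if os.path.isfile(path):
        return os.stat(path).st_atime
    newest = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            if os.path.islink(full) or not os.path.isfile(full):
                continue
            newest = max(newest, os.stat(full).st_atime)
    return newest


def access_report(directory="."):
    """Return (access_time, entry) pairs for every entry directly under
    directory, most recently accessed first."""
    report = [(latest_access(os.path.join(directory, entry)), entry)
              for entry in os.listdir(directory)]
    return sorted(report, reverse=True)
```

Unlike the bash version, entry names containing spaces are fine here, since nothing is split on whitespace.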

Friday, February 17, 2012

Perl one-liner for only printing lines of standard in that are unique

Print each line of standard in only the first time it is seen:

perl -ne 'if (not exists $d{$_}) { $d{$_} = 1; print $_}'


Used like:

printf "hi\nhi\nbye\n" | \
perl -ne 'if (not exists $d{$_}) { $d{$_} = 1; print $_}'
hi
bye
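
The same "first occurrence only" filter is easy to sketch in Python. The function names first_occurrences and main are made up for this example; the approach is simply a set of lines already seen.

```python
import sys


def first_occurrences(lines):
    """Yield each line the first time it appears, keeping the original order."""
    seen = set()
    for line in lines:
        if line not in seen:
            seen.add(line)
            yield line


def main():
    """Filter standard in to standard out, dropping repeated lines."""
    sys.stdout.writelines(first_occurrences(sys.stdin))
```

Calling main() at the bottom of the file turns it into a filter that can sit in a pipeline the same way as the Perl one-liner.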

Thursday, January 26, 2012

Martin Pechanec's slay - process killer in the console

Martin Pechanec's slay is the easiest way to kill processes from the Linux console without having to manually do all that ps ux| grep foo followed by kill -9 666 boilerplate. This script was inspired by the very similar slay command in QNX O.S.

To kill some kate instances with slay, you simply type
% slay kate

and slay will prompt you, one at a time, to kill the processes that have "kate" in their command name:

> slay: Process list:
UID PID PPID C STIME TTY TIME CMD
ubuntu 5834 5481 2 11:33 pts/90 00:00:00 kate
ubuntu 5839 5481 2 11:33 pts/90 00:00:00 kate
ubuntu 5843 32672 0 11:33 ? 00:00:00 kio_file [kdeinit] file /tmp/ksocket-ubuntu/klauncherN2EMKb.slave-socket /tmp/ksocket-ubuntu/kateliTBzb.slave-socket
ubuntu 5844 32672 0 11:33 ? 00:00:00 kio_file [kdeinit] file /tmp/ksocket-ubuntu/klauncherN2EMKb.slave-socket /tmp/ksocket-ubuntu/kateb8tyxa.slave-socket

> slay: Terminate:
ubuntu 5834 5481 11:33 pts/90 00:00:00 kate ? (y/n)[n]


Alternatively, if you know you want to smash every process with "kate" in its process name, you can supply the -f switch:

slay kate -f
> slay: Process list:
UID PID PPID C STIME TTY TIME CMD
ubuntu 5834 5481 0 11:33 pts/90 00:00:00 kate
ubuntu 5839 5481 0 11:33 pts/90 00:00:00 kate
ubuntu 5843 32672 0 11:33 ? 00:00:00 kio_file [kdeinit] file /tmp/ksocket-ubuntu/klauncherN2EMKb.slave-socket /tmp/ksocket-ubuntu/kateliTBzb.slave-socket



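To use this awesome script, put the following in a file at $HOME/bin/slay, make it executable (chmod +x $HOME/bin/slay), and then put $HOME/bin: at the front of your $PATH variable. The script runs under ksh, so ksh needs to be installed: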

#!/bin/ksh -
#-----------------------------------------------------------------------------
#
# FILE: slay
# AUTHOR: Martin Pechanec, 03-SEPT-2002
#
# $Header:$
# $Log:$
#
#-----------------------------------------------------------------------------
#
# DESCRIPTION
# slay - interactively kill by process name by sending
# SIGKILL (kill -9) or (kill -s KILL) to the process(es)
# selected
#
# USAGE:
# slay [-a|-c] [-s] [-f|-i] [-h] [name list]
#
# If no name or substring is specified then just the process
# information is printed as in ps command.
# It works on the current user processes run on all terminals.
# The information printed out about the processes is in long
# form by default (ps -f). By default the process termination
# is interactive.
# The options:
# -a .. all processes (default current user's only)
# -c .. current terminal for current user (default all terminals
# for current user).
# -s .. short items of the process list (default long)
# -g (-sg) .. same as -s, but long args (default long)
# -f .. force process termination (non-interactive)
# -i .. info only, do not terminate anything
# -h .. help
#
# The name list is a space separated list
# of strings and they are matched against the 'ps' listing
# line by line (see -s option). Not only the process name
# is matched, but the whole process information. So, for example,
# to terminate all the processes run on a tty pts/2 by a current
# user, run:
# slay pts/2
#
# Other examples:
# slay -s 1543 .. terminate the process with PID 1543
# slay 1543 .. as above, but also terminate all its children
# slay /usr/ .. terminate all the current user processes
# which include /usr/ in the process location path
#
# January 20, 2004: Update to recognize Linux. Changed kill -s KILL
# to kill -9, since Linux did not want to end the program by the
# former command. Replaced the eval echo by better filtering command.
#
#-----------------------------------------------------------------------------
#

# Uncomment the set -v line for debugging
# set -v

function vPrintUsage
{
echo "Usage: slay [-a|-c] [-s[a]] [-f|-i] [-h] [name list]";
echo " -a[ll] .. all processes (default current user's only)";
echo " -c[urrent] .. current terminal for current user (default all terms)";
echo " -s[hort] .. short items of the process list (default long)";
echo " -s[ar]g[s] (-g) .. short items but full args (default long)";
echo " -f[orce] .. force (non-interactive)";
echo " -i[nfo] .. info only, do not terminate anything";
echo " -h[elp] .. help";
echo " Running plain slay will return all processes running for a current user.";
}

# -- Check the echo command for flags to force it not to print
# newline at the end of the printed line.
if [[ "`echo 'ble_ble\c'`" = "ble_ble\c" ]] ;
then
EFLAG="-n"
ENDER=""
else
EFLAG=""
ENDER="\c"
fi
ECHO="echo ${EFLAG}"
UNAME=`uname`

# -- Set Unix type flags
integer LINUX_FLAG
let LINUX_FLAG=0

# -- Set Unix type
if [[ ${UNAME} = "HP-UX" ]] ;
then
if [[ ${UNIX95:-"__Unix95NotSet__"} = "__Unix95NotSet__" ]] ;
then
echo "slay: WARNING: UNIX95 variable undefined .. exporting UNIX95=true"
export UNIX95="true"
fi

if [[ -z ${UNIX95} ]] ;
then
echo "slay: WARNING: UNIX95 variable empty .. exporting UNIX95=true"
export UNIX95="true"
fi
else
if [[ ${UNAME} = "Linux" ]] ;
then
let LINUX_FLAG=1
fi
fi

# -- Check the existence of the dash parameter
# and set inputs to local variables. The dash option
# can be positioned anywhere in the line.
# PS_PARAM initial value is all processes for current user only
# on all terminals (-u ${USER})

SLAY_PID=$$
PROCESS_NAME_LIST=""
integer ALL_FLAG
integer CURRENT_FLAG
integer FORCE_FLAG
integer INFO_FLAG

PS_EXEC="ps"
PS_PARAM="-u ${USER}"
PS_SHORT_PARAM="-f"
PS_O_USER_PARAM=""
AWK_FIELD='$2'
AWK_FILTER='$2, $3'

ARGS_PS_SHORT_PARAM="-o user,pid,args"

let ALL_FLAG=0
let CURRENT_FLAG=0
let FORCE_FLAG=0
let INFO_FLAG=0
let SHORT_FLAG=0

# Parse input arguments
while [[ $# -gt 0 ]] ;
do
ARGUMENT=$1
case ${ARGUMENT} in
-a|-all|--all)
if [[ 0 -ne ${CURRENT_FLAG} ]] ;
then
vPrintUsage
return 0
fi
PS_PARAM="-e"
PS_O_USER_PARAM="-o user="
let ALL_FLAG=1
shift
continue
;;

-c|-current|--current)
if [[ 0 -ne ${ALL_FLAG} ]] ;
then
vPrintUsage
return 0
fi
PS_PARAM=""
let CURRENT_FLAG=1
shift
continue
;;

-s|-short|--short)
PS_SHORT_PARAM="-o user,pid,ppid,stime,tty,time,fname"
AWK_FIELD='$2'
AWK_FILTER='$2, $3'
let SHORT_FLAG=1
shift
continue
;;

-g|-sg|-sargs|-shortargs|--shortargs)
PS_SHORT_PARAM="-o user,pid,args"
AWK_FIELD='$2'
AWK_FILTER='$2, $3'
let SHORT_FLAG=1
shift
continue
;;


-f|-force|--force)
if [[ 0 -ne ${INFO_FLAG} ]] ;
then
vPrintUsage
return 0
fi
let FORCE_FLAG=1
shift
continue
;;

-i|-info|--info)
if [[ 0 -ne ${FORCE_FLAG} ]] ;
then
vPrintUsage
return 0
fi
let INFO_FLAG=1
shift
continue
;;

-h|-help|--help)
vPrintUsage
return 0
;;

-*)
vPrintUsage
return 1
;;

*)
;;
esac

# No dash parameter .. store the name
PROCESS_NAME_LIST="${PROCESS_NAME_LIST} ${ARGUMENT}"
shift

done

# -- Parameters and input process substring list set
# If there is not name list, then just list the names
# of the processes according to the -a flag.
# If the -a is not specified, then list processes
# for current user only. If -s is not specified, then
# list long listings.

# ps options:
# -e .. every running process
# -f .. full listing
# -u ${USER} .. current user on all terminals
#
# All users setting implies all terminals.
# ps -e .. short item, all users --> all terminals
# ps -e -f .. long item, all users --> all terminals
# ps .. short item, current user, current terminal
# ps -f .. long item, current user, current terminal
# ps -f -u ${USER} .. long item, current user, all terminals
# ps -u ${USER} .. short item, current user, all terminals
#
# The ps command line will look like this:
# ps ${PS_PARAM} ${PS_SHORT_PARAM}

# Check whether there are processes to be slayed
if [[ -z ${PROCESS_NAME_LIST} ]] ;
then

# No process names to process .. just print the processes info.
# Remove the new lines from the list .. second line!
PROCESS_ID_LIST=`${PS_EXEC} ${PS_PARAM} ${PS_SHORT_PARAM} | awk "{print ${AWK_FILTER}}" | grep -v ${SLAY_PID} | awk '{print $1}' | grep -v PID`
PROCESS_ID_LIST=`echo ${PROCESS_ID_LIST} | sed 's/^[ ][ ]*//'`

# This has to be there since HP does not allow more than 32 items in the list
if [[ 0 -ne ${ALL_FLAG} ]] ;
then
${PS_EXEC} ${PS_PARAM} ${PS_SHORT_PARAM}
else
${PS_EXEC} ${PS_SHORT_PARAM} -p """$PROCESS_ID_LIST"""
fi
return 0
fi

# -- Some processes were listed .. create a list of all
# the process id's for processes to be slayed
PROCESS_ID_LIST=""
for PROCESS_NAME_ITEM in ${PROCESS_NAME_LIST} ;
do
ITEM_PROCESS_ID_LIST=`${PS_EXEC} ${PS_PARAM} ${PS_SHORT_PARAM} | grep ${PROCESS_NAME_ITEM} | awk "{print ${AWK_FILTER}}" | grep -v ${SLAY_PID} | awk '{print $1}' | grep -v PID`
ITEM_PROCESS_ID_LIST=`echo ${ITEM_PROCESS_ID_LIST} | sed 's/^[ ][ ]*//'`
PROCESS_ID_LIST="${PROCESS_ID_LIST} ${ITEM_PROCESS_ID_LIST}"
done
PROCESS_ID_LIST=`echo ${PROCESS_ID_LIST} | sed 's/^[ ][ ]*//'`

# -- Check the process id list
if [[ -z ${PROCESS_ID_LIST} ]] ;
then

# If the long process list was used (default) and the process
# was not found, try the short with long arguments
if [[ 0 -eq ${SHORT_FLAG} ]] ;
then
PROCESS_ID_LIST=""
for PROCESS_NAME_ITEM in ${PROCESS_NAME_LIST} ;
do
ITEM_PROCESS_ID_LIST=`${PS_EXEC} ${PS_PARAM} ${ARGS_PS_SHORT_PARAM} | grep ${PROCESS_NAME_ITEM} | awk "{print ${AWK_FILTER}}" | grep -v ${SLAY_PID} | awk '{print $1}' | grep -v PID`
ITEM_PROCESS_ID_LIST=`echo ${ITEM_PROCESS_ID_LIST} | sed 's/^[ ][ ]*//'`
PROCESS_ID_LIST="${PROCESS_ID_LIST} ${ITEM_PROCESS_ID_LIST}"
done
PROCESS_ID_LIST=`echo ${PROCESS_ID_LIST} | sed 's/^[ ][ ]*//'`
fi

# Check the new process ID list again
if [[ -z ${PROCESS_ID_LIST} ]] ;
then
echo "slay: No such process(es)."
return 0
fi
fi

# -- Filter the process id list. Linux must not have the triple quotes
if [[ 0 -ne ${LINUX_FLAG} ]] ;
then
PROCESS_ID_LIST=`${PS_EXEC} ${PS_SHORT_PARAM} -p ${PROCESS_ID_LIST} | awk "{print ${AWK_FILTER}}" | awk '{print $1}' | grep -v PID`
else
PROCESS_ID_LIST=`${PS_EXEC} ${PS_SHORT_PARAM} -p """${PROCESS_ID_LIST}""" | awk "{print ${AWK_FILTER}}" | awk '{print $1}' | grep -v PID`
fi
PROCESS_ID_LIST=`echo ${PROCESS_ID_LIST} | sed 's/^[ ][ ]*//'`

# -- Check the filtered process id list
if [[ -z ${PROCESS_ID_LIST} ]] ;
then
echo "slay: No such filtered process(es)."
return 0
fi

# -- Some processes do exist .. print the processes info any case
# Print the header only if there will be some action
if [[ 0 -eq INFO_FLAG ]] ;
then
echo "> slay: Process list:"
fi
${PS_EXEC} ${PS_SHORT_PARAM} -p """${PROCESS_ID_LIST}"""

# Check whether the information only .. if so, then stop here
if [[ 0 -ne INFO_FLAG ]] ;
then
return 0
fi

# Echo process termination loop
if [[ 0 -eq FORCE_FLAG ]] ;
then
echo ""
echo "> slay: Terminate:"
fi

for PROCESS_ID_ITEM in ${PROCESS_ID_LIST} ;
do
if [[ 0 -ne FORCE_FLAG ]] ;
then
kill -9 ${PROCESS_ID_ITEM}
else
PROCESS_ID_STRING=`${PS_EXEC} ${PS_O_USER_PARAM} -o user= -o pid= -o ppid= -o stime= -o tty= -o time= -o args= -p ${PROCESS_ID_ITEM}`

if [[ ! -z ${PROCESS_ID_STRING} ]] ;
then
${ECHO} "${PROCESS_ID_STRING} ? (y/n)[n] ${ENDER}"
read CHECK
if [[ "${CHECK}" = "y" || "${CHECK}" = "Y" ]] ;
then
kill -9 ${PROCESS_ID_ITEM}
fi
fi
fi
done

return 0

#
#-----------------------------------------------------------------------------
#
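
The heart of the script is matching each name against whole lines of the ps listing and then killing the PIDs that match. Here is a much-simplified Python sketch of just that step. It is not Martin Pechanec's code: the function names matching_pids and kill_pids are invented, and it assumes a ps -f style listing with the PID in the second column and the PPID in the third.

```python
import os
import signal


def matching_pids(ps_listing, substrings, exclude_pid=None):
    """Return the PIDs of ps lines containing any of the substrings.

    Assumption: ps -f style columns (UID PID PPID ...). The whole line is
    matched, not just the command name. Lines whose PID or PPID equals
    exclude_pid (the killer itself and its children) are left out.
    """
    pids = []
    for line in ps_listing.splitlines():
        fields = line.split()
        if len(fields) < 3 or not fields[1].isdigit():
            continue  # header line or something unexpected
        pid = int(fields[1])
        if exclude_pid is not None and str(exclude_pid) in (fields[1], fields[2]):
            continue
        if any(text in line for text in substrings) and pid not in pids:
            pids.append(pid)
    return pids


def kill_pids(pids, force=False, ask=input, kill=os.kill):
    """Send SIGKILL to each PID, asking first unless force is set.
    Returns the PIDs that were killed."""
    killed = []
    for pid in pids:
        if force or ask("%d ? (y/n)[n] " % pid).strip().lower() == "y":
            kill(pid, signal.SIGKILL)
            killed.append(pid)
    return killed
```

The ask and kill parameters exist only so the prompt and the actual kill can be swapped out when trying the sketch without harming real processes.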

Tuesday, January 3, 2012

Detect duplicate files in Linux or cygwin

While backing up pictures and movies, I wanted to detect duplicate files and delete them. I found this post, but it doesn't run efficiently on large file sets, since it md5sums every file regardless of whether any other file has the same size. A stat command on a file is much quicker than computing the md5sum, since stat only reads the file's metadata, whereas md5sum has to read the entire file off the disk. Thus, I only want to md5sum files whose length matches some other file's length, since only files of equal length can possibly be duplicates of each other.

We come to the following "one-liner", which only md5sum's the files that have identical length to some other file in the system:
find . -type f -exec stat --printf='%32s ' {} \; -print |\
    sort -rn |\
    uniq -d -w32 --all-repeated=separate |\
    awk '(NF > 0){
        system( \
            "md5sum \"`echo \"" $0 "\"|\
            sed -r \"s/^ *[0-9]+ *//\" `\" |\
            cut -c 1-32 | tr -d [:space:] " );
        printf " %32s %s ", $1, $2 ;\
        for (i = 3; i <= NF; i++) printf $i " "; \
        printf "\n";\
    }' |\
    sort -r |\
    uniq -d -w65 --all-repeated=separate  |\
    awk '{for (i = 3; i <= NF; i++) printf $i " ";print "";}'

 # What each stage does, in order:
 #   find/stat : print the size of every file to find duplicated file sizes
 #   sort -rn  : sort to put duplicate file sizes together (for uniq)
 #   uniq -w32 : print all files that have file sizes equal to any other file
 #   awk       : for the remaining files, print the md5sum in front of the file size;
 #               do the md5sum while allowing for spaces in the file names
 #               (more than 1 consecutive space in a file name is not allowed);
 #               print out the file size with a fixed width for future comparisons
 #               and also print the file name
 #   sort -r   : re-sort to catch duplicates of _different_ files with _identical_ sizes
 #   uniq -w65 : compare the md5sum and file size for every file and print multiples
 #   awk       : get rid of the md5sum and file size and simply print out the
 #               duplicate file names by themselves
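
The size-first idea can also be sketched in Python: group files by size, and hash only the groups that contain more than one file. This is a rough illustration rather than a drop-in replacement; the function names md5_of and find_duplicates are invented for this example, and symbolic links are skipped (an assumption).

```python
import hashlib
import os
from collections import defaultdict


def md5_of(path, chunk_size=1 << 20):
    """Return the hex md5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(top="."):
    """Return sorted groups of paths under top that have identical contents."""
    # Pass 1: cheap. Group every regular file by its size.
    by_size = defaultdict(list)
    for root, _dirs, files in os.walk(top):
        for name in files:
            path = os.path.join(root, name)
            if os.path.isfile(path) and not os.path.islink(path):
                by_size[os.path.getsize(path)].append(path)

    # Pass 2: expensive. Hash only files that share a size with another file.
    groups = []
    for paths in by_size.values():
        if len(paths) < 2:
            continue
        by_hash = defaultdict(list)
        for path in paths:
            by_hash[md5_of(path)].append(path)
        groups.extend(sorted(group) for group in by_hash.values() if len(group) > 1)
    return sorted(groups)
```

Because paths are never split on whitespace, file names with any number of spaces are handled.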


If you have a ton of files that you're analyzing, you may want to see progress happening. Due to the nature of sort, it won't generate output until all its input lines are received. We can intercept and log the intermediate output so you can see progress by modifying the command with tee:

find . -type f -exec stat --printf='%32s ' {} \; -print |\
    tee find_stat.log |\
    sort -rn |\
    uniq -d -w32 --all-repeated=separate |\
    awk '(NF > 0){
        system( \
            "md5sum \"`echo \"" $0 "\"|\
            sed -r \"s/^ *[0-9]+ *//\" | tee -a f.log`\" |\
            cut -c 1-32 | tr -d [:space:] " );
        printf " %32s %s ", $1, $2 ;\
        for (i = 3; i <= NF; i++) printf $i " "; \
        printf "\n";\
    }' |\
    tee md5sum_generation.log |\
    sort -r |\
    uniq -d -w65 --all-repeated=separate  |\
    awk '{for (i = 3; i <= NF; i++) printf $i " ";print "";}' | \
    tee repeated_files.log


Then, in a separate window from the above command, you can run tail:
tail -f find_stat.log
or
tail -f md5sum_generation.log
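
If you are doing the work in Python rather than in a shell pipeline, the tee trick is a few lines. This is only a sketch; the name tee_lines is made up for this example.

```python
def tee_lines(lines, log):
    """Pass lines through unchanged while also appending each one to log.

    log is any open, writable text file. It is flushed after every line so
    that tail -f on the log file shows progress right away.
    """
    for line in lines:
        log.write(line)
        log.flush()
        yield line
```

Wrapping the input of a sort in tee_lines gives the same effect as the tee before sort above: the log fills up line by line even though the sorted result only appears at the end.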