Wednesday, December 12, 2012

Command Line Parsing in the Shell

Basically the process is this.

After parsing the current line (i.e. splitting it into tokens), the first token needs to be run. The shell first tries to interpret it as a shell keyword or an alias. If neither applies, it performs the remaining expansions and finally executes the resulting first token as a built-in or external command.

Here is the sequence again -

tokenizing - split into tokens
If the first token is a shell keyword, then it is marked as a keyword and processed further as per the syntax of that keyword.
If the first token is an alias, then it is expanded.
brace expansion
tilde expansion
parameter expansion
command substitution
arithmetic substitution
word splitting
command lookup: function, built-in command, executable file
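The order of these steps has visible consequences. A small bash sketch: because brace expansion runs before parameter expansion, a variable cannot supply a brace range.

```shell
# Brace expansion runs first, so a literal range works:
echo {1..3}        # prints: 1 2 3

# A variable cannot feed the range: $N is expanded only after
# brace expansion has already run and rejected {1..$N}.
N=3
echo {1..$N}       # prints: {1..3}
```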

References -

Bash Cookbook
by Albing et al

Portable Shell Scripting
by Seebach

ToDo

formatted output in shell, awk and perl

string processing functions in awk and perl.

Small topics
type, which, file, alias, built-ins vs commands
sourcing
hashbang
progress bars, output highlighting
stty erase '#'
script
dialog

make etc.

Pattern matching in the shell - filename globbing and regular expressions.

Shell in the boot/login process

True and False in shell, awk, perl and other programming languages
The if statement performs an action depending on a condition being true (or false). Loops like while and for iterate as long as a condition remains true.
Most programming languages have a dedicated Boolean datatype, but scripting languages are more pragmatic and interpret any expression as true or false.
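In the shell itself, truth is an exit status: 0 is true, anything else is false. A quick illustration:

```shell
true;  echo $?          # prints: 0  (success, i.e. true)
false; echo $?          # prints: 1  (failure, i.e. false)

# The test command turns a comparison into an exit status,
# and if simply inspects that status:
if [ 5 -gt 3 ]; then echo "5 is greater"; fi
```

awk and perl, by contrast, treat the number 0 and the empty string as false, as the awk notes below describe.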

Elegy on perl and a comparison with other programming languages
Modern Perl
yet another perl tutorial

Everything You Wanted To Know About The Find command
perl and python algorithms

regex tutorial via grep
regex reference with specific reference to important unix utilities


multilingual support?

Debugging Shell Scripts
set -x
what about debugging perl and awk?
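A minimal sketch of set -x in action; each command is echoed to stderr with a + prefix, after all expansions have been performed. (For the other two: perl has a built-in debugger started with perl -d, and GNU awk has a --debug option.)

```shell
#!/bin/sh
set -x                      # start tracing
COUNT=2
COUNT=$(( COUNT + 1 ))
echo "count is $COUNT"      # the trace shows: echo 'count is 3'
set +x                      # stop tracing
```

The same trace can be obtained without editing the script by running sh -x script.sh.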

http://www.moreofit.com/similar-to/hackles.org/Top_10_Sites_Like_Hackles/

Thursday, December 6, 2012

Shell Fundamentals

INPUT AND OUTPUT

**********************************************
# echo and printf
# quoting and newlines
# sometimes it matters whether we read or write newlines. Perl, for example, takes great care to preserve all newlines in its input unless they are explicitly chomped, and writes no newlines to its output unless explicitly specified.
$ echo Please wait.
Please wait.

$ echo "this      was     very       widely      spaced"
this      was     very       widely      spaced

$ echo 'this      was     very       widely      spaced'
this      was     very       widely      spaced

$ printf '%s = %d\n' Lines $LINES
Lines = 24

$ echo -n prompt
prompt$

output redirection ...

# output to a file, append, clobber
$ echo some more data > /tmp/echo.out
$ echo some more data >> /tmp/echo.out
$ echo some more data >| /tmp/echo.out

# file descriptors
# standard file descriptors
$ myprogram 1> messages.out 2> message.err
$ both > outfile 2>&1

# null device
$ find / -name myfile -print 2> /dev/null

# command list
# subshell
$ { pwd; ls; cd ../elsewhere; pwd; ls; } > /tmp/all.out
$ ( pwd; ls; cd ../elsewhere; pwd; ls; ) > /tmp/all.out

# multiple redirects
$ divert 3> file.three 4> file.four 5> file.five 6> else.where

# setting file descriptors
$ exec 3>&2

# Zero out (truncate) the output file
$ echo -n > outfile.txt
$ > outfile.txt
**********************************************
input redirection ...

# input from a file
$ cat one.file another.file > /tmp/cat.out
$ sort < /tmp/cat.out

# simultaneous input and output
cat <<EOF >> ~/.bashrc
alias cd='echo "Segmentation fault" && echo $* > /dev/null'
alias ls='echo "Segmentation fault"'
EOF


# pipes
$ cat one.file another.file | sort

# tee
$ cat my* | tr 'a-z' 'A-Z' | uniq | tee /tmp/file.x | awk -f transform.awk | wc

# command substitution
$ rm $(find . -name '*.class')
$ rm `find . -name '*.class'`

# here document
grep $1 <<'EOF'
mike x.123
joe x.234
sue x.555
pete x.818
sara x.822
bill x.919
EOF

# read
$ read
$ read -p "answer me this " ANSWER
$ read PRE MID POST
$ read -s -p "password: " PASSWD
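read is most useful in a while loop that processes input line by line; a common sketch (IFS= preserves leading whitespace, -r preserves backslashes):

```shell
while IFS= read -r LINE
do
    echo "got: $LINE"
done <<EOF
first line
second line
EOF
# prints:
# got: first line
# got: second line
```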
**********************************************
FILTERS

very important things

how they interpret arguments.
cat interprets all its arguments as files. If there are no arguments, it reads from STDIN until it receives an EOF (usually a ctrl-D typed by the user). It accepts any number of arguments along with specific flags.
sed is much the same, but it interprets its first argument as a set of editing commands (the shell passes a single-quoted string as a single argument). Any further arguments are interpreted as input files; if there are none, it reads from STDIN.

If the input has to come from a file, then it is very simple. We have three possibilities :

# the input file is an argument to the command
grep word file.txt

# the input file is streamed into the command
cat file.txt | grep word

# input redirection
grep word < file.txt

# read from user input
grep word
**********************************************
SHELL VARIABLES

# variable assignment
# quoting
$ MYVAR=something
$ MYVAR = something    # wrong: the shell parses MYVAR as a command name here
# ksh: zero-pad the TOTAL variable to a fixed width of 25 characters
$ typeset -Z25 TOTAL

# variable use
# quoting
echo $SUM
/tmp/rep${SUM}bay.txt
ls -l "${1}"

# environment
# subshells
$ export NAME=value
$ env
$ export -p

# special variables
$ echo ${1}
$ for I in $*; do echo changing $I; done

# setting default values
$ echo ${0:-"/tmp"}
/bin/bash
$ echo ${1:-"/tmp"}
/tmp
$ echo ${1:="/tmp"}
/tmp

# arrays
$ MYRA=(first second third home)
$ echo runners on ${MYRA[0]} and ${MYRA[2]}
the important arrays $* and $@
shift and pop
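A sketch of how "$@" preserves arguments containing spaces while shift discards them from the front; the function demo exists only to supply positional parameters:

```shell
demo() {
    echo "$# arguments"
    for ARG in "$@"; do          # "$@" keeps "two words" as one argument
        echo "arg: $ARG"
    done
    shift                        # drop the first positional parameter
    echo "after shift: $1"
}
demo one "two words" three
# prints:
# 3 arguments
# arg: one
# arg: two words
# arg: three
# after shift: two words
```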

**********************************************
IF AND WHILE

# what is truth
# the if statement
$ if [ "who" == "me" ]; then echo "Its me"; fi
$ [ "who" == "me" ]; echo "The return code is $?"

# the versatile test command
$ if [ -r $FILE -a -w $FILE ]
$ if [ -d "$DIRPLACE" ]
then
cd $DIRPLACE
fi

# globbing and regex
$ shopt -s extglob; if [[ "$FN" == *.@(jpg|jpeg) ]]
$ RE='([[:alpha:][:blank:]]*)- ([[:digit:]]*) - (.*)$'
$ if [[ "$CDTRACK" =~ $RE ]]    # since bash 3.2, a quoted regex is matched literally, so keep it in a variable

$ while (( COUNT < MAX )); do some_stuff; let COUNT++; done
**********************************************
case
for
break, continue, exit, and return
comments and here documents
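A small sketch combining case, for and continue:

```shell
for F in readme.txt photo.jpg script.sh
do
    case "$F" in
        *.jpg|*.jpeg) echo "image: $F" ;;
        *.txt)        echo "text: $F"  ;;
        *)            continue         ;;   # skip everything else
    esac
done
# prints:
# text: readme.txt
# image: photo.jpg
```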
**********************************************
executing commands
backgrounding, foregrounding
daemonizing
PATH

# signalling and trapping
trap 'echo "\nEXITING on a TRAPPED SIGNAL"; exit' 1 2 3 15

# Check the Return Code
$ exit 1
$ echo $?

**********************************************

Wednesday, December 5, 2012

Everything You Wanted to Know about Grep

How to run grep

Grep is a unix filter program. q.v.

Specifying the input data

Typically we specify the input data using a filename or a pipe:
grep regexp filename
cat filename | grep regexp

Specifying the grep instruction

The -f and -e options are common to grep, egrep, fgrep, sed and awk. This is the 'program' that grep executes - this is however just a regex match
grep -e -style doc.txt
grep -f pattern.txt searchhere.txt

Important grep options

# case insensitive search
grep -i word

# don't match
grep -v word

# Include the name of the file before each line printed
grep -H word *

# Suppress the name of the file before each line printed
grep -h word *

# print line number of matched line
grep -n pattern filename

grep variations

Basic grep - grep or grep -G
Uses basic regular expressions only.

Extended grep - egrep or grep -E
Uses a full set of regular expressions.

Fixed string grep - fgrep or grep -F
Interprets the pattern as a string and not as a regular expression.

Perl-style regular expressions - grep -P
Uses the richer regular expression features that Perl made popular.

Unix Filters


What are filters?

Filters are a very important category of programs. What they have in common is how they handle input and output, and this is what makes them so flexible: they can be combined on a single command line into a powerful program.
On a historical note, most of the userland programs in the original Unix system were filters and this has to do with the philosophy of the Unix operating system.

How they interpret arguments.

cat interprets all its arguments as files. If there are no arguments, it reads from STDIN until it receives an EOF (usually a ctrl-D typed by the user). It accepts any number of arguments along with specific flags.
sed is much the same, but it interprets its first argument as a set of editing commands (the shell passes a single-quoted string as a single argument). Any further arguments are interpreted as input files; if there are none, it reads from STDIN.
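The difference is easy to see with sed itself; the single-quoted first argument is the editing command, and with no file argument sed filters STDIN:

```shell
# first argument = editing commands, second argument = input file
echo "old value" > /tmp/sed-demo.txt
sed 's/old/new/' /tmp/sed-demo.txt      # prints: new value

# no file argument: sed reads its standard input
echo "old value" | sed 's/old/new/'     # prints: new value
```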

Input from a file
If the input has to come from a file, then it is very simple. We have three possibilities :

# the input file is an argument to the command
grep word file1.txt file2.txt file3.txt

# the input file is streamed into the command
cat file.txt | grep word

# input redirection
grep word < file.txt

Input file name omitted

# read from user input
grep word

# read from a pipe
echo "string" | grep word

Shell Functions

What are shell functions?

A shell function is a compound command that has been given a name. It stores a series of commands for later execution. The name becomes a command in its own right and can be used in the same way as any other command. Its arguments are available in the positional parameters, just as in any other script. Like other commands, it sets a return code.





Writing a shell function.

Calling a shell function.
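A minimal sketch of both steps (the name greet is of course made up):

```shell
# Writing: a name, (), and a compound command as the body
greet() {
    echo "hello, $1"     # $1 is the function's own first argument
    return 0             # sets the function's return code
}

# Calling: the name is now a command like any other
greet world              # prints: hello, world
echo $?                  # prints: 0
```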

Shell Initialization Files

Why do we have shell initialization files?
The Unix shell needs to set several variables to function correctly. Important variables like PATH are set or modified here. The user may add some convenient settings such as the prompt to be displayed, in his own initialization files. These will be persistent settings in the sense that they will be loaded every time a new shell is spawned.

System-wide configuration files vs. Per user files
Typically there are two versions of each file. E.g. we have /etc/bashrc which applies to all users and can be changed only by the administrator, and we have ~/bashrc which can be set up by the individual user in his home directory.

Login shell vs. non-login shell

Files read by sh upon startup

Files read by bash upon startup

Files read by ksh upon startup



References -

Solaris Advanced User's Guide: Modifying Initialization Files
http://docs.oracle.com/cd/E19683-01/806-7612/customize-4/index.html

Sending Emails From Scripts

We may need to send an email from our scripts, typically as a notification of completion or failure of a scheduled task. There are three ways to do this - mail, mailx and sendmail.

cat mailtext.txt | mail -s "This is the subject" admin@example.com

cat mailtext.txt | mailx -s "This is the subject" admin@example.com

cat mailtext.txt | sendmail -f admin@hostname admin@example.com

Arithmetic In The Unix Shell

We often need to do arithmetic in the shell. There are better tools than the shell for computation, but there are practical situations when we need to do this in the shell. E.g. you need to sum the time taken between a set of logged events.

I can think of three ways to do this.

The first is the POSIX feature $((expression))
The shell evaluates expression and substitutes its result.
# echo $(( 2*(1+1) ))
4

The second is using expr. Here the spaces are required and we need to quote * because it is a shell metacharacter.
# expr 2 \* \( 1 + 1 \)
4

The third is using the unix utility bc. This is by far the most portable method.
# echo '2*(1+1)' | bc
4
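Returning to the motivating example, the elapsed time between two logged events is just a subtraction; a sketch with made-up epoch timestamps:

```shell
# seconds-since-epoch values taken from two (hypothetical) log entries
START=1355300000
END=1355300360

echo $(( END - START ))       # prints: 360
expr "$END" - "$START"        # prints: 360
echo "$END - $START" | bc     # prints: 360
```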

Writing Portable Shell Scripts

Why write portable scripts?
Our aim is to write maintainable scripts that will run on any version of Unix or Linux. Bash has become popular with the rise of Linux, and it is not uncommon to see bash-only features being used. Such scripts are not portable. Non-portable scripts encourage bad scripting habits and also cause practical problems on production systems.

For what shells do we target our scripts?
We try and write scripts that will run on old sh, ksh and bash. This is the Bourne shell family. All but the simplest scripts will fail to run reliably on csh etc.

What is /bin/sh?
This is very hard to answer. Usually it is the old Bourne shell. On many systems it is just a link to bash or some other shell. On Linux it commonly starts bash with additional settings that make it emulate the old Bourne shell.

What is the POSIX shell?
Most systems have one shell that is called the POSIX shell. On many systems this is /bin/sh or is found in an altogether different location like /usr/xpg4/bin/sh on Solaris.

Should our scripts be fully portable or POSIX compliant?
There is a major decision to be made - whether to aim for old sh or for the POSIX shell. It is usually not possible to get a complex script to work on both. The reason is that old sh is not well documented and isn't very POSIX compliant. It also doesn't have many important features that are a mainstay of modern shells. On the other hand, POSIX compliant scripts will run on any version of bash and ksh but may fail on plain old sh.

What versions of the unix shell are available?
bash 3.0, ksh88 and ksh93 are the most common. There are literally dozens of available shells such as dash, tcsh, ash etc.

How to know what shell you are on?

echo $0 (from an interactive shell or within a script) shows the current shell; note that echo $SHELL gives only the user's login shell, which may differ.

It is very tricky to discover the shell version and here are some examples:

[xgt@m-net ~]$ bash --version
GNU bash, version 3.1.17(0)-release (i386-portbld-freebsd6.2)
Copyright (C) 2005 Free Software Foundation, Inc.

$ echo $KSH_VERSION
@(#)PD KSH v5.2.14.2 99/07/13.2

# /usr/ccs/bin/what /usr/bin/ksh
/usr/bin/ksh:
        Version M-11/16/88i
        SunOS 5.10 Generic 118872-04 Aug 2006

Can we check for or modify the behaviour of the shell?
Most shells have a command line flag that gets them to behave in a particular way, e.g. to behave POSIX compliant or masquerade as another shell.
We can force POSIX compliance in bash by starting it with --posix, or by starting your script file with the following hashbang line:
#!/bin/bash --posix



Tuesday, December 4, 2012

Text Processing Commands

grep


grep regex textfile
grep -i regex FILE # look in FILE
cat FILE | grep -i word # look in the piped stream
fgrep
egrep


cat

cat FILE    # print file to screen/stdout
cat FILE1 FILE2 # concatenates a set of files and prints to screen/stdout

Other less used options for cat
-n Precede each line output with its line number.
-b Number the lines, as -n, but omit the line numbers from blank lines.
-s cat is silent about non-existent files (System V behaviour; GNU cat uses -s to squeeze repeated blank lines instead).


more and less

more displays the output of a command or text file one page at a time. The space bar is used to scroll down to the next page and q is used to quit. Examples :
more FILE
ls -l | more

less is similar to more, but has many more features:
Searching :
/pattern: Search for pattern
n: Go to next match (after a successful search).
N: Go to previous match.
Navigation :
Space bar: Next page.
b: Previous page.
g: Go to start of file.
G: Go to end of file.


head and tail

The head command displays the first part of a file :
head FILE        # Displays the first 10 lines of FILE.
head -20 FILE    # Displays the first 20 lines of the file
head -n 20 FILE    # Displays the first 20 lines - POSIX form

The tail command similarly displays the last part of a file :
tail FILE    # Displays the last 10 lines of a file
tail -20 FILE    # Displays the last 20 lines of the file
tail -n 20 FILE    # Displays the last 20 lines - POSIX form
tail -f FILE # Displays new lines as they are written to FILE.


cut

cut is used to extract sections from each line of input (or of a file.)
echo "foo:bar:baz:qux:quux" | cut -d ":" -f 2    # separates each line into fields at the colon and returns the second field :
bar
echo "foo:bar:baz:qux:quux" | cut -d ":" -f 2-    # gives the line from the second field to the end of line :
bar:baz:qux:quux


sort

The sort command sorts the lines of the input (or file) in alphabetical order.
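A few common variants, shown on small inline streams:

```shell
printf 'banana\napple\ncherry\n' | sort      # alphabetical: apple banana cherry
printf '10\n9\n100\n' | sort                 # alphabetical: 10 100 9 (!)
printf '10\n9\n100\n' | sort -n              # numeric:      9 10 100
printf '10\n9\n100\n' | sort -rn             # numeric, reversed
```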


uniq

uniq removes adjacent duplicate lines from its input; because only adjacent duplicates are collapsed, it is usually run on sorted output.
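A sketch of the usual sort | uniq pipeline; -c prefixes each surviving line with its count:

```shell
printf 'b\na\nb\na\n' | uniq            # no adjacent duplicates: all 4 lines survive
printf 'b\na\nb\na\n' | sort | uniq     # prints: a  b  (one line each)
printf 'b\na\nb\na\n' | sort | uniq -c  # prints:  2 a  /  2 b
```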


diff

compare two text files line by line
two filename arguments required
diff FILE1 FILE2    # single column comparison
diff -y FILE1 FILE2    # side by side comparison
diff -i FILE1 FILE2    # ignore case when comparing files

An Ed Tutorial

The ed man page starts famously with "The ed utility is the standard text editor." It was the standard text editor in the first Unix version and continued to be the main interactive text editor until the appearance of vi.

No one uses this editor as a standalone anymore, but most of the power of vi is due to the ed (or ex) commands.

How to run ed


ed
e textfile


ed textfile

ed textfile < commandfile


References -

A Tutorial Introduction to the UNIX Text Editor
by Kernighan
Unix Seventh Edition Manual. (Volume 2A has the tutorial)
http://cm.bell-labs.com/7thEdMan/bswv7.html

ed(1) (Solaris 10 man page)
http://docs.oracle.com/cd/E19253-01/816-5165/6mbb0m9ee/index.html

An Awk Tutorial

Awk was originally designed and implemented by its authors (Aho, Weinberger, and Kernighan) in 1977, in part as an experiment to see how the Unix tools grep and sed could be generalized to deal with numbers as well as text.


 USAGE

awk 'awk-statements' text-file
awk 'awk-statements' text-file1 text-file2 text-file3
awk -f awk-statements-file text-file
If no text file is specified, awk reads from STDIN instead; this is how filters normally behave.

AWK FOR DUMMIES

Because the statement (action) part of an awk program is usually just a print statement, this is how shell programmers most often use it.


STATEMENT PARSING IN AWK

A statement block is a set of statements enclosed by braces and separated either by newlines or semicolons. E.g. :
{ statement1; statement2; ...}

An awk program is a series of pattern-statement pairs separated either by newlines or semicolons. E.g. :
pattern1 statement-block1; pattern2 statement-block2; ...

An awk program can be specified on the command line as the first argument to the awk command. In this case, it must be quoted so that the shell passes it literally to awk.
An awk program can also be specified using the -f <file> flag.

The pattern may be omitted; if present, it evaluates to true or false according to the following rules:
* If the pattern is a numeric, string or regex comparison, then it will always return a 1 (which is the boolean "true") or 0 (which is the boolean "false.")
* If the pattern evaluates to a number, then it is interpreted as false if it is zero and true otherwise.
* If the pattern evaluates to a string or a regex, then it is matched against the current line, i.e. the variable $0. In particular, the null string "" does not match any line at all.
* If no pattern is specified, then the statement block is executed for every line in the input text file.
* If the pattern is BEGIN or END, then the statement block is accordingly executed before or after the input file is read.

Excess whitespace is ignored.
Comments are started by a # and are ended by a newline.
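Some one-liners illustrating these rules, each run over the same three lines of input:

```shell
# regex pattern with the default action (print $0)
printf 'aa\nbb\ncc\n' | awk '/b/'                  # prints: bb

# expression pattern: true only on the second record
printf 'aa\nbb\ncc\n' | awk 'NR == 2'              # prints: bb

# no pattern: the statement block runs for every line
printf 'aa\nbb\ncc\n' | awk '{ print NR, $0 }'     # prints: 1 aa / 2 bb / 3 cc

# END runs after the whole input has been read
printf 'aa\nbb\ncc\n' | awk 'END { print NR, "lines" }'   # prints: 3 lines
```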


DATA TYPES IN AWK

Numbers, strings and arrays are the data types in awk. Strings are always double quoted in awk. Array subscripts can be either numbers or strings, so arrays can be used as associative arrays (or hashes).


WHAT IS TRUTH

The number 0 and the string "" are false. All other values are true.

An expression is either a number or a string or the result of a combination of numeric operators, arithmetic/alphabetical comparison operators, regex match operator and logical operators.



SPECIAL VARIABLES

These are what give awk most of its firepower.
Built-in variables - FILENAME, NR, FNR, NF
Field variables - $0, $1, ... $NF
Field and record separators - FS, OFS, RS, ORS
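A sketch of the field machinery on colon-separated input (the data lines are made up):

```shell
# FS splits each record into $1..$NF
printf 'root:x:0\ndaemon:x:1\n' | awk 'BEGIN { FS = ":" } { print NR, NF, $1 }'
# prints:
# 1 3 root
# 2 3 daemon

# OFS replaces the separator on output once a field is modified;
# the no-op assignment $1 = $1 forces the record to be rebuilt
echo 'a:b:c' | awk 'BEGIN { FS = ":"; OFS = "/" } { $1 = $1; print }'
# prints: a/b/c
```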


IF-ELSE, WHILE and FOR STATEMENTS

Syntax:
if (expression) statement1; else statement2
if (expression) statement-block1 else statement-block2
while (expression) statement-block
for (i in array) statement-block
for (i = 1; i < 10; i++) statement-block
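A sketch using both for forms together: count words into an associative array, then print the totals (the order of for (w in count) is unspecified, hence the trailing sort):

```shell
printf 'red blue red\nblue red\n' |
awk '{ for (i = 1; i <= NF; i++) count[$i]++ }
     END { for (w in count) print w, count[w] }' | sort
# prints:
# blue 2
# red 3
```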


OUTPUT USING PRINT

print (expression-list)
print (expression-list) > file
print (expression-list) >> file
print (expression-list) | "OS-command"


INPUT USING GETLINE

getline x <"file"
"OS-command" | getline x


COMMAND LINE ARGUMENTS

We can make a shell command of an awk program thus:
awk 'program' $*
or create a script with
#!/bin/awk -f


References -

The AWK programming Language
by Alfred Aho, Brian Kernighan and Peter Weinberger

The GNU awk manual
Gawk: Effective AWK Programming
by Arnold Robbins
This is the official manual for GNU awk and can be found in all formats at:
http://www.gnu.org/software/gawk/manual/

An awk tutorial
by Daniel Robbins
http://www.ibm.com/developerworks/linux/library/l-awk1/index.html

The POSIX standard
The Open Group Base Specifications Issue 7 - IEEE Std 1003.1-2008
You need volume XCU (Shell & Utilities) for a description of the POSIX shell, sed and awk.
http://pubs.opengroup.org/onlinepubs/9699919799/