Awk was originally designed and implemented by its authors, Alfred Aho,
Peter Weinberger and Brian Kernighan, in 1977, in part as an experiment
to see how the Unix tools grep and sed could be generalized to deal with
numbers as well as text.
USAGE
awk 'awk-statements' text-file
awk 'awk-statements' text-file1 text-file2 text-file3
awk -f awk-statements-file text-file
If no text file is specified, awk reads from STDIN instead.
This is how filters normally behave.
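A minimal sketch of the invocation forms above, including the STDIN case. The file names (fruits.txt, second.awk) and the sample data are made up for illustration.

```shell
# Work in a scratch directory; all file names here are illustrative.
dir=$(mktemp -d)
printf 'apple 3\nbanana 5\n' > "$dir/fruits.txt"
printf '{ print $2 }\n' > "$dir/second.awk"

# Program on the command line, input from a named file.
from_file=$(awk '{ print $1 }' "$dir/fruits.txt")

# Program read from a file with -f.
from_script=$(awk -f "$dir/second.awk" "$dir/fruits.txt")

# No input file given: awk reads standard input, like any filter.
from_stdin=$(printf 'cherry 7\n' | awk '{ print $1 }')

printf '%s\n' "$from_file" "$from_script" "$from_stdin"
rm -r "$dir"
```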
AWK FOR DUMMIES
Because the action part of an awk statement is usually just a print statement, shell programmers most often use awk as a one-liner that prints selected fields of each input line.
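A small sketch of this everyday use, assuming whitespace-separated input; the sample names and numbers are invented.

```shell
# Print the first and third whitespace-separated fields of each line.
names=$(printf 'alice 30 paris\nbob 25 rome\n' | awk '{ print $1, $3 }')
echo "$names"

# With a pattern in front: only lines whose second field exceeds 26.
older=$(printf 'alice 30 paris\nbob 25 rome\n' | awk '$2 > 26 { print $1 }')
echo "$older"
```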
STATEMENT PARSING IN AWK
A statement block is a set of statements enclosed by braces and separated either by newlines or semicolons. E.g.:
{ statement1; statement2; ...}
An awk program is a series of pattern-statement pairs separated either by newlines or semicolons. E.g.:
pattern1 statement-block1; pattern2 statement-block2; ...
An awk program can be specified on the command line as the first argument to the awk command. In this case, it must be quoted so that the shell passes it literally to awk.
An awk program can also be specified using the -f <file> flag.
The pattern is optional. When present, it is evaluated as true or false for each input line using the following rules:
* If the pattern is a numeric, string or regex comparison, then it will always return a 1 (which is the boolean "true") or 0 (which is the boolean "false").
* If the pattern evaluates to a number, then it is interpreted as false if it is zero and true otherwise.
* If the pattern is a regex such as /regex/, then it is matched against the current line, i.e. the variable $0.
* If the pattern evaluates to a string, then it is interpreted as false if it is the null string "" and true otherwise.
* If no pattern is specified, then the statement block is executed for every line in the input text file.
* If the pattern is BEGIN or END, then the statement block is executed before the input is read or after all of it has been read, respectively.
Extra spaces and tabs are ignored.
Comments are started by a # and are ended by a newline.
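The pattern kinds described in this section can be tried out with a sketch like the following. The three-line sample input and the variable names are illustrative only.

```shell
input='3 apples
0 pears
12 plums'

# Comparison pattern: evaluates to 1 or 0 for each line.
big=$(printf '%s\n' "$input" | awk '$1 > 5 { print $2 }')

# Numeric pattern: NR % 2 is 0 (false) on even-numbered lines.
odd=$(printf '%s\n' "$input" | awk 'NR % 2 { print $2 }')

# Regex pattern: matched against the whole line, $0.
pears=$(printf '%s\n' "$input" | awk '/ears$/ { print $2 }')

# BEGIN, no pattern, END, with comments inside the program.
summary=$(printf '%s\n' "$input" | awk '
BEGIN { print "start" }          # runs before any input is read
      { total += $1 }            # no pattern: runs for every line
END   { print "total", total }   # runs after the last line
')

printf '%s\n' "$big" "$odd" "$pears" "$summary"
```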
DATA TYPES IN AWK
Numbers, strings and arrays are the data types in awk. String literals are always double quoted in awk. Array subscripts can be either numbers or strings, so arrays can be used as associative arrays (or hashes).
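A short sketch of arrays with both kinds of subscript; the array names (seen, word) and the input are made up.

```shell
# String subscripts: count how often each word occurs.
counts=$(printf 'red\nblue\nred\nred\n' | awk '
{ seen[$1]++ }
END { print "red", seen["red"]; print "blue", seen["blue"] }
')
echo "$counts"

# Numeric subscripts work the same way.
words=$(awk 'BEGIN { word[1] = "one"; word[2] = "two"; print word[2] " " word[1] }')
echo "$words"
```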
WHAT IS TRUTH
The number 0 and the string "" are false. All other values are true.
An expression is either a number, a string, or the result of combining expressions with numeric operators, arithmetic/alphabetical comparison operators, the regex match operator and logical operators.
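These truth rules can be checked with a sketch like this one, which uses the conditional operator to label each value and prints the raw result of a comparison and of a regex match.

```shell
truth=$(awk 'BEGIN {
    print (0   ? "true" : "false")    # the number 0
    print (""  ? "true" : "false")    # the empty string
    print (7   ? "true" : "false")    # any other number
    print ("a" ? "true" : "false")    # any other string
    print (2 < 3)                     # a comparison yields 1 or 0
    print ("abc" ~ /b/)               # so does a regex match
}')
echo "$truth"
```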
SPECIAL VARIABLES
These give awk most of its firepower.
Built-in variables - FILENAME, NR, FNR, NF
Field variables - $0, $1, ... $NF
Field and record separators - FS, OFS, RS, ORS
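A sketch showing several of these variables at work. The colon-separated sample data and the file names one.txt and two.txt are assumptions made for the example.

```shell
# NR = record number, NF = number of fields, $NF = last field.
fields=$(printf 'a:b:c\nd:e\n' | awk -F: '{ print NR, NF, $1, $NF }')
echo "$fields"

# Assigning to a field rebuilds $0 using the output separator OFS.
joined=$(printf 'a:b:c\n' | awk 'BEGIN { FS = ":"; OFS = "-" } { $1 = $1; print }')
echo "$joined"

# With several input files, NR keeps counting while FNR restarts per file.
dir=$(mktemp -d)
printf 'x\ny\n' > "$dir/one.txt"
printf 'z\n' > "$dir/two.txt"
perfile=$(cd "$dir" && awk '{ print FILENAME, NR, FNR }' one.txt two.txt)
echo "$perfile"
rm -r "$dir"
```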
IF-ELSE, WHILE and FOR STATEMENTS
Syntax:
if (expression) statement1; else statement2
if (expression) statement-block1 else statement-block2
while (expression) statement-block
for (i in array) statement-block
for (i = 1; i < 10; i++) statement-block
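A sketch that exercises each of these forms in a BEGIN block. The for-in loop only sums the values, since awk does not guarantee the order in which array subscripts are visited.

```shell
loops=$(awk 'BEGIN {
    # C-style for with an if-else in the body
    for (i = 1; i <= 3; i++) {
        if (i % 2) print i, "odd"; else print i, "even"
    }

    # while loop counting down
    n = 3
    while (n > 0) { printf "%d ", n; n-- }
    print "liftoff"

    # for-in visits every subscript of the array, in no particular order
    a["x"] = 1; a["y"] = 2
    for (k in a) sum += a[k]
    print "sum", sum
}')
echo "$loops"
```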
OUTPUT USING PRINT
print (expression-list)
print (expression-list) > file
print (expression-list) >> file
print (expression-list) | "OS-command"
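A sketch of the redirection forms. The output path is passed in with -v as a variable named out, which is a choice made for this example; a quoted string literal such as "out.txt" works as well.

```shell
dir=$(mktemp -d)

awk -v out="$dir/out.txt" 'BEGIN {
    print "first"  > out     # ">" truncates the file when it is first opened
    print "second" > out     # file is still open, so this follows "first"
    close(out)
    print "third" >> out     # ">>" appends to the existing file
}'
file_contents=$(cat "$dir/out.txt")
echo "$file_contents"

# Pipe every printed line into one running "sort" command.
sorted=$(printf 'pear\napple\nfig\n' | awk '{ print $1 | "sort" }')
echo "$sorted"

rm -r "$dir"
```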
INPUT USING GETLINE
getline x <"file"
"OS-command" | getline x
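A sketch of both getline forms. getline returns 1 when it reads a line, 0 at end of input and -1 on error, so the loop tests for a result greater than zero. The file name in.txt and the echo command are placeholders.

```shell
dir=$(mktemp -d)
printf 'line one\nline two\n' > "$dir/in.txt"

got=$(awk -v f="$dir/in.txt" 'BEGIN {
    # Read the file line by line into x.
    while ((getline x < f) > 0) { n++; last = x }
    close(f)
    print n, last

    # Read one line of output from a command into y.
    "echo hello" | getline y
    close("echo hello")
    print y
}')
echo "$got"
rm -r "$dir"
```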
COMMAND LINE ARGUMENTS
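We can make a shell command out of an awk program by putting this line in a shell script:
awk 'program' "$@"
or by creating an executable awk script whose first line is (adjust the path to wherever awk lives on your system, often /usr/bin/awk)
#!/bin/awk -f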
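A sketch of the wrapper approach written as a shell function. The function name sumcol and the file names are hypothetical. Quoting the arguments as "$@" keeps file names containing spaces intact, and with no arguments awk falls back to STDIN.

```shell
# Hypothetical wrapper: sum the first column of the files given as arguments.
sumcol() {
    awk '{ sum += $1 } END { print sum }' "$@"
}

dir=$(mktemp -d)
printf '1\n2\n' > "$dir/my data.txt"    # note the space in the name
printf '4\n' > "$dir/more.txt"

from_files=$(sumcol "$dir/my data.txt" "$dir/more.txt")
from_pipe=$(printf '5\n6\n' | sumcol)
echo "$from_files" "$from_pipe"
rm -r "$dir"
```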
REFERENCES
The AWK Programming Language
by Alfred Aho, Brian Kernighan and Peter Weinberger
The GNU awk manual
Gawk: Effective AWK Programming
by Arnold Robbins
This is the official manual for GNU awk and can be found in all formats at:
http://www.gnu.org/software/gawk/manual/
An awk tutorial
by Daniel Robbins
http://www.ibm.com/developerworks/linux/library/l-awk1/index.html
The POSIX standard
The Open Group Base Specifications Issue 7 - IEEE Std 1003.1-2008
You need volume XCU (Shell & Utilities) for a description of the POSIX shell, sed and awk.
http://pubs.opengroup.org/onlinepubs/9699919799/