Awk -- A Pattern Scanning and Processing Language USD:19-1
Awk -- A Pattern Scanning and Processing Language
(Second Edition)
Alfred V. Aho
Brian W. Kernighan
Peter J. Weinberger
AT&T Bell Laboratories
Murray Hill, New Jersey 07974
ABSTRACT
Awk is a programming language whose basic opera-
tion is to search a set of files for patterns, and to
perform specified actions upon lines or fields of lines
which contain instances of those patterns. Awk makes
certain data selection and transformation operations
easy to express; for example, the awk program
length > 72
prints all input lines whose length exceeds 72 charac-
ters; the program
NF % 2 == 0
prints all lines with an even number of fields; and the
program
{ $1 = log($1); print }
replaces the first field of each line by its logarithm.
Awk patterns may include arbitrary boolean combi-
nations of regular expressions and of relational opera-
tors on strings, numbers, fields, variables, and array
elements. Actions may include the same pattern-
matching constructions as in patterns, as well as
arithmetic and string expressions and assignments, if-
else, while, for statements, and multiple output
streams.
This report contains a user's guide, a discussion
of the design and implementation of awk, and some tim-
ing statistics.
1. Introduction
Awk is a programming language designed to make many common
information retrieval and text manipulation tasks easy to state
and to perform.
The basic operation of awk is to scan a set of input lines
in order, searching for lines which match any of a set of pat-
terns which the user has specified. For each pattern, an action
can be specified; this action will be performed on each line that
matches the pattern.
Readers familiar with the UNIX program grep[1] will recog-
nize the approach, although in awk the patterns may be more gen-
eral than in grep, and the actions allowed are more involved than
merely printing the matching line. For example, the awk program
{print $3, $2}
prints the third and second columns of a table in that order.
The program
$2 ~ /A|B|C/
prints all input lines with an A, B, or C in the second field.
--------------------------------------------------
UNIX is a trademark of AT&T Bell Laboratories.
--------------------------------------------------
The program
$1 != prev { print; prev = $1 }
prints all lines in which the first field is different from the
previous first field.
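As a rough illustration, the three programs above can be run on a
small made-up table. The sample data and the shell variable names
(table, cols, abc, changes) are assumptions for this sketch, not
part of the original report.

```sh
# Hypothetical sample table: three whitespace-separated columns.
table='x A 1
x D 2
y B 3
y C 4'

# Third and second columns, in that order.
cols=$(printf '%s\n' "$table" | awk '{ print $3, $2 }')

# Lines with an A, B, or C in the second field.
abc=$(printf '%s\n' "$table" | awk '$2 ~ /A|B|C/')

# Lines whose first field differs from the previous first field.
changes=$(printf '%s\n' "$table" | awk '$1 != prev { print; prev = $1 }')

printf '%s\n' "$cols" "$abc" "$changes"
```

The variable prev starts out empty, so the first input line always
differs from it and is printed.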
1.1. Usage
The command
awk program [files]
executes the awk commands in the string program on the set of
named files, or on the standard input if there are no files. The
statements can also be placed in a file pfile, and executed by
the command
awk -f pfile [files]
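The following sketch shows the same one-line program invoked in
the ways just described. The scratch directory and the file names
data.txt and pfile are hypothetical. The program is wrapped in
single quotes so that the shell passes $2 to awk untouched.

```sh
# Work in a scratch directory; all file names here are hypothetical.
dir=$(mktemp -d)
printf 'one two\nthree four\n' > "$dir/data.txt"

# Program given as a command-line string, reading a named file.
from_string=$(awk '{ print $2 }' "$dir/data.txt")

# Same program with no file arguments: awk reads the standard input.
from_stdin=$(awk '{ print $2 }' < "$dir/data.txt")

# Same program stored in a file and run with -f.
printf '{ print $2 }\n' > "$dir/pfile"
from_pfile=$(awk -f "$dir/pfile" "$dir/data.txt")

printf '%s\n' "$from_string" "$from_stdin" "$from_pfile"
rm -r "$dir"
```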
1.2. Program Structure
An awk program is a sequence of statements of the form:
pattern { action }
pattern { action }
...
Each line of input is matched against each of the patterns in
turn. For each pattern that matches, the associated action is
executed. When all the patterns have been tested, the next line
is fetched and the matching starts over.
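A small sketch of this matching cycle, using made-up numeric input:
every line is tested against both patterns, so a line may trigger
neither action, one of them, or both.

```sh
seen=$(printf '5\n50\n500\n' | awk '
$1 > 10  { print $1, "is greater than 10" }
$1 > 100 { print $1, "is greater than 100" }
')
printf '%s\n' "$seen"
```

The line 5 matches nothing, 50 matches only the first pattern, and
500 matches both, in the order the statements are written.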
Either the pattern or the action may be left out, but not
both. If there is no action for a pattern, the matching line is
simply copied to the output. (Thus a line which matches several
patterns can be printed several times.) If there is no pattern
for an action, then the action is performed for every input line.
A line which matches no pattern is ignored.
Since patterns and actions are both optional, actions must
be enclosed in braces to distinguish them from patterns.
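The two abbreviated forms can be sketched as follows; the input
words and the variable names copies and every are assumptions made
for illustration.

```sh
# Patterns with no action: each matching line is copied to the output.
# "banana" matches both patterns, so it is printed twice.
copies=$(printf 'apple\nbanana\ncherry\n' | awk '
/an/
/^b/
')

# Action with no pattern: performed for every input line.
every=$(printf 'apple\nbanana\n' | awk '{ print "seen", $0 }')

printf '%s\n' "$copies" "$every"
```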
1.3. Records and Fields
Awk input is divided into "records" terminated by a record
separator. The default record separator is a newline, so by
default awk processes its input a line at a time. The number of
the current record is available in a variable named NR.
Each input record is considered to be divided into
"fields." Fields are normally separated by white space --
blanks or tabs -- but the input field separator may be changed,
as described below. Fields are referred to as $1, $2, and so
forth, where $1 is the first field, and $0 is the whole input
record itself. Fields may be assigned to. The number of fields
in the current record is available in a variable named NF.
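A minimal sketch of field references and field assignment, on one
made-up input line. Note that assigning to a field causes awk to
rebuild $0 from the fields, joined by the output field separator
(a single blank by default).

```sh
# $1 and $3 are individual fields; $0 is the whole record.
fields=$(printf 'a b c\n' | awk '{ print $1; print $3; print $0 }')

# Fields may be assigned to; the record reflects the change.
assigned=$(printf 'a b c\n' | awk '{ $2 = "X"; print $0 }')

printf '%s\n' "$fields" "$assigned"
```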
The variables FS and RS refer to the input field and record
separators; they may be changed at any time to any single charac-
ter. The optional command-line argument -Fc may also be used to
set FS to the character c.
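The sketch below sets the separators in the ways just mentioned,
on made-up input. It assigns FS and RS inside a BEGIN action, which
awk runs before any input is read, so the setting is in effect for
the first record.

```sh
# Field separator set on the command line with -F.
by_flag=$(printf 'root:x:0\n' | awk -F: '{ print $1, $3 }')

# The same separator set by assigning to FS.
by_var=$(printf 'root:x:0\n' | awk 'BEGIN { FS = ":" } { print $1, $3 }')

# Record separator changed to a semicolon: three records, one input line.
by_rs=$(printf 'a;b;c' | awk 'BEGIN { RS = ";" } { print NR, $0 }')

printf '%s\n' "$by_flag" "$by_var" "$by_rs"
```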
If the record separator is empty, an empty input line is
taken as the record separator, and blanks, tabs and newlines are
treated as field separators.
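As an illustration of the empty record separator, the made-up
input below holds two address-like blocks separated by an empty
line. Each block becomes one record, and the newline inside a
block acts as a field separator.

```sh
paras=$(printf 'Ann\n1 Elm St\n\nBob\n2 Oak Ave\n' |
    awk 'BEGIN { RS = "" } { print NR ": " NF " fields, first is " $1 }')
printf '%s\n' "$paras"
```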
The variable FILENAME contains the name of the current input
file.
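A brief sketch of FILENAME with two hypothetical input files,
a.txt and b.txt, created in a scratch directory: each output line
is tagged with the file it came from.

```sh
dir=$(mktemp -d)
printf 'alpha\n' > "$dir/a.txt"
printf 'beta\n'  > "$dir/b.txt"

# FILENAME holds the name exactly as given on the command line.
tagged=$(cd "$dir" && awk '{ print FILENAME ": " $0 }' a.txt b.txt)

printf '%s\n' "$tagged"
rm -r "$dir"
```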
1.4. Printing
An action may have no pattern, in which case the action is
executed for all lines. The simplest action is to print some or
all of a record; this is accomplished by the awk command print.
The awk program
{ print }
prints each record, thus copying the input to the output intact.
More useful is to print a field or fields from each record. For
instance,
print $2, $1
prints the first two fields in reverse order. Items separated by
a comma in the print statement will be separated by the current
output field separator when output. Items not separated by com-
mas will be concatenated, so
print $1 $2
runs the first and second fields together.
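The difference between the two forms can be sketched on one
made-up line. The last command also changes the output field
separator through OFS, the standard awk variable for it, which
the text above has not yet named.

```sh
# Comma: items are separated by the output field separator.
swapped=$(printf 'left right\n' | awk '{ print $2, $1 }')

# No comma: items are concatenated.
joined=$(printf 'left right\n' | awk '{ print $1 $2 }')

# Same comma form with the output field separator set to "-".
dashed=$(printf 'left right\n' | awk 'BEGIN { OFS = "-" } { print $2, $1 }')

printf '%s\n' "$swapped" "$joined" "$dashed"
```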
The predefined variables NF and NR can be used; for example
{ print NR, NF, $0 }
prints each record preceded by the record number and the number
of fields.
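For instance, on a made-up two-line input the program above
behaves as sketched here:

```sh
numbered=$(printf 'a b c\nd e\n' | awk '{ print NR, NF, $0 }')
printf '%s\n' "$numbered"
```

The first record is number 1 and has three fields; the second is
number 2 and has two.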
Output may be diverted to multiple files; the program
{ print $1 >"foo1"; print $2 >"foo2" }
writes the first field, $1, on the file foo1, and the second
field on file foo2. The >> notation can also be used:
print $1 >>"foo"
appends the output to the file foo. (In each case, the output
...