"To boldly go where no man has gone before" is a
Registered Trademark of Paramount Pictures Corporation.
Copyright (C) 1989, 1991 - 1996 Free Software Foundation, Inc.
This is Edition 1.0 of AWK Language Programming,
for the 3.0 (or later) version of the GNU implementation of AWK.
Published by the Free Software Foundation
59 Temple Place -- Suite 330
Boston, MA 02111-1307 USA
Phone: +1-617-542-5942
Fax (including Japan): +1-617-542-2652
Printed copies are available for $25 each.
ISBN 1-882114-26-4
Permission is granted to make and distribute verbatim copies of this manual provided the copyright notice and this permission notice are preserved on all copies.
Permission is granted to copy and distribute modified versions of this manual under the conditions for verbatim copying, provided that the entire resulting derived work is distributed under the terms of a permission notice identical to this one.
Permission is granted to copy and distribute translations of this manual into another language, under the above conditions for modified versions, except that this permission notice may be stated in a translation approved by the Foundation.
Cover art by Etienne Suvasa.
To Miriam, for making me complete.
To Chana, for the joy you bring us.
To Rivka, for the exponential increase.
This book teaches you about the awk language and how you can use it effectively. You should already be familiar with basic system commands, such as cat and ls, and basic shell facilities, such as Input/Output (I/O) redirection and pipes.
Implementations of the awk language are available for many different computing environments. This book, while describing the awk language in general, also describes a particular implementation of awk called gawk (which stands for "GNU Awk"). gawk runs on a broad range of Unix systems, from 80386-based PCs up through large-scale systems such as Crays. gawk has also been ported to MS-DOS and OS/2 PCs, Atari and Amiga microcomputers, and VMS.
History of awk and gawk
The name awk comes from the initials of its designers: Alfred V. Aho, Peter J. Weinberger, and Brian W. Kernighan. The original version of awk was written in 1977 at AT&T Bell Laboratories. In 1985 a new version made the programming language more powerful, introducing user-defined functions, multiple input streams, and computed regular expressions. This new version became generally available with Unix System V Release 3.1. The version in System V Release 4 added some new features and also cleaned up the behavior in some of the "dark corners" of the language. The specification for awk in the POSIX Command Language and Utilities standard further clarified the language based on feedback from both the gawk designers and the original Bell Labs awk designers.
The GNU implementation, gawk, was written in 1986 by Paul Rubin and Jay Fenlason, with advice from Richard Stallman. John Woods contributed parts of the code as well. In 1988 and 1989, David Trueman, with help from Arnold Robbins, thoroughly reworked gawk for compatibility with the newer awk. Current development focuses on bug fixes, performance improvements, standards compliance, and, occasionally, new features.
The Free Software Foundation (FSF) is a non-profit organization dedicated to the production and distribution of freely distributable software. It was founded by Richard M. Stallman, the author of the original Emacs editor. GNU Emacs is the most widely used version of Emacs today.
The GNU project is an ongoing effort on the part of the Free Software
Foundation to create a complete, freely distributable, POSIX-compliant
computing environment. (GNU stands for "GNU's not Unix".)
The FSF uses the "GNU General Public License" (or GPL) to ensure that
source code for their software is always available to the end user. A
copy of the GPL is included for your reference
(see section GNU GENERAL PUBLIC LICENSE).
The GPL applies to the C language source code for gawk.
As of this writing (1995), the only major component of the GNU environment still uncompleted is the operating system kernel, and work proceeds apace on that. A shell, an editor (Emacs), highly portable optimizing C, C++, and Objective-C compilers, a symbolic debugger, and dozens of large and small utilities (such as gawk), have all been completed and are freely available.
Until the GNU operating system is released, the FSF recommends the use of Linux, a freely distributable, Unix-like operating system for 80386 and other systems. There are many books on Linux. One that is freely available is Linux Installation and Getting Started, by Matt Welsh.
Many Linux distributions are available, often in computer stores or bundled on CD-ROM with books about Linux. Also, the FSF provides a Linux distribution ("Debian"); contact them for more information. See section Getting the gawk Distribution, for the FSF's contact information.
(There are two other freely available, Unix-like operating systems for 80386 and other systems: NetBSD and FreeBSD. Both are based on the 4.4-Lite Berkeley Software Distribution, and both use recent versions of gawk for their versions of awk.)
This book you are reading now is actually free. The information in it is freely available to anyone, the machine readable source code for the book comes with gawk, and anyone may take this book to a copying machine and make as many copies of it as they like. (Take a moment to check the copying permissions on the Copyright page.)
If you paid money for this book, what you actually paid for was the book's nice printing and binding, and the publisher's associated costs to produce it. We have made an effort to keep these costs reasonable; most people would prefer a bound book to over 300 pages of photo-copied text that would then have to be held in a loose-leaf binder (not to mention the time and labor involved in doing the copying). The same is true of producing this book from the machine readable source; the retail price is only slightly more than the cost per page of printing it on a laser printer.
This book itself has gone through several previous, preliminary editions. I started working on a preliminary draft of The GAWK Manual, by Diane Close, Paul Rubin, and Richard Stallman, in the fall of 1988. It was around 90 pages long, and barely described the original, "old" version of awk. After substantial revision, the first version of The GAWK Manual to be released was Edition 0.11 Beta in October of 1989. The manual then underwent more substantial revision for Edition 0.13 of December 1991. David Trueman, Pat Rankin, and Michal Jaegermann contributed sections of the manual for Edition 0.13. That edition was published by the FSF as a bound book early in 1992. Since then there have been several minor revisions, notably Edition 0.14 of November 1992 that was published by the FSF in January of 1993, and Edition 0.16 of August 1993.
Edition 1.0 of AWK Language Programming represents a significant re-working of The GAWK Manual, with much additional material. The FSF and I agree that I am now the primary author. I also felt that it needed a more descriptive title.
AWK Language Programming will undoubtedly continue to evolve. An electronic version comes with the gawk distribution from the FSF.
If you find an error in this book, please report it!
See section Reporting Problems and Bugs, for information on submitting
problem reports electronically, or write to me in care of the FSF.
I would like to acknowledge Richard M. Stallman, for his vision of a better world, and for his courage in founding the FSF and starting the GNU project.
The initial draft of The GAWK Manual had the following acknowledgements:
Many people need to be thanked for their assistance in producing this manual. Jay Fenlason contributed many ideas and sample programs. Richard Mlynarik and Robert Chassell gave helpful comments on drafts of this manual. The paper A Supplemental Document for awk, by John W. Pierce of the Chemistry Department at UC San Diego, pinpointed several issues relevant both to awk implementation and to this manual that would otherwise have escaped us.
The following people provided many helpful comments on Edition 0.13 of The GAWK Manual: Rick Adams, Michael Brennan, Rich Burridge, Diane Close, Christopher ("Topher") Eliot, Michael Lijewski, Pat Rankin, Miriam Robbins, and Michal Jaegermann.
The following people provided many helpful comments for Edition 1.0 of AWK Language Programming: Karl Berry, Michael Brennan, Darrel Hankerson, Michal Jaegermann, Michael Lijewski, and Miriam Robbins. Pat Rankin, Michal Jaegermann, Darrel Hankerson and Scott Deifik updated their respective sections for Edition 1.0.
Robert J. Chassell provided much valuable advice on the use of Texinfo. He also deserves special thanks for convincing me not to title this book How To Gawk Politely. Karl Berry helped significantly with the TeX part of Texinfo.
David Trueman deserves special credit; he has done a yeoman job of evolving gawk so that it performs well, and without bugs. Although he is no longer involved with gawk, working with him on this project was a significant pleasure.
Scott Deifik, Darrel Hankerson, Kai Uwe Rommel, Pat Rankin, and Michal Jaegermann (in no particular order) are long time members of the gawk "crack portability team." Without their hard work and help, gawk would not be nearly the fine program it is today. It has been and continues to be a pleasure working with this team of fine people.
Jeffrey Friedl provided invaluable help in tracking down a number of last minute problems with regular expressions in gawk 3.0.
David and I would like to thank Brian Kernighan of Bell Labs for invaluable assistance during the testing and debugging of gawk, and for help in clarifying numerous points about the language. We could not have done nearly as good a job on either gawk or its documentation without his help.
I would like to thank Marshall and Elaine Hartholz of Seattle, and Dr. Bert and Rita Schreiber of Detroit for large amounts of quiet vacation time in their homes, which allowed me to make significant progress on this book and on gawk itself. Phil Hughes of SSC contributed in a very important way by loaning me his laptop Linux system, not once, but twice, allowing me to do a lot of work while away from home.
Finally, I must thank my wonderful wife, Miriam, for her patience through the many versions of this project, for her proof-reading, and for sharing me with the computer. I would like to thank my parents for their love, and for the grace with which they raised and educated me. I also must acknowledge my gratitude to G-d, for the many opportunities He has sent my way, as well as for the gifts He has given me with which to take advantage of those opportunities.
Arnold Robbins
Atlanta, Georgia
January, 1996