Abstract: |
The Bourne shell (/bin/sh) is both the command interpreter and the
basic programming language on UNIX systems. Any serious UNIX user
should be able to write little programs - called shellscripts - in
sh. It is very easy to gain this ability from your interactive use of
sh, since syntax and commands are the same. You may also use the
knowledge you gained to modify existing shellscripts, such as the
ones that start up your system at boot time.
However, writing somewhat advanced programs in sh quickly gets hairy: it is a language with low expressional power, but with some field-grown, "interesting" constructs on top. At that point, people can either switch to a different language or improve their /bin/sh programming knowledge. This Web page is for the latter kind of people, and also for people forced to use /bin/sh for various reasons (e.g. maintaining legacy scripts or being responsible for startup scripts of some kind). |
Intended audience: | People writing UNIX shell scripts who run into problems with more advanced constructs. |
Required knowledge: | Basic shell script knowledge. It also can't hurt if you know what
signals are, i.e. that pressing Control-C on your
keyboard (usually) sends signal SIGINT to all processes running in
the foreground. |
All example constructs have been tested to work on FreeBSD's /bin/sh (derived from 4.4BSD ash, but seriously changed since then), bash2, SunOS 5.5 /bin/sh and /bin/ksh, and pdksh. FIXME original ksh?
However, if you intend to write scripts that have to run on very old systems, you might have to install a more modern shell on those systems. In that case, installing a completely independent scripting system beside the one used for legacy scripts might be useful.
Thankfully, there's a way around that: always install trap handlers for SIGINT and SIGQUIT, even if your desired action is the default action in most shells anyway. Just remember to kill yourself if you use SIGINT/SIGQUIT as an abort signal, otherwise you face even more problems with runaway, unbreakable scripts (as outlined in sigint.html).
It looks like this for a normal script that calls no interactive programs that might use SIGINT or SIGQUIT, so the signals will always terminate the whole script. This script also illustrates the use of traps to remove temporary files in shellscripts, even when those are killed by signals.
```
#! /bin/sh

tmp=/tmp/s.$$

onint () {
	# maybe do some cleanup here
	rm $tmp
	trap SIGINT
	kill -SIGINT $$
}

trap onint SIGINT
trap onint SIGQUIT
# Ignoring that this will exit with SIGINT when sending SIGQUIT.
# In POSIX there is no way for a shell script trap handler to get
# the number of the signal sent.
trap onint SIGTERM
# In case someone kills this script from kill(1), the cleanup
# procedure should also run.

# do a long loop that *must* be interruptable
for file in *.dat ; do
	dat2ascii $file >> $tmp
done

# whatever your temp file was good for
wc $tmp

rm $tmp
```

The script above will ensure that the shellscript ends if you break one of the dat2ascii processes with SIGINT, no matter whether dat2ascii exits with the right exit code or not.
As a side note, the specification of signal numbers in POSIX is a
little annoying, imho. You are guaranteed to be able to use the
symbolic names (the C signal names without the SIG prefix, i.e. INT
for SIGINT). But you're only able to use the numbers (written as
digits) if the numbers are exactly those that they usually are. This
may make sense to keep scripts portable, but it's a fact that most old
shells only allowed you to use numbers, not names, so your scripts
will most definitely break on old shells.
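A small sketch of the two spellings (the handler text is my own). The first form is what POSIX guarantees, the second is what old shells understand, and it is only portable because 2 is the traditional number of SIGINT:

```shell
#! /bin/sh

# POSIX spelling: symbolic name without the SIG prefix
trap 'echo caught INT' INT
kill -2 $$	# the shell runs the trap handler, then continues

# old-shell spelling: the traditional signal number (2 = SIGINT)
trap 'echo caught INT again' 2
kill -2 $$
```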
Back to our example. So we managed to ensure that it is properly
terminated on the first SIGINT. But if your shellscript contains
interactive programs that might use SIGINT or SIGQUIT as normal keys
that shouldn't terminate the whole thing, you should ignore these
signals in your shell. Shells that are implemented with such programs
in mind will behave right anyway, but by explicitly using the trap
command you ensure that your script doesn't break on others.
```
#! /bin/sh

# while emacs is running, ignore the signals
trap '' SIGINT
trap '' SIGQUIT
# Understand the difference between ignoring and defaulting. The empty
# string '' as a trap command ignores the signal. Nothing (like below)
# resets it to the default.

emacs -nw /tmp/bla

# Now that SIGINT and SIGQUIT have their usual meaning again, set
# default actions. Even better, use the construction from the
# non-interactive script above.
trap SIGINT
trap SIGQUIT

cat /tmp/bla >> $HOME/sent-mail
mail joey < /tmp/bla
rm /tmp/bla
```

If you don't ignore the signals while emacs is running, shells from Type 3 of my sigint.html would execute the cat, mail and rm commands when your emacs session ended without using C-g, while these shells would not execute them when you used C-g in emacs. Since C-g is part of normal editing, it should not have any effect on later parts of the shell script. As noted on my Web page, I think this behaviour isn't useful for exactly this reason, but using this construction you can protect yourself from these effects.
And don't forget that system(3) calls from C programs are shellscripts as well: an instance of /bin/sh sits between you and your called program. You might want to use the same construction to protect yourself from undesired effects similar to the former shellscript example.
```
system("emacs /tmp/foobar.$$;mail foo@bar.com < /tmp/foobar.$$;rm /tmp/foobar.$$");
```
Always setting traps also saves you from strange effects if your shellscript is entered from another shellscript that already ignores signals. Unless you re-default the signals, your script would also ignore them. On the other hand, it might be useful to inherit the signal settings, so take the required time to reason about the right thing to do.
When a shellscript runs and a signal that is trapped (to a shellscript routine of your choice) is received while a foreground child is running, the trap function in the shell script is called only after the child exits.
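A small demonstration of this delay (the timings and messages are my own, and SIGUSR1 stands in for any trapped signal): the signal arrives after about one second, but the handler only runs once the foreground sleep has finished.

```shell
#! /bin/sh

trap 'echo "handler runs now"' USR1

# send ourselves SIGUSR1 from a background subshell after one second
( sleep 1 ; kill -USR1 $$ ) &

echo "before sleep"
sleep 3		# the signal arrives here, but the handler is deferred
echo "after sleep"
```

The handler's message appears between "before sleep" and "after sleep", roughly three seconds in, not after one.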
In straightforward use, this is a problem: you cannot install a trap handler that does anything about a program that blocks all signals and refuses to give control back.
If you call such a blocking program directly, you will never get your program flow back to the shell; in interactive use this means you'll never get your prompt back.
In interactive use, you usually can send SIGSTOP to get commandline control back, but that doesn't work in every shell and it may stop more than you intended (i.e. if you stack shell scripts).
Sending SIGKILL isn't perfect, either, since it requires you to get an additional command prompt over the network or from a different (virtual) terminal, but there may not be another login possibility. Also, SIGKILL will not kill programs that hang in system calls, for example processes hanging on dead NFS filesystems.
Thankfully, POSIX 1003.2 requires that the wait builtin is
interruptible, so you can solve the simple problem of getting your
prompt (or control to upper scripts) back with SIGINT this way:
```
#! /bin/sh

./some-blocking-program &
wait $!
```

The blocking program will continue to run in the background. This may be useful since it has a chance to complete whatever it attempted to do. If you don't want this, write the script like this:
```
#! /bin/sh

pid=

onint () {
	kill $pid
}

trap onint SIGINT

./hardguy &
pid=$!
wait $pid
```
If you want more flexible handling of possibly blocking programs,
you can extend the mechanism to call wait in a loop,
while doing bookkeeping and decisions in the trap handler.
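Here is a sketch of that extended mechanism (my own construction, not from the examples above; sleep 3 stands in for the possibly blocking program): wait is re-entered in a loop, and the trap handler only kills the child once the user insists.

```shell
#! /bin/sh

# sleep 3 stands in for some possibly-blocking program
interrupted=0

onint () {
	interrupted=$((interrupted + 1))
	echo "SIGINT number $interrupted" 1>&2
	# bookkeeping decision: the second SIGINT kills the child for real
	if [ $interrupted -ge 2 ] ; then
		kill $pid 2>/dev/null
	fi
}

trap onint INT

sleep 3 &
pid=$!

# wait is interruptible, so re-enter it until the child is really gone
while kill -0 $pid 2>/dev/null ; do
	wait $pid
done
```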
However, this way of doing things quickly becomes a mess if you have several different commands in a shell loop that may or may not block.
Many such constructs would become simpler if traps were called immediately, while the foreground child is still running. You would just install a trap handler that does "something" about the problem, and it would be called every time you hit SIGINT (or SIGQUIT), just as signal handlers in C programs are called immediately. Maybe I am too much of a C programmer, but I find the delayed sh behaviour very non-intuitive.
```
#! /bin/sh

onsig () {
	trap SIGINT
	kill -SIGINT $$
}

set -T	# set async execution of traps on FreeBSD
trap onsig SIGINT
./some-blocking-program
set +T	# set trap execution behaviour back to normal
```
This makes the trap handler a bit more complicated, but it allows you to write the main part of your shell script as usual, without keeping in mind that a program may block and taking the appropriate action about it.
If you had a more complex script instead of just one call to
some-blocking-program, you would face serious complications if you had
to call all of them as background children, remember the pids and
place the wait commands at the right places.
To enable constructs like this, I introduced the -T
switch in FreeBSD's /bin/sh. With this switch, your shell will execute
traps immediately.
Remark: Shell switches like this may be given from inside
the script as shown here, from the commandline as in sh -T
script.sh, or even from the first line of the script (#!
/bin/sh -T) as long as you don't need more than one parameter
string (where the string may have more than one option letter).
But keep in mind that the former construction involving
backgrounding and wait
is the only portable solution.
Traditionally, command output is captured with backticks:

```
foo=`echo bar | grep bar`
```
However, POSIX specifies a second mechanism:

```
foo=$(echo bar | grep bar)
```
The latter has several advantages: $(...) is nearer to normal shell rules, especially when the command spans several lines.
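One such rule is nesting: with backticks, the inner substitution needs backslash escapes, while $(...) nests with plain quoting (the echo commands are just placeholders of mine):

```shell
#! /bin/sh

# backticks: the inner pair must be escaped
outer=`echo inner is \`echo deep\``
echo "$outer"

# $(...): normal shell rules apply at every nesting level
outer=$(echo inner is $(echo deep))
echo "$outer"
```

Both variants print "inner is deep", but only one of them is readable.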
This is part of my emacs setup.
```
(add-hook 'shell-mode-hook
	  '(lambda () (modify-syntax-entry ?. "w")))
(setq sh-shell-file "/bin/sh")
(add-hook 'sh-mode-hook '(lambda () (setq tab-width 8)))
(add-hook 'sh-mode-hook '(lambda () (setq indent-tabs-mode t)))
(add-hook 'sh-mode-hook
	  '(lambda ()
	     (substitute-key-definition 'backward-delete-char-untabify
					'delete-backward-char
					sh-mode-map)))
```

This makes it much more usable. Any tab width other than 8 is evil, since everyone else (probably including your printer) uses 8 and your scripts will be messed up. Also, it stops sh-mode from using your login shell as the default shell for scripts and uses /bin/sh instead. That default is probably a good candidate for the "bad idea of the year" (TM).
A common construct calls sed(1) to rewrite a file name extension:

```
#! /bin/sh

for file in *.html ; do
	target=`echo $file | sed 's/\.html$/.dat/'`
	grep bla $file > $target
done
```

This isn't necessary; POSIX 1003.2 defines the basic editing capabilities needed for this, and the shells I have access to all implement them with no problems.
A modern variant looks like this:
```
#! /bin/sh

for file in *.html ; do
	target=${file%.html}.dat
	grep bla $file > $target
done
```

Look up the manpage for your shell; such constructions are available for removing text at the beginning (# and ##) and the end (% and %%) of a variable. While you're at it, make sure you are familiar with the default parameter replacement constructs (the other ${varname...} constructs).
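For reference, here is a quick tour of those defaulting constructs (the variable names are my own); each comment states what a POSIX shell prints:

```shell
#! /bin/sh

unset foo

echo ${foo:-fallback}	# prints "fallback", foo stays unset
echo ${foo:=fallback}	# prints "fallback" and assigns it to foo
echo ${foo:+replaced}	# foo is now set and non-empty: prints "replaced"

bar=value
echo ${bar:?}		# would abort with an error if bar were unset or empty
```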
```
#! /bin/sh

if [ $# != 2 ] ; then
	echo 'You fool!'
	exit 1
fi
```

In C, people usually remember to send error messages to stderr, but in shellscripts this is - sadly - less common. Do it like this:
```
echo 'You fool!' 1>&2
```
To pass all of a script's parameters on to another program, don't
use $*; use "$@" - including the double quotes. Otherwise your
parameters that have whitespace in them will be messed up.
Example script:
```
#! /bin/sh

someprog $*
someprog "$*"
someprog "$@"
```
Example call of this script:
```
./script 1 '2 3'
```

...which ends up calling 'someprog' as if you had directly typed:

```
someprog 1 2 3
someprog '1 2 3'
someprog 1 '2 3'
```

As you can see, only the last one is right.
The nice thing is that you still can use shift
to get
rid of the first parameter(s), passing only the rest to some other
program, still preserving whitespace params, both in shifting and in
calling.
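A sketch of that (the printargs function is a made-up stand-in for "some other program"):

```shell
#! /bin/sh

# stand-in for the program receiving the remaining parameters
printargs () {
	for a in "$@" ; do
		echo "arg: $a"
	done
}

set -- -v 'two words' three
shift			# drop the leading switch
printargs "$@"		# 'two words' is still one single parameter
```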
Note that parameter substitution (${varname%.txt} and
similar constructs) and execution of commands (with their
output inserted into the text that is in double quotes) still work in
double quotes. Be careful: "$@" is a special thing in
itself.
The getopt(1) utility - typically used together with
set - cannot work with switch parameters that have
whitespace in them. There is no way of fixing this without breaking
behaviour when switch parameters contain shell metachars; at least
these were my findings when I tried to fix it in FreeBSD last time. If
you're in doubt, please see the FreeBSD history of this utility in
/usr/src/usr.bin/getopt.
Huh?
Sorry, in clear text that means: if you have a shellscript that uses switches and one of these switches accepts a parameter, the whole thing will not work when that parameter has whitespace in it, although this works with no problems in C programs that use getopt(3) or in shellscripts that don't use getopt(1).
```
./shellscript -i -f 'bla fasel' -q
```

In this call, 'bla' and 'fasel' will end up as two separate arguments after processing in the shellscript, breaking the whole commandline parsing.
I'm working on a solution, although it may not be as non-intrusive as getopt(1) tries to be. Watch my software page if you're interested (shameless plug :-).
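To see why this breaks, you can simulate it (this is my own illustration; the args string below is the kind of flat, unquoted output traditional getopt(1) produces for the call above). When set re-splits the unquoted string, the whitespace inside 'bla fasel' is lost:

```shell
#! /bin/sh

# a flat string, as traditional getopt(1) would hand it back
args='-i -f bla fasel -q --'
set -- $args	# unquoted on purpose: this is how getopt(1) is used

echo "number of words: $#"	# 6 words, not the 4 the caller meant
for a in "$@" ; do
	echo "word: $a"
done
```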
POSIX shells have arithmetic expansion built in:

```
bla=$(( (3 + 5) * 4 ))
```

which is equivalent to

```
bla=`expr \( 3 + 5 \) \* 4`
```

Notes:
```
echo $(( (3 + 5) * 4 ))
```
If you want to use boolean logic expressions in control constructs, you cannot use the exit status, since none is returned. You can, however, compare the result to "0". I think this approach is much cleaner anyway: expr(1) returns its results in two ways, as a string and as an exit status, and exit status != 0 should be reserved for "real" errors, such as syntactically wrong expressions.
```
if [ $(( 3 > 4 )) != 0 ] ; then echo yes ; else echo no ; fi
==> no
if [ $(( 3 > 2 )) != 0 ] ; then echo yes ; else echo no ; fi
==> yes
```

If the shell has a builtin test(1) (the [...] construct is just another syntax for calling test), no programs are spawned at all. Even in shells that execute test(1), it still saves the call to expr(1), roughly doubling the speed.
Many versions of test(1) behave badly when the
-n option is used without the required additional
parameter. If you read this, remember that the [...]
construct is just syntactic sugar for test(1).
test -n something should return true if 'something' is
a string with a length greater than zero. The problem is that test(1)
also should return true when only a string is passed, with no switches.
```
test -n bla	# case is clear: -n is used and there is a string and its
		# length is more than 0 ==> true
test -n ''	# case is clear: -n is used and there is a string which is
		# of length zero ==> false
test		# case is clear: -n is not used and there is no string
		# passed ==> false, but could also count as syntax error.
test ''		# case is clear: -n is not used and the string - although
		# one is given - has a length of zero.
test -n		# Now what is this? Is it the switch -n with its required
		# parameter forgotten, which would lead to a syntax error?
		# Or is it just the string '-n', not a switch, which should
		# therefore return true?
```
From the possible options for the last case...
I don't offer an opinion on what is right here, except that the non-switch behaviour should never have been included in test(1), as it just duplicates the -n option. Besides, this is a typical example of misusing constructs. The return code of UNIX commands is usually used to signal serious problems, failed program runs. By using the return code as a normal way to communicate, you lose the ability to make it clear when something serious went wrong, such as a call with entirely broken syntax. Oops, I think that counts as opinion.
POSIX 1003.2 is clear about the issue: one parameter always returns true if it isn't the empty string. A call to test(1) with just one parameter cannot be a failed call to a switch. However, this doesn't really improve the situation, since many current systems don't follow this rule.
What makes the situation really bad (and creates the need to
eliminate the non-switch syntax) is the fact that it is easy to lose
an empty string somewhere. While test -n '' is clear and defined,
many shells and shellscripts aren't careful enough not to throw the
empty strings away when handling variable assignments and usage. Thus
the call would in fact lead to the fatal test -n without an
additional parameter.
To understand the following example, recognize that a variable that is assigned the empty string evaluates to nothing in a shellscript's context, not to the empty string (bad enough). Thus, to test whether it is the empty string, but to make sure there is any string at all passed to test(1), it is followed or preceded by double quotes with nothing inside.
```
#! /bin/sh

foobar=""
if test -n ""$foobar ; then
	echo "Help, I am broken"
fi
```
There are shells where this still goes wrong, ignoring the fact
that we already uglified our code beyond recognition for the shell's
sake. What happens here is not that ""$foobar
counts as a nonempty string. What happens is that the complete
""$foobar is thrown away - although the double quotes are
as direct in the code as they can be - so that test(1) still doesn't
get any parameter, not even an empty one. FreeBSD has just
recently been fixed here (thanks to Tor Egge), while NetBSD still uses
the old version that removes empty-but-existing strings with no
mercy. The result in this case is that 'test -n' is called without an
additional parameter, and 'test -n' in 4.4BSD counts as 'no -n switch
has been used, this is just the string "-n"' ==> BOOM! OpenBSD uses
pdksh as the default shell, BTW.
And if that wasn't enough, many shells have a builtin test(1) (like bash does, unlike FreeBSD's /bin/sh). Thus, your script will call different versions of test(1) even on the same machine.
Getting it almost right:
```
#! /bin/sh

# reusable function
isempty1 () {
	[ dreck"${*#-}" != dreck"${*}" ] && return 0
	# at this point, it does not begin with -, so we can pass it to
	# test(1) without the -n switch
	[ "$*" ] && return 0
	return 1
}

# test cases
unset foo ; isempty1 $foo && echo ja: $foo
foo=""    ; isempty1 $foo && echo ja: $foo
foo="n"   ; isempty1 $foo && echo ja: $foo
foo="-n"  ; isempty1 $foo && echo ja: $foo
foo="-nn" ; isempty1 $foo && echo ja: $foo
foo="-n n"; isempty1 $foo && echo ja: $foo
foo="n -n"; isempty1 $foo && echo ja: $foo
```
This solution works by never using the '-n' switch to test(1), neither on purpose, nor by accidentally calling it when it is part of the search string.
The drawback is that you need a shell where the
${varname#xyz} construct is implemented, which isn't the
case for older shells.
For modern shells this is an improvement, since all of them have
the ${varname#xyz} construct, but even some current
shells don't treat the -n switch right.
The following is a less efficient version that should work on older shells:
```
#! /bin/sh

# reusable function
isempty2 () {
	echo dreck"${*}" | grep dreck- > /dev/null && return 0
	# at this point, it does not begin with -, so we can pass it to
	# test(1) without the -n switch
	[ "$*" ] && return 0
	return 1
}

# test cases
unset foo ; isempty2 $foo && echo ja: $foo
foo=""    ; isempty2 $foo && echo ja: $foo
foo="n"   ; isempty2 $foo && echo ja: $foo
foo="-n"  ; isempty2 $foo && echo ja: $foo
foo="-nn" ; isempty2 $foo && echo ja: $foo
foo="-n n"; isempty2 $foo && echo ja: $foo
foo="n -n"; isempty2 $foo && echo ja: $foo
```
The efficiency of this version is of course horrible; a grep(1) call for each test isn't exactly what you want.
Well, next shot: make the function select the right implementation
at runtime, depending on the shell's ability to process the
${varname#xyz} construct.
```
#! /bin/sh

if [ n`sh -c 'foo=txt.bla ; echo ${foo#txt.}' 2>/dev/null` != ntxt.bla ] ; then
	eval 'isempty () { isempty1 "$@"; }'
else
	eval 'isempty () { isempty2 "$@"; }'
fi

# test cases
unset foo ; isempty $foo && echo ja: $foo
foo=""    ; isempty $foo && echo ja: $foo
foo="n"   ; isempty $foo && echo ja: $foo
foo="-n"  ; isempty $foo && echo ja: $foo
foo="-nn" ; isempty $foo && echo ja: $foo
foo="-n n"; isempty $foo && echo ja: $foo
foo="n -n"; isempty $foo && echo ja: $foo
```
TCL is a small language whose syntax is probably as close to the Bourne shell as real languages get. If you program Bourne shell and TCL, you will find many areas where their similarities help you. TCL is also very strong for writing little applications with graphical user interfaces. I'd say TCL is right if you generally like the Bourne shell, if you can't invest as much time into programming as would be required to master two really different languages, and/or if you want to program GUI applications.
perl. I don't like perl. Its power is enormous, but in my opinion big programs become unmaintainable too fast, and that makes its power much less useful. Also, like other "little" languages, its only implementation is a bytecode interpreter, and that may cause enormous speed differences. Most common tools in perl are really fast, but once you implement something really on your own, performance drops horribly. For example, I once wrote a fault-tolerant string matching function that couldn't be expressed as regular expressions. I had to walk the string in perl code and the speed was nearly unusable. Of course, the Bourne shell and TCL are much slower than perl, but for its enormous power perl's speed just doesn't match. perl is unbeatable if you face input data that is more or less chaotic and doesn't follow a real regular syntax. Also, perl offers access to more features of UNIX than most other scripting languages.
awk is great for processing ASCII data files (or program output) of all kinds: making statistics, finding specific things, even using a multiple-table database structure. I find awk much more elegant for these tasks than perl; its heavy orientation towards exactly these tasks makes the programs much smaller. perl is better when the input data is chaotic, but if the input data is under your control, awk still rules, IMHO. Of course, you might also use perl to convert chaotic data into regular awk-friendly data and write your program logic in awk. Another big advantage of awk is that it is available on all UNIX systems without requiring the administrator to install your own pet language.
If you want a scripting language that can also use lower-level features of the UNIX system (networking, signals) and packs them into a nice regular syntax, you might want to check out scsh, the Scheme shell. Perl is also good at accessing lower-level UNIX features.
C, as the native language of UNIX, is of course the language that makes using UNIX features most convenient, and its speed is also great. But it is quite hard to write C programs that are as safe (i.e. in case of errors they give a clean error message and exit instead of giving wrong results) and as free of hard limits (artificial limits on data structure sizes are almost unavoidable for a new C programmer) as your usual "little language" program. Also, the way of thinking doesn't really match that of shellscripting, so I would recommend C as an addition to shell scripting only if you have the time to maintain knowledge and gain experience in two languages.
Among the small languages, Python is probably the one with the strongest program organization features. If you want to write bigger systems and want to make heavy use of code reuse, you might want to look into Python. Programming Python is also a great way to gain some of the discipline that the "real" programming languages require. On the downside, I don't think Python offers great scripting features (for example, special syntax to call UNIX commandline utilities), its speed is as horrible as that of the other scripting languages, and it is quite common to face detectable errors at runtime, not compile time.