Classic Shell Scripting - Arnold Robbins [195]
Most compiled programming languages have a debugger that allows single stepping, setting of breakpoints, examination of variables, and so on. On most systems, the shells have no debugger, so you sometimes have to use the shell's -v option to get shell input lines printed, and the -x option to get commands and their arguments printed. System-call tracers can provide a useful supplement to that output, since they give a deeper view into processes that the shell invokes.
Whenever you run an unknown program, you run the risk that it will do things to your system that you do not like. Computer viruses and worms are often spread that way. Commercial software usually comes with installation programs that customers are expected to trust and run, sometimes even with root privileges. If the program is a shell script, you can inspect it, but if it is a black-box binary image, you cannot. Programs like that always give us a queasy feeling, and we usually refuse to run them as root. A system-call trace log of such an installation can be helpful in finding out exactly what the installer program has done. Even if it is too late to recover deleted or changed files, at least you have a record of what files were affected, and if your filesystem backups or snapshots[4] are current, you can recover from a disaster.
Most long-running processes make a substantial number of system calls, so the trace output is likely to be voluminous, and thus, best recorded in a file. If only a few system calls are of interest, you can specify them in a command-line option.
Let's follow process creation on a GNU/Linux system, tracing a short Bourne shell session. This can be a bit confusing because there is output from three sources: the trace, the shell, and the commands that we run. We therefore set the prompt variable, PS1, to distinguish the original and traced shells, and we annotate each line to identify its source. The trace=process argument selects a group of process-related system calls:
$ PS1='traced-sh$ ' strace -e trace=process /bin/sh
Trace process-related system calls
execve("/bin/sh", ["/bin/sh"], [/* 81 vars */]) = 0 This is trace output
Now execute a command that we know is built-in:
traced-sh$ pwd
Run a shell built-in command
/home/jones/book This is command output
Only the expected output appeared, because no new process was created. Now use the separate program for that command:
traced-sh$ /bin/pwd
Run an external command
fork( ) = 32390 This is trace output
wait4(-1, This is trace output
/home/jones/book This is command output
[WIFEXITED(s) && WEXITSTATUS(s) = = 0], WUNTRACED, NULL) = 32390 This is trace output
--- SIGCHLD (Child exited) --- This is trace output
Finally, exit from the shell, and the trace:
traced-sh$ exit
Exit from the shell
exit This is trace output
_exit(0) = ? This is trace output
We are now back in the original shell session:
$ pwd
Back in original shell; check where we are
/home/jones/book Working directory is unchanged
The shell made a fork( ) system call to start the /bin/pwd process, whose output got mixed in with the next trace report for the wait4( ) system call. The command terminated normally, and the shell received a CHLD signal, indicating completion of the child process.
Here's an example of profiling system calls on Sun Solaris; the -c option requests that a summary report be displayed after the command completes, with the normal trace output suppressed:
$ truss -c /usr/local/bin/pathfind -a PATH truss
Trace the pathfind command
/usr/bin/truss This is output from pathfind
/bin/truss
/usr/5bin/truss
syscall seconds calls errors The truss report starts here
_exit .00 1
fork .00 2
read .00 26
write .00 3
open .00 5 1
close .00 10 1
brk .00 42
stat .01 19 15
...
stat64 .03 33 28
open64 .00 1
-------