strace Cheat Sheet

strace is a tool for debugging and troubleshooting programs. It basically captures and records all system calls made by a process and the signals received by the process.

Some basic examples of how I use it are below:

Troubleshooting slow loading website

You can enable timestamps within the strace output. This will show both the timestamps at the beginning of the time and also the execution time at the end of the line. This is useful to be able to quickly identify which element of the site is slow to load.

The following example simply shows the lag introduced by a 5 second sleep statement within index.php:

[[email protected] ~]# strace -fs 10000 -tT -o /tmp/strace.txt sudo -u apache php /var/www/vhosts/www.example.com/index.php
[[email protected] ~]# less /tmp/strace.txt

1745  13:31:44 rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0 <0.000010>
1745  13:31:44 rt_sigaction(SIGCHLD, NULL, {SIG_DFL, [], 0}, 8) = 0 <0.000011>
1745  13:31:44 rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0 <0.000009>
1745  13:31:44 nanosleep({5, 0}, 0x7ffd5926ea80) = 0 <5.000218>
1745  13:31:49 uname({sysname="Linux", nodename="web01", ...}) = 0 <0.000037>

Okay, so that was just a random coding error. A real world example is below. This shows the 60 second latency I was seeing on each page load as the site was trying to load something from a third party site.

[[email protected] ~]# strace -fs 10000 -tT -o /tmp/strace.txt sudo -u apache php /var/www/vhosts/www.example22.com/index.php
[[email protected] ~]# less strace.txt
...
35999 16:44:29 recvfrom(5, "\347$\201\200\0\1\0\1\0\1\0\0\3www\example22\3com\0\0\34\0\1\300\f\0\5\0\1\0\0!\3\0\2\300\20\300\20\0\6\0\1\0\0$(\0=\3ns1\3net\0\300GxH\262z\0\0\16\20\0\0\34 \0\22u\0\0\1Q\200", 65536, 0, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("123.123.123.123")}, [16]) = 128 <0.000025>
35999 16:44:29 close(5)                 = 0 <0.000029>
35999 16:44:29 socket(PF_INET, SOCK_STREAM, IPPROTO_TCP) = 5 <0.000029>
35999 16:44:29 fcntl(5, F_GETFL)        = 0x2 (flags O_RDWR) <0.000018>
35999 16:44:29 fcntl(5, F_SETFL, O_RDWR|O_NONBLOCK) = 0 <0.000019>
35999 16:44:29 connect(5, {sa_family=AF_INET, sin_port=htons(443), sin_addr=inet_addr("123.123.123.4")}, 16) = -1 EINPROGRESS (Operation now in progress) <0.000054>
35999 16:44:29 poll([{fd=5, events=POLLOUT|POLLWRNORM}], 1, 299993) = 1 ([{fd=5, revents=POLLERR|POLLHUP}]) <63.000274>
35999 16:45:32 getsockopt(5, SOL_SOCKET, SO_ERROR, [110], [4]) = 0 <0.000013>
35999 16:45:32 close(5)                 = 0 <0.000024>

Notice the timestamps that are in bold. You can clearly see the delay while the page was still loading. When looking up one line from there, I can see that the site was trying to call something from 123.123.123.4 and it appeared to be timing out.

Log all calls in and out of Apache

Sometimes you just cannot seem to narrow down the issue. Therefore you have to log everything and try to find that needle in the haystack. The command below will record all Apache web processes and their forks and log them to a file. Do not keep this running for long as the log can quickly fill up your disk!

[[email protected] ~]# pgrep "apache2|httpd" | awk '{print "-fp "$1}' | xargs strace -vvv -ts 2048 2>&1 | grep -vE "gettime|ENOENT" > /tmp/strace.txt
[[email protected] ~]# less /tmp/strace.txt

When going through the /tmp/strace.txt, you are basically looking for gaps in the timestamps that may or may not explain why a single pid hung while serving a request. Some common ways to begin looking for clues:

[[email protected] ~]# grep -Ev "munmap|gettime" /tmp/strace.txt  | cut -b -115 | less

[[email protected] ~]# grep -E 'connect\(|stat\(" /tmp/strace.txt  | cut -b -115 |less

# WordPress specific ones are below:
[[email protected] ~]# grep -Ev "munmap|gettime" /tmp/strace.txt  | cut -b -115 | grep wp-content | grep open | less

[[email protected] ~]# grep -Ev "munmap|gettime" /tmp/strace.txt  | cut -b -115 | grep -iE "open.*wp-content|connect" | less