strace Cheat Sheet

strace is a tool for debugging and troubleshooting programs. It basically captures and records all system calls made by a process and the signals received by the process.

Some basic examples of how I use it are below:

Troubleshooting slow loading website

You can enable timestamps within the strace output. This will show both the timestamps at the beginning of the time and also the execution time at the end of the line. This is useful to be able to quickly identify which element of the site is slow to load.

The following example simply shows the lag introduced by a 5 second sleep statement within index.php:

[root@web01 ~]# strace -fs 10000 -tT -o /tmp/strace.txt sudo -u apache php /var/www/vhosts/www.example.com/index.php
[root@web01 ~]# less /tmp/strace.txt

1745  13:31:44 rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0 <0.000010>
1745  13:31:44 rt_sigaction(SIGCHLD, NULL, {SIG_DFL, [], 0}, 8) = 0 <0.000011>
1745  13:31:44 rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0 <0.000009>
1745  13:31:44 nanosleep({5, 0}, 0x7ffd5926ea80) = 0 <5.000218>
1745  13:31:49 uname({sysname="Linux", nodename="web01", ...}) = 0 <0.000037>

Okay, so that was just a random coding error. A real world example is below. This shows the 60 second latency I was seeing on each page load as the site was trying to load something from a third party site.

[root@web01 ~]# strace -fs 10000 -tT -o /tmp/strace.txt sudo -u apache php /var/www/vhosts/www.example22.com/index.php
[root@web01 ~]# less strace.txt
...
35999 16:44:29 recvfrom(5, "\347$\201\200\0\1\0\1\0\1\0\0\3www\example22\3com\0\0\34\0\1\300\f\0\5\0\1\0\0!\3\0\2\300\20\300\20\0\6\0\1\0\0$(\0=\3ns1\3net\0\300GxH\262z\0\0\16\20\0\0\34 \0\22u\0\0\1Q\200", 65536, 0, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("123.123.123.123")}, [16]) = 128 <0.000025>
35999 16:44:29 close(5)                 = 0 <0.000029>
35999 16:44:29 socket(PF_INET, SOCK_STREAM, IPPROTO_TCP) = 5 <0.000029>
35999 16:44:29 fcntl(5, F_GETFL)        = 0x2 (flags O_RDWR) <0.000018>
35999 16:44:29 fcntl(5, F_SETFL, O_RDWR|O_NONBLOCK) = 0 <0.000019>
35999 16:44:29 connect(5, {sa_family=AF_INET, sin_port=htons(443), sin_addr=inet_addr("123.123.123.4")}, 16) = -1 EINPROGRESS (Operation now in progress) <0.000054>
35999 16:44:29 poll([{fd=5, events=POLLOUT|POLLWRNORM}], 1, 299993) = 1 ([{fd=5, revents=POLLERR|POLLHUP}]) <63.000274>
35999 16:45:32 getsockopt(5, SOL_SOCKET, SO_ERROR, [110], [4]) = 0 <0.000013>
35999 16:45:32 close(5)                 = 0 <0.000024>

Notice the timestamps that are in bold. You can clearly see the delay while the page was still loading. When looking up one line from there, I can see that the site was trying to call something from 123.123.123.4 and it appeared to be timing out.

Here is another example similar to the ones above that filter the strace output to only show ‘sendto’, ‘connect’, ‘open’ and ‘write’ to filter out some of the noise so you can more easily see the file/page being accessed as well as the resulting database lookup:

[root@web01 ~]# strace -tt -T -e trace=sendto,connect,open,write php /var/www/vhosts/www.example.com/index.php
...
12:22:56.362994 open("/var/www/vhosts/example.com/application/colors/red.php", O_RDONLY) = 4 <0.000027>
12:22:56.363933 write(3, "M\0\0\0\3SELECT *\nFROM (`tbl_color_red"..., 81) = 81 <0.000026>
12:22:56.364143 open("/usr/share/zoneinfo/America/New_York", O_RDONLY) = 4 <0.000026>
12:22:56.364974 write(3, "Y\0\0\0\3SELECT *\nFROM (`tbl_colors_orange`)"..., 93) = 93 <0.000021>
12:22:56.365747 write(3, "<\t\0\0\3Select `id`.`color` as "..., 2368) = 2368 <0.000021>
12:27:02.354995 write(3, "G\0\0\0\3SELECT *\nFROM (`tbl_paper_"..., 75) = 75 <0.000023>

In the example above, that may indicate that I need to look at the slow query log or run an explain against the query to identify why it is taking so long to execute.

Log all calls in and out of Apache

Sometimes you just cannot seem to narrow down the issue. Therefore you have to log everything and try to find that needle in the haystack. The command below will record all Apache web processes and their forks and log them to a file. Do not keep this running for long as the log can quickly fill up your disk!

[root@web01 ~]# pgrep "apache2|httpd" | awk '{print "-fp "$1}' | xargs strace -vvv -ts 2048 2>&1 | grep -vE "gettime|ENOENT" > /tmp/strace.txt
[root@web01 ~]# less /tmp/strace.txt

When going through the /tmp/strace.txt, you are basically looking for gaps in the timestamps that may or may not explain why a single pid hung while serving a request. Some common ways to begin looking for clues:

[root@web01 ~]# grep -Ev "munmap|gettime" /tmp/strace.txt  | cut -b -115 | less

[root@web01 ~]# grep -E 'connect\(|stat\(" /tmp/strace.txt  | cut -b -115 |less

# WordPress specific ones are below:
[root@web01 ~]# grep -Ev "munmap|gettime" /tmp/strace.txt  | cut -b -115 | grep wp-content | grep open | less

[root@web01 ~]# grep -Ev "munmap|gettime" /tmp/strace.txt  | cut -b -115 | grep -iE "open.*wp-content|connect" | less