Testing ports without telnet or nc

Ever hop onto a server where the network admin may have been a bit over-caffeinated when they were locking down the firewall? What if they also locked down egress along with ingress? They want you to prove you cannot connect outbound, but you cannot even install ‘telnet’ or ‘nc’ since yum/apt can’t get outbound. While that is proof in and of itself, what if you needed something more for some reason?

Assuming you have root access and ‘telnet’ or ‘nc’ is not installed, you can use bash's built-in networking redirection (see the REDIRECTION section of the bash man page). The example below shows connections that succeed, since they return instantly:

[root@web01 ~]# echo > /dev/tcp/1.1.1.1/80
[root@web01 ~]# echo > /dev/tcp/1.1.1.1/443
[root@web01 ~]# echo > /dev/tcp/google.com/443
[root@web01 ~]#

You can tell a connection failed when the command either hangs (the packets are being silently dropped) or returns a ‘connection refused’ error.
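
Since a filtered port can leave the command hanging for a long time, it can help to wrap the test in ‘timeout’ so it gives up after a few seconds. This is just a sketch; the host and port below are placeholders:

[root@web01 ~]# timeout 3 bash -c 'echo > /dev/tcp/1.1.1.1/8080' && echo "port open" || echo "port closed or filtered"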

Another way around this is to use curl if it is available. Below is an example for checking if you can connect to port 25 on the remote server:

[root@web01 ~]# curl -v telnet://1.1.1.1:25
* About to connect() to 1.1.1.1 port 25 (#0)
*   Trying 1.1.1.1...
* Connected to 1.1.1.1 (1.1.1.1) port 25 (#0)
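
If the far end silently drops the packets, curl can sit there for quite a while. Capping the attempt with ‘--connect-timeout’ keeps the check quick; the five second value below is just an example:

[root@web01 ~]# curl -v --connect-timeout 5 telnet://1.1.1.1:25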

strace Cheat Sheet

strace is a tool for debugging and troubleshooting programs. It basically captures and records all system calls made by a process and the signals received by the process.

Some basic examples of how I use it are below:

Troubleshooting slow loading website

You can enable timestamps within the strace output. This will show both a timestamp at the beginning of each line and the execution time of each call at the end of the line. This is useful for quickly identifying which element of the site is slow to load.

The following example simply shows the lag introduced by a 5 second sleep statement within index.php:

[root@web01 ~]# strace -fs 10000 -tT -o /tmp/strace.txt sudo -u apache php /var/www/vhosts/www.example.com/index.php
[root@web01 ~]# less /tmp/strace.txt

1745  13:31:44 rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0 <0.000010>
1745  13:31:44 rt_sigaction(SIGCHLD, NULL, {SIG_DFL, [], 0}, 8) = 0 <0.000011>
1745  13:31:44 rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0 <0.000009>
1745  13:31:44 nanosleep({5, 0}, 0x7ffd5926ea80) = 0 <5.000218>
1745  13:31:49 uname({sysname="Linux", nodename="web01", ...}) = 0 <0.000037>

Okay, so that was just a random coding error. A real world example is below. This shows the 60 second latency I was seeing on each page load as the site was trying to load something from a third party site.

[root@web01 ~]# strace -fs 10000 -tT -o /tmp/strace.txt sudo -u apache php /var/www/vhosts/www.example22.com/index.php
[root@web01 ~]# less /tmp/strace.txt
...
35999 16:44:29 recvfrom(5, "\347$\201\200\0\1\0\1\0\1\0\0\3www\example22\3com\0\0\34\0\1\300\f\0\5\0\1\0\0!\3\0\2\300\20\300\20\0\6\0\1\0\0$(\0=\3ns1\3net\0\300GxH\262z\0\0\16\20\0\0\34 \0\22u\0\0\1Q\200", 65536, 0, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("123.123.123.123")}, [16]) = 128 <0.000025>
35999 16:44:29 close(5)                 = 0 <0.000029>
35999 16:44:29 socket(PF_INET, SOCK_STREAM, IPPROTO_TCP) = 5 <0.000029>
35999 16:44:29 fcntl(5, F_GETFL)        = 0x2 (flags O_RDWR) <0.000018>
35999 16:44:29 fcntl(5, F_SETFL, O_RDWR|O_NONBLOCK) = 0 <0.000019>
35999 16:44:29 connect(5, {sa_family=AF_INET, sin_port=htons(443), sin_addr=inet_addr("123.123.123.4")}, 16) = -1 EINPROGRESS (Operation now in progress) <0.000054>
35999 16:44:29 poll([{fd=5, events=POLLOUT|POLLWRNORM}], 1, 299993) = 1 ([{fd=5, revents=POLLERR|POLLHUP}]) <63.000274>
35999 16:45:32 getsockopt(5, SOL_SOCKET, SO_ERROR, [110], [4]) = 0 <0.000013>
35999 16:45:32 close(5)                 = 0 <0.000024>

Notice the timestamps on the poll() and getsockopt() lines: the poll() call alone took about 63 seconds while the page was still loading. Looking one line up from there, I can see that the site was trying to connect to 123.123.123.4 on port 443 and the connection appeared to be timing out.

Here is another example, similar to the ones above, that filters the strace output to only show ‘sendto’, ‘connect’, ‘open’ and ‘write’ calls. This cuts out some of the noise so you can more easily see the file/page being accessed as well as the resulting database lookup:

[root@web01 ~]# strace -tt -T -e trace=sendto,connect,open,write php /var/www/vhosts/www.example.com/index.php
...
12:22:56.362994 open("/var/www/vhosts/example.com/application/colors/red.php", O_RDONLY) = 4 <0.000027>
12:22:56.363933 write(3, "M\0\0\0\3SELECT *\nFROM (`tbl_color_red"..., 81) = 81 <0.000026>
12:22:56.364143 open("/usr/share/zoneinfo/America/New_York", O_RDONLY) = 4 <0.000026>
12:22:56.364974 write(3, "Y\0\0\0\3SELECT *\nFROM (`tbl_colors_orange`)"..., 93) = 93 <0.000021>
12:22:56.365747 write(3, "<\t\0\0\3Select `id`.`color` as "..., 2368) = 2368 <0.000021>
12:27:02.354995 write(3, "G\0\0\0\3SELECT *\nFROM (`tbl_paper_"..., 75) = 75 <0.000023>

In the example above, the jump in timestamps before the last write() may indicate that I need to look at the slow query log or run an EXPLAIN against the query to identify why it is taking so long to execute.

Log all calls in and out of Apache

Sometimes you just cannot seem to narrow down the issue. Therefore you have to log everything and try to find that needle in the haystack. The command below will record all Apache web processes and their forks and log them to a file. Do not keep this running for long as the log can quickly fill up your disk!

[root@web01 ~]# pgrep "apache2|httpd" | awk '{print "-fp "$1}' | xargs strace -vvv -ts 2048 2>&1 | grep -vE "gettime|ENOENT" > /tmp/strace.txt
[root@web01 ~]# less /tmp/strace.txt

When going through the /tmp/strace.txt, you are basically looking for gaps in the timestamps that may or may not explain why a single pid hung while serving a request. Some common ways to begin looking for clues:

[root@web01 ~]# grep -Ev "munmap|gettime" /tmp/strace.txt  | cut -b -115 | less

[root@web01 ~]# grep -E 'connect\(|stat\(' /tmp/strace.txt  | cut -b -115 | less

# WordPress specific ones are below:
[root@web01 ~]# grep -Ev "munmap|gettime" /tmp/strace.txt  | cut -b -115 | grep wp-content | grep open | less

[root@web01 ~]# grep -Ev "munmap|gettime" /tmp/strace.txt  | cut -b -115 | grep -iE "open.*wp-content|connect" | less
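
If eyeballing the gaps across a large capture gets tedious, a bit of awk can flag them for you. The sketch below assumes each line starts with a PID column followed by an HH:MM:SS timestamp in the third whitespace-separated field; adjust the field index ($3 here) and the two second threshold to match your output:

[root@web01 ~]# awk '{ split($3, t, ":"); now = t[1]*3600 + t[2]*60 + t[3]; gap = now - prev;
  if (prev != "" && gap >= 2) print "---- " gap " second gap ----\n" last "\n" $0 "\n";
  prev = now; last = $0 }' /tmp/strace.txt | less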

tcpdump Cheat Sheet

Packets coming inbound and outbound from a network interface contain a treasure trove of information that can be useful for troubleshooting purposes. Using the command tcpdump allows you to view the contents of the packets in real time, or it can be saved to a file for inspection later on.

This article will show some of the common tasks I use tcpdump for.
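
As mentioned above, a capture can also be written out with ‘-w’ and read back later with ‘-r’, which is handy when you want to analyze it offline or open it in Wireshark. The file name and filter below are just examples:

[root@web01 ~]# tcpdump -i eth0 -nn -s 0 -w /tmp/capture.pcap 'tcp port 80'
[root@web01 ~]# tcpdump -nn -r /tmp/capture.pcap | less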

How to view Cisco Discovery Protocol

This is not always available, as it depends on CDP being enabled on the switch. Cisco Discovery Protocol is a management protocol that Cisco devices use to communicate a great deal of information about a network connection. It can tell you what switch and port the server is connected to, whether there are connectivity issues due to the wrong duplex being set, and whether the server is on the wrong VLAN. It can also show the management interface and operating system of the switch, amongst other things.

An example of how to run it, grepping for the fields I generally need, is below:

[root@web01 ~]# tcpdump -nn -v -i eth0 -s 1500 -c 1 'ether[20:2] == 0x2000' | egrep "Device-ID|Address|Port-ID"
        Device-ID (0x01), length: 24 bytes: 'switch27.nyc4.example.com'
        Address (0x02), length: 13 bytes: IPv4 (1) 10.1.0.11
        Port-ID (0x03), length: 18 bytes: 'GigabitEthernet0/9'
        Management Addresses (0x16), length: 13 bytes: IPv4 (1) 10.1.0.11

Confirm traffic is flowing

Let’s assume you have VLAN tagging in place on the server, but for some reason that VLAN cannot ping its gateway. You can check whether your network interface is at least configured correctly by checking for ARP traffic:
1. On another terminal, ping the target gateway.
2. Then in the other terminal, run:

[root@web01 ~]# tcpdump -i eth0 -nn -e vlan
tcpdump: WARNING: eth0: no IPv4 address assigned
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 65535 bytes
18:08:58.039740 63:b3:d2:5c:dd:dc > ff:ff:ff:ff:ff:ff, ethertype 802.1Q (0x8100), length 46: vlan 14, p 0, ethertype ARP, Request who-has 192.168.22.1 tell 192.168.22.100, length 28
18:08:59.039934 63:b3:d2:5c:dd:dc > ff:ff:ff:ff:ff:ff, ethertype 802.1Q (0x8100), length 46: vlan 14, p 0, ethertype ARP, Request who-has 192.168.22.1 tell 192.168.22.100, length 28
18:09:00.041922 63:b3:d2:5c:dd:dc > ff:ff:ff:ff:ff:ff, ethertype 802.1Q (0x8100), length 46: vlan 14, p 0, ethertype ARP, Request who-has 192.168.22.1 tell 192.168.22.100, length 28

This tells me that the server is sending out ARP requests successfully over VLAN 14, but no responses are coming back.
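
If the interface is busy, the ARP requests can get buried in unrelated traffic. You can narrow the capture down with a filter; the interface name below is just an example:

[root@web01 ~]# tcpdump -i eth0 -nn -e 'vlan and arp'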

Check the payload of any traffic coming in over port 80

This example will provide you output similar to what you would see on an IDS. It is highly useful to be able to determine exactly what was accessed and what the web server responded with.

[root@web01 ~]# tcpdump -nnvvXS 'tcp port 80'
...
	GET /bogus HTTP/1.1
	Host: www.example22.com
	User-Agent: curl/7.54.0
	Accept: */*
...
	0x0030:  89c5 4347 4745 5420 2f62 6f67 7573 2048  ..CGGET./bogus.H
	0x0040:  5454 502f 312e 310d 0a48 6f73 743a 2077  TTP/1.1..Host:.w
	0x0050:  7777 2e65 7861 6d70 6c65 3232 2e63 6f6d  ww.example22.com
	0x0060:  0d0a 5573 6572 2d41 6765 6e74 3a20 6375  ..User-Agent:.cu
	0x0070:  726c 2f37 2e35 342e 300d 0a41 6363 6570  rl/7.54.0..Accep
	0x0080:  743a 202a 2f2a 0d0a 0d0a                 t:.*/*....
...
	HTTP/1.1 404 Not Found
	Date: Wed, 16 May 2018 02:51:16 GMT
	Server: Apache/2.4.6 (CentOS) OpenSSL/1.0.2k-fips
	Content-Length: 203
	Content-Type: text/html; charset=iso-8859-1
	
	<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
	<html><head>
	<title>404 Not Found</title>
	</head><body>
	<h1>Not Found</h1>
	<p>The requested URL /bogus was not found on this server.</p>
	</body></html>
	0x0000:  4500 01b3 fe88 4000 4006 2fda a5e3 44b3  E.....@.@./...D.
	0x0010:  182d 081f 0050 b35f c379 bddd 7fc3 db3c  .-...P._.y.....<
	0x0020:  8018 00e3 0c88 0000 0101 080a 89c5 435a  ..............CZ
	0x0030:  2681 c10b 4854 5450 2f31 2e31 2034 3034  &...HTTP/1.1.404
	0x0040:  204e 6f74 2046 6f75 6e64 0d0a 4461 7465  .Not.Found..Date
	0x0050:  3a20 5765 642c 2031 3620 4d61 7920 3230  :.Wed,.16.May.20
	0x0060:  3138 2030 323a 3531 3a31 3620 474d 540d  18.02:51:16.GMT.
	0x0070:  0a53 6572 7665 723a 2041 7061 6368 652f  .Server:.Apache/
	0x0080:  322e 342e 3620 2843 656e 744f 5329 204f  2.4.6.(CentOS).O
	0x0090:  7065 6e53 534c 2f31 2e30 2e32 6b2d 6669  penSSL/1.0.2k-fi

IO Scheduler tuning

What is an I/O scheduler? The I/O scheduler is a kernel level tunable whose purpose is to optimize disk access requests. Traditionally this is critical for spinning disks as I/O requests can be grouped together to avoid “seeking”.

Different I/O schedulers have their pros and cons, so choosing which one to use depends on the type of environment and workload. There is no one right I/O scheduler to use; it all simply ‘depends’. Benchmarking your application before and after the I/O scheduler change is usually your best indicator. The good news is that the I/O scheduler can be changed at run time and can be configured to persist after reboots.

The three common I/O schedulers are:
– noop
– deadline
– cfq

noop

The noop I/O scheduler is optimized for systems that don’t need an I/O scheduler such as VMware, AWS EC2, Google Cloud, Rackspace public cloud, etc. Since the hypervisor already controls the I/O scheduling, it doesn’t make sense for the VM to waste CPU cycles on it. The noop I/O scheduler simply works as a FIFO (First In First Out) queue.

You can update the I/O scheduler to noop by:

## CentOS 6

# Change at runtime
[root@db01 ~]# cat /sys/block/sda/queue/scheduler
noop anticipatory deadline [cfq] 
[root@db01 ~]# echo 'noop' > /sys/block/sda/queue/scheduler
[root@db01 ~]# cat /sys/block/sda/queue/scheduler
[noop] anticipatory deadline cfq

# Change at boot time by appending 'elevator=noop' to the end of the kernel line:
[root@db01 ~]# vim /boot/grub/grub.conf
kernel /vmlinuz-2.6.9-67.EL ro root=/dev/vg0/lv0 elevator=noop


## CentOS 7

# Change at run time
[root@db01 ~]# cat /sys/block/sda/queue/scheduler
noop deadline [cfq]
[root@db01 ~]# echo 'noop' > /sys/block/sda/queue/scheduler
[root@db01 ~]# cat /sys/block/sda/queue/scheduler
[noop] deadline cfq

# Change at boot time by appending 'elevator=noop' to the end of the following line, then rebuild the grub config:
[root@db01 ~]# vim /etc/default/grub
...
GRUB_CMDLINE_LINUX="crashkernel=auto rd.lvm.lv=rhel00/root rd.lvm.lv=rhel00/swap elevator=noop"
...
[root@db01 ~]# grub2-mkconfig -o /boot/grub2/grub.cfg


## Ubuntu 14.04

# Change at runtime
[root@db01 ~]# cat /sys/block/sda/queue/scheduler
noop [deadline] cfq
[root@db01 ~]# echo noop > /sys/block/sda/queue/scheduler
[root@db01 ~]# cat /sys/block/sda/queue/scheduler
[noop] deadline cfq

# Change at boot time by appending 'elevator=noop' to the end of the following line, then rebuild the grub config:
[root@db01 ~]# vim /etc/default/grub
...
GRUB_CMDLINE_LINUX="elevator=noop"
...
[root@db01 ~]# grub-mkconfig -o /boot/grub/grub.cfg


## Ubuntu 16.04

# Change at runtime
[root@db01 ~]# cat /sys/block/sda/queue/scheduler
noop [deadline] cfq
[root@db01 ~]# echo noop > /sys/block/sda/queue/scheduler
[root@db01 ~]# cat /sys/block/sda/queue/scheduler
[noop] deadline cfq

# Change at boot time by appending 'elevator=noop' to the end of the following line, then rebuild the grub config:
[root@db01 ~]# vim /etc/default/grub
...
GRUB_CMDLINE_LINUX="elevator=noop"
...
[root@db01 ~]# grub2-mkconfig -o /boot/grub2/grub.cfg

deadline

The deadline I/O scheduler is optimized by default for read heavy workloads like MySQL. It attempts to optimize I/O requests by placing them into a read queue or a write queue and assigning a timestamp to each request. Requests in the read queue have 500ms (by default) to execute before they are given the highest priority to run. Requests entering the write queue have 5000ms to execute before they are given the highest priority to run.

This deadline assigned to each I/O request is what makes deadline I/O scheduler optimal for read heavy workloads like MySQL.
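
Those expiry values are exposed (and tunable) through sysfs once deadline is the active scheduler. The example below assumes the device is sda and shows the defaults:

[root@db01 ~]# cat /sys/block/sda/queue/iosched/read_expire
500
[root@db01 ~]# cat /sys/block/sda/queue/iosched/write_expire
5000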

You can update the I/O scheduler to deadline by:

## CentOS 6

# Change at runtime
[root@db01 ~]# cat /sys/block/sda/queue/scheduler
noop anticipatory deadline [cfq] 
[root@db01 ~]# echo 'deadline' > /sys/block/sda/queue/scheduler
[root@db01 ~]# cat /sys/block/sda/queue/scheduler
noop anticipatory [deadline] cfq

# Change at boot time by appending 'elevator=deadline' to the end of the kernel line:
[root@db01 ~]# vim /boot/grub/grub.conf
kernel /vmlinuz-2.6.9-67.EL ro root=/dev/vg0/lv0 elevator=deadline


## CentOS 7

# Change at run time
[root@db01 ~]# cat /sys/block/sda/queue/scheduler
noop deadline [cfq]
[root@db01 ~]# echo 'deadline' > /sys/block/sda/queue/scheduler
[root@db01 ~]# cat /sys/block/sda/queue/scheduler
noop [deadline] cfq

# Change at boot time by appending 'elevator=deadline' to the end of the following line, then rebuild the grub config:
[root@db01 ~]# vim /etc/default/grub
...
GRUB_CMDLINE_LINUX="crashkernel=auto rd.lvm.lv=rhel00/root rd.lvm.lv=rhel00/swap elevator=deadline"
...
[root@db01 ~]# grub2-mkconfig -o /boot/grub2/grub.cfg


## Ubuntu 14.04

# Change at runtime
[root@db01 ~]# cat /sys/block/sda/queue/scheduler
noop deadline [cfq]
[root@db01 ~]# echo deadline > /sys/block/sda/queue/scheduler
[root@db01 ~]# cat /sys/block/sda/queue/scheduler
noop [deadline] cfq

# Change at boot time by appending 'elevator=deadline' to the end of the following line, then rebuild the grub config:
[root@db01 ~]# vim /etc/default/grub
...
GRUB_CMDLINE_LINUX="elevator=deadline"
...
[root@db01 ~]# grub-mkconfig -o /boot/grub/grub.cfg


## Ubuntu 16.04

# Change at runtime
[root@db01 ~]# cat /sys/block/sda/queue/scheduler
noop deadline [cfq]
[root@db01 ~]# echo deadline > /sys/block/sda/queue/scheduler
[root@db01 ~]# cat /sys/block/sda/queue/scheduler
noop [deadline] cfq

# Change at boot time by appending 'elevator=deadline' to the end of the following line, then rebuild the grub config:
[root@db01 ~]# vim /etc/default/grub
...
GRUB_CMDLINE_LINUX="elevator=deadline"
...
[root@db01 ~]# grub2-mkconfig -o /boot/grub2/grub.cfg

cfq

The cfq I/O scheduler is probably best geared towards things running GUIs (like a desktop) where each process needs a fast response. The goal of the cfq I/O scheduler (Complete Fairness Queueing) is to give a fair allocation of disk I/O bandwidth to all processes that request I/O operations.

You can update the I/O scheduler to cfq by:

## CentOS 6

# Change at runtime
[root@server01 ~]# cat /sys/block/sda/queue/scheduler
noop anticipatory [deadline] cfq 
[root@server01 ~]# echo 'cfq' > /sys/block/sda/queue/scheduler
[root@server01 ~]# cat /sys/block/sda/queue/scheduler
noop anticipatory deadline [cfq]

# Change at boot time by appending 'elevator=cfq' to the end of the kernel line:
[root@server01 ~]# vim /boot/grub/grub.conf
kernel /vmlinuz-2.6.9-67.EL ro root=/dev/vg0/lv0 elevator=cfq


## CentOS 7

# Change at run time
[root@server01 ~]# cat /sys/block/sda/queue/scheduler
noop [deadline] cfq
[root@server01 ~]# echo 'cfq' > /sys/block/sda/queue/scheduler
[root@server01 ~]# cat /sys/block/sda/queue/scheduler
noop deadline [cfq]

# Change at boot time by appending 'elevator=cfq' to the end of the following line, then rebuild the grub config:
[root@server01 ~]# vim /etc/default/grub
...
GRUB_CMDLINE_LINUX="crashkernel=auto rd.lvm.lv=rhel00/root rd.lvm.lv=rhel00/swap elevator=cfq"
...
[root@server01 ~]# grub2-mkconfig -o /boot/grub2/grub.cfg


## Ubuntu 14.04

# Change at runtime
[root@server01 ~]# cat /sys/block/sda/queue/scheduler
noop [deadline] cfq
[root@server01 ~]# echo cfq > /sys/block/sda/queue/scheduler
[root@server01 ~]# cat /sys/block/sda/queue/scheduler
noop deadline [cfq]

# Change at boot time by appending 'elevator=cfq' to the end of the following line, then rebuild the grub config:
[root@server01 ~]# vim /etc/default/grub
...
GRUB_CMDLINE_LINUX="elevator=cfq"
...
[root@server01 ~]# grub-mkconfig -o /boot/grub/grub.cfg


## Ubuntu 16.04

# Change at runtime
[root@server01 ~]# cat /sys/block/sda/queue/scheduler
noop [deadline] cfq
[root@server01 ~]# echo cfq > /sys/block/sda/queue/scheduler
[root@server01 ~]# cat /sys/block/sda/queue/scheduler
noop deadline [cfq]

# Change at boot time by appending 'elevator=cfq' to the end of the following line, then rebuild the grub config:
[root@server01 ~]# vim /etc/default/grub
...
GRUB_CMDLINE_LINUX="elevator=cfq"
...
[root@server01 ~]# grub2-mkconfig -o /boot/grub2/grub.cfg

As with any performance tuning recommendations, there is never a one size fits all solution! Always benchmark your application to establish a baseline before you make the change. After the performance changes have been made, run the same benchmark and compare the results to ensure that they had the desired outcomes.

Disabling Transparent Huge Pages in Linux

Transparent Huge Pages (THP) is a Linux memory management system that reduces the overhead of Translation Lookaside Buffer (TLB) lookups on machines with large amounts of memory by using larger memory pages.

However, database workloads often perform poorly with THP, because they tend to have sparse rather than contiguous memory access patterns. The overall recommendation for MySQL, MongoDB, Oracle, etc is to disable THP on Linux machines to ensure best performance.

You can check to see if THP is enabled or not by running:

[root@db01 ~]# cat /sys/kernel/mm/transparent_hugepage/enabled
[always] madvise never
[root@db01 ~]# cat /sys/kernel/mm/transparent_hugepage/defrag
[always] madvise never

The value in brackets is the active setting: if the result shows [never], THP is disabled, while [always] means THP is enabled.

You can disable THP at runtime on CentOS 6/7 and Ubuntu 14.04/16.04 by running:

[root@db01 ~]# echo 'never' > /sys/kernel/mm/transparent_hugepage/enabled
[root@db01 ~]# echo 'never' > /sys/kernel/mm/transparent_hugepage/defrag

However once the system reboots, it will go back to its default value again. To make the setting persistent on CentOS 7 and Ubuntu 16.04, you can disable THP on system startup by making a systemd unit file:

# CentOS 7 / Ubuntu 16.04:
[root@db01 ~]# vim /etc/systemd/system/disable-thp.service
[Unit]
Description=Disable Transparent Huge Pages (THP)

[Service]
Type=simple
ExecStart=/bin/sh -c "echo 'never' > /sys/kernel/mm/transparent_hugepage/enabled && echo 'never' > /sys/kernel/mm/transparent_hugepage/defrag"

[Install]
WantedBy=multi-user.target

[root@db01 ~]# systemctl daemon-reload
[root@db01 ~]# systemctl start disable-thp
[root@db01 ~]# systemctl enable disable-thp
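
Once the unit has run, the same check from earlier should now report [never]:

[root@db01 ~]# cat /sys/kernel/mm/transparent_hugepage/enabled
always madvise [never]
[root@db01 ~]# cat /sys/kernel/mm/transparent_hugepage/defrag
always madvise [never]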

On CentOS 6 and Ubuntu 14.04, you can disable THP on system startup by adding the following to /etc/rc.local. If this is on Ubuntu 14.04, make sure it’s added before the ‘exit 0’:

# CentOS 6 / Ubuntu 14.04
[root@db01 ~]# vim /etc/rc.local
...
if test -f /sys/kernel/mm/transparent_hugepage/enabled; then
   echo never > /sys/kernel/mm/transparent_hugepage/enabled
fi
if test -f /sys/kernel/mm/transparent_hugepage/defrag; then
   echo never > /sys/kernel/mm/transparent_hugepage/defrag
fi
...

Allowing multiple developers to read/write to website via SFTP or FTP

This article simply exists to serve as a visual reference when I’m explaining permissions to others. If you are looking to apply the concepts in this article on a live site, make sure you create a backup of the permissions and ownerships before proceeding as this article could break a pre-existing site!

This is one of those things where there is more than one way to go about it. The goal here is to allow multiple users the ability to work with the site via FTP/SFTP using basic permissions and groups.

First, create the shared group. In this case, as my domain is going to be example.com, I will call it exampleadmins:

[root@web01 ~]# groupadd exampleadmins

Now add the preexisting users to the group:

[root@web01 ~]# usermod -aG exampleadmins user01
[root@web01 ~]# usermod -aG exampleadmins user02

Now change the group ownership recursively on the website directory:

[root@web01 ~]# chgrp -R exampleadmins /var/www/vhosts/example.com

Since we want users in the exampleadmins group to have write access, set the group write permissions on the website directory by:

[root@web01 ~]# chmod -R g+w /var/www/vhosts/example.com

To ensure that any new files or directory inherit the group ownership, use the SetGID bit on the directory recursively:

[root@web01 ~]# find /var/www/vhosts/example.com -type d -exec chmod g+s "{}" \;
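
To spot-check that the group ownership, group write bit, and SetGID bit all took effect, you can stat the directory. The output below assumes the directory started out as 755:

[root@web01 ~]# stat -c '%A %G %n' /var/www/vhosts/example.com
drwxrwsr-x exampleadmins /var/www/vhosts/example.com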

To ensure that files or directories the users create or upload are group writable by default, you need to adjust the default umask for the FTP and SFTP server. For vsftpd, which is generally the default FTP server, change the default umask from 022 to 002 by:

[root@web01 ~]# vim /etc/vsftpd.conf
...
local_umask=002
...
[root@web01 ~]# service vsftpd restart

When using SFTP, update the sftp subsystem within /etc/ssh/sshd_config to set a umask of 0002 by:

[root@web01 ~]# vim /etc/ssh/sshd_config
...
Subsystem       sftp    /usr/libexec/openssh/sftp-server -u 0002
...
# Append to bottom of file:
Match Group exampleadmins
   ForceCommand internal-sftp -u 0002
[root@web01 ~]# service sshd restart

Now whenever you need to add additional users, simply create the user with membership in the exampleadmins group:

[root@web01 ~]# useradd -s /sbin/nologin -d /var/www/vhosts/example.com -G exampleadmins user03

And if the user already exists, simply run:

[root@web01 ~]# usermod -aG exampleadmins user03

How to install Elastic Stack

Your logs are trying to talk to you! The problem though is that reading through logs is like trying to pick out one conversation in a crowded and noisy room. Some people talk loud and others speak softly. With all this noise, how can you pick out the critical information? This is where Elastic Stack can help!

Elastic Stack is a group of open source products from Elastic designed to help users take data from any type of source and in any format and search, analyze, and visualize that data in real time. This is commonly referred to as an ELK stack (Elasticsearch, Logstash, and Kibana).

Setting up Elastic Stack can be quite confusing as there are several moving parts. As a very basic primer, Logstash is the workhorse that applies various filters to parse the incoming logs. Logstash then forwards the parsed logs to Elasticsearch for indexing, and Kibana allows you to visualize the data stored in Elasticsearch.

Server Installation

This guide is going to be based on CentOS/RHEL 7. Elasticsearch needs at least 2G of memory. So for the entire stack (Elasticsearch, Logstash and Kibana) to work, the absolute minimum required memory should be around 4G. Anything less than this may cause the services to become unstable or not start up at all.

Elastic Stack relies on Java, so install Java 1.8.0 by:

[root@elk01 ~]# yum install java-1.8.0-openjdk
[root@elk01 ~]# java -version
openjdk version "1.8.0_151"
OpenJDK Runtime Environment (build 1.8.0_151-b12)
OpenJDK 64-Bit Server VM (build 25.151-b12, mixed mode)

Elastic packages all the needed software within their own repos, so set up the repo by:

[root@elk01 ~]# rpm --import https://artifacts.elastic.co/GPG-KEY-elasticsearch
[root@elk01 ~]# echo '[elasticstack-6.x]
name=Elastic Stack repository for 6.x packages
baseurl=https://artifacts.elastic.co/packages/6.x/yum
gpgcheck=1
gpgkey=https://artifacts.elastic.co/GPG-KEY-elasticsearch
enabled=1
autorefresh=1
type=rpm-md' > /etc/yum.repos.d/elasticstack.repo

Now install the needed packages for Elastic Stack and set them to start on boot:

[root@elk01 ~]# yum install elasticsearch kibana logstash filebeat
[root@elk01 ~]# systemctl daemon-reload
[root@elk01 ~]# systemctl enable elasticsearch kibana logstash filebeat

Server Configuration

Set up Elasticsearch to listen for connections on the public IP of the server. Mine is also configured to listen on localhost since I am shipping logs from the local machine too:

[root@elk01 ~]# vim /etc/elasticsearch/elasticsearch.yml
...
network.host: 123.123.123.123, localhost
...
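
If your beats clients live on other servers, they will need to reach Elasticsearch on port 9200 (or Logstash on 5044 if you point them there instead). A firewalld sketch for opening 9200 is below; adapt it to whatever firewall you run and restrict the source addresses where possible:

[root@elk01 ~]# firewall-cmd --zone=public --add-port=9200/tcp --permanent
[root@elk01 ~]# firewall-cmd --reload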

Set up Elasticsearch to be able to use geoip and user-agent processing by installing the ingest plugins:

[root@elk01 ~]# /usr/share/elasticsearch/bin/elasticsearch-plugin install ingest-geoip
[root@elk01 ~]# /usr/share/elasticsearch/bin/elasticsearch-plugin install ingest-user-agent

Configure logstash with a basic configuration to accept logs from filebeats and forward them to elasticsearch by:

[root@elk01 ~]# echo 'input {
  beats {
    port => 5044
  }
}

# The filter part of this file is commented out to indicate that it is
# optional.
# filter {
#
# }

filter {
  if [type] == "apache-access" {
    # This will parse the apache access event
    grok {
      match => [ "message", "%{COMBINEDAPACHELOG}" ]
    }
  }
}

output {
  elasticsearch {
    hosts => "localhost:9200"
    manage_template => false
    index => "%{[@metadata][beat]}-%{[@metadata][version]}-%{+YYYY.MM.dd}" 
    document_type => "%{[@metadata][type]}" 
  }
}' > /etc/logstash/conf.d/logstash.conf
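
Before starting everything, you can have Logstash validate the configuration file. The paths below assume the standard package layout:

[root@elk01 ~]# /usr/share/logstash/bin/logstash --path.settings /etc/logstash --config.test_and_exit -f /etc/logstash/conf.d/logstash.conf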

Start and test services by:

[root@elk01 ~]# systemctl start kibana elasticsearch logstash filebeat

Elasticsearch will take about 15 seconds or more to start. To ensure elasticsearch is running, check that the output is similar to the following:

[root@elk01 ~]# curl -XGET 'localhost:9200/?pretty'
{
  "name" : "Cp8oag6",
  "cluster_name" : "elasticsearch",
  "cluster_uuid" : "AT69_T_DTp-1qgIJlatQqA",
  "version" : {
    "number" : "6.0.1",
    "build_hash" : "f27399d",
    "build_date" : "2016-03-30T09:51:41.449Z",
    "build_snapshot" : false,
    "lucene_version" : "7.0.1",
    "minimum_wire_compatibility_version" : "1.2.3",
    "minimum_index_compatibility_version" : "1.2.3"
  },
  "tagline" : "You Know, for Search"
}

Then log into Kibana by navigating your browser to:

http://localhost:5601

If this is installed on a remote server, then you can easily install Nginx to act as a front end for Kibana by:

# Install Nginx
[root@elk01 ~]# yum install nginx httpd-tools

# Setup username/password
[root@elk01 ~]# htpasswd -c /etc/nginx/htpasswd.users kibanaadmin

# Create Nginx vhost
[root@elk01 ~]# echo 'server {
    listen 80;

    server_name kibana.yourdomain.com;

    auth_basic "Restricted Access";
    auth_basic_user_file /etc/nginx/htpasswd.users;

    location / {
        proxy_pass http://localhost:5601;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
        proxy_set_header Host $host;
        proxy_cache_bypass $http_upgrade;        
    }
}' > /etc/nginx/conf.d/kibana.conf

# Set services to start on boot and start nginx
[root@elk01 ~]# systemctl daemon-reload
[root@elk01 ~]# systemctl enable nginx
[root@elk01 ~]# systemctl start nginx

# Open up the firewall to allow inbound port 80 traffic from anywhere
[root@elk01 ~]# firewall-cmd --zone=public --add-port=80/tcp --permanent
[root@elk01 ~]# firewall-cmd --reload

# Allow nginx to connect to Kibana port 5601 if you’re using SELinux:
[root@elk01 ~]# semanage port -a -t http_port_t -p tcp 5601

# Navigate your browser to your new domain you setup, assuming you already setup DNS for it:
http://kibana.yourdomain.com

Client installation – Filebeat

The question now becomes, how can I get the log messages from other servers into our Elastic Stack server? My needs are fairly basic since I am not doing any manipulation of the log data, so I can make use of Filebeat and its associated modules to get the Apache, Nginx, MySQL, syslog, etc. data I need over to the Elasticsearch server.

Assuming filebeat is not installed, ensure that you have the repos setup for it:

[root@web01 ~]# rpm --import https://artifacts.elastic.co/GPG-KEY-elasticsearch
[root@web01 ~]# echo '[elasticstack-6.x]
name=Elastic Stack repository for 6.x packages
baseurl=https://artifacts.elastic.co/packages/6.x/yum
gpgcheck=1
gpgkey=https://artifacts.elastic.co/GPG-KEY-elasticsearch
enabled=1
autorefresh=1
type=rpm-md' > /etc/yum.repos.d/elasticstack.repo

Then install filebeat by:

[root@web01 ~]# yum install filebeat
[root@web01 ~]# systemctl daemon-reload
[root@web01 ~]# systemctl enable filebeat

Set up Filebeat to send its logs over to your Elastic Stack server:

[root@web01 ~]# vim /etc/filebeat/filebeat.yml
...
output.elasticsearch:
  # Array of hosts to connect to.
  hosts: ["123.123.123.123:9200"]
...
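
Recent Filebeat versions (such as the 6.x packages used here) can sanity check their own configuration and the connection to the output before you go any further:

[root@web01 ~]# filebeat test config
[root@web01 ~]# filebeat test output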

Now set up the modules for Filebeat. Only enable the ones you need, and be sure to restart Filebeat after you have your desired modules enabled.

To send over Apache logs:

[root@web01 ~]# filebeat modules enable apache2
[root@web01 ~]# filebeat setup -e
[root@web01 ~]# systemctl restart filebeat

Note, you may need to modify the Filebeat apache2 module to pick up your logs. In my case, I had to set ‘var.paths’ for both the access and error logs:

[root@web01 ~]# vim /etc/filebeat/modules.d/apache2.yml
- module: apache2
  # Access logs
  access:
    enabled: true

    # Set custom paths for the log files. If left empty,
    # Filebeat will choose the paths depending on your OS.
    var.paths: ["/var/log/httpd/*access.log*"]

  # Error logs
  error:
    enabled: true

    # Set custom paths for the log files. If left empty,
    # Filebeat will choose the paths depending on your OS.
    var.paths: ["/var/log/httpd/*error.log*"]

[root@web01 ~]# systemctl restart filebeat

To send over syslog data:

[root@web01 ~]# filebeat modules enable system
[root@web01 ~]# filebeat setup -e
[root@web01 ~]# systemctl restart filebeat

To handle MySQL data:

[root@web01 ~]# filebeat modules enable mysql
[root@web01 ~]# filebeat setup -e
[root@web01 ~]# systemctl restart filebeat

To send over auditd logs:

[root@web01 ~]# filebeat modules enable auditd
[root@web01 ~]# filebeat setup -e
[root@web01 ~]# systemctl restart filebeat

To send over Nginx logs:

[root@web01 ~]# filebeat modules enable nginx
[root@web01 ~]# filebeat setup -e
[root@web01 ~]# systemctl restart filebeat

Enable Docker log shipping to Elasticsearch. There is no module for this, but it’s easy enough to configure:
Reference: https://www.elastic.co/blog/enrich-docker-logs-with-filebeat

[root@web01 ~]# vim /etc/filebeat/filebeat.yml
filebeat.prospectors:
...
- type: log
  paths:
   - '/var/lib/docker/containers/*/*.log'
  json.message_key: log
  json.keys_under_root: true
  processors:
  - add_docker_metadata: ~
...

[root@web01 ~]# systemctl restart filebeat

Then browse to the Kibana dashboard to view the available dashboards for Filebeat, or create your own!

Client installation – Metricbeat

What about shipping metrics and statistics over to the Elastic Stack server? This is where Metricbeat comes into play. Metricbeat is a lightweight shipper that you can install on your client nodes to collect metrics and ship them to Elasticsearch. There are modules for Apache, HAProxy, MySQL, Nginx, PostgreSQL, Redis, System and more. This can be installed on your client servers or on the ELK server itself if you like.

Assuming Metricbeat is not installed, ensure that you have the repos setup for it:

[root@web01 ~]# rpm --import https://artifacts.elastic.co/GPG-KEY-elasticsearch
[root@web01 ~]# echo '[elasticstack-6.x]
name=Elastic Stack repository for 6.x packages
baseurl=https://artifacts.elastic.co/packages/6.x/yum
gpgcheck=1
gpgkey=https://artifacts.elastic.co/GPG-KEY-elasticsearch
enabled=1
autorefresh=1
type=rpm-md' > /etc/yum.repos.d/elasticstack.repo

Then install Metricbeat by:

[root@web01 ~]# yum install metricbeat
[root@web01 ~]# systemctl daemon-reload
[root@web01 ~]# systemctl enable metricbeat

Set up Metricbeat to send its metrics over to your Elastic Stack server:

[root@web01 ~]# vim /etc/metricbeat/metricbeat.yml
...
output.elasticsearch:
  # Array of hosts to connect to.
  hosts: ["123.123.123.123:9200"]
...

Now set up the modules for Metricbeat. Only enable the ones you need, and be sure to restart Metricbeat after you have your desired modules enabled.

To see the full listing of available modules and what is currently enabled:

[root@web01 ~]# metricbeat modules list

To send over Apache, MySQL, Nginx and System metrics:

[root@web01 ~]# metricbeat modules enable apache mysql nginx system
[root@web01 ~]# metricbeat setup -e

After enabling each one, be sure to check out the module’s associated config file, as you may need to make changes so it works with your environment. The module config files can be found in:

[root@web01 ~]# cd /etc/metricbeat/modules.d

Once the configurations are updated accordingly, restart Metricbeat by:

[root@web01 ~]# systemctl restart metricbeat

Then browse to the Kibana dashboard to view the available dashboards for Metricbeat, or create your own!

Quick Kibana Primer

Now that you have data coming into Elasticsearch, you can use Kibana to generate some quick searches and visualizations. This is not meant to be a full-fledged tutorial on how to use Kibana, just a way to jump-start the learning process, as Kibana can be somewhat complicated if you have never seen it.

To search Kibana looking for failed logins, type the following in the discover search box:

system.auth.ssh.event:Failed OR system.auth.ssh.event:Invalid

To see what packages have been recently installed, type the following in the discover search box:

source: "/var/log/messages" AND system.syslog.message: *Install*

What about visualizations? To see the top 5 countries accessing Apache:

- Click 'Visualizations' over on the left
- Select 'Vertical bar chart'
- Select 'filebeat-*' from the existing index
- Click 'X-Axis'
Aggregation:  Terms
Field:  apache2.access.geoip.country_iso_code
Order by:  metric: Count
Order:  Descending
Size:  5

To break it down further by city:
- Click 'Add sub-buckets'
- Select 'Split series'
Sub Aggregation:  Terms
Field:  apache2.access.geoip.city_name
Order by:  metric: Count
Order:  Descending
Size:  5
Click run

View the top 5 remote IPs hitting Apache:

- Click 'Visualizations' over on the left
- Select 'Vertical bar chart'
- Select 'filebeat-*' from the existing index
- Click 'X-Axis'
Aggregation:  Terms
Field:  apache2.access.remote_ip
Size:  5

Click 'Add sub-buckets'
- Select 'Split series'
Sub Aggregation:  Terms
Field:  apache2.access.remote_ip
Order by:  metric: Count
Order: Descending
Size:  5

View the top 10 requested URLs in Apache:

- Click 'Visualizations' over on the left
- Select 'Data Table'
- Select 'filebeat-*' from the existing index
- Under Buckets, click 'Split Rows'
Aggregation:  Terms
Field:  apache2.access.url
Order By:  metric: Count
Order:  Descending
Size:  10
Custom Label:  URL

Then click 'Split Rows'
Sub Aggregation:  Terms
Field:  apache2.access.body_sent_bytes
Order By:  metric: Count
Order:  Descending
Size:  10
Custom Label:  Size
Click run

Create line chart for apache response codes:

- Click 'Visualizations' over on the left
- Select 'Line chart'
- Select 'filebeat-*' from the existing index
- Click X-Axis
Aggregation:  Date Histogram
Field:  @timestamp
Interval:  Minute

Click 'Split Series'
Sub Aggregation:  Terms
Field:  apache2.access.response_code
Order by:  metric: Count
Order:  Descending
Size: 5
Click run

See which logs are receiving a lot of activity:

- Click 'Visualizations' over on the left
- Select 'Pie Chart'
- Select 'filebeat-*' from the existing index
- Click 'Split Slices'
Aggregation:  Terms
Field:  source
Order by:  metric: Count
Order: Descending
Size: 5

Show total traffic by domains:

- Click 'Visualizations' over on the left
- Select 'Line Chart'
Aggregation:  Date Histogram
Field:  @timestamp
Interval:  Auto

- Click 'Split Series'
Sub Aggregation:  Filters
Filter 1:  apache2.access.method:* AND source:"/var/log/httpd/domain01.com-access.log"
Filter 2:  apache2.access.method:* AND source:"/var/log/httpd/domain02.com-access.log"

Show GET request counts:

- Click 'Visualizations' over on the left
- Select 'Metrics'
- Select 'filebeat-*' from the existing index
- Click 'Metrics'
Aggregation:  Count

- Click Buckets
- Click 'Split Group'
Aggregation:  Filters
Filter 1 - GET:  apache2.access.method:"GET" AND source:"/var/log/httpd/domain01.com-access.log"

Show POST request counts:

- Click 'Visualizations' over on the left
- Select 'Metrics'
- Select 'filebeat-*' from the existing index
- Click 'Metrics'
Aggregation:  Count

Click Buckets
- Select 'Split Group'
Aggregation:  Filters
Filter 1 - GET:  apache2.access.method:"POST" AND source:"/var/log/httpd/domain01.com-access.log"

Show GET vs POST requests by domain:

- Click 'Visualizations' over on the left
- Select 'Line chart'
- Select 'filebeat-*' from the existing index
- Click X-Axis
Aggregation:  Date Histogram
Field:  @timestamp
Interval:  Auto

Click 'Split Series'
Sub Aggregation:  Filters
Filter 1:  apache2.access.method:"GET" AND source:"/var/log/httpd/domain01.com-access.log"
Filter 2:  apache2.access.method:"POST" AND source:"/var/log/httpd/domain01.com-access.log"

Show total requests on domain:

- Click 'Visualizations' over on the left
- Select 'Line chart'
- Select 'filebeat-*' from the existing index
- Click Y-Axis
Aggregation:  Count

Click 'Add sub-buckets'
- Aggregation:  Date Histogram
- Field:  @timestamp
- Interval:  Auto

Click 'Split Series'
- Sub Aggregation:  Filters
- Filter:  apache2.access.method:* AND source:"/var/log/httpd/domain01.com-access.log"

Display the current Apache error logs:

- Click 'Visualizations' over on the left
- Select 'Data Table'
- Select 'Apache error log [Filebeat Apache2]' from Saved Search

View top 10 WordPress posts:

- Click 'Visualizations' over on the left
- Select 'Data Table'
- Select 'filebeat-*' from the existing index
In the search bar above, type:  apache2.access.url: like \/20*

- Under Buckets, click 'Split Rows'
Aggregation:  Terms
Field:  apache2.access.url
Order By:  metric: Count
Order:  Descending
Size:  10
Custom Label:  Posts

Purging all data from Elasticsearch indexes

Whatever the reason for wanting to completely purge the Elasticsearch indexes, it’s really simple to do, as shown below. Just keep in mind you will lose all the data collected in the indexes! This example will clear out the filebeat indexes:

[root@elk01 ~]# curl -XDELETE 'http://localhost:9200/filebeat-*'
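
To see exactly which indexes exist before (or after) running the delete, you can list them with the _cat API:

[root@elk01 ~]# curl -XGET 'http://localhost:9200/_cat/indices?v'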

Backing up permissions on directory

Before doing anything in Linux, it is also smart to have a rollback plan. Making blanket, recursive permission changes on a directory would certainly fall into this category!

Let’s say you found a directory on your system where the file permissions were all 777, so you want to secure them a bit by changing the permissions over to 644 with something like:

[root@web01 ~]# find /var/www/vhosts/domain.com -type f -perm 0777 -print -exec chmod 644 {} \;

The paranoid among us will want to ensure we can revert things back to the way they were before. Thankfully there are two commands that can be used to either back up or restore permissions on a directory recursively: getfacl and setfacl.

To backup all the permissions and ownerships within a given directory such as /var/www/vhosts/domain.com, do the following:

[root@web01 ~]# cd /var/www/vhosts/domain.com
[root@web01 ~]# getfacl -R . > permissions_backup

Now let’s say you ran the find command, changed everything over to 644, and then realized you broke your application because it needed some files to be 664. You just want to roll back so you can investigate what happened.

You can roll back the permissions by running:

[root@web01 ~]# cd /var/www/vhosts/domain.com
[root@web01 ~]# setfacl --restore=permissions_backup

Backing up an entire server’s permissions

If you wanted to back up the entire server’s permissions, you can do that by:

[root@web01 ~]# getfacl -R --absolute-names / > server_permissions_backup

And the restoration process remains the same:

[root@web01 ~]# setfacl --restore=server_permissions_backup

Find command examples

This is just a quick reference page for using find to do basic things.

Find is a pretty powerful tool that accepts a bunch of options for narrowing down your search. Some basic examples of stuff you can do are below.

Find a specific file type and extension older than 300 days and remove them

This will find files:
– Older than 300 days
– Is a file
– Match *.jpg
– Will not go into sub directories

This also works for those pesky directories that have millions of files.

First, always confirm the command will work before blindly removing files:

[root@web01 ~]# cd /path/to/directory
[root@web01 ~]# find . -maxdepth 1 -type f -name '*.jpg' -mtime +300 | xargs ls -al

Once you verified that the files displayed are the ones you want removed, remove them by running:

[root@web01 ~]# cd /path/to/directory
[root@web01 ~]# find . -maxdepth 1 -type f -name '*.jpg' -mtime +300 | xargs rm -f

Find files with 777 permissions

This will find all files that have 777 permissions:

[root@web01 ~]# cd /path/to/directory
[root@web01 ~]# find . -type f -perm 0777 -print

This will find all files that do NOT have 777 permissions:

[root@web01 ~]# cd /path/to/directory
[root@web01 ~]# find . -type f ! -perm 777

Find Files with 777 Permissions and change to 644

Use caution with this; it is generally not smart to run blindly as it will recurse into subdirectories unless you set -maxdepth.

[root@web01 ~]# cd /path/to/directory
[root@web01 ~]# find . -type f -perm 0777 -print -exec chmod 644 {} \;

Find Directories with 777 Permissions and change to 755

Use caution with this; it is generally not smart to run blindly as it will recurse into subdirectories unless you set -maxdepth.

[root@web01 ~]# cd /path/to/directory
[root@web01 ~]# find . -type d -perm 777 -print -exec chmod 755 {} \;

Find empty directories

[root@web01 ~]# cd /path/to/directory
[root@web01 ~]# find . -type d -empty

Find all hidden files within a directory

[root@web01 ~]# find /path/to/directory -type f -name ".*"

Find files owned by user or group

[root@web01 ~]# cd /path/to/directory
[root@web01 ~]# find . -user apache
[root@web01 ~]# find . -group apache

Find files that were modified in the last 30 days

[root@web01 ~]# find / -mtime -30

Find files that were modified in the last hour

[root@web01 ~]# find / -mmin -60

Find files that were changed within the last hour
Note, this one is specified in minutes only!

[root@web01 ~]# find / -cmin -60

Find files that were accessed in the last 5 days

[root@web01 ~]# find / -atime -5

Find files that were accessed within the last hour
Note, this one is specified in minutes only!

[root@web01 ~]# find / -amin -60

Count files per directory with find
This one is useful when you need to find the top 10 directories that contain the most files.

[root@web01 ~]# vim count-files-per-directory.sh
#!/bin/bash

if [ $# -ne 1 ];then
  echo "Usage: `basename $0` DIRECTORY"
  exit 1
fi

echo "Please wait..."

find "$@" -type d -print0 2>/dev/null | while IFS= read -r -d '' file; do 
    echo -e `ls -A "$file" 2>/dev/null | wc -l` "files in:\t $file"
done | sort -nr | head | awk '{print NR".", "\t", $0}'

exit 0

Now run it against the / directory:

[root@web01 ~]# bash count-files-per-directory.sh /
Please wait...
1. 	 768 files in:	 /usr/share/man/man1
2. 	 631 files in:	 /usr/lib64/python2.6
3. 	 575 files in:	 /usr/share/locale
4. 	 566 files in:	 /usr/share/vim/vim74/syntax
5. 	 496 files in:	 /usr/bin
6. 	 487 files in:	 /usr/share/man/man8
7. 	 393 files in:	 /usr/share/perl5/unicore/lib/gc_sc
8. 	 380 files in:	 /usr/include/linux
9. 	 354 files in:	 /usr/lib64/python2.6/encodings
10. 	 334 files in:	 /usr/share/man/man3

Or if you only need to run the search in a specific directory:

[root@web01 ~]# bash count-files-per-directory.sh /usr/share/man
Please wait...
1. 	 768 files in:	 /usr/share/man/man1
2. 	 487 files in:	 /usr/share/man/man8
3. 	 334 files in:	 /usr/share/man/man3
4. 	 124 files in:	 /usr/share/man/man5
5. 	 49 files in:	 /usr/share/man
6. 	 35 files in:	 /usr/share/man/ru/man8
7. 	 31 files in:	 /usr/share/man/man7
8. 	 27 files in:	 /usr/share/man/fr/man8
9. 	 25 files in:	 /usr/share/man/de/man8
10. 	 22 files in:	 /usr/share/man/ja/man8

Rolling back yum transactions

Ever had the system update a package, which winds up breaking the most random things? How can you roll back? How can you prevent that same buggy package from updating itself again the next time the system checks for updates, yet still get newer versions of that package when they are released?

I ran across something like this recently. The symptom was that PHPMyAdmin was no longer working on this LAMP server. In short, it was found that an Apache update was to blame, which was found in this bug report: https://bz.apache.org/bugzilla/show_bug.cgi?id=61202

So how can the update to Apache be rolled back? First, try to confirm that Apache was indeed updated recently:

[root@web01 ~]# tail /var/log/yum.log
Jul 08 04:23:49 Updated: httpd24u-filesystem-2.4.26-1.ius.centos6.noarch
Jul 08 04:23:49 Updated: httpd24u-tools-2.4.26-1.ius.centos6.x86_64
Jul 08 04:23:50 Updated: httpd24u-2.4.26-1.ius.centos6.x86_64
Jul 08 04:23:50 Updated: 1:httpd24u-mod_ssl-2.4.26-1.ius.centos6.x86_64

Now find the transaction ID within yum by running:

[root@web01 ~]# yum history
ID     | Login user               | Date and time    | Action(s)      | Altered
-------------------------------------------------------------------------------
   220 | root               | 2017-07-08 04:23 | Update         |    4

View the details of this transaction by running:

[root@web01 ~]# yum history info 220
...
Transaction performed with:
    Installed     rpm-4.8.0-55.el6.x86_64                       @centos6-x86_64
    Installed     yum-3.2.29-81.el6.centos.noarch               @centos6-x86_64
    Installed     yum-metadata-parser-1.1.2-16.el6.x86_64       @anaconda-CentOS-201410241409.x86_64/6.6
    Installed     yum-plugin-fastestmirror-1.1.30-40.el6.noarch @centos6-x86_64
    Installed     yum-rhn-plugin-2.4.6-1.el6.noarch             @spacewalk
Packages Altered:
    Updated httpd24u-2.4.25-4.ius.centos6.x86_64            @rackspace-centos6-x86_64-ius
    Update           2.4.26-1.ius.centos6.x86_64            @rackspace-centos6-x86_64-ius
    Updated httpd24u-filesystem-2.4.25-4.ius.centos6.noarch @rackspace-centos6-x86_64-ius
    Update                      2.4.26-1.ius.centos6.noarch @rackspace-centos6-x86_64-ius
    Updated httpd24u-mod_ssl-1:2.4.25-4.ius.centos6.x86_64  @rackspace-centos6-x86_64-ius
    Update                   1:2.4.26-1.ius.centos6.x86_64  @rackspace-centos6-x86_64-ius
    Updated httpd24u-tools-2.4.25-4.ius.centos6.x86_64      @rackspace-centos6-x86_64-ius
    Update                 2.4.26-1.ius.centos6.x86_64      @rackspace-centos6-x86_64-ius
history info
...

To roll back the updates, getting us back to Apache 2.4.25 in this case, simply undo the transaction by running:

[root@web01 ~]# yum history undo 220

Then confirm Apache is back to the previous version 2.4.25:

[root@web01 ~]# rpm -qa |grep -i httpd24u
httpd24u-filesystem-2.4.25-4.ius.centos6.noarch
httpd24u-2.4.25-4.ius.centos6.x86_64
httpd24u-mod_ssl-2.4.25-4.ius.centos6.x86_64
httpd24u-tools-2.4.25-4.ius.centos6.x86_64

Next, restart Apache so the changes take effect:

[root@web01 ~]# service httpd restart

Finally, exclude the buggy packages from ever being installed again. In this example, Apache 2.4.26 will never be installed; however, any newer versions released after that will install/update normally.

[root@web01 ~]# yum install yum-plugin-versionlock
[root@web01 ~]# yum versionlock add! httpd24u-mod_ssl-2.4.26-1.ius.centos6.x86_64 httpd24u-2.4.26-1.ius.centos6.x86_64 httpd24u-tools-2.4.26-1.ius.centos6.x86_64 httpd24u-filesystem-2.4.26-1.ius.centos6.noarch
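
If you later want to review or remove the lock, the versionlock plugin also has list and delete subcommands. The delete entry below is only a guess at the format; copy the exact entry string from the list output on your system:

[root@web01 ~]# yum versionlock list
[root@web01 ~]# yum versionlock delete '0:httpd24u-2.4.26-1.ius.centos6.*'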