Server monitoring script

Without an agent-based monitoring system, keeping an eye on your servers' internals such as CPU, memory, storage, and processes becomes very difficult without manually checking. There are many reputable monitoring services on the web such as Pingdom (www.pingdom.com), and most hosting providers offer a monitoring system, but they do not provide an agent. Therefore, you can only perform basic external checks such as ping, port, and HTTP content checks. There is no way to report whether your MySQL replication has failed, a critical process has stopped running, or you're about to max out your / partition.

This simple bash script, located on GitHub, is meant to complement these types of monitoring services. Just drop the script onto the server, configure a few options and thresholds, point the status page at a web-accessible directory, and set up a URL content check that looks at the status page for the string 'OK'. Then you can rest easy at night knowing your monitoring service will alert you if any of the script's conditions are triggered.

Security note: To avoid revealing information about your system, it is strongly recommended that you place this and all web-based monitoring scripts behind an .htaccess file with authentication, whitelisting your monitoring servers' IP addresses if they are known.

Features

– Memory Check
– Swap Check
– Load Check
– Storage Check
– Process Check
– Replication Check

Configuration

The currently configurable options and thresholds are listed below:

 # Status page
status_page=/var/www/system-health-check.html

# Enable / Disable Checks
memory_check=off
swap_check=on
load_check=on
storage_check=on
process_check=on
replication_check=off

# Configure partitions for storage check
partitions=( / )

# Configure process(es) to check
process_names=( httpd mysqld postfix )

# Configure Thresholds
memory_threshold=99
swap_threshold=80
load_threshold=10
storage_threshold=80

Implementation

Download script to desired directory and set it to be executable:

 cd /root
git clone https://github.com/stephenlang/system-health-check
chmod 755 system-health-check/system-health-check.sh

After configuring the tunables in the script (see above), create a cron job to execute the script every 5 minutes:

 crontab -e
*/5 * * * * /root/system-health-check/system-health-check.sh

Now configure a URL content check with your monitoring provider's tools to check the status page, searching for the string "OK". Below are two examples:

 http://1.1.1.1/system-health-check.html
http://www.example.com/system-health-check.html
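
Before handing the URL to your monitoring provider, you can verify the check manually. A minimal sketch using curl, assuming the example URL above:

# Fetch the status page and confirm it contains the string OK
curl -s http://www.example.com/system-health-check.html | grep "OK"

If nothing is printed, the page is either not reachable or one of the checks is in alarm.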

Testing

It is critical that you test this monitoring script before you rely on it. Bugs always exist somewhere, so test this before you implement it on your production systems! Here are some basic ways to test:

1. Configure all the thresholds really low so they will trigger an alarm. Manually run the script (see the example after this list) or wait for the cron job to fire it off, then check the status page to see if it reports that your checks are now in alarm.

2. To test the process monitoring (assuming the system is not in production), configure the processes you want the script to check, stop the process you are testing, and check the status page after the script runs to see if it reports that your process is not running.

3. To test the replication monitoring (assuming the system is not in production), log onto your MySQL slave server and run 'stop slave;'. Then check the status page after the script runs to see if it reports a replication error.
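
For the manual run in step 1, a quick way to exercise the script and inspect the result (assuming the install path and status page location used earlier in this post) is:

# Run the check by hand, then view the status page it wrote
/root/system-health-check/system-health-check.sh
cat /var/www/system-health-check.html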

Resetting a MySQL root password

Resetting a forgotten MySQL root password is a pretty straightforward task to complete, assuming you have sudo or root access to the server. It is important to note that while performing this procedure, MySQL will be down until you complete everything, so be sure to do this during a time when it will not impact your business.

First, stop the MySQL service

service mysqld stop # RHEL clones
service mysql stop # Ubuntu / Debian distro

Now it's time to bring MySQL back up in safe mode. This means we will start MySQL, but we are simply skipping the user privileges table:

sudo mysqld_safe --skip-grant-tables &

Time to log into MySQL and switch to the MySQL database:

mysql -uroot
use mysql;

Now reset the root password and flush privileges:

update user set password=PASSWORD("enternewpasswordhere") where User='root';
flush privileges;
quit

Once you have completed that, it's time to take MySQL out of safe mode and start it back up normally. NOTE: Be sure MySQL is fully stopped before trying to start the service again. You may need to kill -9 the process if it's being stubborn.

Stop MySQL:

service mysqld stop # RHEL clones
service mysql stop # Ubuntu / Debian distro

Verify that the MySQL processes are no longer running. If they are, you may have to kill -9 them:

ps -waux |grep mysql
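
If the ps output still shows mysqld_safe or mysqld running, the kill -9 mentioned above is the last resort. A hedged sketch, substituting the PIDs reported by ps:

# Force-kill any leftover MySQL processes (replace the placeholders with the PIDs shown by ps)
kill -9 <mysqld_safe_pid> <mysqld_pid>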

Start MySQL back up:

service mysqld start # RHEL clones
service mysql start # Ubuntu / Debian distro

Finally, test logging in to ensure it's now working properly:

mysql -uroot -p

Keeping multiple web servers in sync with rsync

People looking to build a load balanced web server solution often ask how they can keep their web servers in sync with each other. There are many ways to go about this: NFS, lsyncd, rsync, etc. This guide will discuss a technique using rsync that runs from a cron job every 10 minutes.

There are two different options presented: pulling the updates from the master web server, and pushing the updates from the master web server down to the slave web servers.

Our example will consist of the following servers:

web01.example.com (192.168.1.1) # Master Web Server
web02.example.com (192.168.1.2) # Slave Web Server
web03.example.com (192.168.1.3) # Slave Web Server

Our master web server is going to be the single point of truth for the web content of our domain. Therefore, the web developers will only modify content on the master web server, and rsync will handle keeping all the slave nodes in sync.

There are a few prerequisites that must be in place:
1. Confirm that rsync is installed.
2. If pulling updates from the master web server, all slave servers must be able to SSH to the master server using an SSH key with no passphrase.
3. If pushing updates from the master down to the slave servers, the master server must be able to SSH to the slave web servers using an SSH key with no passphrase.

To be proactive about monitoring the status of the rsync job, both scripts posted below allow you to perform an HTTP content check against a status file to see if the string "SUCCESS" exists. If something other than SUCCESS is found, the rsync script may have failed and should be investigated. An example URL to monitor would be: 192.168.1.1/datasync.status

Please note that the assumption is being made that your web server serves files placed in /var/www/html/. If not, please update the $status variable accordingly.

Using rsync to pull changes from the master web server:

This is especially useful if you are in a cloud environment and scale your environment by snapshotting an existing slave web server to provision a new one. When the new slave web server comes online, assuming it already has the SSH key in place, it will automatically grab the latest content from the master server with no interaction needed from you, except to test it and then enable it in your load balancer.

The disadvantage of the pull method comes into play when you have multiple slave web servers all running the rsync job at the same time. This can put a strain on the master web server's CPU, which can cause performance degradation. However, if you have fewer than 10 servers, or if your site does not have a lot of content, the pull method should work fine.

Below will show the procedure for setting this up:

1. Create SSH keys on each slave web server:

ssh-keygen -t dsa

2. Now copy the public key generated on the slave web server (/root/.ssh/id_dsa.pub) and append it to the master web server's /root/.ssh/authorized_keys2 file.
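
If your distribution ships ssh-copy-id, it can handle copying the key for you. Note that it appends to /root/.ssh/authorized_keys rather than authorized_keys2, which recent OpenSSH versions read by default, so adjust to whichever file your sshd is configured to use:

# On each slave web server, push the public key to the master
ssh-copy-id -i /root/.ssh/id_dsa.pub root@192.168.1.1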

3. Test SSH'ing in as root from the slave web server to the master web server:
# On web02

ssh root@192.168.1.1

4. Assuming you were able to log in to the master web server cleanly, it's time to create the rsync script on each slave web server. Please note that I am assuming your sites' document roots are stored in /var/www/vhosts. If not, please change the script accordingly and test!

mkdir -p /opt/scripts/
vi /opt/scripts/pull-datasync.sh

#!/bin/bash
# pull-datasync.sh : Pull site updates down from master to front end web servers via rsync

status="/var/www/html/datasync.status"

if [ -d /tmp/.rsync.lock ]; then
  echo "FAILURE : rsync lock exists : Perhaps there is a lot of new data to pull from the master server. Will retry shortly" > $status
  exit 1
fi

/bin/mkdir /tmp/.rsync.lock

if [ $? -ne 0 ]; then
  echo "FAILURE : can not create lock" > $status
  exit 1
else
  echo "SUCCESS : created lock" > $status
fi

echo "===== Beginning rsync ====="

nice -n 19 /usr/bin/rsync -axvz --delete -e ssh root@192.168.1.1:/var/www/vhosts/ /var/www/vhosts/

if [ $? -ne 0 ]; then
  echo "FAILURE : rsync failed. Please refer to solution documentation" > $status
  exit 1
fi

echo "===== Completed rsync ====="

/bin/rm -rf /tmp/.rsync.lock
echo "SUCCESS : rsync completed successfully" > $status

Be sure to set executable permissions on this script so cron can run it:

chmod 755 /opt/scripts/pull-datasync.sh
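
Before wiring the script into cron, it is worth a manual test run on one slave to confirm the status file gets written as expected (paths as assumed above):

# Manual test run, then inspect the status file
/opt/scripts/pull-datasync.sh
cat /var/www/html/datasync.status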

Using rsync to push changes from the master web server down to slave web servers:

Using rsync to push changes from the master down to the slaves also has some important advantages. First, the slave web servers do not need SSH access to the master server; this could become critical if one of the slave servers is ever compromised and tries to gain access to the master web server. The other advantage is that the push method does not create a serious CPU strain, because the master runs rsync against the slave servers one at a time.

The disadvantage here is when you have a lot of web servers syncing content that changes often. It's possible that your updates will not be pushed down to the web servers as quickly as expected, since the master server syncs the servers one at a time. So be sure to test this out to see if the results work for your solution. Also, if you are cloning your servers to create additional web servers, you will need to update the rsync configuration accordingly to include the new node.

Below will show the procedure for setting this up:

1. To make administration easier, it's recommended to set up the /etc/hosts file on the master web server with a list of all the servers' hostnames and internal IPs.

vi /etc/hosts
192.168.1.1 web01 web01.example.com
192.168.1.2 web02 web02.example.com
192.168.1.3 web03 web03.example.com

2. Create SSH keys on the master web server:

ssh-keygen -t dsa

3. Now copy the public key generated on the master web server (/root/.ssh/id_dsa.pub) and append it to each slave web server's /root/.ssh/authorized_keys2 file.

4. Test SSH'ing in as root from the master web server to each slave web server:
# On web01

ssh root@web02

5. Assuming you were able to log in to the slave web servers cleanly, it's time to create the rsync script on the master web server. Please note that I am assuming your sites' document roots are stored in /var/www/vhosts. If not, please change the script accordingly and test!

mkdir -p /opt/scripts/
vi /opt/scripts/push-datasync.sh

#!/bin/bash
# push-datasync.sh - Push site updates from master server to front end web servers via rsync

webservers=(web02 web03)   # slave web servers from the example environment above
status="/var/www/html/datasync.status"

if [ -d /tmp/.rsync.lock ]; then
  echo "FAILURE : rsync lock exists : Perhaps there is a lot of new data to push to front end web servers. Will retry soon." > $status
  exit 1
fi

/bin/mkdir /tmp/.rsync.lock

if [ $? -ne 0 ]; then
  echo "FAILURE : can not create lock" > $status
  exit 1
else
  echo "SUCCESS : created lock" > $status
fi

for i in "${webservers[@]}"; do

  echo "===== Beginning rsync of $i ====="

  nice -n 19 /usr/bin/rsync -avzx --delete -e ssh /var/www/vhosts/ root@$i:/var/www/vhosts/

  if [ $? -ne 0 ]; then
    echo "FAILURE : rsync failed. Please refer to the solution documentation" > $status
    exit 1
  fi

  echo "===== Completed rsync of $i ====="
done

/bin/rm -rf /tmp/.rsync.lock
echo "SUCCESS : rsync completed successfully" > $status

Be sure to set executable permissions on this script so cron can run it:

chmod 755 /opt/scripts/push-datasync.sh

Now that you have the script in place and tested, it's time to set it up to run automatically via cron. For the example here, I am setting up cron to run this script every 10 minutes.

If using the push method, put the following into the master web server's crontab:

crontab -e
# Datasync script
*/10 * * * * /opt/scripts/push-datasync.sh

If using the pull method, put the following into each slave web server's crontab:

crontab -e
# Datasync script
*/10 * * * * /opt/scripts/pull-datasync.sh

Dirty disks

Many companies these days make use of hosting providers to house their critical compute infrastructure. There are many financial benefits to doing this, as the cost of running a data center far exceeds the cost of leasing servers from a hosting provider.

But as with many things, there is another side of the story that must be considered: your environment's security. Dedicated and cloud hosting providers reuse their hardware often, as clients may lease the hardware for only a short time. The question you must ask yourself is: what do these providers do with the hard drives once a server has been removed from a client's account, before they lease that server to another client? Do they have documentation proving their compliance with the DoD 5220.22-M standard for secure data removal?

Just because an operating system has been reinstalled does not mean the data is permanently wiped. In theory, one could argue that your data is never fully removed from a drive unless you physically destroy the drive, which can be an expensive operation, but the DoD 5220.22-M standard provides a set of reasonable guidelines for rendering the data unrecoverable in most situations.
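
If you ever need to sanitize a drive yourself before handing leased hardware back, a multi-pass overwrite is one common approach. A minimal sketch, assuming the target disk is /dev/sdX, is unmounted, and you have triple-checked the device name (this is destructive):

# Overwrite the whole disk with three random passes, then a final pass of zeros
shred -v -n 3 -z /dev/sdX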

Extracting data from a recycled server is not all that complicated for your average system administrator. In fact, using one technique, it's a simple one-liner using dd and strings. So the next time you are curious about your hosting provider's security practices, try running the following on your server or cloud server and see what fragments of data you can recover:

dd if=/dev/sda1 bs=1M skip=5000 | strings

Review the output. If you see data here from other users, the disks are dirty and that hosting provider is not properly sanitizing the hard drives before leasing the server to another client. This has huge implications, especially if you need to comply with security standards such as HIPAA, FISMA, PCI DSS, or any of the other standards out there.

So the moral of the story is: always do your homework before choosing a dedicated or cloud hosting provider. Ask about their security procedures; ask them about their PCI, HIPAA, FISMA, ISO 27001, SSAE 16, and similar attestations. Never take anything at face value. Ask them for their report on compliance.

When it comes to your security stance, paranoia is your greatest defense.

Using AIDE for file integrity monitoring (FIM) on CentOS

PCI-DSS 3.1 section 10.5.5 has the following requirement:

Use file-integrity monitoring or change-detection software on logs to ensure that existing log data cannot be changed without generating alerts (although new data being added should not cause an alert).

For large solutions, I would suggest using a well-known tool such as Tripwire Enterprise. However, many small to mid-size companies that have a small footprint within their cardholder data environment (CDE) may not be able to afford this. So what can these companies use to meet this requirement? Implement AIDE (Advanced Intrusion Detection Environment).

Taken from the project's website: "AIDE creates a database from the regular expression rules that it finds from the config file(s). Once this database is initialized it can be used to verify the integrity of the files."

AIDE is a very simple (yet powerful) program that runs from cron, checking your files (typically once a night), and it will scan your system looking for any changes in the directories it is monitoring. There are a number of different ways to use this program, but I'll outline one that I like to use.

My requirements:
1. I want the reports to run nightly.
2. All change reports are emailed to me so I can archive them for a year offsite.
3. Have the database automatically commit the additions, deletions, and changes to the baseline each time it is run.

In the event my system is compromised, I want to ensure the malicious user is not able to modify or delete my previous reports, so I choose not to store them on the machine. It's true that once a malicious user gains access to my system, they could change my AIDE config on me, but at least my previous reports will be intact, which should help me determine what malicious changes were made to my server. Please note that I am making an assumption here that you are already backing up your system nightly, which would include your AIDE database! If you do not currently have a backup strategy in place, get one. Tools such as AIDE help identify which files a malicious user may have changed, but if they completely crippled the system, you will need to restore from backups.

Setting up AIDE is fairly straightforward. It exists in most package repositories out there, including those for most variants of Linux and BSD.

On CentOS and other Red Hat based systems, you can install it with:

[root@web01 ~]# yum install aide

Once you have AIDE installed, the default configuration from the upstream provider should give you a reasonable aide.conf. But what if you want to add your website's document root so you can keep track of which files are changing on your website? We simply add the directory to aide.conf by including:

[root@web01 ~]# vim /etc/aide.conf
...
/var/www/vhosts/domain.com
...

Now AIDE will keep track of our website. But adding your site may lead to very noisy reports, because most websites implement caching. So this becomes a balancing act: exclude directories that change often, yet retain enough of your site's critical content. We could just leave the entire directory in AIDE, but I know I personally don't want to read a change report that contains 1,000 changes every day. So in the case of this WordPress site, I exclude the cache directory by appending the following to my custom configuration:

[root@web01 ~]# vim /etc/aide.conf
...
/var/www/vhosts/domain.com
!/var/www/vhosts/domain.com/web/wp-content/cache
...

The “!” means NOT to monitor that specific directory. You will need to run AIDE a few times and fine tune the configuration before you get a report that is useful for your specific needs.
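
Many AIDE builds can also sanity check the configuration before you rebuild the database; if your version supports the option, it is a quick way to catch typos after editing aide.conf:

[root@web01 ~]# aide --config-check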

On CentOS, I had to change the following setting in /etc/aide.conf for the initialization to work:

[root@web01 ~]# vim /etc/aide.conf
...
# Whether to gzip the output to database
gzip_dbout=no
...

Once you have your configuration tuned for your specific purposes, you must first initialize the database to create a baseline before you can start getting change reports. You do this by:

[root@web01 ~]# aide --init
[root@web01 ~]# mv -f /var/lib/aide/aide.db.new /var/lib/aide/aide.db

Now, try making a basic change to /etc/hosts, then run a check on AIDE to see if it detects the change:

[root@web01 ~]# aide --check

If you are like me and would prefer not to log into 10 servers a day to run and view the reports, you can configure cron to run the report and email you the results daily, while committing the changes to the baseline. If you choose to go this route, it is critical that you review your change reports as they come in, because we are essentially committing every change to the baseline. Here is how I configure cron:

[root@web01 ~]# crontab -e
# Perform daily change report
0 3 * * * /usr/sbin/aide --update | mail -s "AIDE Audit Report : web01.example.com" [email protected]

# Initialize the AIDE database once a day:
30 3 * * *  nice -19 /usr/sbin/aide --init;mv -f /var/lib/aide/aide.db.new /var/lib/aide/aide.db

Posted below is an example report that AIDE would send me via email daily:

AIDE found differences between database and filesystem!!
Start timestamp: 2012-09-13 01:24:05

Summary:
Total number of files: 57620
Added files: 1
Removed files: 1
Changed files: 1

---------------------------------------------------
Added files:
---------------------------------------------------

added: /var/spool/cron/root

---------------------------------------------------
Removed files:
---------------------------------------------------

removed: /etc/.aide.conf.swp

---------------------------------------------------
Changed files:
---------------------------------------------------

changed: /etc/aide.conf

--------------------------------------------------
Detailed information about changes:
---------------------------------------------------

File: /etc/aide.conf
Size : 2381 , 2390
Mtime : 2012-09-13 01:24:05 , 2012-09-13 01:24:05
Ctime : 2012-09-13 01:24:05 , 2012-09-13 01:24:05
MD5 : b+qbBDYEPesd+NCR1VRQHQ== , rG5pNPghdweedpU/c0ieHw==
RMD160 : T081ixhqik4efC3dfeCOBDCKpP4= ,
qe8MV0eteklAKmlZ5LTubaOUNKo=
SHA256 : g4jstEtfU8BNu+43jkrxJc9Cpr2SABZj ,
a65iaV54XR4vu8/zbA4Tdfe2U+W5uPNY

So this report tells me that root's crontab was added, a swap file for aide.conf was removed, and /etc/aide.conf was recently changed.

Please remember that utilizing a tool to provide file integrity monitoring is only one part of a defense in depth strategy. There is no silver bullet for system security, but every layer you add will increase your security footprint which helps you with taking a proactive approach to security.

Using logrotate for custom log directories with wildcards

Logrotate is a useful application for automatically rotating your log files. If you choose to store certain logs in directories that logrotate doesn’t know about, you need to create a definition for this.

In the example listed near the bottom of this post, we show how to rotate logs stored in /home/sites/logs. These are Apache access and error logs, plus a third type of log file that ends in .log.

If you ran through this quickly, you might simply create a definition with a wildcard, such as:
/home/sites/logs/*

Unfortunately, logrotate will read this literally and will also rotate compressed log files that were already rotated, leaving you with a mess in your log directories like this:

/home/sites/logs/http-access.log.1.gz
/home/sites/logs/http-access.log.1.gz.1
/home/sites/logs/http-access.log.1.gz.1.1
/home/sites/logs/http-access.log.1.gz.1.1.1
/home/sites/logs/http-access.log.1.gz.1.1.1.1

And it just goes downhill from there. This exact thing happened to me because I forgot to read the man page, which clearly states:

 Please use wildcards with caution. If you specify *, log rotate will
rotate all files, including previously rotated ones. A way around this
is to use the olddir directive or a more exact wildcard (such as
*.log).

So wildcards are still acceptable, but use them with caution. As I had three types of files to rotate in this example, I found that I could string them together as follows:

/home/sites/logs/*.error /home/sites/logs/*.access /home/sites/logs/*.log

I'll post below the example configuration that was implemented on a FreeBSD server for rotating this custom log directory:

cat /usr/local/etc/logrotate.conf
# see "man log rotate" for details
# rotate log files weekly
daily

# keep 4 weeks worth of backlogs
rotate 30

# create new (empty) log files after rotating old ones
create

# uncomment this if you want your log files compressed
compress

# RPM packages drop log rotation information into this directory
# include /usr/local/etc/logrotate.d

# system-specific logs may be configured here
/home/sites/logs/*.error
/home/sites/logs/*.access
/home/sites/logs/*.log {
daily
rotate 30
sharedscripts
postrotate
/usr/local/etc/rc.d/apache22 reload > /dev/null 2>/dev/null
endscript
}
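
Before forcing any rotations, a debug run is a safe way to see exactly which files logrotate would act on without changing anything:

logrotate -d /usr/local/etc/logrotate.conf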

Now let's test this out to confirm the logs will rotate properly by forcing a few rotations:

logrotate -f /usr/local/etc/logrotate.conf
logrotate -f /usr/local/etc/logrotate.conf
logrotate -f /usr/local/etc/logrotate.conf

When you check your logs directory, you should now see the files rotating out properly:

/home/sites/logs/http-access.log
/home/sites/logs/http-access.log.1.gz
/home/sites/logs/http-access.log.2.gz
/home/sites/logs/http-access.log.3.gz

Full Server Rsync Migrations

There are times when you need to do a one-to-one server migration. This can seem like a daunting task; however, with the magic of rsync, a full server migration can be a fairly painless task requiring little downtime.

Prerequisites

On both the old and new servers, you want to ensure the following requirements are met:

1. Confirm both the old and new servers are using the same hardware architecture. You cannot perform an rsync migration if one server is 32-bit and the other is 64-bit. This can be verified by running the following command on both servers and checking whether it reports "i386" (32-bit) or "x86_64" (64-bit):

uname -a
Linux demo 2.6.18-308.el5xen #1 SMP Tue Feb 21 20:47:10 EST 2012 x86_64 x86_64 x86_64 GNU/Linux

So in our example, I have verified that both the old and new servers are 64-bit.

2. Confirm they are both running the exact same version of the operating system. Normally this means simply confirming both servers are at the latest patch level, which you can do by running:

yum update
cat /etc/redhat-release
CentOS release 5.8 (Final)

For this article, I will be using two servers that are running CentOS 5.8 (Final).

3. Confirm rsync is installed on both servers

yum install rsync

4. Clean up the server before migration. Depending on the number of files on your server, the initial rsync can take quite a while, so browse through your old server and remove any extraneous temporary, cache, or other large files that you no longer need. You should also check your logs to ensure their sizes are reasonable, and archive or delete the older logs you no longer need.

5. If you are not going to swap IPs and will simply update DNS to point to the new server, confirm that all your domains on the old server have a very low TTL set in the zone file. A TTL of 300 is usually considered the lowest acceptable value to set.
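
You can confirm the TTL currently being served for a record with dig; the second column of the answer section is the TTL in seconds (www.example.com here is just a placeholder for your own record):

dig www.example.com A +noall +answer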

Begin server migration

The procedure I'll be writing out below is a two-step process. It is meant to help minimize the amount of downtime involved when you swap the IPs or update DNS, assuming you have a low TTL set. The steps are below:
1. Perform initial rsync
2. Perform final rsync and ip swap (or DNS update)

The initial rsync is just used to get the majority of the static files over to the new server. The final rsync is meant to update anything that is dynamic, such as logs, updated web content, databases, etc.

So before we begin, you will want to create an excludes file on the old server. This file tells rsync NOT to copy over system-specific information that is not needed on the new system.

vi /root/rsync-exclude.txt
/boot
/proc
/sys
/tmp
/dev
/var/lock
/etc/fstab
/etc/mdadm.conf
/etc/mtab
/etc/resolv.conf
/etc/conf.d/net
/etc/network/interfaces
/etc/networks
/etc/sysconfig/network*
/etc/sysconfig/hwconf
/etc/sysconfig/ip6tables-config
/etc/sysconfig/kernel
/etc/hostname
/etc/HOSTNAME
/etc/hosts
/etc/modprobe*
/etc/modules
/etc/udev
/net
/lib/modules
/etc/rc.conf
/lock
/etc/ssh/ssh_host_rsa_key.pub
/etc/ssh/ssh_host_rsa_key
/etc/ssh/ssh_host_dsa_key.pub
/etc/ssh/ssh_host_dsa_key
/etc/network.d
/etc/network/*
/etc/machine-id
/usr/share/nova-agent*
/usr/sbin/nova-agent*
/etc/rc.d/init.d/nova-agent*
/etc/init.d/nova-agent*
/etc/rackspace-monitoring-agent*
/etc/rackspace
/etc/driveclient/bootstrap.json

The example above should cover a couple of different distros, but always review someone else's excludes list before applying it to your own systems!

Now that we have the proper excludes in place, let's run a dry run of rsync to see what would happen before we actually do it. Please note that this is run on the old server. Replace xxx.xxx.xxx.xxx with the IP of the new server:

rsync -azPx -e ssh --dry-run --delete-after --exclude-from="/root/rsync-exclude.txt" / root@xxx.xxx.xxx.xxx:/

If all looks well, let's perform the actual initial rsync:

rsync -azPx -e ssh --delete-after --exclude-from="/root/rsync-exclude.txt" / root@xxx.xxx.xxx.xxx:/

Depending on how much data you have, this could take a few minutes or many hours. Once this is complete, you will want to schedule a maintenance window to perform the final rsync and IP swap (or DNS update). You want to perform this during a maintenance window because you will need to stop any database services or anything else with very dynamic data to avoid corruption. In this example I have a basic LAMP server, so I will need to shut down MySQL before performing the final rsync. Here are the steps I'll be using:
1. Stop MySQL on old server
2. Perform final rsync
3. On new server, reboot server and test everything
4. Swap IP from old server to new, or update your DNS accordingly.

On the old server:

service mysqld stop
rsync -azPx -e ssh --delete-after --exclude-from="/root/rsync-exclude.txt" / root@xxx.xxx.xxx.xxx:/

Now we are ready to start testing our new server.

Testing And Go-Live

Let's wave that dead chicken over the altar; it's time to see if your new server survives a reboot and all the services come back online properly. There are no guides for this. Just reboot your server, then test out your sites and databases, and keep a wary eye on your system logs.

Once you have confirmed that everything looks good, it is safe to swap the IPs or update DNS accordingly. In the event a problem surfaces shortly after the migration, you always have the option of rolling back to your old server, assuming you won't be losing any critical transactions.

Using PAM to enforce access based on time

Sometimes there is a need to restrict user access based on time. This could be access to one particular service, or to all PAM-enabled services. A common example is only allowing access for the user 'bob' Monday through Friday between 9:00 AM and 6:00 PM. This can be enforced by utilizing the pam_time module.

The pam_time module is an account module type. No arguments are passed directly to the module, but instead all configuration takes place within /etc/security/time.conf.

The time.conf operates based on rules, and each rule uses the following syntax:

services;ttys;users;times

Example Rules
Restrict SSHD access for bob to weekdays between 9:00AM – 7:00PM

sshd;*;bob;Wk0900-1900

Restrict ALL access for bob to weekdays between 9:00AM – 5:00PM

*;*;bob;Wk0900-1700

Restrict ALL access for ALL users except root to weekdays between 9:00AM – 5:00PM

*;*;!root;Wk0900-1700

Restrict SSH access for ALL users except bob and jane to weekdays between 9:00AM – 5:00PM

sshd;*;!bob & !jane;Wk0900-1700

To only allow bob to access SSH on Tuesdays between 3:23PM and 4:24PM:

sshd;*;bob;Tu1523-1624

Below are all the available abbreviations for the days of the week:

Mo : Monday      Tu : Tuesday     We : Wednesday    Th : Thursday
Fr : Friday      Sa : Saturday    Su : Sunday
Wk : Weekdays (Mo/Tu/We/Th/Fr)    Wd : Weekend (Sa/Su)    Al : All days

Installation And Configuration
In our example, I am going to be setting this up on a CentOS server. For the restricted user, the following variables will be used:

username: bob
allowed access times: 9:00AM - 6:00PM
restricted services: SSHD

First, add the user and time restriction to /etc/security/time.conf:

sshd;*;bob;Wk0900-1800

Now, update the PAM configuration for sshd and system-auth. The line you are adding to each is 'account required pam_time.so', but I'll post the entire files for reference:

cat /etc/pam.d/sshd
#%PAM-1.0
auth required pam_sepermit.so
auth include password-auth
account required pam_time.so
account required pam_nologin.so
account include password-auth
password include password-auth
# pam_selinux.so close should be the first session rule
session required pam_selinux.so close
session required pam_loginuid.so
# pam_selinux.so open should only be followed by sessions to be executed in the user context
session required pam_selinux.so open env_params
session optional pam_keyinit.so force revoke
session include password-auth

cat /etc/pam.d/system-auth
#%PAM-1.0
# This file is auto-generated.
# User changes will be destroyed the next time authconfig is run.
auth required pam_env.so
auth sufficient pam_unix.so nullok try_first_pass
auth requisite pam_succeed_if.so uid >= 500 quiet
auth required pam_deny.so
account required pam_time.so
account required pam_unix.so
account sufficient pam_localuser.so
account sufficient pam_succeed_if.so uid < 500 quiet
account required pam_permit.so
password requisite pam_cracklib.so try_first_pass retry=3 type=
password sufficient pam_unix.so md5 shadow nullok try_first_pass use_authtok
password required pam_deny.so
session optional pam_keyinit.so revoke
session required pam_limits.so
session [success=1 default=ignore] pam_succeed_if.so service in crond quiet use_uid
session required pam_unix.so

And finally, restart SSH

service sshd restart

Creating table indexes in MySQL

You may ask, what is a table index and how will it help performance? Table indexes provide MySQL a more efficient way to retrieve records. I often like to use the following example to explain it:

Imagine you have a phone book in front of you, and there are no letters in the top right corner that you can reference if you are looking up a last name. Therefore, you have to search page by page through hundreds of pages that have tens of thousands of results. Very inefficient and intensive. Think of this as a full table scan.

Now picture the phone book having the letter references in the top right corner. You can flip right to the section "La – Lf" and only have to search through a smaller result set. Finding the results is much faster and easier.

A common symptom where this logic can be applied is when you log onto a server and see MySQL frequently chewing up a lot of CPU time, either constantly or in spikes. The slow query log is also a great indicator, because if a query takes a long time to execute, chances are it was making MySQL work too hard performing full table scans.

The information below will provide you with the tools to help identify these inefficient queries and how to help speed them up.

There are 2 common ways to identify queries that are very inefficient and may be creating CPU contention issues:

View MySQL’s process list:

When entering the MySQL CLI, you will want to look for any queries that you see running often so you can evaluate them. You can see the running queries by:

mysql
show processlist;

View slow queries log:

To view this, first check to ensure the slow query log variables are enabled in my.cnf:

log-slow-queries=/var/lib/mysqllogs/slow-log
long_query_time=5
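
If you would rather confirm the settings on a running server instead of reading my.cnf (variable names vary a bit between MySQL versions, so treat this as a rough check), you can query the server variables from the MySQL CLI:

show variables like '%slow%';
show variables like 'long_query_time';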

Now, let's look at an example of a slow query that got logged. Please note these queries got logged here because they took longer to run than the maximum seconds defined by long_query_time:

# Time: 110404 22:45:25
# User@Host: wpadmin[wordpressdb] @ localhost []
# Query_time: 14.609104  Lock_time: 0.000054 Rows_sent: 4  Rows_examined: 83532
SET timestamp=1301957125;
SELECT * FROM wp_table WHERE `key`='5544dDSDFjjghhd2544xGFDE' AND `carrier`='13';

Here is a query that we know runs often and takes over 5 seconds to execute:

SELECT * FROM wp_table WHERE `key`='5544dDSDFjjghhd2544xGFDE' AND `carrier`='13';

Within the MySQL cli, run the following to view some more details about this query:

explain SELECT * FROM wp_table WHERE `key`='5544dDSDFjjghhd2544xGFDE' AND `carrier`='13';
+----+-------------+------------+------+---------------+------+---------+------+-------+-------------+
| id | select_type | table      | type | possible_keys | key  | key_len | ref  | rows  | Extra       |
+----+-------------+------------+------+---------------+------+---------+------+-------+-------------+
|  1 | SIMPLE      | wp_table   | ALL  | NULL          | NULL |    NULL | NULL | 83532 | Using where |
+----+-------------+------------+------+---------------+------+---------+------+-------+-------------+

The two important fields here are:

- type: When you see "ALL", MySQL is performing a full table scan, which is a very CPU-intensive operation.
- rows: This is the number of rows MySQL estimates it must examine to run the query, and sorting through 83,532 rows is a lot of work.

In general, when you are creating an index, you want to choose a field that has the highest number of unique values (the highest cardinality). In our case, we are going to use the field 'key' as shown below:

create index key_idx on wp_table(`key`);

Now, let's rerun our explain to see if the query now examines fewer rows:

explain SELECT * FROM wp_table WHERE `key`='5544dDSDFjjghhd2544xGFDE' AND `carrier`='13';
+----+-------------+------------+------+---------------+------+---------+------+-------+-------------+
| id | select_type | table      | type | possible_keys | key  | key_len | ref  | rows  | Extra       |
+----+-------------+------------+------+---------------+------+---------+------+-------+-------------+
|  1 | SIMPLE      | wp_table   | ALL  | NULL          | NULL |    NULL | NULL | 13    | Using where |
+----+-------------+------------+------+---------------+------+---------+------+-------+-------------+

This is much better. Now each time that common query runs, MySQL will only have to go through 13 rows, instead of it having to check through 83,000.

Important note: Each time a table is updated, MySQL has to update the indexes as well, which adds overhead. Therefore, it's recommended to keep the number of indexes per table low, perhaps in the 4-6 range.

How to see what indexes already exist on a table and their cardinality:

show indexes from wp_table;

How to remove a table index:

drop index key_idx on wp_table;

RCS – Introduction

When there are 40+ admins logging into a client's server, it can become difficult to keep track of who modified what. More importantly, in the event that a change creates an undesired result, you need to be able to find out exactly what was changed so it can be quickly rolled back. This also becomes a critical component of change control if the client has to meet specific security requirements such as PCI-DSS 2.0.

This system of revision control is much cleaner for tracking changes than creating a bunch of files like apache2.bak, apache2.20120212, apache2.conf.031212, etc. Instead, you can view all the available versions of a file simply by running:
rlog /etc/apache2/apache2.conf

RCS offers the following features in a very easy to use CLI:

- Store and retrieve multiple revisions of text
- Maintain a complete history of changes
- Maintain a tree of revisions
- Automatically identify each revision with name, revision number, creation time, author, etc
- And much more

For our specific use case, critical files to check into RCS would be configuration files such as /etc/sysctl.conf, /etc/ssh/sshd_config, /etc/vsftpd/vsftpd.conf, /etc/httpd/conf/httpd.conf, and other files of that nature.

If RCS is not already installed, then simply run the following depending on your operating system:

yum install rcs
apt-get install rcs
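
One optional housekeeping step: by default, ci stores the ,v history file next to the working file, but if an RCS subdirectory exists in that directory, the ,v files are kept there instead, which keeps configuration directories tidy:

# Optional: keep the ,v files for /etc/apache2 in a dedicated subdirectory
mkdir /etc/apache2/RCS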

Basic Use Case
The easiest way to learn RCS is to see it in action. So in the use case below, we are going to perform a series of changes to the apache2.conf file. Before making changes to the file, check it into RCS first so we have a starting point:

root@web01:/etc/apache2# ci -l -wjdoe /etc/apache2/apache2.conf
/etc/apache2/apache2.conf,v  <--  /etc/apache2/apache2.conf
enter description, terminated with single '.' or end of file:
NOTE: This is NOT the log message!
>> Original Apache Configuration File
>> .
initial revision: 1.1
done

Now we can make our change to the config. As an example, we are going to be making some tuning changes to Apache.

vi /etc/apache2/apache2.conf

Once our changes are made, we check the changes in:

root@web01:/etc/apache2# ci -l -wjdoe /etc/apache2/apache2.conf
/etc/apache2/apache2.conf,v  <--  /etc/apache2/apache2.conf
new revision: 1.2; previous revision: 1.1
enter log message, terminated with single '.' or end of file:
>> Tuning changes per ticket #123456
>> .
done

Pretend a few days go by and you receive a call from the client reporting issues with Apache. You log into the server and check to see if anyone recently made changes to Apache:

root@web01:/etc/apache2# rlog /etc/apache2/apache2.conf

RCS file: /etc/apache2/apache2.conf,v
Working file: /etc/apache2/apache2.conf
head: 1.2
branch:
locks: strict
        root: 1.2
access list:
symbolic names:
keyword substitution: kv
total revisions: 2;     selected revisions: 2
description:
Original Apache Configuration File
----------------------------
revision 1.2    locked by: root;
date: 2012/03/19 15:44:06;  author: jdoe;  state: Exp;  lines: +3 -3
Tuning changes per ticket #123456
----------------------------
revision 1.1
date: 2012/03/19 15:28:38;  author: jdoe;  state: Exp;
Initial revision
=============================================================================

So this tells us that user jdoe made changes to apache2.conf on 3/19/2012 per ticket #123456. Let's see what changes he made by comparing version 1.1 to version 1.2:

root@web01:/etc/apache2# rcsdiff -r1.1 -r1.2 /etc/apache2/apache2.conf
===================================================================
RCS file: /etc/apache2/apache2.conf,v
retrieving revision 1.1
retrieving revision 1.2
diff -r1.1 -r1.2
77c77
< KeepAlive On
---
> KeepAlive Off
105,106c105,106
<     MaxSpareServers      10
<     MaxClients          150
---
>     MaxSpareServers      1
>     MaxClients          15
root@web01:/etc/apache2#

From the looks of this, it appears he may have typoed the MaxClients and MaxSpareServers variables when working that ticket. So let's roll back the configuration file to version 1.1, since that was the last known working version:

root@web01:/etc/apache2# co -r1.1 /etc/apache2/apache2.conf
/etc/apache2/apache2.conf,v  -->  /etc/apache2/apache2.conf
revision 1.1
writable /etc/apache2/apache2.conf exists; remove it? [ny](n): y
done

Then test Apache to confirm everything is working again. Be sure to commit your changes as a rollback is still a change:

root@web01:/etc/apache2# ci -l -wmsmith /etc/apache2/apache2.conf
/etc/apache2/apache2.conf,v  <--  /etc/apache2/apache2.conf
new revision: 1.3; previous revision: 1.2
enter log message, terminated with single '.' or end of file:
>> Rolling back changes made in ticket #123456 due to problems
>> .
done

When the next person logs in to see what changes have been made to apache2.conf, they will see the following:

root@web01:/etc/apache2# rlog /etc/apache2/apache2.conf

RCS file: /etc/apache2/apache2.conf,v
Working file: /etc/apache2/apache2.conf
head: 1.3
branch:
locks: strict
        root: 1.3
access list:
symbolic names:
keyword substitution: kv
total revisions: 3;     selected revisions: 3
description:
Original Apache Configuration File
----------------------------
revision 1.3    locked by: root;
date: 2012/03/19 16:00:38;  author: msmith;  state: Exp;  lines: +3 -3
Rolling back changes made in ticket #123456 due to problems
----------------------------
revision 1.2
date: 2012/03/19 15:44:06;  author: jdoe;  state: Exp;  lines: +3 -3
Tuning changes per ticket #123456
----------------------------
revision 1.1
date: 2012/03/19 15:28:38;  author: jdoe;  state: Exp;
Initial revision
=============================================================================