Purging MySQL binary logs

Binary logs contain a record of all changes made to the database, both to data and to structure; they do not record simple SELECT statements. Binary logs are required for MySQL replication and are also useful for performing point-in-time recovery after a nightly database dump has been restored.

If binary logs are enabled in MySQL but are not being expired after a set number of days, they will eventually cause your server to run out of disk space. Typically I see the retention set to 5 days.
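
To see how much space the binary logs are currently consuming, you can total them up from the data directory. This is a quick sketch that assumes the default datadir of /var/lib/mysql and the default mysql-bin log basename; adjust both to match your my.cnf:

[root@db01 ~]# du -sch /var/lib/mysql/mysql-bin.*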

As a quick note before we begin: if the server is completely out of disk space, you may need to free up some space first so you can work with the system. You can temporarily free up space by lowering the filesystem's reserved block percentage from the default 5% (in the example below, to 2%):

[root@db01 ~]# df -h /
/dev/xvda1       79G   76G     0 100% /
[root@db01 ~]# tune2fs -m 2 /dev/xvda1
[root@db01 ~]# df -h /
/dev/xvda1       79G   76G  1.8G  98% /

To determine how many days' worth of logs you can purge, first identify whether this server is a MySQL master replicating to a slave MySQL server. If it is, log onto the slave server and see which binary log it is currently reading from:

[root@db02 ~]# mysql
mysql> show slave status\G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 10.xx.xx.xx
Master_User: replication
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: mysql-bin.000244
...

Based on that, we can safely purge the binary logs up to mysql-bin.000244. Back on the master server, check the binary log status:

[root@db01 ~]# mysql
mysql> show binary logs;
+------------------+------------+
| Log_name         | File_size  |
+------------------+------------+
| mysql-bin.000230 |  454230880 |
| mysql-bin.000231 |  511519497 |
| mysql-bin.000232 |  483552032 |
| mysql-bin.000233 |  472847181 |
| mysql-bin.000234 |  443236582 |
| mysql-bin.000235 |  408021824 |
| mysql-bin.000236 |  531519875 |
| mysql-bin.000237 |  468583798 |
| mysql-bin.000238 |  495423661 |
| mysql-bin.000239 |  474475274 |
| mysql-bin.000240 |  689162898 |
| mysql-bin.000241 | 1073749465 |
| mysql-bin.000242 |  939200512 |
| mysql-bin.000243 |  702552334 |
| mysql-bin.000244 |   40656827 |
+------------------+------------+

In my example, as I only want to keep 5 days' worth of binary logs, we can purge everything prior to mysql-bin.000240:

[root@db01 ~]# mysql
mysql> purge binary logs to 'mysql-bin.000240';
mysql> show binary logs;
+------------------+------------+
| Log_name         | File_size  |
+------------------+------------+
| mysql-bin.000240 |  689162898 |
| mysql-bin.000241 | 1073749465 |
| mysql-bin.000242 |  939200512 |
| mysql-bin.000243 |  702552334 |
| mysql-bin.000244 |   40656827 |
+------------------+------------+
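
Alternatively, MySQL can purge by date instead of by file name. The statement below is a sketch of the same 5-day cleanup; confirm the slave has already moved past that cutoff before running it:

mysql> purge binary logs before date_sub(now(), interval 5 day);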

As I want a 5-day retention going forward, you can set this live without restarting MySQL:

[root@db01 ~]# mysql
mysql> show variables like '%expire_logs_days%';
+------------------+-------+
| Variable_name    | Value |
+------------------+-------+
| expire_logs_days | 0     |
+------------------+-------+

mysql> SET GLOBAL expire_logs_days = 5;
Query OK, 0 rows affected (0.00 sec)

mysql> show variables like '%expire_logs_days%';
+------------------+-------+
| Variable_name    | Value |
+------------------+-------+
| expire_logs_days | 5     |
+------------------+-------+
1 row in set (0.00 sec)

Then make the setting persistent across MySQL restarts by adding it to /etc/my.cnf:

[root@db01 ~]# vim /etc/my.cnf
[mysqld]
...
expire_logs_days = 5
...

If you had to modify the filesystem's reserved block percentage earlier, change it back to how it was beforehand, which is typically 5%:

[root@db01 ~]# tune2fs -m 5 /dev/xvda1

cURL Cheat Sheet

cURL is the Swiss Army knife for gathering information while troubleshooting websites. There are many ways to use curl; this guide simply documents the invocations I use often but can never seem to remember the specific flags for.

General Usage

Test a page behind a username/password prompt:

[user@workstation01 ~]# curl --user name:password http://www.example.com

Download files from GitHub:

[user@workstation01 ~]# curl -O https://raw.github.com/username/reponame/master/filename

Download content via curl to a specific filename:

[user@workstation01 ~]# curl -o archive.tgz https://www.example.com/file.tgz

Run a script from a remote source on the server. Understand the security implications of doing this, as it can be dangerous!

[user@workstation01 ~]# source <( curl -sL https://www.example.com/your_script.sh)

Have curl make the connection through a specific interface on the server:

[user@workstation01 ~]# curl --interface bond0 icanhazip.com

Website Troubleshooting

Show the full HTTP request and response headers, following all redirects and discarding the page body:

[user@workstation01 ~]# curl -Lsvo /dev/null https://www.example.com

Test a domain hosted on the local server, bypassing DNS:

[user@workstation01 ~]# curl -sLH "host: www.example.com" localhost

Test a domain against a specific IP, bypassing the need to modify /etc/hosts:

[user@workstation01 ~]# curl -IL https://www.example.com --resolve www.example.com:443:123.123.123.123

or 

[user@workstation01 ~]# curl -Lsvo /dev/null --resolve 'example.com:443:123.123.123.123' https://www.example.com/

or

[user@workstation01 ~]# curl -Lsvo /dev/null --header "Host: example.com" https://123.123.123.123/

Send a request using a specific user-agent. Sometimes a server has rules in place that block the default curl user-agent string, or perhaps you need to test with a particular user-agent. You can pass a specific user-agent string by running:

[user@workstation01 ~]# curl --user-agent "USER_AGENT_STRING_HERE" www.example.com

A comprehensive listing of available user-agent strings resides at:
http://www.useragentstring.com/pages/useragentstring.php

For example, let's say you need to test with a mobile device user-agent to see if a custom redirect works:

[user@workstation01 ~]# curl -H "User-Agent: Mozilla/5.0 (Linux; Android 4.0.4; Galaxy Nexus Build/IMM76B) AppleWebKit/535.19 (KHTML, like Gecko) Chrome/18.0.1025.133 Mobile Safari/535.19" -IL http://www.example.com/about/contact-us

HTTP/1.1 301 Moved Permanently
Date: Tue, 17 Nov 2015 18:10:09 GMT
Server: Apache/2.4.6 (CentOS) OpenSSL/1.0.1e-fips
Location: http://www.example.com/mobi/contact.php
Content-Type: text/html; charset=iso-8859-1

SSL/TLS Testing

Test to see if the site supports specific SSL/TLS protocols:

[user@workstation01 ~]# curl --sslv2 https://www.example.com
[user@workstation01 ~]# curl --sslv3 https://www.example.com
[user@workstation01 ~]# curl --tlsv1 https://www.example.com
[user@workstation01 ~]# curl --tlsv1.0 https://www.example.com
[user@workstation01 ~]# curl --tlsv1.1 https://www.example.com
[user@workstation01 ~]# curl --tlsv1.2 https://www.example.com
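
The commands above simply fail if the requested protocol cannot be negotiated. To also see which protocol and cipher were actually used, you can grep curl's verbose output. This is a sketch; the exact wording of the verbose line varies between curl versions:

[user@workstation01 ~]# curl -svo /dev/null --tlsv1.2 https://www.example.com 2>&1 | grep -i "connection using"
* SSL connection using TLSv1.2 / ECDHE-RSA-AES128-GCM-SHA256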

Performance Troubleshooting

Load times can be impacted by a number of things, such as the TLS handshake, DNS lookup time, redirects, transfers, upload/downloads, etc. The curl command shown below will break down the times for each accordingly:

[user@workstation01 ~]# curl -Lsvo /dev/null https://www.example.com/ -w "\nContent Type: %{content_type} \
\nHTTP Code: %{http_code} \
\nHTTP Connect:%{http_connect} \
\nNumber Connects: %{num_connects} \
\nNumber Redirects: %{num_redirects} \
\nRedirect URL: %{redirect_url} \
\nSize Download: %{size_download} \
\nSize Upload: %{size_upload} \
\nSSL Verify: %{ssl_verify_result} \
\nTime Handshake: %{time_appconnect} \
\nTime Connect: %{time_connect} \
\nName Lookup Time: %{time_namelookup} \
\nTime Pretransfer: %{time_pretransfer} \
\nTime Redirect: %{time_redirect} \
\nTime Start Transfer (TTFB): %{time_starttransfer} \
\nTime Total: %{time_total} \
\nEffective URL: %{url_effective}\n" 2>&1

The example output is below:

...
HTTP Code: 200 
HTTP Connect:000 
Number Connects: 2 
Number Redirects: 1 
Redirect URL:  
Size Download: 136112 
Size Upload: 0 
SSL Verify: 0 
Time Handshake: 0.689 
Time Connect: 0.556 
Name Lookup Time: 0.528 
Time Pretransfer: 0.689 
Time Redirect: 0.121 
Time Start Transfer (TTFB): 0.738 
Time Total: 0.962 
Effective URL: https://www.example.com/

Another way to quickly see performance is below. For simplicity, first create a file containing the curl write-out format:

[user@workstation01 ~]# vim site-performance.cfg
\n
      DNS lookup                          :  %{time_namelookup}\n
      Connect to server (TCP)             :  %{time_connect}\n
      Connect to server (HTTP/S)          :  %{time_appconnect}\n
      Time from start until transfer began:  %{time_pretransfer}\n
      Time for redirection (if any)       :  %{time_redirect}\n
      Total time before transfer started  :  %{time_starttransfer}\n
\n
             Total time                   :  %{time_total}\n
             Size of download (bytes)     :  %{size_download}\n
             Average d/l speed (bytes/s)  :  %{speed_download}\n
\n

Then run it:

[user@workstation01 ~]# curl -w "@site-performance.cfg" -o /dev/null -s https://www.example.com
      DNS lookup                          :  0.138664
      Connect to server (TCP)             :  0.171131
      Connect to server (HTTP/S)          :  0.268969
      Time from start until transfer began:  0.269021
      Time for redirection (if any)       :  0.000000
      Total time before transfer started  :  0.532772

             Total time                   :  0.628730
             Size of download (bytes)     :  162510
             Average d/l speed (bytes/s)  :  258473.000

If a web server is running a number of sites and has a high load, how can you narrow down which site is likely causing it? One way is to see which site takes the longest to load, as that may indicate a resource-intensive site. See the example below:

[user@workstation01 ~]# for i in www.example1.com www.example2.com www.example3.com; do echo -n "$i "; (time curl -IL $i -XGET) 2>&1 | grep -E "real|HTTP"; echo; done

www.example1.com HTTP/1.1 200 OK
real	0m0.642s

www.example2.com HTTP/1.1 200 OK
real	0m2.234s

www.example3.com HTTP/1.1 200 OK
real	0m0.421s

So www.example2.com takes over 2 seconds to load. What happens to the load times on that domain during increased traffic? The example below will send 25 concurrent requests to the domain:

[user@workstation01 ~]# for i in {1..25}; do (time curl -sIL http://www.example2.com -X GET &) 2>&1 | grep -E "real|HTTP" & done

HTTP/1.1 200 OK
real	0m11.297s
HTTP/1.1 200 OK
real	0m11.395s
HTTP/1.1 200 OK
real	0m11.906s
HTTP/1.1 200 OK
real	0m12.079s
...
HTTP/1.1 200 OK
real	0m11.297s

Determining why this is happening will involve investigation outside the scope of this article, mainly around caching the site or otherwise optimizing it. However, at least now we know which site doesn't perform well under increased requests and may be causing the high server load.
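
As a starting point for that investigation, curl can also show whether the site is already serving cache-friendly headers. This is a sketch; which headers appear (X-Cache, Age, etc.) depends on the caching layer or CDN in front of the site:

[user@workstation01 ~]# curl -sIL http://www.example2.com | grep -iE "^(cache-control|expires|age|x-cache)"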

How to set up Proxmox VE 5 with LXC containers on Rackspace Cloud

Testing out changes in a production environment is never a good idea. However, prepping test servers can be tedious, as you have to find the hardware and set up the operating system before you can begin. So I want a faster and more cost-effective medium: turning a single Cloud Server into a virtualization host for my test servers. Welcome LXC.

Taken from the provider's site: LXC is an operating system-level virtualization technology for Linux. It allows a physical server to run multiple isolated operating system instances, called containers, virtual private servers (VPSs), or virtual environments (VEs). LXC is similar to Solaris Containers, FreeBSD jails and OpenVZ.

To manage my LXC containers, I prefer to use Proxmox VE 5, which provides a clean control panel for them.

This guide documents how to install Proxmox on a 4G Rackspace Cloud Server running Debian 9. A 50G SSD Cloud Block Storage volume attached to the server will store the containers on ZFS, which is outlined further below. The Proxmox installation will install everything needed to run LXC. The IPs for the containers will be provided via NAT served from the server, creating a self-contained test environment.

Configure system for LXC according to best practices

Increase the open files limit by appending the following to the bottom of /etc/security/limits.conf:

[root@proxmox01 ~]# vim /etc/security/limits.conf
...
*       soft    nofile  1048576
*       hard    nofile  1048576
root    soft    nofile  1048576
root    hard    nofile  1048576
*       soft    memlock 1048576
*       hard    memlock 1048576

Now set up some basic kernel tweaks at the bottom of /etc/sysctl.conf:

[root@proxmox01 ~]# vim /etc/sysctl.conf
...
# LXD best practices:  https://github.com/lxc/lxd/blob/master/doc/production-setup.md
fs.inotify.max_queued_events = 1048576
fs.inotify.max_user_instances = 1048576
fs.inotify.max_user_watches = 1048576
vm.max_map_count = 262144
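
These settings can be applied immediately without a reboot:

[root@proxmox01 ~]# sysctl -p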

Install Proxmox VE 5

For this to work, we start with a vanilla Debian 9 Cloud Server and install Proxmox on top of it, which will pull in the required kernel.

To get things started, update /etc/hosts to set up your FQDN, and remove any resolvable IPv6 entries for the hostname:

[root@proxmox01 ~]# cat /etc/hosts
127.0.0.1 localhost.localdomain localhost
123.123.123.123 proxmox01.yourdomain.com proxmox01-iad

# The following lines are desirable for IPv6 capable hosts
::1     ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
ff02::3 ip6-allhosts

Test to confirm /etc/hosts is set up properly. This should return your server's public IP address:

[root@proxmox01 ~]# hostname --ip-address
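
Assuming the system hostname matches one of the names in the /etc/hosts entry above, this should simply print the public IP:

123.123.123.123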

Add the Proxmox VE repo and add the repo key:

[root@proxmox01 ~]# echo "deb http://download.proxmox.com/debian/pve stretch pve-no-subscription" > /etc/apt/sources.list.d/pve-install-repo.list
[root@proxmox01 ~]# wget http://download.proxmox.com/debian/proxmox-ve-release-5.x.gpg -O /etc/apt/trusted.gpg.d/proxmox-ve-release-5.x.gpg

Update the package index and then update the system for Proxmox:

[root@proxmox01 ~]# apt update && apt dist-upgrade
* Select option for 'Install the package maintainer's version' when asked about grub

Install Proxmox VE and reboot:

[root@proxmox01 ~]# apt install proxmox-ve postfix open-iscsi
[root@proxmox01 ~]# reboot

Once the cloud server comes back online, confirm you are running the pve kernel:

[root@proxmox01 ~]# uname -a
Linux proxmox 4.13.4-1-pve #1 SMP PVE 4.13.4-25 (Fri, 13 Oct 2017 08:59:53 +0200) x86_64 GNU/Linux

Setup NAT for the containers

As the Rackspace Cloud Server comes with a single public IP address, I will be making use of NATed IP addresses to assign to my individual containers. The steps are documented below.

Update /etc/sysctl.conf to enable IP forwarding:

[root@proxmox01 ~]# vim /etc/sysctl.conf
...
net.ipv4.ip_forward = 1
...

Then apply the new settings without a reboot:

[root@proxmox01 ~]# sysctl -p

To set up the NAT rules, we need a script that will run on boot. Two things need to be taken into consideration here:

1. Change IP address below (123.123.123.123) in the NAT rule to your Cloud server’s public IP address.
2. This assumes you want to use a 192.168.1.0/24 network for your VE’s.

The quick and dirty script is below:

[root@proxmox01 ~]# vim /etc/init.d/lxc-routing
#!/bin/sh
case "$1" in
 start) echo "lxc-routing started"
# It's important that you change the SNAT IP to the one of your server (not the local but the internet IP)
# The following line adds a route to the IP range that we will later assign to the VPS. That's how you get
# internet access on your VPS.
/sbin/iptables -t nat -A POSTROUTING -s 192.168.1.0/24 -o eth0 -j SNAT --to 123.123.123.123

# Allow servers to have access to internet:
/sbin/iptables -A FORWARD -s 192.168.1.0/24 -j ACCEPT
/sbin/iptables -A FORWARD -d 192.168.1.0/24 -j ACCEPT
# Be sure to add net.ipv4.ip_forward=1 to /etc/sysctl.conf, then run sysctl -p

# These are the rules for any port forwarding you want to do
# In this example, all traffic to and from the ports 11001-11019 gets routed to/from the VPS with the IP 192.168.1.1.
# Also the port 11000 is routed to the SSH port of the vps, later on you can ssh into your VPS through yourip:11000

#/sbin/iptables -t nat -A PREROUTING -i eth0 -p tcp --dport 11000 -j DNAT --to 192.168.1.1:22
#/sbin/iptables -t nat -A PREROUTING -i eth0 -p udp --dport 11001:11019 -j DNAT --to 192.168.1.1
#/sbin/iptables -t nat -A PREROUTING -i eth0 -p tcp --dport 11001:11019 -j DNAT --to 192.168.1.1

# In my case I also dropped outgoing SMTP traffic, as it's one of the most abused things on servers

#/sbin/iptables -A FORWARD -j DROP -p tcp --destination-port 25
#/sbin/iptables -A FORWARD -j DROP -p tcp --destination-port 2525
#/sbin/iptables -A FORWARD -j DROP -p tcp --destination-port 587
#/sbin/iptables -A FORWARD -j DROP -p tcp --destination-port 465
#/sbin/iptables -A FORWARD -j DROP -p tcp --destination-port 2526
#/sbin/iptables -A FORWARD -j DROP -p tcp --destination-port 110
#/sbin/iptables -A FORWARD -j DROP -p tcp --destination-port 143
#/sbin/iptables -A FORWARD -j DROP -p tcp --destination-port 993

;;

*) echo "Usage: /etc/init.d/lxc-routing {start}"
exit 2
;;

esac
exit 0

Set permissions, configure it to run on boot, and start it:

[root@proxmox01 ~]# chmod 755 /etc/init.d/lxc-routing
[root@proxmox01 ~]# update-rc.d lxc-routing defaults
[root@proxmox01 ~]# /etc/init.d/lxc-routing start
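
To confirm the rules loaded, list the NAT and FORWARD tables; the SNAT and ACCEPT rules from the script should show up here:

[root@proxmox01 ~]# iptables -t nat -L POSTROUTING -n
[root@proxmox01 ~]# iptables -L FORWARD -n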

When you go to start a new container, it will fail to start, and Proxmox will complain with an error similar to the one below:

-- Unit pve-container@100.service has begun starting up.
Nov 06 06:07:07 proxmox01.*********** systemd-udevd[11150]: Could not generate persistent MAC address for vethMVIWQY: No such file or directory
Nov 06 06:07:07 proxmox01.*********** kernel: IPv6: ADDRCONF(NETDEV_UP): veth100i0: link is not ready

This can be corrected by creating the following systemd link file:

[root@proxmox01 ~]# vim /etc/systemd/network/99-default.link
[Link]
NamePolicy=kernel database onboard slot path
MACAddressPolicy=none

Then reboot:

[root@proxmox01 ~]# reboot

Navigate your browser to the control panel, log in with your root SSH credentials, and set up a Linux Bridge (a sketch of the resulting config follows the list below):

- Navigate your browser to: https://x.x.x.x:8006
- Click on System --> Network
- On top, click 'Create' --> 'Linux Bridge'
	- Name:  vmbr0
	- IP address:  192.168.1.1
	- Subnet mask: 255.255.255.0
	- Autostart:  checked
	- Leave everything else blank
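
For reference, the bridge created above ends up looking roughly like the stanza below in /etc/network/interfaces (Proxmox may stage the change in /etc/network/interfaces.new until the next reboot). This is only a sketch of what to expect, not something you need to type in:

auto vmbr0
iface vmbr0 inet static
        address 192.168.1.1
        netmask 255.255.255.0
        bridge_ports none
        bridge_stp off
        bridge_fd 0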

Set up the 50G SSD Cloud Block Storage volume with ZFS and add it to Proxmox. Assuming the volume is already attached, check which device it was mapped to:

[root@proxmox01 ~]# lsblk 
NAME    MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
xvda    202:0    0  80G  0 disk 
└─xvda1 202:1    0  80G  0 part /
xvdb    202:16   0  50G  0 disk   <--- This is my new volume

First, install the ZFS utilities for Linux and load the kernel module:

[root@proxmox01 ~]# apt-get install zfsutils-linux
[root@proxmox01 ~]# /sbin/modprobe zfs

Then create a zpool on the new volume:

[root@proxmox01 ~]# zpool create zfs /dev/xvdb 
[root@proxmox01 ~]# zpool list
NAME   SIZE  ALLOC   FREE  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
zfs   49.8G  97.5K  49.7G         -     0%     0%  1.00x  ONLINE  -

Now add the new ZFS pool to Proxmox:

- Navigate your browser to: https://x.x.x.x:8006
- Click on Datacenter --> Storage
- On top, click 'Add' --> 'ZFS'
	- Name:  zfs
	- ZFS Pool:  zfs
	- Enable:  Checked
	- Thin provision:  Checked
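
At this point containers can be created on the new storage, either through the web UI or from the command line with pct. The example below is only a sketch: the container ID, template name, and addresses are assumptions, and the template must already be downloaded (via the UI or pveam) before it can be referenced:

[root@proxmox01 ~]# pct create 101 local:vztmpl/debian-9.0-standard_9.5-1_amd64.tar.gz \
    --hostname test01 \
    --rootfs zfs:8 \
    --net0 name=eth0,bridge=vmbr0,ip=192.168.1.101/24,gw=192.168.1.1 \
    --memory 512 --swap 512
[root@proxmox01 ~]# pct start 101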

Add Docker support to the containers

Docker can successfully run within an LXC container with some additional configuration. However, as the Proxmox kernel may be older, the latest versions of Docker may fail to work properly. The versions of Docker shipped in the OS repos seem to work, though.

First, create the containers as desired for Docker via Proxmox, then add the following to the bottom of the container's LXC config file:

[root@proxmox01 ~]# vim /etc/pve/lxc/100.conf
...
#insert docker part below
lxc.aa_profile: unconfined
lxc.cgroup.devices.allow: a
lxc.cap.drop:

After restarting that container, you will be able to install and configure Docker inside it as normal.
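
A minimal sketch of that, assuming an Ubuntu 18.04 container with ID 100; the docker.io package name and availability vary by distribution:

[root@proxmox01 ~]# pct stop 100 && pct start 100
[root@proxmox01 ~]# pct enter 100
root@ct100:~# apt-get update && apt-get install -y docker.io
root@ct100:~# docker info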

Add NFS support to the containers

NFS can successfully run within an LXC container with some additional configuration.

First, create an apparmor profile for NFS:

[root@proxmox01 ~]# vim /etc/apparmor.d/lxc/lxc-default-with-nfs
# Do not load this file.  Rather, load /etc/apparmor.d/lxc-containers, which
# will source all profiles under /etc/apparmor.d/lxc

profile lxc-container-default-with-nfs flags=(attach_disconnected,mediate_deleted) {
  #include <abstractions/lxc/container-base>

# allow NFS (nfs/nfs4) mounts.
  mount fstype=nfs*,
}

Then reload the LXC apparmor profiles:

[root@proxmox01 ~]# apparmor_parser -r /etc/apparmor.d/lxc-containers

This explicitly allows NFS mounts in any container assigned the new profile. Finally, assign the profile by adding the following to the bottom of the container's LXC config file:

[root@proxmox01 ~]# vim /etc/pve/lxc/100.conf
...
#insert near bottom
lxc.apparmor.profile: lxc-container-default-with-nfs

After restarting that container, you will be able to install and configure NFS inside it as normal.
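
A minimal sketch of verifying NFS from inside the restarted container; the NFS server address and export path below are placeholders:

[root@proxmox01 ~]# pct enter 100
root@ct100:~# apt-get install -y nfs-common
root@ct100:~# mount -t nfs 192.168.1.50:/export/data /mnt
root@ct100:~# df -h /mnt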

Discrepancy in disk usage between df and du

What do you do when df reports that your filesystem is almost full, but du cannot locate the offending directories and nothing lines up? An example scenario is below:

[root@web01 ~]# df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/xvda1       40G   36G  1.5G  97% /

[root@web01 ~]# du -sh /* |grep G
1.8G    /home
1.8G    /root
1.4G    /usr
12G     /var

So there is almost 19G of space unaccounted for! This can happen when a large file is deleted but is still held open by a running process. You can see a listing of open files on the server with lsof. To find files that are held open by a process but are marked as deleted, run:

[root@web01 ~]# lsof |grep del
vsftpd     1479    root  txt       REG              202,1      167536      90144 /usr/sbin/vsftpd (deleted)
mysqld     8432   mysql    4u      REG              202,1           0    1155075 /tmp/ib4ecsWH (deleted)
mysqld     8432   mysql    5u      REG              202,1           0    1155076 /tmp/ib8umUE8 (deleted)
mysqld     8432   mysql    6u      REG              202,1           0    1155077 /tmp/ib0CGmnz (deleted)
mysqld     8432   mysql    7u      REG              202,1           0    1155078 /tmp/ibGK9i6Z (deleted)
mysqld     8432   mysql   10w      REG              202,1 19470520347    1114270 /var/lib/mysql/mysql-general.log (deleted)
mysqld     8432   mysql   11u      REG              202,1           0    1155079 /tmp/ib4M9ZPq (deleted)

Notice the file /var/lib/mysql/mysql-general.log (deleted). Doing the math on the file size, it works out to about 18G, which accounts almost perfectly for the 'missing' space that du is not seeing.

So in this case, the culprit is the mysql general log, which is typically used for debugging purposes only as it logs every query that MySQL uses. The problem with leaving this log enabled is that it can quickly fill up your disks. So MySQL simply needs to be restarted so the process will release the file, and therefore allow the space to be reclaimed by the file system. Of course, if the general log is no longer being used, it should probably be disabled in the my.cnf.