How to set up DRBD

Distributed Replicated Block Device (DRBD) mirrors block devices between multiple hosts. You can think of this loosely as network RAID 1.

DRBD is meant to run in an Active/Passive setup, meaning you can only mount the disk on one node at a time. This is not a DRBD limitation, but rather a limitation of common file systems (ext3, ext4, XFS, etc.), since they are not designed to handle two or more servers writing to the same disk at once.

As with any form of data replication, always ensure you have good backups before you begin, and keep good backups throughout the life cycle of the setup. There is always a chance of data corruption or complete data loss due to some unforeseen situation, so make sure you have backups, and that you have tested restoring from those backups!

Requirements

There are a few requirements that need to be met for DRBD to function properly and securely:

1. 2x servers with similar block devices
2. DRBD kernel module and userspace utilities
3. Private network between the servers
4. iptables port 7788 open between servers on the private network
5. /etc/hosts configured
6. NTP synchronized
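
A quick way to sanity-check a couple of these before you begin is shown below (a rough sketch, assuming the ntp package is already installed and using the example private IPs from later in this article):

ping -c 3 192.168.5.3 (confirm the private network path to the peer)
ntpq -p (confirm time is syncing against an upstream NTP server)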

Preparation

For the purposes of this article, my two servers running CentOS 6 will be:

drbd01 192.168.5.2 | Cloud Block Storage 50G SSD
drbd02 192.168.5.3 | Cloud Block Storage 50G SSD

First, ensure that /etc/hosts is set up properly on both servers:

cat /etc/hosts
192.168.5.2 drbd01
192.168.5.3 drbd02

Next, open up iptables on both servers to allow communications across the private network:

cat /etc/sysconfig/iptables
-A INPUT -i eth2 -s 192.168.5.0/24 -p tcp --dport 7788 -m comment --comment "Allow DRBD on private interface" -j ACCEPT
...
service iptables restart

Finally, prep your block devices, but do not format them with a filesystem! For this guide, I am going to assume you are using a separate disk for this, which is set up as /dev/xvdb:

fdisk /dev/xvdb
n (new partition)
p (primary)
1 (partition number)
enter (accept the default first sector)
enter (accept the default last sector)
t (change the partition type, choose 83 - Linux)
w (write the changes)
fdisk -l /dev/xvdb (confirm all looks well)
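
If you would rather script this step than walk through fdisk interactively, a rough equivalent using parted is sketched below (double-check the device name first, as this rewrites the partition table):

parted -s /dev/xvdb mklabel msdos
parted -s /dev/xvdb mkpart primary 0% 100%
fdisk -l /dev/xvdb (confirm the new partition)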

Install DRBD

On CentOS, DRBD is installed from the RPM packages found in the ELRepo repository (http://www.elrepo.org), which provides the kernel module and the userspace utilities.

On both nodes:

rpm -Uvh http://www.elrepo.org/elrepo-release-6-6.el6.elrepo.noarch.rpm
yum repolist
yum install drbd83-utils kmod-drbd83 dkms ntp ntpdate
service ntpd restart && chkconfig ntpd on
reboot
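
Once the servers are back up, it is worth confirming that the DRBD kernel module actually loads before going any further:

modprobe drbd
lsmod | grep drbd
cat /proc/drbd (shows the running DRBD version)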

Configure DRBD

First, configure global_common.conf:

vi /etc/drbd.d/global_common.conf
# Change
usage-count no;
# To 
usage-count yes;

Then search for syncer {, and add rate 10M;. An example is posted below:

syncer {
# rate after al-extents use-rle cpu-mask verify-alg csums-alg
rate 10M;
}

Some important notes:

1. usage-count. The DRBD project keeps statistics about the usage of various DRBD versions. This is done by contacting an HTTP server every time a new DRBD version is installed on a system. This can be disabled by setting usage-count no;. The default is usage-count ask;, which will prompt you every time you upgrade DRBD.

2. rate 10M: This throttles the total bandwidth that DRBD will use to perform its tasks between the two nodes. A good rule of thumb for this value is to use about 30% of the available replication bandwidth. Thus, if you had an I/O subsystem capable of sustaining write throughput of 180 MB/s, and a Gigabit Ethernet network capable of sustaining 110 MB/s of network throughput (the network being the bottleneck), you would calculate: 110 x 0.3 = 33 MB/s. I opted to go with 10M for this article. 10M is a bit on the low side, so read the following guide and increase your limit as needed depending on your available bandwidth (a sketch of a temporary override follows below): https://drbd.linbit.com/users-guide/s-configure-sync-rate.html
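
If 10M turns out to be too conservative once things are running, the guide above describes raising the rate temporarily without editing the config file; something along these lines should work (30M is only an example value, and cent00 is the resource name defined in the next section):

drbdsetup /dev/drbd0 syncer -r 30M (temporary override of the sync rate)
drbdadm adjust cent00 (revert to the rate from the config file)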

Resource Settings

Configure the two nodes so they can communicate with each other. On both servers, create the resource file:

vi /etc/drbd.d/cent00.res
resource cent00 {
  protocol C;
  startup { wfc-timeout 0; degr-wfc-timeout 120; }
  disk { on-io-error detach; }
  net { cram-hmac-alg "sha1"; shared-secret "4ftl421dg987d33gR"; }
  on drbd01 {
    device /dev/drbd0;
    disk /dev/xvdb1;
    meta-disk internal;
    address 192.168.5.2:7788;
  }
  on drbd02 {
    device /dev/drbd0;
    disk /dev/xvdb1;
    meta-disk internal;
    address 192.168.5.3:7788;
  }
}
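
Before going any further, it is worth having drbdadm parse the file back to you; this catches typos, and the hostnames after "on" must match each node's uname -n or DRBD will refuse to bring the resource up:

drbdadm dump cent00
uname -n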

Now initialize the resource DRBD will be using, and set drbd01 to be the primary. This is done by:

[root@drbd01 ~]# drbdadm create-md cent00
[root@drbd02 ~]# drbdadm create-md cent00
 
[root@drbd01 ~]# service drbd start; chkconfig drbd on
[root@drbd02 ~]# service drbd start; chkconfig drbd on
[root@drbd01 ~]# drbdadm -- --overwrite-data-of-peer primary cent00 

Once this is done, the disks will begin to sync up. This could take several hours. You can check the status by running:

[root@drbd01 ~]# cat /proc/drbd 
version: 8.3.16 (api:88/proto:86-97)
GIT-hash: a798fa7e274428a357657fb52f0ecf40192c1985 build by phil@Build64R6, 2013-09-27 16:00:43
0: cs:SyncSource ro:Primary/Secondary ds:UpToDate/Inconsistent C r-----
ns:1124352 nr:0 dw:0 dr:1125016 al:0 bm:68 lo:0 pe:1 ua:0 ap:0 ep:1 wo:f oos:19842524
[>...................] sync'ed: 5.4% (19376/20472)M
finish: 0:31:21 speed: 10,536 (10,312) K/sec
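
If you would rather not keep re-running that by hand, you can leave a live view of the sync progress running instead:

watch -n2 cat /proc/drbd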

Set Up the Filesystem

It is recommended to wait until the initial synchronization is complete before creating the filesystem. How long this takes depends on the size of the block storage and the speed of the internal network connecting the two servers. You can check the status by running

[root@drbd01 ~]# cat /proc/drbd

Then, before continuing, make sure you are on the primary node:

[root@drbd01 ~]# drbdadm -- status cent00
<drbd-status version="8.3.16" api="88">
<resources config_file="/etc/drbd.conf">
<resource minor="0" name="cent00" cs="Connected" ro1="Primary" ro2="Secondary" ds1="UpToDate" ds2="UpToDate" />
</resources>
</drbd-status>

We’ll use the standard ext4 file system for this:

[root@drbd01 ~]# mkfs.ext4 /dev/drbd0
[root@drbd01 ~]# mkdir /data
[root@drbd01 ~]# mount -t ext4 /dev/drbd0 /data
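
One thing to keep in mind: do not add /dev/drbd0 to /etc/fstab with automatic mounting, because the device can only be mounted on whichever node is currently primary. If you want an fstab entry at all, a noauto line such as the sketch below keeps the mount manual:

/dev/drbd0   /data   ext4   noauto,defaults   0 0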

Testing Scenarios

Below are some basic test scenarios you can simulate pretty easily. This goes without saying, but do not experiment with these scenarios in your production environment! Know what they do before you run them anywhere that matters, since they can cause problems if you're not ready for them!

These are broken down into the following tests:
Test 1: Promote drbd02 to become primary
Test 2: Testing secondary node failure
Test 3: Testing primary node failure
Test 4: Recovering from split-brain

Test 1: Promote drbd02 to become primary

Unmount the partition and demote the current primary (drbd01) to secondary:

[root@drbd01 ~]# umount /data
[root@drbd01 ~]# drbdadm secondary cent00

On the other server, drbd02, promote it to primary and mount the DRBD device:

[root@drbd02 ~]# drbdadm primary cent00
[root@drbd02 ~]# mkdir /data
[root@drbd02 ~]# mount -t ext4 /dev/drbd0 /data
[root@drbd02 ~]# ls -d /data/*

At this point, drbd02 will be the primary node, and drbd01 will be the secondary.
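
You can confirm the change from either node; drbdadm prints the local role first, followed by the peer's:

[root@drbd02 ~]# drbdadm role cent00
Primary/Secondary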

Test 2: Testing secondary node failure

To see what happens when the secondary server goes offline, shut down your secondary node, which in this case is drbd02:

[root@drbd02 ~]# shutdown -h now

Now, back on the primary node drbd01, add a few files to the volume:

[root@drbd01 ~]# mkdir -p /data/test/
[root@drbd01 ~]# cp /etc/hosts /data/test/

Power the secondary node drbd02 back on, and watch the system sync back up. Note that, depending on how much data was written, it may take a bit of time for the volumes to become consistent again. You can check the status with:

[root@drbd01 ~]# cat /proc/drbd

Test 3: Testing primary node failure

This tests what happens when the primary node goes offline and someone promotes the secondary node before the old primary comes back online and can be demoted (split-brain).

If you want to simulate this worst-case scenario, and you don't care about your data, then run one of the following on drbd01 (either one will hard-reset the node without a clean shutdown):

[root@drbd01 ~]# echo 1 > /proc/sys/kernel/sysrq ; echo b > /proc/sysrq-trigger
[root@drbd01 ~]# reboot -f -n

Or just shut down drbd01 (the primary), then log into drbd02 (the secondary) and promote it to primary:

[root@drbd02 ~]# drbdadm primary cent00
[root@drbd02 ~]# mkdir /data
[root@drbd02 ~]# mount -t ext4 /dev/drbd0 /data

Then boot drbd01 again and enjoy the split-brain scenario! For obvious reasons, do NOT do this on drives containing any data you need for anything! If the primary node loses the replication link, and you make the other node primary BEFORE connectivity is restored, you WILL have split-brain. Avoid that at all costs.
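
When the old primary boots and tries to reconnect, DRBD detects the split-brain and drops the replication link; the kernel log is the quickest place to confirm it (the exact message wording varies between versions):

dmesg | grep -i split
grep -i split /var/log/messages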

Test 4: Recovering from split-brain

In the event of split-brain, you may be able to correct it by performing the following, but do not do this blindly! Make sure you understand what this is doing before you run it on your production data, otherwise you may lose data you wanted to keep! More information can be found at http://drbd.linbit.com/docs/working/

For reference:
– drbd01 : Primary node
– drbd02 : Secondary node

On the secondary node:

[root@drbd02 ~]# drbdadm secondary cent00
[root@drbd02 ~]# drbdadm -- --discard-my-data connect cent00

And back on the primary node:

[root@drbd01 ~]# drbdadm connect cent00
[root@drbd01 ~]# cat /proc/drbd
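
Once both nodes are reconnected and consistent, the status line in /proc/drbd should again look roughly like the following, with cs:Connected and ds:UpToDate/UpToDate:

0: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r-----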