DRBD with OCFS2
The ebuild sys-fs/ocfs2 has been removed from the main Portage tree.
Introduction
This is a guide for DRBD with OCFS2.
The tough part came when finding that the Samba Active Directory (AD) Domain Controller (DC) sysvol is not replicated between DCs – a known limitation of Samba4.
Dropping the sysvol, though, is a bad idea. Without CTDB, files would become corrupted when accessed simultaneously from two places; however, CTDB is not compatible with Samba AD.
With this guide, OCFS2 is not set up with the DLM as required by Corosync, cman, openais and Pacemaker. I have yet to work out a proper configuration for that, so the current way is improper. It will also not run with a shared lock on CTDB.
Q: Why is DRBD alone not enough?
There may be cases in which multiple servers write to the block device at the same time. Since the DRBD volume is mounted by two or more servers, a cluster file system is needed to handle file locking and avoid corruption; OCFS2 is selected here for its performance and its popularity.
You can always use another cluster file system.
This guide covers only a two-node, dual-primary configuration. If you require more nodes, please read up on DRBD for more information.
The Samba team does not recommend this solution, so this is purely experimental. Still, other DRBD + OCFS2 examples are available to assist with other configurations.
Installation
Kernel
File systems --->
[*] OCFS2 file system support
[*] O2CB Kernelspace Clustering
[*] OCFS2 statistics
[*] OCFS2 logging support
[ ] OCFS2 expensive checks
<*> Distributed Lock Manager (DLM)
Device Drivers --->
[*] Block devices --->
[*] DRBD Distributed Replicated Block Device support
[ ] DRBD fault injection
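To double-check that the running kernel was built with these options, something like the following can help. This is a minimal sketch: it assumes CONFIG_IKCONFIG_PROC is enabled so /proc/config.gz exists; otherwise grep the .config in /usr/src/linux instead.
# The relevant symbols for the options selected above.
zgrep -E 'CONFIG_(OCFS2_FS|DLM|BLK_DEV_DRBD)=' /proc/config.gz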
USE flags
USE flags for sys-cluster/drbd-utils (mirror/replicate block-devices across a network-connection)
Emerge
Install the following packages:
root #
emerge --ask sys-cluster/drbd-utils sys-fs/ocfs2-tools
The DRBD 8.4 guide suggests the use of Heartbeat and Pacemaker, but I found that just too complicated. You are welcome to try them and add that option to this guide.
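As a quick sanity check (a minimal sketch, nothing more), confirm that the userland tools from both packages ended up on the PATH:
# drbdadm comes from sys-cluster/drbd-utils, mkfs.ocfs2 and mounted.ocfs2
# come from sys-fs/ocfs2-tools.
which drbdadm mkfs.ocfs2 mounted.ocfs2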
DRBD Configuration
We will now configure DRBD.
The DRBD version check prompt is a bit annoying, as it will suggest an upgrade every time you execute a command.
Please check the DRBD User's Guide Ver 8.4[1] on their website; it is well written, with examples.
This is the global DRBD configuration (typically /etc/drbd.d/global_common.conf).
It will need to be changed according to your DRBD needs.
Please check the manual for more information.
global {
usage-count yes;
# minor-count dialog-refresh disable-ip-verification
}
common {
handlers {
pri-on-incon-degr "/usr/lib64/drbd/notify-pri-on-incon-degr.sh; /usr/lib64/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger ; reboot -f";
pri-lost-after-sb "/usr/lib64/drbd/notify-pri-lost-after-sb.sh; /usr/lib64/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger ; reboot -f";
local-io-error "/usr/lib64/drbd/notify-io-error.sh; /usr/lib64/drbd/notify-emergency-shutdown.sh; echo o > /proc/sysrq-trigger ; halt -f";
# fence-peer "/usr/lib64/drbd/crm-fence-peer.sh";
# split-brain "/usr/lib64/drbd/notify-split-brain.sh root";
out-of-sync "/usr/lib64/drbd/notify-out-of-sync.sh root";
# before-resync-target "/usr/lib64/drbd/snapshot-resync-target-lvm.sh -p 15 -- -c 16k";
# after-resync-target /usr/lib64/drbd/unsnapshot-resync-target-lvm.sh;
outdate-peer "/sbin/kill-other-node.sh";
}
startup {
# wfc-timeout degr-wfc-timeout outdated-wfc-timeout wait-after-sb
}
options {
# cpu-mask on-no-data-accessible
}
disk {
# size max-bio-bvecs on-io-error fencing disk-barrier disk-flushes
# disk-drain md-flushes resync-rate resync-after al-extents
# c-plan-ahead c-delay-target c-fill-target c-max-rate
# c-min-rate disk-timeout
fencing resource-and-stonith;
}
net {
# protocol timeout max-epoch-size max-buffers unplug-watermark
# connect-int ping-int sndbuf-size rcvbuf-size ko-count
# allow-two-primaries cram-hmac-alg shared-secret after-sb-0pri
# after-sb-1pri after-sb-2pri always-asbp rr-conflict
# ping-timeout data-integrity-alg tcp-cork on-congestion
# congestion-fill congestion-extents csums-alg verify-alg
# use-rle
# protocol C;
# allow-two-primaries yes;
}
}
The DRBD manual does not recommend setting the allow-two-primaries option to yes until the initial resource synchronization has completed.
By configuring auto-recovery policies, you are effectively configuring automatic data-loss! Be sure you understand the implications. – DRBD Manual
Please create the file below (for example /etc/drbd.d/r0.res).
If you have multiple resources set up, you can use one file per resource to avoid confusion.
The file below is a two-node resource example. Note the commented parts; they will need to be uncommented later when we enable dual-primary mode.
resource r0 {
handlers {
split-brain "/usr/lib64/drbd/notify-split-brain.sh root";
}
startup {
# become-primary-on both;
}
on serverNode01 {
device /dev/drbd1;
disk /dev/sda5;
address 192.168.11.23:7789;
meta-disk internal;
}
on serverNode02 {
device /dev/drbd1;
disk /dev/sda5;
address 192.168.11.27:7789;
meta-disk internal;
}
net {
after-sb-0pri discard-zero-changes;
after-sb-1pri discard-secondary;
after-sb-2pri disconnect;
protocol C;
# allow-two-primaries yes;
}
}
In plain English: we will use /dev/sda5 on serverNode01 (IP 192.168.11.23) and /dev/sda5 on serverNode02 (IP 192.168.11.27) to create a DRBD device /dev/drbd1 on each server. Both DRBD nodes will store the metadata internally.
Name | Description |
---|---|
resource r0 | The DRBD resource, named r0. |
on serverNode01 / on serverNode02 | The server name of each node, for your reference. |
device /dev/drbd1; | The device node that DRBD will create and that will be mounted. |
disk /dev/sda5; | The physical partition that DRBD will use to store the data. |
address 192.168.11.23:7789; / address 192.168.11.27:7789; | The server IP address DRBD will use to sync between the nodes; you will have to assign this IP to an interface. DRBD does not support multiple IP addresses; use channel bonding/teaming such as LACP instead. |
meta-disk internal; | This tells DRBD where to store the metadata; it can be external as well. |
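Before going further, it can be worth sanity-checking the backing devices and the resource file. The commands below are a minimal sketch using the device and resource names from this example; drbdadm dump parses the configuration and prints the resolved resource definition, which catches most syntax errors.
# Check that the backing partition exists and has the same size on both nodes.
lsblk -b -o NAME,SIZE,TYPE /dev/sda5

# Parse and print the resolved DRBD configuration; an error here usually
# means a typo in the resource file.
drbdadm dump r0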
OCFS2 Configuration
We will now continue with OCFS2 setup.
Unfortunately, the OCFS2 package does not ship an example cluster configuration file, so you will need to create one as shown below (the standard location is /etc/ocfs2/cluster.conf).
Please change it as necessary to suit your needs:
cluster:
	node_count = 2
	name = ocfs2cluster

node:
	number = 1
	cluster = ocfs2cluster
	ip_port = 7777
	ip_address = 192.168.11.23
	name = serverNode01

node:
	number = 2
	cluster = ocfs2cluster
	ip_port = 7777
	ip_address = 192.168.11.27
	name = serverNode02
Name | Description |
---|---|
cluster: | The OCFS2 cluster declaration. |
node_count = 2 | How many nodes are in this cluster. |
name = ocfs2cluster | The OCFS2 cluster name. |
node: | A node declaration. |
number = 1 / number = 2 | The node number; it must be different on every node. |
cluster = ocfs2cluster | The name of the cluster the node belongs to. |
ip_port = 7777 | The OCFS2 node service port; you can change it if you want to. |
ip_address = 192.168.11.23 / ip_address = 192.168.11.27 | The OCFS2 node IP address. Please change this to your interface IP. |
name = serverNode01 / name = serverNode02 | The node's server name, for your reference. |
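One detail worth double-checking: the name entries in cluster.conf are expected to match each machine's hostname, since the O2CB stack uses the hostname to find the local node in the file.
# Run on each node; the output should match the corresponding "name" entry above.
hostname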
Now it is time to edit the /etc/conf.d/ocfs2 file.
There are a lot of configurable parameters there, but we will only make one change: set OCFS2_CLUSTER to the OCFS2 cluster name.
OCFS2_CLUSTER="ocfs2cluster"
It is also time to add the required mounts to /etc/fstab. Please add the following:
# Needed by ocfs2
none /sys/kernel/config configfs defaults 0 0
none /sys/kernel/dlm ocfs2_dlmfs defaults 0 0
# Our DRBD and OCFS2 mount
/dev/drbd1 /ocfs2cluster/ ocfs2 _netdev,nointr,user_xattr,acl 0 0
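The last fstab line mounts /dev/drbd1 on /ocfs2cluster/, so that mount point has to exist on both nodes before it can be mounted later on:
# Create the OCFS2 mount point on both nodes.
mkdir -p /ocfs2cluster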
DRBD Initialization and Setup
After all of this is in place, it is time to initialize DRBD.
Run this command on both nodes:
root #
drbdadm create-md r0
DRBD module version: 8.4.3
userland version: 8.4.2
you should upgrade your drbd tools!
Writing meta data...
initializing activity log
NOT initializing bitmap
New drbd meta data block successfully created.
success
Let us bring up the DRBD device.
Run this command on both nodes:
root #
drbdadm up r0
DRBD module version: 8.4.3
userland version: 8.4.2
you should upgrade your drbd tools!
Once the device is up, we can check the DRBD status.
root #
cat /proc/drbd
version: 8.4.3 (api:1/proto:86-101)
built-in
 1: cs:WFConnection ro:Secondary/Unknown ds:Inconsistent/DUnknown C r----s
    ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:2026376
Once both nodes are up and connected, the status changes:
Secondary/Unknown becomes Secondary/Secondary.
root #
cat /proc/drbd
version: 8.4.3 (api:1/proto:86-101)
built-in
 1: cs:Connected ro:Secondary/Secondary ds:Inconsistent/Inconsistent C r-----
    ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:2026376
We now have the DRBD device.
I do have a question: should we format this partition as OCFS2 at this point and only then sync it, or not?
Please try it and let me know.
Let's start synchronizing both DRBD nodes by making one of them primary.
Run this command on one node only:
root #
drbdadm primary --force r0
DRBD module version: 8.4.3
userland version: 8.4.2
you should upgrade your drbd tools!
Let us check the synchronization status
root #
cat /proc/drbd
version: 8.4.3 (api:1/proto:86-101)
built-in
 1: cs:SyncSource ro:Primary/Secondary ds:UpToDate/Inconsistent C r---n-
    ns:313236 nr:0 dw:0 dr:315032 al:0 bm:19 lo:5 pe:2 ua:7 ap:0 ep:1 wo:f oos:1715080
	[==>.................] sync'ed: 15.6% (1715080/2026376)K
	finish: 0:00:44 speed: 38,912 (38,912) K/sec
Please wait until it is done.
root #
cat /proc/drbd
version: 8.4.3 (api:1/proto:86-101)
built-in
 1: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r-----
    ns:2026376 nr:0 dw:0 dr:2027040 al:0 bm:124 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0
Let's make the second node primary.
Run this command on the second node only:
root #
drbdadm primary --force r0
DRBD module version: 8.4.3
userland version: 8.4.2
you should upgrade your drbd tools!
Let's start the DRBD service:
root #
/etc/init.d/drbd start
Our DRBD cluster is done. Now let's continue to the OCFS2 setup.
OCFS2 Setup
The first thing we need to do is mount the configfs and ocfs2_dlmfs pseudo-filesystems that OCFS2 needs. Since we have already modified /etc/fstab, you can simply run the commands below:
root #
mount /sys/kernel/config
root #
mount /sys/kernel/dlm
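A quick check (a minimal sketch) that both pseudo-filesystems are actually mounted before starting the cluster stack:
# Both configfs and ocfs2_dlmfs should show up here on each node.
mount | grep -E 'configfs|ocfs2_dlmfs'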
We should now start the OCFS2 cluster:
root #
/etc/init.d/ocfs2 start
 * Starting OCFS2 cluster
 *  - ocfs2cluster...
Now let us format the DRBD device with OCFS2.
You should create more node slots than you currently need, in case more servers need to access this OCFS2 cluster later.
root #
mkfs.ocfs2 -N 8 -L "ocfs2cluster" /dev/drbd1
mkfs.ocfs2 1.8.2
Cluster stack: classic o2cb
Label: ocfs2cluster
Features: sparse extended-slotmap backup-super unwritten inline-data strict-journal-super xattr indexed-dirs refcount discontig-bg
Block size: 4096 (12 bits)
Cluster size: 4096 (12 bits)
Volume size: 2075009024 (506594 clusters) (506594 blocks)
Cluster groups: 16 (tail covers 22754 clusters, rest cover 32256 clusters)
Extent allocator size: 4194304 (1 groups)
Journal size: 67108864
Node slots: 8
Creating bitmaps: done
Initializing superblock: done
Writing system files: done
Writing superblock: done
Writing backup superblock: 1 block(s)
Formatting Journals: done
Growing extent allocator: done
Formatting slot map: done
Formatting quota files: done
Writing lost+found: done
mkfs.ocfs2 successful
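To confirm the new filesystem's label and type, run this on the node that executed mkfs.ocfs2:
# Should report TYPE="ocfs2" and LABEL="ocfs2cluster" for the DRBD device.
blkid /dev/drbd1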
Enable Dual-Primary Mode
Enabling dual-primary mode is easy: just uncomment the lines in the resource file we created above.
resource r0 {
handlers {
split-brain "/usr/lib64/drbd/notify-split-brain.sh root";
}
startup {
become-primary-on both;
}
on serverNode01 {
device /dev/drbd1;
disk /dev/sda5;
address 192.168.11.23:7789;
meta-disk internal;
}
on serverNode02 {
device /dev/drbd1;
disk /dev/sda5;
address 192.168.11.27:7789;
meta-disk internal;
}
net {
after-sb-0pri discard-zero-changes;
after-sb-1pri discard-secondary;
after-sb-2pri disconnect;
protocol C;
allow-two-primaries yes;
}
}
Then it's time to apply the changes on both nodes, in sequence: first the primary, then the secondary.
root #
drbdadm adjust r0
root #
drbdadm primary r0
Check that DRBD is now dual-primary:
root #
cat /proc/drbd
version: 8.4.3 (api:1/proto:86-101)
built-in
 1: cs:Connected ro:Primary/Primary ds:UpToDate/UpToDate C r-----
    ns:0 nr:535088 dw:535088 dr:928 al:0 bm:131068 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0
We can now mount the OCFS2 cluster on both nodes.
root #
mount /ocfs2cluster/
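A simple way to verify that the cluster file system really is shared: create a file on one node and check that it shows up on the other. The file name below is just an example.
# On serverNode01: create a test file on the shared volume.
echo "hello from $(hostname)" > /ocfs2cluster/test-$(hostname).txt

# On serverNode02: the file written above should be visible immediately.
ls -l /ocfs2cluster/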
Adding both DRBD and OCFS2 to system boot
Let us start both DRBD and OCFS2 when the system boots up.
root #
rc-update add drbd
root #
rc-update add ocfs2
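To confirm both services were added to the default runlevel:
# Both drbd and ocfs2 should be listed for the default runlevel.
rc-update show default | grep -E 'drbd|ocfs2'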
Manual Split Brain Recovery
Summary from [2]: under certain conditions, split brain will happen and cannot be recovered from automatically by the system, so we have to manually recover the DRBD disk.
Split brain victim: choose the node on which the changes will be discarded.
We will disconnect this node, make it secondary, and connect it back to the DRBD array, discarding all local differences so that it syncs with the latest version from the other node.
root #
drbdadm disconnect r0
root #
drbdadm secondary r0
root #
drbdadm connect --discard-my-data r0
Split brain survivor: choose the node that has the latest version of the files.
We just connect this node back to the DRBD array so that the secondary will sync from here.
root #
drbdadm connect r0
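After reconnecting, the resync can be followed from either node until both sides report Connected and UpToDate again, for example:
# Refresh the DRBD status every second until the resync finishes.
watch -n1 cat /proc/drbd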
Result
The sync should happen and we should have our array back. If that does not happen, see Troubleshooting and error recovery.[3]
References
- ↑ The DRBD User's Guide, Ver 8.4
- ↑ Manual Split Brain Recovery, The DRBD User's Guide, Ver 8.4
- ↑ Troubleshooting and error recovery, The DRBD User's Guide, Ver 8.4