HA Cluster Configuration

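HA stands for high availability, also known as dual-machine hot standby, and is used for business-critical services. Put simply, there are two machines, A and B. Normally A provides the service while B stands by idle; when A goes down, or the service on A fails, the service switches over to B. Common open-source tools for high availability are heartbeat and keepalived; keepalived also provides load balancing.

[Figure: HA architecture]
The figure shows an HA architecture: two machines, web1 and web2, sit behind one switch. web1 is the primary node and normally provides the service, while web2 is the idle standby node. A heartbeat link between web1 and web2 lets each node check whether the other is alive. The floating IP, also called the VIP (virtual IP), is the address that serves clients. It is normally configured on web1; when web1 goes down, web2 automatically configures the VIP on itself and takes over the service.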

Below we build an HA cluster with heartbeat and use Nginx as the service it manages.

Preparation:
Two machines, both running CentOS 6.5, with the following IPs on eth0:

master 192.168.0.161
slave  192.168.0.162
1. Set the hostname (on both master and slave)
# On the master
hostname master
bash   # start a new shell so the prompt shows the new hostname
# On the slave
hostname slave
bash
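The `hostname` command only changes the name of the running system, so it is lost on reboot. On CentOS 6 the persistent value is the HOSTNAME= line in /etc/sysconfig/network. The following is a minimal sketch of updating that line; the function name `persist_hostname` is made up for this example and is not part of heartbeat or CentOS.

```shell
# persist_hostname NAME [FILE]: set HOSTNAME=NAME in FILE.
# FILE defaults to /etc/sysconfig/network (the CentOS 6 location).
# The function name is hypothetical, chosen for this sketch.
persist_hostname() {
    name=$1
    file=${2:-/etc/sysconfig/network}
    if grep -q '^HOSTNAME=' "$file" 2>/dev/null; then
        # A HOSTNAME line exists: rewrite it in place.
        sed -i "s/^HOSTNAME=.*/HOSTNAME=$name/" "$file"
    else
        # No HOSTNAME line yet: append one.
        echo "HOSTNAME=$name" >> "$file"
    fi
}
```

On the master this would be called as `persist_hostname master`, and on the slave as `persist_hostname slave`.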
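2. Disable the firewall (on both master and slave)
# iptables: flush all rules and save the empty rule set
iptables -F
service iptables save
# SELinux: switch to permissive mode now, and disable it from the next boot
setenforce 0
sed -i 's/SELINUX=enforcing/SELINUX=disabled/' /etc/selinux/config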
3. Configure hosts (on both master and slave)
vim /etc/hosts
# Add:
192.168.0.161 master
192.168.0.162 slave
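If the hosts entries are added by a script instead of by hand, it helps to make the step safe to run twice. The sketch below appends an entry only when the name is not already mapped; `add_host_entry` is a hypothetical helper name, not an existing tool.

```shell
# add_host_entry ADDR NAME [FILE]: append "ADDR NAME" to FILE unless
# NAME already appears as a host name there. FILE defaults to /etc/hosts.
# The function name is hypothetical, chosen for this sketch.
add_host_entry() {
    addr=$1
    name=$2
    file=${3:-/etc/hosts}
    # Look for NAME as a whole field (column 2 onward) on lines that
    # do not start with a comment marker.
    if awk -v n="$name" '
        !/^[[:space:]]*#/ { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
        END { exit !found }
    ' "$file" 2>/dev/null; then
        return 0
    fi
    echo "$addr $name" >> "$file"
}
```

For this setup the calls would be `add_host_entry 192.168.0.161 master` and `add_host_entry 192.168.0.162 slave`, run on both machines.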
4. Install the EPEL repository (on both master and slave)
yum install -y epel-release
5. Install heartbeat, libnet, and Nginx (on both master and slave)
yum install -y 'heartbeat*' libnet nginx
6. Configure the master
cd /usr/share/doc/heartbeat-3.0.4/
cp authkeys ha.cf haresources /etc/ha.d/
cd /etc/ha.d
vi authkeys   # add or change to:

auth 3
3 md5 Hello!

chmod 600 authkeys
vi haresources   # add:

master 192.168.0.150/24/eth0:0 nginx

vi ha.cf   # change to the following:
debugfile /var/log/ha-debug
logfile /var/log/ha-log
logfacility local0
keepalive 2
deadtime 30
warntime 10
initdead 60
udpport 694
ucast eth0 192.168.0.162
auto_failback on
node master
node slave
ping 192.168.0.1
respawn hacluster /usr/lib/heartbeat/ipfail
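The authkeys file from this step can also be written without an editor, and with a random secret instead of a fixed word such as "Hello!". The sketch below is one possible way to do that, assuming the sha1 method (which heartbeat accepts alongside crc and md5); `write_authkeys` is a hypothetical helper name. Both nodes need the identical file, so it would be generated once on the master and then copied to the slave.

```shell
# write_authkeys [DIR]: create DIR/authkeys with a random sha1 key.
# DIR defaults to /etc/ha.d. The function name is hypothetical.
write_authkeys() {
    dir=${1:-/etc/ha.d}
    # Derive a 40-hex-character secret from 32 random bytes.
    key=$(head -c 32 /dev/urandom | sha1sum | awk '{ print $1 }')
    # Write inside a subshell with a strict umask so a newly created
    # file is never readable by other users.
    (
        umask 077
        printf 'auth 1\n1 sha1 %s\n' "$key" > "$dir/authkeys"
    )
    # Also fix the mode in case the file already existed with looser bits.
    chmod 600 "$dir/authkeys"
}
```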

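Notes on the ha.cf directives:

  • debugfile /var/log/ha-debug: where heartbeat writes its debug messages;
  • logfile /var/log/ha-log: where heartbeat writes its log;
  • logfacility local0: the syslog facility used for logging;
  • keepalive 2: the interval between heartbeats, in seconds;
  • deadtime 30: if no heartbeat arrives from the peer within this many seconds, the peer is considered dead;
  • warntime 10: if no heartbeat arrives from the peer within this many seconds, a warning is written to the log;
  • initdead 60: on some systems the network needs a while to come up after a boot or reboot; this option sets how long to wait in that case;
  • udpport 694: the UDP port used for heartbeat communication; 694 is the default;
  • ucast eth0 192.168.0.162: the local interface used for unicast heartbeats and the IP of the peer machine;
  • auto_failback on: the two heartbeat hosts are the primary and the secondary node. The primary normally holds the resources and runs all services, and hands them to the secondary when it fails; with auto_failback on, the resources return to the primary once it recovers;
  • node master: the primary node (the name must match the output of uname -n);
  • node slave: the secondary node;
  • ping 192.168.0.1: the ping node that ipfail uses to test network connectivity;
  • respawn hacluster /usr/lib/heartbeat/ipfail: a process that starts and stops together with heartbeat, run as the hacluster user. It is monitored automatically and restarted if it fails. The most common such process is ipfail, which detects and handles network failures and needs a ping directive naming a ping node. On a 64-bit system, check this path; it is typically /usr/lib64/heartbeat/ipfail.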
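The timing directives only make sense when keepalive is shorter than warntime and warntime is shorter than deadtime. The sketch below checks that ordering in a ha.cf file. It is a sanity check written for this article, not something heartbeat ships, and it assumes the values are plain whole seconds as in the configuration above; `check_ha_timing` is a hypothetical name.

```shell
# check_ha_timing FILE: print "ok" if keepalive < warntime < deadtime
# in FILE; otherwise print the problem and return 1.
# Assumes plain integer seconds. The function name is hypothetical.
check_ha_timing() {
    awk '
        $1 == "keepalive" { k = $2 }
        $1 == "warntime"  { w = $2 }
        $1 == "deadtime"  { d = $2 }
        END {
            if (k == "" || w == "" || d == "") {
                print "missing timing directive"; exit 1
            }
            if (k + 0 >= w + 0) {
                print "keepalive should be below warntime"; exit 1
            }
            if (w + 0 >= d + 0) {
                print "warntime should be below deadtime"; exit 1
            }
            print "ok"
        }
    ' "$1"
}
```

With the values used here (2, 10, 30), `check_ha_timing /etc/ha.d/ha.cf` prints `ok`.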
7. Copy the three configuration files from the master to the slave
cd /etc/ha.d/
scp authkeys ha.cf haresources slave:/etc/ha.d/
8. On the slave, edit ha.cf
vi /etc/ha.d/ha.cf   # only one line needs to change
# Change "ucast eth0 192.168.0.162" to "ucast eth0 192.168.0.161"
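The same one-line change can be made with sed instead of an editor. The sketch below rewrites the ucast directive so it points at the peer; it assumes ha.cf contains a single ucast line, as in this setup, and `set_ucast_peer` is a hypothetical helper name.

```shell
# set_ucast_peer IFACE PEER_IP [FILE]: point the ucast directive in FILE
# at PEER_IP. FILE defaults to /etc/ha.d/ha.cf.
# Assumes one ucast line; every line starting with "ucast" is rewritten.
# The function name is hypothetical.
set_ucast_peer() {
    iface=$1
    peer=$2
    file=${3:-/etc/ha.d/ha.cf}
    sed -i "s/^ucast[[:space:]].*/ucast $iface $peer/" "$file"
}
```

On the slave the call would be `set_ucast_peer eth0 192.168.0.161`, since each node's ucast line must name the other node's IP.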
9. Start heartbeat

Start the master first, then the slave:

service heartbeat start
10. Check the result
ifconfig

Check that eth0:0 is listed.

ps aux | grep nginx

Check that nginx processes are running.
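The eth0:0 check can also be scripted by searching the address list for the VIP. The sketch below reads the output of `ip -o -4 addr show` on standard input, which keeps the function easy to try on saved output; `has_vip` is a hypothetical helper name.

```shell
# has_vip VIP: read `ip -o -4 addr show` output on stdin and succeed
# if VIP is configured on any interface.
# The function name is hypothetical, chosen for this sketch.
has_vip() {
    # -F matches the address literally, so dots are not wildcards;
    # the trailing slash stops 192.168.0.15 from matching 192.168.0.150.
    grep -qF "inet $1/"
}

# Usage on a node (not run here):
#   ip -o -4 addr show | has_vip 192.168.0.150 && echo "VIP is on this node"
```

Run on both machines, this should succeed on exactly one of them at any time.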

11. Test 1

Deliberately block ping (ICMP) on the master:

iptables -I INPUT -p icmp -j DROP

The log /var/log/ha-log then shows the following:

ResourceManager(default)[1751]: 2017/03/28_23:01:07 info: Running /etc/init.d/nginx  start
Mar 28 23:06:49 master heartbeat: [1543]: WARN: node 192.168.0.1: is dead
Mar 28 23:06:49 master heartbeat: [1543]: info: Link 192.168.0.1:192.168.0.1 dead.
Mar 28 23:06:49 master ipfail: [1571]: info: Status update: Node 192.168.0.1 now has status dead
harc(default)[2053]: 2017/03/28_23:06:49 info: Running /etc/ha.d//rc.d/status status
Mar 28 23:06:51 master ipfail: [1571]: info: NS: We are dead. :<
Mar 28 23:06:51 master ipfail: [1571]: info: Link Status update: Link 192.168.0.1/192.168.0.1 now has status dead
Mar 28 23:06:52 master ipfail: [1571]: info: We are dead. :<
Mar 28 23:06:52 master ipfail: [1571]: info: Asking other side for ping node count.
Mar 28 23:06:55 master ipfail: [1571]: info: Giving up because we were told that we have less ping nodes.
Mar 28 23:06:55 master ipfail: [1571]: info: Delayed giveup in 4 seconds.
Mar 28 23:06:59 master ipfail: [1571]: info: giveup() called (timeout worked)
Mar 28 23:06:59 master heartbeat: [1543]: info: master wants to go standby [all]
Mar 28 23:07:00 master heartbeat: [1543]: info: standby: slave can take our all resources
Mar 28 23:07:00 master heartbeat: [2079]: info: give up all HA resources (standby).
ResourceManager(default)[2092]: 2017/03/28_23:07:00 info: Releasing resource group: master 192.168.0.150/24/eth0:0 nginx
ResourceManager(default)[2092]: 2017/03/28_23:07:00 info: Running /etc/init.d/nginx stop
ResourceManager(default)[2092]: 2017/03/28_23:07:00 info: Running /etc/ha.d/resource.d/IPaddr 192.168.0.150/24/eth0:0 stop
IPaddr(IPaddr_192.168.0.150)[2178]: 2017/03/28_23:07:00 INFO: IP status = ok, IP_CIP=
/usr/lib/ocf/resource.d//heartbeat/IPaddr(IPaddr_192.168.0.150)[2152]: 2017/03/28_23:07:00 INFO: Success
Mar 28 23:07:00 master heartbeat: [2079]: info: all HA resource release completed (standby).
Mar 28 23:07:00 master heartbeat: [1543]: info: Local standby process completed [all].
Mar 28 23:07:02 master heartbeat: [1543]: WARN: 1 lost packet(s) for [slave] [197:199]
Mar 28 23:07:02 master heartbeat: [1543]: info: remote resource transition completed.
Mar 28 23:07:02 master heartbeat: [1543]: info: No pkts missing from slave!
Mar 28 23:07:02 master heartbeat: [1543]: info: Other node completed standby takeover of all resources.
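The log above is long, and only a few lines mark the stages of the hand-over. The sketch below filters a heartbeat log down to those lines; the patterns are taken from the messages shown in this log, and `failover_events` is a hypothetical helper name.

```shell
# failover_events FILE: print only the lines that mark the main stages
# of a hand-over, in the order they were logged.
# Patterns come from the log excerpt above; the name is hypothetical.
failover_events() {
    grep -E 'is dead|wants to go standby|Releasing resource group|completed standby takeover' "$1"
}

# Usage on the master (not run here):
#   failover_events /var/log/ha-log
```

Once the test is finished, the ICMP rule can be removed again with `iptables -D INPUT -p icmp -j DROP`.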
12. Test 2

Stop the heartbeat service on the master:

service heartbeat stop
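To see how long the service is unreachable during this test, a client on a third machine can probe the VIP repeatedly while heartbeat is stopped. The sketch below counts failed probes; `count_failures` and the INTERVAL variable are names invented for this example, and the curl command in the usage comment is only one possible probe.

```shell
# count_failures N CMD...: run CMD N times, INTERVAL seconds apart
# (default 1), and print how many of the runs failed.
# The function name and INTERVAL are hypothetical.
count_failures() {
    n=$1
    shift
    fails=0
    i=0
    while [ "$i" -lt "$n" ]; do
        # Discard the probe's output; only its exit status matters.
        "$@" >/dev/null 2>&1 || fails=$((fails + 1))
        i=$((i + 1))
        sleep "${INTERVAL:-1}"
    done
    echo "$fails"
}

# Usage from a third machine (not run here):
#   count_failures 60 curl -s -m 1 -o /dev/null http://192.168.0.150/
```

With one probe per second, the printed number is a rough estimate of the outage in seconds while the slave takes over the VIP and starts nginx.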