In real production development I have run into scenarios where multiple nodes coexist, one of them has to be elected master, and automatic HA failover is required. I have thought through a few approaches and would like to share them here.
- The Lease protocol and MySQL ACID
- Design of the high-availability master election scheme
- Applicable scenarios
- Implementation in Java
- Further optimization
Many scenarios in a system call for a master/slave style architecture: the master server (Master) serves external traffic, while the slave server (Slave) is a hot standby that does not serve traffic but stays alive at all times. If the Master crashes or runs into network problems, a Slave takes over serving and is promoted to Master (the new master). This is the typical case of multiple coexisting nodes where only one master may exist at any moment, and the state of all nodes has to be maintained consistently.
The famous Paxos algorithm ( http://baike.baidu.com/view/8438269.htm ) probably comes to mind first. Put simply, Paxos reaches a decision by having the nodes vote: once more than half of the nodes vote for a proposal, Paxos produces a single, unique decision and notifies every node to record it. Take master election as an example: a vote is first proposed for some node that wants to be Master, the other nodes respond, and in the end the Paxos cluster maintains one agreed-upon Master. Zookeeper is essentially an implementation of this kind of consensus, and this scenario is exactly what Zookeeper election is best suited for. Zookeeper does have one obvious drawback, though: it stops working once fewer than half of the cluster nodes survive. For example, a 10-node ZK ensemble needs more than 5 available nodes to keep functioning.
In practice, if the requirements on the Master are not that strict, a few adjustments and trade-offs can still get the job done. For instance, it may be acceptable for the Master to be briefly unreachable at the level of seconds, or for conflicts to occur during election as long as another round of election resolves them. I designed a simple workaround that relies on MySQL's consistency plus a simplified version of the Lease protocol.
MySQL's ACID properties guarantee the consistency and integrity of a single record, so concurrent reads and writes by multiple processes do not corrupt it and its value stays uniquely correct. In the Lease protocol (the details are easy to look up), a lease is granted to the Master, and the Master plays the master role only within that lease period; when the lease runs out it must apply for a new one, and if the lease expires while the network is broken the Master proactively takes itself offline so that the other nodes can compete for the role. For example, suppose three nodes A, B and C finish a first round of election and A becomes Master with a 10-second lease. If the current time is 00:00:00, A's mastership is valid until 00:00:10. When 00:00:10 arrives, A, B and C elect a Master again, and any of them may win (from an engineering point of view, A has the best chance of staying Master). If at that moment A's network is down and it can no longer reach B and C, A takes itself offline automatically and does not compete, so no "split-brain" can occur.
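To make the expiry rule concrete, here is a minimal sketch of the check described above; the method and variable names are illustrative only, and times are assumed to be epoch seconds.

// Minimal sketch of the lease check described above (names are illustrative, not from the real code).
// heartbeatSeconds : last heartbeat written by the current Master, as epoch seconds
// leaseSeconds     : lease length granted to that Master
static boolean masterLeaseStillValid(long heartbeatSeconds, long leaseSeconds) {
    long nowSeconds = System.currentTimeMillis() / 1000L;
    // the Master keeps its role as long as heartbeat_time + lease > current_time
    return heartbeatSeconds + leaseSeconds > nowSeconds;
}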
----------------------------------------------------------------------------------------------
The design is as follows (a "server" is one machine in the cluster, or equivalently one process; all servers are peers):
- The servers synchronize their clocks against an NTP server (second-level accuracy between machines is enough).
- Each server holds a unique ID (ip + process id) that uniquely identifies that server instance.
- Each server defines a lease period, measured in seconds.
- A MySQL table holds exactly one record that maintains the global Master information; ACID guarantees its consistency (a schema sketch follows after this list).
- The Master server updates that single record in MySQL every half lease period, refreshing its heartbeat to maintain its Master status.
- Every Slave server reads the Master record from MySQL every half lease period; if the Master's lease in the database has already expired (heartbeat_time + lease < current_time), it applies to become Master.
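As a reference point, below is a minimal sketch of what the single-record keepalive table could look like. The table and column names (keepalive, unique_id, lease, heartbeat_time, status) are assumptions chosen to mirror the KeepAlive fields used by the Java code in section 4; they are not the exact production schema.

// Illustrative DDL only: one row in this table represents the current Master.
static final String CREATE_KEEPALIVE_TABLE =
        "CREATE TABLE IF NOT EXISTS keepalive ("
      + "  id             INT PRIMARY KEY,"      // fixed to a single value (e.g. 1) so only one record can exist
      + "  unique_id      VARCHAR(64) NOT NULL," // ip + process id of the current Master
      + "  lease          BIGINT      NOT NULL," // lease length in seconds
      + "  heartbeat_time BIGINT      NOT NULL," // last heartbeat, epoch seconds
      + "  status         VARCHAR(16) NOT NULL"  // e.g. MASTER or CHALLENGE
      + ") ENGINE=InnoDB";                       // InnoDB so every update is transactional (ACID)

Pinning the primary key to a single value keeps the table at exactly one row, so every heartbeat and every challenge is a transactional update of the same record.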
The trickier problems are:
1. Because of database-access latency and the sleep period (half a lease), MySQL errors and network errors within that window have to be handled.
2. Several servers may try to grab the Master role at the same time, so a verification mechanism is needed so that every server that failed to grab it automatically steps back down to Slave (a sketch of this check follows below).
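For the second issue, one way to implement the verification is a conditional update followed by a verification read: every challenger issues an UPDATE that only takes effect if the stored lease has already expired, then re-reads the record to see whose unique ID actually landed in it. The sketch below uses plain JDBC against the assumed schema from the sketch above rather than the MyBatis mapper of section 4, and it omits the wait of roughly one lease period that the real loop inserts between the challenge and the verification so the old Master has time to step down.

// Hedged sketch of challenge + verification (assumed schema, plain JDBC).
// Returns true if this server won the Master role; losers stay (or demote themselves to) Slave.
static boolean challengeAndVerify(java.sql.Connection conn, String myUniqueId, long leaseSeconds)
        throws java.sql.SQLException {
    long now = System.currentTimeMillis() / 1000L;
    // 1. Conditional update: only succeeds if the current lease has already expired.
    //    MySQL applies the row update atomically, so at most one challenger can win.
    String challenge = "UPDATE keepalive SET unique_id = ?, lease = ?, heartbeat_time = ?, status = 'CHALLENGE' "
                     + "WHERE heartbeat_time + lease < ?";
    try (java.sql.PreparedStatement ps = conn.prepareStatement(challenge)) {
        ps.setString(1, myUniqueId);
        ps.setLong(2, leaseSeconds);
        ps.setLong(3, now);
        ps.setLong(4, now);
        ps.executeUpdate();
    }
    // 2. Verification read: whoever finds its own unique_id in the record is the new Master.
    try (java.sql.Statement st = conn.createStatement();
         java.sql.ResultSet rs = st.executeQuery("SELECT unique_id FROM keepalive")) {
        return rs.next() && myUniqueId.equals(rs.getString("unique_id"));
    }
}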
An illustrated example follows (10.0.0.1 is the Master):
10.0.0.1 has crashed. Its Master record in MySQL has expired, so the other nodes race to claim the role.
Each node then reads the database again to check whether its own claim succeeded:
After that, 10.0.0.3 serves as Master. If 10.0.0.1 restarts at this point it can rejoin as a Slave. If 10.0.0.1 cannot keep its heartbeat up because of a network partition or other network failure, it stops serving on its own once its lease runs out, so no "dual Master" situation can arise.
Each server follows the flow below (it corresponds to the state machine implemented in section 4):
Database design:
A snapshot of the Master information in the database at some moment:

Current time: 45 min 15 s
Current Master lease: 6 seconds
Current Master lease valid until: 45 min 21 s
----------------------------------------------------------------------------------------------
3. Applicable scenarios
1. MySQL is reachable throughout the lifetime of the system, and the servers' clocks are synchronized.
2. The cluster needs to elect a single Master to serve traffic; the other nodes act as Slaves on standby and compete for Master once the Master's lease expires.
3. Compared with Zookeeper, it keeps working even when half of the cluster is down, for example with only one master and one standby.
4. The system tolerates second-level failover: during election there may be a window of roughly lease/2 seconds in which the service is unavailable.
5. The system tolerates a worst-case dual-Master situation lasting at most lease/2 seconds; the probability of this is very small.
----------------------------------------------------------------------------------------------
4. Implementation in Java

Some configuration values and the time- and sleep-cycle-related variables:
// timing configuration: the configured lease is in milliseconds,
// and the polling interval is a fraction of it (half a lease in this design)
final long interval = lease / intervalDivisor;
long waitForLeaseChallenging = 0L;
// from here on the lease is kept in seconds, matching the value stored in MySQL
lease = lease / 1000L;
long challengeFailTimes = 0L;
long takeRest = 0L;
long dbExceptionTimes = 0L;
long offlineTime = 0L;
Random rand = new Random();
Status stateMechine = Status.START;
long activeNodeLease = 0L;
long activeNodeTimeStamp = 0L;
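For completeness, here is a hypothetical sketch of how the unique ID from the design (ip + process id) could be built; this helper is not part of the original code.

// Hypothetical helper, consistent with the design above (not from the original implementation).
static String buildUniqueId() throws java.net.UnknownHostException {
    String ip = java.net.InetAddress.getLocalHost().getHostAddress();
    // the RuntimeMXBean name has the form "pid@hostname", which is enough to tell processes apart
    String pid = java.lang.management.ManagementFactory.getRuntimeMXBean().getName();
    return ip + "_" + pid;
}

With the lease configured in milliseconds and intervalDivisor set to 2, the interval above gives exactly the half-lease polling described in the design.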
Handling database exceptions:
KeepAlive keepaliveNode = null;
try {
/* first of all get it from mysql */
keepaliveNode = dbService.accquireAliveNode();
if (stateMechine != Status.START && keepaliveNode==null)
throw new Exception();
// recount , avoid network shake
dbExceptionTimes = 0L;
} catch (Exception e) {
log.fatal("[Scanner] Database Exception with times : " + dbExceptionTimes++);
if (stateMechine == Status.OFFLINE) {
log.warn("[Scanner] Database Exception , OFFLINE ");
} else if (dbExceptionTimes >= 3) {
log.fatal("[Scanner] Database Exception , Node Offline Mode Active , uniqueid : " + uniqueID);
stateMechine = Status.OFFLINE;
dbExceptionTimes = 0L;
offlineTime = System.currentTimeMillis();
online = false;
} else
continue;
}
The main loop and the state-machine transitions:
while (true) {
SqlSession session = dbConnecction.openSession();
ActionScanMapper dbService = session.getMapper(ActionScanMapper.class);
KeepAlive keepaliveNode = null;
try {
/* first of all get it from mysql */
keepaliveNode = dbService.accquireAliveNode();
if (stateMechine != Status.START && keepaliveNode==null)
throw new Exception();
// recount , avoid network shake
dbExceptionTimes = 0L;
} catch (Exception e) {
log.fatal("[Scanner] Database Exception with times : " + dbExceptionTimes++);
if (stateMechine == Status.OFFLINE) {
log.warn("[Scanner] Database Exception , OFFLINE ");
} else if (dbExceptionTimes >= 3) {
log.fatal("[Scanner] Database Exception , Node Offline Mode Active , uniqueid : " + uniqueID);
stateMechine = Status.OFFLINE;
dbExceptionTimes = 0L;
offlineTime = System.currentTimeMillis();
online = false;
} else {
// close the session before retrying, otherwise the SqlSession leaks
session.close();
continue;
}
}
try {
activeNodeLease = keepaliveNode!=null ? keepaliveNode.getLease() : activeNodeLease;
activeNodeTimeStamp = keepaliveNode!=null ? keepaliveNode.getTimestamp() : activeNodeTimeStamp;
takeRest = interval;
switch (stateMechine) {
case START:
if (keepaliveNode == null) {
log.fatal("[START] Accquire node is null , ignore ");
// if no node register here , we challenge it
stateMechine = Status.CHALLENGE_REGISTER;
takeRest = 0;
} else {
// check the lease, whether the record is mine or someone else's
if (activeNodeLease < timestampGap(activeNodeTimeStamp)) {
log.warn("[START] Lease Timeout scanner for uniqueid : " + uniqueID + ", timeout : "
+ timestampGap(activeNodeTimeStamp));
if (keepaliveNode.getStatus().equals(STAT_CHALLENGE))
stateMechine = Status.HEARTBEAT;
else {
stateMechine = Status.CHALLENGE_MASTER;
takeRest = 0;
}
} else if (keepaliveNode.getUniqueID().equals(uniqueID)) {
// my own record is still there, so this is a restart of myself
log.info("[START] Restart Scanner for uniqueid : " + uniqueID
+ ", timeout : " + timestampGap(activeNodeTimeStamp));
stateMechine = Status.HEARTBEAT;
} else {
log.info("[START] Already Exist Keepalive Node with uniqueid : " + uniqueID);
stateMechine = Status.HEARTBEAT;
}
}
break;
case HEARTBEAT:
/* uniqueID == keepaliveNode.uniqueID */
if (keepaliveNode.getUniqueID().equals(uniqueID)) {
if (activeNodeLease < timestampGap(activeNodeTimeStamp)) {
// my own lease has expired, so challenge right away, no need to check Status[CHALLENGE]
log.warn("[HEARTBEAT] HEART BEAT Lease is timeout for uniqueid : " + uniqueID
+ ", time : " + timestampGap(activeNodeTimeStamp));
stateMechine = Status.CHALLENGE_MASTER;
takeRest = 0;
break;
} else {
// lease ok , just update mysql keepalive status
dbService.updateAliveNode(keepaliveNode.setLease(lease));
online = true;
log.info("[HEARTBEAT] update equaled keepalive node , uniqueid : " + uniqueID
+ ", lease : " + lease + "s, remain_usable : " +
((activeNodeTimeStamp * 1000L + lease * 1000L) - System.currentTimeMillis()) + " ms");
}
} else {
/* It's others , let's check lease */
if (activeNodeLease < timestampGap(activeNodeTimeStamp)) {
if (keepaliveNode.getStatus().equals(STAT_CHALLENGE)) {
waitForLeaseChallenging = (long) (activeNodeLease * awaitFactor);
if ((waitForLeaseChallenging) < timestampGap(activeNodeTimeStamp)) {
log.info("[HEARTBEAT] Lease Expired , Diff[" + timestampGap(activeNodeTimeStamp) + "] , Lease[" + activeNodeLease + "]");
stateMechine = Status.CHALLENGE_MASTER;
takeRest = 0;
} else {
log.info("[HEARTBEAT] Other Node Challenging , We wait for a moment ...");
}
} else {
log.info("[HEARTBEAT] Lease Expired , Diff[" + timestampGap(activeNodeTimeStamp) + "] , lease[" + activeNodeLease + "]");
stateMechine = Status.CHALLENGE_MASTER;
takeRest = 0;
}
} else {
online = false;
log.info("[HEARTBEAT] Exist Active Node On The Way with uniqueid : "
+ keepaliveNode.getUniqueID() + ", lease : " + keepaliveNode.getLease());
}
}
break;
case CHALLENGE_MASTER:
dbService.challengeAliveNode(new KeepAlive().setUniqueID(uniqueID).setLease(lease));
online = false;
// wait for the expired node to take itself offline,
// which also gives the other nodes a chance to challenge
takeRest = activeNodeLease * 1000L; // the lease is stored in seconds, Thread.sleep expects milliseconds
stateMechine = Status.CHALLENGE_COMPLETE;
log.info("[CHALLENGE_MASTER] Other Node is timeout["
+ timestampGap(activeNodeTimeStamp) + "s] , I challenge with uniqueid : " + uniqueID
+ ", lease : " + lease + ", wait : " + lease);
break;
case CHALLENGE_REGISTER:
dbService.registerNewNode(new KeepAlive().setUniqueID(uniqueID).setLease(lease));
online = false;
// wait for the expired node to take itself offline,
// which also gives the other nodes a chance to challenge
takeRest = activeNodeLease * 1000L; // the lease is stored in seconds, Thread.sleep expects milliseconds
stateMechine = Status.CHALLENGE_COMPLETE;
log.info("[CHALLENGE_REGISTER] Regiter Keepalive uniqueid : " + uniqueID + ", lease : " + lease);
break;
case CHALLENGE_COMPLETE :
if (keepaliveNode.getUniqueID().equals(uniqueID)) {
dbService.updateAliveNode(keepaliveNode.setLease(lease));
online = true;
log.info("[CHALLENGE_COMPLETE] I Will be the Master uniqueid : " + uniqueID);
// make the uptime correct
stateMechine = Status.HEARTBEAT;
} else {
online = false;
log.warn("[CHALLENGE_COMPLETE] So unlucky , Challenge Failed By Other Node with uniqueid : " + keepaliveNode.getUniqueID());
// randomized retry budget in [minChallenge, minChallenge + maxChallenge)
if (challengeFailTimes++ >= Math.abs(rand.nextLong() % maxChallenge) + minChallenge) {
// no need to challenge again for a while
takeRest=maxChallengeAwaitInterval;
stateMechine = Status.HEARTBEAT;
challengeFailTimes = 0L;
log.info("[CHALLENGE_COMPLETE] Challenge Try Times Used Up , let's take a long rest !");
} else {
stateMechine = Status.HEARTBEAT;
log.info("[CHALLENGE_COMPLETE] Challenge Times : " + challengeFailTimes + ", Never Give Up , to[" + stateMechine + "]");
}
}
break;
case OFFLINE :
log.fatal("[Scanner] Offline Mode Node with uniqueid : " + uniqueID);
if (System.currentTimeMillis() - offlineTime >= maxOfflineFrozen) {
// the frozen period is over, force myself back to the active state
log.info("[Scanner] I am back to active node , uniqueid : " + uniqueID);
stateMechine = Status.HEARTBEAT;
offlineTime = 0L;
} else if (keepaliveNode != null) {
// db is reconnected
stateMechine = Status.HEARTBEAT;
offlineTime = 0L;
log.info("[Scanner] I am relive to activie node , uniqueid : " + uniqueID);
}
break;
default :
System.exit(0);
}
session.commit();
session.close();
if (takeRest != 0)
Thread.sleep(takeRest);
log.info("[Scanner] State Stage [" + stateMechine + "]");
} catch (InterruptedException e) {
log.fatal("[System] Thread InterruptedException : " + e.getMessage());
} finally {
log.info("[Scanner] UniqueID : " + uniqueID + ", Mode : " + (online?"online":"offline"));
}
}
}
enum Status {
START, HEARTBEAT, CHALLENGE_MASTER, CHALLENGE_REGISTER, CHALLENGE_COMPLETE, OFFLINE
}
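To show how the loop could be wired up, here is a hedged bootstrap sketch: dbConnecction behaves as a MyBatis SqlSessionFactory (which follows from the openSession() call above), and the scan loop runs on a dedicated thread. The configuration file name and the thread wiring are assumptions for illustration only.

// Hedged bootstrap sketch; only the SqlSessionFactory role of dbConnecction follows from the loop above.
import org.apache.ibatis.io.Resources;
import org.apache.ibatis.session.SqlSessionFactory;
import org.apache.ibatis.session.SqlSessionFactoryBuilder;

public class MasterElectionBootstrap {
    public static void main(String[] args) throws Exception {
        // assumed MyBatis configuration file name
        SqlSessionFactory dbConnecction = new SqlSessionFactoryBuilder()
                .build(Resources.getResourceAsStream("mybatis-config.xml"));
        Thread scanner = new Thread(() -> {
            // ... the while (true) scan loop shown above goes here ...
        }, "master-election-scanner");
        scanner.start();
    }
}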