6.824 - Raft Implementation (Part 1): Leader Election

Overview

Recording some thoughts and experiences from implementing the 6.824 Lab 2 Raft, for future reference.

Lab Overview

6.824 is a distributed systems course at MIT. I followed the 2018 spring offering. The second lab requires implementing a simple distributed consensus protocol—Raft.

This is a protocol specifically designed with teaching and engineering implementation in mind. It breaks consensus down into several relatively independent modules: leader election, log replication, and safety guarantees. Figure 2 in the paper essentially gives all the implementation details of Raft; truly every word is a pearl. But precisely because it is so concise and dense, some state transitions are scattered across different descriptions, and if you implement purely from that figure, it is easy to miss details.
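As a reference for the snippets below (which freely use rf.state, rf.currentTerm, rf.votedFor, and friends), here is a minimal sketch of the per-server state that Figure 2 describes, assuming the lab's Go skeleton with its labrpc and sync packages; field names beyond the paper's are my own guess, not the original code:

type ServerState int

const (
    Follower ServerState = iota
    Candidate
    Leader
)

type LogEntry struct {
    Term    int         // term in which the entry was created
    Command interface{} // state machine command (not needed until Lab 2B)
}

type Raft struct {
    mu    sync.Mutex          // guards all fields below
    peers []*labrpc.ClientEnd // RPC endpoints of all peers
    me    int                 // this peer's index into peers[]

    state       ServerState // Follower, Candidate, or Leader
    currentTerm int         // latest term this server has seen
    votedFor    int         // candidate voted for in currentTerm, or -1 if none
    log         []LogEntry  // the replicated log
}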

Author: Muniao’s Notes https://www.qtmuniao.com, please indicate the source when reprinting

Leader Election

This is the content of Lab 2A. To my shame, from the start of design to passing the test cases, it dragged on for more than two months. Although I only wrote code in the evenings and on weekends, progress was indeed slow. However, I gained a lot. I secretly thought that if undergraduate labs were all like this, perhaps it would really be as Zhihu says—at most two courses per semester. Looking back at my junior year with four four-credit courses, the course design was destined to make us slack off.

Rant over, let me talk about the pitfalls I stepped into.

Timers

Raft mainly has two event loops: one where a Follower or Candidate times out and triggers an election, and one where the Leader sends periodic heartbeats (sometimes piggybacking log synchronization). The easiest approach to think of is the loop+sleep method we used brainlessly for undergraduate course projects: an outer while-true loop, with an inner sleep over a much smaller interval (say t; at least an order of magnitude smaller than electionTimeout and heartbeatInterval is enough) that periodically checks whether the trigger points (needElection, heartbeat) have arrived.

But my OCD made me feel this wasn't accurate: the error is at least that detection interval t. With so many threads running around and such complex state changes, could the errors caused by this inaccuracy lead to bugs? Yet using Go's timer seemed quite complicated; every time I thought about it, I couldn't settle the question, and this delayed me for a while.

Later, prompted by a friend who was also working on Raft, I noticed that the course had very considerately provided a suggestion: it was still to use loop+sleep to periodically check for timeouts. At that moment everything clicked, and I understood the remark I made in parentheses above: as long as the check interval is one or two orders of magnitude smaller than the timeout interval, there is basically no problem. The advantage of this implementation is that it is simple, crude, direct, and controllable.
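For context, the snippets below use a few timing names (electionTimeout, checkGapInMs, pingGapInMs, and the timestamps last / lastPingTime) whose definitions were trimmed out. A plausible setup, with concrete values that are my own assumption but within the lab's usual constraints (heartbeats at most ten per second, elections settling well within five seconds), might look like this:

const (
    pingGapInMs  = 100 * time.Millisecond // Leader heartbeat period
    checkGapInMs = 10 * time.Millisecond  // sleep between periodic timeout checks
)

// last / lastPingTime in the loops below are time.Time values recording the
// previous timer reset / heartbeat round.

// Election timeouts are typically re-randomized for each election to reduce
// split votes.
func randomElectionTimeout() time.Duration {
    return time.Duration(300+rand.Intn(200)) * time.Millisecond
}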

While implementing, I suddenly found it fun and made two slightly different implementations. The code is attached below; for clarity, both have been somewhat simplified.
One uses only a single loop:

go func() {
    for {
        now := time.Now()
        if now.Sub(last) < electionTimeout {
            time.Sleep(checkGapInMs)
        } else {
            // At first I worried about the order of these two lines, but later figured it out:
            // as long as startElection has no IO or other blocking operations, the order doesn't matter.
            startElection()
            last = time.Now()
        }
    }
}()

The other uses two nested loops, with the inner loop dedicated to waiting (perhaps I was reminded of CPU busy-waiting).

for {
    // ping all the peers
    for s := 0; s < len(rf.peers); s++ {
        go func(server int, args AppendEntriesArgs) {
            // send the AppendEntries (heartbeat) RPC
        }(s, args)
    }

    // wait until the next ping time comes
    for {
        now := time.Now()
        if now.Sub(lastPingTime) < pingGapInMs {
            continue // spin-check; a short sleep here would also do
        } else {
            lastPingTime = time.Now()
            break
        }
    }
}

Although the former is more concise, the latter has clearer logic; concise code sometimes folds different pieces of logic together, which can leave the semantics slightly muddy. Considering that the first principle of code is that it is written for humans to read (What? You say it's for the machine? Ahem, I'd call that merely the prerequisite for code being code; otherwise the compiler won't let it pass), I think the latter is better.

Locking

The course also very considerately provided a hint, but as someone who had been "playing with locks" ever since the freshman-year elevator project, I skipped it outright, charging forward without hesitation and writing code frantically. However, memory can be deceptive, and the lesson was painful: I kept hitting deadlocks without knowing where the problem was. Later I had to go back and obediently read the hint several times, and found it was really well written…

The gist, in short: first take the lock everywhere shared state is read or written, and then make sure you never hold the lock across blocking operations (RPC calls and the like).
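A minimal sketch of that principle in action (my own illustration; the broadcastHeartbeat name is hypothetical, and only the Term field of the args is taken from the snippets below):

func (rf *Raft) broadcastHeartbeat() {
    // read and build everything that touches shared state under the lock...
    rf.mu.Lock()
    if rf.state != Leader {
        rf.mu.Unlock()
        return
    }
    args := AppendEntriesArgs{Term: rf.currentTerm}
    rf.mu.Unlock() // ...and drop it before anything that can block

    for s := range rf.peers {
        if s == rf.me {
            continue
        }
        go func(server int, args AppendEntriesArgs) {
            reply := AppendEntriesReply{}
            // blocking RPC, issued without holding rf.mu
            rf.sendAppendEntries(server, &args, &reply)
        }(s, args)
    }
}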

I’ll set aside this principle for now and talk about two places where I fell into pits:

Holding a lock outside a function call and reacquiring it inside the function call:

The reason for this mistake: looking at such a long stretch of code, experience told me to wrap it up into a function. While wrapping it up I realized: damn, this is all shared state, better grab the lock quickly. But later, when calling this function, I forgot that the call site was already inside a critical section; that is, eating from the bowl while still eyeing the pot. Deadlock achievement 1, get. No more nonsense, here's the code:

// a. the function grabs the lock itself...
func (rf *Raft) startElection() {
    rf.mu.Lock()
    // balalala
}

// b. ...while the caller is already holding it when it calls in.
// Go's sync.Mutex is not reentrant, so this deadlocks.
rf.mu.Lock()
// balala
rf.startElection()
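One common way out (my own sketch, not from the original post; startElectionLocked is a hypothetical name) is a small convention: keep an inner version that assumes the lock is already held, and have the locking wrapper call it.

// startElectionLocked assumes rf.mu is already held by the caller
// (hypothetical helper, shown only to illustrate the convention).
func (rf *Raft) startElectionLocked() {
    // balalala
}

// startElection is for callers that do not yet hold the lock.
func (rf *Raft) startElection() {
    rf.mu.Lock()
    defer rf.mu.Unlock()
    rf.startElectionLocked()
}

A caller that is already inside the critical section then calls startElectionLocked directly, and the double lock disappears.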

Forgetting to release locks on branch break/return

break, continue, return: these sneaky ways of bailing out of a branch early. We use them happily, but they break the symmetry people instinctively expect, which sometimes makes us forget the cleanup that goes with them. What do I mean by asymmetry? Here's the code:

if !check(arg) {
    return
}

for condition {
    if need {
        break
    }

    // because there is so much balalala here,
    // break is used to reduce the indent
}

vs.

if !check(arg) {
    return
} else {
    for condition {
        if !need {
            // all the balalala lives here instead,
            // at the cost of one more level of indent
        }
    }
}

With the latter, a quick glance at the if/else alignment makes the number of exits from the function obvious. With the former, if the branch statements are buried in masses of code, it's easy to forget that some obscure corner still hides an exit, and naturally you forget to unlock there.

Specifically, in my case: when the candidate asks for votes and gets a majority, it becomes leader right away; if stragglers still hand in votes after that, I check whether I am already leader and, if so, return directly without caring about those votes. Quite devious indeed… And, you guessed it, I forgot to release the lock before that return. The code is as follows, somewhat trimmed.

go func(server int, args RequestVoteArgs) {
    // use args to request a vote
    reply := RequestVoteReply{}
    ok := rf.sendRequestVote(server, &args, &reply)
    if !ok {
        return // RPC failed, nothing to count
    }

    isBecomeLeader := false
    rf.mu.Lock()
    if rf.state != Candidate || rf.currentTerm != args.Term {
        rf.mu.Unlock() // <--- this is the wicked spot
        return
    }

    // check the votes
    if reply.VotedGranted {
        votes++
        DPrintf("%d get vote from %d, now votes are %d, total members are: %d",
            rf.me, server, votes, peersCount)
        if votes > peersCount/2 {
            if rf.state == Candidate {
                isBecomeLeader = true
                DPrintf("%d become leader", rf.me)
            }
            rf.state = Leader
        }
    } else if reply.Term > rf.currentTerm {
        rf.currentTerm = reply.Term
        rf.state = Follower
        rf.votedFor = -1
    }
    rf.mu.Unlock()

    if isBecomeLeader {
        rf.startHeartbeat()
    }
}(s, args)
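In hindsight, a defer rf.mu.Unlock() placed right after the Lock() makes that "wicked spot" impossible to forget, at the cost of holding the lock until the function returns (which is also why startHeartbeat above is only called after the manual unlock). A minimal sketch of the same shape with defer; handleVoteReply is a hypothetical helper, not the original code:

func (rf *Raft) handleVoteReply(server int, args RequestVoteArgs, reply RequestVoteReply) bool {
    rf.mu.Lock()
    defer rf.mu.Unlock() // released on every return path, early or not

    if rf.state != Candidate || rf.currentTerm != args.Term {
        return false // no hand-written unlock needed before this return
    }
    // ... count the vote, possibly flip rf.state to Leader ...
    return rf.state == Leader
}

The caller then starts heartbeats outside the lock when the return value says it has just become leader.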

After Gaps, Remember to Self-Check

Because state bounces back and forth across multiple loops (well, okay, just two, but why does it feel like so many?), the identity (self) and the election term (external) may already have changed dramatically on the two sides of an asynchronous or blocking call, so a self-check is needed.
To my shame, I only realized this after consulting various materials. The gaps come in two flavors:

  1. Asynchronous calls, i.e., inside and outside goroutines.
  2. Blocking calls, i.e., before and after RPCs.

When the Leader sends heartbeats, because the send is an asynchronous call, the goroutine first needs to check whether it is still the leader and whether it is still in its own term (the args saved the term from when the heartbeat was prepared); when the RPC returns, it performs the same check again. Only if both checks pass can it carry on with the next actions as leader.

22
go func(server int, args *AppendEntriesArgs) {
    // self-check before doing anything: are we still the leader of that term?
    rf.mu.Lock()
    if rf.currentTerm != args.Term || rf.state != Leader {
        rf.mu.Unlock()
        return
    }
    rf.mu.Unlock()

    reply := &AppendEntriesReply{}
    rf.sendAppendEntries(server, args, reply)

    // self-check again after the blocking RPC
    rf.mu.Lock()
    defer rf.mu.Unlock()

    if rf.currentTerm != args.Term || rf.state != Leader {
        return
    }

    // now handle the reply
    if reply.Term > rf.currentTerm {
        rf.currentTerm = reply.Term
        rf.state = Follower
        rf.votedFor = -1
    }
}(s, args)


I am Qingteng Muniao (青藤木鸟), a programmer who enjoys photography and focuses on large-scale data systems. Welcome to follow my WeChat public account "木鸟杂记" for more articles on distributed systems, storage, and databases. After following, reply "资料" to get a set of distributed database study materials I put together, or reply "优惠券" for a 20% discount coupon for my paid column on large-scale data systems, 《系统日知录》.

We also have groups on distributed systems and databases; add my WeChat ID qtmuniao and I will pull you in. When adding me, remember to note "分布式系统群". If you would rather not join a group, there is also a distributed systems and databases forum (click here); feel free to come hang out.
