Introduction
In 2018, I worked on some of the 6.824 labs; my old notes are here. Unfortunately, I got stuck at Part 2C, could never pass its tests, and set the project aside. In the time that followed, I often thought about this legendary course with a tinge of regret. After putting it off for so long, I have finally come back to finish what I started.
This time I passed all the tests, for three reasons: first, having done the labs once before, many of the principles were still fresh in my mind; second, over the past year or so I had gained a lot of hands-on experience with distributed systems at work; and third, my Go skills had improved somewhat. Even so, I ran into a great many vexing details along the way. I have organized and recorded them here so that I can review them later. If they are of even slight help to others working through this course, so much the better.
6.824 and Raft
6.824 is an excellent open course on distributed systems. While doing the labs, I was constantly impressed by how carefully they are designed and how thorough the materials are. That the instructors at MIT share such an essential course with the world reflects the character of a great university and its scholars, and it is a piece of good fortune for those of us in computer science.
Raft is a consensus protocol designed for understandability. Distributed consensus is a classic problem in the field and one of the hardest parts of building distributed systems. Intuitively, it is like laying the foundation for a skyscraper on quicksand: unreliable networks and failure-prone hosts produce state changes so complex that most people cannot simulate them in their heads. I am no exception, and I can only reach some understanding by combining an intuitive grasp with an accumulation of details. Coming back to Raft: with Paxos already established as its predecessor, how did Raft still manage to stand out? I think it comes down to two key points:
- Easy to understand. Paxos is notoriously difficult to understand, and therefore hard to put into practice. Raft tames the complexity of the algorithm by decomposing it into largely independent subproblems (leader election, log replication, and safety), which makes it far more approachable. Raft also makes many elegant design choices that avoid introducing complexity wherever possible, further reducing the mental burden.
- Easy to implement. Being easy to understand naturally makes implementation easier, but it does not automatically mean you can build an excellent system from it. If understanding stays at the intuitive level, the implementation has nothing solid to stand on. The brilliance of the Raft paper is that it offers both the intuition and a careful organization of the details; it is almost a system design document, and a detailed one at that.
To do well on this lab, you need to consult a large amount of material. At the end of this article I have listed the resources the lab mentions along with others I found. There is also the language barrier of working through material in English. Although I eventually passed all the test cases, there are still many parts I did not implement well and many things I do not fully understand.
Note: In 2023, I did the labs again and finally figured out most of these points.