木鸟杂记 - 大规模数据系统

6.824 - Raft Implementation (Part 1): Leader Election

Posted on 2018-07-11 In Distributed Systems , Raft

Overview

Recording some thoughts and experiences from implementing the 6.824 Lab 2 Raft, for future reference.

Lab Overview

6.824 is a distributed systems course at MIT. I followed the 2018 spring offering. The second lab requires implementing a simple distributed consensus protocol—Raft.

This is a protocol specifically designed to facilitate teaching and engineering implementation. It breaks the protocol down into several relatively independent modules—leader election, log replication, and safety guarantees. Figure 2 in the paper essentially gives all the implementation details of Raft, truly every word is a pearl. But because it is so concise and profound, some state transitions are scattered across different descriptions. If you truly implement it based only on that figure, it’s easy to miss some details.

Data Structures and Algorithms (II): Binary Search

Posted on 2018-03-11 In Data Structures

Overview

My previous understanding of binary search was limited to finding a given integer in a sorted array. Later, I discovered that a whole class of problems can be solved using the binary search mindset. To generalize: if there exists a monotonic (mapping) relationship between the set where the desired result lies (the value range) and the set of numbers being searched (the domain), then the problem can be solved using the binary search idea. This sounds a bit abstract; I will illustrate it with several examples below.

Thanks to its efficiency of halving the problem size with each iteration, the binary search mindset has an extremely wide range of applications.

This article is divided into two major parts. The first part discusses the various details of binary search; the second part extends binary search to the general binary search mindset.

Hadoop Source Code Reading — MapReduce (Part 1): Basic Concepts and Interfaces

Posted on 2018-02-24 In Source Code Reading , Hadoop

Overview

Sorting out some basic interfaces and classes involved in the MapReduce framework.

'Data Structures and Algorithms (I): Non-Recursive Binary Tree Traversal'

Posted on 2018-02-24 In Data Structures

Overview

Recently, I’ve been thinking about some ideas and corresponding implementations for non-recursive tree traversal. I’m writing them down here for future reference.

Android Learning Notes (Part 2): Code Organization

Posted on 2017-12-12 In APP Development

Problem

When modularizing, I tried to separate modules like SearchListener and CallManager from MainActivity. However, when responding to events, it was inevitable to change the state of other resources, so it was necessary to obtain their handles. Thus, MainActivity also needed to be passed in as a handle.

Android Study Notes (1): Implementing the Search Box

Posted on 2017-12-02 In App Development

Goal

Enter keywords and display search results in real time.

Some Design Patterns

Posted on 2017-08-03 In programming language

When writing programs, at a small scale, one cannot yet feel the importance of design patterns. Once the scale grows and requirements iterate, a project that applies appropriate design patterns can always iterate the fastest at the lowest cost.
But a strange point is that I can never remember the names of the specific implementations corresponding to the design patterns, yet I never forget the design ideas behind them — depend on abstractions rather than concretions; open for extension, closed for modification;

Hadoop Source Code Reading — DFS (Part 3): FileSystem

Posted on 2017-07-23 In Source Code Reading , Hadoop

FileSystem

FileSystem is an abstract base class that provides some common methods for LocalFileSystem and DistributedFileSystem. It maintains all used file systems via a HashMap: name -> filesystem, whose key is either “Local” or “Host:Port” (identifying a NameNode).
It inherits from the Configured class and can load some basic parameters via configuration, saved in Configuration.
To improve reliability, it generates a checksum for each file and saves it in a hidden file with the extension .*.crc.

Some Interesting Details

Posted on 2017-07-15 In Programming Language , Java

There are many interesting details in programming. When I come across them, I jot them down here.

`|` Simplifies Judgment

When a bunch of numbers are bitwise ORed together, if more than one number is negative, the result is negative.

public void write(byte b[], int off, int len) throws IOException {
   if ((off | len | (b.length - (len + off)) | (off + len)) < 0)
       throw new IndexOutOfBoundsException();

   for (int i = 0 ; i < len ; i++) {
       write(b[off + i]);
   }
}

from: FilterOutputStream

'Hadoop Source Code Reading: DFS (Part 2): DataNode'

Posted on 2017-07-11 In Source Code Reading , Hadoop

The previous post gathered some miscellaneous small classes together into one article. This post intends to read through the relatively long class DataNode.
Each DataNode represents a data node, corresponding to a folder on a machine. Essentially, it is a collection of a certain number of Blocks, capable of communicating with NameNode, clients, and other DataNodes to operate on this Block collection, mainly including client reads and writes, replication of blocks from other DataNodes, and responding to NameNode operations such as deletion.
Specifically, in terms of implementation: in data structure, it maintains a table mapping blocks to byte arrays; at runtime, the DataNode internally runs an infinite loop, continuously querying NameNode, reporting status (heartbeats), and executing commands (RPC).

Status information. [DataNodeInfo](/hadoop-source-DFS#datanode-info): total size, remaining size, last update time.
Execute commands.
- Client reads and writes Blocks
- Have other DataNodes replicate Blocks
- Delete certain Blocks

In addition, DataNode also maintains a Server Socket to handle requests from Clients or other DataNodes. DataNode will submit its externally exposed host:port to NameNode, which will then forward this information to other relevant DataNodes or clients.
(Excerpted from class comments)

Hadoop Source Code Reading: DFS (Part 1) — Some Basic Classes

Posted on 2017-07-02 Edited on 2017-07-03 In source code reading , Hadoop

I plan to spend about a month thoroughly reading the Hadoop 0.1.0 source code, writing as little fluff as possible and recording as many thoughts as possible.

Let’s start with the Distributed File System (DFS).
DFS, or Distributed File System, aggregates a set of files stored at predefined locations on multiple machines as storage building blocks, implements some distributed operations on this foundation, and thereby exposes a basic file read/write API to the outside.

Debugging and Running Hadoop-0.1.0 Code

Posted on 2017-06-29 Edited on 2017-06-27 In Source Code Reading , Hadoop

Previously, I ambitiously downloaded the Hadoop source code from GitHub, aiming to read through it to broaden my horizons, and even wanted to write a simplified version myself.
Unexpectedly, reading the code felt like chewing wax. After studying the basic RPC, I put it aside.

Overview

Lab Overview

Overview

Overview

Overview

Problem

Goal

FileSystem

| Simplifies Judgment

`|` Simplifies Judgment