Hadoop Source Code Reading: DFS (Part 1) — Some Basic Classes

I plan to spend about a month thoroughly reading the Hadoop 0.1.0 source code, writing as little fluff as possible and recording as many thoughts as possible.

Let’s start with the Distributed File System (DFS).
DFS, or Distributed File System, takes files stored at predefined locations on multiple machines as its storage building blocks, implements a set of distributed operations on top of them, and thereby exposes a basic file read/write API to the outside.

Author: Muniao Notes https://www.qtmuniao.com, please indicate the source when reposting

Block


blkid and len

Block is the basic unit of file storage in HDFS. It has two key attributes: blkid and len. The former identifies the block's backing file on the local file system, with the file name constructed as "blk_" + String.valueOf(blkid); the latter is the length of that file in bytes.
It abstracts two fundamental dimensions of storage: starting position and size. Variables, arrays, files, and so on all follow this pattern.
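As a sketch of the naming scheme just described (a stand-in for illustration, not the actual Hadoop 0.1.0 source; the field names follow the text, the method names are assumptions):

```java
// A minimal stand-in for Block, assuming the two fields described above.
class Block implements Comparable<Block> {
    long blkid;   // identifies the block
    long len;     // length of the backing file in bytes

    Block(long blkid, long len) {
        this.blkid = blkid;
        this.len = len;
    }

    // The on-disk file name is derived from blkid, as the text describes.
    String getBlockName() {
        return "blk_" + String.valueOf(blkid);
    }

    // Ordering by blkid, which a sorted index over Blocks can rely on.
    public int compareTo(Block other) {
        return Long.compare(blkid, other.blkid);
    }
}
```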

Registering the Factory Method

Another interesting aspect is that all classes implementing the Writable interface register a factory method. What exactly this is used for will be filled in later.

static {                                      // register a ctor
  WritableFactories.setFactory
    (Block.class,
     new WritableFactory() {
       public Writable newInstance() { return new Block(); }
     });
}
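To make the registration concrete, here is a hedged sketch of what such a registry might look like and how generic code could use it. Only the names mirror the snippet above; the registry internals and the empty Writable interface are my assumptions for illustration.

```java
import java.util.HashMap;
import java.util.Map;

interface Writable { /* write/readFields omitted in this sketch */ }

interface WritableFactory {
    Writable newInstance();
}

// A toy registry: maps a Class object to a factory for that class.
class WritableFactories {
    private static final Map<Class<?>, WritableFactory> FACTORIES =
        new HashMap<Class<?>, WritableFactory>();

    static synchronized void setFactory(Class<?> c, WritableFactory factory) {
        FACTORIES.put(c, factory);
    }

    // Generic code (an RPC layer, say) can create a fresh instance of any
    // registered class knowing only its Class object, then deserialize into it.
    static synchronized Writable newInstance(Class<?> c) {
        return FACTORIES.get(c).newInstance();
    }
}

class Block implements Writable {
    static {   // register a ctor, as in the snippet above
        WritableFactories.setFactory(Block.class, new WritableFactory() {
            public Writable newInstance() { return new Block(); }
        });
    }
}
```

Note that the static block only runs once the class is initialized (e.g. when the first Block is constructed), so a caller must have touched the class before asking the registry for an instance.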

Serialization

Implementing Writable uses Java's DataOutput/DataInput stream interfaces to serialize and deserialize Block's basic fields (this is Writable's own scheme, distinct from java.io.Serializable).
Having each class that needs serialization implement its own pair of serialize/deserialize functions is a commonly used basic design. When I was interning and writing a desktop program, I wanted to store some control information as XML; the idea was the same as this, but what I failed to do well was to define an abstraction for this behavior, as the Writable interface does here.

Implements the Comparable interface (probably so Blocks can be compared when indexed) and the Writable interface.
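The serialize/deserialize pair described above can be sketched as a round trip through DataOutput/DataInput streams. Field names come from the text; the method names write/readFields and the bodies are an illustrative stand-in, not the original source.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;

class Block {
    long blkid;
    long len;

    Block() {}   // no-arg ctor, needed by a factory-style deserializer
    Block(long blkid, long len) { this.blkid = blkid; this.len = len; }

    // Each Writable knows how to write its own fields...
    void write(DataOutput out) throws IOException {
        out.writeLong(blkid);
        out.writeLong(len);
    }

    // ...and how to read them back, in the same order.
    void readFields(DataInput in) throws IOException {
        blkid = in.readLong();
        len = in.readLong();
    }
}
```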

BlockCommand


A wrapper for command (instruction) parameters. A command acts on a group of Blocks under a given DataNode, and there are two operations: transfer this group of Blocks to another DataNode, or mark this group of Blocks as invalid.

Implementation

boolean transferBlocks = false;
boolean invalidateBlocks = false;
Block blocks[];
DatanodeInfo targets[][];

Two flag variables are used to indicate which operation to perform;
Two arrays are used to store the objects being operated on.

Then, through constructor overloading, three constructors are provided: a no-arg constructor, one for the transfer command, and one for the invalidate command. Read access to each field is also provided.
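A hedged sketch of that overloading (class and field names come from the snippets above; the constructor signatures and the getter are my assumptions):

```java
// Minimal stand-ins so the sketch is self-contained.
class Block { long blkid; long len; }
class DatanodeInfo { String name; }

class BlockCommand {
    boolean transferBlocks = false;
    boolean invalidateBlocks = false;
    Block blocks[];
    DatanodeInfo targets[][];

    BlockCommand() {}   // no-arg: a "do nothing" command

    // Transfer: move these blocks to the given target DataNodes.
    BlockCommand(Block[] blocks, DatanodeInfo[][] targets) {
        this.transferBlocks = true;
        this.blocks = blocks;
        this.targets = targets;
    }

    // Invalidate: mark these blocks as invalid.
    BlockCommand(Block[] blocks) {
        this.invalidateBlocks = true;
        this.blocks = blocks;
    }

    Block[] getBlocks() { return blocks; }   // read access, as the text notes
}
```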

Implements the Writable interface.

Summary

A simple wrapper for basic command information. Constructors accept parameters to determine the operation type and target objects; flag variables + array objects are used for the implementation.
Bundling a group of data together according to some semantics makes it more convenient to pass between functions and improves reusability.

LocatedBlock


A data pair containing a Block and information about the DataNodes where its replicas reside.

Block b;
DatanodeInfo locs[];

It is equivalent to maintaining a pointer from a logical Block to its storage locations, used to locate the physical position of a Block.

Implements the Writable interface.

DatanodeInfo


Contains a DataNode's status information (total capacity, remaining space, last update time), uses its name (host:port, stored in the custom UTF8 type) as its ID, and maintains references to all Blocks on it, organized as a search tree (a TreeSet, which should be backed by a red-black tree, sorted by the Block's blkid).
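Sketching the structure just described (field names follow the text; addBlock and the constructor are assumptions for illustration):

```java
import java.util.TreeSet;

// Block must be Comparable for the TreeSet to order it by blkid.
class Block implements Comparable<Block> {
    long blkid;
    Block(long blkid) { this.blkid = blkid; }
    public int compareTo(Block other) { return Long.compare(blkid, other.blkid); }
}

class DatanodeInfo {
    String name;           // "host:port", doubles as the node's ID
    long capacityBytes;
    long remainingBytes;
    long lastUpdate;
    TreeSet<Block> blocks = new TreeSet<Block>();  // sorted by blkid

    DatanodeInfo(String name) { this.name = name; }

    void addBlock(Block b) { blocks.add(b); }
}
```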

Key Functions

Updates status information (a heartbeat). What a well-chosen name—it’s as if the DataNode is saying, “I’m still alive, my basic vital signs are as follows, blah blah blah.” Vivid and memorable.

public void updateHeartbeat(long capacity, long remaining) {
  this.capacityBytes = capacity;
  this.remainingBytes = remaining;
  this.lastUpdate = System.currentTimeMillis();
}

Implements the Comparable and Writable interfaces (interestingly, the blocks set is not serialized).

DataNodeReport


A POJO. Haha, just thinking about the origin of that name makes me laugh; Uncle Martin (Martin Fowler) is truly talented, with a uniquely quirky sense of humor. Looking at its fields, it is clearly a simple encapsulation of heartbeat source plus heartbeat information. Each field has package-level access, and several public read methods are also provided.

String name;
String host;
long capacity;
long remaining;
long lastUpdate;

The ID of DatanodeInfo plus heartbeat information.
Finally, there is a toString function—after all, it’s in the reporting business.


I am Qingteng Muniao (青藤木鸟), a programmer who loves photography and focuses on large-scale data systems. Follow my WeChat public account "木鸟杂记" (Muniao Notes) for more articles on distributed systems, storage, and databases. After following, reply "资料" to get a collection of distributed-database study materials I put together, or reply "优惠券" for a 20%-off coupon for my paid column on large-scale data systems, 《系统日知录》.

We also have group chats on distributed systems and databases; add my WeChat ID qtmuniao and I will invite you in. When adding me, please include the note "分布式系统群" (distributed systems group). If you would rather not join a group, there is also a forum on distributed systems and databases (click here); feel free to drop by.
