Hadoop Source Code Reading — MapReduce (Part 1): Basic Concepts and Interfaces

Overview

This post sorts out some of the basic interfaces and classes in the MapReduce framework.

Author: 木鸟杂记 (https://www.qtmuniao.com). Please credit the source when reposting.

RecordReader: Reads key-value pairs from an input file. The input here is the map-side input: the reader turns an input split into the records that are handed to the Mapper. The interface has three methods: next(Writable key, Writable value), getPos(), and close(), so it resembles an abstract iterator. An InputFormat supplies the RecordReader for a split through its getRecordReader method.
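The iterator-like shape of this interface can be sketched without a Hadoop dependency. The block below is only an illustration: `SimpleRecordReader`, `LongBox`, `Text`, and `LineReader` are hypothetical stand-ins that mirror the three methods named above, not Hadoop's actual classes. The caller allocates the key and value holders once and lets each `next` call overwrite them.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class RecordReaderSketch {
    // Hypothetical mutable holders standing in for Hadoop's Writable types.
    static final class LongBox { long value; }
    static final class Text { String value = ""; }

    // Simplified stand-in mirroring next / getPos / close (an assumption, not the Hadoop interface).
    interface SimpleRecordReader {
        boolean next(LongBox key, Text value) throws IOException;
        long getPos() throws IOException;
        void close() throws IOException;
    }

    // Reads lines from an in-memory string: key = character offset, value = line text.
    static final class LineReader implements SimpleRecordReader {
        private final String data;
        private int pos = 0;

        LineReader(String data) { this.data = data; }

        public boolean next(LongBox key, Text value) {
            if (pos >= data.length()) {
                return false;                    // input exhausted, like hasNext() == false
            }
            int end = data.indexOf('\n', pos);
            if (end < 0) {
                end = data.length();             // last line without a trailing newline
            }
            key.value = pos;                     // overwrite the caller's holders in place
            value.value = data.substring(pos, end);
            pos = Math.min(end + 1, data.length());
            return true;
        }

        public long getPos() { return pos; }

        public void close() { }
    }

    // Drives the reader the way an iterator is driven: loop until next() returns false.
    static List<String> readAll(String data) {
        SimpleRecordReader reader = new LineReader(data);
        LongBox key = new LongBox();
        Text value = new Text();
        List<String> records = new ArrayList<>();
        try {
            while (reader.next(key, value)) {
                records.add(key.value + "=" + value.value);
            }
            reader.close();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return records;
    }

    public static void main(String[] args) {
        System.out.println(readAll("a b\nc d\n"));
    }
}
```

Reusing one key object and one value object across calls is the main difference from a `java.util.Iterator`, which returns a fresh element on every step.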

RecordWriter: Writes key-value pairs to an output file. An OutputFormat supplies the RecordWriter through its getRecordWriter method. The interface has two methods: write(WritableComparable key, Writable value) and close(Reporter reporter).
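As a rough illustration of the write/close pair, here is a minimal sketch that does not depend on Hadoop. `SimpleRecordWriter`, `SimpleReporter`, and `TabSeparatedWriter` are hypothetical names chosen for this example, and plain strings stand in for the Writable types; the tab-separated layout is likewise just an assumption for the demo.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;

class RecordWriterSketch {
    // Hypothetical stand-in for the Reporter passed to close().
    interface SimpleReporter {
        void setStatus(String status);
    }

    // Simplified stand-in mirroring write(key, value) and close(reporter).
    interface SimpleRecordWriter {
        void write(String key, String value) throws IOException;
        void close(SimpleReporter reporter) throws IOException;
    }

    // Writes one "key<TAB>value" line per record to the underlying Writer.
    static final class TabSeparatedWriter implements SimpleRecordWriter {
        private final Writer out;
        private int records = 0;

        TabSeparatedWriter(Writer out) { this.out = out; }

        public void write(String key, String value) throws IOException {
            out.write(key + "\t" + value + "\n");
            records++;
        }

        public void close(SimpleReporter reporter) throws IOException {
            out.close();
            reporter.setStatus("wrote " + records + " records");
        }
    }

    // Writes two records into an in-memory sink and returns what was written.
    static String demo() {
        StringWriter sink = new StringWriter();
        SimpleRecordWriter writer = new TabSeparatedWriter(sink);
        try {
            writer.write("apple", "3");
            writer.write("pear", "1");
            writer.close(status -> System.out.println(status));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.toString();
    }

    public static void main(String[] args) {
        System.out.print(demo());
    }
}
```

Passing the reporter to `close` gives the writer a chance to report progress while it flushes its remaining output.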

OutputCollector: Passed as a parameter to the map and reduce functions of Mapper and Reducer, which call it to emit their output key-value pairs. The interface has a single method: collect(key, val).
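To make the role of the collector concrete, the following word-count sketch passes a collector into both a map function and a reduce function. It is a simplified model under stated assumptions: `SimpleCollector`, `map`, `reduce`, and `countWords` are hypothetical names for this example, and an in-memory `TreeMap` stands in for the framework's grouping of map output by key.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class OutputCollectorSketch {
    // Simplified stand-in for the single-method collector interface.
    interface SimpleCollector {
        void collect(String key, int value);
    }

    // Hypothetical map function: emits (word, 1) for every word in the line.
    static void map(String line, SimpleCollector output) {
        for (String word : line.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                output.collect(word, 1);
            }
        }
    }

    // Hypothetical reduce function: emits (word, sum of its counts).
    static void reduce(String key, List<Integer> values, SimpleCollector output) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        output.collect(key, sum);
    }

    // Wires map and reduce together; the TreeMap plays the part of grouping by key.
    static Map<String, Integer> countWords(String... lines) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        SimpleCollector mapOutput =
                (k, v) -> grouped.computeIfAbsent(k, x -> new ArrayList<>()).add(v);
        for (String line : lines) {
            map(line, mapOutput);
        }

        Map<String, Integer> result = new TreeMap<>();
        SimpleCollector reduceOutput = (k, v) -> result.put(k, v);
        for (Map.Entry<String, List<Integer>> entry : grouped.entrySet()) {
            reduce(entry.getKey(), entry.getValue(), reduceOutput);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(countWords("a b a", "b c"));
    }
}
```

Because map and reduce only see the collector, neither needs to know where its output goes; the caller decides that by choosing the collector implementation.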

