Debugging and Running Hadoop-0.1.0 Code

A while ago, I ambitiously downloaded the Hadoop source code from GitHub, hoping to read through it to broaden my horizons, and even to write a simplified version myself.
Unexpectedly, reading the code turned out to be as dull as chewing wax. After studying the basic RPC layer, I put it aside.

Author: 木鸟杂记 https://www.qtmuniao.com, please indicate the source when reposting

Later, figuring that nothing is impossible as long as the approach is right, I decided to study it again from a different angle.
This time, the plan is as follows:

  1. First, debug the code locally.
  2. Then, debug module by module.

Today, I’ll mainly cover my exploration of the first step, which took two evenings :>.

  • Fork the source repository on GitHub, git clone the fork locally, and use git tag --list to view all tags and find the one for version 0.1.0.
    Then check out that tag with git checkout.
  • Configure JAVA_HOME, HADOOP_HOME, and PATH environment variables.
  • Download and install ant, and configure environment variables (ANT_HOME, PATH).
  • In Eclipse, create a new project and select Java Project from Existing Ant Buildfile.
    Point it at the hadoop folder; Eclipse will automatically recognize build.xml and create the project from it.
  • Right-click build.xml and choose Run As -> Ant Build, picking the second option so that you can configure the run (choose the correct working folder and build file).
    If that doesn’t work, run the ant command in the directory containing build.xml. Either way, the build folder is generated, along with the necessary conf files, which are created from the templates in the conf folder.
  • Modify the hadoop script (in ${HADOOP_HOME}/bin). To avoid damaging the original file, work on a copy: cp hadoop hadoop-debug.
    Then, in the copy, add some debugging parameters to the run command on the last line. The modification is as follows:
    • Before modification
    # run it
    exec "$JAVA" $JAVA_HEAP_MAX $HADOOP_OPTS -classpath "$CLASSPATH" $CLASS "$@"
    • After modification
    # run it
    exec "$JAVA" -Xdebug -Xrunjdwp:transport=dt_socket,address=9090,server=y,suspend=y \
      $JAVA_HEAP_MAX $HADOOP_OPTS -classpath "$CLASSPATH" $CLASS "$@"
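    • -Xdebug enables debugging support, and -Xrunjdwp loads the JDWP agent with the options that follow. transport: the communication method (dt_socket means a socket), address: the port to listen on, server=y: the JVM listens for a debugger to attach, suspend=y: the JVM pauses at startup until a debugger connects.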
  • Then choose a module to run, such as NameNode: bin/hadoop-debug namenode -format. Because of suspend=y, the process waits for a debugger to attach before it does anything.
  • In Eclipse, open Debug -> Debug Configurations -> Remote Java Application, choose the corresponding project, and set the host to localhost
    and the port to the one given in address (9090 here). Remember to set breakpoints in the code first; then you can debug happily.
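
Before building, it can help to confirm that the environment variables from the setup steps are actually set. The following is a minimal sketch and not part of Hadoop: check_env is a hypothetical helper name, and it only checks that JAVA_HOME, HADOOP_HOME, and ANT_HOME are non-empty, not that they point at valid installations.

```shell
#!/bin/sh
# check_env: hypothetical helper (an assumption for illustration, not a Hadoop tool).
# Prints one line per unset or empty variable and returns 1 if any is missing.
check_env() {
    missing=0
    for name in JAVA_HOME HADOOP_HOME ANT_HOME; do
        # Indirectly read the variable whose name is stored in $name.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "missing: $name"
            missing=1
        fi
    done
    return "$missing"
}
```

Calling check_env in the same shell where you plan to run ant prints nothing and returns 0 when all three variables are set. Running java -version and ant -version afterwards is a quick way to confirm that PATH is also configured.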
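
The copy-and-edit step for the hadoop script can also be scripted. This is only a sketch, under the assumption that the run command in the script starts with exec "$JAVA" followed by a space, as in the snippet from this post. make_debug_script is a hypothetical helper name, and port 9090 is simply the one used here.

```shell
#!/bin/sh
# JDWP options from this post: listen on port 9090 and wait for a debugger.
JDWP_OPTS='-Xdebug -Xrunjdwp:transport=dt_socket,address=9090,server=y,suspend=y'

# make_debug_script SRC DST: hypothetical helper (an assumption for illustration).
# Writes a copy of SRC to DST with the JDWP options inserted right after
# `exec "$JAVA"`. SRC itself is left untouched.
make_debug_script() {
    src=$1
    dst=$2
    # Match the literal text `exec "$JAVA" ` at the start of a line.
    sed 's|^exec "\$JAVA" |exec "$JAVA" '"$JDWP_OPTS"' |' "$src" > "$dst"
    chmod +x "$dst"
}
```

For example, make_debug_script "$HADOOP_HOME/bin/hadoop" "$HADOOP_HOME/bin/hadoop-debug" would produce the debug copy without editing the original by hand. If the exec line in your checkout looks different, the sed pattern will not match and the copy will be identical to the original, so it is worth checking the last line of the result.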

It’s getting late, that’s all for today.

