Previously, I ambitiously downloaded the Hadoop source code from GitHub, aiming to read through it to broaden my horizons, and even wanted to write a simplified version myself.
Unexpectedly, reading the code turned out to be as dull as chewing wax. After working through the basic RPC module, I put it aside.
Author: 木鸟杂记 https://www.qtmuniao.com, please indicate the source when reposting
Later, believing that nothing is truly hard once you find the right approach, I decided to learn it again from a different angle.
This time, the plan is as follows:
- First, debug the code locally.
- Then, debug module by module.
Today I'll mainly cover the first part, which took me two evenings :>.
- Fork the source code on GitHub, `git clone` it locally, and use `git tag --list` to list all tags and find version 0.1.0. Then check it out with `git checkout`.
- Configure the JAVA_HOME, HADOOP_HOME, and PATH environment variables.
- Download and install ant, and configure environment variables (ANT_HOME, PATH).
- In Eclipse, create a new project and select Java Project from Existing Ant Buildfile. Browse to the existing hadoop folder; Eclipse automatically recognizes build.xml and creates an Ant project from it.
- Right-click build.xml and choose Run As -> Ant Build... (the second option) to configure the launch, making sure the working directory and build file are correct. If that doesn't work, run the command `ant` in the directory containing build.xml. This generates the build folder, and the necessary conf files are generated from the templates in the conf folder.
- Modify the hadoop script (in ${HADOOP_HOME}/bin). To avoid damaging the original file, make a copy first: `cp hadoop hadoop-debug`. Then, in the copy, add the JVM debugging parameters to the command on the last line that launches Java. The modification is as follows:
- Before modification:
```shell
# run it
exec "$JAVA" $JAVA_HEAP_MAX $HADOOP_OPTS -classpath "$CLASSPATH" $CLASS "$@"
```
- After modification:
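```shell
# run it
exec "$JAVA" -Xdebug -Xrunjdwp:transport=dt_socket,address=9090,server=y,suspend=y \
  $JAVA_HEAP_MAX $HADOOP_OPTS -classpath "$CLASSPATH" $CLASS "$@"
```
`-Xdebug` enables debugging, and `-Xrunjdwp` loads the JDWP (Java Debug Wire Protocol) agent with the options that follow: `transport` is the communication method (`dt_socket` means a socket), `address` is the port, `server=y` makes the JVM listen for a debugger to attach, and `suspend=y` makes it pause at startup until one does.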
- Then choose a module to run, such as the NameNode: `bin/hadoop-debug namenode -format`. Because of `suspend=y`, the JVM waits for a debugger to attach before it runs.
- In Eclipse, select Debug -> Debug Configurations -> Remote Java Application, choose the corresponding project, and enter localhost as the host and the configured port (9090). Remember to set breakpoints in the code, and then you can debug happily.
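For the environment-variable steps in the list above (JAVA_HOME, HADOOP_HOME, ANT_HOME and PATH), the exports in a shell profile such as `~/.bashrc` might look roughly like the following. All three install paths are hypothetical placeholders, not the locations used in the original setup.

```shell
# Hypothetical install locations: replace with the real paths on your machine.
export JAVA_HOME=/opt/jdk
export HADOOP_HOME="$HOME/src/hadoop"   # the cloned repo, checked out at 0.1.0
export ANT_HOME=/opt/apache-ant

# Prepend the java, hadoop and ant launchers to the search path.
export PATH="$JAVA_HOME/bin:$HADOOP_HOME/bin:$ANT_HOME/bin:$PATH"
```

After editing, run `source ~/.bashrc` (or open a new terminal) and sanity-check with `java -version` and `which ant`.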
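The copy-and-edit step for the hadoop script can also be automated. The function below is a minimal sketch, assuming the launcher's last line begins exactly with `exec "$JAVA" ` as in the "before" snippet; the name `make_debug_script` is hypothetical. It writes `hadoop-debug` next to the original and uses `sed` to splice the same JDWP options into that line, leaving `hadoop` itself untouched.

```shell
# Hypothetical helper: build bin/hadoop-debug from bin/hadoop with JDWP enabled.
# Assumes the launch line starts exactly with: exec "$JAVA"
make_debug_script() {
  bindir="$1"              # directory holding the hadoop script, e.g. "$HADOOP_HOME/bin"
  port="${2:-9090}"        # debugger port; 9090 matches the post
  jdwp="-Xdebug -Xrunjdwp:transport=dt_socket,address=${port},server=y,suspend=y"

  # Work on a copy so the original launcher stays intact.
  cp "$bindir/hadoop" "$bindir/hadoop-debug" || return 1

  # Insert the JDWP options right after exec "$JAVA" on the launch line.
  # -i.bak is accepted by both GNU and BSD sed; the backup is removed afterwards.
  sed -i.bak "s|^exec \"\$JAVA\" |exec \"\$JAVA\" ${jdwp} |" "$bindir/hadoop-debug" || return 1
  rm -f "$bindir/hadoop-debug.bak"
  chmod +x "$bindir/hadoop-debug"
}
```

For example, `make_debug_script "$HADOOP_HOME/bin"` followed by `bin/hadoop-debug namenode -format`. On Java 5 and later, `-agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address=9090` is the documented replacement for the `-Xdebug -Xrunjdwp:...` pair and could be substituted for the `jdwp` value.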
It’s getting late, that’s all for today.
