Muniao’s Notes (木鸟杂记)

Large-Scale Data Systems

2023 Year in Review — Adversity Spurs Change

IMG_8879.JPG

2023 passed in the blink of an eye. In hindsight, if I had to sum it up in one phrase, it would be: adversity spurs change.

By “adversity” I don’t mean literal poverty, being unable to afford food; it is closer to the “hardship” in the saying “when in adversity, attend to your own virtue in solitude.” Many beliefs I had long relied on could no longer be sustained, so I went through a painful process of reshaping them. This wasn’t necessarily a bad thing, but the mental tug-of-war I was forced into still feels agonizing in retrospect. Of course, it was precisely this adversity that forced me to break free from “groupthink” and seek new paths (think different). Though painful in the moment, there was something to be gained; after all, everyone must find their own way.

So what exactly changed this year?

Author: Muniao’s Notes https://www.qtmuniao.com/2024/01/04/2023-summary Please cite the source when reposting

First, in terms of life: due to family reasons, I moved to the UK mid-year. This led to changes at work — I resigned, planning to find something new in the UK. Bolstered by my experience, I was quite optimistic at first, only to be blindsided by reality — completely disoriented. Eventually, I painfully discovered that finding a job in my original field (storage, distributed systems, databases, and other large-scale data systems) was no easy task. Even when I lowered my expectations and applied for data and backend roles, I heard nothing back. Of course, I could list many factors:

  1. Infra-heavy jobs are scarce in the UK
  2. I needed company sponsorship for a work visa
  3. I had no UK study or work experience
  4. My spoken English wasn’t quite adequate
  5. I wasn’t familiar with the UK interview process

But deep down, I knew that despite my preparation and the many applications I sent, I never fully swallowed my pride and went all-in — grinding English, thoroughly preparing for interviews, applying across all directions, and so on. At heart, I still didn’t want to abandon my previous infra career path. Of course, this might also have been an excuse to cover my discomfort with the gap between expectation and reality and my resistance to integrating into a new environment. Either way, I underestimated how hard it would be to settle in here and failed to prepare for it.

Not to mention, the small town I’m in has far too few good places to eat (rolls eyes)! And the UK barely gets any daylight (short days at this latitude aside, it’s constantly overcast and rainy). Over time, it really takes a toll on your mental health. Months without a job, days spent in a tiny dark room, plus the daily routine of grocery shopping and cooking alone made the second half of the year especially difficult. The leisurely exploration I had imagined never materialized; I simply didn’t have the energy. One of the few bright spots was making some wonderful Chinese friends through badminton (posting on Xiaohongshu abroad really can bring like-minded people together). Playing and eating together with them is how I barely managed to keep going.

In the first half of the year, while still at my company, I worked on a few important features and was quite happy about it. What left the deepest impression were execution plan pushdown and distributed transaction research. Execution plan pushdown is the bridge connecting the compute layer and the storage layer — a fairly critical module. Thanks to Siwang for the trust. Taking this as a starting point, I extended upward and downward, drawing broadly from various sources, which illuminated some previously fuzzy threads in databases and led me to relevant papers and projects — I grew a lot. On distributed transactions, although I didn’t contribute much, I participated in the research and discussions, and while reading and sharing the relevant chapter of DDIA, I think I finally made sense of all the messy and bizarre consistency models, and understood why “clocks” are the “most important” thing in distributed systems.

In the second half of the year in the UK, unemployed, anxiety was my constant companion, but I still managed to get some things done. The two super, super long chapters of DDIA (yes, I’m talking about batch processing and stream processing) are fully drafted, though the stream processing chapter hasn’t been shared on Bilibili yet. A rough count using wc -m on the first eleven chapters puts the total at nearly 300,000 characters. Given that every one of them was typed by hand, that’s quite substantial. Many passages required understanding, deliberation, and equivalent transformation (those who have done translation will know what I mean), and the process inevitably involved struggle. The last three chapters are genuinely long-winded, so on Bilibili people often ask whether the final chapters will still be updated. I always insist they will, but deep down I know I’ve been teetering between finishing the series and leaving it unfinished.

Side note: If you’re interested in DDIA but feel like a cat looking at an egg — don’t know where to start — you can subscribe to my column “DDIA Study Group”. I’ll answer questions about the book’s details every week. My experience in large-scale data systems (infra) allows me to provide industrial examples and perspectives on some of the book’s more obscure details. Of course, it’s less about “answering” and more about exchanging ideas and learning from each other.

Other than that, I redid the MIT 6.824 labs (the course has been renamed now). This was my third time through, and it went much more smoothly. Along the way I gathered more experience, insights, and takeaways, which I compiled, together with the awesome roseduan, into a “course” (in quotes because “notes” or “reflections” might be more accurate). Those interested in Raft and Go might want to check it out.

Then there’s updating my large-scale data systems column “System Daily Notes”. I started this Xiaobot column this year; it now has nearly ninety articles and over a hundred subscribers. At the beginning, the direction wasn’t very clear, so I gave it a rather vague name. But later, as I sorted through my past work experience, continued sharing DDIA, and thought about what I wanted to do in the future, I settled on “large-scale data systems” as its positioning. Today no digital system can do without data, and massive amounts of it at that, so we must turn to distributed systems; that is where “large-scale” comes from. In this domain, I’m trying to piece things together bit by bit through work, reading, and sharing, slowly putting down roots and carving out a small place for myself. Below are some articles selected from the column and published on my WeChat Official Account:

  • A Little Experience Building and Maintaining the World’s Strongest Object Storage System
  • Firebolt: How to Assemble a Commercial Database in Eighteen Months
  • The Modeling Philosophy Behind the ER Model
  • NUMA-Aware Execution Engine Paper Interpretation
  • The Grand Unification of Data Processing — From Shell Scripts to SQL Engines
  • Life Engineering (Part 1): Multi-round Decomposition
  • A Few Common Misconceptions in Database Interviews
  • A Comprehensive Analysis of Facebook Velox’s Operating Mechanism
  • Writing Good Code: My Three Codes
  • Graph Database Series (Part 3): Graph Representation
  • Distributed Graph Database Series — Graph Models and Cypher

Even as large language models dominate today, the data pipelines behind their ecosystems — collection, storage, computation, and utilization of massive data — still lack standardization, and will remain a major focus of this column going forward.

小报童专栏.png

As a programmer, I wrote more words than code this year. After all, unemployed in the second half, I had to make a living by writing. Although the income wasn’t much, the gains were substantial. Writing is essentially a process of honing one’s thinking. Though slow, when integrated over time, the total accumulation is quite considerable.

Near the end of the year, I discovered a content creator: KeDaibiao Lizheng. He interviews many interesting guests — entrepreneurs, returnees, the financially independent, those retired for two years, engineers turned investors — and has a great sense of conversational pacing. I gained a lot of inspiration. After watching some videos, a few points stuck with me:

  1. Framework thinking. For example, knowledge trees and learning by analogy.
  2. Confusing tools with goals. For example, making money and freedom.

These two points are oft-repeated; I’ve heard them countless times and have even written about the first one many times myself. So why did they still resonate? Let me set the stage.

The interesting thing about how the human brain understands things is that you need to accumulate enough experiences before you can abstract first-order principles; only after gathering enough first-order principles can you stack second-order principles on top; repeating this cycle leads to a deep, all-encompassing clarity. This abstract tree-like organization is very similar to a commonly used data structure in our database field — the B+ tree. We know that B+ trees have a property: all insertions happen at the leaf nodes. These videos were the same — each guest’s experiences added leaf nodes to my mental tree. Reaching a certain threshold triggered a “cascading split” in my mental tree (common in B+ trees), which in turn led to answers for some deep questions I had previously struggled with in vain.
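To make the B+ tree analogy concrete, here is a minimal sketch in Python of insertion with bottom-up splitting. It is an illustration only, not taken from any particular database: the node capacity `ORDER`, the class names, the `splits` counter, and the helper `splits_per_insert` are all assumptions made for this example, and deletion and leaf sibling links are left out.

```python
from bisect import bisect_right, insort

ORDER = 4  # assumed maximum number of keys a node may hold before it splits


class Node:
    def __init__(self, leaf):
        self.leaf = leaf
        self.keys = []
        self.children = []  # used only by internal nodes


class BPlusTree:
    def __init__(self):
        self.root = Node(leaf=True)
        self.splits = 0  # total number of node splits so far

    def insert(self, key):
        # Walk down to the target leaf, remembering the path of ancestors.
        path, node = [], self.root
        while not node.leaf:
            path.append(node)
            node = node.children[bisect_right(node.keys, key)]
        insort(node.keys, key)  # every insertion lands in a leaf

        # Split overflowing nodes bottom-up; one split may trigger the next.
        while len(node.keys) > ORDER:
            sep, right = self._split(node)
            if not path:
                # The root itself split, so the tree grows one level taller.
                new_root = Node(leaf=False)
                new_root.keys = [sep]
                new_root.children = [node, right]
                self.root = new_root
                break
            parent = path.pop()
            idx = bisect_right(parent.keys, sep)
            parent.keys.insert(idx, sep)
            parent.children.insert(idx + 1, right)
            node = parent

    def _split(self, node):
        self.splits += 1
        mid = len(node.keys) // 2
        right = Node(leaf=node.leaf)
        if node.leaf:
            # Leaves keep every key; the separator is copied up.
            right.keys = node.keys[mid:]
            node.keys = node.keys[:mid]
            sep = right.keys[0]
        else:
            # Internal nodes move the middle key up instead of copying it.
            sep = node.keys[mid]
            right.keys = node.keys[mid + 1:]
            right.children = node.children[mid + 1:]
            node.keys = node.keys[:mid]
            node.children = node.children[:mid + 1]
        return sep, right

    def height(self):
        h, node = 1, self.root
        while not node.leaf:
            h, node = h + 1, node.children[0]
        return h

    def leaf_keys(self):
        # Collect keys from the leaves, left to right.
        out, stack = [], [self.root]
        while stack:
            node = stack.pop()
            if node.leaf:
                out.extend(node.keys)
            else:
                stack.extend(reversed(node.children))
        return out


def splits_per_insert(tree, keys):
    # Hypothetical helper: how many splits each single insertion caused.
    deltas = []
    for k in keys:
        before = tree.splits
        tree.insert(k)
        deltas.append(tree.splits - before)
    return deltas


if __name__ == "__main__":
    tree = BPlusTree()
    print(splits_per_insert(tree, range(1, 14)))
    print("height:", tree.height())
```

With `ORDER = 4`, inserting the keys 1 through 13 in order makes the last insertion split a leaf and then its parent in the same operation, and the tree gains a level. That single insertion rippling upward is the “cascading split” the analogy refers to.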

Even knowing these truths, I may still relapse in the future, still fall into the trap of “scholars disparaging each other” (peer pressure), but I believe the convergence speed will keep getting faster.

I didn’t travel much this year; the only new places were the Southern Xinjiang Grand Circuit and the UK. And because my mind was so tense, in the UK I mostly just skimmed past the scenery; I had no capacity to savor it. In Southern Xinjiang, two places left a deep impression: the Ancient City of Gaochang and the Old City of Kashgar.

高昌故城.jpg

Unlike other ancient Western Region kingdoms, Gaochang was a Han regime — descendants of Han dynasty garrison troops. The city was eventually destroyed by Mongol cavalry; alas, ten thousand palaces turned to dust. Our good fortune was hitching a ride with a tour group that had hired a veteran guide. As the shuttle bus stopped at different areas, the guide would casually point out: “This spot was once such-and-such place, with such-and-such history.” His tone was light and breezy, yet his love for this land and its history overflowed uncontrollably.

If the Ancient City of Gaochang is a “dead” ancient city, then the Old City of Kashgar is a “living” old city — old yet undying, vibrant and full of life. I uploaded a photo with this title to Tuchong and casually submitted it to a contest hosted by China Daily’s New Media Center, and for the first time, I actually won a photography competition.
图虫获奖.png

Things in this world are hard to describe and harder to convey. The joy of an accidental success often surpasses the fulfillment of a carefully planned dream. So I still hope I can slowly learn to “relax”, do more “useless” things with good intentions, and form more “useless” good connections. Speaking of which, I’m reminded of a time when a student added me on WeChat. Because they couldn’t get to the “point” I wanted, I — busy at the time — rather rudely cut them off. I truly regret that.

The small towns in the UK combine the scenery and leisure of rural China with the convenience and aesthetics of the city — truly livable. But at that time, I was constantly haunted by the subtle anxiety of being unemployed. Unable to lighten my own heart, naturally I couldn’t fully savor this tranquility. May there be another opportunity in the future.

IMG_8516.JPG

In the end, humans are always contradictory. For example, we need social interaction to anchor ourselves, yet we must not let conformity constrain us. Thus everything requires striking a balance. The ancients called it “the Doctrine of the Mean”; today we call it “relaxation”. May 2024 bring everyone the courage to be true to what their hearts desire!

Last year’s review: 2022 Year in Review — Fulfillment and Confusion


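I’m Qingteng Muniao (青藤木鸟), a programmer who enjoys photography and focuses on large-scale data systems. You’re welcome to follow my WeChat Official Account, “木鸟杂记”, for more articles on distributed systems, storage, and databases. After following it, reply “资料” to get a set of distributed database study materials I’ve compiled, or reply “优惠券” to get a 20% off coupon for my paid large-scale data systems column “System Daily Notes” (《系统日知录》).

We also have chat groups on distributed systems and databases. Add me on WeChat (ID: qtmuniao) and I’ll invite you; when adding me, remember to include the note “分布式系统群”. If you’d rather not join a group, there is also a forum on distributed systems and databases (click here); come and hang out.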

wx-distributed-system-s.jpg