I joined an LLM company in early 2024, having previously worked in the infra industry (databases, storage, etc.), so I have some very basic insights on switching careers. I haven’t shared on Bilibili for a long time; this live stream forced me to get back in gear. I answered some of your questions and bridged a bit of the information gap. This post is a slightly more organized summary of some points mentioned during the stream, with some materials I find valuable attached at the end.
Bilibili live stream: https://www.bilibili.com/video/BV1uckJBkEto
Author: Muniao Miscellany https://www.qtmuniao.com/2026/01/25/llm-switch/ Please indicate the source when reposting.
Category Divide
LLM-related work spans the stack from bottom to top: the infra side, the data side, and the model side, and above them the Agent layer and the application layer.
For those with distributed systems backgrounds, the infra side and data side are relatively adjacent and easier to transition into. The model side has a high barrier to entry, requiring either a PhD with good publications or several years of experience at recognized companies. The Agent layer and application layer are very similar to traditional backend development—you only need to understand the capability boundaries and practical usage of LLMs. It’s still largely a blue ocean at the moment.
AI infra in a broad sense can be categorized by demand into inference infra, training infra, and data infra. Beyond distributed systems and computer science fundamentals, inference infra also requires some understanding of model principles, plus familiarity with parallel computing and mainstream inference frameworks and the ability to tune them. Far fewer companies train large models than use them, so inference infra positions outnumber the other two. Training infra is actually quite similar to inference infra, but additionally requires strong consistency (i.e., reproducibility). Data infra mainly involves building pipelines for the various stages of data crawling and cleaning, and distilling common tool libraries out of them.
Data Engineering
I mainly work on data engineering for LLMs, so let me expand a bit. Another way to describe data engineering is building data cleaning pipelines around model training needs—or, more bluntly, washing data.
The main sources of pre-training data are crawling, purchasing, and generating. Data obtained through the first two methods almost always requires cleaning before use. At a high level, cleaning mainly involves extracting structured information, extracting semantic information, filtering according to requirements, and converting to the target format. During this process, a frequently needed operation is large-scale deduplication. Depending on granularity, deduplication can be exact deduplication based on values, or fuzzy deduplication based on semantic information.
The infrastructure around this can also be divided into computing and storage. Computing generally uses Spark and Ray; storage mainly uses file storage and object storage. On top of these, dataset abstractions suited to training and cleaning are built. My column and WeChat public account share more details; you are welcome to subscribe.
Model Usage
The simplest way to use a model is through prompts. Slightly more complex approaches include RAG or Agents; these do not modify model parameters. If you have higher accuracy requirements for tasks in a specific domain, you may need a large amount (10k+ samples) of high-quality data to fine-tune or do RL on top of open-source models, which does modify model weights. Therefore, whether model parameters are modified can serve as a rough dividing line for the barrier to entry, at least in terms of depth of usage.
If you’ve read the GPT series papers, you’ll know that the original intention of large models was to be general-purpose, minimizing the need to fine-tune for different downstream tasks. Although large models have some generalization capability, their understanding of unseen specialized data is still limited. Therefore, in many specialized domains with high precision requirements (such as molecular biology, trading sequences, etc.), domain-specific data is still needed to adjust the model’s capabilities.
Fine-tuning corresponds to demands along the depth dimension. However, most tasks being deployed today call for breadth, i.e., handling complexity. The simplest approach is to use RAG to feed the model appropriate data as context according to the requirement. Slightly more complex, you can orchestrate a workflow with fixed rules (in code or with tools like n8n), treating the model as one component in the flow. More complex still is letting the model autonomously choose paths and call tools while solving a problem, which is what people now commonly call Agent systems.
Building an Agent system first requires a base model with strong decision-making and tool-use abilities, and then the builder needs some way to dynamically provide the model with minimal yet complete context at the right time. Manus and Anthropic have explored these areas extensively and published many blog posts, linked in the references below.
Vibe Coding
Vibe Coding is the Agent direction where LLMs have first achieved large-scale adoption in industry. I suggest all programmers who haven’t tried it yet give the most cutting-edge Code Agents a spin. Only by trying will you know what current model capabilities can and cannot do. We shouldn’t be superstitious or afraid, but we absolutely shouldn’t underestimate the iteration speed of programming Agents. In an era of rapid change, the “feel from the front lines” is what matters most.
For example, two or three months ago I still found these tools very unwieldy, lacking in both instruction following and user interaction. Recently (January 2026), however, the experience has become quite smooth. Basically, as long as I provide context in a reasonable way and describe my intent precisely, Claude Code already works very well in most scenarios. I even learn many new coding patterns and ways of organizing code from it.
On how to control the quality of code produced by LLMs, here is a simple heuristic: will the code need to be maintained by humans in the future? If LLM-written code still needs human review, it should be held to the standards of traditional software engineering (abstraction, encapsulation, reducing complexity), and you can ask the model to refine it over several rounds, since human bandwidth is limited. But if the code is not meant for human maintenance, or is only temporary or even one-off, there is no need to treat it as a white box: as long as it satisfies the functionality and passes black-box testing, it is usable. Worst case, the next time a similar requirement comes up, don't reuse the old code; just have the model rewrite it.
Recently, chatting with friends who left to start businesses and listening to many podcasts about Vibe Coding, I have a feeling: the large-scale adoption of Code Agents brings too many brand-new possibilities. For example: rethinking the lifecycle of code, workflow orchestration based on natural language.
Regarding the lifecycle of code, a core observation is that if we can generate code at 10x efficiency in the future, we can also throw away code at 10x speed. That is, code is no longer a handicraft, but can be an industrial product—even a disposable product. This will greatly change our understanding of code. For example, for an event, marketing personnel can directly generate a one-off promotional webpage through a Code Agent. Compared to static posters made with traditional tools like Photoshop, code-based webpages offer more customization, dynamism, and interactivity. Even if they’re thrown away after one use, it’s fine. This order-of-magnitude cost reduction will change the way we use “software in a broad sense.”
Regarding natural-language workflow orchestration: those familiar with databases know that a database essentially lets users compose basic operators into data flows through SQL. Abstracting one step further, it lets users orchestrate workflows in some DSL (Domain-Specific Language, a precise descriptive language for a particular domain). Code Agents push this one step further still: orchestrating everyday workflows in natural language, which is higher-dimensional and fuzzier. Roughly speaking, this lets operations staff bypass data warehouse engineers and run experiments to gain insights directly.

Anthropic's recently proposed MCP and Skills are pioneers in implementing exactly this philosophy. MCP defines external tools; Skills provide combinations built on top of those tools. Each Skill registers its metadata with the model in the form of a summary, and the model uses these summaries for dynamic planning and principled execution when performing a task. So when landing this in a specific scenario, the main work is building and curating your own set of Skills.
Therefore, Vibe Coding is an inevitable trend, because it genuinely lowers barriers and improves productivity. Whether or not we craftsmen who write code by hand like it, having most of our work replaced by industrialization is inevitable. As some of the earliest people to see this trend, why not embrace it actively?
Multimodal Trends
The multimodal capabilities of LLMs can be divided into two parts: understanding and generation. Improving understanding capabilities requires more high-quality image-text data, while improving generation capabilities requires stronger model fusion.
The backbone network for multimodal understanding is still Transformer-based, but at the input end image tokens are fed into the language model in various ways for training. The adapter that maps images into the text sequence (the Vision Tower) is therefore very important: if its parameter count is too small or it is frozen at an unreasonable stage, it easily becomes a bottleneck. So under the current architecture, data is the top priority for improving multimodal understanding quality, followed by the Vision Tower itself. There are many practical scenarios for multimodal understanding, such as test-taking (K12, civil service exams, graduate school entrance exams), object recognition, webpage replication, image replication (SVG), and logic and physical reasoning over images.
The backbone of multimodal generation is the diffusion model, a paradigm quite different from Transformer-based language modeling. Diffusion models produce more striking generation results, which makes them well suited to creative content. Their problem, however, is that imposing semantic and physical constraints is relatively difficult. In practice this shows up as early image generation tools struggling to support precise, multi-turn dialogue-based edits, and frequently producing things like hands with seven or eight fingers that defy physical reality. There are at least two directions toward better generation. One is deeper fusion with language models to meet the needs of semantic understanding and instruction following; Google's Gemini series does very well here. The other is a complete paradigm shift, which is what many cutting-edge researchers have recently been calling "world models": constructing a new paradigm that lets models truly understand and explore the physical laws and boundaries of this world, rather than the current situation where large models are essentially a ghost born in the human "language space."
Digital Nomads & Students
The maturation of LLMs and various downstream productivity tools, built on top of the Internet as basic infrastructure, has given individuals enormous leverage: "intelligence" itself can now be outsourced. Hence the recently popular concept of the OPC (one-person company); many cities (such as Suzhou) have even launched subsidies targeting it. So if you have good ideas and creativity, don't hesitate to be an early mover.
As for students and new graduates, besides job hunting, you can also boldly explore what LLMs make possible today. Of course, using LLMs proficiently and knowing the surrounding ecosystem well is itself very attractive to employers. Opportunities to build large models themselves will become fewer and fewer, and increasingly concentrated in a handful of companies; you don't need to crowd into that direction. But making good use of LLMs may be an opportunity that blooms everywhere.
References
Here are some materials I’ve read and found valuable:
- Math foundations: MIT has a classic linear algebra course https://www.bilibili.com/video/BV1rH4y1N7BW.
- Paper roadmap: A public course at Princeton, https://princeton-cos597r.github.io/. I’m also updating my interpretations in my column (https://xiaobot.net/p/system-thinking)—welcome to subscribe.
- End-to-end, from scratch: Andrej Karpathy’s video series, plus source code https://github.com/karpathy/nanoGPT
- Comprehensive LLM doc: https://s3tlxskbq3.feishu.cn/docx/NyPqdCKraoXz9gxNVCfcIFdnnAc
- ML systems: Tianqi Chen’s public course https://mlsys.org/
- Manus context engineering: https://manus.im/blog/Context-Engineering-for-AI-Agents-Lessons-from-Building-Manus
- Anthropic agent-related blogs: https://www.anthropic.com/engineering
- Understanding Skills: https://mp.weixin.qq.com/s/Bl4ODUxvwO8pYu9nXVmjuQ
- A great translated podcast, Cross-Border Drop-In Plan, with lots of firsthand sharing from LLM big shots: https://www.xiaoyuzhoufm.com/podcast/670f3da40d2f24f28978736f
