Common Misconceptions in Database Interviews

Due to business needs, I have recently interviewed many database candidates. I have noticed that many candidates share some common misconceptions when preparing for interviews. I would like to take this opportunity to share some advice from my perspective as an interviewer. This article focuses on four misconceptions: weak coding fundamentals, weak engineering literacy, lack of communication and thinking skills, and fragmented knowledge frameworks.

Author: 木鸟杂记 https://www.qtmuniao.com/2023/08/21/database-interview-myth Please credit the source when reposting.

Myth 1: Weak Coding Fundamentals

Databases are an extremely engineering-intensive field, so coding is a very important part of the interview evaluation. It is fair to say that if you code well, you may pass the interview even with a slightly weaker background, but a strong background cannot make up for weak coding. I have encountered candidates who can talk about the details of distributed transactions with great fluency, yet cannot correctly write a basic data structure such as a linked list. In such cases, even if I want to pass them, there is nothing I can do.

When it comes to coding, we do not expect candidates to solve extremely difficult algorithmic problems. Instead, we focus on basic, engineering-oriented topics.

Basic data structures. These include linked lists, hash tables, trees, and other common data structures, together with their related algorithms. Ideally, you should be able to implement them quickly on your own and understand the characteristics of each. Graphs are tested occasionally but not often, and the material is quite fixed: usually the basic traversals (BFS and DFS), shortest path, minimum spanning tree, and topological sort.
Common algorithms. The most basic are the common sorting algorithms: understand their time complexity, space complexity, and basic pros and cons. Another topic that beginners often find difficult is tree-based backtracking, which usually comes down to an insufficient understanding of recursion. Admittedly, it took me a long time to grasp this idea myself, but we generally do not test very difficult backtracking problems. There are also binary search and divide and conquer. Their core idea is to break down the problem domain and solve subproblems: binary search continuously narrows the problem scope, while divide and conquer not only narrows the scope but also combines the solutions of the subproblems to solve the original problem. Dynamic programming is relatively difficult and usually not tested; when it is, only very simple versions are asked, such as the basic 0/1 knapsack problem.
Engineering problems. When I interviewed at other companies, the most frequently asked engineering-oriented question was LRU, because it combines multiple data structures such as hash tables and linked lists, and has many corner cases. It is a question that really tests coding fundamentals. However, since many people have memorized it by now, its discriminating power has decreased. Another one is the prefix tree, also known as the Trie. This is asked less frequently, and the code involves many details. I personally tend to ask about the implementation of basic data structures, such as implementing a hash table. This question has many possible follow-ups, such as resizing, thread safety, and so on. Additionally, questions involving file I/O, byte manipulation, and multithreading are occasionally asked. Since these APIs are usually hard to remember, I usually allow candidates to look up anything on the internet, as long as I can watch. In fact, observing how a candidate solves problems and searches for information is itself a form of evaluation.
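
To make the LRU question above concrete, here is a minimal sketch of one common way to answer it: a hash table for O(1) lookup plus a doubly linked list that keeps entries in recency order. The class and method names (LRUCache, get, put) are my own choices for illustration, not a required interface, and the sketch assumes a single thread and a positive capacity.

```python
class _Node:
    """Doubly linked list node holding one cache entry."""
    __slots__ = ("key", "value", "prev", "next")

    def __init__(self, key=None, value=None):
        self.key = key
        self.value = value
        self.prev = None
        self.next = None


class LRUCache:
    """Fixed-capacity cache that evicts the least recently used entry."""

    def __init__(self, capacity):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self.index = {}  # key -> node, for O(1) lookup
        # Sentinel head/tail nodes remove the empty-list corner cases.
        self.head = _Node()
        self.tail = _Node()
        self.head.next = self.tail
        self.tail.prev = self.head

    def _unlink(self, node):
        node.prev.next = node.next
        node.next.prev = node.prev

    def _push_front(self, node):
        node.next = self.head.next
        node.prev = self.head
        self.head.next.prev = node
        self.head.next = node

    def get(self, key, default=None):
        node = self.index.get(key)
        if node is None:
            return default
        # A hit makes the entry the most recently used one.
        self._unlink(node)
        self._push_front(node)
        return node.value

    def put(self, key, value):
        node = self.index.get(key)
        if node is not None:
            # Overwrite in place and refresh recency.
            node.value = value
            self._unlink(node)
            self._push_front(node)
            return
        if len(self.index) >= self.capacity:
            # Evict the node just before the tail sentinel (least recent).
            victim = self.tail.prev
            self._unlink(victim)
            del self.index[victim.key]
        node = _Node(key, value)
        self.index[key] = node
        self._push_front(node)


cache = LRUCache(2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")      # touch "a", so "b" becomes the least recently used
cache.put("c", 3)   # cache is full, so "b" is evicted
```

The sentinel nodes are a deliberate choice: they avoid special-casing an empty list or a single-element list, which is where many of the corner cases in this question come from. Thread safety, a typical follow-up, is left out of this sketch.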

Myth 2: Weak Engineering Literacy

Modern database codebases often run to hundreds of thousands of lines or more. Without good coding standards, they quickly become unmaintainable as the project evolves. Therefore, if candidates pay attention to coding style and demonstrate engineering literacy when writing code, it is a significant plus; conversely, if they name variables arbitrarily and show no basic abstraction or reuse, it is a significant minus.

Another thing that strongly reflects engineering literacy is the approach to solving practical problems. For example, when interviewers assess engineering code, the problem may be vague and broad. In that situation, it helps a great deal if the candidate can:

Use some computer science common sense to clarify ambiguous areas. For example, if the problem does not specify whether the data lives in memory or in files, you can first assume everything is in memory, then mention to the interviewer that techniques such as a write-ahead log (WAL) and snapshots can be used to handle crashes.
Make some simple assumptions to narrow the problem domain. For example, when the data type is unspecified, you can assume the simplest integer type; when the file format is unspecified, you can assume one data item per line.
Abstract out basic modules and establish the big picture first. This is the minimum viable prototype mindset commonly used in project work: modularize each sub-problem (such as file I/O or sorting), use the simplest implementation or even leave it as a stub for now, focus on building the main logic, quickly get a working prototype, and then discuss with the interviewer which modules are worth optimizing.
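
As an illustration of the three points above, here is a minimal sketch for a hypothetical, deliberately vague question such as "sort the data in a file". The assumptions are mine and would be stated to the interviewer up front: the items are integers, there is one item per line, and everything fits in memory. All function names are made up for this example.

```python
import io


def read_items(stream):
    """Parse one integer per line (assumed format); blank lines are skipped."""
    return [int(line) for line in stream if line.strip()]


def sort_items(items):
    """Simplest possible implementation: an in-memory sort.

    If the data turns out not to fit in memory, this module could later be
    replaced (for example by an external merge sort) without touching the
    main logic.
    """
    return sorted(items)


def write_items(items, stream):
    """Write one integer per line."""
    for item in items:
        stream.write(f"{item}\n")


def sort_stream(src, dst):
    """Main logic: wires the modules together."""
    write_items(sort_items(read_items(src)), dst)


# In-memory streams stand in for real files while prototyping.
src = io.StringIO("3\n1\n\n2\n")
dst = io.StringIO()
sort_stream(src, dst)
result = dst.getvalue()
```

With the main logic working end to end, the remaining time can go to whichever module the interviewer cares about most, such as buffering in read_items or a disk-based sort_items.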

Among these, the last point is particularly important, because interview time is usually very limited. If you get bogged down early in some engineering detail, such as how to read and write files, how to choose buffer sizes, or how to handle offsets, you may either run out of time before finishing or get stuck out of nervousness and be unable to recover. At such moments, make a point of applying methods such as the minimum viable prototype and top-down stepwise refinement. These are also the ways we commonly complete tasks in actual work, and they say a lot about a candidate's engineering literacy.

This article is from my continuously updated column on databases and distributed systems, System Daily Notes (《系统日知录》). The two remaining short sections are available there; you are welcome to visit the column to read them.


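I am 青藤木鸟, a programmer who enjoys photography and focuses on large-scale data systems. You are welcome to follow my WeChat official account, "木鸟杂记", where I post more articles on distributed systems, storage, and databases. After following the account, reply "资料" ("materials") to get a set of distributed database study materials I have compiled, or reply "优惠券" ("coupon") to get a 20%-off coupon for my paid column on large-scale data systems, System Daily Notes (《系统日知录》).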

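We also have chat groups on distributed systems and databases. Add me on WeChat (ID: qtmuniao) and I will invite you to the group; when adding me, remember to include the note "分布式系统群" ("distributed systems group"). If you would rather not join a group, there is also a forum on distributed systems and databases, and you are welcome to drop by.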